How I Removed 50,000+ Spam URLs and Saved a Hacked Website (The Ultimate Guide)
If you look at your Google Search Console coverage report, you expect to see a steady line of ‘Valid’ pages. You definitely don’t want to see what my client saw last Tuesday.
- Valid Pages: 142
- Spam Pages: 49,800+
In the blink of an eye, a massive injection of spam URLs had drowned out their legitimate content. The client was facing a potential Google penalty that could take years to recover from. I had to act fast. Here is the exact “Scorched Earth” protocol I used to fix it in just 72 hours.
Introduction: Why Traditional GSC Removal Fails Massive Attacks
Last year, I documented a massive cleanup effort in my post: Recovering from SEO Spam: How I Cleared 242,000 Japanese Spam Pages. In that case study, I relied heavily on the Google Search Console (GSC) Removal Tool.

While that method works for moderate infections, it has significant limitations when dealing with a massive attack. You can typically only submit about 1,000 URLs per day manually. When you are facing an aggressive attack of 50,000 or 100,000 auto-generated pages, the GSC removal tool simply cannot keep up. The math doesn’t work; you cannot wait 50 days to save your business while your reputation bleeds out in the search results.
Today, I am going to show you a more advanced, aggressive strategy. I call it the “410 Gone” Protocol. Instead of politely asking Google to temporarily hide the URLs, I force Google to de-index them permanently.
Need Emergency Help?
If your site is currently infected and you don’t have the technical skills to clean it safely, do not risk breaking your site further. Check out my specialized malware removal services, linked at the end of this post.
Chapter 1: Diagnosing the Japanese Keyword Hack & Finding Patterns
The Japanese Keyword Hack (also known as the Japanese Symbol Hack or SEO Spam) is particularly nasty because it is often “cloaked.” This means the hacker writes a sophisticated script that detects who is visiting the website.
If a human visitor (like you or your customers) goes to the site, the script shows the normal, clean website. However, if a search engine bot (like Googlebot) visits, the server delivers the malicious Japanese spam content. To the client, the site looked fine. To Google, the site had turned into a Japanese casino affiliate farm.
Step 1: Using Search Operators to Identify Spam Patterns
The fastest way to confirm the infection is to ask Google exactly what it has indexed. I typed this simple command into the Google search bar:
site:example.com
The Result: I saw thousands of pages with title tags written in Kanji characters, promoting “No Deposit Bonus Casino” and “Fake Luxury Watches.” The URLs followed specific patterns, often ending in random numbers, such as /detail/837492837 or hidden inside a /pages/ directory.
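A couple of narrower variations of the same operator make it easier to size each spam pattern separately (inurl: is a standard Google operator, and example.com stands in for the real domain):
site:example.com inurl:/detail/
site:example.com inurl:/pages/
site:example.com casino
The result counts for these searches give a rough per-pattern estimate of how many spam URLs Google has indexed.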
Step 2: Harnessing the Google Search Analytics API for Bulk Data
For massive volumes, the standard Google Search Console interface is not enough. To verify exactly which spam pages were active, I utilized the API to pull up to 25,000 rows of pages and queries at once.
1. Access the API Interface
I visited the Google Search Analytics API explorer and selected the “Try it now” option to access the testing tool.

2. Configure the Query
I switched to full-screen view for easier navigation and input the client’s site URL. In the Request Body, I pasted the following JSON configuration to fetch the maximum amount of data:
{
  "startDate": "2023-01-01",
  "endDate": "2025-02-19",
  "dimensions": ["QUERY", "PAGE"],
  "rowLimit": 25000
}

3. Authenticate and Export the “Kill List”
I enabled OAuth 2.0 authentication and executed the query. After receiving a 200 OK response, I copied the raw JSON data and used Konklone’s JSON to CSV tool to convert it into a readable spreadsheet.
I filtered this massive CSV file for the specific spam patterns I identified earlier (casino keywords, .html files in /pages/, etc.). This gave me a precise list of nearly 6,000 unique bad URLs that were currently draining the client’s server resources.
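If you prefer to skip the API Explorer and the JSON-to-CSV conversion, the same pull and filter can be scripted. Below is a minimal Python sketch, not my exact tooling: it assumes the google-api-python-client and google-auth packages, a service-account key file (key.json is a placeholder) whose email has been added as a user on the Search Console property, and the spam patterns identified above.
import csv
import re

from google.oauth2 import service_account
from googleapiclient.discovery import build

SITE_URL = "https://example.com/"  # the verified Search Console property (placeholder)
KEY_FILE = "key.json"              # service-account key added as a user in GSC (placeholder)

# Patterns from Chapter 1: casino keywords, numeric /detail/ IDs, .html files in /pages/
SPAM_PATTERNS = re.compile(r"casino|slot|poker|/detail/\d{10,15}|/pages/.*\.html", re.I)

creds = service_account.Credentials.from_service_account_file(
    KEY_FILE, scopes=["https://www.googleapis.com/auth/webmasters.readonly"]
)
service = build("searchconsole", "v1", credentials=creds)

# Same request body as the Explorer example above (dimension names are lowercase in the API docs).
body = {
    "startDate": "2023-01-01",
    "endDate": "2025-02-19",
    "dimensions": ["query", "page"],
    "rowLimit": 25000,
}
response = service.searchanalytics().query(siteUrl=SITE_URL, body=body).execute()

# Each row's "keys" list mirrors the dimensions order: [query, page].
kill_list = sorted(
    {row["keys"][1] for row in response.get("rows", []) if SPAM_PATTERNS.search(" ".join(row["keys"]))}
)

with open("kill-list.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url"])
    writer.writerows([url] for url in kill_list)

print(f"{len(kill_list)} spam URLs written to kill-list.csv")
The resulting kill-list.csv is the same artifact as the filtered spreadsheet above, and it feeds directly into the sitemap step in Chapter 5.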
Chapter 2: The “First Aid” Response: Using GSC Removals (Triage)
Before I implement the “Scorched Earth” permanent fix, I can use Google Search Console’s built-in tools for immediate, temporary relief. This is like putting a tourniquet on a wound while preparing for surgery.
If the hacker created obvious directories (like /odr/ or /pages/), I can use the “Removals” tool in GSC to hide that entire directory instantly.
How to perform a Prefix Removal:
- Go to Google Search Console > Removals.
- Click New Request.
- Choose “Remove all URLs with this prefix”.
- Enter the spam directory pattern, for example: https://example.com/odr/
- Click Submit.
This will quickly hide thousands of URLs matching that pattern from Google search results within a few hours. However, remember that this is temporary (about 6 months) and does not actually de-index the pages; it just hides them.
Chapter 3: The SEO Strategy – Why “410 Gone” beats “404 Not Found”
Most website owners make a fatal mistake at this stage: they simply delete the hacked files or use a security plugin to “clean” the site. When you simply delete a file, your server sends a 404 Not Found code to Google.
The Problem with 404 Errors during Cleanup
When Googlebot crawls a URL and receives a 404 error, its logic is: “Maybe this page is just missing temporarily. It might come back. I will keep it in the index for now and check again next week.”
This “Soft Fade” approach means your spam links can stay in search results for months, continuing to hurt your brand reputation.
The Solution: The “410 Gone” Status
I chose a different HTTP status code: 410 Gone. A 410 code sends a very specific, final message to search engines: “This page is dead. It was deleted on purpose. It is never coming back. Remove it from the index immediately.”
This was my primary strategy for the 50,000 pages. I didn’t want to just hide the spam; I wanted to kill it instantly.
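In raw HTTP terms the difference is a single status line, but Google reads the two very differently:
HTTP/1.1 404 Not Found   (treated as "might come back, keep re-checking")
HTTP/1.1 410 Gone        (treated as "removed on purpose, drop it from the index")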
Chapter 4: Executing the “410 Protocol” (Two Methods)
I need to implement a system that intercepts any request for a spam URL and serves a 410 error. There are two ways to do this: the “Safe” way (Plugin) and the “Power User” way (Server). Here is how I approach both.
Method A: The “Safe” Way using a WordPress 410 Plugin
If editing server files scares you (and it should; one wrong move can break your site), you can use a WordPress plugin to handle the 410 logic. I recommend the free and powerful “Redirection” plugin.
While this method is safer, it is slower because WordPress has to load fully to process the request, meaning your server still takes a hit from bot traffic.
- Install and activate the Redirection plugin.
- Go to Tools > Redirection.
- Click “Add New”.
- In the Source URL field, enter the Regex pattern I found in Chapter 1. For example, to block the 10-to-15-digit product IDs: ^/detail/([0-9]{10,15})$
- Change the “URL Options / Regex” dropdown to Regex.
- Change the “Target URL” dropdown to Error (410 Gone).
- Click “Add Redirect”.
Now, any URL matching that pattern will return a 410 error via WordPress.
Method B: The “Power User” Way (.htaccess Server-Side Fix)
For this specific client, with 50,000 pages, I needed maximum speed and minimum server load. I bypassed WordPress entirely and edited the .htaccess file on the Apache server. This blocks the bots at the door before WordPress even loads.
Here is a simplified version of the code I deployed to the top of the .htaccess file:
<IfModule mod_rewrite.c>
RewriteEngine On
# 1. Block Casino Keywords
# If the URL contains gambling terms, send a 410 GONE error immediately.
RewriteRule .*casino.* - [R=410,L]
RewriteRule .*slot.* - [R=410,L]
RewriteRule .*poker.* - [R=410,L]
# 2. Block the Numeric Product ID Pattern (The massive 40k pages)
# This targets the fake product pages like /detail/123456789
RewriteRule ^detail/([0-9]{10,15})$ - [R=410,L]
# 3. Block Malicious Directories
RewriteRule ^odr/.* - [R=410,L]
RewriteRule ^mbr/.* - [R=410,L]
RewriteRule ^pages/.*\.html$ - [R=410,L]
</IfModule>
The Result: Instantly, all 49,000+ spam links stopped working. Anyone (or any bot) trying to visit them received a hard “410 Gone” error. This immediately stopped the server overload.
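Before relying on the firewall, it is worth confirming the status codes from the outside. Here is a quick Python sketch using the requests library; the sample URLs are placeholders matching the patterns above, so swap in real URLs from your kill list:
import requests

SPAM_SAMPLES = [
    "https://example.com/detail/1234567890",
    "https://example.com/odr/anything",
    "https://example.com/pages/fake-watch.html",
]
GOOD_SAMPLES = [
    "https://example.com/",
    "https://example.com/contact/",
]

for url in SPAM_SAMPLES + GOOD_SAMPLES:
    # HEAD is enough; only the status code matters here, not the body.
    status = requests.head(url, allow_redirects=False, timeout=10).status_code
    print(status, url)

# Expected output: 410 for every spam pattern, 200 (or a normal redirect) for the real pages.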
Chapter 5: The “Reverse Psychology” Sitemap Strategy
Now that my firewall (plugin or .htaccess) is active, I need Google to see it. Usually, a sitemap is used to tell Google about your good pages. I did the opposite.
I created a specialized Spam Sitemap (named sitemap-spam.xml) containing thousands of the bad URLs I identified via the API. I then submitted this sitemap to Google Search Console.
Why would I submit a sitemap of bad links?
- I am “inviting” Google to crawl these specific links immediately.
- Googlebot visits the link.
- The 410 Firewall I built in Chapter 4 hits it with a 410 Gone error.
- Googlebot updates its database: “This URL is permanently gone. De-index it.”
By feeding the beast the poisoned links, I accelerated the cleanup process from months to days.
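Building the spam sitemap is mechanical once the kill list exists. Here is a minimal sketch, assuming the kill-list.csv from Chapter 1 (one URL per row under a url header) and the standard sitemap protocol:
import csv
from xml.sax.saxutils import escape

with open("kill-list.csv", newline="") as f:
    urls = [row["url"] for row in csv.DictReader(f)]

with open("sitemap-spam.xml", "w", encoding="utf-8") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in urls:
        # Every entry "invites" Googlebot to recrawl a URL that now answers 410 Gone.
        f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
    f.write("</urlset>\n")

print(f"sitemap-spam.xml written with {len(urls)} URLs")
A single sitemap file can hold up to 50,000 URLs, so the roughly 6,000-URL kill list fits comfortably; upload the file to the site root and submit it under Sitemaps in Google Search Console.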
(For a deeper dive into how malware hides links in the first place, read my guide on Hidden Links Malware: The Simple Guide to Detection).

Chapter 6: Database Forensics & The “Good Page” Infection
The hack didn’t just create new pages; it infected real ones. I found that the client’s legitimate “Services” page was ranking, but the title tag in the search results was in Japanese.
I opened the database via phpMyAdmin and checked the wp_postmeta table. The hacker had injected a script that overwrote the Rank Math SEO Title settings with Japanese text. I ran a SQL query to clean these specific rows and regenerate the correct English titles.
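For reference, the database cleanup looked conceptually like the two statements below. This is a hedged sketch rather than my exact query: it assumes the default wp_ table prefix, that Rank Math keeps its per-post title override in the rank_math_title meta key, and that the injected titles contain a recognizable spam term (カジノ, Japanese for “casino,” is used as the example). Back up the database, run the SELECT first, and only delete rows you have reviewed; once the override rows are gone, Rank Math falls back to its normal title template.
-- 1. Review the infected rows first (adjust the table prefix and spam term to your install).
SELECT post_id, meta_value
FROM wp_postmeta
WHERE meta_key = 'rank_math_title'
  AND meta_value LIKE '%カジノ%';

-- 2. After confirming that only hacked rows match, remove the injected title overrides.
DELETE FROM wp_postmeta
WHERE meta_key = 'rank_math_title'
  AND meta_value LIKE '%カジノ%';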
Note: This injection method is very similar to the logic used in Pharma hacks. You can read more about that specific variation here: WordPress Pharma Hack Fix Guide.
Chapter 7: The Results – De-indexing 50k Pages in 72 Hours
The combination of the 410 protocol and the “Kill List” Sitemap worked perfectly.
- Day 1: The server CPU usage dropped by 80% as the bots were blocked at the door.
- Day 2: Google Search Console showed a massive spike in “410” errors (which is exactly what I wanted) and the number of indexed spam pages began to plummet.
- Day 3: The Japanese characters disappeared from the main search results for the client’s brand name. The correct English titles reappeared.
Chapter 8: WordPress Security Hardening to Prevent Re-infection
Cleaning a hack is pointless if you get hacked again next week. I implemented a strict Security Hardening plan for the client:
- Change All Passwords & Salts: I changed the WordPress Salt Keys in wp-config.php. This instantly logged out every user who was currently logged in (including the hacker).
- Two-Factor Authentication (2FA): I enforced 2FA on all administrator accounts.
- Disable File Editing: I added define( 'DISALLOW_FILE_EDIT', true ); to the config file, stopping anyone from editing PHP files from the dashboard.
- Block PHP in Uploads: I blocked PHP execution in the /wp-content/uploads/ folder to prevent malicious scripts from running disguised as images (a minimal example follows this list).
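For that last item, a tiny .htaccess file dropped into /wp-content/uploads/ is enough on Apache. Here is a minimal sketch using Apache 2.4 syntax (nginx needs an equivalent location rule instead):
# /wp-content/uploads/.htaccess
# Refuse to execute any PHP file that ends up in the uploads folder.
<FilesMatch "\.php$">
  Require all denied
</FilesMatch>
Images and other media still load normally; only requests for .php files inside that directory are blocked.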
Frequently Asked Questions (FAQ) About Japanese SEO Spam
What exactly is the Japanese Keyword Hack?
It is a type of SEO spam where hackers compromise a website and use scripts to auto-generate thousands of new pages. These pages usually contain Japanese text selling counterfeit goods, gambling sites, or illegal products. They leverage your site’s good reputation to rank these spam pages in Google.
Why do I only see the spam on Google, but my site looks normal to me?
This is called “Cloaking.” The hacker’s script detects the visitor’s “User-Agent.” If it detects a human visitor, it shows the normal website. If it detects a search engine bot (like Googlebot), it serves the malicious Japanese content. This delays detection by the site owner.
Can’t I just use a security plugin like Wordfence to fix it?
Security plugins are excellent for detection and blocking known malware files. However, for massive SEO spam injections involving tens of thousands of URLs and database corruption, plugins often struggle to clean up the mess efficiently or handle the complex de-indexing process required with Google.
Why is “410 Gone” better than just deleting the files (404 Not Found)?
When you delete a file, it returns a “404 Not Found” error. Google interprets a 404 as temporary, meaning it will keep checking that URL for weeks or months to see if it comes back. A “410 Gone” error tells Google explicitly that the page is permanently deleted and should be removed from the index immediately. It is much faster for SEO recovery.
How long does it take for Google to remove the Japanese characters?
If you only use standard methods, it can take 3-6 months for 50,000 pages to fade out. Using the aggressive “410 Gone” firewall method described in this case study, I typically see significant de-indexing start within 72 hours, with the majority of spam gone within 2-4 weeks.
Conclusion: Prevention is Better than Cure
Recovering from 50,000 spam pages takes high-level technical skill. If you make a small mistake in the .htaccess file, you can take your entire website offline instantly.
If you see Japanese characters in your search results, do not wait. The longer these links exist, the more damage they do to your domain authority and brand trust.
Need an Expert to Fix This For You?
I specialize in fixing complex WordPress malware infections that standard plugins cannot handle.
Or explore my full WordPress Malware Removal Services.
