The Bug That Cost Months of Indexing
Your geo-gate or age-verification middleware is probably broken for the URL Inspection tool in Google Search Console. The fix is a one-line regex change. Here's the story.
The developer of noctias.tv (a multi-language adult portal) discovered that every URL they submitted to Google Search Console came back as "excluded by noindex tag." The pages weren't noindex. A curl request with a Googlebot UA showed no noindex directive. The site rendered fine for humans. Bing indexed it. Search Console said: nope.
The culprit? A missing user-agent pattern in the middleware that bypasses the age-verification wall for search engines.
The Setup
The site runs Next.js 15 in standalone mode behind Cloudflare Tunnel on an OVHcloud VPS. Compliance is jurisdictional, not language-level: the geo-policy function returns 'AGE_VERIFICATION' for several regions (US states such as Texas and Utah, the UK, and parts of the EU). When the policy is AGE_VERIFICATION, the middleware rewrites the request to the locale's /age-verification page, which is noindex,nofollow. After the visitor passes the wall, a signed cookie unlocks the real content.
To preserve SEO, the developer added a bot bypass:
// src/middleware.ts (the WRONG version)
if (policy === 'AGE_VERIFICATION') {
  const verified = await verifyAvToken(req.cookies.get(AV_COOKIE)?.value);
  const isBot = /Googlebot/i.test(req.headers.get('user-agent') ?? '');
  if (!verified && !isBot && !isAvPath(req.nextUrl.pathname)) {
    return NextResponse.rewrite(new URL(`/${locale}/age-verification`, req.url));
  }
}
This pattern appears in countless Stack Overflow answers. It lets Googlebot through, but it does NOT match Google-InspectionTool.
The Actual User-Agent
Search Console's URL Inspector live test uses a separate fetcher with its own UA:
Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)
The substring "Googlebot" does not appear anywhere in that UA, so /Googlebot/i never matches it. Every time a developer runs a live test or requests indexing through Search Console, the fetcher hits the age-verification wall, sees the noindex tag, and refuses. The crawler-only path probably also fails silently for new URLs, if that first URL Inspector check is part of how Search Console decides to crawl them; that part is inference rather than confirmed behavior.
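The mismatch is easy to reproduce in isolation. The sketch below runs the narrow pattern and a widened one against two user-agent strings. The inspection-tool string is the one quoted above; the classic Googlebot string is added here only for comparison, and the variable names are illustrative.

```typescript
// The narrow pattern from the original middleware.
const narrowPattern = /Googlebot/i;
// A widened pattern that also names the inspection fetcher.
const widerPattern = /Googlebot|Google-InspectionTool/i;

// Classic Googlebot desktop UA (included for comparison only).
const googlebotUa =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
// UA quoted in this article for the URL Inspector live test.
const inspectionUa = "Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)";

const matchResults = {
  narrowMatchesGooglebot: narrowPattern.test(googlebotUa),
  narrowMatchesInspection: narrowPattern.test(inspectionUa),
  widerMatchesInspection: widerPattern.test(inspectionUa),
};

console.log(matchResults);
```

Only the middle result is false, which is exactly the case the original middleware sends to the wall.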
The Fix
Match all known search engine bots explicitly:
function isSearchEngineBot(userAgent: string | null): boolean {
  if (!userAgent) return false;
  return /Googlebot|Google-InspectionTool|Google-Read-Aloud|AdsBot-Google|Google-Site-Verification|Bingbot|DuckDuckBot|YandexBot|Baiduspider|Applebot|GPTBot|ClaudeBot|PerplexityBot|facebookexternalhit|Twitterbot|LinkedInBot/i.test(
    userAgent,
  );
}
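This is not cloaking spam. Cloaking means showing crawlers different content from what users get; here the bot sees the same content a verified human sees. Google's SafeSearch documentation recommends letting Googlebot crawl age-gated content without triggering the gate, and its "intrusive interstitials" guidance treats legally required interstitials such as age verification as acceptable. One caveat: a User-Agent header is trivial to spoof, so Google's advice is to verify that a request really comes from its crawlers before bypassing the gate.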
Also worth matching: facebookexternalhit, Twitterbot, LinkedInBot for Open Graph card scrapers. Without them, every social media unfurl shows the age-verification placeholder, killing click-through. AI crawlers like GPTBot, ClaudeBot, PerplexityBot are optional.
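To show where the helper fits, here is a minimal sketch of the gate decision as a pure function, with no Next.js imports so it can be unit-tested on its own. GeoPolicy, GateInput, shouldShowAgeWall, the "NONE" policy value, and this isAvPath implementation are hypothetical stand-ins, not the site's actual code; only the bot pattern is taken from the fix above.

```typescript
// Hypothetical types for illustration; "NONE" is an assumed non-gated policy value.
type GeoPolicy = "AGE_VERIFICATION" | "NONE";

interface GateInput {
  policy: GeoPolicy;
  verified: boolean; // result of the signed-cookie check
  userAgent: string | null; // raw User-Agent header, if any
  pathname: string;
}

function isSearchEngineBot(userAgent: string | null): boolean {
  if (!userAgent) return false;
  return /Googlebot|Google-InspectionTool|Google-Read-Aloud|AdsBot-Google|Google-Site-Verification|Bingbot|DuckDuckBot|YandexBot|Baiduspider|Applebot|GPTBot|ClaudeBot|PerplexityBot|facebookexternalhit|Twitterbot|LinkedInBot/i.test(
    userAgent,
  );
}

// Assumption: the wall lives at /<locale>/age-verification.
function isAvPath(pathname: string): boolean {
  return /^\/[^/]+\/age-verification\/?$/.test(pathname);
}

// Returns true when the request should be rewritten to the age wall.
function shouldShowAgeWall(input: GateInput): boolean {
  if (input.policy !== "AGE_VERIFICATION") return false;
  if (input.verified) return false;
  if (isSearchEngineBot(input.userAgent)) return false;
  return !isAvPath(input.pathname);
}
```

In the middleware, the rewrite would then run only when shouldShowAgeWall returns true. Keeping the decision separate from NextResponse makes it straightforward to cover every crawler UA in a plain unit test.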
How to Check Your Own Site
Run this curl:
curl -s -H "User-Agent: Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)" \
  https://your-site.com/some-article \
  | grep -oE 'name="robots"[^>]*'
If it returns noindex, your indexing pipeline is broken for any URL behind a gate — age verification, geo-blocking, paywall preview, "press Enter to continue", anything.
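If you would rather script the check than eyeball grep output, the parsing step could look roughly like this. It is a sketch, not a full HTML parser: robotsDirectives and isNoindex are hypothetical helper names, the regexes assume quoted attribute values, and the code looks only at meta tags named "robots", so crawler-specific meta tags and any X-Robots-Tag response header are not covered.

```typescript
// Collects the directives from every <meta name="robots" content="..."> tag.
function robotsDirectives(html: string): string[] {
  const directives: string[] = [];
  const metaTags = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const tag of metaTags) {
    // Skip meta tags that are not the generic robots tag.
    if (!/\bname\s*=\s*["']robots["']/i.test(tag)) continue;
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (!content) continue;
    for (const part of (content[1] ?? "").split(",")) {
      const directive = part.trim().toLowerCase();
      if (directive) directives.push(directive);
    }
  }
  return directives;
}

// True when any robots meta tag carries a noindex directive.
function isNoindex(html: string): boolean {
  return robotsDirectives(html).some((d) => d === "noindex");
}
```

Feed it the HTML returned for the Google-InspectionTool user-agent; a true result means the fetcher is seeing the wall rather than the article.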
What This Means for Adult Sites
Adult sites get hit harder because almost all of them have an age-verification wall. The wall is mandatory in multiple jurisdictions (Texas and Utah in the US, the UK, parts of the EU). If your wall ate your indexing, that's potentially months of search traffic lost, and you wouldn't see it as a single error in Search Console, because each missed URL is silently never crawled.
After the fix, the developer re-submitted the five most recent articles via URL Inspector. All five went through immediately and entered the priority crawl queue. The earlier rejections were 100% the UA-match bug.
What I'd Do Differently
- Test the inspector path before deploy. Unit tests for the geo router aren't enough; you need an integration test that simulates the URL Inspector UA.
- Watch the rendered HTML in Search Console's live test panel — not just the verdict. The verdict tells you something is wrong; the rendered HTML tells you what.
- Match crawlers broadly. Many copy-pasted snippets only match Googlebot, but several Google crawler UAs, Google-InspectionTool among them, don't contain that string.
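The first point can be sketched as a small regression check. It assumes you can supply a function that requests a path with a given user-agent and resolves to the returned HTML (for example, a wrapper around your test server or a staging deployment). FetchHtml, CRAWLER_UAS, and findGatedCrawlers are hypothetical names, and the noindex regex assumes the name attribute precedes content in the meta tag.

```typescript
// Hypothetical signature: request `path` with `userAgent`, resolve to the HTML body.
type FetchHtml = (path: string, userAgent: string) => Promise<string>;

// Crawler UAs that must never land on the age wall.
const CRAWLER_UAS: string[] = [
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)",
];

// Matches a robots meta tag whose content includes noindex.
const NOINDEX_META =
  /<meta[^>]*name=["']robots["'][^>]*content=["'][^"']*noindex/i;

// Returns the crawler UAs that were served a noindex page for `path`.
async function findGatedCrawlers(
  path: string,
  fetchHtml: FetchHtml,
): Promise<string[]> {
  const gated: string[] = [];
  for (const ua of CRAWLER_UAS) {
    const html = await fetchHtml(path, ua);
    if (NOINDEX_META.test(html)) gated.push(ua);
  }
  return gated;
}
```

A deploy check would assert that findGatedCrawlers returns an empty array for a representative article path. With the original /Googlebot/i middleware it would return the Google-InspectionTool entry.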
Next Steps
If you've spent days arguing with Search Console about why your URLs don't index, run the curl above. There's a decent chance this is your bug. Update your middleware to match Google-InspectionTool explicitly, and re-submit your URLs. Your crawl budget will thank you.
