OpenAI Search Crawler Reaches 56% Coverage Across Sites – Hostinger Report

Hostinger Analysis: Divergent AI Crawler Behavior Across the Web

Hostinger recently analyzed 66 billion bot requests across more than 5 million websites, revealing two divergent trends in AI crawler activity.

  • LLM Training Bots: These crawlers are losing access as a growing number of websites block them.

  • AI Assistant Bots: In contrast, crawlers that power AI assistants, such as ChatGPT, are expanding their reach.

The study drew on anonymized server logs collected during three six-day windows, with bot classification informed by the AI.txt project standards.


Training Bots Are Losing Access

The most pronounced decline was observed for OpenAI’s GPTBot, which gathers data for training language models: its site coverage fell from 84% to 12% over the study period.

Meta-ExternalAgent, Meta’s training crawler and the largest training-category bot by request volume in Hostinger’s dataset, showed a similarly steep drop in coverage. Hostinger attributes these declines to website operators actively blocking AI training crawlers.

These findings corroborate other studies. For example:

  • BuzzStream reported that 79% of top news publishers now block at least one AI training bot.

  • Cloudflare’s Year in Review identified GPTBot, ClaudeBot, and CCBot as the most commonly disallowed crawlers among top domains.

Hostinger interprets the reduction in training-bot coverage as a reflection of site-level restrictions, even where request volumes remain high.
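
In practice, these restrictions are usually expressed in robots.txt. A minimal sketch of the pattern the studies above describe, using the user-agent tokens the vendors publish for their training crawlers (verify the current tokens against each vendor’s documentation before relying on them):

    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

Each training bot gets its own User-agent group; crawlers not named here continue to follow the site’s default rules.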


AI Assistant Bots Are Expanding

Conversely, AI assistant bots, which power user-facing search and recommendation tools, are expanding their presence:

  • OpenAI’s OAI-SearchBot achieved an average coverage of 55.67%.

  • TikTok’s AI crawler reached 25.67% coverage, processing 1.4 billion requests.

  • Apple’s assistant bot expanded to 24.33% coverage.

These assistant crawls are typically user-triggered and target specific queries, serving content directly to end users rather than gathering data for model training. This functional difference appears to influence site operators’ willingness to allow access.


Traditional Search Crawlers Remain Stable

Classic search engine crawlers exhibited minimal change during the analysis period:

  • Googlebot maintained 72% coverage with 14.7 billion requests.

  • Bingbot remained at 57.67% coverage.

This stability reflects the high stakes of blocking primary search engine crawlers: doing so directly harms search visibility.


SEO and Marketing Crawlers Decline

Coverage for SEO-focused crawlers is also declining. While Ahrefs retained the largest footprint at 60% coverage, the overall category has contracted. Hostinger attributes this to:

  1. SEO tools increasingly prioritizing sites actively performing SEO work.

  2. Website owners restricting resource-intensive crawlers due to bandwidth concerns.

For context, Vercel data indicated that GPTBot generated 569 million requests in a single month, highlighting the potential server-load implications for publishers.
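
Spread evenly across a 30-day month, 569 million requests works out to roughly 19 million requests per day, or about 220 requests per second on average, before accounting for traffic peaks.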


Implications for Website Operators

The analysis indicates that site operators are drawing clear distinctions between AI crawlers:

  • Training Bots: Block these to keep content out of model-training pipelines, addressing intellectual property concerns and reducing bandwidth consumption.

  • Assistant Bots: Allow these to enhance visibility in AI-driven search results, as they directly contribute to user discovery.

OpenAI recommends permitting OAI-SearchBot when the goal is to appear in ChatGPT search results, even if GPTBot is blocked. OAI-SearchBot governs inclusion in those results and respects robots.txt rules, whereas user-initiated fetchers such as ChatGPT-User may not be subject to the same restrictions.
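
A split policy of this kind can be sanity-checked before deployment. Below is a minimal sketch using Python’s standard-library urllib.robotparser; the example.com URL is illustrative only:

    from urllib.robotparser import RobotFileParser

    # A split policy: block the training crawler, allow the search crawler.
    ROBOTS_TXT = """
    User-agent: GPTBot
    Disallow: /

    User-agent: OAI-SearchBot
    Allow: /

    User-agent: *
    Allow: /
    """

    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())

    # can_fetch() applies the first group whose User-agent token matches the bot.
    print(parser.can_fetch("GPTBot", "https://example.com/article"))         # False
    print(parser.can_fetch("OAI-SearchBot", "https://example.com/article"))  # True

If both checks print the expected values, the policy keeps training access blocked without dropping the site out of ChatGPT search results.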

Hostinger advises monitoring server logs to identify active bots and making access decisions aligned with strategic objectives. For sites concerned about performance, selective blocking at the CDN level can reduce load while maintaining AI visibility.
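
For the log-monitoring step, a minimal sketch along these lines can surface which bots are active; the log path and the combined log format (user agent in the final quoted field) are assumptions to adapt to your own setup:

    import re
    from collections import Counter
    from pathlib import Path

    # Crawler tokens to watch for; extend this list from your own logs.
    BOT_TOKENS = [
        "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "CCBot",
        "meta-externalagent", "Googlebot", "bingbot", "AhrefsBot",
    ]

    # Combined log format places the user agent in the final quoted field.
    UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

    def count_bots(log_path):
        counts = Counter()
        for line in Path(log_path).read_text(errors="replace").splitlines():
            match = UA_PATTERN.search(line)
            if not match:
                continue
            user_agent = match.group(1).lower()
            for token in BOT_TOKENS:
                if token.lower() in user_agent:
                    counts[token] += 1
                    break
        return counts

    if __name__ == "__main__":
        for bot, hits in count_bots("/var/log/nginx/access.log").most_common():
            print(f"{bot}: {hits}")

Pointing the script at a day’s access log gives a quick census of which crawlers are hitting the site, and how often, before deciding what to allow or block.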


Key Takeaway

Website operators now face a nuanced AI landscape. By differentiating between training crawlers and user-facing assistant crawlers, organizations can balance:

  • Content protection and bandwidth management (via training-bot restrictions).

  • Visibility in AI-powered discovery (via selective allowance of assistant bots).

This strategic approach allows sites to participate in AI search ecosystems without inadvertently contributing data to model training.