Google Publishes Research on Training AI Agents for Deep Research
Google recently released a research paper on creating a challenging dataset for training AI agents in deep research tasks, offering insights that can also inform content optimization strategies. The system described in the paper is called SAGE, which stands for Steerable Agentic Data Generation for Deep Search with Execution Feedback.
Synthetic Question and Answer Pairs
Previous state-of-the-art AI training datasets, such as MuSiQue and HotpotQA, required no more than four reasoning steps to answer questions. On average, MuSiQue required 2.7 searches per question, HotpotQA 2.1, and Natural Questions (NQ) only 1.3. These datasets left a training gap for AI agents tasked with complex, real-world deep search problems: how can an AI handle genuinely difficult questions if it has never been trained on them?
How SAGE Works
SAGE is a dual-agent system:
- One AI generates challenging questions.
- A second “search agent” AI attempts to answer them, providing feedback on the question’s difficulty.
The first AI’s goal is to create questions that require multiple reasoning steps and searches. The second AI measures answerability and the minimum number of search steps needed. If the question is too easy or answered incorrectly, the execution trace—the steps and documents the AI used—is fed back to the first AI. This feedback identifies shortcuts that reduce the complexity of solving the question, allowing the first AI to refine its questions.
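The generate-and-verify loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the difficulty threshold, and the toy agents are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Trace:
    """Execution trace returned by the search agent (illustrative)."""
    answered_correctly: bool
    search_steps: int
    documents: list = field(default_factory=list)

def run_sage_loop(generator, search_agent, min_steps=4, max_rounds=5):
    """Sketch of the dual-agent loop: the generator proposes a
    question, the search agent tries to answer it, and the execution
    trace is fed back whenever a shortcut makes the question too easy."""
    question = generator.generate_question()
    for _ in range(max_rounds):
        trace = search_agent.attempt_answer(question)
        too_easy = trace.answered_correctly and trace.search_steps < min_steps
        if trace.answered_correctly and not too_easy:
            return question, trace  # hard enough: keep it for the dataset
        # Feed the trace back so the generator can close the shortcut.
        question = generator.refine(question, trace)
    return None, None  # could not make the question hard enough

# Toy stand-ins for the two agents (purely illustrative):
class ToyGenerator:
    def generate_question(self):
        return "Who founded the company that acquired X?"
    def refine(self, question, trace):
        return question + " (harder)"

class ToySearchAgent:
    def __init__(self):
        self.calls = 0
    def attempt_answer(self, question):
        self.calls += 1
        # First attempt finds everything co-located (2 searches);
        # the refined question takes 5 searches to answer.
        return Trace(answered_correctly=True,
                     search_steps=2 if self.calls == 1 else 5)

question, trace = run_sage_loop(ToyGenerator(), ToySearchAgent())
print(question, trace.search_steps)
```

In the toy run, the first question is answered in only two searches, so its trace is handed back to the generator, which hardens the question until it clears the step threshold.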
Four Ways Deep Research Was Avoided
The researchers found four main shortcuts that prevented AI agents from performing deeper research:
- Information Co-Location (35%) – Multiple required facts appear in a single document, allowing the AI to answer in one hop.
- Multi-Query Collapse (21%) – A clever query retrieves enough information across documents to answer multiple sub-questions at once.
- Superficial Complexity (13%) – The question looks complex but is easily solvable without intermediate reasoning.
- Overly Specific Questions (31%) – Questions with excessive detail make the answer obvious on the first search.
Even when some questions appear hard, they can be solved in fewer steps if the information is “co-located” on one page—a phenomenon that also occurs in real-life search scenarios.
SEO Takeaways
While this research is focused on AI training, it provides insights for publishers:
- Information Co-Location: Consolidate scattered facts into a single page to reduce the AI’s need to hop to other sites.
- Multi-Query Collapse: Structure content to answer multiple sub-questions in one place, enabling quicker solutions for both users and AI agents.
- Eliminate Shortcuts: Provide clear, specific data points (calculations, dates, names) so AI or human users can get answers without additional searches.
Focus on Classic Search
Despite agentic AI insights, the research confirms the value of classic search optimization:
- Ranking in the top three search results remains crucial.
- Optimize pages for user experience and topic relevance, not for AI search.
- Comprehensive, well-interlinked content can reduce “hops” for both AI and users.
- Linking related pages can help those pages rank in classic search, supporting multi-step queries indirectly.
SAGE’s tests included pulling results from the top three ranked web pages via the Serper API, emphasizing that classic search rankings still guide AI agents’ information retrieval.
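As an illustration, restricting retrieval to the top three organic results from a Serper-style search response might look like the sketch below. The response shape (an "organic" list of result objects with "link" fields) follows Serper's public search API, but the exact fields and the sample data here should be treated as assumptions.

```python
def top_three_links(response_json):
    """Keep only the top three organic results from a
    Serper-style search response (field names assumed)."""
    organic = response_json.get("organic", [])
    return [hit["link"] for hit in organic[:3]]

# Sample response shaped like Serper's /search output (illustrative):
sample = {
    "organic": [
        {"title": "Result A", "link": "https://example.com/a"},
        {"title": "Result B", "link": "https://example.com/b"},
        {"title": "Result C", "link": "https://example.com/c"},
        {"title": "Result D", "link": "https://example.com/d"},
    ]
}
print(top_three_links(sample))
```

The point of the cutoff mirrors the paper's setup: an agent that only ever sees the top three ranked pages inherits whatever classic search decides to rank there.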
Publication: The research paper, SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback, was published by Google on January 26, 2026 and is available as a PDF.
