Google’s Titans and MIRAS Mark Major Step Forward in Long-Context AI

Google’s Titans Architecture and MIRAS Framework: AI That Remembers More, Works Faster

Google Research has unveiled two new innovations—Titans and MIRAS—designed to overcome a key limitation in modern AI: maintaining context over long sequences of information without slowing down or losing critical details. Together, these advancements give AI models a structured way to retain what matters, enabling them to handle extended documents, conversations, or data streams with improved continuity.


The Titans Architecture

Titans is a model family built around a Long-Term Memory module that actively learns as it processes data, guided by what Google calls a “surprise metric.”

The surprise metric is essentially an internal error signal. Mathematically, it measures the difference between the model’s current memory and new incoming data. When the difference is significant—i.e., when something unexpected or important appears—the model flags it for long-term storage.

Titans also employs momentum, a sustained focus mechanism, to decide how much of the surrounding sequence should be recorded. This ensures that relevant details following the initial surprising event are captured, even if they are not surprising on their own.

Finally, Titans uses an adaptive forgetting mechanism (weight decay) to gradually clear out outdated or less useful information. This prevents the memory from becoming cluttered and ensures the model prioritizes the most relevant data.

By combining these three components—surprise metric (what to notice), momentum (how much to record), and weight decay (what to forget)—Titans maintains a sharp, relevant memory regardless of the size of the data it processes.


The MIRAS Framework

While Titans is a specific model family, MIRAS is a broader framework for designing sequence models. MIRAS treats sequence models as associative memory systems, learning to link data points together according to an internal objective that guides how relationships between information are encoded.

Designing a model within MIRAS involves four key decisions:

  1. Memory Structure: The architecture of the memory itself, which can range from simple vectors to the deep MLP layers used in Titans.

  2. Attentional Bias: The objective guiding how memory prioritizes and links incoming information.

  3. Memory Stability and Retention: The mechanism that balances learning new data with retaining past information.

  4. Memory Algorithm: The learning method used to update the memory, such as gradient descent techniques that allow the model to continue learning during inference.


The Core Challenge: AI Can Process, But Struggles to Remember

Modern AI excels at analyzing information in front of it. But as context grows—through long documents, datasets, or conversations—models face a tradeoff between preserving detail and keeping computational costs manageable.

Current approaches include:

  • Attention Window: Revisiting earlier tokens as needed, repeatedly scanning prior text to determine relevance.

  • State Compression: Summarizing previous information into a smaller internal representation, trading detail for efficiency.

Both methods work for moderate context sizes. But as inputs grow longer:

  • Attention Window becomes computationally expensive, repeatedly revisiting vast sequences.

  • State Compression risks losing important details that may only matter later.

Titans and MIRAS aim to solve these problems by giving models memory systems that learn what to remember, how to retain it, and when to forget—allowing AI to handle far longer contexts without compromising accuracy or efficiency.

The Real Limitation: Memory, Not Scale or Speed

The challenge for modern AI isn’t computational scale or processing speed—it’s memory. Current systems don’t treat memory as something that can be actively managed. Instead, they rely on fixed architectural patterns: either scanning backward or compressing forward, with no structured way to decide what should be retained over long spans.

Titans and MIRAS tackle this limitation by treating memory as an active resource—something the model can manage deliberately rather than passively inherit from its architecture.


Why Two Papers?

Addressing long-term memory in AI requires more than a single technical tweak. Google’s research splits the solution into two parts:

  1. Practical demonstration: Showing that models can actively manage memory in operation.

  2. Design framework: Providing a structured approach to designing memory-driven models, rather than treating each new architecture as a one-off experiment.

The two papers reflect this distinction: one presents a concrete mechanism for long-term memory (Titans), and the other provides a general framework for designing models around that idea (MIRAS).


Titans: Adding Long-Term Memory

Titans focuses on practical memory enhancement. Its architecture allows models to accumulate and carry forward selected information over time. Unlike traditional systems that repeatedly reprocess past input or compress everything into a fixed-size summary, Titans uses a deep neural memory module capable of capturing far more complex and detailed information.

The result: models can handle extremely long inputs without repeatedly scanning the past or losing critical details. Titans isn’t meant to replace existing designs—it augments them, extending how they handle context while preserving existing capabilities.


MIRAS: A Framework for Memory-Driven Models

Where Titans provides a specific mechanism, MIRAS steps back to address the broader design question. It treats sequence models as systems that store and update associative memories over time. MIRAS organizes architectures around a small set of design choices:

  • Information storage: How data is encoded in memory.

  • Matching and association: How relationships between data points are learned.

  • Updating: How new information is incorporated.

  • Retention: How the model decides what to keep or forget.

MIRAS allows researchers to interpret systems like Titans and design new memory-driven models without starting from scratch.


Testing Long-Context Performance

To evaluate whether this memory-based approach offers practical advantages, Google tested these models on tasks requiring extremely long context handling.

  • Titans scaled to more than 2 million tokens while maintaining higher retrieval accuracy than baseline models.

  • On the BABILong benchmark, which requires reasoning across massive documents, Titans outperformed much larger models—including GPT-4—despite having significantly fewer parameters.

  • MIRAS demonstrated that these principles aren’t limited to one model. Systems built using its framework consistently delivered high performance across multiple tasks.

Together, these results show that structured, active memory allows models to maintain accuracy across vast datasets without increasing computational cost.


Researchers’ Conclusions

  • Titans: Combining short-range processing with dedicated long-term memory improves handling of extended inputs without relying solely on larger attention windows or aggressive compression. It’s designed as an add-on to existing architectures, not a replacement.

  • MIRAS: Provides a structured way to design and compare memory-driven models, treating memory behavior as an explicit, adjustable design dimension.

Google emphasizes the significance:

“Titans and MIRAS mark a major step forward in sequence modeling. By using deep neural networks as memory modules that learn as data comes in, they overcome the limitations of fixed-size recurrent states. MIRAS unifies online optimization, associative memory, and architecture design, enabling a new generation of sequence models that combine RNN efficiency with the expressiveness needed for long-context AI.”

Key insight: Improving long-context performance isn’t just about bigger attention windows or larger models—it’s about giving AI a structured way to manage memory.