Escaping the ouroboros: build systems that maximize interaction
The early adopters of the AI wave had a gift they didn't fully appreciate: clean ground truth. The pre-2018 web was messy but human. The early quants traded against human intuition. But that world is fading.
We have entered the era of the data ouroboros: models trained on the outputs of models. If you keep training on the raw fire hose, you aren't learning reality. You're photocopying a photocopy.
The way out is not more data. It's better interaction.
This piece explains why the convergence of quant shops and AI labs matters, and how to build systems that maximize interaction to escape the loop.
Disclaimer
I'm comparing AI labs and quant firms because their system design is converging around the same data → model → constraints → execution → feedback loop. Their engineering patterns now rhyme in useful ways.
The convergence is real
Quant hedge funds and frontier AI labs are converging into the same machine: large-scale learning systems attached to balance sheets. Their stacks are structurally identical:
- Data ingestion and cleaning
- Model training and distillation
- Constraints and execution
- Feedback and online adjustment
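A minimal sketch of that loop in Python, with stub functions standing in for real infrastructure (every name here is illustrative, not anyone's actual pipeline):

```python
from dataclasses import dataclass, field

@dataclass
class LoopState:
    """Rolling state shared across iterations of the learning loop."""
    observations: list = field(default_factory=list)
    feedback: list = field(default_factory=list)

def ingest(raw_events, state):
    # Data ingestion and cleaning: drop malformed events.
    state.observations.extend(e for e in raw_events if e is not None)

def train(state):
    # Training/distillation stand-in: a trivial running-average "model".
    data = state.observations + state.feedback
    return (sum(data) / len(data)) if data else 0.0

def constrain(prediction, max_size=1.0):
    # Constraints: clip the raw prediction to a risk/safety budget.
    return max(-max_size, min(max_size, prediction))

def execute(action):
    # Execution stand-in: acting on the world returns a realized outcome.
    return action * 0.9  # pretend the world responds with slippage/noise

def run_loop(event_batches):
    state = LoopState()
    for batch in event_batches:
        ingest(batch, state)
        prediction = train(state)
        action = constrain(prediction)
        outcome = execute(action)
        # Feedback and online adjustment: the outcome becomes new training signal.
        state.feedback.append(outcome)
    return state

if __name__ == "__main__":
    final = run_loop([[0.2, 0.5], [1.4, None, 0.3]])
    print(len(final.feedback), "feedback points harvested")
```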
DeepSeek is the most concrete example: a quant shop (High-Flyer) pivoting from next-price prediction to next-token prediction using the same GPU clusters, data pipelines, and operational playbook.
Even the talent flow tells the story. Researchers move between quant firms and AI labs because the skill set is now interchangeable. The work is the same; the objective function is different.
The ouroboros problem
The marginal value of observational data is collapsing. Everyone sees the same public corpora, and those corpora are increasingly polluted with synthetic outputs. This is the machine learning equivalent of feedback distortion.
If your system only watches the world, it is trapped in the ouroboros. You need data that exists only because you acted.
Interaction is the only ground truth
In markets, a trade is not just a bet. It's a probe. Your own execution changes the market, and the feedback is proprietary. That's action-conditioned data.
In AI products, the same logic applies. The edge is not in generating a thousand images; it's in learning which image the user selects. The interaction is the data.
If you want defensibility, optimize for the signals that only exist because your system is in the loop.
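As one concrete way to do that, here is a small sketch of logging a user's selection among generated candidates as a preference-style training record. The schema and helper names are made up for illustration:

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class PreferenceRecord:
    """One interaction event: the prompt, the chosen output, and the rejected ones."""
    prompt: str
    chosen: str
    rejected: list
    timestamp: float

def log_selection(prompt, candidates, selected_index, sink):
    """Record which candidate the user picked; this signal only exists because we acted."""
    record = PreferenceRecord(
        prompt=prompt,
        chosen=candidates[selected_index],
        rejected=[c for i, c in enumerate(candidates) if i != selected_index],
        timestamp=time.time(),
    )
    sink.append(json.dumps(asdict(record)))
    return record

# Usage: four generated candidates, the user clicks the third one.
dataset = []
log_selection(
    prompt="a lighthouse at dusk",
    candidates=["img_001", "img_002", "img_003", "img_004"],
    selected_index=2,
    sink=dataset,
)
print(dataset[0])
```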
The shared stack
Both quant shops and AI labs now run the same three-layer stack:
- Bottom layer: large-scale representation models trained on massive corpora.
- Middle layer: distilled predictors that fit tight latency and power budgets.
- Top layer: online learning or reinforcement to adjust policies from live feedback.
The bottom layer is becoming a commodity. The middle and top layers are where interaction becomes a moat.
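A rough sketch of how the three layers might compose, with invented class names and a trivial linear head standing in for real models:

```python
class RepresentationModel:
    """Bottom layer: a large pretrained encoder (stubbed as a toy deterministic embedding)."""
    def embed(self, text, dim=8):
        codes = [ord(c) for c in text][:dim]
        codes += [0] * (dim - len(codes))
        return [(c % 17) / 17.0 for c in codes]

class DistilledPredictor:
    """Middle layer: a small linear head distilled to fit a latency budget."""
    def __init__(self, dim=8):
        self.weights = [0.0] * dim

    def predict(self, embedding):
        return sum(w * e for w, e in zip(self.weights, embedding))

class OnlinePolicy:
    """Top layer: adjusts the distilled head from live feedback."""
    def __init__(self, predictor, lr=0.01):
        self.predictor = predictor
        self.lr = lr

    def update(self, embedding, outcome):
        error = outcome - self.predictor.predict(embedding)
        self.predictor.weights = [
            w + self.lr * error * e for w, e in zip(self.predictor.weights, embedding)
        ]

# Wire the stack together and apply one live-feedback update.
encoder = RepresentationModel()
policy = OnlinePolicy(DistilledPredictor())
emb = encoder.embed("some live event")
policy.update(emb, outcome=0.7)
print(policy.predictor.predict(emb))
```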
Build the interaction factory
Escaping the ouroboros is not about a single model. It's about building a pipeline that maximizes interaction density per unit of time and compute.
Here are the three moves that matter most:
1) Distill for execution
You can't run a frontier-scale model inside a tight feedback loop. Distill the heavy representations into lightweight models that can react in milliseconds. The faster you react, the more interactions you can harvest.
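A toy illustration of the idea, assuming a slow "teacher" function and a tiny linear student; nothing here is tied to a specific framework:

```python
import random

def teacher(x):
    """Stand-in for a slow, frontier-scale model: expensive but accurate."""
    return 3.0 * x - 1.0 + 0.5 * x * x

class Student:
    """A tiny linear model distilled from the teacher so it can answer in microseconds."""
    def __init__(self):
        self.a, self.b = 0.0, 0.0

    def predict(self, x):
        return self.a * x + self.b

    def distill(self, xs, lr=0.01, epochs=200):
        # Fit the student to the teacher's outputs, not to raw labels.
        targets = [teacher(x) for x in xs]
        for _ in range(epochs):
            for x, t in zip(xs, targets):
                err = self.predict(x) - t
                self.a -= lr * err * x
                self.b -= lr * err

student = Student()
student.distill([random.uniform(-1, 1) for _ in range(256)])
print(student.predict(0.3), teacher(0.3))
```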
2) Solve for constraints
Predictions aren't actions. A raw alpha forecast isn't a portfolio. A raw LLM isn't a product. The constraint layer is where you translate noisy predictions into safe, budgeted decisions.
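For example, here is a minimal sketch of a constraint layer that turns raw scores into budgeted positions; the limits and names are arbitrary placeholders:

```python
def apply_constraints(raw_scores, gross_limit=1.0, per_name_limit=0.2):
    """Translate raw alpha-like scores into positions that respect risk budgets."""
    # Cap each individual position.
    capped = {k: max(-per_name_limit, min(per_name_limit, v)) for k, v in raw_scores.items()}
    # Scale everything down if total gross exposure exceeds the budget.
    gross = sum(abs(v) for v in capped.values())
    if gross > gross_limit:
        scale = gross_limit / gross
        capped = {k: v * scale for k, v in capped.items()}
    return capped

positions = apply_constraints({"AAA": 0.9, "BBB": -0.4, "CCC": 0.05}, gross_limit=0.3)
print(positions, "gross:", round(sum(abs(v) for v in positions.values()), 3))
```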
3) Harvest the feedback
Every deployment should be engineered to create training signal for the next iteration. If an action doesn't generate feedback, it's a dead end.
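One way to enforce that discipline is to make the action path itself emit training records, as in this hypothetical sketch:

```python
import json
import time

def execute(action):
    # Stand-in for touching the real world: an exchange, a user, an environment.
    return {"filled": action.get("size", 0) * 0.95}

def act_and_log(context, action_fn, log_file):
    """Wrap every action so it always emits a (context, action, outcome) training record."""
    action = action_fn(context)
    outcome = execute(action)
    record = {
        "context": context,
        "action": action,
        "outcome": outcome,
        "ts": time.time(),
    }
    log_file.write(json.dumps(record) + "\n")
    return outcome

with open("interactions.jsonl", "a") as f:
    act_and_log({"signal": 0.42}, lambda ctx: {"size": ctx["signal"]}, f)
```

The design choice is that there is no way to act without logging: the feedback harvest is built into the execution path rather than bolted on afterwards.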
The bitter lesson still applies
Handcrafted hacks will be replaced by general methods that scale with data and compute. The difference now is that the only data worth scaling is interaction data.
The labs that win won't be those with the biggest public dataset. They'll be the ones who build the most efficient factories for turning interaction into learning.
That's why world models exist: to model the consequences of actions, not just to passively observe, but to predict and improve through interaction.
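In the simplest possible terms, that can be a learned transition function: predict the consequence of an action, compare against what actually happened, and update. A toy sketch, with made-up dynamics:

```python
import random

class TinyWorldModel:
    """Learns to predict the next state given (state, action); illustrative only."""
    def __init__(self, lr=0.05):
        self.w_state, self.w_action, self.lr = 0.0, 0.0, lr

    def predict(self, state, action):
        return self.w_state * state + self.w_action * action

    def update(self, state, action, next_state):
        err = self.predict(state, action) - next_state
        self.w_state -= self.lr * err * state
        self.w_action -= self.lr * err * action

# Hidden dynamics the model has to discover: next_state = 0.8*state + 0.3*action.
model, state = TinyWorldModel(), random.uniform(-1, 1)
for _ in range(2000):
    action = random.uniform(-1, 1)  # acting is what makes the data informative
    next_state = 0.8 * state + 0.3 * action
    model.update(state, action, next_state)
    state = next_state

print(round(model.w_state, 2), round(model.w_action, 2))  # approaches 0.8 and 0.3
```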
Key takeaways
- Observational data is decaying; interaction data is compounding.
- The bottom layer is a commodity; the top layer is your moat.
- Build systems that generate proprietary feedback loops.
- Escaping the ouroboros is a systems problem, not a model problem.
If your system is only watching the world, you're stuck in the loop. Start interacting with it.
Want to follow along on my journey?