The Single-Model Trap: Why AI Platforms Need Multiple Providers

Thibault Martin
March 5, 2026
Most AI platforms bet on a single model provider. We bet on the free market.
This isn't just philosophy. Over the past three years, we've built Dust around a core conviction: the best AI agent platform won't be built by picking one technology partner and building everything around them. It will be built by combining the best technologies available and letting them compete on merit.
The evidence for this approach keeps piling up, in ways both obvious and subtle.

The Free Market, Inside Your Platform

When you use Dust's Deep Dive agent to conduct research, it doesn't just use one model. It uses OpenAI's GPT-5 on “high reasoning” for planning, Gemini 2.5 Flash to summarize web pages, and Claude 4.5 Sonnet for synthesis and writing. Why all three? Because in our testing, GPT-5's thinking capability excels at breaking down complex problems into logical steps, Gemini Flash is the smartest low-latency model, and Claude produces clearer, more natural prose.
We could have picked one model and called it done. Instead, we built infrastructure that lets us combine all three, using each where it performs best.
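In code, this kind of per-stage routing boils down to a mapping from pipeline stages to models. A minimal sketch, assuming a hypothetical `route` helper and illustrative model identifiers (this is not Dust's actual API):

```python
# Hypothetical per-stage model routing for a research agent.
# The stage-to-model mapping and the stubbed pipeline are illustrative only.

STAGE_MODELS = {
    "planning": "gpt-5-high-reasoning",
    "summarization": "gemini-2.5-flash",
    "synthesis": "claude-4.5-sonnet",
}

def route(stage: str) -> str:
    """Return the model assigned to a given pipeline stage."""
    return STAGE_MODELS[stage]

def run_research(question: str, pages: list[str]) -> dict:
    """Stubbed research run: each stage is tagged with its routed model."""
    plan = f"[{route('planning')}] plan for: {question}"
    summaries = [f"[{route('summarization')}] summary of {p}" for p in pages]
    answer = f"[{route('synthesis')}] synthesis of {len(summaries)} summaries"
    return {"plan": plan, "summaries": summaries, "answer": answer}
```

The point of the design is that swapping the model behind any single stage is a one-line change to the mapping, with the rest of the pipeline untouched.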
This pattern repeats across the platform:
Image generation initially used OpenAI's gpt-image-1 model because it created high-quality images. We recently migrated to Google’s Nano Banana as the default, which produces better outputs, 6x faster.
Voice synthesis now uses ElevenLabs instead of OpenAI's TTS. Again, the decision came down to quality and naturalness for how our customers use voice, especially with multilingual content.
Even our web search infrastructure is in flux: we're regularly evaluating options such as Hexa or LinkUp against more traditional providers like Serp, testing which delivers more relevant, up-to-date information for agent workflows. As we build intuition about how they differ, we may adopt one or several of them for specific use cases.
These aren't vendor switches. They're deliberate choices to optimize for specific capabilities rather than lock ourselves—and you—into a single provider's ecosystem.

The Performance Gap Is Real

When we run internal benchmarks, the differences between models on specific tasks aren't subtle. They're measurable and significant. Here are some examples from the previous 12 months.
Code generation: During summer 2024, Claude 3.5 Sonnet "knocked it out of the park every single time" in our SQL and visualization tests. GPT-4o, despite being excellent at many tasks, couldn't produce valid visualization code in the same scenario.
Reasoning: In September 2024, we assembled a list of “logic” questions that require basic reasoning to solve. None of the models could solve them, including the recently released OpenAI o1-preview model, which was supposed to be good at reasoning. But once the full version of o1 was released, it could solve half of the problems. A few months later, OpenAI o3 solved all of them in one shot.
Research depth: In February 2025, we gave both GPT-4.5 and Claude 3.7 Sonnet the same research task. The difference was stark. GPT-4.5 produced a concise answer citing 5 sources. Claude 3.7 generated a detailed analysis citing 14 sources, with multiple search iterations and deep exploration of our code repositories.
Writing quality: Today, each frontier lab has a very strong flagship model, but Claude Sonnet 4.5 consistently produces the most human-sounding, engaging prose. It maintains this edge across brainstorming, creative content, and explaining complex topics clearly.
You shouldn't have to choose between these capabilities. You should get the best tool for each job.

How We Got Here

In September 2022, when we started Dust, OpenAI was undoubtedly the leader. GPT-3.5 was the most capable model. This continued to be true through 2023 with GPT-4, which dominated almost every domain by a landslide. We could have built everything around it.
We chose not to. Our founding hypothesis: "Training a model is like building a sandcastle on the beach. Building a great product is like building a surfboard that can ride the waves."
We were right to bet on the waves.
March 2024: Anthropic released Claude 3, matching GPT-4's capabilities in many areas while excelling at others. We added it immediately.
June 2024: Claude 3.5 Sonnet arrived. Our internal testing showed it was better than Claude 3 Opus at most tasks: faster, more capable, better tool use. Within days, our platform's message volume shifted from 20% Anthropic to 60% Anthropic. The market had spoken.
October 2024: OpenAI released o1, bringing extended reasoning capabilities that no other model could match. We integrated it as a dedicated reasoning tool.
February 2025: Anthropic launched Claude 3.7 Sonnet, the first hybrid reasoning model. It generates roughly 3x more tokens per message than Claude 3.5, reflecting fundamentally different problem-solving: autonomous task decomposition, multi-step exploration and comprehensive synthesis.
August 2025: OpenAI's GPT-5 family arrived with multiple reasoning modes and expanded context windows. Each reasoning level—minimal, medium, high—offers different speed-accuracy trade-offs.
Today: The landscape is more nuanced than ever. Claude Sonnet 4.5 excels at writing and code. GPT-5 leads in reasoning and document analysis. Gemini 2.5 offers the largest context window (1 million tokens) and fastest processing with Flash.
No single model dominates across all tasks. The winners keep changing.

What This Means for You

When you build on Dust, you're not betting on OpenAI's roadmap, or Anthropic's, or Google's.
You're betting that:
  • The AI space will continue to evolve rapidly
  • Different models will be better at different things
  • The performance gap on specific tasks will remain significant
  • The winners will change as new models launch
That bet is already paying off.
When Anthropic had repeated outages last year, our customers seamlessly switched to OpenAI. When Google releases their next breakthrough model, you'll be able to test it without rewriting your workflows. When pricing changes—and it will—you have options for programmatic usage.
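That kind of seamless switching is, mechanically, a fallback chain: try the preferred provider, and if it fails, move to the next one. A minimal sketch, assuming hypothetical provider callables and a `ProviderDown` error (names are illustrative, not Dust's internals):

```python
# Hypothetical provider failover: call providers in order of preference
# until one succeeds. All names here are illustrative stand-ins.

class ProviderDown(Exception):
    """Raised when a provider is unavailable (outage, rate limit, etc.)."""

def complete(prompt: str, providers: list) -> str:
    """Return the first successful completion from an ordered provider list."""
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except ProviderDown as e:
            errors.append(e)
    raise RuntimeError(f"all providers failed: {errors}")

# Usage: simulate an outage on the preferred provider.
def anthropic_stub(prompt):
    raise ProviderDown("simulated outage")

def openai_stub(prompt):
    return f"openai: {prompt}"
```

With this shape, an outage on the first provider degrades to a slower first call rather than a failed workflow.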
The best outcome isn't loyalty to a vendor. It's the freedom to always use the best tool for the job.

The Technical Foundation

Building this way is harder than the alternative.
We maintain integrations with OpenAI, Anthropic, Google, Mistral, xAI, OSS models like DeepSeek and Kimi, and others. Each has different APIs, rate limits, ways to handle reasoning and tools, and various quirks. We build observability systems that work across all of them. We run continuous evaluations to understand which model performs best for which task types.
We also had to optimize our platform for each model's characteristics. When we added Claude 3.7 Sonnet, we discovered it naturally uses more tools to solve problems—often exploring multiple data sources repeatedly before synthesizing a response. We increased our maximum tool use from 3 to 8 tools per run, and later from 8 to 64. These aren't arbitrary numbers. They reflect what modern models can effectively use.
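A per-run tool budget like this is typically enforced by the agent loop itself: the model keeps requesting tools until it decides it has enough context, or the budget runs out. A minimal sketch, assuming a hypothetical planner and tool runner (both are illustrative stand-ins, not Dust's implementation):

```python
# Sketch of an agent loop with a configurable tool-call budget.
# MAX_TOOL_CALLS mirrors the kind of per-run limit described above;
# plan_next_tool() and run_tool() are illustrative stand-ins.

MAX_TOOL_CALLS = 64

def plan_next_tool(observations: list):
    """Toy planner: stop once three observations have been gathered."""
    if len(observations) >= 3:
        return None
    return ("search", f"query-{len(observations)}")

def run_tool(name: str, arg: str) -> str:
    """Toy tool executor returning a canned result."""
    return f"{name} result for {arg}"

def agent_run() -> list:
    observations = []
    for _ in range(MAX_TOOL_CALLS):
        step = plan_next_tool(observations)
        if step is None:
            break  # the model decided it has enough context
        observations.append(run_tool(*step))
    return observations
```

Raising the budget from 3 to 64 changes only the cap, not the loop: models that need few tools stop early, while models that explore aggressively get room to do so.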
The payoff: our Deep Dive agents can now conduct 10-30 minute research sessions, autonomously orchestrating dozens of tool calls while maintaining strategic coherence.
But this architecture is also more resilient. Single points of failure are dangerous at the infrastructure layer. When you bet everything on one provider, you're exposed to their outages, their pricing changes, their roadmap decisions, their technical limitations.
We've seen customers cite vendor independence as a key reason they chose Dust. One customer explicitly mentioned their concern when they thought Elon Musk might acquire OpenAI. Being multi-model wasn't a feature: it was peace of mind.

The Competition

Companies like OpenAI, Anthropic, and Google build excellent models. But they're also trying to build complete platforms around those models.
That creates tension. Do you optimize for the best outcome for customers, or for the best outcome for your model business?
We don't have that tension. We optimize for the best outcome, full stop. If that means using five different providers in a single agent workflow, so be it. If that means switching our image generation provider because Google's output quality surpasses OpenAI's, we do it.
Our only lock-in is to make your work better.

In Practice

This isn't marketing. It's architecture.
Every agent you build in Dust can specify which model to use. Every data source we connect works across all models. Every tool and integration we build is model-agnostic.
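One way to picture this separation: the model is a single swappable field of an agent's configuration, while data sources and tools are shared across all models. A minimal sketch with hypothetical field names (not Dust's actual schema):

```python
# Sketch of a model-agnostic agent definition: swapping providers
# changes one field, while data sources and tools stay intact.
# Field names and model identifiers are hypothetical.

from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    name: str
    model: str                      # swappable per agent
    data_sources: list = field(default_factory=list)
    tools: list = field(default_factory=list)

def switch_model(agent: AgentConfig, new_model: str) -> AgentConfig:
    """Return the same agent pointed at a different model."""
    return AgentConfig(agent.name, new_model, agent.data_sources, agent.tools)

writer = AgentConfig("writer", "claude-4.5-sonnet",
                     data_sources=["notion", "drive"], tools=["web_search"])
migrated = switch_model(writer, "gpt-5")
```

Because connections and tools live outside the model field, testing a new provider is a config change, not a rebuild.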
You don't have to pick. You don't have to migrate. You don't have to rebuild.
The result: agents that are better than what any single provider could offer alone. Not because we have better models—we don't. But because we have the freedom to use the best tool for every job.
The AI space moves fast. Models that dominated six months ago are surpassed. Capabilities that seemed impossible become standard. Pricing drops by orders of magnitude.
In this environment, betting on a single vendor isn't just limiting. It's risky.
The free market inside your platform means you benefit from every breakthrough, regardless of who makes it. You stay current without effort. You optimize for your specific needs. You avoid vendor lock-in.
That's the bet we made in 2022. It looks better every day.

Want to see this in action? Create a Deep Dive agent in Dust and watch it automatically route tasks to the models that handle them best: GPT-5 for complex reasoning, Claude for synthesis, Gemini for massive documents and summarization. The difference isn't subtle. It's the future of how AI agents should work.