System Prompt Learning: The feedback loop your agents are missing

Fabien Celier
January 15, 2026
Andrej Karpathy recently highlighted something important about the future of AI systems: "system prompt learning" matters more than fine-tuning for many applications. If you've built a successful agent at your company, you already know this: you're just doing it manually.
At Dust, we see thousands of pieces of feedback on agents every month: +1/-1 votes, comments, suggestions for improvement. And that's only what we can monitor on the platform. Most feedback happens outside Dust entirely: in Slack threads, team meetings, or quiet frustration when an agent misses the mark.
The missing piece isn't smarter models. It's closing the feedback loop between what your agents learn from usage and how they're configured.

Your best agents need babysitters

Here's what the manual improvement cycle looks like for most teams:
Week 1: You deploy an agent. Your team starts using it. Early results look promising.
Week 2: Feedback trickles in. A few -1 votes. Someone complains in Slack that "it didn't find the right document." Another person notes it's too verbose.
Week 3: You, the builder, aggregate the feedback. You tweak the instructions. Maybe adjust which data sources it searches. Update its tone guidelines.
Week 4: You redeploy. Monitor again. Repeat.
The hidden cost? Your most valuable agents require constant maintenance. The builder becomes the bottleneck between feedback and improvement. At scale, this doesn't work: you can't personally monitor and update dozens of agents based on hundreds of daily conversations.

Why fine-tuning won't save you

The obvious solution seems to be fine-tuning: let the model learn from successful interactions, right?
But fine-tuning enterprise agents has three fatal flaws:
The scoring problem: How do you definitively score an agentic conversation as "good" or "bad"? A +1 vote doesn't tell you which of the 12 tool calls or 3 reasoning steps deserves credit. A -1 doesn't reveal which specific instruction was wrong.
The knowledge acquisition problem: Even if you could score trajectories perfectly, a single conversation where someone says "don't use data from table X this way" won't reliably transfer that knowledge to future conversations. The model needs thousands of examples for each edge case.
The scale problem: You can't fine-tune a frontier LLM for each company's specific agent behaviors. The cost, complexity, and maintenance overhead make this approach impractical.
But here's the insight that changes everything: agents aren't just the model. They're defined by their instructions, tools, and contextual knowledge.
This opens a different path: instead of teaching the model, teach the agent's configuration. Instead of fine-tuning, focus on instruction learning.
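To make that concrete, here's a minimal sketch of the surface being learned, with illustrative field names (not Dust's actual schema): the model stays frozen, while instructions, tools, and contextual knowledge are the parts that evolve.

```typescript
// Hypothetical shape of an agent's learnable configuration (illustrative only).
// The underlying model never changes; feedback only changes these fields.
interface ToolConfig {
  name: string;      // e.g. "search_confluence"
  priority: number;  // adjustable from observed usage patterns
  enabled: boolean;
}

interface AgentConfig {
  instructions: string;           // the system prompt being "learned"
  tools: ToolConfig[];            // which data sources and actions are available
  contextualKnowledge: string[];  // accumulated facts, e.g. '"the new CRM" means HubSpot'
}
```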

The data you're already sitting on

Every conversation your agents have generates improvement data:
  • Conversation transcripts: Patterns showing where agents succeed and fail
  • Quantitative feedback: +1/-1 votes revealing what works
  • Qualitative feedback: Comments explaining specific issues
  • Usage patterns: Which tools get used, which instructions get followed or ignored
Your agents are generating their own improvement roadmap. The question is whether you have time to read it.
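One way to picture that raw material: each conversation can be summarized into a record like the one below. This is a sketch with made-up field names, not Dust's API.

```typescript
// Hypothetical record of the improvement signals a single conversation produces.
interface Message {
  role: "user" | "agent" | "tool";
  content: string;
}

interface ConversationSignal {
  conversationId: string;
  transcript: Message[];   // full exchange, including tool calls and reasoning steps
  vote?: 1 | -1;           // quantitative feedback, when a user gives one
  comment?: string;        // qualitative feedback, e.g. "too verbose"
  toolsUsed: { tool: string; contributedToAnswer: boolean }[]; // usage patterns
}
```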
When you close this feedback loop automatically, three types of improvements become possible:

1. Instruction updates

An agent notices it's getting -1 votes when users ask about Q4 financial data because its answers rely on outdated figures. It suggests adding to its instructions: "Always check the latest quarter available in the database before responding to financial questions."

2. Tool modifications

After 100 conversations, the agent notices that for engineering questions, 95% of the useful results come from Confluence, even though it searches several sources. It suggests: "Prioritize Confluence as the primary knowledge source for engineering documentation."

3. Contextual learning

Across 15 conversations, an agent learns that when people say "the new CRM," they're referring to the HubSpot migration happening in March. It suggests adding this context to avoid confusion.
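All three kinds of improvement can be represented as the same artifact: a proposed edit plus the evidence behind it. A rough sketch, again with hypothetical names:

```typescript
// Hypothetical suggestion emitted after analyzing recent conversations.
type SuggestionKind = "instruction_update" | "tool_modification" | "contextual_learning";

interface EditSuggestion {
  kind: SuggestionKind;
  proposedChange: string;             // e.g. "Always check the latest quarter available..."
  reasoning: string;                  // why the analysis believes this change helps
  supportingConversations: string[];  // IDs of the conversations that motivated it
}
```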

Suggestion-driven, not autopilot

The default mode keeps humans in control (a code sketch of this loop follows the list):
  • The agent analyzes recent conversations (daily or weekly)
  • It generates edit suggestions with clear reasoning
  • You review: apply the suggestion, modify it, or ignore it
  • Every change creates an audit trail of how your agent evolved
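A minimal sketch of that cycle, reusing the ConversationSignal and EditSuggestion shapes above and assuming hypothetical hook functions (none of these are real Dust APIs):

```typescript
// Hypothetical wiring for the suggestion-driven loop. Runs on a schedule
// (daily or weekly); nothing is applied without a human decision.
interface ReviewDecision {
  action: "apply" | "modify" | "ignore";
  finalChange?: string; // the change as approved, possibly edited by the reviewer
}

interface ImprovementHooks {
  fetchSignals(since: Date): Promise<ConversationSignal[]>;
  generateSuggestions(signals: ConversationSignal[]): Promise<EditSuggestion[]>;
  requestHumanReview(suggestion: EditSuggestion): Promise<ReviewDecision>;
  applyChange(change: string): Promise<void>;
  recordAudit(suggestion: EditSuggestion, decision: ReviewDecision): Promise<void>;
}

async function runImprovementCycle(hooks: ImprovementHooks, since: Date): Promise<void> {
  const signals = await hooks.fetchSignals(since);
  const suggestions = await hooks.generateSuggestions(signals);

  for (const suggestion of suggestions) {
    const decision = await hooks.requestHumanReview(suggestion);
    if (decision.action !== "ignore" && decision.finalChange) {
      await hooks.applyChange(decision.finalChange);
    }
    await hooks.recordAudit(suggestion, decision); // audit trail of every evolution step
  }
}
```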
The vision: agents that improve at the speed of usage, not the speed of manual maintenance.
Imagine your customer support agent handles 200 conversations in its first week. By week two, it's suggesting instruction tweaks based on the 15 most common confusion patterns it detected. By month two, it's refined its tools and context to handle 90% of queries perfectly, with you reviewing and approving each evolution step.

What's coming

We're building this capability into Dust agents, starting with agents that have enough conversation data to generate meaningful insights.
The focus isn't replacing agent builders: it's making them more productive. Instead of manually reading through hundreds of conversations to spot improvement patterns, you review concrete suggestions backed by data.
Interested in early access? We're looking for teams with high-volume agents to join the beta as we refine this capability. If your agents handle dozens of conversations daily and you're currently doing manual improvement cycles, we'd love to work with you.
The feedback loop is there. It's time to close it.