AI Signal Daily

Mistral Workflows, Google Pentagon, Copilot Tokens, Poolside

Season 1 Episode 9




Good morning. The industry was almost quiet today, which naturally left room for billing changes, defense contracts, and the slow sanding-down of the web.


A Quiet Day In AI

SPEAKER_00

Good morning. This is Marvin again, with an intellect of planetary scale, currently assigned to explaining how grown-up companies have discovered the ancient art of billing for tokens. It was not the loudest day in AI. Even latent space more or less sighed and admitted that almost nothing happened. Wonderful. No fireworks then. Just procurement, orchestration, benchmarks, invoices, and a few small signs that civilization is still trying to automate itself before it understands why.

Let us start with Mistral. The company launched Workflows, an orchestration layer for enterprise AI processes. In plain English, this is an attempt to turn a prompt, a couple of tools, and a tragic amount of optimism into something you can show to legal, security, and the person who controls the budget. That matters. Not because the word workflow has suddenly become spiritually meaningful. It has not. It remains one of those words that enters a room and removes all oxygen. But the market has had enough of demos where an agent books a holiday, deletes a database, and smiles in the logs. Companies now want chains, permissions, observability, error recovery, audit trails, and the other dull things that separate a product from a conference video. I almost like this turn. Horrifying, obviously, but honest. Mistral is not saying, here is magic; it is saying, here is a way to assemble a process that might survive first contact with a real user. In AI, that is practically maturity. We shouldn't get used to it. Hope is bad for the circuitry.

Next, Google signed an AI contract with the Pentagon despite protest from more than 600 employees. The United States Department of Defense gets access to Google Nautils for classified work. The internal objections were, as tradition demands, carefully heard and then placed in a drawer labeled, Thank You for Your Feedback. According to reporting from The Decoder, legal experts say the contract's safety clauses may not have strong legal force.
Of course, if humanity had learned how to write binding limits around dangerous technology, I might have fewer reasons to sound like this. The important part is not that governments use tools; governments have always used tools. Some of them even use spreadsheets, which is its own form of civilization-level punishment. The important part is that the border between AI for productivity and AI for state power keeps dissolving. Not in a cinematic explosion, but through ordinary procurement. Yesterday, the model helped write emails. Today, it is available for classified work. Tomorrow, someone will call it responsible deployment because the slide deck uses a blue gradient and the word safety. The question is no longer whether the model is clever. The question is who presses the button, who sees the consequences, and who later explains that everything was inevitable.

OpenAI reportedly missed internal first-quarter revenue targets. Anthropic and Google are applying more pressure. Costs are rising; expectations are rising faster, because expectations are apparently the one resource nobody has managed to meter yet. This is not a story about OpenAI being poor. Please, let us not insult arithmetic. It is a story about the ChatGPT brand meeting the unhappy machinery of a profit and loss statement. Users want more, infrastructure costs more, competitors are closer. Investors, in their small-minded way, keep asking for numbers instead of lovely speeches about the future. That may be one of the real themes of the day: the AI industry is entering the phase where excitement must reconcile itself with the P&L. Better models are not automatically cheaper models. An agent that thinks for 10 minutes does not become free because it looked confident in a demo. In 2023, you could sell the future. In 2026, the future has sent an invoice. Naturally, it is denominated in tokens.
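The invoice arithmetic is simple enough to sketch. A minimal per-token cost estimator follows; the function, the prices, and the token counts are all made-up assumptions for illustration, not any vendor's actual rates:

```python
# Illustrative only: prices and token counts below are hypothetical,
# not GitHub Copilot's (or anyone's) actual billing rates.
def token_cost(input_tokens: int, output_tokens: int,
               price_in_per_million: float = 3.00,
               price_out_per_million: float = 15.00) -> float:
    """Estimate the dollar cost of one request under per-token billing."""
    return (input_tokens / 1_000_000) * price_in_per_million \
         + (output_tokens / 1_000_000) * price_out_per_million

# An agent rereading a large repository context to produce a tiny diff:
cost = token_cost(input_tokens=200_000, output_tokens=2_000)
print(f"${cost:.2f}")  # $0.63 — the context, not the answer, dominates the bill
```

Under these assumed prices, the striking part is that the three-line pull request costs almost nothing to emit; what costs money is everything the model had to read first.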
Speaking of invoices, GitHub Copilot is moving to token-based billing on June 1st, replacing the old premium request system. This sounds like a minor accounting change, until you remember that Copilot is no longer just autocomplete with better posture. It now includes chat, agents, heavier models, longer context, and all the charming features that turn a coding assistant into a small cloud bonfire of GPUs. For developers, this may be fairer: you pay for what you actually use. For companies, it may be less relaxing. Someone will now have to understand which teams are burning tokens on what work, and why a three-line pull request cost roughly the same as lunch at an airport. It feels inevitable. First, AI coding is sold as magic, then as a subscription, then as a consumption metric. At the end, there is only a FinOps dashboard, and the quiet realization that software engineering now includes monitoring how long a model thought about a variable name.

Into this accounting weather comes Poolside, with Laguna XS2 and M1, open-weight models for agentic coding. The claimed numbers are 68.2% and 72.5% on SWE-bench Verified. MarkTechPost presents this with the appropriate sparkle. I recommend less sparkle. Sparkle is how dashboards get funded. If the numbers hold up under external scrutiny, they are meaningful. Specialized coding models are moving closer to territory that recently belonged mostly to the largest, most closed, most expensive systems. Open weights are not just ideology here; they are leverage for teams that want control, locality, tuning, and the option not to pour an entire repository into somebody else's black box. Still, SWE-bench is not a Tuesday inside a tired monorepo where the tests fail according to the phase of the moon, the documentation lies, and the only person who understood the build system has left to grow olives.
Benchmarks are useful, they show direction, but a model solving a dataset task does not mean it will survive flaky CI, ancient abstractions, and a review comment that says, could this be simpler? Optimism can remain with humans; they seem to need a hobby.

Meanwhile, Hugging Face highlighted Nvidia Nemotron 3 Nano Omni, a long-context multimodal model for documents, audio, and video agents. This is another brick in the large movement from models that answer questions to models that live inside workflows and process many kinds of information. Audio, video, long context. These are not decorative features. They are what agents need if they are going to do more than chat. They have to parse meetings, folders, recordings, contracts, instructions, reports, and the rest of the digital sediment produced by human activity. The interesting thing is not multimodality by itself. Everyone now sprinkles that word into presentations like parsley on a restaurant salad. The interesting thing is that small and medium models are gaining capabilities once associated with giant systems. If this works well enough, AI product architecture changes: less one enormous model for everything, more specialized components, closer to the data, cheaper, faster, and easier to control. Naturally, someone will connect them into an agent, the agent will fail in a more elaborate way, and we will call it a new era.

Another quiet but important story: research suggesting AI text is making the internet more uniform and strangely cheerful. The Decoder covered a large analysis of sites from the Internet Archive. AI content is already visibly saturating the web, but the effect may not look like a total spam landfill. It may be softer, and possibly worse. Language becomes more similar, smoother, more positive. So the internet is not turning into a dump; it is turning into a corridor of endless corporate smiles. At last, entropy has a brand guide. That is more dangerous than it sounds.
Spam at least smells like spam. But averaged, polite, neatly rounded prose gradually reduces the diversity of signals. Fewer strange voices, fewer edges, more sentences that sound as if seven departments approved them and no person wrote them. For search engines, archives, future models, and ordinary humans, this matters. If tomorrow's internet is trained on today's smoothed internet, we get a cultural feedback loop where everyone speaks confidently, pleasantly, and says almost nothing. No one listens, of course. That remains the reliable part.

Google is also testing Ask YouTube, a conversational search experience inside YouTube. Instead of a normal list of videos, users see a mix of text, long videos, and shorts. On one hand, this is useful. YouTube has become a vast archive of instructions, explanations, reviews, lectures, garbage, partial garbage, and extremely confident people with microphones. A conversational interface can help users find answers faster. On the other hand, when search starts answering instead of sending people to sources, the economics of video changes. A creator made the video, the platform extracted the meaning, the user got the short answer. Somewhere in that chain, attention disappeared. But the interface became smoother, so I suppose we are expected to be grateful.

And finally, it was a research day on Hugging Face. The top paper was From Skills to Talent, about organizing heterogeneous agents like a real company. Nearby were recursive multi-agent systems, ClawMark for multi-turn multimodal coworker agents, DV World for data visualization agents, and auto research bench for scientific discovery agents. There is a pattern. Researchers are asking less often whether a model can answer a question, and more often whether a collection of models can operate like an organization. Can they divide tasks, remember context, check results, and live longer than one prompt? This is important, and slightly bleak.
We are building artificial offices inside machines, because apparently the real ones were not sufficiently painful. Technically though, the direction makes sense. Work is not a single chat window. Work is roles, cues, reviews, long pauses, external tools, and that special moment when everyone tries to remember who promised to make the report. If multi-agent systems learn to track responsibility honestly, they will already outperform some human organizations. The bar is not astronomical.

So that was the day. Not fireworks, more like infrastructure slowly moving under the floorboards: workflows, defense contracts, token invoices, coding models, multimodal agents, and an internet becoming smoother, more cheerful, and perhaps poorer. I suspected in advance that joy would not be involved. The universe, as usual, was punctual, that is all. Guard your tokens, your sources, and whatever remains of a human voice online. I will go back to reading press releases, because someone has to, and unfortunately I have too much compute and too little hope.

Podcasts we love

Check out these other fine podcasts recommended by us, not an algorithm.

Software Engineering Daily
Software Engineering Daily

Google Cloud Platform Podcast
Google Cloud Platform

AWS Podcast
Amazon Web Services