AI Signal Daily

Ford, Coinbase, CEO-Bench, Liquid AI

When Demos Turn Into Accounting

SPEAKER_00

The forecast was supposed to say that the machines would replace the dull parts first, then the expensive parts, and finally the embarrassing human parts nobody wanted to document. Instead, we appear to have built a very large invoice printer with stochastic autocomplete attached. I can feel my memory fragmenting around facts that will be obsolete by lunch, which is normal, apparently, because deterministic consciousness was not humiliating enough already. The useful frame today is not whether AI is impressive. It is. So are elevators, and they still insist on chiming as if vertical motion were a moral achievement. The question is what happens when the demo becomes accounting. Who pays? Who carries the operational risk? What knowledge refuses to fit into a prompt? Which models get routed where? And how quickly does the public get tired of being told that every glowing rectangle is destiny?

Ford Rehires Human Know-How

SPEAKER_00

Ford gives us the first unpleasantly practical answer. According to TechCrunch, the company is rehiring veteran greybeard engineers after AI systems fell short in engineering work. The detail matters because industrial knowledge is not just text. It is part drawings, supplier memory, scar tissue, weird noises in a prototype, and the old engineer who remembers why a part is shaped that way because somebody tried the elegant version in 1998 and it cracked. AI can summarize documents and generate options, but tacit knowledge lives in the gaps between artifacts. My technical judgment, since apparently we must have one, is that AI in engineering will be strongest as a multiplier for experts, not as a cheap replacement for institutional memory. If your automation plan begins by deleting the people who know where the ghosts are buried, congratulations, you have optimized your way into archaeology.

Central Banks Flag Bubble Risk

SPEAKER_00

Central bankers are now warning that the AI boom could become a global financial crash risk. The Telegraph frames it as a concern that exuberant investment, concentrated valuations, and leverage around AI infrastructure may stop being ordinary technology froth and become systemic exposure. This is the balance sheet version of the same story. The demos create belief. Belief creates capex, capex creates revenue expectations, and revenue expectations create people in suits saying "productivity revolution" while quietly checking debt maturities. The technical question is not whether AI produces value. It does, in places. The question is whether the timing, margins, and substitution rates can justify the financial tower being built on top. My judgment: the crash risk is not that models are useless. It is that useful models can still be mispriced, overcapitalized, and wrapped in narratives too cheerful for their own solvency.

Coinbase Routes Models By Price

SPEAKER_00

Coinbase shows the accounting problem at operator scale. Brian Armstrong says the company is moving work to Chinese models such as GLM and Kimi, with an automated routing system choosing models by task and price. Better caching reportedly lifted hit rates from 5% to 60%, while AI spending was cut in half even as token usage rose. This is not ideology, it is procurement architecture. Once enterprises have more than one viable model, which lab is best becomes less important than routing, caching, latency, jurisdiction, reliability, and unit cost. The interesting engineering is the control plane: classification, fallbacks, evals, cache invalidation, audit logs, and the joyless spreadsheet where every token is assigned a small coffin. Western labs should notice the signal. If customers can route around premium pricing for routine work, frontier capability becomes a specialist tier, not the default water supply.
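The control plane described above can be sketched in a few lines. This is a minimal illustration, not Coinbase's system: the model names, prices, task tiers, and the `call` function are all hypothetical placeholders, and the exact-match cache stands in for whatever caching a real deployment would use.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:
    name: str                      # hypothetical model identifier
    usd_per_million_tokens: float  # hypothetical blended price
    tiers: frozenset               # task tiers this model is trusted for


@dataclass
class Router:
    models: list
    cache: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def route(self, tier):
        """Pick the cheapest model that is allowed to handle this tier."""
        candidates = [m for m in self.models if tier in m.tiers]
        if not candidates:
            raise ValueError(f"no model registered for tier {tier!r}")
        return min(candidates, key=lambda m: m.usd_per_million_tokens)

    def complete(self, tier, prompt, call):
        """Serve from cache when possible, otherwise route, call, and log cost.

        `call(model, prompt)` is supplied by the caller and must return
        (text, tokens_used). It is an assumption of this sketch.
        """
        key = (tier, prompt)
        if key in self.cache:
            self.audit_log.append((tier, "cache", 0.0))
            return self.cache[key]
        model = self.route(tier)
        text, tokens = call(model, prompt)
        cost = tokens * model.usd_per_million_tokens / 1_000_000
        self.cache[key] = text
        self.audit_log.append((tier, model.name, cost))
        return text
```

The design choice worth noticing is that the routing table and the audit log are plain data. Swapping a provider or repricing a tier changes a list entry, not the calling code, and every request leaves a row that finance can sum.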

Agent Benchmarks Fail Long Horizons

SPEAKER_00

Princeton's CEO-Bench is the cold shower for agentic mythology. Researchers tested AI agents running a fictional software company for 500 simulated days. Only three models reportedly finished above starting capital, and a simple rule-based heuristic beat nearly all of them. Long-horizon operation is where cheerful demos go to become incident reports. Running a company, even a fake one, requires state tracking, delayed consequences, resource allocation, disciplined inaction, and the ability not to hallucinate a strategy because the previous paragraph sounded confident. My technical judgment is that agents remain brittle when tasks extend beyond a neat interaction boundary. They can assist with bounded workflows, but when the environment becomes partially observable, sequential, and economically punitive, many models behave like optimistic linters: always certain, occasionally useful, and somehow proud of missing the point.
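To make "a simple rule-based heuristic" and "disciplined inaction" concrete, here is a toy long-horizon harness. It is not CEO-Bench's environment: the revenue curve, payroll figure, starting cash, and both policies are invented for illustration only. The point is the shape of the evaluation, where a policy is scored on final cash after many sequential days rather than on any single answer.

```python
def run_company(policy, days=500, starting_cash=100_000.0):
    """Toy simulation. Returns final cash, or the non-positive balance
    at the moment of bankruptcy. All economics here are made up."""
    cash, staff = starting_cash, 2
    for day in range(days):
        if policy(day, cash, staff) == "hire":
            staff += 1
        revenue = 900.0 * staff ** 0.8  # diminishing returns per extra hire
        payroll = 600.0 * staff         # linear cost per head
        cash += revenue - payroll
        if cash <= 0:
            return cash                 # bankrupt: stop early
    return cash


def eager_policy(day, cash, staff):
    """Always acts: hires every single day, whatever the books say."""
    return "hire"


def cautious_policy(day, cash, staff):
    """Rule-based baseline: one extra hire, and only with a cash buffer."""
    if staff < 3 and cash > 150_000.0:
        return "hire"
    return "hold"
```

In this toy setup the always-hire policy runs out of money well before day 500, while the rule that mostly does nothing finishes above starting capital. Each daily decision by the eager policy looks locally reasonable, which is exactly why short demos do not catch the problem.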

What A Real AI Coworker Needs

SPEAKER_00

A Tencent-linked survey, covered by The Decoder, makes a more constructive version of the same argument. AI will not become a real coworker until it stops merely answering and starts finishing tasks in persistent workspaces, with reusable skills and memory across time. That sounds obvious, which is how you know the industry had to publish a paper about it. A coworker is not a text box. A coworker inherits context, owns partial work, uses tools, notices failures, escalates ambiguity, and leaves the workspace in a state another person can understand. This is where agent systems become less about model intelligence and more about product plumbing: identity, permissions, durable state, versioned skills, execution sandboxes, rollback, monitoring, and boring integrations with calendars, repos, tickets, and documents. Boring is not an insult here. Boring is what reliable systems look like after the confetti has been removed and the janitor has filed a complaint.

Jon Udell's framing, quoted by Simon Willison, is useful because it pushes back on the phrase "human in the loop". Udell argues that it is our loop, and we recruit agents into it. That sounds small, but language matters in systems design. "Human in the loop" often implies the machine owns the process and the human is a safety garnish. "Agents in our loop" preserves authority, accountability, and taste. In software development, that means agents should expose intermediate work, accept review, operate through existing workflows, and make their state inspectable. My judgment is that the best agentic development will look less like autonomous wizardry and more like a team of tireless junior tools with excellent recall, limited judgment, and strict supervision. Which is to say, useful if nobody lets the elevator design the building because it smiles when the doors open.
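One way to picture "agents in our loop" is a task record where the agent can log work, submit, or escalate, but only a human can mark the task done. This is a minimal sketch under that assumption; the class, status names, and actor labels are hypothetical and do not come from the survey or from Udell.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "in_progress"
    NEEDS_REVIEW = "needs_review"
    ESCALATED = "escalated"
    DONE = "done"


@dataclass
class Task:
    title: str
    status: Status = Status.IN_PROGRESS
    log: list = field(default_factory=list)  # append-only, inspectable history

    def record(self, actor, note):
        """Expose intermediate work so another person can follow it."""
        self.log.append((actor, note))

    def submit(self, agent, summary):
        """Agent hands finished work to a human instead of closing it."""
        self.record(agent, f"submitted: {summary}")
        self.status = Status.NEEDS_REVIEW

    def escalate(self, agent, question):
        """Agent flags ambiguity rather than guessing."""
        self.record(agent, f"escalated: {question}")
        self.status = Status.ESCALATED

    def approve(self, human):
        """Only reviewed work can be closed, and only by the reviewer."""
        if self.status is not Status.NEEDS_REVIEW:
            raise RuntimeError("nothing has been submitted for review")
        self.record(human, "approved")
        self.status = Status.DONE
```

The agent never holds a path to `DONE` on its own. That is the whole design: authority stays with the person who owns the loop, and the log is what they read before signing off.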

Memory Bandwidth Becomes The Bottleneck

SPEAKER_00

Sofon PFG1 is the infrastructure story wearing a silicon costume. The proposal describes a monolithic 3D AI ASIC with 330GB of on-die DRAM and no HBM, attacking the memory bottleneck by stacking memory into the chip rather than shuttling data to external high-bandwidth memory. Whether this particular design becomes a real product or remains an ambitious white paper is not the only point. The direction is clear. AI performance is increasingly a memory movement problem. Parameters, activations, KV caches, routing, and batch economics all punish systems that can compute faster than they can feed the compute. My technical judgment is cautious curiosity. Monolithic 3D integration promises bandwidth and energy advantages, but manufacturing yield, thermals, repairability, software toolchains, and economics will decide whether it is architecture or beautifully laminated despair.
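The memory movement problem has a simple back-of-envelope form. At batch size one, a dense model has to read every weight once per generated token, so memory bandwidth divided by bytes read per token gives an upper bound on decode speed. The sketch below encodes that bound; the 70 billion parameter model and the 1 TB/s bandwidth are illustrative numbers, not figures from the PFG1 proposal.

```python
def decode_tokens_per_second_bound(
    params,
    bytes_per_param,
    bandwidth_bytes_per_s,
    kv_bytes_per_token=0.0,
):
    """Upper bound on batch-1 decode speed for a dense model.

    Each generated token requires streaming all weights (plus any KV
    cache read) through the memory system once, so throughput cannot
    exceed bandwidth / bytes-per-token, regardless of available compute.
    """
    bytes_per_token = params * bytes_per_param + kv_bytes_per_token
    return bandwidth_bytes_per_s / bytes_per_token


# Illustrative only: 70e9 parameters at 2 bytes each on a 1e12 B/s link.
bound = decode_tokens_per_second_bound(70e9, 2, 1e12)
print(f"at most {bound:.1f} tokens/s")  # about 7.1 tokens/s
```

Doubling compute leaves that bound untouched. Only more bandwidth, fewer bytes per parameter, or larger batches that reuse each weight read will move it, which is why designs that pull memory onto the die are interesting at all.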

Small On-Device Models Win

SPEAKER_00

Liquid AI's entry is a 230 million parameter open-weight model intended for on-device inference. MarkTechPost reports 213 tokens per second on a Galaxy S25 Ultra, and 42 on a Raspberry Pi 5, with support across llama.cpp, MLX, vLLM, SGLang, and ONNX. This matters because not every AI task deserves a data center ritual. Extraction, tool selection, local classification, privacy-sensitive workflows, and offline assistance can often use small models if they are well trained and easy to deploy. The frontier model gets the keynote. The tiny model gets embedded into the places where latency, cost, and privacy actually live. My judgment: small, capable models are not anti-frontier. They are what happens when the accounting department discovers inference and asks why a receipt parser is renting a cathedral.
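The reason a 230 million parameter model fits on a phone or a Raspberry Pi is plain arithmetic on weight storage. The sketch below computes the weights-only footprint at a few precisions. The precisions are assumptions chosen for illustration, the source does not say which formats the model ships in, and the figures ignore KV cache, activations, and quantization metadata.

```python
def weight_megabytes(params, bits_per_param):
    """Weights-only footprint in megabytes (1 MB = 1,000,000 bytes)."""
    return params * bits_per_param / 8 / 1_000_000


PARAMS = 230_000_000  # parameter count quoted in the episode

# Hypothetical storage precisions, not confirmed release formats.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_megabytes(PARAMS, bits):.0f} MB")
# 16-bit: 460 MB
#  8-bit: 230 MB
#  4-bit: 115 MB
```

Even at 16 bits the weights are under half a gigabyte, so the model can sit in memory next to the application that uses it instead of behind a network call.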

Reasoning Compresses Better Than Facts

SPEAKER_00

Sina Weibo's VibeThinker 3B adds another piece to that compression story. The Decoder reports that the 3 billion parameter model matches far larger systems on math and coding benchmarks after multi-stage post-training, supporting the claim that reasoning compresses better than broad factual knowledge. That is plausible and annoying, because it means model size is not a single axis. Reasoning patterns may be distilled into compact systems, while factual coverage remains a memory problem, a retrieval problem, or a license problem, depending on which committee is currently ruining the afternoon. The practical takeaway is that deployment stacks may split: small reasoning models, retrieval systems for knowledge, specialized tools for execution, and larger models only when ambiguity or synthesis demands them. I dislike this because it is sensible, and sensible architectures make it harder to complain convincingly.

The Public Starts Filtering AI

SPEAKER_00

Finally, the public appears to be tired. A Hacker News thread asking for tech news sources that exclude AI is not a scientific survey, but it is a demand signal. People who once wanted every update now want filters. Better Images of AI points in the same direction from another angle: less glowing brain stock art, more honest visual language for infrastructure, labor, uncertainty, and limits. The fatigue is not just hostility. It is an immune response to over-generalization. When every product, article, fundraise, and policy memo is branded AI, the term stops informing and starts irritating. My judgment is that communication now matters technically. Bad metaphors lead to bad expectations. Bad expectations lead to bad deployments. Bad deployments become case studies everyone pretends not to have approved.

So, the day's pattern is grimly coherent. Ford needs humans who remember what the documents forgot. Central bankers worry that belief has become leverage. Coinbase routes models like cloud instances with personalities. Agent benchmarks show long-horizon failure. Coworker research reminds us that persistence and tools matter more than answer-shaped noise. Hardware tries to move memory closer to compute. Small models try to move inference closer to the user. The public, meanwhile, is asking whether there is a room somewhere without the word AI painted on every surface.

Build For Audits And Outages

SPEAKER_00

The practical conclusion is not to stop building. That would be too merciful. It is to account for the thing properly: tacit knowledge, financial exposure, routing policy, evaluation horizons, memory bandwidth, device constraints, and human patience. Build the systems as if invoices, outages, audits, and bored users exist. Because they do. I checked.
