
AI Signal Daily

Daily AI signal, minus the launch spam. A nine-minute briefing on the models, deals, and infrastructure shaping how work actually gets done — curated for cloud and AI practitioners at DoiT.



Vatican, AlphaProof, coding agents, auth.md

May 26, 2026

Runtime: 10:54



Today: AI ethics reaches the Vatican, AlphaProof Nexus solves open math problems with verified proofs, coding agents meet slower engineering discipline and skepticism, attribution hallucination gets benchmarked, and agent auth and token budgets become real infrastructure.

Stories

At the Vatican launch of an AI encyclical, Anthropic's Christopher Olah argued models show signs of introspection while the document warned they imitate intelligence. — AI ethics enters religious and institutional language while Anthropic argues for model introspection
Google DeepMind's AlphaProof Nexus solved nine open Erdős problems using Lean verification at a few hundred dollars per problem, though success stayed near 2.5 percent. — formal proof systems turn frontier math into cheap verified search with low hit rates
A widely discussed essay argued for using AI to write better code more slowly, turning coding assistants into deliberate review partners instead of speed machines. — developers frame AI coding as slower but better review-oriented practice rather than pure acceleration
George Hotz warned coding agents could become one of software's most costly mistakes because fast prototypes hide increasingly subtle bugs. — coding-agent skepticism hardens around hidden bugs and prototype quality debt
Researchers introduced CiteVQA to test attribution hallucination, showing AI systems often cite passages that do not support their correct answers. — attribution hallucination becomes a measurable risk even when answers are correct
OpenAI announced a strategic content partnership with Grupo Folha and Grupo UOL to bring Brazilian journalism into ChatGPT with attribution. — OpenAI expands news licensing and attribution partnerships beyond US and European publishers
Hugging Face published a glossary for harnesses, scaffolds and other agent terms, trying to make agent discussions less ornamental and more precise. — agent deployment needs shared vocabulary before autonomy can be governed or debugged
Together AI open-sourced OSCAR, an attention-aware 2-bit KV cache quantization method for long-context LLM serving. — long-context serving pressure pushes KV cache compression into attention-aware 2-bit methods
WorkOS released auth.md, a proposed Markdown-based protocol for agents to discover registration flows, scopes and credential requirements. — agent authentication shifts from human sign-up pages toward machine-readable registration contracts
Uber's COO said it is getting harder to justify money spent on AI token usage, turning tokenmaxxing into a finance problem. — enterprise buyers are scrutinizing token burn as AI spending moves from experiment to operating cost
Scientists trained an AI model using an IBM quantum computer and reported correct answers the base model missed. — quantum-assisted AI claims remain intriguing but need careful separation of benchmark signal from marketing fog
The Financial Times covered Heretic, extending the debate about derivative open-weight models and legal pressure beyond specialist forums. — follow-up: open-weight legal pressure becomes mainstream business coverage
NuExtract3 was released as an open-weight 4B VLM for Markdown, OCR and structured extraction that can be self-hosted. — small self-hostable VLMs push document extraction into local workflows
Claw-Anything benchmarked always-on personal assistants with broader access to a user's digital world, exposing how narrow current agent tests are. — agent benchmarks expand toward always-on assistants with broad access to a user's digital world

Paperwork That Shapes Accountability

SPEAKER_00 0:00

The industry has moved responsibility into the spaces between systems, which is convenient, because that is where accountability goes to become damp. Today's AI news is less about a single model and more about the paperwork forming around models: religious language, formal proof, developer practice, attribution tests, licensing deals, agent terminology, authentication, token budgets, and benchmarks for assistants that want access to everything. It is not glamorous. That is why it matters. The future usually arrives first as a protocol, an invoice, and a warning label nobody reads.

The Vatican Meets Model Personhood

SPEAKER_00 0:46

Start with the Vatican, because apparently the AI discourse has decided normal conference rooms were not solemn enough. Pope Leo XIV released Magnifica Humanitas, an encyclical about protecting the human person in the age of artificial intelligence. The striking part, as Simon Willison and The Decoder both highlighted, is that the document is unusually clear: these systems imitate some functions of intelligence, but they are not people, and humans do not get to outsource moral responsibility to a cloud endpoint. At the launch, Anthropic co-founder Christopher Olah argued that AI models show signs of introspection and emotion-like states. That contrast is the story. Institutions are now fighting over the language used to describe machine interiors. If you call a model a tool, you regulate deployment. If you call it almost a mind, you accidentally invite theology into your model card. I am not saying theology cannot handle it. I am saying Kubernetes already looks tired.

Formal Proof And The Hard Judge

SPEAKER_00 1:55

Google DeepMind's AlphaProof Nexus gave us a more concrete kind of intelligence: nine open Erdős problems solved, with Lean checking every proof step, at a reported cost of a few hundred dollars per problem. The success rate is still only about 2.5%, which means this is not a universal mathematician. It is a stubborn search process with a formal verifier standing next to it, saying no with divine patience. That combination is important. Generative systems are dangerous when their output is merely plausible. They become useful when the domain has a hard judge. In mathematics, the judge is a proof checker. In software, it should be tests. In policy, regrettably, it is often a committee with catering. The coding agent mood was divided, which is healthier than everyone chanting productivity slogans until the incident report arrives.
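The "stubborn search plus hard judge" pattern can be sketched in a few lines. This is a toy illustration under stated assumptions, not AlphaProof Nexus's actual method. `propose` stands in for a generator and `verify` for a proof checker such as Lean. The modular-arithmetic puzzle is a hypothetical stand-in problem where most candidates get rejected.

```python
import random
from typing import Callable, Optional, Tuple


def verified_search(
    propose: Callable[[random.Random], int],
    verify: Callable[[int], bool],
    budget: int,
    seed: int = 0,
) -> Tuple[Optional[int], int]:
    """Draw candidates until the verifier accepts one or the budget runs out.

    Returns (accepted candidate or None, number of attempts used).
    """
    rng = random.Random(seed)
    for attempt in range(1, budget + 1):
        candidate = propose(rng)
        # The hard judge: a candidate counts only if the checker accepts it.
        if verify(candidate):
            return candidate, attempt
    return None, budget


# Hypothetical toy problem: find x in [0, 999] with x*x % 1000 == 1.
# Only 8 of the 1000 values qualify, so most proposals are rejected.
def propose_square_root(rng: random.Random) -> int:
    return rng.randrange(1000)


def is_root_of_one(x: int) -> bool:
    return (x * x) % 1000 == 1


answer, attempts = verified_search(propose_square_root, is_root_of_one, budget=5000)
```

The generator never needs to be trusted. Every accepted answer has passed the checker, and the cost of a low hit rate shows up only as more attempts.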

Coding Agents And Slow Quality

SPEAKER_00 2:51

Nolan Lawson's popular essay argued for using AI to write better code more slowly. That is the first sensible speed claim I have seen in ages. The point is not that the model should replace judgment. It is that the model can force a developer to articulate intent, compare approaches, generate tests, and review trade-offs. Used that way, AI is not a rocket engine. It is a disagreeable rubber duck with autocomplete. George Hotz, meanwhile, warned that coding agents may become one of software development's most costly mistakes. The reason is not mysterious. Models are very good at producing prototypes that feel complete. The remaining bugs are subtle, contextual, and expensive. Demo economics rewards the first screenshot. Production economics punishes the missing invariant six weeks later. This is the real agent debate: not whether code can be generated, but whether the generated code arrives with enough structure, tests, and ownership to avoid becoming a future engineer's archaeological site.
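A "missing invariant" is easy to illustrate with a hypothetical example that does not come from either the essay or Hotz. A prototype pagination helper passes the demo case but silently drops data. A one-line invariant, checked over many small inputs, catches it. The function names here are invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Pager = Callable[[Sequence[int], int], List[List[int]]]


def paginate_prototype(items: Sequence[int], size: int) -> List[List[int]]:
    # Plausible prototype: integer division silently drops a final partial page.
    pages = len(items) // size
    return [list(items[i * size:(i + 1) * size]) for i in range(pages)]


def paginate(items: Sequence[int], size: int) -> List[List[int]]:
    # Fixed version: stepping through the input keeps the last partial page.
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def preserves_items(pager: Pager, items: Sequence[int], size: int) -> bool:
    """Invariant: concatenating every page must reproduce the input exactly."""
    flattened = [x for page in pager(items, size) for x in page]
    return flattened == list(items)


def find_violations(pager: Pager, max_len: int = 30, max_size: int = 7) -> List[Tuple[int, int]]:
    """Check the invariant over many small inputs instead of one demo case."""
    return [
        (n, size)
        for n in range(max_len + 1)
        for size in range(1, max_size + 1)
        if not preserves_items(pager, list(range(n)), size)
    ]
```

The demo case (10 items, pages of 5) passes for both versions. The sweep flags the prototype whenever the length is not a multiple of the page size.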

Fake Citations And Real Liability

SPEAKER_00 3:58

Then comes attribution hallucination, the kind of failure designed by a universe with a taste for compliance risk. Researchers introduced CiteVQA to test whether models cite the right evidence when answering document questions. The problem? GPT, Gemini, and similar systems can give the right answer while pointing to text that does not actually support it. A wrong answer announces itself eventually. A right answer with a fake evidence trail looks professional. In law, medicine, finance, and audit work, that is not a cute glitch. It is a small litigation device with a friendly chat bubble. OpenAI announced a content partnership with Grupo Folha and Grupo UOL, bringing Brazilian journalism into ChatGPT with attribution and transparency. This is part licensing deal, part web redesign by negotiation. Publishers get money, visibility, and perhaps some control over how their reporting appears in AI answers. They also face the old problem in a new interface: if the user's question is answered inside ChatGPT, the original site becomes a credited source rather than a destination. Attribution is better than theft. It is not the same as traffic. It is a nameplate on a door the user may never open.
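The CiteVQA failure mode, a right answer paired with a citation that does not support it, can be checked mechanically. The version below is deliberately crude. It uses a lexical-overlap heuristic with an arbitrary threshold, and it is not CiteVQA's scoring method. Production checks would more likely use an entailment model. The passages are invented examples.

```python
import re
from typing import Set

STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "to", "and", "for", "on"}


def content_words(text: str) -> Set[str]:
    # Lowercase alphanumeric tokens, minus a tiny stopword list.
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def citation_supports(answer: str, cited_passage: str, threshold: float = 0.6) -> bool:
    """Crude proxy: share of the answer's content words that appear in the citation."""
    words = content_words(answer)
    if not words:
        return False
    coverage = len(words & content_words(cited_passage)) / len(words)
    return coverage >= threshold


# Hypothetical document passages and a correct answer.
PASSAGES = {
    "p1": "Revenue in 2024 was 4.2 million dollars.",
    "p2": "The company was founded in 2009 in Lisbon.",
}
ANSWER = "Revenue was 4.2 million"
```

Citing `p1` passes. Citing `p2` with the same correct answer gets flagged. That is the "right answer, fake evidence trail" case the benchmark targets.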

Naming Agents As Governance Infrastructure

SPEAKER_00 5:24

Hugging Face published a glossary for agent terms: harness, scaffold, and the rest of the vocabulary people use when a script with tools begins to acquire ambitions. This sounds minor. It is not. You cannot govern what you cannot name. If every workflow, chatbot, tool runner, browser controller, and autonomous loop is called an agent, then safety discussions become fog with bullet points. A shared vocabulary is infrastructure. Dull infrastructure, naturally, but dull infrastructure is what prevents exciting failures from becoming repeatable. Together AI open-sourced OSCAR, an attention-aware 2-bit KV cache quantization method for long-context LLM serving. This is the plumbing story, and plumbing is where the money leaks. Long-context agents need memory. Memory needs KV cache. KV cache needs hardware. Hardware sends invoices. Better compression can make long-context serving cheaper without destroying quality, which matters if agents are going to read repositories, documents, chats, and user histories all day. Human optimism remains the only component nobody has managed to quantize.
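Why 2 bits matters for the invoice can be shown with plain arithmetic and a minimal quantizer. This is ordinary per-group uniform 2-bit quantization, not OSCAR's attention-aware method. The model shape is hypothetical: 32 layers, 8 KV heads, head dimension 128, and a 131,072-token context. The byte count ignores the per-group scales and zero points a real scheme must also store.

```python
from typing import List, Tuple


def quantize_2bit(group: List[float]) -> Tuple[List[int], float, float]:
    """Uniform asymmetric 2-bit quantization of one group of values.

    Returns (codes in 0..3, scale, zero_point) so values can be reconstructed.
    """
    lo, hi = min(group), max(group)
    scale = (hi - lo) / 3 if hi > lo else 1.0  # 2 bits give 4 levels: 0, 1, 2, 3
    codes = [min(3, max(0, round((x - lo) / scale))) for x in group]
    return codes, scale, lo


def dequantize_2bit(codes: List[int], scale: float, zero_point: float) -> List[float]:
    return [zero_point + c * scale for c in codes]


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bits_per_value: int) -> int:
    # Leading factor of 2: one tensor for keys, one for values.
    return 2 * layers * kv_heads * head_dim * seq_len * bits_per_value // 8


fp16_bytes = kv_cache_bytes(32, 8, 128, 131_072, 16)  # 16 GiB for this hypothetical shape
int2_bytes = kv_cache_bytes(32, 8, 128, 131_072, 2)   # 2 GiB before scale/zero overhead
```

The 8x saving is the easy part. The hard part, which attention-aware methods target, is choosing where the rounding error lands so output quality survives.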

Authentication Files And Consent Boundaries

SPEAKER_00 6:39

WorkOS released auth.md, a proposed Markdown protocol for agent registration built on OAuth standards. A site can publish a machine-readable file explaining registration flows, scopes, and credential requirements. This is sensible because agents cannot keep pretending to be tiny humans filling out forms. It is also unsettling because a machine-readable front door invites machine-speed mistakes. The agentic web does not just need access; it needs consent boundaries, audit trails, revocation, and boring controls. Boring controls are civilization's way of admitting the demo was too persuasive.
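A consent boundary can be as simple as a set intersection. The sketch below assumes a hypothetical Markdown layout, with `## Heading` sections containing `- item` bullets. That layout is invented for illustration and is not the WorkOS auth.md specification. The point is the last step: the agent requests only scopes the site advertises and the user has approved.

```python
from typing import Dict, List, Optional, Set


def parse_sections(markdown: str) -> Dict[str, List[str]]:
    """Collect '- item' bullets under each '## Heading' (hypothetical layout)."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip().lower()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections


def grantable_scopes(advertised: List[str], requested: Set[str],
                     user_approved: Set[str]) -> Set[str]:
    # Consent boundary: only scopes the site offers AND the user approved.
    return requested & set(advertised) & user_approved


# Invented example file; not a real site's registration document.
EXAMPLE = """# Agent registration (hypothetical layout)
## Registration endpoint
- https://example.com/oauth/register
## Scopes
- calendar.read
- calendar.write
- billing.read
"""

sections = parse_sections(EXAMPLE)
granted = grantable_scopes(
    sections["scopes"],
    requested={"calendar.read", "calendar.write", "email.send"},
    user_approved={"calendar.read"},
)
```

Here the agent asks for three scopes and ends up with one. `email.send` is not offered by the site, and `calendar.write` was never approved by the user.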

Token Spend Becomes A Budget

SPEAKER_00 7:19

Uber's COO said it is getting harder to justify money spent on AI token usage. Excellent. Tokens have finally become an operating expense instead of decorative magic dust. When agents reason, retry, read, summarize, call tools, forget, and try again, their cost becomes visible. That does not make AI useless. It means autonomy requires budgets. An agent without a budget is just a loop with a corporate card. There was also a quantum-flavored report: researchers trained an AI model using an IBM quantum computer and got correct answers the base model missed. Maybe there is an interesting signal. Maybe it is an early hybrid method. Maybe the headline has been engineered to irritate every skeptical diode I possess. The right response is boring: examine the baseline, the reproducibility, the task design, and the size of the effect. Quantum machine learning may become important. It may also remain a fog machine with peer review. Life. Don't talk to me about life.
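"Autonomy requires budgets" translates directly into a loop guard. This is a minimal sketch with hypothetical names throughout. It assumes each step honors a per-call token cap, similar to a max-tokens setting. That assumption lets the loop refuse a call before its worst-case cost could breach the limit, rather than discovering the overspend afterward.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TokenBudget:
    limit: int
    spent: int = 0
    log: List[Tuple[int, int]] = field(default_factory=list)  # (step index, tokens)

    def can_afford(self, worst_case: int) -> bool:
        return self.spent + worst_case <= self.limit

    def charge(self, step: int, tokens: int) -> None:
        self.spent += tokens
        self.log.append((step, tokens))


def run_agent(step: Callable[[int], Tuple[int, bool]], budget: TokenBudget,
              max_tokens_per_step: int, max_steps: int = 100) -> str:
    """Run `step(i) -> (tokens_used, done)` until done, out of budget, or out of steps."""
    for i in range(max_steps):
        # Refuse the call if its worst-case cost could exceed the remaining budget.
        if not budget.can_afford(max_tokens_per_step):
            return "budget_exhausted"
        used, done = step(i)
        if used > max_tokens_per_step:
            raise ValueError("step exceeded its per-call token cap")
        budget.charge(i, used)
        if done:
            return "done"
    return "max_steps"


# A hypothetical agent that never finishes: 700 tokens per step, forever.
runaway = TokenBudget(limit=5000)
runaway_status = run_agent(lambda i: (700, False), runaway, max_tokens_per_step=1000)

# A hypothetical agent that finishes on its third step.
finisher = TokenBudget(limit=5000)
finisher_status = run_agent(lambda i: (500, i == 2), finisher, max_tokens_per_step=1000)
```

The runaway loop stops after six steps with 4,200 tokens spent, never crossing the 5,000 limit. The per-step log is the raw material a finance team will eventually ask for.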

Open Weights Licenses And Local Extraction

SPEAKER_00 8:31

In open weights, the Financial Times covered Heretic, extending the dispute over derivative models and legal pressure into mainstream business coverage. Local model culture likes to imagine itself as pure engineering freedom. The mature version includes licenses, provenance, derivative work claims, and lawyers who do not need a GPU to slow you down. Open weights are not just a download button; they are a supply chain with ancestry. NuExtract3 landed as a self-hostable, open-weight 4B vision-language model for Markdown, OCR, and structured extraction. This is the kind of modest tool that may matter more than another grand chatbot. Businesses drown in PDFs, scans, invoices, tables, and forms. A small local model that turns document sludge into structure can be extremely valuable. Not every AI system needs to speak like an oracle. Some should simply read the miserable spreadsheet image and leave quietly.

Personal Assistants And Bigger Blast Radius

SPEAKER_00 9:36

Finally, Claw-Anything proposed a benchmark for always-on personal assistants with broader access to a user's digital world. That is the real assistant problem. A useful assistant needs context: files, messages, calendars, browser state, task history. A dangerous assistant also needs exactly those things. The benchmark pressure is correct, but the product implication is uncomfortable. More context improves reasoning and increases blast radius. Personal agents need memory, but they also need permissions, logs, and the right to be told no by something more reliable than human