
A new class of model is emerging: a fully general computer action model trained not on screenshots, but on internet-scale video of real computer use. 🎥⚙️
#FDM1 was trained on a portion of an 11-million-hour screen recording corpus. It compresses nearly two hours of 30 FPS video into ~1M tokens and learns directly from video streams, not static frames. The result? A system that can execute multi-step CAD workflows, fuzz complex user interfaces, and even generalize to real-world driving with minimal fine-tuning.
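A quick back-of-the-envelope check puts that compression claim in perspective (the exact tokenizer is not described in the post; this is just arithmetic on the stated numbers):

```python
# Sanity check: ~2 hours of 30 FPS video compressed into ~1M tokens.
frames = 2 * 3600 * 30            # 216,000 frames in two hours at 30 FPS
tokens_per_frame = 1_000_000 / frames
print(round(tokens_per_frame, 1))  # ~4.6 tokens per frame
```

Under five tokens per frame is what makes multi-hour context windows feasible at all.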
🧠 Traditional computer-use agents relied on contractor-labeled screenshots and narrow reinforcement environments. They struggled with long-horizon tasks and high frame-rate data.
🧬 FDM-1 introduces inverse dynamics labeling at scale, auto-generating action tokens (keystrokes, mouse deltas) across millions of hours.
🛸 A highly compressed video encoder unlocks multi-hour context windows, making sustained workflows feasible, not just reactive clicks.
⚡ The evaluation stack runs over a million rollouts per hour across tens of thousands of forked virtual machines, pushing computer action from a data-constrained regime to a compute-constrained one.
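The inverse dynamics idea in the bullets above can be sketched in a toy form: a labeler observes two consecutive screen states and infers the action that connects them, turning raw video into (state, action, next-state) training triples with no human annotation. Everything below (the `Frame` class, the rule-based labeler) is a hypothetical simplification for illustration, not FDM-1's actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    cursor_xy: tuple[int, int]  # hypothetical simplified screen state
    text: str                   # visible text in a focused field

def label_action(prev: Frame, curr: Frame) -> dict:
    """Infer the action token that transformed prev into curr."""
    if curr.text != prev.text:
        # New characters appeared between frames -> infer keystrokes.
        typed = curr.text[len(prev.text):]
        return {"type": "keystroke", "keys": typed}
    # Otherwise attribute the change to cursor movement.
    dx = curr.cursor_xy[0] - prev.cursor_xy[0]
    dy = curr.cursor_xy[1] - prev.cursor_xy[1]
    return {"type": "mouse_delta", "dx": dx, "dy": dy}

frames = [
    Frame((100, 200), "hel"),
    Frame((100, 200), "hello"),
    Frame((140, 180), "hello"),
]
labels = [label_action(a, b) for a, b in zip(frames, frames[1:])]
print(labels)
# [{'type': 'keystroke', 'keys': 'lo'},
#  {'type': 'mouse_delta', 'dx': 40, 'dy': -20}]
```

At scale the labeler is itself a learned model rather than rules, but the payoff is the same: millions of hours of unlabeled video become supervised action data.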
Why should executives care?
Because this reframes #EnterpriseAI from “assistive intelligence” to “operational execution.”
Gartner projects that by 2026, 40% of enterprise applications will embed task-specific AI agents, up from less than 5% in 2025. That’s not automation at the margins. That’s structural reconfiguration of digital labor.
When an agent can navigate any GUI, manipulate 3D models, test financial workflows, or orchestrate tooling without bespoke integrations, the interface itself becomes the API.
🧨 That changes your cost structure.
🧨 That changes your workforce model.
🧨 That changes your control plane.
https://si.inc/posts/fdm1/
@insidetheworldofai
