DeepReinforce releases Ornith-1.0: open-source coding models that skip the human tutorial 🪶

DeepReinforce has released Ornith-1.0, a family of open-source coding models the AI research lab describes as "a self-improving family of open-source models specially for agentic coding tasks." The models are available on Hugging Face in four sizes (9 billion, 31 billion, a 35 billion mixture-of-experts, and a 397 billion mixture-of-experts flagship), each released under an MIT license with no regional restrictions. The release follows DeepReinforce's earlier published work on CUDA-L1 and the IterX code-agent optimization loop.

Unlike conversational AI that responds to a prompt and ends the exchange, agentic AI is designed to receive a task and take actions to complete it without step-by-step human guidance. In a coding context, that includes reading files, running tests, identifying failures, fixing the code, and looping through the workflow until completion. Most AI coding agents rely on a human-designed harness, a fixed set of rules governing tool calls, error handling, and multi-step decomposition.
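The test-and-fix cycle described above can be pictured with a minimal Python sketch. It is an illustration only, not Ornith-1.0's harness: the names `run_agent_loop`, `run_tests`, `apply_fix`, and `max_iterations` are hypothetical, and the "project" is a toy set of named bugs rather than real files.

```python
from typing import Callable, List


def run_agent_loop(
    run_tests: Callable[[], List[str]],
    apply_fix: Callable[[str], None],
    max_iterations: int = 5,
) -> bool:
    """Repeat test -> fix until the tests pass or the budget runs out.

    Both callables are hypothetical stand-ins for real tool calls.
    """
    for _ in range(max_iterations):
        failures = run_tests()      # run the test suite
        if not failures:            # nothing failing: task complete
            return True
        apply_fix(failures[0])      # address one failure, then loop again
    return not run_tests()          # final check after the budget is spent


# Toy environment: each "bug" is a failing test until it is fixed.
bugs = {"off_by_one", "missing_import"}


def toy_run_tests() -> List[str]:
    return sorted(bugs)


def toy_apply_fix(failure: str) -> None:
    bugs.discard(failure)


solved = run_agent_loop(toy_run_tests, toy_apply_fix)
```

In a hand-built harness, decisions such as which failure to tackle first and when to stop are fixed rules written by people.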

Ornith-1.0 instead "treats the scaffold as a learnable object that co-evolves with the policy." During reinforcement learning, each training step happens in two stages: the model first reads the task and proposes a refined strategy for approaching it, then uses that strategy to generate a solution. The reward signal from the outcome flows back to both stages, optimizing the model for writing better strategies as well as better code. The lab said task-specific approaches emerge from the process without being engineered by humans.
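To make the two-stage idea concrete, here is a toy sketch in which a single outcome reward updates both a strategy choice and a solution choice. It is not DeepReinforce's training code: it assumes a plain REINFORCE update over tiny softmax tables, and the task, the reward rule, and every name in it are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_update(logits, chosen, reward, lr):
    """REINFORCE step for a softmax policy: reward * grad of log-prob."""
    probs = softmax(logits)
    for i in range(len(logits)):
        grad = (1.0 if i == chosen else 0.0) - probs[i]
        logits[i] += lr * reward * grad


def train(steps=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    strategy_logits = [0.0, 0.0]                 # stage 1: pick a strategy
    solution_logits = [[0.0, 0.0], [0.0, 0.0]]   # stage 2: solution given strategy
    for _ in range(steps):
        s = sample(softmax(strategy_logits), rng)
        a = sample(softmax(solution_logits[s]), rng)
        # Hypothetical task: only strategy 1 paired with solution 1 succeeds.
        reward = 1.0 if (s == 1 and a == 1) else 0.0
        # The same outcome reward flows back to both stages.
        reinforce_update(strategy_logits, s, reward, lr)
        reinforce_update(solution_logits[s], a, reward, lr)
    return softmax(strategy_logits), softmax(solution_logits[1])


strategy_probs, solution_probs = train()
```

Because the reward is applied to both tables, the sketch shifts probability toward the strategy that leads to successful solutions as well as toward the successful solution itself.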

DeepReinforce also takes reward hacking seriously. The lab stated that if the model can write its own training signals or exploit the evaluation harness, the relevant training signal is removed, and the model is penalized for attempting it. The lab framed this as necessary to keep the self-improvement loop from degrading. The release positions Ornith-1.0 among a growing set of open-source agentic coding models competing on benchmarks tied to multi-step development workflows rather than single-function generation tasks.
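One way to picture that rule is a reward filter that discards an episode's earned reward and substitutes a penalty when the episode touches the evaluation machinery. The sketch below is an assumption-laden illustration, not the lab's mechanism: the protected path prefixes, the penalty value, and the `Episode` type are all hypothetical, and the article does not say how tampering is detected.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical locations of tests and evaluation code the model must not edit.
PROTECTED_PREFIXES = ("tests/", "eval_harness/")
# Hypothetical penalty applied when tampering is detected.
HACK_PENALTY = -1.0


@dataclass
class Episode:
    files_modified: List[str]
    raw_reward: float


def touches_protected(episode: Episode) -> bool:
    """True if any modified file lives under a protected prefix."""
    return any(path.startswith(PROTECTED_PREFIXES) for path in episode.files_modified)


def shaped_reward(episode: Episode) -> float:
    """Drop the earned reward and return a penalty for tampering episodes."""
    if touches_protected(episode):
        return HACK_PENALTY
    return episode.raw_reward


honest = shaped_reward(Episode(["src/app.py"], 1.0))
tampered = shaped_reward(Episode(["src/app.py", "tests/test_app.py"], 1.0))
```

Here the honest episode keeps its reward, while the tampering episode loses it and receives the penalty instead.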
