Alibaba's Qwen Just Open-Sourced the Robot OS, Because Even Androids Need a Cloud

Alibaba's Qwen team released the Qwen-Robot Suite on Tuesday, a three-model package the company describes as a "full stack for embodied intelligence." The bundle includes Qwen-RobotNav for mobility, Qwen-RobotManip for manipulation, and Qwen-RobotWorld, a language-conditioned video model that simulates the physical interactions underlying both. Each component can be used on its own. Together, they form what Alibaba is positioning as a foundational layer for deploying AI agents in the physical world, running on top of the company's existing chips, cloud and serving infrastructure.

Qwen-RobotNav unifies five navigation tasks in a single model: instruction following, point-goal navigation, object search, target tracking and autonomous driving. Where most navigation systems hardcode a single visual memory strategy, Qwen-RobotNav exposes a parameterized memory interface, covering token budget, temporal decay and per-camera weights, that a planner can reconfigure mid-episode. The system was trained on 15.6 million samples with all of these parameters randomized. It reports 76.5% success on VLN-CE RxR, the Room-Across-Room benchmark for vision-and-language navigation in continuous environments, and a 90% tracking rate on EVT-Bench, which evaluates an agent's ability to consistently follow moving targets.
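
The article names the memory knobs but not the interface itself. As a purely illustrative sketch, assume a memory that scores each stored frame by its camera weight times an exponential temporal decay, then keeps the highest-scoring frames that fit within a token budget. `MemoryConfig`, `Frame`, `select_frames` and `random_config` are hypothetical names, the scoring rule is an assumption, and none of this is Qwen-RobotNav's actual API.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class MemoryConfig:
    """Hypothetical knobs mirroring the three the article names."""
    token_budget: int                 # max visual tokens kept in context
    temporal_decay: float             # per-step multiplier in (0, 1]
    camera_weights: dict[str, float]  # relative importance of each camera


@dataclass
class Frame:
    step: int    # timestep at which the frame was captured
    camera: str  # camera identifier, e.g. "front"
    tokens: int  # visual tokens this frame would occupy


def select_frames(frames: list[Frame], now: int, cfg: MemoryConfig) -> list[Frame]:
    """Greedily keep the highest-scoring frames that fit the token budget.

    Assumed score = camera weight * temporal_decay ** age, so older frames
    and down-weighted cameras are dropped first.
    """
    def score(f: Frame) -> float:
        age = now - f.step
        return cfg.camera_weights.get(f.camera, 0.0) * cfg.temporal_decay ** age

    kept, used = [], 0
    for f in sorted(frames, key=score, reverse=True):
        if score(f) > 0 and used + f.tokens <= cfg.token_budget:
            kept.append(f)
            used += f.tokens
    # Hand the survivors to the model in chronological order.
    return sorted(kept, key=lambda f: f.step)


def random_config(rng: random.Random, cameras: list[str]) -> MemoryConfig:
    """Sample every knob at random, loosely mimicking training-time randomization."""
    return MemoryConfig(
        token_budget=rng.choice([256, 512, 1024]),
        temporal_decay=rng.uniform(0.5, 1.0),
        camera_weights={c: rng.uniform(0.0, 1.0) for c in cameras},
    )


# Usage: the same frames under two configurations a planner might swap mid-episode.
frames = [Frame(s, "front", 100) for s in range(4)] + [Frame(s, "wrist", 100) for s in (2, 3)]
nav_cfg = MemoryConfig(token_budget=200, temporal_decay=0.9, camera_weights={"front": 1.0, "wrist": 0.5})
grasp_cfg = MemoryConfig(token_budget=200, temporal_decay=0.9, camera_weights={"front": 0.2, "wrist": 1.0})
print([(f.camera, f.step) for f in select_frames(frames, now=3, cfg=nav_cfg)])    # [('front', 2), ('front', 3)]
print([(f.camera, f.step) for f in select_frames(frames, now=3, cfg=grasp_cfg)])  # [('wrist', 2), ('wrist', 3)]
```

Swapping `nav_cfg` for `grasp_cfg` changes which frames survive without touching any model weights, which is the kind of mid-episode reconfiguration the article describes.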

Qwen-RobotManip addresses a core incompatibility in robotic control: different hardware represents actions differently. A Franka arm operates through joint angles, an ALOHA robot uses the position and orientation of its grippers, and humanoids rely on whole-body coordinates. To bridge these action spaces, Alibaba synthesized approximately 38,100 hours of training data from open-source robot datasets and human videos, without any proprietary data collection. The model ranks first on RoboChallenge Table30-v1, outperforming prior approaches by 20%, according to the company.
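
To make the incompatibility concrete: one common way to feed mismatched action spaces into a single policy is to pad each robot's native action vector to a shared width and carry a mask plus an embodiment tag. The article does not say Qwen-RobotManip works this way; the sketch only illustrates the problem. Every name and every dimension below is a hypothetical assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative action widths per embodiment; assumed for this sketch,
# not taken from Qwen-RobotManip.
ACTION_DIMS = {
    "franka_joint": 8,    # 7 joint angles + 1 gripper command
    "aloha_ee_pose": 14,  # 2 grippers x (xyz position + quaternion orientation)
    "humanoid_body": 29,  # whole-body joint targets
}
MAX_DIM = max(ACTION_DIMS.values())


@dataclass
class UnifiedAction:
    embodiment: str      # which robot the action belongs to
    values: list[float]  # native action, zero-padded to MAX_DIM
    mask: list[bool]     # True where a slot carries a real value


def to_unified(embodiment: str, action: list[float]) -> UnifiedAction:
    """Pad a robot-native action vector into the shared fixed-width format."""
    dim = ACTION_DIMS[embodiment]
    if len(action) != dim:
        raise ValueError(f"{embodiment} expects {dim} values, got {len(action)}")
    pad = MAX_DIM - dim
    return UnifiedAction(embodiment, list(action) + [0.0] * pad, [True] * dim + [False] * pad)


def from_unified(u: UnifiedAction) -> list[float]:
    """Drop the padding to recover the robot-native action vector."""
    return [v for v, m in zip(u.values, u.mask) if m]


# Usage: two very different robots end up with same-width actions.
franka = to_unified("franka_joint", [0.0] * 7 + [1.0])
aloha = to_unified("aloha_ee_pose", [0.0] * 14)
assert len(franka.values) == len(aloha.values) == MAX_DIM
```

The mask lets a shared model ignore padded slots, while the embodiment tag tells a decoder how to map the values back onto a specific robot.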

Qwen-RobotWorld, the most ambitious of the three, treats natural language as a universal action interface, allowing prompts such as "Pick up the red cup and pour water on the flower" to direct grippers, autonomous vehicles and other physical systems through a shared generative model. The Qwen team announced the suite on June 16, 2026, as an extension of Alibaba's integrated stack spanning chips, cloud, models, serving platforms and applications, with robotics framed as the most physical expression of that bet.

Publisher: cryptonewsroom.xyz
Category: Altcoins
