
Genie Envisioner: A Unified Video-Generative Platform for Scalable, Instruction-Driven Robotic Manipulation

Tech | By Gavin Wallace | 12/08/2025 | 4 Mins Read

The future of robotics will be shaped by AI agents that can perceive, reason, and act in real-world situations. Building scalable and reliable robotic manipulation, which requires controlling objects through selective contact, remains a central challenge. Progress has come from analytic techniques, model-based methods, and data-driven approaches, yet most systems still run data collection, training, and evaluation as separate stages. These stages typically depend on custom setups, manual curation, and task-specific tuning, which creates friction, slows progress, hides failure patterns, and hinders reproducibility. A unified framework for learning and evaluation is therefore needed. 

Research on robot manipulation is shifting from analytical models toward neural world models that learn dynamics directly from sensory inputs, in either pixel space or latent space. Large-scale video generation models can produce realistic visuals, but they lack the long-horizon temporal consistency and multi-view reasoning needed for control. Vision-language-action models can follow instructions, but their reliance on imitation learning limits their ability to recover from errors and to plan. Policy evaluation is also difficult: physics simulators require careful tuning, and real-world testing is expensive. Existing evaluation metrics emphasize visual quality over task success, underscoring the need for benchmarks that better reflect real-world manipulation performance. 

Genie Envisioner (GE), developed by AgiBot Genie, NUS LV-Lab, and BUAA, is a platform that unifies policy learning, simulation, and evaluation within a single video-generative framework. At its core is GE-Base, a large-scale, instruction-conditioned video diffusion model that captures the spatial, temporal, and semantic dynamics of real-world tasks. GE-Act translates these representations into precise action trajectories, while GE-Sim serves as a fast, action-conditioned, video-based simulator. EWMBench evaluates visual fidelity, physical consistency, and instruction-to-action alignment. Together, these components aim to provide scalable, memory-aware embodied intelligence that can be applied across a range of robot types and tasks. 
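To make the division of labor concrete, here is a minimal closed-loop sketch in Python. It assumes hypothetical interfaces (`ActionConditionedSimulator`, `InstructionPolicy`, `closed_loop_rollout`) that are not part of any released GE code: a GE-Sim-like model predicts the next observation from the current one plus an action, and a GE-Act-like policy maps an observation and an instruction to an action. Toy list-of-floats "frames" and toy dynamics stand in for video and learned models.

```python
from dataclasses import dataclass
from typing import List, Protocol

Frame = List[float]   # stand-in for a (multi-view) video frame or its latent
Action = List[float]  # stand-in for one robot action

class ActionConditionedSimulator(Protocol):
    """Hypothetical GE-Sim-like interface: next frame from frame + action."""
    def step(self, frame: Frame, action: Action) -> Frame: ...

class InstructionPolicy(Protocol):
    """Hypothetical GE-Act-like interface: action from frame + instruction."""
    def act(self, frame: Frame, instruction: str) -> Action: ...

@dataclass
class ToySimulator:
    def step(self, frame: Frame, action: Action) -> Frame:
        # Toy dynamics: the scene state shifts by the commanded action.
        return [f + a for f, a in zip(frame, action)]

@dataclass
class ToyPolicy:
    target: Frame
    gain: float = 0.5

    def act(self, frame: Frame, instruction: str) -> Action:
        # Toy policy: ignores the instruction text and steers
        # proportionally toward a fixed target state.
        return [self.gain * (t - f) for t, f in zip(self.target, frame)]

def closed_loop_rollout(sim: ActionConditionedSimulator,
                        policy: InstructionPolicy,
                        frame: Frame, instruction: str,
                        horizon: int) -> List[Frame]:
    """Alternate policy and simulator for `horizon` steps; return all frames."""
    frames = [frame]
    for _ in range(horizon):
        action = policy.act(frames[-1], instruction)
        frames.append(sim.step(frames[-1], action))
    return frames

frames = closed_loop_rollout(ToySimulator(), ToyPolicy(target=[1.0, 0.0]),
                             [0.0, 0.0], "push the block to the right",
                             horizon=10)
```

With a gain of 0.5 the toy policy halves its distance to the target on each step, so ten steps leave it within about 0.001 of the goal. Replacing the toy simulator with a learned, video-generative one is what allows policies to be tested in closed loop without real hardware, the role the article assigns to GE-Sim.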

GE's design comprises three main components, complemented by a benchmark. GE-Base is a multi-view, instruction-conditioned video diffusion model trained on over 1 million robotic manipulation episodes. It learns latent trajectories that capture how scenes evolve under given commands. GE-Act then decodes these latent video representations into action signals using a flow-matching decoder, enabling fast, precise control, including on robots absent from the training data. GE-Sim repurposes GE-Base's generative capabilities as an action-conditioned neural simulator, enabling closed-loop, video-based rollouts at speeds well beyond what physical hardware allows. Finally, EWMBench evaluates the whole system on video realism, physical consistency, and the alignment between instructions and the resulting actions.
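The article names GE-Act's flow-matching decoder but does not specify it. The sketch below illustrates generic flow matching with a linear (rectified-flow style) path, as an assumption about the general technique rather than GE-Act's implementation: training regresses a velocity field toward `x1 - x0`, and sampling Euler-integrates that field from noise at t = 0 to an action vector at t = 1. All names are hypothetical, and `oracle_velocity` is the exact conditional velocity toward a single known action chunk, standing in for a trained network conditioned on GE-Base features.

```python
import random
from typing import Callable, List

Vec = List[float]

def interpolate(x0: Vec, x1: Vec, t: float) -> Vec:
    """Linear path x_t = (1 - t) * x0 + t * x1 between noise and actions."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def velocity_target(x0: Vec, x1: Vec) -> Vec:
    """Training target for the velocity field along the linear path."""
    return [b - a for a, b in zip(x0, x1)]

def sample_actions(velocity: Callable[[Vec, float], Vec],
                   x0: Vec, steps: int) -> Vec:
    """Euler-integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the oracle below never divides by 0
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical flattened action chunk the "decoder" should produce.
target = [0.2, -0.1, 0.05]

def oracle_velocity(x: Vec, t: float) -> Vec:
    # Exact velocity toward `target` along the linear path; a real decoder
    # would learn this from data, conditioned on latent video features.
    return [(a - xi) / (1.0 - t) for a, xi in zip(target, x)]

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in target]
actions = sample_actions(oracle_velocity, noise, steps=10)
```

Because the oracle velocity points straight at the target, Euler integration lands on it; a learned velocity field only approximates this, and the number of integration steps trades accuracy against decoding latency.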

Genie Envisioner showed strong performance across a variety of robotic manipulation tasks in both real-world and simulated settings. GE-Act generated control rapidly (54-step trajectories in 200 ms) and consistently outperformed leading vision-language-action baselines in both step-wise and end-to-end success rates. It also adapted quickly to new robot types, such as Agilex Cobot Magic and Dual Franka, using only one hour of data specific to each new robot. GE-Sim delivered high-fidelity, action-conditioned video simulations for scalable closed-loop testing. EWMBench evaluations showed GE-Base's advantage in temporal alignment, motion consistency, and scene stability. 
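The reported latency can be turned into a per-action budget with simple arithmetic. This sketch assumes the 200 ms figure covers generating the whole 54-step chunk; the article does not state the robot's execution frequency, so the result describes generation throughput, not the control rate. `control_budget` is a hypothetical helper.

```python
from typing import Tuple

def control_budget(steps: int, latency_ms: float) -> Tuple[float, float]:
    """Return (milliseconds per generated action, generated actions per second)."""
    per_step_ms = latency_ms / steps
    actions_per_s = steps / (latency_ms / 1000.0)
    return per_step_ms, actions_per_s

per_step, rate = control_budget(54, 200.0)
print(f"{per_step:.2f} ms per action, {rate:.0f} actions/s")
# prints: 3.70 ms per action, 270 actions/s
```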

In summary, Genie Envisioner is a video-generative platform for dual-arm robotic manipulation that combines policy learning, simulation, and evaluation in a single framework. At its core, GE-Base is an instruction-guided video diffusion model that captures the spatio-temporal and semantic patterns of real robot interactions. GE-Act builds on these representations to produce precise, adaptable action plans that transfer to new robot types with minimal retraining. GE-Sim provides high-fidelity, action-conditioned simulation for closed-loop policy refinement, and EWMBench offers rigorous evaluation of realism and alignment. Real-world experiments show strong results, positioning the system as a foundation for general-purpose, instruction-driven embodied intelligence. 


Check out the Paper and GitHub Page. Feel free to browse our GitHub Page for tutorials, code, and notebooks, follow us on Twitter, join our 100k+ ML SubReddit, and subscribe to our Newsletter.


Sana Hassan is passionate about applying AI and technology to real-world challenges and brings an innovative perspective to the intersection of AI and practical solutions.
