What Is TraceGen?
TraceGen is a research project that is made up of two key parts:
- TraceForge: A data pipeline which converts heterogeneous, in-the-wild data into consistent 3D traces
- TraceGen: A world model that learns from the traces and predicts future motion in trace-space
Combined, these two components make it possible to train a world model on cross-embodiment, in-the-wild data, and then deploy that model across different embodiments.
Why Is This Important?
This project combines two individually important components into a single pipeline that enables cross-embodiment robot learning. Previous methods often relied on video-generation models, which are unreliable for this purpose: their pretraining data is overly diverse, and they tend to hallucinate. By training on trace data rather than pixel data, this approach reduces hallucination and makes training more reliable.
TraceForge
TraceForge matters because, as things stand, robotics datasets are fragmented: each uses different cameras with different intrinsics, different camera positions, and different embodiments. TraceForge unifies these datasets by discarding the excess information present in pixel space that is often embodiment- or environment-specific, such as color or size. Trace space keeps only the components of robot motion that actually matter. This allows the use of not just other cross-embodiment datasets, but also in-the-wild human data.
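TraceForge's actual pipeline is described in the paper; as a rough, hypothetical sketch of the idea, a "trace" can be thought of as a 3D trajectory lifted out of pixel space using camera intrinsics, then normalized so that traces from different cameras and embodiments become directly comparable. The function names, array shapes, and normalization scheme below are illustrative assumptions, not the project's real API:

```python
import numpy as np

def backproject_trace(pixels, depths, K):
    """Lift a 2D pixel trajectory into a 3D trace in camera coordinates.

    pixels: (T, 2) array of (u, v) pixel positions over T timesteps
    depths: (T,)  array of per-step depth values (meters)
    K:      (3, 3) camera intrinsics matrix

    Returns a (T, 3) array of 3D points -- a trace that no longer
    depends on image resolution or field of view.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = pixels[:, 0], pixels[:, 1]
    x = (u - cx) * depths / fx
    y = (v - cy) * depths / fy
    return np.stack([x, y, depths], axis=1)

def normalize_trace(trace):
    """Center a trace at its start and scale it to unit extent, so that
    motions from differently sized embodiments can be compared."""
    centered = trace - trace[0]
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / scale if scale > 0 else centered
```

The point of the normalization step is exactly the unification described above: once traces are camera-agnostic and scale-free, a human hand motion and a robot gripper motion performing the same task can land near each other in trace space.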
TraceGen
TraceGen is essentially proof that extracting 3D traces from in-the-wild data actually works for training a world model. It shows large gains in reliability over preexisting methods, and, more importantly, it cuts inference time dramatically. A model trained on data better suited for robot learning, with gains in both reliability and inference speed, is a real step forward for the field.
What’s Next?
TraceGen is a very exciting project that I am super happy I got to work on. But it isn't ready for use in the wild yet, and it has yet to achieve zero-shot task generalization, which is one of my personal goals. So I'm going to keep working on this project, with the intention of making it usable in real-world applications.
For more technical details, check out the project website and the paper.