
On February 22, at the 2025 Global Developers Conference (GDC) in Shanghai, Wang Xiaogang, co-founder and chief scientist of SenseTime Technology and CEO of SenseTime Jueying, unveiled R-UniAD, which the company bills as the industry's first "end-to-end autonomous driving route based on collaborative interaction with a world model."

According to reports, the route uses a world model to generate an online, interactive simulation environment in which the end-to-end model is trained with reinforcement learning. It follows the same idea behind DeepSeek's widely watched technical innovations: moving from imitation learning to reinforcement learning, with the aim of achieving end-to-end autonomous driving that outperforms human drivers.
End-to-end autonomous driving essentially works by "imitating" human drivers as closely as possible, learning from massive amounts of high-quality human driving data. However, because high-quality scene data is scarce and the quality of driving data is uneven, it is hard for end-to-end intelligent driving solutions to even reach the ceiling of human driving ability. The need to collect tens of millions of high-quality clips from vehicles on the road has itself become a barrier of scale.
DeepSeek-R1, which has drawn wide attention, rests on a key innovation: large-scale reinforcement learning. Starting from a cold start on a small amount of high-quality data, the model goes through multi-stage reinforcement learning training. This substantially lowers the data-scale threshold for training large models while allowing scaling laws to keep holding, paving the way for larger and stronger models. More importantly, reinforcement learning can lead large models to develop long chain-of-thought capabilities, significantly improving their reasoning, and may even give them reasoning abilities that surpass those of humans.
Wang Xiaogang said that this reinforcement learning innovation can also be carried over to end-to-end autonomous driving.
Building on reinforcement learning, SenseTime Jueying proposed its "end-to-end technical route based on collaborative interaction with a world model," which proceeds in three stages. First, a large cloud-based end-to-end autonomous driving model is trained by imitation learning on cold-start data. Next, through reinforcement learning, this cloud model interacts collaboratively with the world model to keep improving its performance. Finally, the large cloud model is efficiently distilled into a small, high-performance end-to-end autonomous driving model that is deployed on the vehicle.
According to reports, the cornerstone of R-UniAD is a powerful world model that can generate high-fidelity scene data, stay consistent over long-horizon rollouts, and support online interaction. Compared with its competitors, SenseTime Jueying's advantage is that it has both large-scale computing infrastructure and a large cloud-based autonomous driving model.
Built on the UniAD end-to-end autonomous driving solution and the Kaiwu ("Enlightenment") world model, SenseTime Jueying's R-UniAD is intended to use reinforcement learning to accelerate a leap forward in intelligent driving. At the Shanghai Auto Show in April, SenseTime Jueying plans to launch the R-UniAD end-to-end autonomous driving solution and complete its deployment on a real vehicle. The company's mass-production end-to-end intelligent driving solution is expected to be delivered by the end of the year, and the Kaiwu world model will formally enter use for data production, as SenseTime Jueying seeks to take the lead on this new technical route.