In China's automotive industry, there is currently debate about the right technological path to automated driving. At professional conferences, in industry media, and on social networks, two AI models clash: the World-Action Model and the Vision-Language-Action Model.
VLA or WA – that is the question.
Top executives of leading Chinese companies are taking public positions and do not shy away from sharp words. Recently, He Xiaopeng, founder and CEO of the electric car manufacturer Xpeng, openly questioned the competition. He stated that he did not know of any Chinese manufacturer that had developed a genuine Vision-Language-Action Model (VLA) “instead of just a deformed version.” To the best of his knowledge, said He, who is known for his self-confidence, Xpeng is the only company in China to have achieved this. Although He did not name names, it was clear that he was primarily referring to rival Li Auto, which had previously announced the production readiness of its own VLA system.
At the end of August, Huawei also joined the debate. The technology company, which has developed into an influential supplier of driver assistance and autonomous systems in recent years, remains unwavering in its commitment to the World-Action Model (WA). Jin Yuzhi, head of Huawei’s Intelligent Automotive division, made it clear that his company would not follow the VLA trend. "Huawei will not take the VLA path. Huawei places more emphasis on WA, or World Action, which skips the language step," Jin emphasized in an interview. VLA attempts to convert video data into "linguistic tokens" using advanced language model technology and derive vehicle control commands from them. While this approach may seem clever and has helped some automakers make quick progress in assistance functions, Jin argued that it is not the key to true autonomy.
Huawei instead relies on a direct end-to-end model in which sensor data (whether visual impressions, sounds, or other signals) is converted directly into driving actions without a detour through language processing. Although this approach may seem particularly demanding at first glance, Jin Yuzhi is confident that it is the only way to enable fully autonomous driving.
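The structural difference between the two approaches can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration: the names `Action`, `vla_pipeline`, `wa_pipeline`, and the toy components are hypothetical, the single input value stands in for real sensor data, and none of it reflects how Huawei, Xpeng, or Li Auto actually implement their systems.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Action:
    steering: float  # illustrative unit; 0.0 means straight ahead
    throttle: float  # -1.0 (full brake) to 1.0 (full throttle)


SensorFrame = Sequence[float]  # stand-in for camera/LiDAR features


def vla_pipeline(
    frame: SensorFrame,
    describe: Callable[[SensorFrame], List[str]],
    decide: Callable[[List[str]], Action],
) -> Action:
    """VLA-style flow: sensor data -> language tokens -> action."""
    tokens = describe(frame)  # the intermediate "language step"
    return decide(tokens)


def wa_pipeline(
    frame: SensorFrame,
    policy: Callable[[SensorFrame], Action],
) -> Action:
    """WA-style flow: sensor data -> action, with no language step."""
    return policy(frame)


# Toy components. Assumption: frame[0] is the distance in metres
# to the nearest obstacle ahead.
def toy_describe(frame: SensorFrame) -> List[str]:
    return ["obstacle", "near"] if frame[0] < 10.0 else ["road", "clear"]


def toy_decide(tokens: List[str]) -> Action:
    # The decision only sees the tokens, not the raw distance.
    return Action(0.0, -1.0) if "near" in tokens else Action(0.0, 0.5)


def toy_policy(frame: SensorFrame) -> Action:
    # Continuous mapping from distance to throttle, clamped to [-1.0, 0.5].
    throttle = max(-1.0, min(0.5, (frame[0] - 10.0) / 10.0))
    return Action(0.0, throttle)


if __name__ == "__main__":
    frame = [5.0]
    print(vla_pipeline(frame, toy_describe, toy_decide))
    print(wa_pipeline(frame, toy_policy))
```

In this toy setup the language step reduces the measured distance to a handful of tokens before a decision is made, while the direct mapping keeps the continuous value. That is only a caricature of the spatial-perception argument made by WA proponents; real VLA models work with far richer token representations.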
Advantages of Language Models
The idea behind VLA is to use a large language model (LLM) for driving automation. Camera images and other sensor data are translated into descriptive language, which the AI system then reasons over to make the corresponding driving decisions. Several Chinese automakers, led by Xpeng and Li Auto, have made significant progress with this approach in recent months.
Li Auto integrated an initial "MindVLA" function into its production vehicles, and Xpeng announced that its new P7 model would receive a VLA-based system via a software update this fall. Observers speak of a potential "shortcut" to highly advanced driver assistance systems. By drawing on existing large language models and massive datasets, these companies were reportedly able to significantly enhance their autonomous driving functions in a short period of time.
Xpeng, for example, developed its own base model with 72 billion parameters, which is compressed through distillation so that a smaller version can run in its vehicles. Li Auto, on the other hand, pursues a hybrid approach: a small VLA model component operates in the vehicle, while a large "world model AI" simulates scenarios in the data center and continuously improves the system.
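The article does not say which distillation method Xpeng uses. As a rough illustration of the general idea, the following sketch shows the classic soft-target formulation, in which a small "student" model is trained to match the temperature-softened output distribution of a large "teacher" model. The function names, the temperature value, and the example logits are assumptions chosen for illustration.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: List[float],
    student_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # hypothetical outputs of the large model
    student = [1.0, 1.0, 0.0]   # hypothetical outputs of the small model
    print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, so minimizing it pushes the small model toward the behavior of the large one. In practice this term is usually combined with a task-specific loss.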
Critics like He Xiaopeng argue that Li Auto has merely "patched together" VLA and is using the buzzword without having a fully functional model on board. Huawei, on the other hand, strategically and unwaveringly pursues the traditional, sensor-based approach. The company's proprietary Autonomous Driving Solution (ADS) is already integrated into over one million vehicles, which together have completed more than four billion kilometers of assisted driving.
Building on the WA principle, Huawei has developed the World Engine, World Action (WEWA) architecture. It is used in the new ADS 4.0 platform and aims to enable highly precise autonomous driving through direct sensory world modeling. Huawei emphasizes that WA, without the intermediate language step, offers particular advantages in spatial perception. This is exactly the area where VLA shows weaknesses due to its abstract "language layer." Additionally, Huawei relies heavily on extensive sensor technology, such as multiple LiDAR sensors per vehicle, and on high-performance hardware to provide the WA model with environmental information that is as complete as possible in real time.
New Business Models
Huawei is willing to accept the initially higher costs, as safety reserves and robust performance over the entire vehicle lifecycle are the priorities, according to Jin Yuzhi. He is critical of the fact that some competitors initially offer their driver assistance for free, stating, "Nothing in the world is free." In his harsh judgment, such offers are either time-limited, cross-subsidized through the vehicle price, or simply underdeveloped, effectively using drivers as test pilots.
Date: 08.12.2025
Huawei pursues a different business model. Through regular OTA updates and improvements during the usage period, the systems are meant to keep learning. The customer pays for this service, but according to the company's spokesperson it ultimately provides greater safety and utility.
No Absolute Truths Yet
This occasionally heated controversy over "VLA versus WA" also has a cultural dimension. Advocates of the new VLA approach hail it as a technological breakthrough. Zhou Guang, head of the startup Yuanrong Qixing, confidently stated that the performance floor of the VLA model has now surpassed the performance ceiling of traditional end-to-end systems, thanks in part to features that characterize VLA, such as built-in reasoning chains and complex language understanding modules.
Industry veterans, however, take a calmer view of the excitement. A senior engineer from Horizon Robotics commented that, at their core, all current solutions, whether VLM extension, VLA, or Huawei’s world model, are merely different variations of the end-to-end learning approach.
One should not overestimate the new buzzwords. In fact, the entire industry is in an early "trial-and-error" phase, where different concepts are being tested. Absolute truths do not yet exist.
What Are the Implications of a Competition of Approaches?
Some experts even consider hybrid models conceivable, combining elements of both worlds. What is certain is that China’s automakers are at a crossroads. While companies like Xpeng and Li Auto are moving aggressively with VLA-supported AI, Huawei relies on its data-driven WA concept and years of investment in hardware.
The competition between approaches could shape the development of automated and autonomous driving technically, economically, and strategically. Whether one of the two paths emerges as clearly superior or a combination ultimately proves to be the best solution remains to be seen. (se)