Human-Machine Interaction in the Vehicle
The Windshield as a Display: System Architectures for AR HUDs
A guest article by
Mike Sun and Yeshvanth Venkatasubramanya* | Translated by AI
Reading time: 9 min
Augmented reality head-up displays (AR HUDs) are an essential component of the automotive future. Their implementation requires a high-precision, low-latency system architecture that seamlessly integrates a range of sophisticated technologies.
The windshield as a projection screen: implementation requires a high-precision, low-latency system architecture that seamlessly integrates a range of technologies.
(Image: Harman)
AR HUDs project context-sensitive digital content onto the windshield, directly into the driver's field of vision. This takes human-machine interaction (HMI) in the vehicle to a whole new level. An ideally designed AR HUD covers as much of the field of vision as possible. Only then can virtual information, such as lane guidance, hazard contours or traffic signs, be placed in exactly the right spot at the right time relative to the real environment.
In practice, most current AR HUD prototypes work with a horizontal field of view (FoV) of 10 to 15 degrees, combined with a significantly smaller vertical field of view. A horizontal FoV of around 12 degrees is currently regarded in development as a pragmatic compromise between customer benefit, optical complexity, installation space and economic feasibility.
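As a rough sanity check, the scene width covered by a given horizontal FoV can be computed with simple pinhole geometry. The sketch below assumes an illustrative virtual image distance of 7.5 m (not a figure from the article), and the function name is hypothetical:

```python
import math

def virtual_image_width(h_fov_deg: float, distance_m: float) -> float:
    """Width of the virtual image plane subtended by a horizontal FoV
    at a given virtual image distance (simple pinhole geometry)."""
    return 2.0 * distance_m * math.tan(math.radians(h_fov_deg) / 2.0)

# A 12-degree horizontal FoV at an assumed 7.5 m virtual image distance
# spans roughly 1.6 m of the scene -- enough for lane-level overlays.
print(round(virtual_image_width(12.0, 7.5), 2))
```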
The Technical Implementation
One of the biggest challenges is correctly overlaying HUD content onto the real environment. This requires high-precision tracking of the driver's head and seating position, which is particularly critical given how much drivers vary in stature and posture. Furthermore, both the pitch and the yaw of the vehicle must be factored into the calculation of the HUD content. Uneven road surfaces also cause constant vibrations, interfering with the stable positioning and exact registration of contact-analog AR elements.
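A minimal sketch of this registration step, assuming a simplified rotation model (pitch and yaw only, no roll) and an illustrative vehicle coordinate frame (x forward, y left, z up); all names are hypothetical and real systems use full 6-DoF transforms:

```python
import math

def world_to_hud(point_world, vehicle_pitch, vehicle_yaw, head_offset):
    """Rotate a world-frame point into the vehicle frame (compensating
    pitch and yaw), then shift by the tracked head offset so the overlay
    stays registered for the current eye position."""
    x, y, z = point_world
    # undo yaw (rotation about the vertical z axis)
    cy, sy = math.cos(-vehicle_yaw), math.sin(-vehicle_yaw)
    x, y = cy * x - sy * y, sy * x + cy * y
    # undo pitch (rotation about the lateral y axis)
    cp, sp = math.cos(-vehicle_pitch), math.sin(-vehicle_pitch)
    x, z = cp * x + sp * z, -sp * x + cp * z
    # shift into the eye frame of the tracked head position
    hx, hy, hz = head_offset
    return (x - hx, y - hy, z - hz)
```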
In order to generate bright images with excellent contrast and large virtual image distances, AR HUDs must use high-performance DLP/LCoS laser projectors or TFT display optics. The choice of image generation technology has a massive influence on the size of the field of view that can actually be displayed as well as on image uniformity.
The windshield, or alternatively a dedicated combiner lens, generates the image at a virtual distance of several meters in front of the vehicle. This ensures that the driver's eyes can remain focused on the road while simultaneously perceiving the digital overlays. However, significantly larger fields of view inevitably require larger, more complex combiners or even multiple projection channels.
To ensure that the projected image is perfectly visible from different seating positions, developers use large-eyebox optics and/or active eye tracking with dynamic image adjustment. However, enlarging the eyebox while maintaining the same field of view and consistently high resolution creates a major conflict between technical feasibility and economic cost when designing future HUD generations. This applies in particular to extremely large fields of view, such as panoramic displays spanning the entire windshield, which pose considerable optical and manufacturing challenges.
Limitations And Optical Problems
Imaging units require more power for larger and brighter images with higher resolution, which inevitably leads to greater heat generation. This thermal problem is being tackled in various ways: LEDs and lasers with higher efficiency as well as an optimized optical beam path reduce the basic heat generation, while adaptive brightness control together with dynamic contrast reduces the basic brightness required and thus the hardware power. The latest hardware generation from Harman has increased brightness by 50 percent and improved overall picture quality without increasing heat generation. At the same time, intelligent software ensures that the brightness is always adjusted to the exact level required.
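The adaptive brightness logic described above can be sketched as a simple mapping from ambient illuminance to a target display luminance. The lux and nit thresholds below are illustrative assumptions, not Harman's actual tuning:

```python
def target_luminance(ambient_lux: float,
                     min_nits: float = 1500.0,
                     max_nits: float = 15000.0) -> float:
    """Map ambient illuminance to a HUD luminance target: bright enough
    to hold contrast against the scene, but no brighter than needed,
    which caps projector power and therefore heat generation."""
    # Linear ramp between assumed night (~10 lux) and direct sun (~100k lux).
    frac = (ambient_lux - 10.0) / (100_000.0 - 10.0)
    frac = min(max(frac, 0.0), 1.0)
    return min_nits + frac * (max_nits - min_nits)
```

In a real system this mapping would also factor in dynamic contrast and scene content, but the principle is the same: drive the light source only as hard as the current conditions require.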
Wide-angle optics naturally increase distortion and require more powerful projectors and much larger combiner surfaces. This not only drives up cost and weight, but also makes optical calibration much more complex. The windshield itself is a limiting factor, too: its curvature, angle of inclination and special coatings severely restrict the practical field of view. In addition, direct sunlight, street lights or oncoming headlights change the lighting conditions, which can reduce contrast and cause optical distortion or so-called ghosting.
Double Images Are Avoided
As a solution to ghosting, wedge-shaped films made of polyvinyl butyral (PVB) are currently laminated into the laminated glass of the pane. These ensure that the reflections from the inside and outside of the pane are almost congruent, thus avoiding double images. However, this established solution only works within a relatively limited height range. For larger fields of vision, the performance of conventional PVB films is no longer sufficient, as the oblique light rays caused by the inclination and curvature of the windshield can no longer be comprehensively corrected.
Date: 08.12.2025
If the field of view is to be extended while maintaining appropriate eyebox dimensions, the optical complexity increases considerably and the light source must inevitably become brighter. The required luminous flux scales roughly with the virtual image area and with the eyebox size: an AR HUD with twice the linear field of view (four times the area) and twice the eyebox therefore needs on the order of eight times the projector output.
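Under the assumption that luminous flux grows with virtual image area and linearly with eyebox size, this scaling argument fits in a few lines. This is a back-of-the-envelope sketch, not an optical model:

```python
def relative_lumen_demand(fov_scale: float, eyebox_scale: float) -> float:
    """Required luminous flux relative to a baseline design, assuming
    flux grows with image area (fov_scale squared, since a linear FoV
    increase scales both width and height) and linearly with eyebox size."""
    return (fov_scale ** 2) * eyebox_scale

# Doubling the linear FoV quadruples the image area; doubling the
# eyebox on top of that yields roughly 8x the baseline flux.
print(relative_lumen_demand(2.0, 2.0))  # -> 8.0
```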
Our experience in development therefore clearly shows that the desired field of vision should be specified as early as possible in the development process. The specific application always serves as the basis for this: pure lane guidance, for example, requires significantly less field of view than full-surface panoramic navigation. Only on the basis of this definition should the optics and computing power be planned and the unavoidable compromises between field of view, resolution, brightness and eyebox size carefully weighed up.
Algorithms And Data Processing in Real Time
Image 1: The software of an AR-HUD has to combine and process data from different sensors. With Ready Vision, Harman achieves an overall latency of less than 50 milliseconds.
(Image: Harman)
The software of an AR HUD must combine and process data from a wide range of sensors in hard real time. Its main tasks include the calculation of a stable environmental geometry and the rendering of graphics with minimal latency and precise spatial registration. The total time required from sensor acquisition to actual projection must be so short that the digital overlays remain absolutely synchronized with the dynamically moving environment and human perception.
High-performance AR software, such as Harman's Ready Vision, enables an overall latency (motion-to-photon latency) of less than 50 milliseconds. In order to achieve such values, optimized sensor drivers, highly efficient real-time middleware and high-performance GPU/accelerator pipelines are absolutely essential. Such powerful AR software currently requires the following approximate system resources:
GPU: 40 GFLOPS
CPU: around 5,000 DMIPS
RAM: 500 MByte
ROM/Flash: 512 MByte
Frame rate: 60 FPS (frames per second)
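One way to reason about such a latency target is as a per-stage budget that must sum to less than the end-to-end limit. The stage breakdown and values below are illustrative assumptions, not measured Ready Vision figures:

```python
def motion_to_photon_ms(stage_latencies_ms: dict) -> float:
    """Sum per-stage latencies along the pipeline; the total must stay
    under the overall budget (the article cites < 50 ms end to end)."""
    return sum(stage_latencies_ms.values())

# Illustrative (assumed) stage budget for one 60 FPS pipeline:
budget = {
    "sensor_capture": 10.0,   # camera/IMU exposure and readout
    "fusion": 8.0,            # time-stamp alignment + state estimation
    "rendering": 16.7,        # one frame interval at 60 FPS
    "display_scanout": 10.0,  # projector refresh and optics
}
print(round(motion_to_photon_ms(budget), 1))
```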
The precise, spatially correct positioning of the information on the windshield requires complex sensor fusion. Data from GPS/RTK (real-time kinematic), inertial measurement units (IMU), vehicle odometry, mono and stereo cameras and lidar are merged here. The underlying fusion stack must be able to handle delayed or intermittent data, rapidly changing lighting conditions and highly dynamic traffic scenes without any problems. Deterministic middleware components for this real-time data fusion and precise time-stamp alignment are essential here.
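A core building block of such time-stamp alignment is interpolating each sensor stream to a common query time, so measurements that arrive with different delays can be compared on one time base. A deliberately minimal one-dimensional sketch (real stacks interpolate full poses and also handle extrapolation):

```python
def interpolate_pose(t_query, samples):
    """Linearly interpolate a 1-D signal (e.g. heading angle) given as
    (timestamp, value) pairs to a common query time."""
    samples = sorted(samples)
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t0 <= t_query <= t1:
            w = (t_query - t0) / (t1 - t0)
            return v0 + w * (v1 - v0)
    raise ValueError("query time outside sample window")

# A camera frame time-stamped at t=1.5 s can be aligned with IMU
# samples taken at t=1.0 s and t=2.0 s:
print(interpolate_pose(1.5, [(1.0, 0.0), (2.0, 10.0)]))  # -> 5.0
```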
In particular, safety-relevant displays of the AR HUD require real-time object recognition and tracking, continuous lane and road modeling and reliable traffic sign recognition. It must also be possible to assess traffic situations dynamically and evaluate them predictively (for example: "Is this pedestrian about to cross the road?"). Optimized deep learning models are used to meet the enormous requirements for data throughput and latency in the vehicle. These models are quantized and ideally run on dedicated neural processing units (NPUs) or GPUs. If the system detects excessive uncertainty in the data, a fail-safe strategy takes effect: the display is gradually reduced, for example by dropping the complex overlays and switching to purely symbolic warnings, so as not to mislead drivers.
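Such graceful degradation can be sketched as a simple confidence-gated mode selection; the thresholds and mode names below are illustrative assumptions, not the article's actual logic:

```python
def display_mode(tracking_confidence: float) -> str:
    """Graceful degradation: below assumed confidence thresholds the HUD
    drops from full contact-analog overlays to simple symbolic warnings,
    and finally hides AR content rather than risk a misleading overlay."""
    if tracking_confidence >= 0.9:
        return "full_ar_overlay"
    if tracking_confidence >= 0.6:
        return "symbolic_warnings"
    return "ar_hidden"
```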
Compensate for the Vehicle's Own Movements
The graphics rendering itself must not only compensate for the vehicle's own movement, but also take into account and correct for head and eye offsets of the person behind the wheel as well as optical characteristics of the display (such as lens distortions). Among other things, this is done using motion-compensated prediction models and temporal filtering in order to avoid jitter and visual discrepancies. At the same time, mechanical vibrations of the vehicle caused by bumps in the road must be compensated for purely by software (image stabilization) when projecting the graphics.
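A minimal sketch of motion-compensated prediction with temporal filtering: exponential smoothing damps vibration-induced jitter, while a constant-velocity extrapolation bridges the rendering latency so the overlay lands where the anchor will be when the photons appear. The smoothing factor is an assumed tuning parameter and the class is illustrative:

```python
class PredictiveFilter:
    """1-D exponential smoothing plus constant-velocity prediction."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # smoothing factor (assumed tuning value)
        self.pos = None
        self.vel = 0.0

    def update(self, measured_pos, dt, latency):
        """Feed one measurement; return the position to render at,
        predicted one latency interval into the future."""
        if self.pos is None:
            self.pos = measured_pos
        else:
            new_pos = self.pos + self.alpha * (measured_pos - self.pos)
            self.vel = (new_pos - self.pos) / dt
            self.pos = new_pos
        return self.pos + self.vel * latency
```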
As AR HUDs currently still have a limited viewing zone (eyebox), the projected display must also be continuously adapted to the current position of the driver's eyes by means of eye and head tracking. This is the only way to ensure that the HUD image remains perfectly visible to the person behind the wheel at all times, even if they change their seating position.
Depth Perception And Information Provision
In order for AR overlays to appear natural to the human eye, they must take complex depth cues such as occlusion, relative object size, motion parallax and vergence into account. Conventional HUDs with a fixed focus always place their virtual images at a single focal distance. Although this helps to avoid vergence conflicts, it massively restricts the depth realism of the display.
For realistic overlays, the HUD system must be able to recognize physical overlaps in real time and process them graphically. An illustrative example: If a pedestrian steps into the field of vision in front of a virtually displayed navigation arrow, the real person must visually obscure this digital arrow (occlusion) and not vice versa. This requires high-precision, AI-supported depth estimation or lidar sensor fusion with extremely low latency.
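At its core, this occlusion handling is a per-pixel depth comparison between the virtual content and the sensed scene. A deliberately simplified one-dimensional sketch (real renderers do this in a depth buffer on the GPU):

```python
def visible_overlay_pixels(overlay_depth, scene_depth):
    """Per-pixel occlusion test: an AR pixel is drawn only where its
    virtual depth is nearer than the sensed scene depth (e.g. from
    lidar or stereo), so a real pedestrian occludes the arrow behind it."""
    return [o <= s for o, s in zip(overlay_depth, scene_depth)]

# Arrow rendered at 20 m; a pedestrian at 8 m covers the middle pixels.
print(visible_overlay_pixels([20, 20, 20], [50, 8, 50]))  # -> [True, False, True]
```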
Many standard AR HUDs currently still use 2D overlays that are projected at a fixed distance but are at least aligned (conformal) with real road objects. This pragmatic approach is simple, fail-safe and ideal for displaying navigation arrows and visually highlighting sources of danger.
Current high-end solutions, such as volumetric or light field approaches, go one step further: they generate real three-dimensional images and take several directions of vision into account at the same time. As a result, they provide the eye with physically correct vergence and accommodation signals, which drastically improves the driver's spatial distance perception. However, these innovative technologies are currently still too complex, space-intensive and expensive for widespread series use in cars.
Principles of Information Design
To avoid cognitive overload for the driver, AR-HUDs should only display information that is helpful for the immediate driving decision. Examples include precise lane guidance or critical trigger thresholds for the braking system. The edges of the road or potential hazards on the road should only be visually highlighted in their outlines, without obscuring real objects. The size and contrast of the displayed elements must also be dynamically scalable and permanently adapt to the distance, the ambient light and the attention of the person behind the wheel.
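Distance-dependent scaling of overlay elements can be sketched as an inverse-distance law with clamping; the reference distance and clamp limits below are illustrative assumptions, not values from the article:

```python
def element_scale(distance_m: float, ref_distance_m: float = 20.0,
                  min_scale: float = 0.5, max_scale: float = 2.0) -> float:
    """Scale an overlay element inversely with distance so it matches the
    apparent size of the real object it annotates; clamped so icons never
    become unreadable or visually dominant."""
    scale = ref_distance_m / max(distance_m, 1e-3)
    return min(max(scale, min_scale), max_scale)
```

A production system would additionally modulate contrast by ambient light and driver attention, as the article notes, but the clamped inverse-distance law captures the basic size adaptation.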
Market Dynamics And Opportunities for Developers
Major automotive suppliers around the world are investing heavily in the development of AR HUDs. The integration of artificial intelligence (AI) in AR HUD systems in particular is driving the market forward at a rapid pace. AI-powered HUD systems use advanced sensors and cameras to identify objects and potential hazards in real time and project relevant information onto the windshield in a targeted manner. This completely new generation of HUDs will significantly improve user comfort and safety and measurably reduce the number of potential accidents.
At Harman, we currently assume that the Asia-Pacific region will achieve the largest market share worldwide for AR HUDs by 2030, at around 50 percent. This is because several Asian car manufacturers are currently integrating specific safety measures into their next generation of vehicles, consistently utilizing the latest advances in head-up display technology. The European market will also grow rapidly—largely fueled by advanced technologies such as modern light-emitting diodes (LEDs) and liquid crystal displays (LCDs), which enable highly transparent and vivid images on windshields.
This expected market growth opens up enormous opportunities for developers in the fields of automotive software, computer-aided visualization and real-time graphics rendering. They are also helped by cross-manufacturer development structures, which enable the rapid implementation of new functions. These include reusable core components that are completely independent of hardware and navigation systems as well as intuitive visualization tools.
As an integral part of its Ready Vision product family, Harman provides advanced AR-HUD technology. Thanks to its innovative system architecture and precise sensor fusion, it significantly increases driving comfort and safety. With this modular solution, a fully comprehensive, production-ready implementation can be realized together with automotive customers within just twelve months. Harman is thus decisively advancing the next development stage of human-machine interaction (HMI) in the vehicle. (heh)
Mike Sun (Senior Product Manager Ready Vision) and Yeshvanth Venkatasubramanya (System Software Architecture Engineer) work in the Intelligent Cockpit division at Harman.