Human-Machine Interaction in the Vehicle
The Windshield as a Display: System Architectures for AR HUDs

A guest article by Mike Sun and Yeshvanth Venkatasubramanya*

Augmented reality head-up displays (AR HUDs) are an essential component of the automotive future. Their implementation requires a high-precision, low-latency system architecture that seamlessly integrates several sophisticated technologies.

The windshield as a projection screen: implementation requires a high-precision, low-latency system architecture that seamlessly integrates different technologies. (Image: Harman)

AR HUDs project context-sensitive digital content onto the windshield, directly into the driver's field of vision. This takes human-machine interaction (HMI) in the vehicle to a whole new level. Ideally designed, an AR HUD covers as much of the field of view as possible. Only then can virtual information, such as lane guidance, hazard contours or traffic signs, be placed in exactly the right spot at the right time relative to the real environment.

In practice, most current AR HUD prototypes work with a horizontal field of view (FoV) of 10 to 15 degrees, combined with a significantly smaller vertical field of view. A horizontal FoV of around 12 degrees is currently regarded in development as a pragmatic compromise between customer benefit, optical installation space complexity and economic feasibility.

The Technical Implementation

One of the biggest challenges is correctly overlaying HUD content on the real environment. This requires high-precision tracking of the driver's head and seating position, an aspect that is particularly critical given how much the person behind the wheel can vary. Furthermore, both the pitch and the yaw of the vehicle must be factored into the calculation of the HUD content. Uneven road surfaces also cause constant vibrations, interfering with the stable positioning and exact registration of contact-analog AR elements.
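To make the registration problem concrete, the following sketch projects a world-space hazard point onto the HUD's virtual image plane for a given head position, after correcting for vehicle pitch and yaw. This is an illustrative simplification under assumed conventions (x forward, y lateral, z up; a flat virtual image plane), not Harman's actual pipeline, which would use fully calibrated warp maps.

```python
import numpy as np

def project_to_hud(point_world, head_pos, vehicle_pitch_rad, vehicle_yaw_rad,
                   virtual_image_dist=10.0):
    """Map a world point into HUD image-plane coordinates for a given
    head position, compensating vehicle pitch and yaw (simplified model)."""
    # Rotate the world point into the vehicle frame (undo yaw, then pitch)
    cy, sy = np.cos(-vehicle_yaw_rad), np.sin(-vehicle_yaw_rad)
    cp, sp = np.cos(-vehicle_pitch_rad), np.sin(-vehicle_pitch_rad)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    p = Ry @ Rz @ np.asarray(point_world, dtype=float)
    # Cast a ray from the eye through the point and intersect it with the
    # virtual image plane located virtual_image_dist ahead of the vehicle
    d = p - head_pos
    t = (virtual_image_dist - head_pos[0]) / d[0]
    hit = head_pos + t * d
    return hit[1], hit[2]   # lateral and vertical position on the image plane
```

Because the eye position enters the ray construction, the same world point lands on a different image-plane spot when the head moves, which is exactly why continuous head tracking is needed.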

To generate bright images with excellent contrast and large virtual image distances, AR HUDs must use high-performance picture generation units based on DLP, LCoS, laser projection or TFT displays. The choice of imaging technology has a massive influence on the size of the field of view that can actually be displayed as well as on image uniformity.

The windshield, or alternatively a dedicated combiner optic, forms the image at a virtual distance of several meters in front of the vehicle. This ensures that the driver's eyes can remain focused on the road while simultaneously perceiving the digital overlays. However, significantly larger fields of view inevitably require larger, more complex combiners or even multiple projection channels.

To ensure that the projected image is perfectly visible from different seating positions, developers use large-eyebox optics and/or active eye tracking with dynamic image adjustment. However, enlarging the eyebox while maintaining the same field of view and consistently high resolution creates a major conflict between technical feasibility and economic cost when designing future HUD generations. This applies in particular to extremely large fields of view, as with panoramic displays spanning the entire windshield. Such displays pose considerable optical and manufacturing challenges.

Limitations And Optical Problems

Imaging units require more power for larger and brighter images with higher resolution, which inevitably leads to greater heat generation. This thermal problem is tackled in several ways: more efficient LEDs and lasers together with an optimized optical beam path reduce baseline heat generation, while adaptive brightness control combined with dynamic contrast lowers the required baseline brightness and thus the hardware power draw. The latest hardware generation from Harman has increased brightness by 50 percent and improved overall picture quality without increasing heat generation. At the same time, intelligent software ensures that brightness is always adjusted to exactly the level required.
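The core idea of adaptive brightness control can be sketched in a few lines: drive the imager only as bright as needed to hold a target contrast against the ambient scene, clamped to the hardware maximum. The function name, the contrast ratio and the luminance limits below are illustrative assumptions, not Harman's actual parameters.

```python
def required_luminance(ambient_cd_m2, contrast_ratio=3.0, max_cd_m2=15000.0):
    """Return the display luminance (cd/m^2) needed to keep the overlay
    at a target contrast ratio above the ambient scene luminance,
    clamped to the hardware maximum. All constants are illustrative."""
    return min(contrast_ratio * ambient_cd_m2, max_cd_m2)

# At night (ambient ~1 cd/m^2) only a few cd/m^2 are needed, so the imager
# runs cool; against a sunlit road the hardware limit becomes the ceiling.
```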

Wide-angle optics naturally increase distortion and require more powerful projectors and much larger combiner surfaces. This not only drives up cost and weight but also makes optical calibration much more complex. The windshield itself is also a limiting factor: its curvature, angle of inclination and special coatings severely restrict the practical field of view. In addition, direct sunlight, streetlights or oncoming headlights change the lighting conditions, which can reduce contrast and cause optical distortion or so-called ghosting.

How Double Images Are Avoided

As a solution to ghosting, wedge-shaped films made of polyvinyl butyral (PVB) are currently laminated into the windshield's laminated glass. These ensure that the reflections from the inner and outer glass surfaces are almost congruent, thus avoiding double images. However, this established solution only works within a relatively limited vertical range. For larger fields of view, the performance of conventional PVB films is no longer sufficient, as the oblique light rays caused by the inclination and curvature of the windshield can no longer be fully corrected.


If the field of view is to be extended while maintaining appropriate eyebox dimensions, optical complexity increases considerably and the light source must inevitably become brighter. Doubling the eyebox or the field of view leads almost proportionally to a doubling of the required lumen output. An AR HUD with twice the field of view (four times the image area) and twice the eyebox therefore requires a disproportionately more powerful projector.
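This scaling argument can be made explicit with a first-order étendue model: assume the required luminous flux grows roughly with virtual-image area times eyebox area. The model and the function below are a back-of-envelope assumption for illustration, not a photometric calculation.

```python
def lumen_scale(image_area_scale=1.0, eyebox_area_scale=1.0):
    """First-order estimate: required lumens grow with the product of
    virtual-image area and eyebox area (simplified etendue argument)."""
    return image_area_scale * eyebox_area_scale

# Doubling the horizontal and vertical FoV quadruples the image area (2x2);
# combined with a doubled eyebox area, the projector needs about 8x the lumens.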

Our development experience therefore clearly shows that the desired field of view should be specified as early as possible in the development process. The specific application always serves as the basis: pure lane guidance, for example, requires a significantly smaller field of view than full-surface panoramic navigation. Only on the basis of this definition should optics and computing power be planned, and the unavoidable compromises between field of view, resolution, brightness and eyebox size carefully weighed.

Algorithms And Data Processing in Real Time

Image 1: The software of an AR HUD has to combine and process data from different sensors. With Ready Vision, Harman achieves an overall latency of less than 50 milliseconds. (Image: Harman)

The software of an AR HUD must combine and process data from a wide range of sensors in hard real time. Its main tasks include the calculation of a stable environmental geometry and the rendering of graphics with minimal latency and precise spatial registration. The total time required from sensor acquisition to actual projection must be so short that the digital overlays remain absolutely synchronized with the dynamically moving environment and human perception.

High-performance AR software, such as Harman's Ready Vision, enables an overall latency (motion-to-photon latency) of less than 50 milliseconds. In order to achieve such values, optimized sensor drivers, highly efficient real-time middleware and high-performance GPU/accelerator pipelines are absolutely essential. Such powerful AR software currently requires the following approximate system resources:

  • GPU: 40 GFLOPS
  • CPU: around 5,000 DMIPS
  • RAM: 500 MByte
  • Storage: 512 MByte ROM/Flash
  • Frame rate: 60 FPS (frames per second)
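A motion-to-photon target like this is usually managed as a per-stage latency budget whose stages must sum to less than the limit. The 50 ms target comes from the article; the stage split below is a hypothetical illustration, not Harman's actual pipeline breakdown.

```python
# Hypothetical per-stage budgets in milliseconds; only the 50 ms
# motion-to-photon target is from the article, the split is assumed.
PIPELINE_BUDGET_MS = {
    "sensor_capture":   10,
    "sensor_fusion":     8,
    "perception":       12,
    "render":           10,
    "warp_and_scanout":  8,
}

def motion_to_photon_ms(budgets):
    """Total pipeline latency is the sum of all stage budgets."""
    return sum(budgets.values())

# Any stage that overruns its slice eats into the others, so budgets like
# these are typically enforced per stage during integration testing.
```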

The precise, spatially correct positioning of information on the windshield requires complex sensor fusion. Data from GPS/RTK (Real-Time Kinematic), inertial measurement units (IMUs), vehicle odometry, mono and stereo cameras and lidar are merged here. The underlying fusion stack must robustly handle delayed or intermittent data, rapidly changing lighting conditions and highly dynamic traffic scenes. Deterministic middleware components for this real-time data fusion and exact alignment of the time stamps are essential.
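One small but essential piece of such a fusion stack is timestamp alignment: sensors sample at different rates, so a camera frame's timestamp must be matched against bracketing IMU samples. The sketch below shows the simplest variant, linear interpolation between two samples; production middleware additionally handles extrapolation, dropouts and clock-domain offsets. The function name is illustrative.

```python
def interpolate_pose(t, samples):
    """Align a query timestamp t with a sorted list of (timestamp, value)
    sensor samples by linear interpolation between the two that bracket it."""
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)           # fractional position in [0, 1]
            return v0 + w * (v1 - v0)
    raise ValueError("timestamp outside the available sample window")
```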

In particular, safety-relevant AR-HUD displays require real-time object recognition and tracking, continuous lane and road modeling, and reliable traffic sign recognition. The system must also be able to assess traffic situations dynamically and evaluate them predictively (for example: "Is this pedestrian about to cross the road?"). Optimized deep learning models are used to meet the enormous in-vehicle requirements for data throughput and latency. These are quantized and ideally run on dedicated neural processing units (NPUs) or GPUs. If the system detects excessive uncertainty in the input data, a fail-safe strategy takes effect: the display is gradually reduced, for example by removing complex overlays and switching to purely symbolic warnings, so as not to mislead drivers.
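The graded fail-safe strategy described above can be sketched as a simple state selection on perception confidence. The mode names and thresholds below are illustrative assumptions; a real system would hysterese the transitions to avoid flickering between modes.

```python
from enum import Enum

class HudMode(Enum):
    FULL_AR  = 1   # contact-analog overlays registered to the scene
    SYMBOLIC = 2   # fixed, non-registered warning symbols only
    OFF      = 3   # no augmentation rather than a misleading one

def select_mode(confidence):
    """Step the display down as perception confidence drops.
    Thresholds are illustrative, not production values."""
    if confidence >= 0.8:
        return HudMode.FULL_AR
    if confidence >= 0.4:
        return HudMode.SYMBOLIC
    return HudMode.OFF
```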

Compensate for the Vehicle's Own Movements

The graphics rendering itself must not only compensate for the vehicle's own movement, but also take into account and correct for head and eye offsets of the person behind the wheel as well as optical characteristics of the display (such as lens distortions). Among other things, this is done using motion-compensated prediction models and temporal filtering in order to avoid jitter and visual discrepancies. At the same time, mechanical vibrations of the vehicle caused by bumps in the road must be compensated for purely by software (image stabilization) when projecting the graphics.
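A common building block for this kind of motion-compensated prediction with temporal filtering is an alpha-beta filter: it smooths a noisy pose signal and extrapolates it one render latency ahead, so the graphics land where the head or vehicle will be when the photons actually leave the imager. The gains below are illustrative; real pipelines tune them per signal, and this is one standard technique rather than Harman's specific implementation.

```python
class AlphaBetaFilter:
    """Smooth a noisy scalar pose signal and predict it forward in time."""

    def __init__(self, alpha=0.5, beta=0.1, dt=1.0 / 60.0):
        self.x = 0.0            # filtered position estimate
        self.v = 0.0            # estimated velocity
        self.dt = dt            # sample interval (60 Hz assumed)
        self.alpha, self.beta = alpha, beta

    def update(self, measurement):
        predicted = self.x + self.v * self.dt
        residual = measurement - predicted
        self.x = predicted + self.alpha * residual     # correct position
        self.v += self.beta * residual / self.dt       # correct velocity
        return self.x

    def predict(self, latency_s):
        # Extrapolate to when the rendered frame becomes visible
        return self.x + self.v * latency_s
```

Filtering suppresses road-induced jitter, while the forward prediction hides the pipeline's motion-to-photon delay.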

As AR HUDs currently still have a limited eyebox, the projected display must always be adapted precisely to the current position of the driver's eyes by means of continuous eye and head tracking. This is the only way to ensure that the HUD image remains perfectly visible at all times, even if the person behind the wheel changes their sitting position.

Depth Perception And Information Provision

In order for AR overlays to appear natural to the human eye, they must take complex depth cues such as occlusion, relative object size, motion parallax and vergence into account. Conventional HUDs with a fixed focus always place their virtual images in a single focal length. Although this helps to avoid vergence conflicts, it massively restricts the depth realism of the display.

For realistic overlays, the HUD system must be able to recognize physical overlaps in real time and process them graphically. An illustrative example: If a pedestrian steps into the field of vision in front of a virtually displayed navigation arrow, the real person must visually obscure this digital arrow (occlusion) and not vice versa. This requires high-precision, AI-supported depth estimation or lidar sensor fusion with extremely low latency.
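Given per-pixel depth from that sensor fusion, the occlusion step itself reduces to a depth comparison: wherever the sensed scene is closer than the overlay's virtual depth, the overlay pixel is made transparent so the pedestrian hides the arrow rather than the other way around. The sketch below assumes depth maps in meters and RGBA overlays with alpha in the last channel; it is an illustration of the principle, not a production renderer.

```python
import numpy as np

def apply_occlusion(overlay_rgba, overlay_depth, scene_depth):
    """Zero the alpha of overlay pixels that lie behind real objects.
    overlay_rgba: HxWx4 image; overlay_depth, scene_depth: HxW in meters."""
    out = overlay_rgba.copy()
    hidden = scene_depth < overlay_depth   # real object in front of graphic
    out[hidden, 3] = 0.0                   # those overlay pixels vanish
    return out
```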

Many standard AR HUDs currently still use 2D overlays that are only projected at a fixed distance, but are at least aligned (compliant) with real road objects. This pragmatic approach is simple, fail-safe and ideal for displaying navigation arrows and visually highlighting sources of danger.

Current high-end solutions, such as volumetric or light-field approaches, go one step further: they generate true three-dimensional images and account for several viewing directions at the same time. As a result, they provide the eye with physically correct vergence and accommodation cues, which drastically improves the driver's perception of spatial distance. However, these innovative technologies are currently still too complex, space-intensive and expensive for widespread series use in cars.

Principles of Information Design

To avoid cognitive overload, AR HUDs should only display information that helps the immediate driving decision, for example precise lane guidance or critical trigger thresholds for the braking system. Road edges or potential hazards should only be highlighted by their outlines, without obscuring real objects. The size and contrast of the displayed elements must also be dynamically scalable, continuously adapting to distance, ambient light and the attention of the person behind the wheel.
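One half of that dynamic scaling, element size versus distance, can be sketched as an inverse-distance rule clamped to a legibility range: annotations shrink with the apparent size of the real object they track, but never below a readable minimum. All constants and the function name are illustrative assumptions.

```python
def overlay_scale(distance_m, base_size_px=64, ref_distance_m=20.0,
                  min_px=16, max_px=128):
    """Scale an overlay element inversely with the annotated object's
    distance, clamped so it stays legible. Constants are illustrative."""
    size = base_size_px * ref_distance_m / max(distance_m, 1.0)
    return int(min(max(size, min_px), max_px))
```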

Market Dynamics And Opportunities for Developers

Major automotive suppliers around the world are investing heavily in the development of AR HUDs. The integration of artificial intelligence (AI) in AR HUD systems in particular is driving the market forward at a rapid pace. AI-powered HUD systems use advanced sensors and cameras to identify objects and potential hazards in real time and project relevant information onto the windshield in a targeted manner. This completely new generation of HUDs will significantly improve user comfort and safety and measurably reduce the number of potential accidents.

At Harman, we currently assume that the Asia-Pacific region will achieve the largest market share worldwide for AR HUDs by 2030, at around 50 percent. This is because several Asian car manufacturers are currently integrating specific safety measures into their next generation of vehicles, consistently utilizing the latest advances in head-up display technology. The European market will also grow rapidly—largely fueled by advanced technologies such as modern light-emitting diodes (LEDs) and liquid crystal displays (LCDs), which enable highly transparent and vivid images on windshields.

This expected market growth opens up enormous opportunities for developers in the fields of automotive software, computer-aided visualization and real-time graphics rendering. They are also helped by cross-manufacturer development structures, which enable the rapid implementation of new functions. These include reusable core components that are completely independent of hardware and navigation systems as well as intuitive visualization tools.

As an integral part of its Ready Vision product family, Harman provides advanced AR-HUD technology. Thanks to its innovative system architecture and precise sensor fusion, it significantly increases driving comfort and safety. With this modular solution, a fully comprehensive, production-ready implementation can be realized together with automotive customers within just twelve months. Harman is thus decisively advancing the next development stage of human-machine interaction (HMI) in the vehicle. (heh)

Mike Sun (Senior Product Manager Ready Vision) and Yeshvanth Venkatasubramanya (System Software Architecture Engineer) work in the Intelligent Cockpit division at Harman.