In their guest contribution, Pedro Santos and Andre Stork describe an innovative 3D capture system that operates without traditional teaching. With dynamic path planning, it manages to capture objects fully automatically and in impressive detail.
The fully automatic 3D capture system operates without traditional teaching.
(Image: Fraunhofer IGD)
Robotic systems for 3D capturing of objects and components have been available for some time. A characteristic feature of these systems is that they require teaching. For one-off components (batch size 1), teaching the robot a capture path may not be economically viable; in the time teaching would take, the object could often be captured with a handheld scanner instead.
We present the first robotic system that fully automatically captures objects with the desired quality. Without any manual post-processing, the system generates highly accurate, true-color 3D models. The key to full automation lies in algorithms developed by us. These are used for the dynamic path planning of the robot's views from which the object is captured. The intelligent automatic view planning ensures complete coverage of the visible object surface in the desired target resolution of up to 10 micrometers. Due to its ease of use and high quality, the system is already in use in several museums for the fully automatic capture of cultural artifacts and can also be used for quality assurance.
Capture Techniques
Thanks to the availability of affordable hardware, virtually anyone with a smartphone today can capture their environment in three-dimensional digital form; it has become almost as simple as taking a photo. On the path to democratizing 3D capture, Microsoft's Kinect was certainly a key milestone, and its principles can be found in many smartphones today. Depending on the application, not only the geometry of the object but also its appearance is relevant, so conventional two-dimensional color images are captured in addition to depth images.
In industrial 3D capture and quality control, various approaches and systems are used depending on the requirements—these fall into the two major categories of contact and non-contact systems. Tactile coordinate measuring machines are primarily used for highly accurate point-by-point scanning of manufactured features. Non-contact systems, on the other hand, are more commonly used for large-scale 3D reconstruction of surfaces—a complete capture is achieved by assembling the partial surfaces recorded from different views. Thus, many captures with varying sensor positions and orientations must be made to fully capture the surface.
This is also necessary to perform 3D reconstruction and obtain a 3D model of the physical object. The capture process is relatively simple with hand-held 3D scanners, which are manually moved around the object. Depending on the shape and size, the user must move the scanner to many different positions and orientations to capture all visible surfaces or just the task-relevant areas—the data processing often occurs in real-time, allowing the user to watch as the digital image becomes increasingly complete. The completeness and quality of the result depend, among other factors, on the user's experience—as does repeatability. During the process, the user is "tied up" and unable to do anything else—unlike the approach with robotic-assisted methods.
Capture in Three Steps
In industrial quality assurance, there have been efforts to automate such capture processes, since relying on human experience and day-to-day consistency is not ideal. The first stage consists of combinations of 3D scanners and turntables, which rotate the object in front of the scanner so that a larger portion of the surface can be captured automatically from one perspective. A second stage combines the turntable with a scanner that can be moved linearly up or down automatically. This captures more of the surface, though a complete representation of the visible surface without repositioning the object is still not guaranteed. In the third stage, a robot moves the scanner to individual positions and orientations, recording from each of them to capture the object as completely as possible.
3D Capture Without Prior Teaching
Teaching the robot these positions and orientations is time-consuming. If only a few objects of a type need to be measured, the teaching may not be worthwhile. The real challenge lies in identifying the necessary positions and orientations in the first place: the goal is a complete reconstruction at the desired quality, while the number of positions and orientations is kept minimal so that the scanning process is as efficient and fast as possible.
This is where our solution comes in, enabling fully automatic 3D capture without prior teaching. After setting up the system, a calibration step is performed. The user then only needs to enter the height of the area to be captured—the diameter is predefined by the turntable by default but can be overridden by the user. Once the object is placed, the process starts "with a single click" and then runs fully automatically. During this time, the operator can attend to other tasks—a visualization provides updates on the progress or completion of the process. By default, we use photogrammetry, and while the generation of the high-resolution final 3D model is underway, the next object can already be captured if necessary.
The Solution: Individual and Dynamic Algorithms
Hardware-wise, our solution essentially consists of a robotic arm, a turntable, and a scanning system—which is not uncommon at first glance. The hardware stands out for its flexibility and a design that is as simple as possible, allowing even non-experts to set it up effortlessly. The key lies in the intelligent algorithms that control the process individually and dynamically. This is done based on the information already captured. The algorithm adjusts the subsequent steps to achieve maximum coverage of the visible surface with a minimal number of camera positions and orientations, all within the predefined target resolution.
Teaching the robot is therefore not required. The data clean-up often necessary with handheld scanners is performed automatically. The result is a 3D model that can be used for quality assurance inspections; the true-color 3D model can also serve other purposes.
Camera and Ring Light: The Hardware Components
We differentiate the hardware components into capture and positioning devices. Both are synchronized and controlled by our algorithms, which run on a standard PC. For capture, we combine a high-resolution photo camera with a custom-made ring light and optional background lighting. The camera is a Phase One IXH with 150 megapixels. The ring light features a D50 spectrum, which is ideal for true-color capture. It allows polarization filters to be attached to better capture specular surfaces.
For positioning, a lightweight robotic arm, such as the Universal Robots UR10 or UR20, holds the camera. It is combined with a turntable on which the object is placed. This allows the object to be captured from all sides while the camera's movements remain restricted to one side of the turntable: the robotic arm never needs to "reach over the object" to capture it from the opposite side, which enables the use of smaller, more affordable robots with shorter reach. Cavities are captured, as far as they are visible from the outside, since the robotic arm can move the camera to any position and orientation.
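To picture why one-sided arm motion suffices: rotating the object by an angle is equivalent to rotating the camera by the negative angle about the turntable axis. A minimal numpy sketch of this pose composition (illustrative only, not the system's actual code):

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix for a turntable spin of theta radians about z."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def camera_pose_in_object_frame(R_cam, t_cam, table_angle):
    """Express a fixed-side camera pose in the rotating object's frame.

    Rotating the object by +theta equals rotating the camera by -theta
    about the turntable axis, so one-sided arm motion plus table
    rotation yields all-around coverage.
    """
    R_table = rot_z(table_angle)
    return R_table.T @ R_cam, R_table.T @ t_cam

# Example: a camera 1 m in front of the axis; a half-turn of the table
# makes it appear behind the object without moving the arm.
R_cam, t_cam = np.eye(3), np.array([1.0, 0.0, 0.3])
R_rel, t_rel = camera_pose_in_object_frame(R_cam, t_cam, np.pi)
print(t_rel)  # -> approx [-1.  0.  0.3]
```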
Since the object to be captured must not be damaged under any circumstances, we take precautions at the hardware level: fail-safe brakes, held open while powered, lock the joints of the robotic arm in the event of a power failure. The software ensures that no collisions occur during the scanning process. This not only protects the object but also ensures workplace safety.
Image 1: The fully automatic 3D capture system in the compact version.
(Image: Fraunhofer IGD)
The 3D capture system is available in two variants:
a lightweight, compact desktop version that supports a payload of up to 100 kilograms and offers a digitization volume of approximately 80 centimeters (approx. 30 inches) in height and 60 centimeters (approx. 20 inches) in diameter, as well as
a heavier, foldable version equipped with a freely positionable floor turntable, a payload of up to 1,000 kilograms, a lift kit for the robotic arm, and a "mushroom" for the turntable. This configuration allows objects ranging from the size of a screw to a vehicle axle to be captured within a digitization volume of approximately 230 centimeters (approx. 90 inches) in height and 130 centimeters (approx. 50 inches) in diameter.
Clamping devices can be attached to the turntable to secure objects that cannot stand on their own. Both variants are mobile and can be quickly set up at new locations, and other camera-lens combinations are also possible. During development, all relevant hardware components were modeled, simulated, and integrated into a unified virtual 3D environment, which is now part of the complete system's software.
The Software Components for View Calculation
The key to the automated process is a dynamic approach to view planning for the camera. This ensures an optimal number of positions and orientations (poses) to capture the entire visible surface at a predefined target resolution. All requirements of the photogrammetric approach are considered, such as 70 percent overlap of adjacent images for sufficient feature matching. A consistent distance between the object’s surface and the camera’s focal plane is also maintained.
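To illustrate the overlap requirement: for a roughly rectangular image footprint and a sideways step between adjacent shots, the shared area shrinks linearly with the step. A small sketch with assumed values (the real planner works on the curved, reconstructed surface, so this is a simplification):

```python
def footprint_overlap(step, footprint_width):
    """Fraction of image area shared by two adjacent shots offset by
    `step` along one axis (rectangular footprint assumed)."""
    return max(0.0, 1.0 - step / footprint_width)

def max_step_for_overlap(footprint_width, min_overlap=0.70):
    """Largest camera step that still keeps `min_overlap`."""
    return footprint_width * (1.0 - min_overlap)

# Example: a 40 mm wide footprint allows at most 12 mm between shots
# to preserve the 70 percent overlap needed for feature matching.
print(max_step_for_overlap(40.0))  # -> 12.0
```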
During the process, the image sharpness is continuously analyzed. Only the image regions that are within the camera's depth of field are used for the 3D reconstruction. This information is continuously incorporated into the view calculation process.
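A common proxy for local sharpness, and a plausible stand-in for the analysis described here, is the variance of the Laplacian per image tile; tiles below a threshold would be excluded from reconstruction. A sketch using OpenCV, with illustrative tile size and threshold:

```python
import cv2
import numpy as np

def sharp_tile_mask(gray, tile=256, threshold=100.0):
    """Return a boolean grid marking tiles considered in focus.

    Sharpness proxy: variance of the Laplacian response per tile.
    `tile` and `threshold` are illustrative, not the system's values.
    """
    h, w = gray.shape
    rows, cols = h // tile, w // tile
    mask = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            patch = gray[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            mask[r, c] = cv2.Laplacian(patch, cv2.CV_64F).var() > threshold
    return mask

# Usage: gray = cv2.imread("shot_0001.tif", cv2.IMREAD_GRAYSCALE)
#        mask = sharp_tile_mask(gray)  # True = usable for reconstruction
```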
Image 2: Visualization of the dynamic view planning.
(Image: Fraunhofer IGD)
Our software visualizes the current robot pose and the corresponding camera angle for the user. It processes the robot's sensor data in real time to create a virtual 3D representation of the real situation that is easy to understand. Additionally, the software displays the next calculated (planned) views in the virtual 3D scene as a green, semi-transparent overlay. Interim reconstruction results are displayed within the previously defined and also visualized safety cylinder around the object. They provide a preview of the resulting 3D model. The display is continuously updated.
Not only must the positions and orientations for image capture be calculated, but also the robot's paths between them. For this purpose, forward and inverse kinematics techniques have been implemented. These calculate robot trajectories that ensure the fastest yet safest, i.e. collision-free, movement between consecutive poses. Camera poses that would result in a collision are discarded or adjusted, and all robot components are covered by collision detection.
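For illustration, here are forward kinematics and a damped least-squares inverse kinematics iteration for a planar two-link arm; the system solves the same problem for a six-axis arm with collision checks, and the link lengths below are arbitrary:

```python
import numpy as np

L1, L2 = 0.6, 0.5  # link lengths in meters (illustrative)

def fk(q):
    """Forward kinematics: joint angles -> end-effector position."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    """Analytic Jacobian of fk for the two-link arm."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def ik(target, q=np.array([0.3, 0.3]), damping=0.05, iters=200):
    """Damped least-squares IK: iterate q until fk(q) reaches target."""
    for _ in range(iters):
        e = target - fk(q)
        if np.linalg.norm(e) < 1e-6:
            break
        J = jacobian(q)
        # (J^T J + lambda^2 I)^-1 J^T e  -- damped pseudo-inverse step
        dq = np.linalg.solve(J.T @ J + damping**2 * np.eye(2), J.T @ e)
        q = q + dq
    return q

print(fk(ik(np.array([0.8, 0.4]))))  # -> approx [0.8 0.4]
```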
Initial Calibration
After setting up the system, an automatic self-calibration is performed, which includes the following three steps:
Calibration of the camera intrinsics
Robot arm-sensor calibration
Calibration of the turntable
This geometric calibration is necessary to ensure the precision of the 3D results.
First, the camera intrinsics are determined to define the actual field of view and correct lens distortions. Next, the arm-sensor transformation between the optical center of the camera (the sensor) and the tool frame of the robot (the arm) is established; together with the robot's forward kinematics, this yields the optical center of the camera relative to the base of the robot arm. Finally, the calibration of the turntable defines its position in space and the rotation axis of the scan volume. Once the calibration target is placed on the turntable and the process is started with a mouse click, all calibration data is determined automatically.
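The three steps map onto standard routines, for example in OpenCV; a hedged sketch of how such a pipeline could look (data collection and the turntable-axis fit are only outlined, and the system's actual implementation is not public):

```python
import cv2

# 1) Camera intrinsics from several views of a known calibration target.
#    obj_pts / img_pts: per-view 3D target points and 2D detections.
def calibrate_intrinsics(obj_pts, img_pts, image_size):
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, image_size, None, None)
    return K, dist

# 2) Hand-eye (arm-sensor) calibration: camera pose relative to the
#    robot tool frame, from paired robot and target poses. OpenCV calls
#    the tool frame "gripper".
def calibrate_hand_eye(R_tool2base, t_tool2base, R_target2cam, t_target2cam):
    R_cam2tool, t_cam2tool = cv2.calibrateHandEye(
        R_tool2base, t_tool2base, R_target2cam, t_target2cam)
    return R_cam2tool, t_cam2tool

# 3) Turntable axis: observe a fixed target point at several table
#    angles and fit a circle to its trajectory; the circle's center and
#    normal give the rotation axis. (Outline only; the fit is omitted.)
```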
Subsequently, the color properties of the camera are determined by placing a color target, such as the X-Rite Color Checker SG for standard setups or the Rez Checker Target for macro setups. The user is informed at all times about the required actions and the progress of the calibration via the user interface.
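Color calibration essentially fits a transform from the camera's measured patch values to the chart's known reference values. A minimal linear least-squares version (real profiling may use higher-order models):

```python
import numpy as np

def fit_color_matrix(measured, reference):
    """Fit a 3x3 matrix M minimizing ||measured @ M - reference||.

    measured, reference: (N, 3) linear-RGB values of the N chart
    patches (e.g. the 140 patches of an X-Rite ColorChecker SG).
    """
    M, *_ = np.linalg.lstsq(measured, reference, rcond=None)
    return M

def correct(image_rgb, M):
    """Apply the fitted matrix to an (H, W, 3) linear-RGB image in [0, 1]."""
    return np.clip(image_rgb.reshape(-1, 3) @ M, 0, 1).reshape(image_rgb.shape)
```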
Image Capture and 3D Reconstruction
The 3D scanning station reconstructs 3D models using photogrammetry. The captured raw data consists of high-resolution images of the object. Structure-from-Motion and Multi-View Stereo are used to identify features and triangulate 3D information. The high quality of the final 3D model is achieved by calculating the robot's poses in such a way that all externally visible parts of the surface are fully captured. Typical achievable resolutions of the 3D model are in the range of 10–15 micrometers.
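At the core of this step is triangulation: the same feature observed in two calibrated views fixes a 3D point. A self-contained toy example with OpenCV, using synthetic cameras rather than the system's pipeline:

```python
import cv2
import numpy as np

K = np.array([[1000.0, 0.0, 500.0],   # toy intrinsics
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])

# Two camera projection matrices: identity pose, and a 0.2 m baseline.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

X = np.array([0.1, -0.05, 1.5, 1.0])  # ground-truth point (homogeneous)

def project(P, X):
    x = P @ X
    return (x[:2] / x[2]).reshape(2, 1)

# Recover the point from its two projections.
X_h = cv2.triangulatePoints(P1, P2, project(P1, X), project(P2, X))
print((X_h[:3] / X_h[3]).ravel())  # -> approx [0.1 -0.05 1.5]
```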
For optimal resolution, fixed-focus macro lenses can be used, even for objects larger than the camera's measurement volume (defined by field of view and depth of field). Typically, only a portion of the captured object surface is sharply imaged per photo, so many images are needed to cover the entire surface in high resolution and sharp focus. For the user, this means deciding in advance on a compromise between scan time and target quality that best meets their 3D digitization goals.
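The measurement volume follows from standard depth-of-field optics. A sketch of the near and far focus limits in the thin-lens approximation, with illustrative macro-setup values; note how little depth stays sharp at this scale, which is why so many images are required:

```python
def depth_of_field(f, N, c, s):
    """Near/far limits of acceptable focus (thin-lens approximation).

    f: focal length, N: f-number, c: circle of confusion, s: focus
    distance (all in mm). The values below are illustrative only.
    """
    H = f * f / (N * c) + f            # hyperfocal distance
    near = s * (H - f) / (H + s - 2 * f)
    far = s * (H - f) / (H - s) if s < H else float("inf")
    return near, far

# Example: 120 mm macro lens at f/8, 5 micrometer circle of confusion,
# focused at 600 mm -> only about 1.6 mm of depth is in focus.
near, far = depth_of_field(120.0, 8.0, 0.005, 600.0)
print(round(far - near, 1), "mm of depth in focus")  # -> 1.6
```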
With a 150-megapixel Phase One IXH camera, we capture 1.2 images per second at 14-bit color depth and four channels, which equates to approximately 4,300 images per hour. For further processing, the images are transferred and stored via the camera's 10 gigabit per second Ethernet connection; the capture rate is primarily limited by this transmission speed. To keep the capture rate as high as possible, the poses calculated by the dynamic view planning for the current scan step are continuously sorted by proximity to one another, and transitions from one pose to the next follow the shortest path, taking the image transfer time from camera to PC into account.
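The described sorting resembles a greedy nearest-neighbor tour over the planned poses. A toy version (the real planner also weighs joint-space distance and image transfer time, which are omitted here):

```python
import numpy as np

def order_poses(positions):
    """Greedy nearest-neighbor ordering of camera positions.

    positions: (N, 3) array. This toy version minimizes straight-line
    hops only; the real planner considers more than Euclidean distance.
    """
    remaining = list(range(len(positions)))
    route = [remaining.pop(0)]          # start at the first pose
    while remaining:
        last = positions[route[-1]]
        nxt = min(remaining,
                  key=lambda i: np.linalg.norm(positions[i] - last))
        remaining.remove(nxt)
        route.append(nxt)
    return route

poses = np.random.default_rng(0).uniform(-1, 1, size=(8, 3))
print(order_poses(poses))  # visiting order, short hops first
```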
The duration of the 3D reconstruction of a color 3D model in full resolution using photogrammetry is relatively long compared to other 3D reconstruction methods. For complex objects, the 3D reconstruction can take several hours; however, the next object can already be captured during this time. During the scanning process, we calculate lower-resolution interim models in order to
a) make decisions for view planning based on these approximate 3D models and
b) inform the user about the progress and the current appearance of the 3D model.
Intelligent Dynamic View Planning Instead of Teaching
To free the user from determining all camera poses necessary to completely capture an object and then teaching a robot, we developed and implemented an intelligent dynamic view planning system. This means that the robot operates autonomously. Hence, the view planning also contributes to autonomous robotics.
The view planning calculates the smallest possible set of camera poses to fully capture all externally visible object surface parts. These are necessary to reconstruct a 3D model with the desired quality.
Image 3: Estimation of depth of field. (Green: optimally focused. Blue: far plane. Red: near plane.)
(Image: Fraunhofer IGD)
The view planning can be considered an optimization problem. It aims to maximize the overall quality of the model, minimize the number of captures, and also meet safety requirements during the process. Our method is incremental and implements a feedback process of planning, capture, and reconstruction, where interim reconstructions influence the subsequent planning steps. One challenge was to find a quality metric that could be determined on interim results during the scanning process and reliably estimate the quality of the final 3D model.
After the initial user inputs of diameter and height of the object/scan volume, an initial set of views is calculated. Image 3 shows a preliminary 3D reconstruction from the initial quick scan with 40 low-resolution images. The system automatically evaluates the density of the point cloud, identifies areas with low density and holes (highlighted in red), and those with sufficiently dense points (shown in blue). Less dense areas result from occlusions or surfaces aligned parallel to the camera's viewing direction.
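The density evaluation can be pictured as counting neighbors within a radius around each point of the interim cloud. A self-contained sketch with SciPy; the radius and threshold are placeholders:

```python
import numpy as np
from scipy.spatial import cKDTree

def low_density_mask(points, radius=0.005, min_neighbors=10):
    """Flag points whose neighborhood is too sparse.

    points: (N, 3) interim reconstruction; radius in the model's units.
    Sparse regions typically stem from occlusion or grazing viewing
    angles and become targets for the next planned views.
    """
    tree = cKDTree(points)
    counts = np.array([len(tree.query_ball_point(p, radius)) - 1
                       for p in points])  # -1: exclude the point itself
    return counts < min_neighbors

pts = np.random.default_rng(1).uniform(0, 0.1, size=(2000, 3))
print(low_density_mask(pts).sum(), "points flagged as under-sampled")
```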
Based on the approximate 3D reconstructions, the set of additional views to be targeted is planned during the process. For this purpose, the camera parameters from the calibration and rendering techniques are used to simulate the effects of the view candidates. Image 3 shows how the camera's depth of field is mapped onto the object. Candidates that maximize the area in focus for low-density regions are selected. The robot is then guided to perform the next scanning phase. Meanwhile, the 3D reconstruction is updated. This iterative process continues until the desired surface density is achieved.
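Reduced to its core, this candidate selection is a greedy cover problem: each view candidate covers some under-sampled patches, and the candidate with the largest uncovered gain is chosen next. A toy sketch (the real scoring also weighs depth of field, overlap, and reachability):

```python
def greedy_view_plan(candidates, needed):
    """Pick views until all needed surface patches are covered.

    candidates: dict mapping view id -> set of patch ids it covers.
    needed: set of patch ids still under-sampled.
    """
    plan, uncovered = [], set(needed)
    while uncovered:
        best = max(candidates, key=lambda v: len(candidates[v] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:            # remaining patches are not viewable
            break
        plan.append(best)
        uncovered -= gain
    return plan, uncovered

views = {"v1": {1, 2, 3}, "v2": {3, 4}, "v3": {4, 5, 6}, "v4": {2, 6}}
print(greedy_view_plan(views, {1, 2, 3, 4, 5, 6}))
# -> (['v1', 'v3'], set())  two views cover everything
```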
Empirically, we observed a strong correlation between the density determinations on the approximate 3D reconstructions and the surface quality of the resulting final 3D model. Furthermore, it is crucial that the captured images are sharp, meaning the distance from the camera to the object is precisely maintained by the robot—as we deliberately avoid using autofocus lenses in favor of image quality and sharpness.
The Final 3D Reconstruction
The digitization process concludes with the final 3D reconstruction of the object. The object can be captured at resolutions of up to 10 micrometers, which are reflected in the high-resolution 3D model. The final results can then be visualized and analyzed (see Image 4).
Image 4: Example visualization and analysis of the curvature behavior of the object's surface.
(Image: Fraunhofer IGD)
Summary and Outlook
The demand for economical and accurate 3D capture of objects and components is rapidly increasing, not only for quality assurance but also for interactive online visualizations and virtual reality. At the same time, the growing demand collides with the shortage of skilled professionals, making autonomous, fully automated systems the only viable way to meet the need for 3D capture in the medium term.
We have developed the first fully automated and true-color solution for robotic 3D capture and efficient processing of 3D data and images, achieving consistently high quality at a predefined target resolution without manual post-processing for the final 3D models. Along the way, "by-products" such as 3D web models, rendered videos, and 3D print models are created. The system is highly flexible and can be configured in various ways, making it a platform for future improvements and the integration of additional measurement technologies.
The combination of intelligent algorithms and autonomous robotics represents a significant contribution to the advancement of innovative digitization technologies. The solution presented here is being marketed by our spin-off: Verus Digital GmbH. Fraunhofer IGD remains the primary contact for further developments of the system.
Fraunhofer IGD has successfully transferred the underlying concepts of this solution to other application fields. For example, the first fully automated, robot-assisted decontamination system has been put into operation: during the decommissioning of the Biblis nuclear power plant, it removes surface layers from components using high-pressure water jets. RWE Nuclear GmbH operates this technology. As an independent institution, we are open to further adaptations.