Robot perception: German researchers are teaching robots to see

Source: DFKI Kaiserslautern | Translated by AI | 3 min reading time


What comes intuitively to us when learning a language, determining meaning regardless of the actual linguistic expression, is something robots are now supposed to master as well.

Robots do see their environment in some fashion, but they are far less flexible than humans at object recognition. Experts at DFKI now want to change this so that systems can correctly interpret objects in changing contexts.
(Image: Three-peas-in-a-pod.)

How can a machine learn to orient itself visually in our world? That is the question scientists at the German Research Center for Artificial Intelligence (DFKI) want to answer. Their answer, as they emphasize, is "MiKASA" (Multi Key Anchor Scene Aware Transformer for 3D Visual Grounding): it identifies complex spatial relationships and object features in three-dimensional space and interprets them semantically. Ultimately, it is meant to teach robots and machines to perceive the environment much as we do, so that they can understand it in a similar way.

Context is clearly what matters most

If we see a large, cube-shaped object in a kitchen, the DFKI experts explain, we naturally assume it is probably a dishwasher. If we recognize a similar shape in a bathroom, however, a washing machine is the more plausible guess. For us, meaning therefore depends on context, and this relationship is essential for a nuanced understanding of our surroundings, they continue. Through a so-called "scene-aware object recognizer," machines can now also draw conclusions from the surroundings of a reference object and thus recognize and classify the object in question more accurately.

Another challenge for software is understanding relative spatial relationships: "the chair in front of the blue monitor" implies a different viewpoint than "the chair behind the monitor," yet both phrases can describe one and the same object. So that a machine understands this, "MiKASA" works with a so-called "multi key anchor concept." It communicates the coordinates of anchor points in the field of view relative to the target object, as the researchers explain, and the system then weights the importance of nearby objects based on the text description.
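One way to picture the multi key anchor idea is a toy scorer: each candidate target is rated by nearby anchor objects whose class names occur in the text query, with anchor coordinates expressed relative to the candidate. Everything below (object format, weights, radius) is an illustrative assumption, not MiKASA's actual implementation:

```python
import math

def relative_offset(anchor_pos, target_pos):
    """Anchor coordinates expressed relative to the candidate target."""
    return tuple(a - t for a, t in zip(anchor_pos, target_pos))

def anchor_weight(anchor_label, query_tokens):
    """Crude stand-in for text-based importance: an anchor counts only
    if its class name is mentioned in the query."""
    return 1.0 if anchor_label in query_tokens else 0.0

def score_candidate(candidate, objects, query_tokens, radius=2.0):
    """Sum text-weighted proximity of anchors within `radius` metres."""
    score = 0.0
    for obj in objects:
        if obj is candidate:
            continue
        dx, dy, dz = relative_offset(obj["pos"], candidate["pos"])
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        if dist <= radius:
            # Closer, text-relevant anchors contribute more.
            score += anchor_weight(obj["label"], query_tokens) / (1.0 + dist)
    return score

def ground(objects, query):
    """Return the candidate object best matching the query."""
    tokens = set(query.lower().split())
    candidates = [o for o in objects if o["label"] in tokens]
    return max(candidates, key=lambda c: score_candidate(c, objects, tokens))

# Two chairs; only one of them sits next to the monitor named in the query.
scene = [
    {"label": "chair",   "pos": (0.0, 0.0, 0.0)},
    {"label": "chair",   "pos": (4.0, 0.0, 0.0)},
    {"label": "monitor", "pos": (0.5, 0.5, 0.0)},
]
target = ground(scene, "the chair in front of the monitor")
```

The toy version ignores viewing direction entirely; it only shows how anchoring candidates to text-relevant neighbours disambiguates between several objects of the same class.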

Robots recognize objects more accurately than ever before

Semantic cues can also help to locate an object: a chair is typically oriented toward a table or stands against a wall, the experts note, so the presence of a table or a wall indirectly defines the chair's orientation. By combining language models, learned semantics, and object recognition in real three-dimensional space, "MiKASA" achieves an accuracy of up to 78.6 percent on the Sr3D challenge, raising the hit rate for object detection by around 10 percent over the previous best technology in this area.
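The orientation cue described above could be sketched, under the simplifying assumption that a chair faces the nearest table, roughly like this (the data layout and function name are hypothetical, not MiKASA's API):

```python
import math

def infer_facing(chair_pos, objects):
    """Toy version of the semantic cue above: assume a chair is
    oriented toward the nearest table and return that direction as a
    2D unit vector. Purely illustrative."""
    tables = [o for o in objects if o["label"] == "table"]
    nearest = min(tables, key=lambda t: math.dist(t["pos"], chair_pos))
    dx = nearest["pos"][0] - chair_pos[0]
    dy = nearest["pos"][1] - chair_pos[1]
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm)

room = [
    {"label": "table", "pos": (2.0, 0.0)},
    {"label": "wall",  "pos": (0.0, 5.0)},
]
facing = infer_facing((0.0, 0.0), room)  # unit vector toward the table
```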
