How can a robot learn efficient perceptual representations of its body and of external objects given initially only low-level perceptual capabilities?
Segmenting the perceptual space into discrete object representations, such as one’s own body, humans, or manipulable objects, is a fundamental capability for a robot operating in any environment. Many approaches to these tasks have been developed in the computer vision community, but the developmental approach to robotics imposes specific constraints: the approach should be generic, applying to all potential “objects” present in the environment (e.g., parts of the robot’s body, human faces and bodies, manipulable objects), and it should perform on-line, incremental learning of new objects in an open-ended scenario. These constraints imply that such a system will probably be less efficient than algorithms tailored to specific tasks in specific environments; in return, it provides a generic visual learning capacity that can be applied to many novel robot tasks. A central topic is therefore the selection of visual representations that can be created efficiently and modified incrementally as new data is acquired.
In MACSi, our goal is to make progress toward an integrated approach to on-line visual learning and recognition on a robot. We will therefore reuse state-of-the-art computer vision techniques as much as possible, adapt them to the constraints of developmental robotics, and complement them with the possibilities offered by their implementation on a robot in a social context. We will notably develop new “proto-object” models, i.e., low-level image representations that extend existing “bag of visual words” approaches to new low-level features and to incremental on-line learning. We will then build on these proto-objects to learn higher-level concepts such as one’s own body, humans, or manipulable objects. Building these concepts will notably entail a guided exploration of the large sensory space provided by the proto-object model and a tight coupling with the motor representation learning capability (see Challenge 2).
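The core technical idea above, extending a “bag of visual words” representation to incremental on-line learning, can be illustrated with a minimal sketch. The class below is hypothetical (the actual MACSi proto-object models, feature types, and update rules are not specified in this text): it maintains a visual vocabulary as a codebook that is refined online with each new batch of local descriptors, and encodes an image as a normalized histogram of word assignments.

```python
import numpy as np

class OnlineVisualVocabulary:
    """Illustrative incremental bag-of-visual-words model.

    Local image descriptors (e.g., patch features) are quantized against
    a codebook of 'visual words' that is updated online with an
    online k-means step, rather than being trained once on a fixed
    dataset. This is a sketch under assumed design choices, not the
    MACSi implementation.
    """

    def __init__(self, n_words, dim, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.codebook = rng.normal(size=(n_words, dim))  # visual words
        self.lr = lr  # learning rate for the online update

    def partial_fit(self, descriptors):
        # Online k-means: move the nearest word toward each descriptor.
        for d in descriptors:
            i = np.argmin(np.linalg.norm(self.codebook - d, axis=1))
            self.codebook[i] += self.lr * (d - self.codebook[i])

    def encode(self, descriptors):
        # Represent an image as a normalized histogram of word assignments.
        dists = np.linalg.norm(
            self.codebook[None, :, :] - descriptors[:, None, :], axis=2)
        idx = dists.argmin(axis=1)
        hist = np.bincount(idx, minlength=len(self.codebook)).astype(float)
        return hist / hist.sum()

# Simulated open-ended stream: each batch stands in for the local
# descriptors extracted from one new image.
rng = np.random.default_rng(1)
vocab = OnlineVisualVocabulary(n_words=8, dim=16)
for _ in range(5):
    vocab.partial_fit(rng.normal(size=(50, 16)))
h = vocab.encode(rng.normal(size=(50, 16)))
print(h.shape, round(float(h.sum()), 6))
```

The resulting histograms could then serve as the fixed-size image signatures on which higher-level concepts (own body, humans, manipulable objects) are learned; the key property for the developmental setting is that `partial_fit` never requires revisiting past data.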
Challenge leader: ENSTA-ParisTech (Paris).
Last modified on 31/05/2011