
Keynote (I-RIM Workshop on Artificial Perception)


Stefano Melacci summarized the research activities on Lifelong Learning from Video Streams in his keynote at the Workshop on Artificial Perception: from the current state of the art in research and industry to the next frontiers, held at the 4th edition of the Italian Conference on Robotics and Intelligent Machines (I-RIM 2022) in Rome, Italy.

Full Title: Lifelong Learning from Video Streams

Abstract

The remarkable progress that deep convolutional neural networks have brought to object recognition in computer vision over the last few years is strongly connected with the availability of huge labeled datasets paired with suitably powerful computational resources. Clearly, the corresponding supervised communication protocol between machines and visual environments is far from natural. Current deep learning approaches based on supervised images mostly neglect the crucial role of temporal coherence: when computer scientists began to cultivate the idea of interpreting natural video, they removed time, the connecting wire between frames, in order to simplify the problem. As soon as we decide to frame visual learning processes in their own natural video environment, we realize that perceptual visual skills cannot emerge from massive supervision on different object categories. Foveated animals move their eyes, which means that even still images are perceived as patterns that change over time. Since visual information is interwound with motion, we propose to explore the consequences of stressing the assumption that the focus on motion is in fact nearly “all that we need”. When trusting this viewpoint, one realizes that time plays a crucial role and that the underlying computational model must refer to single pixels at a certain time.
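
To make the pixel-and-time viewpoint concrete, below is a minimal, hypothetical sketch (in PyTorch) of how temporal coherence can act as a supervisory signal without labels: a small encoder produces a feature vector for every pixel, and a loss penalizes abrupt per-pixel feature changes between consecutive frames of a stream. This is not the model presented in the talk; the encoder architecture, loss, and hyperparameters are all illustrative assumptions.

```python
# Illustrative sketch: temporal coherence as a label-free, per-pixel
# supervisory signal on a video stream (assumed toy setup, not the
# speaker's actual model).
import torch
import torch.nn as nn


class PixelEncoder(nn.Module):
    """Toy convolutional encoder producing a feature vector per pixel."""

    def __init__(self, in_channels: int = 3, feat_dim: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, feat_dim, kernel_size=3, padding=1),
        )

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        # frame: (B, C, H, W) -> per-pixel features: (B, feat_dim, H, W)
        return self.conv(frame)


def temporal_coherence_loss(f_prev: torch.Tensor, f_curr: torch.Tensor) -> torch.Tensor:
    """Penalize abrupt per-pixel feature changes between consecutive frames."""
    return ((f_curr - f_prev) ** 2).mean()


encoder = PixelEncoder()
optimizer = torch.optim.SGD(encoder.parameters(), lr=1e-3)

# Process a synthetic "stream" one frame at a time, as in online learning.
prev_frame = torch.rand(1, 3, 64, 64)
for _ in range(5):
    # Slowly varying frames stand in for real motion in a video.
    curr_frame = prev_frame + 0.01 * torch.randn_like(prev_frame)
    loss = temporal_coherence_loss(encoder(prev_frame), encoder(curr_frame))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    prev_frame = curr_frame.detach()
```

Note that the stream is consumed frame by frame, with no stored dataset and no labels: the only learning signal is how the pixel-wise representation changes over time, which is the sense in which motion is treated as “all that we need”.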