21 February 2014
San Francesco - Cappella Guinigi
Three years ago, researchers at Google extracted some 10 million images from YouTube and fed them into what was popularized as the Google Brain, a network of about 1,000 computers charged with making predictions all over the Web. Alongside this extraordinary effort, we have been witnessing a growing interest in merging computer vision and machine learning techniques, with special emphasis on deep learning models, which seem to overcome the limitations of shallow architectures such as Support Vector Machines. While all this is definitely exciting, it is not clear how well those machines can actually perform in unrestricted visual environments. In this talk, we present a new approach to unrestricted computer vision in which our visual agents live in their own environment, capturing video streams with the purpose of learning to see, thus following the typical human developmental plan for acquiring visual skills. They are capable of working in any visual environment and acquire visual concepts through human interaction alone. We discuss the basic idea and demonstrate our LEASE software system on simple visual environments, so as to give a clear picture of its functional behavior and concrete capabilities. Finally, we propose a crowd-sourcing approach to evaluating the mechanisms of learning to see, and we argue that it can potentially suggest other novel approaches and a true paradigm shift in the way research in computer vision will be carried out in the years to come.
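As a minimal sketch of the learning-from-a-stream setting described above, consider an agent that processes frames continuously but only receives a label when a human occasionally intervenes. The abstract does not describe LEASE's internals, so everything here is an assumption: the feature representation, the perceptron learner, and the supervision schedule are illustrative stand-ins, not the actual system.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_stream(n_frames, dim=8):
    """Hypothetical stand-in for a video stream: each 'frame' is a
    feature vector, and a hidden concept determines its true label."""
    for _ in range(n_frames):
        x = rng.normal(size=dim)
        label = int(x[0] + x[1] > 0)  # the concept the agent must acquire
        yield x, label

# Online perceptron: the agent predicts on every frame as it arrives,
# but updates only when a human supervises and the prediction is wrong.
dim = 8
w = np.zeros(dim)
errors = 0
for t, (x, label) in enumerate(frame_stream(2000, dim)):
    pred = int(w @ x > 0)
    human_supervises = (t % 10 == 0)  # sparse human interaction
    if human_supervises and pred != label:
        errors += 1
        w += (2 * label - 1) * x  # standard perceptron update
```

The point of the sketch is the interaction pattern, not the learner: predictions happen on every frame of the stream, while supervision is rare, mirroring a developmental setting in which visual concepts are acquired through occasional human feedback rather than a fixed labeled dataset.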