Arguably, one of the hardest challenges now faced by the systems community stems from the exponential explosion in the availability of data, fueled by recent advances in sensing and actuation capabilities. Simply stated, classical techniques are ill-equipped both to handle very large volumes of (heterogeneous) data, owing to their poor scaling properties, and to impose the structural constraints required to implement ubiquitous sensing and control. For example, the powerful Linear Matrix Inequality framework developed in the past 20 years, and the associated methods based on semidefinite programming, have proven very successful in providing global solutions to many control and identification problems. However, in many cases these methods break down when considering problems involving just a few hundred data points. On the other hand, several problems that are in principle non-convex (e.g., identification of classes of switched systems) can be efficiently solved in cases involving large amounts of data. Thus the traditional convex/non-convex dichotomy may fail to capture the intrinsic difficulty of some problems. The goal of this talk is to explore how this "curse of dimensionality" can potentially be overcome by exploiting the twin "blessings" of self-similarity (a high degree of spatio-temporal correlation in the data) and inherent underlying sparsity, and to answer the question "what is Big Data in systems theory?". While these ideas have recently been used in machine learning (for instance in the context of dimensionality reduction and variable selection), they have hitherto not been fully exploited in systems theory. By appealing to a deep connection to semi-algebraic optimization, rank minimization, and matrix completion, we will show that, in the context of systems theory, the limiting factor is given by the "memory" of the system rather than the size of the data itself, and discuss the implications of this fact.
These concepts will be illustrated by examining examples of "easy" and "hard" problems, including identification and control of hybrid systems and (in)validation of switched models. We will conclude the talk by exploring the connection between hybrid systems identification, information extraction, and machine learning, and point out new research directions in systems theory and in machine learning motivated by these problems.