10 July 2014
San Francesco - Via della Quarquonia 1 (Classroom 1 )
Limitations of capabilities of shallow (one-hidden-layer) neural networks will be discussed. Motivated by recent interest in deep networks (with more than one hidden layer), we investigate tasks whose computation by shallow networks requires considerably higher model complexities than by deep ones. A seeming paradox will be shown: for large d, almost any randomly chosen function on d-dimensional Boolean cube cannot by tractably represented by shallow networks with popular computational units (such as Heaviside perceptrons or Gaussian support vector machine), but concrete examples of such functions are difficult to construct. The situation can be rephrased in analogy with coding theory as "almost any function which we cannot think is untractable". The results are based on a property of high-dimensional spaces called "concentration of measure'' combined with a geometric characterization of a variation of a function with respect to a dictionary of computational units, and relatively "small'' sizes of such dictionaries corresponding to popular computational units.