2 April 2014
San Francesco - Cappella Guinigi
Data mining tools are essential in many information search and filtering tasks, e.g., document ranking in web search. This talk explores a few opportunities and challenges in mining user-generated content (UGC) and in exploiting UGC in data mining tasks. Results in the following research areas are illustrated: pattern mining, image search, query log mining, text annotation (entity linking) and news recommendation. We first introduce the pattern mining task and discuss a recent algorithm for mining the most interesting patterns from binary datasets according to the Minimum Description Length principle. In the area of image search, we discuss the impact of approximate caching, highlighting its efficiency and efficacy benefits. Query logs are a special kind of UGC, and they are crucial for web search engines. Our goal is to discover how users' tasks are turned into web queries. Finally, we illustrate how structured UGC, such as Wikipedia, can be exploited to annotate text and improve its analysis. We show a case study where text annotation is exploited together with Twitter data, another form of UGC, to improve personalised news recommendations.