1 July 2015
The abstraction layer provided by cloud computing hides many details of the underlying services, which in turn hinders the effectiveness of autonomic managers. Data-driven approaches, particularly those relying on service clustering, have been shown to support autonomic management, for example in the scheduling and deployment of services. One complication is that monitoring data contains both continuous (e.g., CPU load) and categorical (e.g., VM instance type) variables, and current approaches handle this mix heuristically. In this talk, we present an approach named RF+PAM, which uses both kinds of data and learns, in a data-driven fashion, the similarities and resource usage patterns among services. In particular, we use an unsupervised formulation of the Random Forest algorithm to compute similarities, which are then given as input to a clustering algorithm. Finally, we demonstrate the applicability of our approach with a service scheduler that uses the notion of similarity among services in a cloud test-bed.
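To make the pipeline concrete, the following is a minimal sketch of the general idea and not the implementation presented in the talk. It assumes that the unsupervised Random Forest follows the common Breiman-style construction (train a forest to separate the real observations from a synthetic copy whose columns are permuted independently, then count how often two observations share a leaf), and that PAM stands for Partitioning Around Medoids, which the name RF+PAM suggests but the abstract does not spell out. The monitoring data, the feature names, the integer encoding of the instance type, and the helper names `rf_dissimilarity` and `pam` are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rf_dissimilarity(X, n_trees=200, seed=0):
    """Unsupervised Random Forest dissimilarity (Breiman-style sketch)."""
    rng = np.random.default_rng(seed)
    # Synthetic copy: permute each column independently, which keeps the
    # marginal distributions but destroys dependencies between features.
    synthetic = np.column_stack([rng.permutation(col) for col in X.T])
    data = np.vstack([X, synthetic])
    labels = np.r_[np.ones(len(X)), np.zeros(len(synthetic))]
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(data, labels)
    # leaves[i, t] is the leaf index that real sample i reaches in tree t.
    leaves = forest.apply(X)
    # Proximity: fraction of trees in which two samples share a leaf.
    proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
    return 1.0 - proximity


def total_cost(D, medoids):
    """Sum of each sample's dissimilarity to its closest medoid."""
    return D[:, medoids].min(axis=1).sum()


def pam(D, k, max_iter=100):
    """Small PAM-style k-medoids on a precomputed dissimilarity matrix."""
    n = len(D)
    # BUILD: start with the most central sample, then add greedily.
    medoids = [int(D.sum(axis=1).argmin())]
    while len(medoids) < k:
        candidates = [j for j in range(n) if j not in medoids]
        medoids.append(
            min(candidates, key=lambda j: total_cost(D, medoids + [j]))
        )
    # SWAP: apply the best improving medoid/non-medoid swap until none is left.
    for _ in range(max_iter):
        best_cost, best_swap = total_cost(D, medoids), None
        for pos in range(k):
            for j in range(n):
                if j in medoids:
                    continue
                trial = medoids[:pos] + [j] + medoids[pos + 1:]
                cost = total_cost(D, trial)
                if cost < best_cost:
                    best_cost, best_swap = cost, trial
        if best_swap is None:
            break
        medoids = best_swap
    return medoids, D[:, medoids].argmin(axis=1)


# Hypothetical monitoring data: [cpu_load, mem_gb, instance_type_code].
# The instance type is encoded as an integer code (an assumption made here).
rng = np.random.default_rng(42)
group_a = np.column_stack(
    [rng.normal(0.2, 0.03, 30), rng.normal(2.0, 0.2, 30), np.zeros(30)]
)
group_b = np.column_stack(
    [rng.normal(0.8, 0.03, 30), rng.normal(8.0, 0.2, 30), np.ones(30)]
)
X = np.vstack([group_a, group_b])

D = rf_dissimilarity(X)
medoids, labels = pam(D, k=2)
print("medoid indices:", medoids)
print("cluster labels:", labels)
```

Because the forest only ever splits on one feature at a time, continuous and categorical columns can be mixed without choosing a hand-tuned distance for each type, which is the property the abstract emphasises. A scheduler could then, for instance, consult `labels` or the rows of `D` to decide which services behave alike; how the talk's scheduler actually uses the similarities is not specified here.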