Boris Mirkin

Using Hierarchical Ontologies for Interpretation of Activities and Texts
In this talk, to “interpret” a cluster means to present it using concepts selected for that purpose. I divide cluster interpretation approaches according to the relative granularity of the concepts that make up the clusters and the concepts used to interpret them. The most popular interpretation device in ontologies, overrepresentation, applies when the concepts in clusters are much more granular than those used for interpretation. A concept is overrepresented in a cluster if its conditional frequency in the cluster is much greater than its frequency overall (e.g. Robinson 2011). I consider a different case, in which both the clustered concepts and the interpreting concepts belong to the same hierarchical ontology (taxonomy), such as the Computing Classification System developed by the ACM (ACM CCS 2012). Consider a taxonomy represented by a rooted tree T whose nodes are annotated by concepts organized by the relation that the tree captures. Let a fuzzy or crisp set L of its leaves represent an externally found cluster that is to be interpreted in T. An interpretation of L is a set of nodes of T, some of which cover elements of L (‘head subjects’ and ‘offshoots’), whereas the others do not overlap L (‘gaps’). The goal is to find the cheapest interpretation with respect to a penalty function that sums the penalties associated with its elements, weighted by the fuzzy membership values of L extended to the whole of T. We develop a recursive algorithm that globally minimizes the penalty function (Mirkin, Fenner, Nascimento, 2010, 2011) and apply it in several settings. I will briefly describe applications to the analysis of the activities of CS research organizations, the annotation of papers in ACM journals, and the analysis of city dwellers’ complaints.
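The recursion described above can be illustrated with a small sketch. It is not the algorithm of the cited papers but a simplified version under stated assumptions: the cluster L is crisp (no fuzzy membership weights), the three penalty values are arbitrary, an offshoot is taken to be a leaf of L not covered by any head subject, and a gap is taken to be a maximal L-free node under a head subject. The function name `lift`, the dictionary encoding of the tree, and the toy taxonomy are all hypothetical.

```python
from collections import namedtuple

# cost: penalty of the best interpretation found within the subtree
# free: maximal nodes of the subtree that contain no leaf of L
Result = namedtuple("Result", "cost heads offshoots gaps free")

def lift(tree, root, cluster, head=1.0, gap=0.5, offshoot=0.9):
    """Crisp, simplified sketch of a recursive interpretation search.

    tree    : dict mapping each internal node to its list of children
    cluster : set of leaves forming the cluster L
    head, gap, offshoot : assumed penalty values, not taken from the source
    """
    def visit(t):
        children = tree.get(t, [])
        if not children:  # t is a leaf
            if t in cluster:
                return Result(offshoot, [], [t], [], [])
            return Result(0.0, [], [], [], [t])
        results = [visit(c) for c in children]
        # If every child is itself L-free, so is t, and t is the maximal such node.
        if all(r.free == [c] for r, c in zip(results, children)):
            return Result(0.0, [], [], [], [t])
        free = [n for r in results for n in r.free]
        # Option 1: keep the children's interpretations as they are.
        keep_cost = sum(r.cost for r in results)
        # Option 2: make t a head subject; every maximal L-free node below is a gap.
        head_cost = head + gap * len(free)
        if head_cost < keep_cost:
            return Result(head_cost, [t], [], free, free)
        return Result(
            keep_cost,
            [h for r in results for h in r.heads],
            [o for r in results for o in r.offshoots],
            [g for r in results for g in r.gaps],
            free,
        )
    return visit(root)

# Hypothetical toy taxonomy: a root with three branches of three leaves each.
tree = {
    "root": ["A", "B", "C"],
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2", "b3"],
    "C": ["c1", "c2", "c3"],
}
cluster = {"a1", "a2", "a3", "b1"}

result = lift(tree, "root", cluster)
cheap_gaps = lift(tree, "root", cluster, gap=0.2)
```

With the default penalties the sketch keeps `A` as the only head subject and leaves `b1` as an offshoot. Lowering the gap penalty to 0.2 makes it cheaper to lift the interpretation to `root` and accept `b2`, `b3` and `C` as gaps, which shows how the penalty values control the level of generalization.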