Relationship Graph Filtering Drawing on the Example of the Site Hierarchy Extraction

Boris Belyaev
Russia, Yandex
Relationship graph filtering is a well-known problem. The input is usually a graph with a large number of garbage edges. Filtering the graph and constructing clean relationships raises a variety of algorithmic problems. This talk presents a solution using site hierarchy extraction as the example. When trying to extract a site’s hierarchy, we face several technical challenges: different types of subdomains, the site’s menu, metalinks, the combinatorial explosion of filters, different types of sorting, and uncertainty in the hierarchy itself. Some of these challenges are closely tied to the site’s topic, but most are common across topics and types of sites. The presentation will show how different graph filtering techniques can be used to solve these problems, and will examine methods based on URL templates, hierarchy corrections, and statistical analysis of elements.
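
To make the idea of filtering garbage edges concrete, here is a minimal sketch in Python. It is not the method from the talk, only an illustration under stated assumptions: the names NOISE_PARAMS, canonical and filter_edges, the list of noise parameters, and the example.com URLs are all hypothetical. The sketch assumes the hierarchy can be read from URL paths. It collapses sorting and filter variants of a page into one node, then keeps a link only when its target lies exactly one path segment below its source.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical query parameters treated as sorting/filtering noise.
NOISE_PARAMS = {"sort", "order", "page", "filter"}


def canonical(url):
    """Reduce a URL to (host, path segments, remaining query pairs)."""
    parts = urlsplit(url)
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k not in NOISE_PARAMS
    )
    segments = tuple(s for s in parts.path.split("/") if s)
    return parts.netloc.lower(), segments, tuple(query)


def label(host, segments):
    """Readable node name such as 'example.com/catalog'."""
    return host + "/" + "/".join(segments)


def filter_edges(edges):
    """Keep only links that go from a page to its direct child by path."""
    kept = set()
    for src, dst in edges:
        h1, p1, q1 = canonical(src)
        h2, p2, q2 = canonical(dst)
        same_host = h1 == h2
        no_extra_query = not q1 and not q2
        direct_child = len(p2) == len(p1) + 1 and p2[:len(p1)] == p1
        if same_host and no_extra_query and direct_child:
            kept.add((label(h1, p1), label(h2, p2)))
    return kept


edges = [
    ("https://example.com/", "https://example.com/catalog/"),
    ("https://example.com/catalog/", "https://example.com/catalog/phones?sort=price"),
    ("https://example.com/catalog/", "https://example.com/catalog/phones?sort=name"),
    # Menu link back to the home page: a garbage edge for the hierarchy.
    ("https://example.com/catalog/phones", "https://example.com/"),
]
hierarchy = filter_edges(edges)
```

On this toy input the two sorted variants of the phones page collapse into a single edge and the menu link back to the home page is dropped. Real sites whose hierarchy is not reflected in their URL paths would need the other techniques named in the abstract, such as hierarchy corrections and statistical analysis of elements.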