Dr. Andrey Mishchenko
Toloka (https://toloka.yandex.com/) is Yandex's crowd-sourcing platform, roughly a Russian counterpart of Amazon Mechanical Turk. It pays workers for performing simple tasks and therefore attracts bots that try to pose as human workers and cheat the system.
This report covers our experience of dealing with this problem. Embedding tasks with known answers (honeypots) is the simplest method of bot detection, but a more elaborate machine-learned model is likely to do better, so we focus on that approach. We define a probabilistic model of worker behaviour and optimize its parameters with a slightly modified EM algorithm. The output gives both the aggregated labels for the tasks and a ranking of the workers by performance.
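To make the idea of jointly estimating task labels and worker quality concrete, here is a minimal sketch of plain EM on a deliberately simple "one-coin" model: binary labels, a uniform prior over the true label, and a single accuracy parameter per worker. This is an illustrative assumption, not the model or the modified EM algorithm from the report; the function name `em_one_coin`, the smoothing constants and the toy data are all hypothetical.

```python
import math

def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def em_one_coin(labels, n_iter=50):
    """Plain EM for a one-coin worker model (illustrative assumption).

    labels: list of (worker, task, label) triples with label in {0, 1}.
    Returns (post, acc): post[task] = P(true label = 1),
    acc[worker] = estimated probability that the worker answers correctly.
    """
    workers = sorted({w for w, _, _ in labels})
    tasks = sorted({t for _, t, _ in labels})

    # Initialise posteriors with the share of "1" votes (soft majority vote).
    votes = {t: [] for t in tasks}
    for _, t, y in labels:
        votes[t].append(y)
    post = {t: sum(v) / len(v) for t, v in votes.items()}

    acc = {}
    for _ in range(n_iter):
        # M-step: accuracy = expected share of correct answers.
        # One pseudo-correct and one pseudo-wrong answer keep it inside (0, 1).
        correct = {w: 1.0 for w in workers}
        total = {w: 2.0 for w in workers}
        for w, t, y in labels:
            correct[w] += post[t] if y == 1 else 1.0 - post[t]
            total[w] += 1.0
        acc = {w: correct[w] / total[w] for w in workers}

        # E-step: each vote shifts the task's log-odds by the worker's
        # log-odds of being correct, towards the label the worker gave.
        log_odds = {t: 0.0 for t in tasks}
        for w, t, y in labels:
            weight = math.log(acc[w] / (1.0 - acc[w]))
            log_odds[t] += weight if y == 1 else -weight
        post = {t: _sigmoid(log_odds[t]) for t in tasks}
    return post, acc

# Toy data: three mostly honest workers and a "bot" that always answers 1.
truth = [1, 0, 1, 0, 1, 0]
data = []
for t, z in enumerate(truth):
    data.append(("a", t, z))
    data.append(("b", t, 1 if t == 5 else z))  # worker b errs on task 5
    data.append(("c", t, z))
    data.append(("bot", t, 1))

post, acc = em_one_coin(data)
ranking = sorted(acc, key=acc.get, reverse=True)  # best workers first
```

On this toy data the posteriors recover the assumed true labels, and the always-answer-1 worker ends up at the bottom of the ranking, which is the kind of signal that can be used to flag suspected bots.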
The report describes the variety of models that we considered as well as the internals of the implementation. In particular, it presents a framework for automatic evaluation of derivatives, which is used for gradient descent. We also offer some hints on research methodology and quality measurement.
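One common way to evaluate derivatives automatically is forward-mode automatic differentiation with dual numbers. The sketch below is an assumption about what such a framework could look like in miniature, not the framework presented in the report. The names `Dual`, `derivative` and `neg_log_lik` are hypothetical, and the example objective (the negative log-likelihood of a worker answering k of n honeypots correctly) is chosen only for illustration.

```python
import math

class Dual:
    """A value together with its derivative with respect to one input."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (derivative 0).
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, -self.der)

    def __sub__(self, other):
        return self + (-Dual.lift(other))

    def __rsub__(self, other):
        return Dual.lift(other) + (-self)

    def __mul__(self, other):
        other = Dual.lift(other)  # product rule
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

    def __truediv__(self, other):
        other = Dual.lift(other)  # quotient rule
        return Dual(self.val / other.val,
                    (self.der * other.val - self.val * other.der) / other.val ** 2)

    def __rtruediv__(self, other):
        return Dual.lift(other) / self

def exp(x):
    e = math.exp(x.val)
    return Dual(e, e * x.der)

def log(x):
    return Dual(math.log(x.val), x.der / x.val)

def derivative(f, x):
    # Seed the input with derivative 1 and read the derivative of the output.
    return f(Dual(x, 1.0)).der

def neg_log_lik(theta, k=8, n=10):
    # Worker accuracy p = sigmoid(theta); k correct answers out of n.
    p = 1 / (1 + exp(-theta))
    return -(k * log(p) + (n - k) * log(1 - p))

# Gradient descent using the automatically computed derivative.
theta = 0.0
for _ in range(500):
    theta -= 0.1 * derivative(neg_log_lik, theta)
p_hat = 1.0 / (1.0 + math.exp(-theta))  # approaches k / n
```

The objective is written once as ordinary arithmetic and no derivative is coded by hand, so the model can be changed without re-deriving gradients. For models with many parameters, reverse-mode differentiation is usually the cheaper choice; forward mode is used here only because it fits in a few lines.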