To gain a full understanding of how ML algorithms are used and applied, our participants work on a real-life industry project, translating theoretical knowledge into practice and overcoming realistic challenges.
Scope: ~400 work hours total
Data: Real data provided by company
Guidance: Experienced mentors provided by Y-DATA
Support: Weekly meetings with company data-owner
Playbuzz: Content-to-context matching
Playbuzz is a storytelling platform offering real-time analytics tools that enable partners to engage users, boost reach, raise brand awareness, improve monetization capabilities, and optimize content for maximum social interaction. Provided with a dataset of the publishers' pages, the relevant Playbuzz tagged units, and previous user behavior on those pages, the company is looking to increase user engagement with Playbuzz content units. Previous research has shown that users tend to engage better with a content unit that relates to the page content and context (subject, entities, etc.). The aim of this project is to analyze the context of the publishers' web pages (especially news and content sites) using NLP and to use ML to find the most suitable match between a page and the relevant content units. The matching will then be A/B tested on site to verify its effectiveness.
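To make the matching idea concrete, here is a minimal sketch of one simple baseline: represent the page and each content unit as TF-IDF vectors and rank units by cosine similarity to the page. It is an illustration only, not the method used in the project; the function names, the unit IDs, and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed inverse document frequency, so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_units(page_text, units):
    """Rank content units (dict of id -> text) by similarity to the page text."""
    ids = list(units)
    vectors = tfidf_vectors([page_text] + [units[i] for i in ids])
    page_vec, unit_vecs = vectors[0], vectors[1:]
    scored = [(uid, cosine(page_vec, vec)) for uid, vec in zip(ids, unit_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data, invented for illustration.
page = "The national football team won the championship final last night"
units = {
    "quiz_football": "Quiz: how well do you know football championship history",
    "poll_recipes": "Poll: which pasta recipe is your favourite",
}
ranking = rank_units(page, units)
```

With this sample data, the football quiz ranks above the recipe poll because it shares terms with the page. A real system would likely need richer signals, such as named entities and past user behavior, and its output would still have to be validated by A/B testing.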
Full project cycle
The project work follows popular industry standards and methodologies and draws on the students' growing set of tools to methodically understand and solve a real-world problem. Upon graduation, our students have a full-cycle data science project in their portfolio, covering all industry-standard stages: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation.
Example Project: Automatic detection of low-value queries in a technical Q&A forum
A customer operates a forum where programmers ask each other questions, provide answers, and rate questions by giving them "ups" and "downs". The forum has a core expert community that provides good answers and valuable insights. However, these experts often waste their time handling questions of little to no value: marking questions as duplicates and redirecting them, closing topics with incoherent or irrelevant questions, etc. Because of this, the overall efficiency of the system suffers.