Data Science capstone projects batch #21

by Ekaterina Butyugina

In this blog post, we are delighted to present the achievements of our talented data science students from batch #21 in Zurich and from our second batch in Munich. In just three months, our students completed a variety of challenging projects that demonstrate their exceptional skills and dedication. We invite you to take a closer look at the impressive results they achieved in such a short time and to celebrate their hard work.

Each project is a testament to their expertise, perseverance, and unwavering commitment to excellence. Through careful analysis and innovative techniques, they have developed solutions that will impress you.

We encourage you to view the remarkable work our students have produced. Experience firsthand the transformative power of data science as they push boundaries, gain insights, and create significant impact. It is with great admiration and respect that we celebrate their achievements and recognize the immense value they bring to the field.

Join us on this journey of discovery as we acknowledge and honor these exceptional data science practitioners. Together, let us commend their remarkable contributions and celebrate their resounding success.
 

Intents analysis and categorization in radio communication

Students: Yeeun Kim, Adriano Persegani, Thi Tuyen Nguyen, Ibrahima Ba

Rolos is a leading intelligent platform that streamlines and accelerates the research life cycle through AI-based computational modeling. Backed by a robust infrastructure and supported by over 40 secure partner data centers worldwide, Rolos significantly enhances the quality of scientific experiments while boosting the productivity of research teams in both scientific and industrial domains.

One of Rolos' clients is NASCAR, the renowned car racing league. With 45 radio channels carrying communication between drivers and engineers, manually monitoring and analyzing all of these channels is a laborious and resource-intensive task. This creates a pressing need for an automated, real-time radio analytics service that can deliver valuable insights.

To address this challenge, our team analyzed the text messages and categorized their intents. By training prediction models on 29 predefined categories, such as fuel and tires or confirmation, we were able to classify the intent of each message. This classification enables engineers to quickly identify the nature of a message and respond accordingly, helping to keep the driver safe and the car in optimal condition throughout the race.

Using Natural Language Processing (NLP), machine learning, and deep learning models, our group successfully predicted the intents of previously unseen messages. Our best-performing model achieves an accuracy of 87% in assigning messages to the 29 predefined categories.
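The post does not describe the team's exact models. Purely as an illustration of the task, the sketch below shows a classical NLP baseline for intent classification: a multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. The toy messages and the label names `fuel_tires` and `confirmation` are hypothetical stand-ins for the real radio messages and their 29 categories.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages: list[str], labels: list[str]) -> "NaiveBayesIntentClassifier":
        self.class_counts = Counter(labels)      # label -> number of messages
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab: set[str] = set()
        for text, label in zip(messages, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        self.n_messages = len(labels)
        return self

    def predict(self, text: str) -> str:
        """Return the label with the highest log-posterior score."""
        v = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n_messages)  # log prior
            for tok in tokenize(text):
                # Smoothed log-likelihood: unseen tokens still get a small probability.
                numer = self.word_counts[label][tok] + 1
                score += math.log(numer / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model: NaiveBayesIntentClassifier, messages: list[str], labels: list[str]) -> float:
    """Share of messages whose predicted label matches the true label."""
    return sum(model.predict(m) == y for m, y in zip(messages, labels)) / len(labels)


# Hypothetical toy data; the real project used 29 intent categories.
train = [
    ("four tires and fuel this stop", "fuel_tires"),
    ("two right side tires only", "fuel_tires"),
    ("fuel only no tires", "fuel_tires"),
    ("copy that", "confirmation"),
    ("10-4 got it", "confirmation"),
    ("yes copy", "confirmation"),
]
model = NaiveBayesIntentClassifier().fit([m for m, _ in train], [y for _, y in train])
print(model.predict("copy"))          # confirmation
print(model.predict("tires please"))  # fuel_tires
```

In practice, a baseline like this would be trained on the labeled messages and its accuracy measured on a held-out set, so that it can be compared with the NLP and deep learning models mentioned above.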

With Rolos' advanced technology and expertise, we empower organizations like NASCAR to leverage the power of AI and data-driven insights, revolutionizing their operations and achieving unprecedented levels of efficiency and performance.            



Thanks to the strong performance of these models, NASCAR teams can optimize their strategies and make adjustments in real time, which can lead to better results and help reduce the risk of accidents and other safety issues.

Furthermore, the team has developed a Streamlit application that lets users explore and evaluate the intent classification. We invite you to try it firsthand at radionascar.com.



 

Decoding Infertility: streamlining the analysis of telomere length

Students: Raquel Riquelme Borja, Nefeli Dellepiane, and Dora Köhalmi

Beyond Genomix is a clinical-stage Swiss medtech company dedicated to developing cutting-edge technologies for the analysis of non-coding DNA. Our primary focus is reproductive health diagnostics. Leveraging our expertise, we have engineered an innovative infertility test based on the precise measurement of telomere length. What sets our test apart is its simplicity: it requires only a few drops of blood for accurate results.


Figure: Comparing measurements of telomere length


Telomeres, situated at the ends of chromosomes, act as protective caps for DNA. They typically span 8,000 to 10,000 base pairs, and their length can vary based on factors such as age, sex, ethnicity, and lifestyle. Shorter telomeres have been identified as potential biomarkers for age-related diseases and for infertility in both males and females. When other tests yield inconclusive results, telomere length testing can help diagnose infertility.

Beyond Genomix utilizes patented microscopy technology to measure telomere length and provides the results in a text file format. These results enable the calculation of an individual's average telomere length per cell, which is then visualized through graphs to offer a comparison against healthy individuals. A comprehensive document, including these informative graphs, is shared with the patient. However, the manual production of these graphs has been a time-consuming process.
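To make the per-cell averaging step concrete, here is a minimal sketch. It assumes a hypothetical tab-separated export with a header row and one `cell_id`, `length` row per measured telomere (lengths in base pairs); the actual format of the microscope's text files is not described in this post. The individual's average is then placed within a reference set of healthy individuals as a percentile, which is one simple way to express the comparison shown in the graphs.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def per_cell_average(raw_text: str) -> dict[str, float]:
    """Average telomere length per cell.

    Assumes a hypothetical tab-separated format with a header row and one
    `cell_id<TAB>length` row per measured telomere (length in base pairs).
    """
    lengths: dict[str, list[float]] = defaultdict(list)
    reader = csv.reader(io.StringIO(raw_text), delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 2 or not row[1].strip():
            continue  # basic cleaning: drop empty or incomplete rows
        lengths[row[0]].append(float(row[1]))
    return {cell: mean(values) for cell, values in lengths.items()}


def percentile_vs_healthy(value: float, healthy_reference: list[float]) -> float:
    """Percentage of healthy reference values that lie below `value`."""
    below = sum(ref < value for ref in healthy_reference)
    return 100.0 * below / len(healthy_reference)


raw = "cell_id\tlength\nc1\t8000\nc1\t9000\nc2\t7000\nc2\t7400\n"
cells = per_cell_average(raw)               # {'c1': 8500.0, 'c2': 7200.0}
individual_mean = mean(list(cells.values()))  # 7850.0
print(percentile_vs_healthy(individual_mean, [7000, 8000, 9000, 10000]))  # 25.0
```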

Beyond Genomix's microscopy technology also generates data on the cell nucleus, which could help distinguish patients from healthy individuals but had not yet been explored.

With the objective of automating the analysis and visualization of data obtained from newly tested individuals, Raquel, Nefeli, and Dora embarked on a mission to facilitate the swift generation of graphs required for patient reports. Furthermore, their aim was to explore untapped aspects of the data and determine how it could be leveraged for the benefit of the company.

To achieve these objectives, the team devised a robust pipeline for processing, cleansing, analyzing, and visualizing the existing data. They seamlessly integrated this pipeline into a Streamlit app, ensuring easy access to the most relevant information in the form of graphs and tables. As a result, when a new individual undergoes testing, their microscope data can be automatically analyzed, and the necessary graphs, similar to the one depicted below, can be downloaded with a single click for the patient report.


Figure: Patient report


Through their innovative efforts, Raquel, Nefeli, and Dora have revolutionized the efficiency of data analysis and visualization at Beyond Genomix. The implementation of their streamlined pipeline and user-friendly app empowers the company to deliver comprehensive and visually appealing reports promptly, enhancing the overall patient experience.

Alongside the development of the app, our students trained various machine-learning models on the company's datasets. These models predict whether a newly tested individual is closer to the patient class or to the healthy class, drawing on the previously untapped data. The top performers were Linear Regression and XGBoost (gradient-boosted decision trees), with accuracies of 97% and 95%, respectively. These models were integrated into the Streamlit app, enabling users to quickly assess the status of the tested individual.
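The post names Linear Regression and XGBoost as the best-performing models. As a dependency-free illustration of the same patient-versus-healthy task, the sketch below instead fits a logistic regression (a swapped-in technique, not the team's model) by batch gradient descent on the log loss. The two standardized features and all values are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: list[list[float]], y: list[int], lr: float = 0.1, epochs: int = 2000):
    """Fit weights and bias by batch gradient descent (y: 1 = patient, 0 = healthy)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss is (predicted probability - label) * feature.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: list[float], b: float, x: list[float], threshold: float = 0.5) -> int:
    """Return 1 (patient) if the predicted probability reaches the threshold, else 0 (healthy)."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Hypothetical standardized features, e.g. [mean telomere length, a nucleus measurement].
X = [[-1.2, 0.3], [-0.8, 0.1], [-1.0, -0.2], [1.1, 0.0], [0.9, -0.3], [1.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

Training accuracy on six toy points says nothing about real performance; figures like the 97% and 95% above come from evaluation on the company's data.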

Additionally, an extra page was incorporated into the Streamlit app, showcasing vital information and visualizations of the current datasets. This invaluable addition empowers the company to monitor crucial metrics within the dataset populations, including average age, gender distribution, and other metrics indicative of data quality. As Beyond Genomix strives to expand its datasets, this page serves as a valuable tool for assessing their growth and maintaining data integrity.
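The monitoring page described above essentially reports a few summary statistics per dataset. A minimal sketch, assuming each record is a dict with hypothetical `age` and `sex` fields and `None` marking a missing value; the share of incomplete records stands in here for the unspecified data-quality metrics.

```python
from collections import Counter
from statistics import mean


def dataset_summary(records: list[dict]) -> dict:
    """Cohort size, mean age, sex distribution, and share of records with missing values."""
    n = len(records)
    if n == 0:
        raise ValueError("empty cohort")
    ages = [r["age"] for r in records if r.get("age") is not None]
    sexes = Counter(r.get("sex") or "unknown" for r in records)
    incomplete = sum(any(v is None for v in r.values()) for r in records)
    return {
        "n": n,
        "mean_age": mean(ages) if ages else None,
        "sex_distribution": {sex: count / n for sex, count in sexes.items()},
        "missing_share": incomplete / n,
    }


# Hypothetical records; None marks a missing value.
cohort = [
    {"age": 30, "sex": "F"},
    {"age": 40, "sex": "M"},
    {"age": None, "sex": "F"},
    {"age": 35, "sex": None},
]
print(dataset_summary(cohort))
```

Running the same function separately on the patient and healthy datasets would give the per-population view the page is meant to provide.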

By streamlining the entire data processing pipeline, from text file format to the generation of visualizations for patient reports, Raquel, Dora, and Nefeli have significantly simplified and accelerated the process. Their contributions have ensured that Beyond Genomix can seamlessly continue providing diagnostic tests while focusing on expanding its datasets. Moreover, their efforts have yielded invaluable machine-learning models that make efficient use of available data. They have provided the company with a robust framework to monitor and manage the ongoing growth of their datasets.
 

NEAR Social: A recommender system for an on-chain social network

Students: Christian Kühner, Daniel Herrmann, Agustin Rojo Serrano

This project was proposed by Pagoda, a company developing the operating system of the NEAR blockchain.

NEAR Social is a cutting-edge social network built on the NEAR Blockchain, where users access the platform using their NEAR wallet address. Each user action, whether it's posting, following others, liking content, or updating their profile, is meticulously recorded as a blockchain transaction and permanently stored in the public ledger. This decentralized approach ensures that user data remains in the hands of the individuals, while developers have the freedom to enhance the platform's functionality through the creation of open-source apps called widgets, without requiring permission.


Figure: Architectural overview


Our primary objective was to develop a robust user recommendation system that would facilitate meaningful connections among like-minded individuals and fuel the growth of the network. To achieve this, we undertook the task of designing a sophisticated system that takes into account the diverse data available for each user. Leveraging this data, we employed four distinct approaches to provide personalized recommendations:
  • Trending users have been identified through a trending metric, which is defined as the ratio of engagement to activity over the past 30 days. This approach enables us to recommend users who consistently produce captivating and high-quality content.
  • Using unsupervised learning techniques, we have devised an algorithm that selects second-degree connections (friends of friends) who are most likely to connect with a particular user, based on their mutual friends. For instance, if someone is friends with all of your connections, it is highly probable that you share similar interests.
  • To enhance user experiences, we have introduced profile tags that can be displayed alongside their names. Our Tag Similarity algorithm intelligently suggests other users who possess similar tags, fostering connections among individuals with shared interests.
  • Furthermore, we have harnessed the power of Large Language Models to develop a Post Similarity algorithm. When a user posts a message on our social network, this algorithm retrieves the most similar post from our vast corpus of previous posts, ultimately recommending the user who authored the comparable post.
Through these innovative approaches, we aim to enrich user interactions and foster a sense of community within our platform. By leveraging advanced algorithms and intelligent recommendations, we strive to provide an engaging and personalized social networking experience for all users.
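Two of the recommendation approaches above can be sketched in a few lines. The trending score follows the stated definition (engagement divided by activity over the past 30 days), assuming engagement means likes or comments received on a user's posts and activity means posts authored. The second-degree ranking is a simple mutual-friend count, a stand-in for the team's unsupervised method rather than a reproduction of it. All names and data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def trending_scores(posts, engagements, now, window_days=30):
    """Engagement received divided by posts authored, per user, over the last `window_days` days.

    posts: list of (author, timestamp), one entry per post written.
    engagements: list of (post_author, timestamp), one entry per like/comment received.
    """
    since = now - timedelta(days=window_days)
    activity = Counter(author for author, ts in posts if ts >= since)
    engagement = Counter(author for author, ts in engagements if ts >= since)
    # Only users who posted in the window get a score, so the ratio is always defined.
    return {user: engagement[user] / count for user, count in activity.items()}


def second_degree_recommendations(follows, user, top_k=5):
    """Rank friends of friends of `user` by how many of the user's connections they share."""
    direct = follows.get(user, set())
    mutual = Counter()
    for friend in direct:
        for candidate in follows.get(friend, set()):
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    return [candidate for candidate, _ in mutual.most_common(top_k)]


now = datetime(2023, 6, 30)
posts = [("alice", datetime(2023, 6, 20)), ("alice", datetime(2023, 6, 25)),
         ("bob", datetime(2023, 6, 28)), ("bob", datetime(2023, 4, 1))]
engagements = [("alice", datetime(2023, 6, 21)), ("alice", datetime(2023, 6, 26)),
               ("alice", datetime(2023, 6, 27)), ("bob", datetime(2023, 6, 29)),
               ("bob", datetime(2023, 6, 29))]
print(trending_scores(posts, engagements, now))  # {'alice': 1.5, 'bob': 2.0}

follows = {"me": {"a", "b"}, "a": {"x", "y", "me"}, "b": {"x", "z"}}
print(second_degree_recommendations(follows, "me"))  # 'x' comes first (2 mutual connections)
```

Normalizing engagement by activity rewards users whose posts draw attention rather than users who simply post the most.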

Figure: Post similarity
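The tag and post similarity approaches can be sketched in the same spirit. Tag similarity is shown here as Jaccard similarity between tag sets, which is an assumption. The team used Large Language Models for post similarity; to keep the example self-contained, this sketch swaps in bag-of-words cosine similarity, and in a real system the word-count vectors would be replaced by LLM embeddings. All users, tags, and posts are hypothetical.

```python
import math
import re
from collections import Counter


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the tag intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_by_tags(tags: dict[str, set[str]], user: str, top_k: int = 3) -> list[str]:
    """Other users ranked by Jaccard similarity of their profile tags to `user`'s tags."""
    scores = {other: jaccard(tags[user], t) for other, t in tags.items() if other != user}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def bag_of_words(text: str) -> Counter:
    """Word-count vector; a stand-in for an LLM embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def recommend_author(new_post: str, corpus: list[tuple[str, str]]) -> str:
    """Author of the earlier post most similar to `new_post`; corpus holds (author, text) pairs."""
    query = bag_of_words(new_post)
    author, _ = max(corpus, key=lambda item: cosine(query, bag_of_words(item[1])))
    return author


tags = {"me": {"defi", "nft", "dao"}, "a": {"defi", "dao"}, "b": {"gaming"}, "c": {"nft"}}
print(similar_by_tags(tags, "me", top_k=2))  # ['a', 'c']

corpus = [("alice", "Launching a new NFT collection on NEAR"),
          ("bob", "Weekly DAO governance call notes"),
          ("carol", "Tips for staking NEAR tokens")]
print(recommend_author("Anyone staking their NEAR tokens?", corpus))  # carol
```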


This recommender system was designed and implemented to boost user engagement on a growing social network. In the near future, each profile will have an on-chain widget.

Thank you, everybody, for a fantastic partnership and an amazing project period! Constructor Academy is proud to have collaborated with such exceptional people, and we wish our Data Science graduates the best of luck and every success in their future endeavors.
 

Elevate your career with Constructor Academy's cutting-edge Data Science Bootcamp.


Are you ready to unlock a world of possibilities in an in-demand, respected, and financially rewarding career? Look no further than the Data Science bootcamp offered by Constructor Academy.

Designed to equip you with the essential techniques and technologies for harnessing the power of real-world data, our bootcamp offers two flexible options: full-time (12 weeks) and part-time (22 weeks). Throughout this immersive experience, you will master transformative technologies including machine learning, natural language processing (NLP), Python, deep learning, and data visualization. 

But wait, there's more! Start your data science journey with our free introduction to data science, available on our website, and begin exploring today.

Get ready to embrace a future brimming with endless opportunities. Constructor Academy is committed to empowering aspiring data scientists like you to unleash your true potential and pave the way for unparalleled success. Join us on this exhilarating adventure, and let's shape the future of data science together.

Interested in reading more about Constructor Academy and tech related topics? Then check out our other blog posts.
