Data Science final projects batch #15

by Badru Stanicki


Idorsia: Single-Cell Classification

Students: Peerawan Wiwattananon, Martina Trippel
 
When a tumor forms, immune cells infiltrate it to kill the tumor cells. To evaluate how well a cancer drug works, researchers need a tool that classifies the immune cells inside a tumor before and after the patient is treated. Comparing the two shows which cells are still present in the tumor and which have disappeared as a result of the treatment (Figure 1).
The students built such a tool by using a single-cell reference atlas to create a cell-type classifier. The classifier can then automatically assign cell-type labels to the cells found in new single-cell studies.
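
To make the idea of a reference-based classifier concrete, here is a minimal sketch in Python. It is not the students' model: it assumes a nearest-centroid classifier as a simple stand-in, and the gene values, cell-type names, and function names are all hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(reference_cells, labels):
    """Average the expression vectors of each labeled cell type in the reference."""
    sums = {}
    counts = defaultdict(int)
    for vector, label in zip(reference_cells, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, cell):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], cell))


# Hypothetical toy reference atlas: two gene-expression values per cell.
reference_cells = [[5.0, 0.1], [4.5, 0.3], [0.2, 3.9], [0.1, 4.2]]
reference_labels = ["T cell", "T cell", "B cell", "B cell"]

centroids = fit_centroids(reference_cells, reference_labels)

# Cells from a "new study" receive the label of the nearest reference centroid.
new_cells = [[4.8, 0.2], [0.3, 4.0]]
predicted = [predict(centroids, cell) for cell in new_cells]
print(predicted)
```

Real single-cell data has thousands of genes per cell and many more cell types, so a production classifier would need normalization and a stronger model, but the workflow of fitting on a labeled reference and predicting on new cells stays the same.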
 
The project group and SIT Academy are proud to have had the opportunity to contribute to Idorsia's efforts to develop drugs for curing cancer.


Figure 1: Schematic visualization of using single-cell classification for developing new treatments


Figure 2: Projection of the high-dimensional feature space into two dimensions, showing the labels assigned by the model that the students developed.
 

Nispera: Detection of Wind Turbine Underperformance

Students: Mario Kovacs, Pedro Pereira, Lisa Christl
 
Nispera is a Zurich-based company providing data intelligence services for renewable energy plants. One of their services is the optimization of the performance of wind turbine farms.
 
The first part of this project was to describe the typical relationship between environmental factors, such as wind speed and temperature, and the electrical power produced by a specific set of wind turbines. Determining such a standard power curve for a wind turbine required analyzing the data and developing filter algorithms to remove irregular and inconsistent values.

Picture 1: Raw Data, removing unphysical data, removing outliers
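
A two-stage filter of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not the team's algorithm: the rated power, the 1 m/s bin width, the median-absolute-deviation (MAD) rule, and all sample values are hypothetical.

```python
from statistics import median

RATED_POWER_KW = 2000.0  # assumed rated power of the turbine
BIN_WIDTH_MS = 1.0       # assumed wind-speed bin width in m/s


def remove_unphysical(samples):
    """Drop samples with negative wind speed or power outside [0, rated power]."""
    return [(v, p) for v, p in samples if v >= 0 and 0 <= p <= RATED_POWER_KW]


def remove_outliers(samples, k=3.0):
    """Within each wind-speed bin, drop samples farther than k MADs from the median power."""
    bins = {}
    for v, p in samples:
        bins.setdefault(int(v // BIN_WIDTH_MS), []).append((v, p))
    kept = []
    for members in bins.values():
        powers = [p for _, p in members]
        center = median(powers)
        mad = median([abs(p - center) for p in powers])
        if mad == 0:
            kept.extend(members)  # no spread to judge against, keep the bin as is
            continue
        kept.extend((v, p) for v, p in members if abs(p - center) <= k * mad)
    return kept


# Hypothetical (wind speed in m/s, power in kW) samples.
raw = [
    (5.1, 400.0), (5.3, 410.0), (5.2, 395.0), (5.6, 405.0),
    (5.4, 50.0),    # plausible value, but far below the other samples in its bin
    (-1.0, 100.0),  # negative wind speed
    (6.0, -20.0),   # negative power
    (7.0, 5000.0),  # above rated power
]
clean = remove_outliers(remove_unphysical(raw))
print(clean)
```

Filtering per wind-speed bin matters because the expected power changes strongly with wind speed, so a single global threshold would either miss outliers at low wind or discard valid points at high wind.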

After obtaining a clean and robust dataset, Lisa, Pedro, and Mario developed various regression algorithms for modeling the performance of each turbine. 

Picture 2: Filtered data (blue), ML Model (green)
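
The post does not say which regression algorithms the team used. As a minimal sketch of what modeling a power curve means, the example below assumes the simplest possible model, a binned average of power per wind-speed interval. The function names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_power_curve(samples, bin_width=1.0):
    """Model the power curve as the mean power observed in each wind-speed bin."""
    bins = defaultdict(list)
    for wind_speed, power in samples:
        bins[int(wind_speed // bin_width)].append(power)
    return {index: mean(powers) for index, powers in bins.items()}


def expected_power(curve, wind_speed, bin_width=1.0):
    """Look up the modeled power for a wind speed, or None if the bin was never observed."""
    return curve.get(int(wind_speed // bin_width))


# Hypothetical filtered samples: (wind speed in m/s, power in kW).
filtered = [(4.2, 200.0), (4.8, 220.0), (5.1, 400.0), (5.5, 420.0)]
curve = fit_power_curve(filtered)

print(expected_power(curve, 5.3))
print(expected_power(curve, 9.0))
```

A learned regression model can additionally take temperature and other environmental factors as inputs, which a one-dimensional binned curve cannot.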
 
In the subsequent part of the project, the team used their models to design an alarm mechanism that triggers once underperformance is detected. 
 
This approach can be used to notify wind farm operators in real time so that they can act and reduce potential monetary losses. For the analyzed set of 10 wind turbines, the accumulated loss over 3 years due to underperformance was estimated to be in the range of 100,000 USD. The methods developed during the project could enable Nispera to develop a new service for improving the performance of wind energy production.

Picture 3: Cumulative energy loss of each of the 10 wind turbines 
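
The alarm logic and the cumulative loss can be sketched as follows. The rule used here (power more than 10% below the model for three consecutive samples) and the hourly sampling are assumptions made for illustration; the team's actual trigger criteria are not described in this post.

```python
def detect_underperformance(expected, actual, tolerance=0.1, min_consecutive=3):
    """Return the sample indices at which an alarm fires.

    An alarm fires once actual power has stayed below (1 - tolerance) * expected
    for min_consecutive samples in a row.
    """
    alarms = []
    run = 0
    for i, (exp, act) in enumerate(zip(expected, actual)):
        if exp > 0 and act < (1 - tolerance) * exp:
            run += 1
        else:
            run = 0
        if run == min_consecutive:
            alarms.append(i)
    return alarms


def cumulative_energy_loss(expected, actual, hours_per_sample=1.0):
    """Running total of energy (kWh) lost whenever actual power is below expected."""
    total = 0.0
    losses = []
    for exp, act in zip(expected, actual):
        total += max(exp - act, 0.0) * hours_per_sample
        losses.append(total)
    return losses


# Hypothetical hourly values in kW: model prediction vs. measured output.
expected = [400.0] * 6
actual = [395.0, 340.0, 330.0, 320.0, 400.0, 350.0]

alarms = detect_underperformance(expected, actual)
losses = cumulative_energy_loss(expected, actual)
print(alarms)
print(losses[-1])
```

Requiring several consecutive low samples before raising an alarm is one way to avoid reacting to short-lived fluctuations; multiplying the lost energy by an electricity price would turn the curve into a monetary estimate.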

 

Sentifi: Stock Selection Model Based on AI-powered ESG Score

Students: Eduardo Aguilar Moreno, Anselme Borgeaud, Rubén Coll Menéndez
 
An ESG (Environmental, Social, and Corporate Governance) score is an evaluation of a firm’s collective conscientiousness regarding social, environmental, and governance factors. Investors increasingly apply these non-financial factors in their investment process to identify material risks and growth opportunities. To help investors and the government, data providers publish ESG ratings for companies and stocks. The problem with these ratings is that they are created manually by analysts and updated only once a year.
 
Sentifi has developed an ESG score that is calculated in real time by an AI engine. Sentifi’s AI engine scans 500 million news articles, blogs, forums, and tweets per day. It detects ESG events reported in these sources and updates the score based on the intensity and sentiment of the discussion around each ESG event. The goal of this project was to develop a machine learning model that uses Sentifi’s ESG score and related features (such as ESG events, sentiment, and attention) to select stocks in such a way that the resulting portfolio outperforms the market.
 
Using Sentifi's ESG scores and attention data as features, Eduardo, Anselme, and Rubén trained XGBoost models to predict the expected performance of stocks. By using their models to pick the most promising stocks from the S&P 500, they outperformed the base index by 20% over a period of 6 years and a random selection strategy by more than 45%. As the image below shows, this outperformance was achieved while the portfolio remained diversified across sectors. Furthermore, using machine learning explainability tools, they could demonstrate the relevance of Sentifi's ESG-related features. Specifically, they could show that higher scores for environmental and social awareness correlate with better overall performance.
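
The selection step that follows the model's predictions can be illustrated with a small sketch. It assumes the predicted scores already exist (for example, from a trained model) and uses a per-sector cap as one possible way to keep a portfolio diversified. The post does not state that the team used such a cap, and the tickers, sectors, and scores are made up.

```python
def select_portfolio(predictions, top_n=3, max_per_sector=1):
    """Pick the top_n highest-scoring stocks while limiting how many come from one sector."""
    chosen = []
    per_sector = {}
    ranked = sorted(predictions, key=lambda row: row[2], reverse=True)
    for ticker, sector, score in ranked:
        if per_sector.get(sector, 0) >= max_per_sector:
            continue  # sector is already full, skip to the next-best stock
        chosen.append(ticker)
        per_sector[sector] = per_sector.get(sector, 0) + 1
        if len(chosen) == top_n:
            break
    return chosen


# Hypothetical (ticker, sector, predicted score) rows.
predictions = [
    ("AAA", "Tech", 0.08),
    ("BBB", "Tech", 0.07),
    ("CCC", "Energy", 0.05),
    ("DDD", "Health", 0.04),
    ("EEE", "Energy", 0.03),
]
portfolio = select_portfolio(predictions)
print(portfolio)
```

Without the cap, the two highest-scoring technology stocks would both be selected; with it, the third slot goes to the best stock from another sector.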
 
 

Contovista: Categorizing Card Transactions

Students: Lingxuan Zhang, Juan Aguirre, Matthias Galipaud, Mevluet Polat
 
Many people run into financial problems because they struggle to keep their daily expenses under control. How much did I spend last week on restaurants? Did I spend too much on transportation last year? Contovista's services help to answer these questions in a simple and automatic way.
 
Lingxuan, Juan, Matthias, and Mevluet developed an automatic analysis pipeline that processes card transaction data in order to classify payments into specific categories of goods. The transaction data itself provides very limited information about the merchant involved, but merchants usually have websites from which further information can be retrieved. The project team started by developing an ML model for verifying whether the web pages found by a search engine actually belonged to the right merchant. In the next step, they used state-of-the-art multilingual deep learning models to categorize these websites. The project team and SIT Academy are proud that this work might help Contovista to further simplify online banking for all of us.
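
The verification step can be illustrated with a deliberately simple stand-in. Instead of a trained ML model, the sketch below assumes a token-overlap (Jaccard) score between the merchant text on the transaction and the title of each candidate web page. The threshold, merchant string, and page titles are hypothetical.

```python
import re


def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_score(merchant_text, page_title):
    """Jaccard similarity between the merchant text and a page title."""
    a, b = tokens(merchant_text), tokens(page_title)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_candidate(merchant_text, candidate_titles, threshold=0.3):
    """Return the best-matching page title, or None if no candidate is convincing."""
    if not candidate_titles:
        return None
    best = max(candidate_titles, key=lambda title: match_score(merchant_text, title))
    return best if match_score(merchant_text, best) >= threshold else None


# Hypothetical merchant string from a card transaction and search results.
search_results = [
    "Napoli travel guide",
    "Pizzeria Napoli - Zurich",
    "Zurich public transport",
]
print(best_candidate("PIZZERIA NAPOLI ZURICH", search_results))
print(best_candidate("XYZ STORE 0042", search_results))
```

A learned model can weigh many more signals, such as page content, address, and language, but the structure is the same: score each candidate page, then accept the best one only if it is convincing enough to pass on to the categorization step.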


Interested in reading more about Constructor Academy and tech related topics? Then check out our other blog posts.
