Definition of a robust machine learning tool integrating metabolomic and genomic information for biomarker discovery in high-dimensional data. Obesity and T2DM in the GCAT cohort. HealthForecast
- Dr. Rafael de Cid (IGTP), Vicent Ripoll (EURECAT)
- Disease Genomics-GCATlab Group-The Institute for Health Science Research Germans Trias i Pujol (IGTP) – PMPPC. Can Ruti Biomedical Campus
HealthForecast aims to build a prototype Big Data platform for connecting, processing and analysing data from the GCAT study. Joint analysis of these data will help identify correlations between genetic variants and common diseases (such as cancer, cardiovascular disease or metabolic diseases), drug use, or acute episodes that required hospitalization. Over the last decade, interest in applying machine learning algorithms to genomic data has increased significantly across a variety of bioinformatics applications. Analysing this type of data means tackling the difficulties of high dimensionality and class imbalance in order to extract knowledge and identify important features. In this project we address these challenges by stacking different machine learning algorithms and techniques to classify samples and identify relevant SNPs.
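The stacking idea described above can be illustrated with a minimal, dependency-free Python sketch. It assumes SNP genotypes coded as minor-allele counts (0/1/2) and binary case/control labels. The SNP filter (per-class mean difference), the two base learners (nearest centroid and an inverse-frequency-weighted k-NN), the class-weighted logistic meta-learner and the `toy_cohort` data are all hypothetical choices made for illustration, not the method used in HealthForecast. Class imbalance is handled by computing statistics within each class and weighting votes and losses by inverse class frequency; the meta-learner is trained on out-of-fold base-learner scores so that it never sees predictions a base learner made on its own training data.

```python
import math
import random

# Illustrative sketch, not the HealthForecast pipeline; names and parameters are assumptions.

def sigmoid(s):
    s = max(-30.0, min(30.0, s))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-s))

def class_weights(y):
    # Inverse-frequency weights: cases and controls contribute equally overall.
    return {c: len(y) / (2 * y.count(c)) for c in (0, 1)}

def class_means(X, y, label):
    # Per-SNP mean genotype (minor-allele count 0/1/2) within one class.
    rows = [x for x, t in zip(X, y) if t == label]
    return [sum(col) / len(rows) for col in zip(*rows)]

def select_snps(X, y, k):
    # Rank SNPs by |case mean - control mean| (per-class means resist imbalance).
    mu1, mu0 = class_means(X, y, 1), class_means(X, y, 0)
    diff = [abs(a - b) for a, b in zip(mu1, mu0)]
    return sorted(range(len(diff)), key=lambda j: diff[j], reverse=True)[:k]

def centroid_learner(X, y):
    # Base learner 1: nearest centroid; a positive score means closer to cases.
    mu1, mu0 = class_means(X, y, 1), class_means(X, y, 0)
    def score(x):
        d1 = sum((a - b) ** 2 for a, b in zip(x, mu1))
        d0 = sum((a - b) ** 2 for a, b in zip(x, mu0))
        return (d0 - d1) / len(x)
    return score

def knn_learner(X, y, k=5):
    # Base learner 2: k-NN whose votes are weighted by inverse class frequency.
    w = class_weights(y)
    def score(x):
        near = sorted((sum((a - b) ** 2 for a, b in zip(x, xi)), t)
                      for xi, t in zip(X, y))[:k]
        return sum(w[t] for _, t in near if t == 1) / sum(w[t] for _, t in near)
    return score

def out_of_fold_scores(X, y, n_folds=5, seed=0):
    # Level-1 features: each sample is scored only by base learners fitted on
    # the other folds; stratified folds keep cases in every training split.
    rng, folds = random.Random(seed), [[] for _ in range(n_folds)]
    for c in (0, 1):
        members = [i for i, t in enumerate(y) if t == c]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % n_folds].append(i)
    Z = [None] * len(X)
    for held in folds:
        train = sorted(set(range(len(X))) - set(held))
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        learners = [centroid_learner(Xtr, ytr), knn_learner(Xtr, ytr)]
        for i in held:
            Z[i] = [s(X[i]) for s in learners]
    return Z

def fit_meta(Z, y, epochs=500, lr=0.1):
    # Meta-learner: class-weighted logistic regression fitted by gradient descent.
    w = class_weights(y)
    beta = [0.0] * (len(Z[0]) + 1)  # last entry is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for z, t in zip(Z, y):
            zz = z + [1.0]
            p = sigmoid(sum(b * v for b, v in zip(beta, zz)))
            for i, v in enumerate(zz):
                grad[i] += w[t] * (p - t) * v
        beta = [b - lr * g / len(y) for b, g in zip(beta, grad)]
    return beta

def fit_stacked(X, y, n_snps=10):
    # For brevity SNP selection runs once on all training data; a stricter
    # version would repeat it inside each fold to avoid selection bias.
    snps = select_snps(X, y, n_snps)
    Xs = [[x[j] for j in snps] for x in X]
    beta = fit_meta(out_of_fold_scores(Xs, y), y)
    learners = [centroid_learner(Xs, y), knn_learner(Xs, y)]
    def predict_proba(x):
        z = [s([x[j] for j in snps]) for s in learners] + [1.0]
        return sigmoid(sum(b * v for b, v in zip(beta, z)))
    return snps, predict_proba

def toy_cohort(n, seed, n_snps=50, causal=(3, 7)):
    # Hypothetical synthetic data: about 20% cases, who carry a higher
    # minor-allele frequency (0.7 vs 0.3) at the `causal` SNPs only.
    rng, X, y = random.Random(seed), [], []
    for _ in range(n):
        t = int(rng.random() < 0.2)
        p = [0.7 if t and j in causal else 0.3 for j in range(n_snps)]
        X.append([(rng.random() < q) + (rng.random() < q) for q in p])
        y.append(t)
    return X, y

X_train, y_train = toy_cohort(200, seed=1)
snps, predict_proba = fit_stacked(X_train, y_train)
```

Here `snps` holds the indices of the retained SNPs and `predict_proba(x)` returns a case probability for a new genotype vector. In practice a library such as scikit-learn, whose `StackingClassifier` builds the cross-validated meta-features internally, would replace most of this code; the hand-written version only makes each step of the stacking procedure explicit.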