Federated genetic data analysis using DataSHIELD
- Juan R González
The GCAT cohort includes sensitive data, such as administrative health records and genomic information, which are used to define the genetic risk of the conditions under study. However, in complex diseases the effect of each gene is small, so meta-analysis of large cohorts is required to obtain significant findings.
Assembling large cohorts is not feasible because of the huge effort involved, and joint analysis based on summary statistics does not allow fine-grained, in-depth analysis because of the aggregated nature of the data. An analysis framework is therefore needed that enables cross-cohort analysis and the integration of sensitive data, such as genomic and clinical data, preserving the required privacy while still allowing the scientific questions posed to GCAT to be addressed.
We will test the scientific, practical and legal advantages and challenges of privacy-preserving technologies by implementing genetic data analysis with DataSHIELD, a federated data infrastructure for health data in which researchers have non-disclosive access to sensitive personal data (i.e., summarized data) within secure processing environments hosted by independent research organizations.
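
The federated idea described above can be illustrated with a minimal Python sketch. It is not DataSHIELD code and does not use the DataSHIELD API: the `Site` class, the `summary` method, the `pooled_mean` function and the `MIN_COUNT` threshold are all hypothetical names and values chosen for illustration. The sketch only shows the general principle that each data host releases aggregates, never individual-level records, and that the researcher combines those aggregates.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical disclosure threshold (an assumption, not a DataSHIELD default).
MIN_COUNT = 5


@dataclass
class Aggregate:
    """Non-disclosive summary released by a site."""
    n: int
    total: float


class Site:
    """Hypothetical data host: individual-level values stay inside this object."""

    def __init__(self, values: List[float]):
        self._values = list(values)

    def summary(self) -> Aggregate:
        # Refuse to release a summary computed on too few records.
        if len(self._values) < MIN_COUNT:
            raise ValueError("too few records to release a summary")
        return Aggregate(n=len(self._values), total=sum(self._values))


def pooled_mean(sites: List[Site]) -> float:
    """Combine per-site aggregates into a mean over all participants."""
    aggregates = [site.summary() for site in sites]
    n = sum(a.n for a in aggregates)
    return sum(a.total for a in aggregates) / n


# Toy values standing in for a phenotype measured at two cohorts.
sites = [Site([1.0, 2.0, 3.0, 4.0, 5.0]), Site([2.0, 2.0, 2.0, 2.0, 2.0, 2.0])]
result = pooled_mean(sites)
```

Because the researcher only ever sees counts and sums, the pooled mean equals the mean that would be obtained from the physically merged data, without any record leaving its host.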