Universität Bielefeld

[MA]

Diffusion Models and Modern Deep Learning Architectures and Their Application to Genetic Data

Contact: Philip Kenneweg

We want to take a closer look at how deep learning methods can be applied to the human genome. At present, this is mostly done on small sections of the genome [2] rather than on the whole genome at once. In this master's thesis, we would crawl publicly available datasets [1] for single-nucleotide polymorphism (SNP) data and associated medical information, combine the two, pre-train various architectures [3] on the result, and analyze the learned embedding space and the results.
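To make the preprocessing step more concrete, here is a minimal sketch of how SNP genotype calls might be turned into model input. It is an illustration under assumptions, not the planned method: the VCF-style genotype strings, the token values, the padding token, and the helper names `encode_genotypes` and `to_patches` are all hypothetical. The grouping into fixed-size patches is only loosely analogous to the image patches used by the Vision Transformer [3].

```python
# Hypothetical sketch: map SNP genotype calls to integer tokens and group
# them into fixed-size patches. All names and values here are assumptions.

# Count of alternate alleles per call (assumed VCF-style genotype strings).
GENOTYPE_TO_TOKEN = {"0/0": 0, "0/1": 1, "1/0": 1, "1/1": 2}
MISSING_TOKEN = 3  # used for "./." or any unrecognised call
PAD_TOKEN = 4      # fills the last patch up to patch_size


def encode_genotypes(calls):
    """Convert a list of genotype strings into a list of integer tokens."""
    return [GENOTYPE_TO_TOKEN.get(call, MISSING_TOKEN) for call in calls]


def to_patches(tokens, patch_size):
    """Split a token sequence into consecutive patches of equal length."""
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    patches = []
    for start in range(0, len(tokens), patch_size):
        patch = tokens[start:start + patch_size]
        # Pad the final patch so every patch has the same length.
        patch = patch + [PAD_TOKEN] * (patch_size - len(patch))
        patches.append(patch)
    return patches


calls = ["0/0", "0/1", "1/1", "./.", "1/0"]
tokens = encode_genotypes(calls)
patches = to_patches(tokens, patch_size=2)
print(tokens)
print(patches)
```

Each patch could then be embedded and fed to a Transformer-style encoder. How to order SNPs, how large to make the patches, and how to attach the medical information are open design questions for the thesis.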

Keywords: Genetics, Deep Learning, Transformers, Diffusion Models

Literature

  1. Publicly available data: https://www.personalgenomes.org/

  2. DNA Diffusion: https://github.com/pinellolab/DNA-Diffusion

  3. Vision Transformer (ViT): https://arxiv.org/abs/2010.11929