We want to take a closer look at how deep learning methods can be applied to the human genome. At the moment, this is mostly done on small sections of the genome [2] rather than on the whole genome at once. In this master's thesis, we will crawl publicly available datasets [1] for SNP data and the associated medical information. We will then combine these sources into a single dataset, pre-train various architectures [3] on it, and analyze the learned embedding space and the results.
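As a rough illustration of the data preparation step, the sketch below shows one possible way to turn SNP genotypes into integer tokens and group them into fixed-size patches, similar in spirit to how a ViT splits an image. The token vocabulary, the function names `encode_genotype` and `to_patches`, and the example SNPs are assumptions made for illustration only; the actual encoding is an open design question for the thesis.

```python
from typing import List, Tuple

# Hypothetical vocabulary: 0 is padding / missing call, and 1..3 encode the
# number of alternate alleles (0, 1 or 2) shifted by one.
PAD = 0


def encode_genotype(ref: str, alt: str, genotype: str) -> int:
    """Map a two-letter genotype such as 'AG' to a token id."""
    if len(genotype) != 2 or any(a not in (ref, alt) for a in genotype):
        return PAD  # no-call or an allele we do not expect at this site
    return 1 + sum(a == alt for a in genotype)


def to_patches(tokens: List[int], patch_size: int) -> List[List[int]]:
    """Split a token sequence into equal-sized patches, padding the last one."""
    padded = tokens + [PAD] * (-len(tokens) % patch_size)
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]


# Made-up example SNPs as (reference allele, alternate allele, genotype).
snps: List[Tuple[str, str, str]] = [
    ("A", "G", "AA"),
    ("C", "T", "CT"),
    ("G", "A", "AA"),
    ("T", "C", "--"),
    ("A", "C", "AC"),
]
tokens = [encode_genotype(ref, alt, gt) for ref, alt, gt in snps]
patches = to_patches(tokens, patch_size=4)
```

Each patch could then be embedded and fed to a Transformer encoder for pre-training. Counting alternate alleles discards phase information, which may or may not be acceptable depending on the data source.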
Keywords: Genetics, Deep Learning, Transformers, Diffusion Models
Literature
[1] Publicly available data: https://www.personalgenomes.org/
[2] DNA Diffusion: https://github.com/pinellolab/DNA-Diffusion
[3] Vision Transformer (ViT): https://arxiv.org/abs/2010.11929