AI ALGORITHMS FOR GENOMIC DATA ANALYSIS AND DISEASE RISK PREDICTION
Abstract
Over the past two decades, the generation of genomic and other omics data has increased exponentially. High-throughput sequencing technologies, genome-wide association studies (GWAS), and large-scale biobank projects have produced datasets containing millions of genetic variants across diverse human populations. These data promise deep insights into the genetic contributions to disease risk, progression, and therapeutic response. However, conventional statistical models such as linear or logistic regression frequently fail to capture complex genotype–phenotype relationships, including epistasis (gene–gene interactions), nonlinear effects, and the influence of regulatory or epigenetic factors, especially in complex diseases (Cordell, 2009).
