Abstract:
Human genomic research has been concentrated in populations of European descent, leaving a large portion of the global population, including Thais, underrepresented. This bias in representation limits the transferability of genetic findings to understudied populations and exacerbates health disparities. This study aims to examine medically relevant genetic variation in the Thai population using whole-genome sequences. The study examined the prevalence of pharmacogenomic variants (part I), variants associated with autosomal recessive disorders (part II), and risk alleles recently identified as associated with severe COVID-19 symptoms (part III). The study further examined the effect of genetic variation in Thais on reference panel selection for genotype imputation (part IV). In pharmacogenomics, over 25% of Thais carried a high-risk diplotype in the CYP3A5, CYP2C19, CYP2D6, NAT2, SLCO1B1, and UGT1A1 genes. Allele frequencies of CYP3A5*3 (rs776746), CYP2B6*6 (rs2279343), and NAT2 (rs1041983) were significantly higher in Thais than in East Asian and global populations. In addition, 121 previously unreported variants have the potential to exert clinical impact; the majority were rare and population-specific, with 60.3% absent from the gnomAD database. Among variants associated with autosomal recessive disorders, 263 pathogenic or likely pathogenic variants were identified, including 6 well-established pathogenic variants with a carrier rate higher than 0.01. Analysis of variant distribution based on genetic structure showed significant enrichment of pathogenic variants associated with thalassemia, galactosaemia, and deafness in some subpopulations. For severe COVID-19 risk alleles, the frequency of the risk allele at the 3p21.31 locus, which was highly correlated with disease severity and replicated in multiple studies, was found to differ vastly among Southeast Asians.
Allele frequencies ranged from 0.21 in the Filipino population to 0.06 in the Thai population, and the risk allele was extremely rare in Northeast Asians. Lastly, the choice of reference panel was shown to strongly affect imputation performance. While imputation using the TOPMed panel yielded the largest number of variants (~271 million), GenomeAsia 100K achieved the best imputation accuracy, with a median genotype concordance rate of 0.97. GenomeAsia 100K also offered the best accuracy for rare variants, with a 30.3% reduction in concordance rate. In conclusion, this study reports genetic variation in Thais that is clinically relevant across different fields of medical science. The findings provide essential information with a wide range of applications, from the design of genetic testing to the conduct of genomic research. In addition to the prevalence of multiple variants in Thais differing from that in other global populations, a large number of the variants identified are population-specific. This stresses the importance of constructing a Thai genetic database with a larger sample size to enable a better understanding of the low-frequency and rare variants in the population, which often exert higher clinical impact.