Abstract:
A tumour-specific neoantigen-based cancer vaccine is a potentially powerful treatment option that uses unique mutated peptides from tumour cells to boost the immune response and selectively attack cancer cells. Characterising the peptides that the immune system can selectively recognise is therefore essential to this approach. However, a major problem in neoantigen prediction is the occurrence of false positives, which leads to poor outcomes in clinical research and practice. This thesis aims to address some of the computational issues in neoantigen prediction, including developing more reliable statistics for assessing peptide binding to a major histocompatibility complex (MHC) protein and using machine learning to predict which peptides will generate an immune response. Specifically, the thesis introduces an approach to parameter estimation that uses a modified expectation maximisation (EM) framework with the method of moments for a two-component beta mixture model representing the distributions of true and false scores from peptide binding prediction. The parameters estimated from the model can then be used to estimate the false discovery rate (FDR) or a local peptide-level statistic such as the posterior error probability (PEP), providing a robust method for selecting MHC-binding peptides. Next, the thesis introduces a new immunogenicity prediction model that uses machine learning to classify peptides as immunogenic or non-immunogenic. A data set was assembled containing peptides classed as immunogenic or non-immunogenic, and physicochemical and homology features of the peptides were used to construct a Random Forest classifier for immunogenicity prediction. The two innovations were assembled into an end-to-end pipeline that provides a final probability describing true MHC binding ability and the potential for immunogenicity.
The final probability of MHC binding and T cell recognition provides a statistical framework to guide users in defining appropriate thresholds and prioritising the peptides with the highest chance of being real neoantigens.
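
To make the statistical idea concrete, the following is a minimal sketch of fitting a two-component beta mixture by alternating an E-step with a method-of-moments update, then deriving PEP and FDR from the fitted posteriors. It is a generic illustration, not the implementation developed in the thesis. The function names, the median-based initialisation, the simulated scores, and the assumptions that scores lie in (0, 1) and that higher scores indicate true binders are all hypothetical.

```python
import math
import random


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def moments_to_beta(mean, var):
    """Method of moments: map a mean and variance to Beta shape parameters."""
    k = mean * (1 - mean) / var - 1
    return mean * k, (1 - mean) * k


def weighted_beta(xs, ws):
    """Moment-based Beta fit where each score carries a weight."""
    total = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total
    return moments_to_beta(mean, max(var, 1e-9))


def fit_beta_mixture(scores, n_iter=100, eps=1e-6):
    """Return (pi_true, false_params, true_params, responsibilities)."""
    # Keep scores strictly inside (0, 1) so the logarithms are defined.
    xs = [min(max(s, eps), 1 - eps) for s in scores]
    # Assumed initialisation: scores above the median lean towards "true".
    cut = sorted(xs)[len(xs) // 2]
    resp = [0.9 if x > cut else 0.1 for x in xs]
    for _ in range(n_iter):
        # Moment-based update replaces the usual likelihood maximisation.
        pi_true = sum(resp) / len(resp)
        a1, b1 = weighted_beta(xs, resp)
        a0, b0 = weighted_beta(xs, [1 - r for r in resp])
        # E-step: posterior probability that each score is a true binder.
        new_resp = []
        for x in xs:
            t = pi_true * beta_pdf(x, a1, b1)
            f = (1 - pi_true) * beta_pdf(x, a0, b0)
            r = t / (t + f) if t + f > 0 else 0.5
            new_resp.append(min(max(r, 1e-12), 1 - 1e-12))
        resp = new_resp
    return pi_true, (a0, b0), (a1, b1), resp


def fdr_at(scores, peps, threshold):
    """Estimated FDR among accepted scores: the mean PEP above the threshold."""
    accepted = [p for s, p in zip(scores, peps) if s >= threshold]
    return sum(accepted) / len(accepted) if accepted else 0.0


# Simulated scores only: 600 "false" and 400 "true" draws.
random.seed(0)
scores = [random.betavariate(2, 8) for _ in range(600)]
scores += [random.betavariate(8, 2) for _ in range(400)]
pi_true, false_ab, true_ab, resp = fit_beta_mixture(scores)
peps = [1 - r for r in resp]  # PEP = posterior probability of being false
print(pi_true, false_ab, true_ab, fdr_at(scores, peps, 0.7))
```

In this sketch the PEP is a per-peptide quantity, while the FDR summarises all peptides accepted at a given score threshold, which is why a threshold can be chosen by scanning candidate values until the estimated FDR falls below a target level.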