Improvement of selection criteria and prioritisation for neoantigen prediction

Phorutai Pearngam

dc.contributor.advisor	Trairak Pisitkun
dc.contributor.advisor	Sira Sriswasdi
dc.contributor.advisor	Thanyada Rungrotmongkol
dc.contributor.author	Phorutai Pearngam
dc.contributor.other	Chulalongkorn University. Graduate School
dc.date.accessioned	2023-02-03T03:53:46Z
dc.date.available	2023-02-03T03:53:46Z
dc.date.issued	2021
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/81453
dc.description	Thesis (Ph.D.)--Chulalongkorn University, 2021
dc.description.abstract	A tumour-specific neoantigen-based cancer vaccine is a potentially powerful treatment option that uses unique mutated peptides from tumour cells to boost the immune response and selectively attack cancer cells. Characterising the targeted peptides that the immune system can selectively recognise is therefore essential for this approach. However, a major problem in neoantigen prediction is the high rate of false positives, which leads to poor outcomes in clinical research and practice. This thesis aims to address some of the computational issues in neoantigen prediction, including developing more reliable statistics for assessing peptide binding to a major histocompatibility complex (MHC) protein and using machine learning to predict which peptides will generate an immune response. Specifically, the thesis introduces an approach to parameter estimation that uses a modified expectation maximisation (EM) framework with the method of moments for a two-component beta mixture model, which represents the distributions of true and false scores from peptide binding prediction. The parameters estimated from the model can then be used to estimate the false discovery rate (FDR) or a local, peptide-level statistic such as the posterior error probability (PEP), providing a robust method for selecting MHC-binding peptides. Next, the thesis introduces a new immunogenicity prediction model that uses machine learning to classify peptides as immunogenic or non-immunogenic. A data set of peptides classed as immunogenic or non-immunogenic was assembled, and physicochemical and homology features of the peptides were used to construct a Random Forest classifier for immunogenicity prediction. The two innovations were assembled into an end-to-end pipeline that provides a final probability describing true MHC binding ability and the potential for immunogenicity.
The final probability of MHC binding and T cell recognition provides a statistical framework that guides users in defining appropriate thresholds and in prioritising the peptides with the highest chance of being real neoantigens.
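The relationship between the two-component beta mixture, the PEP and the FDR described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the function names, the starting values, the simulated scores, and the conventions that scores lie in (0, 1) and that higher scores indicate true binders are all assumptions made for illustration. The M-step here simply matches weighted moments, which is one plausible reading of "EM with the method of moments".

```python
import math
import random

EPS = 1e-9  # keeps scores strictly inside (0, 1) before taking logs


def beta_pdf(x, a, b):
    """Beta(a, b) density at x, evaluated in log space for stability."""
    x = min(max(x, EPS), 1 - EPS)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def moments_to_beta(mean, var):
    """Method of moments: Beta(a, b) parameters matching a mean and variance."""
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def responsibilities(x, weights, params):
    """Posterior probability of each mixture component given score x."""
    dens = [w * beta_pdf(x, a, b) for w, (a, b) in zip(weights, params)]
    total = sum(dens)
    return [d / total for d in dens]


def fit_beta_mixture(scores, n_iter=100):
    """EM-style fit whose M-step uses weighted moment matching (illustrative)."""
    # Assumed starting point: component 0 (false) sits low, component 1 (true) high.
    weights, params = [0.5, 0.5], [(2.0, 5.0), (5.0, 2.0)]
    for _ in range(n_iter):
        # E-step: soft assignment of every score to the two components.
        resp = [responsibilities(x, weights, params) for x in scores]
        # M-step: weighted mean and variance per component, mapped to (a, b).
        for k in range(2):
            rk = sum(r[k] for r in resp)
            mean = sum(r[k] * x for r, x in zip(resp, scores)) / rk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, scores)) / rk
            params[k] = moments_to_beta(mean, var)
            weights[k] = rk / len(scores)
    return weights, params


def pep(x, weights, params):
    """Posterior error probability: chance that score x is from the false component."""
    return responsibilities(x, weights, params)[0]


def fdr(scores, threshold, weights, params):
    """Estimated FDR among accepted scores: the mean PEP of scores >= threshold."""
    accepted = [x for x in scores if x >= threshold]
    if not accepted:
        return 0.0
    return sum(pep(x, weights, params) for x in accepted) / len(accepted)


# Simulated scores (hypothetical data): 700 false and 300 true predictions.
random.seed(0)
scores = [random.betavariate(2, 8) for _ in range(700)]
scores += [random.betavariate(8, 2) for _ in range(300)]
weights, params = fit_beta_mixture(scores)
print("mixing weights (false, true):", weights)
print("PEP at score 0.5:", pep(0.5, weights, params))
print("estimated FDR for scores >= 0.5:", fdr(scores, 0.5, weights, params))
```

The PEP is a per-peptide quantity, while the FDR summarises a whole accepted set, so a user could pick the score threshold at which the estimated FDR first falls below a chosen tolerance.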
dc.description.abstractalternative	Neoantigens are peptides that carry mutations specific to a patient's tumour tissue. A cancer vaccine developed from neoantigens is one of the effective options for cancer treatment, because using unique mutated peptides from cancer cells can boost the patient's immune response and destroy cancer cells precisely. Identifying whether a given peptide can act as a neoantigen is therefore very important in cancer vaccine development. The main problem in neoantigen prediction is the high risk of false-positive predictions (False Discovery Rate, FDR), that is, obtaining peptides that score very well as predicted neoantigens but cannot bind MHC proteins or cannot stimulate an immune response. Errors at this prediction step make results at the laboratory or clinical level inaccurate. This research therefore aims to address the problems of neoantigen prediction with computational methods, by developing a model that can calculate the FDR from predicted scores indicating the binding ability between peptides and MHC proteins. The FDR calculation learns from the distribution of the prediction results and uses a mathematical principle (Expectation Maximisation) to estimate the statistical parameters consistent with that distribution. In addition, this thesis studies and develops a model for predicting the immunogenicity of peptides using the computational approach known as Machine Learning, specifically the Random Forest method. The model learns from a data set consisting of peptides that can stimulate an immune response and peptides that cannot, and its prediction gives a score indicating the likelihood that a peptide can stimulate an immune response. When the score from the FDR is combined with the probability of immunogenicity, the combined score indicates the probability that a peptide binds an MHC protein and is able to stimulate an immune response.
This combined score helps to select peptides that can be neoantigens efficiently and reduces errors in the neoantigen prediction process.
dc.language.iso	en
dc.publisher	Chulalongkorn University
dc.relation.uri	http://doi.org/10.58837/CHULA.THE.2021.16
dc.rights	Chulalongkorn University
dc.subject.classification	Multidisciplinary
dc.title	Improvement of selection criteria and prioritisation for neoantigen prediction
dc.title.alternative	Development of methods for selecting and ranking computational prediction results to identify peptides capable of acting as neoantigens
dc.type	Thesis
dc.degree.name	Doctor of Philosophy
dc.degree.level	Doctoral Degree
dc.degree.discipline	Bioinformatics and Computational Biology
dc.degree.grantor	Chulalongkorn University
dc.identifier.DOI	10.58837/CHULA.THE.2021.16