Technique for predicting an ambiguous nucleotide symbol in a DNA sequence

Kitiporn Plaimas

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/3607

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Chidchanok Lursinsap	-
dc.contributor.author	Kitiporn Plaimas	-
dc.date.accessioned	2007-07-04T03:06:05Z	-
dc.date.available	2007-07-04T03:06:05Z	-
dc.date.issued	2004	-
dc.identifier.isbn	9741764987	-
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/3607	-
dc.description	Thesis (M.Sc.)--Chulalongkorn University, 2004	en
dc.description.abstract	DNA sequences obtained from a DNA sequencer usually contain occurrences of the ambiguous symbol N, which can be interpreted as A, T, C, or G. This ambiguity can affect subsequent analysis of the DNA sequence. This research transforms the problem into one of recognizing the sequence that precedes the symbol N. Under the assumption that each nucleotide and its position may be related to the neighboring nucleotides, relative positions are used as the features of the sequence during the learning and recognition processes of a neural network for each nucleotide. However, learning these features from a training set can take a long time, so increasing the training speed through parallel recognition was also investigated. In experiments on four Escherichia coli genomes, several similar regions of about 40,000 bases each were selected from arbitrary locations. Each region was used to train an artificial neural network to recognize these similarities and predict the actual symbol of N. On randomly drawn query test sets, the recognition accuracy was more than 80%.	en
dc.description.abstractalternative	DNA sequences, that is, sequences of the nucleotides A, T, C, and G obtained from the cells of living organisms by a DNA sequencer, may be incomplete: some positions are given as an ambiguous symbol such as N, which stands for A, T, C, or G. This research studies how to turn this problem into one of recognizing the sequence that precedes the symbol N, based on the hypothesis that the nucleotide at each position of a DNA sequence is related to the nucleotides in its neighborhood. The relative positions of nucleotides are therefore the main features used for training and recognition by the artificial neural network. However, recognizing all the features of the training data takes a long time, so speeding up recognition through parallelism was also considered. Experiments were carried out on the genomes of four Escherichia coli strains, randomly selecting several regions of more than 40,000 bases whose nucleotide orderings are similar, regardless of their location, so that the network could learn what influences the occurrence of the next nucleotide and predict the actual symbol of N. When randomly sampled data were used to test the network's predictions, the recognition accuracy was more than 80%.	en
dc.format.extent	914416 bytes	-
dc.format.mimetype	application/pdf	-
dc.language.iso	en	en
dc.publisher	Chulalongkorn University	en
dc.rights	Chulalongkorn University	en
dc.subject	Nucleotide sequence	en
dc.subject	Neural networks (Computer sciences)	en
dc.title	Technique for predicting an ambiguous nucleotide symbol in a DNA sequence	en
dc.title.alternative	Technique for predicting ambiguous nucleotide symbols in DNA sequences	en
dc.type	Thesis	en
dc.degree.name	Master of Science	en
dc.degree.level	Master's Degree	en
dc.degree.discipline	Computational Science	en
dc.degree.grantor	Chulalongkorn University	en
Appears in Collections:	Sci - Theses
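The abstract's core idea, predicting the base hidden behind an N from the nucleotides that precede it, can be illustrated with a small sketch. This is not the thesis's method, which this record does not detail: the window length `k`, the one-hot encoding of relative positions, and the use of a single sigmoid neuron per nucleotide (trained by stochastic gradient descent) as a stand-in for "a neural network for each nucleotide" are all assumptions, and every function name below is hypothetical.

```python
import math
import random

BASES = "ATCG"  # output classes; N is the ambiguous symbol to be resolved


def encode_prefix(prefix):
    """One-hot encode a window: the base at relative position i fills features 4*i..4*i+3.

    A base outside BASES (for example N) is encoded as all zeros.
    """
    features = []
    for base in prefix:
        features.extend(1.0 if base == b else 0.0 for b in BASES)
    return features


def make_training_set(seq, k):
    """Pair every k-base prefix with the base that follows it, skipping any pair containing N."""
    examples = []
    for i in range(k, len(seq)):
        window, target = seq[i - k:i], seq[i]
        if "N" in window or target == "N":
            continue
        examples.append((encode_prefix(window), target))
    return examples


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(unit, x):
    """Output of one sigmoid neuron (weights w, bias b) on feature vector x."""
    w, b = unit
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_unit(examples, base, epochs=100, lr=0.5, seed=0):
    """Train one sigmoid neuron by SGD to answer: is the next base equal to `base`?"""
    rng = random.Random(seed)
    w = [rng.uniform(-0.01, 0.01) for _ in range(len(examples[0][0]))]
    b = 0.0
    for _ in range(epochs):
        for x, target in examples:
            # Gradient of the log-loss with respect to the pre-activation is (p - y).
            g = score((w, b), x) - (1.0 if target == base else 0.0)
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def train(seq, k):
    """Train one independent unit per nucleotide on all N-free k-base prefixes of `seq`."""
    examples = make_training_set(seq, k)
    if not examples:
        raise ValueError("no N-free windows to train on")
    return {base: train_unit(examples, base) for base in BASES}


def predict_n(seq, pos, units, k):
    """Guess the base behind the symbol at `pos` from the k bases that precede it."""
    if pos < k:
        raise ValueError("need at least k bases before the ambiguous position")
    x = encode_prefix(seq[pos - k:pos])
    return max(units, key=lambda base: score(units[base], x))


if __name__ == "__main__":
    units = train("ATCG" * 20, k=3)
    # In this toy training sequence the prefix "ATC" is always followed by G.
    print(predict_n("ATCGATCN", 7, units, k=3))  # prints G
```

Because the four per-nucleotide units share no parameters, they could be trained concurrently (for example, one per process), which is one simple reading of the abstract's interest in parallel recognition; the parallel scheme actually used in the thesis is not described in this record.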

Files in This Item:

File	Description	Size	Format
Kitiporn.pdf		1.21 MB	Adobe PDF
