Performance Improvement of Small-File Access for Hadoop Archive

จตุพร วรพงศ์กิติพันธ์

Please use this identifier to cite or link to this item: https://cuir.car.chula.ac.th/handle/123456789/43809

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	ณัฐวุฒิ หนูไพโรจน์	en_US
dc.contributor.author	จตุพร วรพงศ์กิติพันธ์	en_US
dc.contributor.other	Chulalongkorn University. Faculty of Engineering	en_US
dc.date.accessioned	2015-06-24T06:45:05Z
dc.date.available	2015-06-24T06:45:05Z
dc.date.issued	2556	en_US
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/43809
dc.description	Thesis (M.Eng.)--Chulalongkorn University, 2556	en_US
dc.description.abstract	Hadoop Distributed File System (HDFS) is an open source system designed to run on commodity hardware and suited to working with large data sets (terabytes). It has a master-slaves architecture in which a single NameNode acts as the master and manages the metadata of the slaves in the system. This makes the NameNode a bottleneck, especially when it has to serve a large number of small files, because the NameNode keeps all HDFS metadata in main memory, so memory is used inefficiently when there are many small files. To address this problem, this research proposes a mechanism that manages memory more appropriately and improves the efficiency of small-file access on HDFS, using the working principles of Hadoop Archive (HAR) as its basis. The research presents a new form of Hadoop Archive called New Hadoop Archive (NHAR), which redesigns the structure of HAR to improve access efficiency. In addition, this research extends the capabilities of HAR by modifying its structure so that files can be added to an existing archive. The experimental results show that the proposed method improves the efficiency of accessing small files by up to 85.47% compared with small-file access in HAR.	en_US
dc.description.abstractalternative	The Hadoop Distributed File System (HDFS) is an open source system designed to run on commodity hardware and suited to applications with large data sets (terabytes). HDFS is based on a master-slaves architecture in which a single NameNode serves as the master and manages metadata for multiple slaves. The NameNode often becomes a bottleneck, especially when handling a large number of small files, because it stores the entire metadata of HDFS in its main memory; with too many small files, memory usage can be inefficient. We propose a mechanism, based on Hadoop Archive (HAR), to improve memory utilization for metadata and enhance the efficiency of accessing small files in HDFS. Called New Hadoop Archive (NHAR), it redesigns the architecture of HAR to improve the efficiency of accessing small files. In addition, we extend the capabilities of HAR to allow additional files to be inserted into existing archive files. Our experimental results show that our approach drastically improves the access efficiency of small files, outperforming HAR by up to 85.47%.	en_US
dc.language.iso	th	en_US
dc.publisher	Chulalongkorn University	en_US
dc.relation.uri	http://doi.org/10.14457/CU.the.2013.1271	-
dc.rights	Chulalongkorn University	en_US
dc.subject	ระบบคอมพิวเตอร์
dc.subject	วิศวกรรมคอมพิวเตอร์
dc.subject	Computer systems
dc.subject	Computer engineering
dc.title	การเพิ่มประสิทธิภาพการเข้าถึงไฟล์ขนาดเล็กสำหรับฮาดูปอาร์ไคฟส์	en_US
dc.title.alternative	PERFORMANCE IMPROVEMENT OF SMALL-FILE ACCESS FOR HADOOP ARCHIVE	en_US
dc.type	Thesis	en_US
dc.degree.name	Master of Engineering	en_US
dc.degree.level	Master's Degree	en_US
dc.degree.discipline	Computer Engineering	en_US
dc.degree.grantor	Chulalongkorn University	en_US
dc.email.advisor	natawut.n@chula.ac.th	en_US
dc.identifier.DOI	10.14457/CU.the.2013.1271	-
Appears in Collections:	Eng - Theses

Files in This Item:

File	Description	Size	Format
5470136021.pdf		2.72 MB	Adobe PDF
