Real-time instance segmentation and point cloud extraction for Japanese food using RGB-D camera

Suthiwat Yarnchalothorn

dc.contributor.advisor	Nattapol Damrongplasit
dc.contributor.advisor	Hayashi Eiji
dc.contributor.author	Suthiwat Yarnchalothorn
dc.contributor.other	Chulalongkorn University. Faculty of Engineering
dc.date.accessioned	2021-09-22T23:39:26Z
dc.date.available	2021-09-22T23:39:26Z
dc.date.issued	2020
dc.identifier.uri	http://cuir.car.chula.ac.th/handle/123456789/77290
dc.description	Thesis (M.Eng.)--Chulalongkorn University, 2020
dc.description.abstract	Technological innovation plays an important role in the development of the food industry, as indicated by the growing number of food review and food delivery applications. Similarly, food production and packaging are expected to become increasingly automated through robotic systems. This shift toward food automation would help ensure quality control of food products and improve production-line efficiency. One key enabler for such automated systems is the ability to detect and classify food objects accurately and quickly. In this study, we explore real-time food object segmentation using an RGB-D camera. Instance segmentation on 2D RGB data is used to classify Japanese food objects at the pixel level with two deep learning models, Cascade Mask R-CNN and Hybrid Task Cascade (HTC). The models are trained on both a local GPU and a cloud service. Precision and recall for classifying food objects are investigated under different scenarios. Furthermore, we construct 3D point clouds of food objects using depth information from the camera, which will help facilitate food automation operations such as precision grasping of food objects of various shapes and sizes. The results show that the trained HTC model achieves better precision than the Cascade Mask R-CNN model, albeit at a lower detection speed. The inference speed of both models decreases monotonically as the number of food objects and the resolution of the processed image increase. In addition, the accuracy of HTC detection is found to be quite sensitive to environmental factors such as background color, low brightness, and incomplete objects. The 2D segmentation result is combined with 3D point cloud extraction to realize real-time 3D segmentation of Japanese food objects at an average frame rate of 6.71 fps.
dc.description.abstractalternative	Innovation is currently driving the development of the food industry, as seen in the growing popularity of online food reviews and fast food delivery businesses. Likewise, food production and packaging are expected to shift widely from manual labor to automation using robots. This change will allow producers to control food quality and improve the efficiency of the production process. One key factor that makes this possible is the ability to detect and classify food in images accurately and at high speed. In this research, we study real-time detection of food objects using images from a depth camera. The chosen method is pixel-level object detection of Japanese food items using deep learning with multi-layer neural networks. Two models are used: Cascade Mask R-CNN and Hybrid Task Cascade (HTC). Both models are trained on two platforms: a local computer and a cloud service. The resulting models are then studied under various conditions. In addition, depth data from the camera is combined with the object detection results from the first step to extract the 3D coordinates of food objects, which can be used in automated food production, for example to accurately pick and place food items of varied shapes and sizes. The experimental results show that the HTC model has higher precision than the Cascade Mask R-CNN model on both training platforms, but a slower detection speed. The detection speed of both models also tends to decrease as the number of objects in the image increases and as the image resolution increases. Furthermore, the results show that environmental changes, namely changing the background color, reducing brightness, overlapping food objects, and using incomplete food items, reduce the accuracy of the HTC model. Finally, the 3D coordinates of the food objects were extracted at an average speed of 6.71 frames per second.
dc.language.iso	en
dc.publisher	Chulalongkorn University
dc.relation.uri	http://doi.org/10.58837/CHULA.THE.2020.148
dc.rights	Chulalongkorn University
dc.subject.classification	Engineering
dc.subject.classification	Computer Science
dc.title	Real-time instance segmentation and point cloud extraction for Japanese food using RGB-D camera
dc.title.alternative	Real-time pixel-level object detection and 3D coordinate extraction for Japanese food using an RGB-D camera
dc.type	Thesis
dc.degree.name	Master of Engineering
dc.degree.level	Master's Degree
dc.degree.discipline	Cyber-Physical System
dc.degree.grantor	Chulalongkorn University
dc.identifier.DOI	10.58837/CHULA.THE.2020.148
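
The abstract describes combining a 2D instance mask with camera depth to obtain a 3D point cloud for each food object. The record does not say how the thesis implements this step, so the following is only a minimal sketch under stated assumptions: a pinhole camera model, a depth image in metres aligned pixel-for-pixel with the RGB image, and intrinsics `fx`, `fy`, `cx`, `cy` supplied by the caller. The function name `mask_to_point_cloud` and all parameter names are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def mask_to_point_cloud(
    depth_m: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project masked depth pixels into 3D camera-frame points.

    Assumptions (not from the thesis): pinhole model, depth in metres,
    depth and mask share the same shape, and depth <= 0 means "no reading".
    """
    if len(depth_m) != len(mask):
        raise ValueError("depth and mask must have the same number of rows")

    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth_m, mask)):
        if len(depth_row) != len(mask_row):
            raise ValueError("depth and mask rows must have the same length")
        for u, (z, inside) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside the instance mask or without valid depth.
            if not inside or z <= 0:
                continue
            # Pinhole back-projection: pixel (u, v) at depth z -> (x, y, z).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, float(z)))
    return points
```

Calling the function once per detected instance mask would give one point set per food object. A real-time pipeline would more likely vectorize this loop with an array library, which this sketch leaves out to stay dependency-free.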