Abstract:
Innovation in technology is playing an important role in the development of the food industry, as indicated by the growing number of food review and food delivery applications. Similarly, the process of producing and packaging food itself is expected to become increasingly automated through robotic systems. The shift towards food automation would help ensure quality control of food products and improve production line efficiency. One key enabler for such automated systems is the ability to detect and classify food objects with high accuracy and speed.
In this study, we explore real-time food object segmentation using an RGB-D depth camera. Instance segmentation based on 2D RGB data is used to classify Japanese food objects at the pixel level with the Cascade Mask R-CNN and Hybrid Task Cascade (HTC) deep learning models. The models are trained on both a local GPU and a cloud service. The precision and recall values for classifying food objects under different scenario conditions are investigated. Furthermore, we construct 3D point clouds of food objects using depth information from the camera, which facilitates food automation operations such as precision grasping of food objects of various shapes and sizes.
The results show that the trained HTC model achieves better precision than the Cascade Mask R-CNN model, albeit at a lower detection speed. The inference speed of both models decreases monotonically as the number of food objects and the resolution of the processed image increase. In addition, it is found that the accuracy of HTC detection can be quite sensitive to environmental factors such as background color, low brightness, and incomplete objects. The 2D segmentation results are combined with 3D point cloud extraction to realize real-time 3D segmentation of Japanese food objects at an average frame rate of 6.71 fps.