Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/138259
Type: Thesis
Title: Deep Learning for Robotic Scene Understanding
Author: Sun, Libo
Issue Date: 2022
School/Discipline: School of Computer Science
Abstract: Scene understanding is a complex yet essential task for intelligent robots, and achieving reliable scene understanding remains a challenging problem. With the widespread success of deep learning, breakthroughs have been made in many areas. In this thesis, we investigate how deep learning-based methods can significantly improve the scene understanding ability of robots. Specifically, our work addresses four fundamental robotic scene understanding subtasks: road detection, semantic segmentation, depth estimation, and visual odometry (VO). We present details of how the proposed deep learning-based approaches improve performance on each of these subtasks.

First, as drivable area detection is critically important for autonomous driving and robotics, we propose a road detection method that reduces device reliance while maintaining performance. Unlike previous road detection methods that rely on LiDAR, our method achieves state-of-the-art performance with RGB images only. In our framework, we construct a pseudo-LiDAR from monocular depth estimation and propose a feature fusion network to fuse RGB and pseudo-LiDAR information. To optimize the network architecture and improve its efficiency, we propose a method to search for the information propagation paths. Additionally, we design a modality distillation strategy that significantly reduces network parameters and inference time.

Second, because autonomous vehicles and robots are commonly equipped with stereo cameras that capture binocular images, we propose a stereo vision-based semantic segmentation framework that enables current monocular architectures to exploit stereo image data to improve semantic segmentation performance. The improvements are obtained via two approaches: (i) label generation and pre-training, and (ii) stereo vision-based information fusion. Comprehensive experiments with several well-known semantic segmentation architectures on different datasets demonstrate the efficacy of our method.

Finally, to obtain better 3D scene understanding, we propose a framework that exploits monocular depth estimation to improve monocular VO. The core of this framework is a monocular depth estimation module with strong generalization across diverse scenes. It provides two separate working modes to assist localization and mapping. Given a single monocular image, the depth estimation module predicts a relative depth map that helps the localization module improve its accuracy. Given a sparse depth map together with an RGB image, the module generates accurate, scale-consistent depth for dense mapping. Compared with current learning-based VO methods, our method demonstrates stronger generalization to diverse scenes. More significantly, our framework is able to boost the performance of existing geometry-based VO methods by a large margin.
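The pseudo-LiDAR step mentioned in the abstract is, in the literature, the standard back-projection of a predicted depth map into a camera-frame 3D point cloud. Below is a minimal sketch of that construction, assuming a pinhole camera model; the function name, intrinsics, and validity threshold are illustrative placeholders and are not taken from the thesis.

import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a dense depth map (H x W, in metres) into an
    N x 3 point cloud in the camera frame, i.e. the usual
    'pseudo-LiDAR' construction from a monocular depth estimate.
    fx, fy, cx, cy are pinhole intrinsics (illustrative values)."""
    h, w = depth.shape
    # Pixel coordinate grids: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Keep only points with a valid (positive) depth.
    return points[points[:, 2] > 0]

# Example usage with a synthetic 4x4 depth map and dummy intrinsics.
depth = np.full((4, 4), 10.0)
cloud = depth_to_pseudo_lidar(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
print(cloud.shape)  # (16, 3)

The resulting point cloud can then be fed to a LiDAR-style feature extractor and fused with RGB features; the thesis's specific fusion network and searched propagation paths are not reproduced here.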
Advisor: Shen, Chunhua
Liu, Yifan
Pang, Guansong
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Computer and Mathematical Sciences, 2022
Keywords: Deep learning
neural networks
robotic scene understanding
Provenance: This thesis is currently under Embargo and not available.
Appears in Collections: Research Theses

Files in This Item:
File: Sun2022_PhD.pdf (Restricted Access; library staff access only)
Size: 29.33 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.