Single View Depth Completion of Sparse 3D Reconstructions
Hloubkový obraz z jednoho pohledu a řídké 3D rekonstrukce
Authors
Supervisors
Reviewers
Editors
Other contributors
Journal Title
Journal ISSN
Volume Title
Publisher
České vysoké učení technické v Praze
Czech Technical University in Prague
Date
Files
Abstract
This work presents a methodology for inferring dense depth of a scene from an RGB image and its corresponding sparse point cloud using an unsupervised training paradigm, combined with a visual odometry algorithm such as ORB-SLAM [2] in an offline step to densify the sparse point cloud produced by its mapping. The network consists of a sparse-to-dense module, an encoder that creates a 3D positional encoding of the image with a Calibrated Backprojection layer, and a decoder that produces the dense depth map. The network is trained without supervision on data from SLAM by minimizing the photometric reprojection error between frames. Inference is then run on the SLAM keyframes and the sparse depth from their corresponding keypoints to produce dense depth. With the depth estimate, points from these keyframes are back-projected into the point cloud, resulting in a denser representation of the scene, especially in low-textured areas where SLAM reconstruction usually fails.
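The back-projection step described in the abstract can be sketched as follows. This is a minimal illustration, assuming a standard pinhole camera model; the function name and intrinsics are illustrative, not taken from the thesis implementation.

```python
import numpy as np

def backproject_depth(depth, K):
    """Back-project a dense depth map into a 3D point cloud.

    depth: (H, W) array of metric depth values from the network.
    K:     (3, 3) pinhole camera intrinsic matrix.
    Returns an (H*W, 3) array of points in the camera frame.
    """
    H, W = depth.shape
    # Pixel grid in homogeneous coordinates (u, v, 1).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Rays through each pixel: K^{-1} [u, v, 1]^T, scaled by depth.
    rays = pix @ np.linalg.inv(K).T
    return rays * depth.reshape(-1, 1)

# Toy example: a constant 2 m depth map with simple intrinsics.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
depth = np.full((480, 640), 2.0)
points = backproject_depth(depth, K)
```

In the full pipeline, these camera-frame points would additionally be transformed by the keyframe pose estimated by SLAM before being merged into the global point cloud.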