LiDAR Dynamic Obstacle Removal: Research and Prospects
As one of the most commonly used sensors in autonomous driving, LiDAR has excellent depth sensing characteristics, which has led to the widespread application of LiDAR-based SLAM methods. However, in scenes crowded with pedestrians and vehicles, point cloud localization often performs poorly, and traditional filtering methods cannot solve this. This article focuses on deep learning approaches to the LiDAR dynamic object segmentation problem.
1. SLAM registration and mapping
Regardless of the point cloud registration method (point-to-point, point-to-feature, point-to-grid, NDT), all of them rest on a static-world assumption. In theory, dynamic points inevitably degrade registration accuracy, and the same holds for localization against a prebuilt map. If the proportion of dynamic points in a frame is too high, trajectory accuracy drops, and outright divergence of the estimate cannot be ruled out. At this level, dynamic points can only be identified and removed in real time, before or during registration.
Even if we believe that the interference of dynamic objects with registration is limited and does not significantly affect trajectory accuracy, we still cannot tolerate the "ghosting" left by large numbers of dynamic objects in the final generated map (as shown in the figure below), which harms later map-based localization and map-based free-space (path) planning.
[Figure: ghosting left by dynamic objects in the generated map]
1.1 Traditional registration approach - predicting and filtering dynamic obstacles through clustering and Kalman filtering
Traditional methods, such as discarding correspondences that are too far apart during registration iterations, typically rely on an object detection pipeline like the following:
Since multiple sensors operate together on the vehicle, the input laser point clouds must first be time-synchronized and extrinsically calibrated.
Because LiDAR data is large and contains sampling noise, the point cloud must be preprocessed to reduce data volume and remove noise points.
Each point cloud frame contains a large number of ground points; since detection aims at obstacles on the road, the ground must first be segmented away.
The remaining above-ground points are grouped into multiple clusters using an unsupervised clustering algorithm, with each cluster representing one obstacle.
Object recognition on the clusters depends on task requirements; if category information is needed, a feature extraction + classifier approach can classify the obstacles.
A bounding box is fitted to each cluster, and obstacle properties such as center point, centroid, length, and width are computed.
A Kalman filter is built for each obstacle to track it and smooth the output, in order to determine whether it is moving.
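The steps above can be sketched end to end. Below is a minimal, illustrative Python sketch, not any specific system's implementation: function names, noise covariances, and thresholds are all assumptions. It shows crude ground removal, naive Euclidean clustering, and a constant-velocity Kalman filter over a cluster centroid whose smoothed speed decides whether the obstacle is moving.

```python
import numpy as np

def remove_ground(points, z_thresh=0.2):
    """Crude ground removal by a height threshold.
    (Real pipelines fit a ground plane, e.g. with RANSAC.)"""
    return points[points[:, 2] > z_thresh]

def euclidean_cluster(points, tol=0.5):
    """Naive single-link Euclidean clustering on the XY plane.
    O(n^2), fine for a sketch; real systems use a kd-tree or voxel grid."""
    labels = -np.ones(len(points), dtype=int)
    cur = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        stack, labels[i] = [i], cur
        while stack:
            j = stack.pop()
            d = np.linalg.norm(points[:, :2] - points[j, :2], axis=1)
            for k in np.where((d < tol) & (labels == -1))[0]:
                labels[k] = cur
                stack.append(k)
        cur += 1
    return labels

class CentroidKF:
    """Constant-velocity Kalman filter over a cluster centroid (x, y, vx, vy)."""
    def __init__(self, xy, dt=0.1):
        self.x = np.array([xy[0], xy[1], 0.0, 0.0])
        self.P = np.diag([1.0, 1.0, 10.0, 10.0])  # uncertain initial velocity
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                      # we observe position only
        self.Q = 0.01 * np.eye(4)
        self.R = 0.05 * np.eye(2)

    def step(self, xy):
        # predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # update with the newly measured centroid
        y = xy - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

    def speed(self):
        return float(np.hypot(self.x[2], self.x[3]))

# Toy usage: one obstacle moving at ~1 m/s along x, tracked for 30 frames.
kf = CentroidKF(np.array([0.0, 0.0]), dt=0.1)
for t in range(1, 31):
    kf.step(np.array([0.1 * t, 0.0]))
is_dynamic = kf.speed() > 0.5  # speed threshold on the smoothed track
```

The velocity threshold (0.5 m/s here) is the usual knob: too low and noise in the centroid estimate flags parked cars, too high and slow pedestrians slip through.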
1.2 Traditional Registration Approach - Using Submaps for Precise Registration
RF-LIO: Removal-First Tightly-Coupled Lidar Inertial Odometry in High Dynamic Environments is a traditional matching-and-filtering approach built on LIO-SAM. "Removal-first" means that RF-LIO first removes moving objects without an accurate pose, and only then performs scan matching. When a new scan arrives, RF-LIO does not immediately scan-match to obtain an accurate pose, since that is easily corrupted by dynamic environments. Instead, it uses tightly coupled IMU odometry to obtain a rough initial state estimate, with which RF-LIO builds adaptive-resolution range images to preliminarily remove moving points from the environment. After this initial removal, RF-LIO uses scan matching to obtain a relatively more accurate pose. During the iterative precise registration, dynamic points in the submap are continuously detected and removed based on the current estimate and multi-resolution range images, ultimately achieving precise registration against a static submap. Accurate poses can therefore be obtained even in highly dynamic environments. The experimental results show that, compared with LOAM and LIO-SAM in high dynamic environments, RF-LIO improves absolute trajectory accuracy by 90% and 70%, respectively.
The paper can be found in the following link: https://pan.baidu.com/s/1GdwaNrH80mgem4xbZKdZ-Q#list/path=%2F
Extraction code: 384o
The overall framework of RF-LIO consists of three main modules: IMU pre-integration, feature extraction, and mapping. First, the IMU pre-integration module infers the system motion and produces IMU odometry. The feature extraction module then compensates for motion distortion in the point cloud and extracts edge and planar features by evaluating the roughness of points.
The mapping module is the key module of the proposed method. Removing dynamic objects without an accurate pose involves several key steps:
1. The initial pose is obtained from the IMU odometry. The error between IMU pre-integration and scan matching then determines the initial resolution (i.e., how many degrees of FOV each pixel covers).
2. RF-LIO uses this initial resolution to construct range images from the current LiDAR scan and the corresponding submap, respectively.
3. Most of the dynamic points in the submap are removed by comparing visibility between the two range images.
4. RF-LIO matches the LiDAR scan against the submap and checks whether scan matching has converged. If it has, the remaining dynamic points in the current keyframe are removed at the final high resolution after graph optimization; otherwise, a new resolution is generated and steps 2, 3, and 4 are repeated.
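Steps 1 and 2 hinge on projecting a point cloud into a range image at a chosen angular resolution. Below is a minimal sketch of that projection; the resolution values and vertical FOV are illustrative assumptions, not RF-LIO's actual parameters.

```python
import numpy as np

def to_range_image(points, h_res_deg=1.0, v_res_deg=2.0, v_fov=(-15.0, 15.0)):
    """Project an N x 3 point cloud into a 2-D range image.
    Each pixel covers h_res_deg x v_res_deg of the sensor FOV and stores
    the smallest range falling into it (0 = no return for that pixel)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)            # range of each return
    yaw = np.degrees(np.arctan2(y, x))            # azimuth in [-180, 180)
    pitch = np.degrees(np.arcsin(z / r))          # elevation angle
    cols = ((yaw + 180.0) / h_res_deg).astype(int)
    rows = ((pitch - v_fov[0]) / v_res_deg).astype(int)
    W = int(360.0 / h_res_deg)
    H = int((v_fov[1] - v_fov[0]) / v_res_deg)
    img = np.zeros((H, W))
    valid = (rows >= 0) & (rows < H) & (cols >= 0) & (cols < W)
    for u, v, d in zip(rows[valid], cols[valid], r[valid]):
        if img[u, v] == 0 or d < img[u, v]:
            img[u, v] = d  # keep the nearest return per pixel
    return img
```

Lowering `h_res_deg`/`v_res_deg` sharpens the image but leaves more empty pixels; the adaptive-resolution idea in the paper is precisely choosing this trade-off from the current pose uncertainty.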
1.3 Modern Registration Approach - Implementing Dynamic Object Recognition through Deep Learning
The currently more popular approach is to recognize dynamic objects directly with deep learning and remove their point clouds.
Dynamic Object Aware LiDAR SLAM based on Automatic Generation of Training Data. The authors perform real-time 3D dynamic object detection with a deep network (3D-MiniNet), filter out the dynamic objects, and feed the remaining point cloud to LOAM for conventional laser SLAM.

To overcome the problem of dynamic obstacles and support the deployment of robots in real-world scenarios, the article proposes a dynamic-object-aware LiDAR SLAM algorithm, built around a novel end-to-end grid-based pipeline that can automatically label arbitrary dynamic objects.
From the published results, we can roughly see that it effectively segments dynamic obstacles.
2. Dynamic object filtering
2.1 Classification of Environmental Objects
All objects in the environment are classified into four categories based on their degree of dynamism:
High dynamic objects: objects moving in real time, such as pedestrians, vehicles, and running pets
Low dynamic objects: objects that stop briefly, such as a person standing at the roadside chatting for a moment
Semi-static objects: objects that stay stationary throughout one SLAM session but not forever, such as vehicles in a parking lot, stacked materials, temporary sheds, temporary walls, or temporary stages built in shopping malls
Static objects: objects that remain stationary indefinitely, such as buildings, roads, curbs, and traffic signal poles
Apart from static objects, the other three categories have varying degrees of dynamics, and their coping strategies differ accordingly:
For high dynamic objects: online real-time filtering
For low dynamic objects: filtering via post-processing after a SLAM session completes
For semi-static objects: lifelong (long-term) mapping
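The taxonomy above is essentially a lookup from object class to coping strategy; written down directly (class names are illustrative, the mapping follows the text):

```python
# Object-class -> coping strategy, as described in the taxonomy above.
# The string keys are illustrative names, not from any standard.
STRATEGY = {
    "high_dynamic": "online real-time filtering",
    "low_dynamic":  "post-processing after the SLAM session",
    "semi_static":  "lifelong (long-term) mapping",
    "static":       "keep in map",
}

def strategy_for(obj_class):
    return STRATEGY[obj_class]
```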
2.2 Real-time point cloud filtering
Real-time dynamic point cloud filtering must compare against reference frames to identify dynamic points. The previous chapter already discussed how dynamic point filtering operates, dividing mainly into traditional filtering methods and deep learning methods. The basic deep learning idea is to first segment dynamic obstacles with a network, and then feed the cleaned point cloud into SLAM.
This article mainly elaborates on a deep learning based 3D LiDAR MOS (moving object segmentation) method. The paper combines deep learning with spatio-temporal information to segment dynamic objects in 3D LiDAR data and thereby improve the accuracy of LiDAR SLAM localization and mapping. The corresponding code: https://github.com/PRBonn/LiDAR-MOS .
In this work, the goal is to perform moving object segmentation (LiDAR-MOS) on LiDAR data. Unlike point cloud semantic segmentation, the task is not to predict the semantic category of each point, such as vehicle, road, or building, but to divide the scene into two parts: actually moving objects, such as driving cars and walking pedestrians, and static objects, such as parked cars and the static background (roads, buildings). The paper proposes a new deep learning based method that operates on LiDAR range images, is computationally very fast, and achieves online real-time dynamic point cloud segmentation.
[Figure: overview of the LiDAR-MOS approach]
The above is an overview diagram of the method, which uses a range based LiDAR representation and a neural network to achieve online dynamic object segmentation. Given the current LiDAR observation and past LiDAR data, a "residual image" is first generated between the past data and the current observation; this is how temporal sequence information is obtained. The residual image is then concatenated with the current scan and used as the input to the neural network, which is trained with binary dynamic-object labels containing only two classes: moving and non-moving. The resulting method can detect and separate moving and stationary objects in LiDAR data.
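Once past scans are re-projected into the current viewpoint, the residual image is cheap to compute per pixel. A minimal sketch follows; the normalization and validity handling match the description above, but the exact details of LiDAR-MOS may differ.

```python
import numpy as np

def residual_image(cur, past, eps=1e-6):
    """Pixel-wise normalized range residual between the current range image
    and a past range image already re-projected into the current viewpoint.
    Pixels lacking a valid return (range 0) in either image get residual 0."""
    valid = (cur > 0) & (past > 0)
    res = np.zeros_like(cur)
    res[valid] = np.abs(cur[valid] - past[valid]) / (cur[valid] + eps)
    return res

# Three pixels: an unchanged surface, a surface that moved 4 m, and an invalid pixel.
cur = np.array([[10.0, 5.0, 0.0]])
past = np.array([[10.0, 9.0, 3.0]])
res = residual_image(cur, past)  # large residual only at the moved surface
```

Stacking several such residuals from consecutive past scans gives the network the temporal cue it needs to tell a moving car from a parked one.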
2.3 Post-processing point cloud filtering
Since it does not have to run in real time, the post-processing approach can use all frames within the entire SLAM session as reference information to identify dynamic points. Compared with real-time methods, post-processing methods prioritize the accuracy and thoroughness of dynamic point cloud filtering.
Common post-processing dynamic object filtering methods fall into three typical categories: segmentation based, ray casting based, and visibility based.
The segmentation based approach is much the same as discussed earlier. Semantic segmentation assigns a category label to every element of the scene (every pixel in an image, or every point in a point cloud), such as vehicle, pedestrian, road, or building; essentially it still operates on point cloud images and obtains dynamic points through segmentation. There are also methods that integrate ground segmentation into OctoMap to separate moving objects from the static road background: Mapping the Static Parts of Dynamic Scenes from 3D LiDAR Point Clouds Exploiting Ground Segmentation.
Ray casting based methods are mainly grid-based dynamic filtering: whether a grid cell is hit or passed through determines whether it is dynamic. The PeopleRemover - Removing Dynamic Objects from 3-D Point Cloud Data by Traversing a Voxel Occupancy Grid proposes such a method, with a simple idea: if a voxel is first hit by a laser return and later traversed by the beams of other laser points (a "miss", or see-through), then that voxel is dynamic, and the points it contains are filtered out as dynamic points, as shown in the purple voxels on the right of the image below. A series of tricks is used to avoid the common problems of such methods, such as excessive runtime and false or missed removals.
[Figure: voxel occupancy grid, with dynamic voxels shown in purple on the right]
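The hit-then-traversed rule can be sketched with voxel sets and dense sampling along each beam. This is only an approximation of the original (which uses exact voxel traversal plus several speed/accuracy tricks); the voxel size and sampling step are illustrative.

```python
import numpy as np

VOX = 0.5  # voxel edge length in metres (illustrative value)

def voxel_of(p):
    """Integer voxel index containing point p."""
    return tuple(np.floor(np.asarray(p) / VOX).astype(int))

def integrate_scan(origin, points, hits, passed):
    """For each beam, mark the end-point voxel as 'hit' and every voxel the
    beam travels through (approximated by half-voxel sampling) as 'passed'."""
    origin = np.asarray(origin, float)
    for p in points:
        p = np.asarray(p, float)
        d = np.linalg.norm(p - origin)
        n = max(int(d / (VOX * 0.5)), 1)
        for t in np.linspace(0.0, 1.0, n, endpoint=False):
            passed.add(voxel_of(origin + t * (p - origin)))
        hits.add(voxel_of(p))

hits, passed = set(), set()
# scan 1: a return from a (then-present) object at x = 5 m
integrate_scan([0, 0, 0], [[5.0, 0.0, 0.0]], hits, passed)
# scan 2: the object has left; the beam now reaches a wall at x = 10 m
integrate_scan([0, 0, 0], [[10.0, 0.0, 0.0]], hits, passed)
# a voxel that was hit and later seen through contains dynamic points
dynamic = hits & passed
```

Points falling into voxels in `dynamic` would be filtered out; the wall voxel, which was hit but never traversed, survives.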
The basic idea of visibility based methods is to project a query scan into a range image, and to project the submap near the query scan into a range image from the same viewpoint. The pixel depths at the same position in the two images are then compared: if the submap's depth is shallower, the submap point at that pixel is a dynamic point (if a front point occludes a back point, the front point is dynamic). Remove, then Revert: Static Point Cloud Map Construction using Multiresolution Range Images takes this as its basic principle, makes many improvements, and uses coarser-resolution depth-map comparisons to recover mistakenly removed static points. This article is a good reference for methods based on viewpoint visibility (i.e., on range images).
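The depth comparison itself is a per-pixel test between two range images rendered from the same viewpoint. A minimal sketch, with the margin value as an illustrative assumption (Removert additionally iterates over multiple resolutions to revert mistakenly removed points):

```python
import numpy as np

def dynamic_mask(scan_img, map_img, margin=0.4):
    """Visibility check between two range images rendered from the same
    viewpoint: a map pixel noticeably *closer* than what the current scan
    sees at that pixel belongs to a point that has since moved away."""
    valid = (scan_img > 0) & (map_img > 0)
    return valid & (scan_img - map_img > margin)

# Three pixels: depths agree, map point 2 m in front of the scan, invalid pixel.
scan = np.array([[10.0, 5.0, 0.0]])
submap = np.array([[10.0, 3.0, 4.0]])
mask = dynamic_mask(scan, submap)  # only the occluding map point is flagged
```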
2.4 Lifelong mapping
The core issues of lifelong mapping go far beyond dynamic/semi-static object filtering; such filtering is only one part of fusing maps across sessions in the lifelong process. LT-mapper: A Modular Framework for LiDAR-based Lifelong Mapping proposes a long-term point cloud mapping system.
Its basic structure is as follows:
Multi session SLAM optimization
Diff detection of point cloud maps constructed at different times
Map updates and long-term map management
[Figure: LT-mapper system overview]
Multi-session SLAM:
The point cloud map of each session is built from keyframes, and anchor node detection is performed on keyframes across sessions. Loop closure factors constructed from anchor frames correct the offset between sessions, so that while each single session's poses remain optimal, poses across multiple sessions are also aligned;
Diff detection:
First, dynamic point detection is performed on every point cloud frame of the new session, and dynamic points are divided into two types: high dynamic (HD) and low dynamic (LD). High dynamic points are removed directly once the single-session mapping completes, while low dynamic points are distinguished using a kd-tree distance threshold.
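The kd-tree-based diff amounts to a nearest-neighbour distance test between the new session's points and the old map. A minimal sketch, using brute-force nearest neighbours for brevity (a kd-tree answers the same query in practice; the threshold is illustrative):

```python
import numpy as np

def diff_points(new_pts, old_map, dist_thresh=0.3):
    """Flag points of the new session that have no neighbour in the old
    map within dist_thresh, i.e. candidate changed/low-dynamic points."""
    # pairwise distances, N_new x N_old (brute force; kd-tree in practice)
    d = np.linalg.norm(new_pts[:, None, :] - old_map[None, :, :], axis=2)
    return d.min(axis=1) > dist_thresh

old_map = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
new_scan = np.array([[0.05, 0.0, 0.0], [5.0, 5.0, 0.0]])
is_new = diff_points(new_scan, old_map)  # second point is a change
```

Running the same test in the other direction (old map against the new session) yields the disappeared points, and together the two diffs drive the map update.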
Map updates and long-term map management:
Two types of static maps are constructed: a meta map with weak PD points removed, and a live map with weak PD points preserved. Examples of the meta map and live map are shown in Figure 3 of the paper, where the latest representation of the scene is effectively maintained. In the meta map, non-volume-maximizing points are iteratively deleted (red box), while other permanent structures are preserved.