
3D Perception

The third project focused on 3D perception using a PR2 robot simulation in Gazebo. An RGB-D camera mounted on the robot captured the color and depth of the objects in the scene. We then trained an SVM model to recognize the objects in front of the robot. With each object identified, the robot was able to pick and place each object into the correct bin based on prior instructions.

Several filtering and clustering techniques were used to achieve segmentation and object recognition. The goal of this project is to convert raw sensor input into a point cloud that can be used to identify the different objects, allowing the PR2 to decide how to pick and place each item.

Filtering

The first step of this project is to filter the raw point cloud data. We want to remove the noise and keep only the data that is essential to our goal. Filtering the data also speeds up computation.

Outlier Filtering

Noise from external factors like dust in the environment, humidity in the air, or the presence of various light sources leads to sparse outliers that corrupt the measurements.

Such outliers complicate the estimation of point cloud characteristics like curvature and gradients, leading to erroneous values, which in turn might cause failures at various stages of our perception pipeline.

One of the filtering techniques used to remove such outliers is to perform a statistical analysis in the neighborhood of each point, and remove those points that do not meet a certain criterion. PCL's StatisticalOutlierRemoval filter is one such technique. For each point in the point cloud, it computes the distance to all of its neighbors and then calculates a mean distance.

By assuming a Gaussian distribution, all points whose mean distances fall outside an interval defined by the global mean distance plus a multiple of the standard deviation are considered outliers and removed from the point cloud.
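
A minimal sketch of this step using the python-pcl bindings; the neighbor count, threshold multiplier, and input file name are illustrative values, not necessarily the exact ones used in the project:

    import pcl

    cloud = pcl.load("tabletop.pcd")  # illustrative input cloud

    # Statistical outlier removal: analyze each point's neighborhood
    outlier_filter = cloud.make_statistical_outlier_filter()
    outlier_filter.set_mean_k(50)               # number of neighbors to analyze per point
    outlier_filter.set_std_dev_mul_thresh(1.0)  # points beyond mean + 1.0 * stddev are outliers
    cloud_filtered = outlier_filter.filter()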

Voxel Grid Downsampling

Running computations on a full resolution point cloud can be slow and may not yield any improvement over results obtained from a more sparsely sampled point cloud. So, in many cases, it is advantageous to downsample the data.

[Figure: original point cloud and downsampled results at different leaf sizes]

We want to select a moderate leaf size when downsampling the data. From the picture above we can see that a leaf size that is either too big or too small is not ideal: either too much information is removed, or the point cloud data remains large. A good estimate of the voxel size can be obtained from prior information about the scene, such as the size of the smallest object, the size of the target object, or the total volume of the scene in the field of view.
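
As a sketch, downsampling with python-pcl's voxel grid filter might look like the following, continuing from the outlier-filtered cloud above; the 1 cm leaf size is just a plausible starting value:

    # Voxel grid downsampling: represent all points inside each cubic voxel by one point
    vox = cloud_filtered.make_voxel_grid_filter()
    LEAF_SIZE = 0.01  # voxel edge length in meters (illustrative)
    vox.set_leaf_size(LEAF_SIZE, LEAF_SIZE, LEAF_SIZE)
    cloud_downsampled = vox.filter()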

Pass Through Filtering

If we have some prior information about the location of our target in the scene, we can apply a Pass Through Filter to remove useless data from our point cloud.

The Pass Through Filter works much like a cropping tool, allowing us to crop any given 3D point cloud by specifying an axis with cut-off values along that axis. The region we allow to pass through is often referred to as the region of interest.

After applying the pass through filter along the z axis, we removed the excess data we don't need and kept only the items on top of the table.
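
A sketch of the crop along the z axis with python-pcl; the cut-off values are illustrative and depend on the table height in the scene:

    # Pass through filter: keep only points whose z coordinate lies in the region of interest
    passthrough = cloud_downsampled.make_passthrough_filter()
    passthrough.set_filter_field_name('z')
    passthrough.set_filter_limits(0.6, 1.1)  # approximate table-top height range in meters
    cloud_roi = passthrough.filter()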

RANSAC Plane Segmentation

Next, we want to remove the table itself from the scene. We utilized a popular technique called Random Sample Consensus (RANSAC). The RANSAC algorithm assumes that all of the data in a dataset is composed of both inliers and outliers, where inliers can be defined by a particular model with a specific set of parameters, while outliers do not fit that model and hence can be discarded. Like in the example below, we can extract the outliers that are not good fits for the model.

After using RANSAC to fit a plane to the point cloud, we can separate the objects from the table.

Inliers: points that fit the plane equation and therefore belong to the table.

Outliers: points that don't fit the equation, hence the objects on top of it.
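
A minimal python-pcl sketch of the plane fit; the distance threshold, which controls how close a point must be to the plane to count as an inlier, is an illustrative value:

    # RANSAC plane segmentation
    seg = cloud_roi.make_segmenter()
    seg.set_model_type(pcl.SACMODEL_PLANE)
    seg.set_method_type(pcl.SAC_RANSAC)
    seg.set_distance_threshold(0.01)  # max distance (m) from the plane to count as an inlier
    inliers, coefficients = seg.segment()

    cloud_table = cloud_roi.extract(inliers, negative=False)   # inliers: the table plane
    cloud_objects = cloud_roi.extract(inliers, negative=True)  # outliers: the objects on it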

Clustering

After filtering out the data we're not interested in, we need to use clustering to distinguish the individual objects. Two algorithms were taught in this program: k-means and DBSCAN.

K-means Clustering:

K-means clustering groups data points into k clusters based on their distance to the cluster centroids. K-means is a powerful tool, but it has its limitations: it requires prior knowledge of the number of objects we are trying to detect.
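
For illustration only (the project itself relies on the density-based approach below), a k-means sketch with scikit-learn, showing how the cluster count must be supplied up front:

    from sklearn.cluster import KMeans

    # points: N x 3 numpy array of XYZ coordinates from the object cloud
    kmeans = KMeans(n_clusters=3)       # the number of objects must be known in advance
    labels = kmeans.fit_predict(points) # one cluster label per point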

DBSCAN:

DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. This algorithm is a nice alternative to k-means when you don't know how many clusters to expect in your data, but you do know something about how the points should be clustered in terms of density (distance between points in a cluster). The DBSCAN algorithm creates clusters by grouping data points that are within some threshold distance d from the nearest other point in the data.
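
In the PCL pipeline this idea appears as Euclidean cluster extraction, which behaves like DBSCAN. A sketch with python-pcl, assuming cloud_objects from the RANSAC step; the tolerance and cluster-size limits are illustrative:

    import numpy as np

    # Cluster on geometry only, so strip the color channel into a plain XYZ cloud
    xyz = cloud_objects.to_array()[:, :3].astype(np.float32)
    white_cloud = pcl.PointCloud()
    white_cloud.from_array(xyz)

    tree = white_cloud.make_kdtree()
    ec = white_cloud.make_EuclideanClusterExtraction()
    ec.set_ClusterTolerance(0.02)   # the threshold distance d, in meters
    ec.set_MinClusterSize(50)       # reject tiny clusters (leftover noise)
    ec.set_MaxClusterSize(3000)     # reject clusters too large to be a single object
    ec.set_SearchMethod(tree)
    cluster_indices = ec.Extract()  # one list of point indices per detected object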

By assigning a random color to each isolated object within the scene, I was able to generate a point cloud of the segmented objects.

Object Recognition

The last part of this project is to classify the objects! In order to do this, the system first needs to train a model to learn what each object looks like. Once it has this model, the system can predict which object it sees. I used a Support Vector Machine (SVM) for object recognition.

RGB to HSV

To obtain a robust representation for perception, we convert the color space from RGB to HSV, which is less sensitive to lighting. The following pictures show an object represented in the different color spaces; even when the light gets darker, the HSV representation still does a good job of differentiating the objects.
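
A small sketch of the conversion using matplotlib, assuming 8-bit RGB values as input:

    import matplotlib.colors

    def rgb_to_hsv(rgb):
        # rgb: [r, g, b] with 8-bit channel values (0-255)
        normalized = [c / 255.0 for c in rgb]
        # matplotlib expects an (..., 3) array with values in the 0-1 range
        return matplotlib.colors.rgb_to_hsv([[normalized]])[0][0]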

Color Histogram

One way to convert color information into features is by binning the color values into a histogram. The number of bins determines how much detail the feature captures for each object; too many bins, however, will over-fit to a particular object.

RGB signature of a blue can
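
A sketch of how per-point color values might be binned into a normalized feature vector; the bin count and value range are assumptions (they depend on how the color channels are scaled):

    import numpy as np

    def color_histogram(channels, nbins=32, bins_range=(0, 256)):
        # channels: three arrays, one per color channel, with one value per point
        c1_hist = np.histogram(channels[0], bins=nbins, range=bins_range)
        c2_hist = np.histogram(channels[1], bins=nbins, range=bins_range)
        c3_hist = np.histogram(channels[2], bins=nbins, range=bins_range)
        # Concatenate and normalize so the feature is independent of cloud size
        features = np.concatenate((c1_hist[0], c2_hist[0], c3_hist[0])).astype(np.float64)
        return features / np.sum(features)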

Support Vector Machine

SVMs work by applying an iterative method to a training dataset, where each item in the training set is characterized by a feature vector and a label. For this project, I trained on 300 examples per class and achieved an accuracy of 90.45%.
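
A minimal scikit-learn sketch of the training step; X (the histogram feature vectors described above) and y (the object labels) stand in for the captured training set:

    from sklearn import svm
    from sklearn.model_selection import cross_val_score

    # X: N x D array of feature vectors, y: N object labels
    clf = svm.SVC(kernel='linear')
    scores = cross_val_score(clf, X, y, cv=5)  # estimate accuracy via 5-fold cross-validation
    clf.fit(X, y)

    # At run time, classify the feature vector of one detected cluster
    prediction = clf.predict(feature_vector.reshape(1, -1))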

Project Performance

World 1
World 2
World 3