CSC2539 Assignment: Multiple object tracking, Feb 9, 2012
Wael Louis
Objective: Given six video sequences of moving people, track each person while preserving each person's identity.
Videos: The six videos have the properties listed in the video table at the end of this report.
Approach:
To approach this problem, I considered three steps: preprocessing, head detection, and tracking. My idea was to put a strong emphasis on object detection so that the tracker receives a clean segment of the moving object. I assumed that the cleaner the input to the tracker, the better the tracking. Trackers are usually designed to handle some noise, but it is better to avoid such noise where possible.
Detection stage:
Initial experiments applied only the provided HoG head detector. However, it produced several false positive detections. In addition, I needed information about the moving person, such as size, bounding box, and centroid. The block diagram below illustrates the resulting detection stage of my algorithm.
[Block diagram of the detection stage. Input: RGB frame. Output: binary image with separate blobs.]
Background subtraction:
All fixed-camera datasets (i.e., all datasets except ppm) come with a background model. This background model was used in the detector to separate the foreground from the background. The difference between the input frame and the background model was measured with a squared error (SE). A pixel was considered foreground if SE > α; α = 14 was used for all experiments. An example of a background model, a frame containing background and foreground, and the resulting background-subtracted foreground is shown below:
[Figure panels: Background; Foreground + Background; Foreground.]
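The thresholding rule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the report does not say how SE is computed per pixel, so the sketch assumes the sum of squared RGB channel differences, and the names `squared_error` and `subtract_background` are hypothetical rather than taken from the assignment code.

```python
ALPHA = 14  # threshold reported in the text

def squared_error(pixel, bg_pixel):
    """Sum of squared channel differences (assumed form of the SE measure)."""
    return sum((a - b) ** 2 for a, b in zip(pixel, bg_pixel))

def subtract_background(frame, background, alpha=ALPHA):
    """Return a binary mask with 1 where SE > alpha and 0 elsewhere."""
    return [
        [1 if squared_error(p, b) > alpha else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

# Tiny 3x4 example: a uniform background, one "person" pixel, one noisy pixel.
background = [[(10, 10, 10)] * 4 for _ in range(3)]
frame = [row[:] for row in background]
frame[1][2] = (200, 180, 160)  # large difference: foreground
frame[0][0] = (11, 12, 10)     # SE = 1 + 4 + 0 = 5, below alpha: background

mask = subtract_background(frame, background)
```

With these inputs only the pixel at row 1, column 2 is marked as foreground; the small deviation at (0, 0) stays below the threshold.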
[Figure panels, erosion example: Background subtraction image; Erosion with a disk of radius 2 pixels.]
[Figure panels, dilation example: Erosion with a disk of radius 2 pixels; Dilation with a disk of radius 9 pixels.]
[Figure panels, dilation radius comparison: Dilation with a disk of radius 9 pixels; radius 5 pixels; radius 2 pixels.]
[Figure panels, blob filtering example: 2 detected blobs; HoG head detection; Committed final blob.]
Morphological filters:
· Erosion: A basic morphological filter that slides a structuring element (a moving shape) over the input binary or grayscale image. Each output pixel is the minimum of the pixels covered by the structuring element. It is a standard image processing filter for eliminating noise, and it was used here to remove noise from the background-subtracted images. The structuring element in our experiments was a disk of radius 2 pixels, a value that gave acceptable results on the examined datasets. A disk was chosen because a person's head and body are mostly curved in shape. The accompanying figure shows an example of erosion on one frame.
· Dilation: Another morphological filter. It is the opposite of erosion in the sense that each output pixel is the maximum of the pixels covered by the structuring element. Both filters were used in the experiments. We assumed that the output of the erosion filter was noise free, so dilation was applied to fill the gaps that were either erased by the erosion stage or not classified as foreground in the background subtraction stage. The accompanying figure shows the output of dilation applied after erosion.
A disk-shaped structuring element was also used for dilation, but with a larger radius than in the erosion stage: 9 pixels. The larger disk is essential for the next stage, because it reduces the chance of one object being split into disconnected pieces. The accompanying figure compares the results for different radii.
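The two filters described above can be sketched as a minimum and a maximum over a disk-shaped neighbourhood. The code below is a minimal pure-Python illustration, not the assignment's implementation: the disk discretisation, the border handling, and the function names (`disk_offsets`, `erode`, `dilate`) are assumptions made for this example.

```python
def disk_offsets(radius):
    """Offsets (dy, dx) inside a disk of the given radius (one possible discretisation)."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]

def _morph(mask, radius, reduce_fn, outside):
    """Apply reduce_fn (min or max) over the disk around every pixel."""
    h, w = len(mask), len(mask[0])
    offsets = disk_offsets(radius)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = reduce_fn(
                mask[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else outside
                for dy, dx in offsets)
    return out

def erode(mask, radius):
    """Minimum over the disk; pixels outside the image count as 1 so borders do not erode."""
    return _morph(mask, radius, min, outside=1)

def dilate(mask, radius):
    """Maximum over the disk; pixels outside the image count as 0."""
    return _morph(mask, radius, max, outside=0)

# Two fragments of one person separated by a 5-pixel gap, plus one noise pixel.
mask = [[0] * 40 for _ in range(21)]
for y in range(5, 16):
    for x in list(range(5, 13)) + list(range(18, 26)):
        mask[y][x] = 1
mask[1][35] = 1  # isolated noise

eroded = erode(mask, 2)     # removes the noise pixel and shrinks both fragments
dilated = dilate(eroded, 9) # grows the fragments until the gap is filled
```

In this toy example the isolated pixel disappears after erosion and does not come back, while the two fragments are joined along the middle row after dilation with the radius of 9 used in the report.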
Blob detection: Each connected blob was treated as an object and given a unique label. Its bounding box position (x, y) and size (width, height) were recorded. At this stage, each resulting blob was considered a potential person. This is also why a disk of radius 9 was used for dilation: it merges two blobs whose nearest pixels are at most about 19 pixels apart (disk diameter of 18 plus the center pixel); blobs farther apart remain separate objects.
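A minimal sketch of this labeling step, assuming 8-connectivity (the report does not state which connectivity was used) and a hypothetical helper name `find_blobs`, could look as follows. The second function is a one-dimensional check of the 19-pixel merge distance for two pixels lying on the same row or column; diagonal cases depend on how the disk is discretised.

```python
from collections import deque

def find_blobs(mask):
    """Label 8-connected foreground regions; return boxes as (x, y, width, height)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if mask[y0][x0] == 0 or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:  # breadth-first flood fill of one blob
                y, x = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes

def merges_after_dilation(distance, radius=9):
    """1-D check: do two pixels `distance` apart touch after each grows by `radius`?"""
    left_reach = 0 + radius          # rightmost pixel of the left blob after dilation
    right_reach = distance - radius  # leftmost pixel of the right blob after dilation
    return right_reach - left_reach <= 1

mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
boxes = find_blobs(mask)
```

Here `boxes` holds one bounding box per blob in scan order, and `merges_after_dilation(19)` is true while `merges_after_dilation(20)` is false, which matches the 2 × 9 + 1 reasoning in the text.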
HoG head detector: The blobs found so far may or may not belong to a person. A blob may be noise from the background, or a fragment of a person that the dilation filter failed to merge. To overcome this problem, the provided HoG head detector was used to eliminate noisy blobs that most likely did not belong to a person. If a blob contained a head detection, it was considered a human blob; otherwise it was eliminated.
In the accompanying figure, the blob that does not belong to a person is eliminated because no HoG head detection is associated with that region.
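The commit rule can be sketched as a simple containment test. This is an assumption-laden illustration: it represents each head detection by its center point and each blob by an (x, y, width, height) box, and the names `contains` and `commit_blobs` are hypothetical; the provided detector may report detections in a different form.

```python
def contains(box, point):
    """True if the point (px, py) lies inside the box (x, y, width, height)."""
    x, y, w, h = box
    px, py = point
    return x <= px < x + w and y <= py < y + h

def commit_blobs(blob_boxes, head_centers):
    """Keep only the blobs that contain at least one head detection."""
    return [box for box in blob_boxes
            if any(contains(box, head) for head in head_centers)]

blobs = [(40, 20, 30, 80), (150, 90, 12, 10)]  # a person-sized blob and a small noise blob
heads = [(55, 30)]                             # one head detection (center point)
committed = commit_blobs(blobs, heads)
```

Only the first blob is committed, because the single head detection falls inside its bounding box.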
Tracking stage: The main purpose of this assignment is to track the moving person in the scene. In the previous section, the moving objects were detected and their positions and scales were found. This stage tracks these positions. For that purpose, a particle filter was implemented with the following settings:
· The state consists of the (x, y) position of the bounding box and its scale (w, h).
· Number of particles = 100.
· Diffusion covariance matrix of [50 0 0 0; 0 50 0 0; 0 0 50 0; 0 0 0 50].
· Euclidean distance was used for the measurement.
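As a rough sketch of how these settings fit together, the following pure-Python example runs one predict, weight, and resample cycle per frame. Several parts are assumptions not stated in the report: the Gaussian mapping from Euclidean distance to weight, the value of `MEASUREMENT_SIGMA`, the multinomial resampling, and all function names.

```python
import math
import random

NUM_PARTICLES = 100              # from the report
DIFFUSION_STD = math.sqrt(50.0)  # diagonal covariance of 50 per state dimension
MEASUREMENT_SIGMA = 10.0         # assumed value, not given in the report

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def predict(particles, rng):
    """Diffuse every particle (x, y, w, h) with zero-mean Gaussian noise."""
    return [tuple(v + rng.gauss(0.0, DIFFUSION_STD) for v in p) for p in particles]

def weigh(particles, observation):
    """Turn Euclidean distances to the observation into normalised weights."""
    raw = [math.exp(-euclidean(p, observation) ** 2 / (2 * MEASUREMENT_SIGMA ** 2))
           for p in particles]
    total = sum(raw)
    if total == 0.0:  # every particle is far away: fall back to uniform weights
        return [1.0 / len(particles)] * len(particles)
    return [r / total for r in raw]

def mean_state(particles, weights):
    return tuple(sum(w * p[i] for p, w in zip(particles, weights)) for i in range(4))

def step(particles, observation, rng):
    """One filter cycle; observation may be None when the detector misses a frame."""
    particles = predict(particles, rng)
    if observation is None:
        uniform = [1.0 / len(particles)] * len(particles)
        return particles, mean_state(particles, uniform)
    weights = weigh(particles, observation)
    estimate = mean_state(particles, weights)
    return rng.choices(particles, weights=weights, k=len(particles)), estimate

rng = random.Random(0)
truth = (100.0, 50.0, 30.0, 80.0)   # synthetic box: x, y, w, h
particles = [truth] * NUM_PARTICLES
for frame_index in range(10):
    truth = (truth[0] + 5.0, truth[1], truth[2], truth[3])  # box moves right
    observation = None if frame_index == 4 else truth       # one missed detection
    particles, estimate = step(particles, observation, rng)
```

The synthetic run drops one observation on purpose: during that frame the particles only diffuse, and the filter picks up the target again on the next detection, which mirrors the behaviour described for the single-person sequences.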
Results and discussion:
In its current state, the implemented system can accurately track one person in a sequence.
Every frame of every sequence was examined, except for peopleSeq3, where every other frame was examined. The frames were combined at a rate of 29.97 frames/second and saved as AVI files.
There are two videos for each sequence. The first corresponds to the detection stage and shows the detection pipeline step by step, including the thresholds. The second shows the tracking results.
The detection videos are explained in the detection stage section above. In the tracking videos, the solid colored rectangle shows the observation (measurement), while the dashed blue rectangles show the particle predictions.
Single object tracking:
1– winterSeq2 detection, winterSeq2 tracking, winterSeq4 detection, winterSeq4 tracking
For these two datasets, the detector and the tracker work well and do not lose the moving person. The videos show that when detection fails in some frames, the tracker does not lose the track. This is an important property of the tracker design: it handles observation noise. Hence, the detection and tracking goals of the assignment were achieved for these sequences.
Multiple object tracking:
I was not successful in implementing a multi-object tracker. A lot of time was spent on the AdaBoost classifier, following the approach of this paper (M. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, L. Van Gool. "Robust Tracking-by-Detection using a Detector Confidence Particle Filter," ICCV 2009).
My plan was to use AdaBoost as a simple recognition stage that stores the identity of each tracked object and associates a set of particles with it. When a new person appears in the scene, new particles are initialized. However, the results were unsuccessful, in addition to the coding difficulty of maintaining a separate set of particles for each person.
The results from this point on are unfortunately unsuccessful; however, I will show them and explain the reason(s) for failure.
2– SFeq2 detection, SFeq2 tracking
The first problem in this sequence occurred in the detection stage and was caused by the shadow. The shadow was detected as a moving object, which it is, but the HoG head detector also produced false positive detections on the shadow. As explained earlier, when a blob and a HoG detection coincide in one region, a human object is committed. Since the shadow is attached to the person, the committed blob is larger than the person, as can be seen clearly in the detection video and in the figure for this sequence. Wrong observations over several frames inevitably reduce the accuracy of the particle filter.
[Figure panels: Input image with shadow; Detected blobs; Committed person detection.]
As for the tracking, the problem arose when another person appeared in the scene. The sequence shows the particle filter jumping between the two people. The reason is that the particles are initialized on the first object committed by the object detector. In this implementation, objects are committed by scanning from left to right (i.e., from (0,0) of the image to (maximum x, maximum y)), so the first committed object is usually the one on the left side of the image.
For this reason, the tracker follows the lady, jumps to David when he enters the scene from the left, and jumps back to the lady once David is to her right. This issue could be alleviated by a working multiple object tracker, but it could also be tackled by giving the tracker some information about the tracked object, e.g. a histogram of the subject or the AdaBoost classifier that was tried.
Even in this sequence, the tracker successfully tracks the lady once she has passed David and is on his left side.
3– SFeq5 detection, SFeq5 tracking
This sequence is similar to the previous one (SFeq2), but with more people moving in the scene at the same time. More people means more detections, and hence more confusion for the single-object particle filter. The dominance of the leftmost committed detection is clear in this sequence, as the tracker keeps tracking the person sitting on the bench.
4– peopleSeq3 detection, peopleSeq3 tracking
As for the tracking, the same failure reasons as in the previous two sequences apply here (i.e., the inability to track multiple objects). One point worth mentioning is that this is a case where changing the detection-stage parameters could improve the results. The people here are closer to the camera than in the other sequences, so they appear larger. A larger dilation disk radius would likely give better detections. However, I was committed to using a single set of parameters throughout. The detector failures are illustrated in the figure for this sequence.
[Figure panels: Discontinuity in the blob; Multiple detections for the same object.]
Note: this is the only sequence where the algorithm was applied to every other frame.
5– ppm
This sequence was not examined, because the implemented object detector is based on background subtraction while this sequence was recorded with a moving camera. Hence, the implemented detection cannot be applied to it.
Conclusion:
A person tracking system was implemented. The system starts with person detection, which includes background subtraction, morphological filters, blob detection, and HoG head detection. Once a person is detected, a particle filter commits to the object and starts tracking it. The implemented system could successfully track a single object, which indicates that the object detector and the particle filter work as expected. Generalizing the particle filter to a multi-object tracker was not successful, due to coding difficulty and unsuccessful attempts with AdaBoost.
Code: The code can be downloaded from here.
Note: The assignment's provided code is needed to run this code.
Video      | # frames | Outdoor/indoor | # people | Illumination (visual) | Person shadow | Fixed/moving camera
-----------|----------|----------------|----------|-----------------------|---------------|--------------------
WinterSeq2 | 351      | Outdoor        | 1        | Uniform               | No            | Fixed
WinterSeq4 | 523      | Outdoor        | 1        | Uniform               | No            | Fixed
SFseq2     | 194      | Outdoor        | 2        | Varies                | Yes           | Fixed
SFseq5     | 567      | Outdoor        | 7        | Varies                | Yes           | Fixed
peopleSeq3 | 1,900    | Indoor         | 5        | Uniform               | No            | Fixed
ppm        | 401      | Outdoor        | 1        | Varies                | Yes           | Moving