An Advanced Real-Time Multiple Object Tracker in Variant Outdoor Environments

Tracking of humans in dynamic scenes has been an important topic of research, and there has been considerable work on tracking humans and other objects in recent years. This paper proposes a real-time method for tracking multiple moving objects that detects them with an effective adaptive Gaussian Mixture Model (GMM) and identifies them with the Joint Probabilistic Data Association Filter (JPDAF). Most tracking algorithms perform well against a static background but give worse results when the background contains fake motion, which is why most of them are used in indoor environments. Adaptive Gaussian mixtures have a property that helps resolve this problem. This paper uses recursive equations to constantly update the parameters of a Gaussian Mixture Model and to simultaneously select the appropriate number of components for each pixel. Therefore, the method is more time and memory efficient than the common GMM with a fixed number of components. In tracking multiple moving objects, problems occur when objects pass across each other; the JPDAF method is used in this paper to solve this problem. Moreover, the proposed method can effectively deal with various scenes, such as indoor, outdoor, and cluttered scenes. The experimental results on our test sequences demonstrate the high efficiency of the proposed method.


Introduction
Multi-target tracking, the association of detected points into sequences over time, is an important NP-hard (non-deterministic polynomial-time hard) problem. Considerable effort has been devoted to designing tractable methods by reducing its complexity. Multi-target tracking has a broad range of applications, including video surveillance and monitoring [1], sonar-based tracking of sea animals or submarines, robot control [2], eye tracking [3], video-based identification and tracking of people and cars for surveillance or security purposes, and many more [4,5].

Motion detection
Nearly every visual surveillance system starts with motion detection, and subsequent processes such as tracking and behavior recognition depend greatly on it. Accurate, real-time motion segmentation therefore greatly improves the performance of object tracking. The difficulty in motion segmentation lies in how to reconstruct the background image so that it adapts to scene changes, and how to remove false foreground pixels that may lead to inaccurate segmentation results. Most backgrounding methods continuously estimate a statistical model of the variation of each pixel. A common method of adaptive backgrounding is averaging the images over time, creating a background approximation that is similar to the current static scene except where motion occurs. While this is effective when objects move continuously and the background is visible a significant portion of the time, it is not robust to scenes with many moving objects, particularly if they move slowly.
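A minimal sketch of the running-average background described above, assuming grey-level pixels stored in flat lists and an exponential moving average as the form of "averaging the images over time". The names `update_background` and `foreground_mask`, the learning rate `alpha` and the difference threshold are hypothetical choices for illustration, not values from the paper.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background.

    background, frame: flat lists of grey-level pixel values of equal length.
    alpha: hypothetical learning rate; larger values adapt faster but also
    absorb slowly moving objects into the background sooner.
    """
    return [(1.0 - alpha) * b + alpha * f for b, f in zip(background, frame)]


def foreground_mask(background, frame, threshold=25.0):
    """Mark pixels whose absolute difference from the background exceeds threshold."""
    return [abs(f - b) > threshold for b, f in zip(background, frame)]


if __name__ == "__main__":
    bg = [100.0, 100.0, 100.0]
    frame = [101.0, 200.0, 99.0]        # a bright object covers the middle pixel
    print(foreground_mask(bg, frame))   # [False, True, False]
    bg = update_background(bg, frame)
    print([round(v, 2) for v in bg])    # [100.05, 105.0, 99.95]
```

After one update the middle background value has already moved toward the object's intensity; an object that lingers or moves slowly is gradually blended into the background, which is the weakness noted in the paragraph.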
In [6], a nonparametric kernel density estimation method with higher sensitivity in detecting foreground regions is presented. This method updates the background directly from the sample data using a Parzen estimator with a Gaussian kernel. If the sample size is large enough, the method can describe the background variations very accurately, so its performance depends on the sample size: the more precisely the background is estimated, the more memory and CPU time are needed.
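A minimal single-channel sketch of the kernel estimator described for [6], assuming grey-level values and one fixed bandwidth `sigma`; the function names, the bandwidth and the probability threshold are hypothetical. It makes the memory and CPU trade-off concrete: every classification evaluates one kernel per stored sample.

```python
import math


def kde_background_probability(x, samples, sigma):
    """Parzen estimate of p(x) for one pixel from its recent background samples.

    Each stored sample contributes a Gaussian kernel of bandwidth sigma, so the
    cost of one evaluation grows linearly with len(samples).
    """
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    total = sum(math.exp(-((x - s) ** 2) / (2.0 * sigma ** 2)) for s in samples)
    return norm * total / len(samples)


def is_foreground(x, samples, sigma, threshold):
    """Classify x as foreground when its background probability is below threshold."""
    return kde_background_probability(x, samples, sigma) < threshold


if __name__ == "__main__":
    history = [100.0, 102.0, 98.0, 101.0, 99.0]   # recent values of one pixel
    print(is_foreground(100.0, history, sigma=2.0, threshold=1e-3))  # False
    print(is_foreground(150.0, history, sigma=2.0, threshold=1e-3))  # True
```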
The authors of [7,8] use mean-shift analysis to track the distribution of a target object. The method performs well in crowded scenes and cluttered backgrounds. However, the object shape is constrained to be an ellipse, which restricts the applicability of the approach to deformable objects. Moreover, the performance of the approach decreases drastically under variations in the appearance of the objects.
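The core of mean-shift analysis is a fixed-point iteration that moves a window to the weighted mean of the samples it covers until it settles on a local mode. The sketch below assumes a circular flat-kernel window and per-point weights that are simply given; in a tracker such as [7,8] the weights would come from comparing colour distributions, and the window would be elliptical. It illustrates the iteration only, not the full tracker, and the function name is hypothetical.

```python
import math


def mean_shift(points, weights, start, bandwidth, max_iter=100, tol=1e-6):
    """Shift a circular window from `start` to a local mode of weighted 2-D points.

    Flat (uniform) kernel: every point within `bandwidth` of the current centre
    contributes with its weight; points outside the window contribute nothing.
    """
    cx, cy = start
    for _ in range(max_iter):
        sx = sy = sw = 0.0
        for (px, py), w in zip(points, weights):
            if math.hypot(px - cx, py - cy) <= bandwidth:
                sx += w * px
                sy += w * py
                sw += w
        if sw == 0.0:          # empty window: nothing to follow
            break
        nx, ny = sx / sw, sy / sw
        moved = math.hypot(nx - cx, ny - cy)
        cx, cy = nx, ny
        if moved < tol:        # converged to a fixed point (local mode)
            break
    return cx, cy


if __name__ == "__main__":
    pts = [(9.0, 10.0), (11.0, 10.0), (10.0, 9.0), (10.0, 11.0), (30.0, 30.0)]
    print(mean_shift(pts, [1.0] * len(pts), start=(8.0, 9.0), bandwidth=5.0))
    # (10.0, 10.0): the isolated point at (30, 30) never enters the window
```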
The authors of [9] proposed a method for detecting and tracking multiple moving objects based on the discrete wavelet transform, which identifies the moving objects by their color and spatial information. Many tracking algorithms perform well against a static background but give worse results when the background contains fake motion; therefore, most tracking algorithms are used indoors rather than outdoors. The discrete wavelet transform can divide a frame into four frequency bands without losing spatial information, and most of the fake motion in the background is decomposed into the high-frequency wavelet sub-bands, so [9] adopts it to solve this problem. In tracking multiple moving objects, many applications have problems when objects pass across each other; [9] uses color and spatial information to address this. A Gaussian mixture model can represent the complex distribution of each pixel, and it has become a popular technique for modeling a dynamic background. Unfortunately, the conventional Gaussian mixture model (GMM) suffers from slow convergence at the initial stage [10][11][12][13], and its segmentation accuracy largely depends on the threshold parameters.
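To make the "four frequency bands without loss of spatial information" property concrete, here is a one-level 2-D Haar transform, the simplest discrete wavelet transform. Using the Haar wavelet is an assumption for illustration ([9] may use a different wavelet), and the sub-band names in the code are descriptive rather than LL/LH/HL/HH labels, whose convention varies between texts. Each output is half the input size in each direction, and the orthonormal scaling by 1/2 keeps the total energy unchanged.

```python
def haar_dwt2(image):
    """One-level 2-D Haar transform of an image (list of rows) with even size.

    Returns four half-resolution sub-bands, each still laid out like the image,
    so every coefficient keeps the spatial position of its 2x2 source block:
      approx: local averages (low frequencies in both directions)
      rows:   top-row minus bottom-row differences
      cols:   left-column minus right-column differences
      diag:   diagonal differences (high frequencies in both directions)
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    approx, rows, cols, diag = [], [], [], []
    for r in range(0, h, 2):
        ra, rr, rc, rd = [], [], [], []
        for c in range(0, w, 2):
            a, b = image[r][c], image[r][c + 1]          # top-left, top-right
            d, e = image[r + 1][c], image[r + 1][c + 1]  # bottom-left, bottom-right
            ra.append((a + b + d + e) / 2.0)
            rr.append((a + b - d - e) / 2.0)
            rc.append((a - b + d - e) / 2.0)
            rd.append((a - b - d + e) / 2.0)
        approx.append(ra)
        rows.append(rr)
        cols.append(rc)
        diag.append(rd)
    return approx, rows, cols, diag


if __name__ == "__main__":
    edge = [[0.0, 10.0], [0.0, 10.0]]   # a vertical edge inside one 2x2 block
    print(haar_dwt2(edge))              # ([[10.0]], [[0.0]], [[-10.0]], [[0.0]])
```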

Object tracking
After motion detection, surveillance systems generally track moving objects from one frame to the next in an image sequence. Tracking over time typically involves matching objects in consecutive frames using features such as points, lines, or blobs. A vision-based multi-target tracking system should be able to track a variable number of objects in a dynamic scene and maintain the correct identities of the targets regardless of occlusions and other visual perturbations. This is a complicated and challenging problem, and extensive research has been done on it.
A large number of strategies are available to solve the data association problem. These can be broadly categorized as single-frame assignment methods or multi-frame assignment methods. The Multiple Hypothesis Tracker (MHT) attempts to keep track of all possible association hypotheses over time. The idea of MHT [14] is to associate each measurement with one of the existing tracks, or to form a new track from the measurement. The MHT algorithm calculates the likelihoods of the measurements and the posterior probabilities of the hypotheses, storing only the most probable hypotheses. To enhance computational efficiency, heuristic methods such as gating, hypothesis merging, and clustering, among other strategies, can be employed.
In this respect, the Joint Probabilistic Data Association Filter (JPDAF) [15,16] is more appealing. At each time step, infeasible hypotheses are pruned away using a gating procedure. A filtering estimate is then computed for each of the remaining hypotheses, and these estimates are combined in proportion to the corresponding posterior hypothesis probabilities. JPDA approximates the posterior distributions of the targets as separate Gaussian distributions for each target: if the number of targets is T, then T separate Gaussian distributions are maintained. The number of Gaussian distributions is kept constant by integrating over the distribution of data associations of the previous step. This results in an algorithm in which each target estimate is updated by every measurement, with weights that depend on the predicted probabilities of the associations. Gating is used to limit the number of measurements for each track. If the predicted probabilities for certain targets are too low (i.e., below a predefined threshold), those targets are not updated at all. Clutter measurements can be modeled similarly.

Gaussian Mixture Model
A GMM is a statistical model that assumes the data originate from a weighted sum of several Gaussian distributions. There are two important problems when a GMM is used to model multivariate data: the selection of the number of components and the initialization. In this paper, we use a flexible method for foreground segmentation proposed in [17,18] and [19][20][21]. We still use a finite GMM to model the scene background, but a stochastic approximation procedure is used to recursively estimate the parameters of the GMM and to simultaneously obtain the asymptotically optimal number of mixture components. Therefore, the method is highly memory and time efficient. Moreover, it can effectively deal with many scenes, such as indoor, outdoor, and cluttered scenes.
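The value of a pixel at time t in RGB is denoted by $\vec{x}^{(t)}$; some other color space or some local features could also be used. We choose a reasonable time adaptation period T, so that at time t we have the training set $\mathcal{X}_T = \{\vec{x}^{(t)}, \dots, \vec{x}^{(t-T)}\}$. We use a GMM with M components:
$$\hat{p}(\vec{x} \mid \mathcal{X}_T) = \sum_{m=1}^{M} \hat{\pi}_m\, \mathcal{N}\big(\vec{x};\, \hat{\vec{\mu}}_m,\, \hat{\sigma}_m^2 I\big) \qquad (3)$$
where $\hat{\vec{\mu}}_1, \dots, \hat{\vec{\mu}}_M$ are the estimates of the means and $\hat{\sigma}_1^2, \dots, \hat{\sigma}_M^2$ are the estimates of the variances of the Gaussian components (isotropic covariances $\hat{\sigma}_m^2 I$, with I the identity matrix), and the mixing weights $\hat{\pi}_m$ are non-negative and sum to one. Together, $\theta = \{\hat{\pi}_m, \hat{\vec{\mu}}_m, \hat{\sigma}_m^2\}_{m=1}^{M}$ are the parameters of the model.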
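A minimal sketch of evaluating one pixel's mixture density with isotropic component covariances, assuming colour vectors are plain Python lists; the function names and the two-component example parameters are hypothetical.

```python
import math


def isotropic_gaussian(x, mean, var):
    """Density of N(x; mean, var * I) for a D-dimensional colour vector x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return math.exp(-0.5 * sq / var) / (2.0 * math.pi * var) ** (d / 2.0)


def pixel_gmm_density(x, weights, means, variances):
    """p(x) = sum_m pi_m * N(x; mu_m, sigma_m^2 I) for one pixel's mixture."""
    return sum(w * isotropic_gaussian(x, mu, v)
               for w, mu, v in zip(weights, means, variances))


if __name__ == "__main__":
    # Hypothetical two-component model of one RGB pixel.
    weights = [0.8, 0.2]
    means = [[120.0, 110.0, 100.0], [40.0, 40.0, 40.0]]
    variances = [36.0, 100.0]
    print(pixel_gmm_density([121.0, 111.0, 99.0], weights, means, variances))
```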

Update equations
Given a new data sample $\vec{x}^{(t)}$ at time t, the recursive update equations are:
$$\hat{\pi}_m \leftarrow \hat{\pi}_m + \alpha\,\big(o_m^{(t)} - \hat{\pi}_m\big)$$
$$\hat{\vec{\mu}}_m \leftarrow \hat{\vec{\mu}}_m + o_m^{(t)}\,\big(\alpha/\hat{\pi}_m\big)\,\vec{\delta}_m$$
$$\hat{\sigma}_m^2 \leftarrow \hat{\sigma}_m^2 + o_m^{(t)}\,\big(\alpha/\hat{\pi}_m\big)\,\big(\vec{\delta}_m^{\mathsf{T}}\vec{\delta}_m - \hat{\sigma}_m^2\big)$$
where $\vec{\delta}_m = \vec{x}^{(t)} - \hat{\vec{\mu}}_m$ and $o_m^{(t)}$ is the ownership of the new sample by the m-th component.
Instead of the time interval T that was mentioned above, here the constant α defines an exponentially decaying envelope that limits the influence of old data. We keep the same notation, having in mind that effectively α = 1/T. If no existing component is close to the new sample, a new component is generated with $\hat{\pi}_{M+1} = \alpha$, $\hat{\vec{\mu}}_{M+1} = \vec{x}^{(t)}$ and $\hat{\sigma}_{M+1} = \sigma_0$, where $\sigma_0$ is some appropriate initial variance.
The most commonly used framework for tracking is that of Bayesian sequential estimation. This framework is probabilistic in nature and thus facilitates the modeling of uncertainties due to inaccurate models, sensor errors, environmental noise, etc. The general recursions update the posterior distribution of the target state, also known as the filtering distribution, in two stages: a prediction step that propagates the posterior distribution at the previous time step through the target dynamics to form the one-step-ahead prediction distribution, and a filtering step that incorporates the new data through Bayes' rule to form the new filtering distribution. In theory, the framework requires only the definition of a model for the target dynamics, a likelihood model for the sensor measurements, and an initial distribution for the target state. In most practical tracking applications, the sensors yield unlabelled measurements of the targets. This leads to a combinatorial data association problem that is very challenging when the separation between targets is small compared to the measurement errors. Furthermore, clutter measurements may arise due to multi-path effects, sensor errors, etc., further increasing the complexity of the data association problem.
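The recursive update for a single pixel can be sketched as follows: the weight of every component moves toward its ownership, and only the owning component's mean and variance move toward the new sample at rate α/π̂_m. Several details are assumptions for illustration rather than statements from this section: a component counts as "close" when the sample lies within `gate` standard deviations of its mean, ownership 1 goes to the close component with the largest weight, the weakest component is replaced once a hypothetical cap `max_comps` is reached, and the values of `alpha`, `var0` and `gate` are arbitrary. The dict layout with keys 'w', 'mu', 'var' is likewise hypothetical.

```python
def update_pixel_model(x, comps, alpha=0.01, var0=225.0, max_comps=4, gate=3.0):
    """Apply one recursive update of a per-pixel GMM to the new sample x.

    x: tuple of channel values. comps: list of dicts with keys 'w' (weight),
    'mu' (mean list) and 'var' (isotropic variance); modified in place.
    """
    # Components whose mean lies within `gate` standard deviations of x.
    close = [i for i, c in enumerate(comps)
             if sum((xi - mi) ** 2 for xi, mi in zip(x, c["mu"])) < gate ** 2 * c["var"]]
    # Assumed ownership rule: 1 for the close component with the largest weight.
    owner = max(close, key=lambda i: comps[i]["w"]) if close else None
    for i, c in enumerate(comps):
        o = 1.0 if i == owner else 0.0
        c["w"] += alpha * (o - c["w"])                                # weight update
        if o:
            delta = [xi - mi for xi, mi in zip(x, c["mu"])]
            k = alpha / c["w"]
            c["mu"] = [mi + k * di for mi, di in zip(c["mu"], delta)]  # mean update
            c["var"] += k * (sum(di * di for di in delta) - c["var"])  # variance update
    if owner is None:
        # No close component: create one centred on x (dropping the weakest if full).
        if len(comps) >= max_comps:
            comps.remove(min(comps, key=lambda c: c["w"]))
        comps.append({"w": alpha, "mu": list(x), "var": var0})
        total = sum(c["w"] for c in comps)
        for c in comps:
            c["w"] /= total
    return comps


if __name__ == "__main__":
    model = []
    for value in (100.0, 102.0, 200.0):
        update_pixel_model((value,), model)
    print(len(model))  # 2: the value 200 is too far away and starts a new component
```

In this sketch, feeding the first value into an empty model simply creates one component centred on it with weight 1, so the first frame directly initializes the mean.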

Background subtraction for foreground
The GMM with M components in (3) models both the foreground objects and the scene background without distinction; that is, some of the mixture components model the foreground objects and others model the scene background. If a mixture component occurs frequently (high $\hat{\pi}_m$) and does not vary much (low $\hat{\sigma}_m$), it can be deemed to be background [22]. Therefore, the mixture components are ordered by the value of $\hat{\pi}_m/\hat{\sigma}_m$, and the first B components are chosen as the background model:
$$B = \arg\min_{b}\Big(\sum_{m=1}^{b} \hat{\pi}_m > T\Big)$$
where T is the threshold, the fraction of the total weight given to the background model. Background subtraction is performed by marking as foreground any pixel of the input frame that is more than 2.0 standard deviations away from all of the B background components.
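A sketch of the background selection and the 2.0-standard-deviation foreground test. It assumes the components are ranked by π̂_m/σ̂_m (high weight and low spread first), a threshold T = 0.9, and that the deviation of a colour vector is measured as its Euclidean distance to each component mean; these choices and the dict layout with keys 'w', 'mu', 'var' are hypothetical.

```python
import math


def background_components(comps, T=0.9):
    """Order components by w / sigma and keep the first B whose weights exceed T."""
    ordered = sorted(comps, key=lambda c: c["w"] / math.sqrt(c["var"]), reverse=True)
    chosen, cumulative = [], 0.0
    for c in ordered:
        chosen.append(c)
        cumulative += c["w"]
        if cumulative > T:      # smallest B whose cumulative weight exceeds T
            break
    return chosen


def is_foreground(x, comps, T=0.9, k=2.0):
    """Foreground if x is more than k standard deviations from every background component."""
    for c in background_components(comps, T):
        dist = math.sqrt(sum((xi - mi) ** 2 for xi, mi in zip(x, c["mu"])))
        if dist <= k * math.sqrt(c["var"]):
            return False
    return True


if __name__ == "__main__":
    pixel = [
        {"w": 0.70, "mu": [100.0], "var": 25.0},    # stable background
        {"w": 0.25, "mu": [150.0], "var": 100.0},   # e.g. a swaying branch
        {"w": 0.05, "mu": [30.0], "var": 400.0},    # a passing object
    ]
    print(len(background_components(pixel)))   # 2
    print(is_foreground([103.0], pixel))        # False
    print(is_foreground([30.0], pixel))         # True
```

Because the second component is included in the background, a pixel that alternates between the two values (such as a swaying branch) is not reported as foreground.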

Parameters estimation
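Given a set of t independent and identically distributed samples $\mathcal{X} = \{\vec{x}^{(1)}, \dots, \vec{x}^{(t)}\}$, the maximum likelihood (ML) estimate of the parameter values is:
$$\hat{\theta}_{ML} = \arg\max_{\theta} \log p(\mathcal{X}; \theta), \qquad \log p(\mathcal{X}; \theta) = \sum_{i=1}^{t} \log \sum_{m=1}^{M} \pi_m\, p_m\big(\vec{x}^{(i)}; \theta_m\big) \qquad (9)$$
where $p_m(\cdot\,; \theta_m)$ is the density of the m-th component. Unfortunately, it is difficult to find this estimate analytically from (9) [18]. The Bayesian maximum a posteriori (MAP) criterion,
$$\hat{\theta}_{MAP} = \arg\max_{\theta} \big\{ \log p(\mathcal{X}; \theta) + \log p(\theta) \big\},$$
is more flexible for parameter estimation [12], where $p(\theta)$ is the prior for the mixture parameters. The usual choice for obtaining ML or MAP estimates of mixture model parameters is the EM algorithm, an iterative procedure that searches for local maxima. The EM algorithm is based on interpreting $\mathcal{X}$ as incomplete data. For finite mixtures, the missing part is a set of t labels $\mathcal{Z} = \{\vec{z}^{(1)}, \dots, \vec{z}^{(t)}\}$ associated with the t samples, where each label is a binary vector $\vec{z}^{(i)} = [z_1^{(i)}, \dots, z_M^{(i)}]$ with $z_m^{(i)} = 1$ if sample $\vec{x}^{(i)}$ was produced by the m-th component and $z_p^{(i)} = 0$ for $p \neq m$.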
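For reference, a minimal batch EM for a one-dimensional mixture with a fixed number of components and no prior (the ML case) is sketched below. It alternates computing each component's responsibility (ownership) for each sample (E-step) with re-estimating weights, means and variances (M-step). This is the textbook procedure, not the paper's online variant; the 1-D setting, the variance floor `var_floor` and the fixed iteration count are simplifying assumptions.

```python
import math


def em_gmm_1d(data, weights, means, variances, iterations=50, var_floor=1e-6):
    """Batch EM for a 1-D Gaussian mixture with a fixed number of components."""
    weights, means, variances = list(weights), list(means), list(variances)
    n, M = len(data), len(weights)
    for _ in range(iterations):
        # E-step: ownership of sample x by component m, pi_m N(x) / p(x).
        own = []
        for x in data:
            dens = [w * math.exp(-0.5 * (x - mu) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
                    for w, mu, v in zip(weights, means, variances)]
            total = sum(dens)
            own.append([dm / total for dm in dens] if total > 0 else [1.0 / M] * M)
        # M-step: re-estimate each component from its ownerships.
        for m in range(M):
            nm = sum(o[m] for o in own)
            if nm == 0.0:
                continue                      # component owns nothing; leave as is
            weights[m] = nm / n
            means[m] = sum(o[m] * x for o, x in zip(own, data)) / nm
            variances[m] = max(var_floor,
                               sum(o[m] * (x - means[m]) ** 2 for o, x in zip(own, data)) / nm)
    return weights, means, variances


if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
    w, mu, var = em_gmm_1d(data, [0.5, 0.5], [1.0, 8.0], [1.0, 1.0])
    print([round(v, 3) for v in mu])   # [0.0, 10.0]
```

Note that M must be fixed in advance here, which is exactly the limitation discussed next.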

Estimating the number of mixture components
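Obviously, to use the EM algorithm, the appropriate number of mixture components M must be known. With too many components the mixture may over-fit the data, while a mixture with too few components may under-fit it. Besides, the appropriate number of components can reveal important underlying structure that characterizes the data. However, introducing more mixture components never decreases the log-likelihood. The balance is achieved by introducing a prior $p_M(\theta)$ on the mixture parameters; here a Dirichlet prior with negative coefficients is used,
$$p_M(\theta) \propto \exp\Big(\sum_{m=1}^{M} c_m \log \pi_m\Big) = \prod_{m=1}^{M} \pi_m^{c_m}, \qquad c_m = -c \qquad (15)$$
which favors solutions with fewer components.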

Online parameters estimation
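For the ML estimate, the following holds:
$$\frac{\partial}{\partial \hat{\pi}_m} \log p(\mathcal{X}; \hat{\theta}) = 0.$$
The mixing weights are constrained to sum up to 1. Taking this into account by introducing the Lagrange multiplier λ, we get:
$$\frac{\partial}{\partial \hat{\pi}_m}\Big(\log p(\mathcal{X}; \hat{\theta}) + \lambda\Big(\sum_{m=1}^{M} \hat{\pi}_m - 1\Big)\Big) = 0.$$
After getting rid of λ, for t data samples we get:
$$\hat{\pi}_m^{(t)} = \frac{1}{t} \sum_{i=1}^{t} o_m^{(t)}\big(\vec{x}^{(i)}\big) \qquad (18)$$
with the "ownerships" defined as:
$$o_m^{(t)}(\vec{x}) = \frac{\hat{\pi}_m^{(t)}\, p_m\big(\vec{x}; \hat{\theta}_m^{(t)}\big)}{p\big(\vec{x}; \hat{\theta}^{(t)}\big)}.$$
Similarly, for the MAP solution we have:
$$\frac{\partial}{\partial \hat{\pi}_m}\Big(\log p(\mathcal{X}; \hat{\theta}) + \log p_M(\hat{\theta}) + \lambda\Big(\sum_{m=1}^{M} \hat{\pi}_m - 1\Big)\Big) = 0,$$
where $p_M(\theta)$ is the Dirichlet prior (15) mentioned above. For t data samples we get:
$$\hat{\pi}_m^{(t)} = \frac{1}{K}\Big(\sum_{i=1}^{t} o_m^{(t)}\big(\vec{x}^{(i)}\big) - c\Big), \qquad K = \sum_{m=1}^{M}\Big(\sum_{i=1}^{t} o_m^{(t)}\big(\vec{x}^{(i)}\big) - c\Big) = t - Mc.$$
The EM algorithm starts with some initial parameters $\hat{\theta}^{(0)}$. Denoting the set of missing labels by $\mathcal{Z}$, the estimate $\hat{\theta}^{(k)}$ at the k-th iteration of the EM algorithm is obtained from the previous estimate $\hat{\theta}^{(k-1)}$ as
$$\hat{\theta}^{(k)} = \arg\max_{\theta}\Big\{ E\big[\log p(\mathcal{X}, \mathcal{Z}; \theta) \,\big|\, \mathcal{X}, \hat{\theta}^{(k-1)}\big] + \log p(\theta) \Big\},$$
where the prior term is dropped for ML estimation. Assume that the parameter estimates hardly change when a new sample $\vec{x}^{(t+1)}$ is added, so that the ownerships of the earlier samples can be kept fixed. With α = 1/T and $c_T = c/T$, this leads approximately to the recursive weight update
$$\hat{\pi}_m \leftarrow \hat{\pi}_m + \alpha\,\big(o_m^{(t)} - \hat{\pi}_m\big) - \alpha\, c_T.$$
Here, T should be sufficiently large to make sure that $M c_T < 1$.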
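A sketch of the recursive MAP weight update π̂_m ← π̂_m + α(o_m − π̂_m) − α·c_T, in which the constant prior term steadily pushes rarely owned components toward zero so that they disappear; this is how the number of components per pixel gets selected. Discarding a component once its weight is no longer positive and renormalising the survivors is stated here as an assumption about how the pruning is carried out; the (id, weight) representation and the values of `alpha` and `c_t` are hypothetical.

```python
def update_weights_with_prior(components, owner, alpha, c_t):
    """Recursive MAP-style weight update with component pruning.

    components: list of (component_id, weight) pairs whose weights sum to 1.
    owner: id of the component that owns the new sample (ownership 1), or None.
    Applies  w <- w + alpha * (o - w) - alpha * c_t  to every component, drops
    components whose weight is no longer positive, and renormalizes the rest.
    """
    updated = []
    for cid, w in components:
        o = 1.0 if cid == owner else 0.0
        updated.append((cid, w + alpha * (o - w) - alpha * c_t))
    kept = [(cid, w) for cid, w in updated if w > 0.0]
    total = sum(w for _, w in kept)
    return [(cid, w / total) for cid, w in kept] if kept else []


if __name__ == "__main__":
    state = [(0, 0.9), (1, 0.1)]
    for step in range(50):
        state = update_weights_with_prior(state, owner=0, alpha=0.1, c_t=0.05)
    print(state)  # [(0, 1.0)]: the never-owned component was pruned
```

With `c_t = 0` no component is ever removed, which reduces to the fixed-size update; a positive `c_t` trades a small bias in the weights for automatic removal of unsupported components.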

Tracking Multiple Objects
Once segmented, the objects are tracked in the subsequent frames. Tracking is a loop consisting of predicting the positions from the previous frames, searching for the best match, and updating the object representation. Traditionally, the tracking problem is formulated as sequential recursive estimation [23]: given an estimate of the probability distribution of the target in the previous frame, the problem is to estimate the target distribution in the new frame using all available prior knowledge and the new information brought by the new frame.

Bayesian sequential estimation
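In the state-space formalism, the state of the target is described by a state vector $X_t$ containing parameters such as target position, velocity, angle, or size. The target evolves according to the following discrete-time stochastic model, called the dynamic model:
$$X_t = f_{t-1}(X_{t-1}, V_{t-1})$$
where $f_{t-1}$ is a known, possibly nonlinear function and $V_{t-1}$ is the process noise. The measurements $Z_t$ are related to the state through the likelihood $p(Z_t \mid X_t)$. Given the measurements $Z_{1:t}$ up to time t, the conceptual solution is found in two steps. Using the transition density $p(X_t \mid X_{t-1})$, one can perform the prediction step:
$$p(X_t \mid Z_{1:t-1}) = \int p(X_t \mid X_{t-1})\, p(X_{t-1} \mid Z_{1:t-1})\, dX_{t-1}.$$
The prediction step makes use of the available knowledge of target evolution encoded in the dynamic model. The update step uses the measurement $Z_t$, available at time t, to update the predicted density:
$$p(X_t \mid Z_{1:t}) \propto p(Z_t \mid X_t)\, p(X_t \mid Z_{1:t-1}).$$
These two steps, repeated for each frame, allow the state density to be computed recursively for each frame. In the case of multiple targets, each target is parameterized by a state $X_{k,t}$, k = 1, …, K, which may differ in interpretation across the individual targets. The combined state is constructed as the concatenation of the individual target states, $X_t = (X_{1,t}, \dots, X_{K,t})$.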
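Under linear dynamics and Gaussian noise, the prediction and update steps above have a closed form, the Kalman filter. The sketch below performs one predict/update cycle for a single coordinate with a constant-velocity state [position, velocity] and a position measurement. The linear-Gaussian model, the process noise `q` (added to both diagonal entries) and the measurement noise `r` are assumptions for illustration, not the paper's dynamic model.

```python
def kalman_cv_step(x, P, z, dt=1.0, q=1e-2, r=1.0):
    """One predict/update cycle of a 1-D constant-velocity Kalman filter.

    x: [position, velocity]; P: 2x2 covariance as nested lists;
    z: measured position; q, r: process and measurement noise variances.
    """
    # Prediction: x <- F x and P <- F P F^T + q I, with F = [[1, dt], [0, 1]].
    xp = [x[0] + dt * x[1], x[1]]
    p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    # Update with H = [1, 0]: innovation, its variance and the Kalman gain.
    innovation = z - xp[0]
    s = p00 + r
    k0, k1 = p00 / s, p10 / s
    x_new = [xp[0] + k0 * innovation, xp[1] + k1 * innovation]
    P_new = [[(1.0 - k0) * p00, (1.0 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
    return x_new, P_new


if __name__ == "__main__":
    x, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
    for t in range(1, 21):            # object moving one unit per frame
        x, P = kalman_cv_step(x, P, z=float(t))
    print([round(v, 2) for v in x])   # approximately position 20, velocity 1
```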

Problem statement
The humans of interest here all have the same visual appearance and cannot be discriminated visually. Therefore, to track a particular human over time, the association task becomes crucial. We denote by $F_{1:t}$ a possible hypothesis of the t-frame association problem. Finding the optimal hypothesis, $\hat{F}_{1:t} = \arg\max_{F_{1:t}} p(F_{1:t} \mid Z_{1:t})$, is NP-hard, with a complexity exponential in t, so finding the optimal solution is essentially impossible.
Instead, different methods have been introduced to solve the association problem suboptimally, either by finding the most likely hypothesis within a limited hypothesis set $\Omega_{1:t}$ over multiple frames, $\hat{F}_{1:t} = \arg\max_{F_{1:t} \in \Omega_{1:t}} p(F_{1:t} \mid Z_{1:t})$, as in the multiple hypothesis tracking (MHT) algorithm, or by a frame-by-frame solution based on single-frame associations over [t−1, t], as in JPDA [14].
The single-frame association method is a feasible approach that has been widely used. However, even in the single-frame case, virtually all approaches search for solutions over a reduced hypothesis space for frame t.
Consequently, the optimal single-frame solution is found only if it is included among the hypotheses retained for that frame.
One of the most widely used single-frame algorithms for solving the multi-target tracking problem in a reduced hypothesis space is JPDA.

Joint probabilistic data association
To solve (33), JPDA has been widely applied to multi-target tracking. There have been considerable efforts to generalize JPDA and overcome its shortcomings, for example by using the extended Kalman filter (EKF) to linearize moderately nonlinear systems [24]. JPDA assumes that the number of targets is known and imposes the following constraints:
1. Each measurement originates from either a target or clutter.
2. A measurement can be associated with at most one target.
3. At most one measurement can be associated with a target.
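Under these three constraints, the feasible joint association events of one frame can be enumerated directly. The sketch assumes a boolean validation matrix `gate[j][k]` (measurement j lies inside target k's gate) is already available; the function name and the event encoding are hypothetical. Counting the events shows how quickly the hypothesis space grows, even for a single frame, which is why gating matters.

```python
def feasible_events(gate):
    """Enumerate joint association events satisfying the three JPDA constraints.

    gate[j][k] is True when measurement j lies in target k's validation gate.
    Each event is a tuple whose j-th entry is the 1-based index of the target
    that produced measurement j, or 0 if measurement j is clutter.
    """
    n_meas = len(gate)
    n_targets = len(gate[0]) if gate else 0
    events = []

    def assign(j, used, current):
        if j == n_meas:
            events.append(tuple(current))
            return
        assign(j + 1, used, current + [0])           # measurement j is clutter
        for k in range(n_targets):                   # or comes from a free gated target
            if gate[j][k] and k not in used:
                assign(j + 1, used | {k}, current + [k + 1])

    assign(0, frozenset(), [])
    return events


if __name__ == "__main__":
    both_gated = [[True, True], [True, True]]
    print(len(feasible_events(both_gated)))          # 7
    separated = [[True, False], [False, True]]
    print(feasible_events(separated))                # [(0, 0), (0, 2), (1, 0), (1, 2)]
```

With three targets and three measurements that all fall in every gate, the count already rises to 34 events per frame; over several frames the counts multiply.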
To make the association problem tractable, JPDA reduces the number of possible association hypotheses and keeps a reasonable subset of them as valid association hypotheses. It does so with a gating strategy that validates the measurements and generates the subset of valid association hypotheses from the validated measurements. Gating keeps the measurements that fall inside the validation gate of each target as valid measurements; a measurement that falls outside a target's validation gate is not considered as an association candidate for that target, and measurements outside every gate are discarded.
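A common way to implement the validation gate is an ellipsoidal gate on the innovation: a measurement is accepted when its squared Mahalanobis distance from the predicted measurement, under the innovation covariance S, stays below a threshold. The sketch assumes 2-D image positions and a threshold of about 9.21, the 99% point of the chi-square distribution with 2 degrees of freedom; both are illustrative choices, since the gate used in this work is not specified, and the function name is hypothetical.

```python
def in_validation_gate(z, z_pred, S, gamma=9.21):
    """Ellipsoidal gate test for a 2-D measurement.

    Accept z if (z - z_pred)^T S^{-1} (z - z_pred) <= gamma, where S is the
    2x2 innovation covariance of the target's predicted measurement z_pred.
    """
    dx, dy = z[0] - z_pred[0], z[1] - z_pred[1]
    a, b = S[0]
    c, d = S[1]
    det = a * d - b * c
    if det <= 0:
        # A non-positive determinant means S cannot be positive definite.
        raise ValueError("innovation covariance must be positive definite")
    # Inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    d2 = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return d2 <= gamma


if __name__ == "__main__":
    S = [[4.0, 0.0], [0.0, 1.0]]     # more uncertainty horizontally than vertically
    candidates = [(5.0, 0.0), (0.0, 4.0), (6.1, 0.0)]
    print([in_validation_gate(z, (0.0, 0.0), S) for z in candidates])  # [True, False, False]
```

The elongated gate accepts a measurement 5 pixels away horizontally but rejects one 4 pixels away vertically, because the gate follows the predicted uncertainty rather than a fixed radius.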
Instead of maintaining the filtering distribution for the joint state $X_t$, JPDA employs a recursive strategy that updates the filtering distribution of each target separately, which keeps the dimensionality, and hence the complexity, of the problem manageable. Due to the data association uncertainty, however, the update step cannot be performed independently for the individual targets. The JPDAF gets around this difficulty by performing a soft assignment of targets to measurements according to the corresponding posterior probabilities of these marginal associations. JPDA sequentially estimates the marginal distribution
$$p(X_{k,t} \mid Z_{1:t}) = \sum_{m=0}^{M_t} \beta_m^k\, p_m(X_{k,t} \mid Z_{1:t})$$
where $M_t$ is the number of measurements at time t, $p_m$ is the filtering distribution obtained when measurement m is associated with target k, and $\beta_m^k$ is the marginal posterior probability of associating measurement m with target k, so that $\beta_0^k$ is the probability that no measurement is associated with target k.
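Once the association probabilities β are known, the soft assignment amounts to a weighted measurement update. The sketch follows the standard probabilistic data association form for a scalar state with a linear-Gaussian measurement model z = x + noise: the innovations are combined with weights β_m^k, and the covariance gains an extra term for the spread of the innovations. Computing the β values themselves is not shown, and the scalar setting, the function name and its arguments are assumptions for illustration.

```python
def pda_update(x_pred, p_pred, measurements, betas, r):
    """Soft (JPDA-style) measurement update of one target, scalar case.

    betas[0] is the probability that no measurement belongs to the target;
    betas[m] (m >= 1) is the probability that measurements[m - 1] does.
    r is the measurement noise variance of the model z = x + noise.
    """
    s = p_pred + r                                   # innovation variance
    k = p_pred / s                                   # Kalman gain
    innovations = [z - x_pred for z in measurements]
    nu = sum(b * v for b, v in zip(betas[1:], innovations))   # combined innovation
    x = x_pred + k * nu
    p_correct = p_pred - k * s * k                   # covariance if the true measurement were known
    spread = sum(b * v * v for b, v in zip(betas[1:], innovations)) - nu * nu
    p = betas[0] * p_pred + (1.0 - betas[0]) * p_correct + k * spread * k
    return x, p


if __name__ == "__main__":
    # Two gated measurements on either side of the prediction, equally likely.
    x, p = pda_update(0.0, 1.0, [2.0, -2.0], [0.2, 0.4, 0.4], r=1.0)
    print(x, round(p, 6))   # the state stays at 0.0 while the variance grows to 1.4
```

When a single measurement has β = 1 this reduces to the ordinary Kalman update; ambiguous associations instead inflate the covariance, so the gate of the next frame widens.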

The Proposed Method
Tracking and surveillance applications require the segmentation of objects (regions of interest distinct from the background) from the scene. Typically, systems divide the scene into two regions, foreground and background, where only the foreground contains events of interest and the background is relatively unchanging over time. To achieve this separation of foreground and background, techniques such as the adaptive Gaussian mixture model are used.
Adaptive Gaussian mixtures are commonly chosen for their analytical representation and theoretical foundations, and for these reasons they have been employed in real-time surveillance systems for background subtraction and object tracking. However, there are two open problems when Gaussian mixture models are used to model multivariate data: the selection of the number of components and the initialization. The proposed method uses recursive equations [17] to constantly update the parameters of a Gaussian mixture model and to simultaneously select the appropriate number of components for each pixel. Automatically selecting the needed number of components per pixel improves the segmentation results; therefore, the method is highly memory and time efficient. Moreover, our method can effectively deal with various scenes, such as indoor, outdoor, and cluttered scenes.
Motion correspondence is a fundamental problem in computer vision, where it is referred to as the data association problem. The target tracking and surveillance community has studied the motion correspondence problem extensively, and a number of statistical data association techniques have been developed. In this paper, the JPDAF algorithm is used to solve the data association problem [24].
Most single-frame tracking methods, including the JPDAF, reduce the number of possible association hypotheses to make the association problem tractable. Results show that our algorithm performs well in a surveillance application where we attempt to locate a moving person in the image sequence. The algorithm is capable of running at near real-time speed (15 fps) on a 3 GHz workstation. Experimental results are presented to show the effectiveness of the proposed method, and the person tracking results show that our system is able to track people accurately.
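The proposed multiple object tracking algorithm is as follows:
$$p(X_{k,t} \mid Z_{1:t}) = c_k\, p(X_{k,t} \mid Z_{1:t-1})\, p(Z_t \mid X_{k,t}) \qquad (41)$$
where $c_k$ is a normalization constant. The first term of (41), $p(X_{k,t} \mid Z_{1:t-1})$, is the prediction step. The second term of (41), $p(Z_t \mid X_{k,t})$, is the likelihood of the measurements $Z_t$ given hypothesis $X_{k,t}$, and is computed in standard JPDA from the association probabilities $\beta_m^k$ and the likelihoods of the individual measurements.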

Experimental Results
In this section, we present several experiments that demonstrate the feasibility of the proposed tracking method. We used an entry-level video camera to capture the test sequences. Results are shown for various conditions: a single moving object outdoors with a static background, a single moving object outdoors with a varying background, and multiple moving objects outdoors. The results demonstrate high performance of the proposed algorithm in cluttered outdoor scenes. The image size of all test sequences is 320×240 pixels. The experiments are implemented in MATLAB 7 on a PC with a 3 GHz Pentium 4 processor and 1.0 GB of memory. The experimental results show that the proposed method is efficient, effective, and near real time.

Tracking single object
We performed several experiments to simulate different conditions for moving objects, such as a single object outdoors with a static background and a single object outdoors with a varying background. Figure 1 shows an example of tracking a single object outdoors with a static background. Figure 2 shows another example for a single object outdoors with a varying background.

Tracking multiple objects
To test the proposed tracking and identification method for multiple objects, we captured five video sequences of outdoor scenes with multiple moving objects crossing each other. In sequence 1, two persons enter the area monitored by the surveillance system; Figure 3 shows how they move in the video sequence. The object identification results are shown in Table 1. As can be seen in Table 1, the tracking error is less than 5% for all sequences. We compare our proposed method with the algorithm described in [25], which detects and tracks multiple moving objects based on the discrete wavelet transform and identifies them by their color and spatial information. From Table 1, we notice that some objects are not correctly identified in some frames of the sequences. The wrong identifications are of two types. The first type occurs when a moving object is just entering or leaving the scene. When a moving object is just entering, the system detects it, but because it is detected and tracked at the border of the scene, the extracted features cannot represent the object very well; the main reason is that only part of the object's features is extracted. The second kind of erroneous identification, called type 2, occurs when a moving object slows down. In this situation, the inter-frame difference image of the object becomes smaller, yielding a smaller bounding box and fewer moving pixels, so the extracted features lose their representativeness. This kind of erroneous identification arises from detection error. In our proposed …
(a) The ownership can be simplified as …
(b) The first frame can be immediately used to initialize the mean μ …
(c) The initial covariance is σ = …

Tracking performance
Some selected frames from the results are shown in Figure 3. The algorithm runs at 15 frames per second, which is near real time; the reported processing time is for 320×240 pixel images and was measured on a 3 GHz PC. The ID of each person is shown beside him or her in the picture. The proposed method works well, with no identification error. The system handles occlusion correctly, and the object IDs remain correct, as shown in Figure 4. The ID of each object is assigned automatically when it enters the surveillance system and then remains fixed. No error is observed in identifying multiple objects in the test sequences (Figure 5).

Conclusion
A real-time method for detecting and tracking multiple moving objects, based on an adaptive Gaussian Mixture Model for detection and the JPDAF algorithm for associating the moving objects across frames, is proposed in this paper. A flexible GMM is used for foreground segmentation. In the proposed algorithm, the number of mixture components is not a preset, fixed value but is optimally estimated online. Therefore, this method is more time and memory efficient than the traditional GMM with a fixed number of components. In tracking multiple moving objects, many applications have problems when objects pass across each other; the JPDAF algorithm is used in this paper to solve this issue. No identification error has been observed on the test sequences. The algorithm is capable of running at near real-time speed (15 fps) on a 3 GHz workstation. Experimental results are presented to show the effectiveness of the proposed method.