
.. _ML_Expectation Maximization:


Expectation Maximization
========================

.. highlight:: cpp

The Expectation Maximization (EM) algorithm estimates the parameters of a multivariate probability density function in the form of a Gaussian mixture distribution with a specified number of mixtures.

Consider a set of N feature vectors
{ :math:`x_1, x_2,...,x_{N}` } from a d-dimensional Euclidean space drawn from a Gaussian mixture:

.. math::

    p(x;a_k,S_k, \pi _k) =  \sum _{k=1}^{m} \pi _kp_k(x),  \quad \pi _k  \geq 0,  \quad \sum _{k=1}^{m} \pi _k=1,

.. math::

    p_k(x)= \varphi (x;a_k,S_k)= \frac{1}{(2\pi)^{d/2}\mid{S_k}\mid^{1/2}} \exp \left \{ - \frac{1}{2} (x-a_k)^TS_k^{-1}(x-a_k) \right \} ,

where
:math:`m` is the number of mixtures,
:math:`p_k` is the normal distribution
density with the mean
:math:`a_k` and covariance matrix
:math:`S_k`, and
:math:`\pi_k` is the weight of the k-th mixture. Given the number of mixtures
:math:`m` and the samples
:math:`x_i`,
:math:`i=1..N`, the algorithm finds the
maximum-likelihood estimates (MLE) of all the mixture parameters,
that is,
:math:`a_k`,
:math:`S_k`, and
:math:`\pi_k`:

.. math::

    L(x, \theta )= \log p(x, \theta )= \sum _{i=1}^{N} \log \left ( \sum _{k=1}^{m} \pi _kp_k(x_i) \right ) \to \max _{ \theta \in \Theta },

.. math::

    \Theta = \left \{ (a_k,S_k, \pi _k): a_k  \in \mathbb{R} ^d,S_k=S_k^T>0,S_k  \in \mathbb{R} ^{d  \times d}, \pi _k \geq 0, \sum _{k=1}^{m} \pi _k=1 \right \} .

The EM algorithm is an iterative procedure. Each iteration includes
two steps. At the first step (Expectation step or E-step), you find a
probability
:math:`p_{i,k}` (denoted
:math:`\alpha_{ki}` in the formula below) of
sample ``i`` to belong to mixture ``k`` using the currently
available mixture parameter estimates:

.. math::

    \alpha _{ki} =  \frac{\pi_k\varphi(x_i;a_k,S_k)}{\sum\limits_{j=1}^{m}\pi_j\varphi(x_i;a_j,S_j)} .

At the second step (Maximization step or M-step), the mixture parameter estimates are refined using the computed probabilities:

.. math::

    \pi _k= \frac{1}{N} \sum _{i=1}^{N} \alpha _{ki},  \quad a_k= \frac{\sum\limits_{i=1}^{N}\alpha_{ki}x_i}{\sum\limits_{i=1}^{N}\alpha_{ki}} ,  \quad S_k= \frac{\sum\limits_{i=1}^{N}\alpha_{ki}(x_i-a_k)(x_i-a_k)^T}{\sum\limits_{i=1}^{N}\alpha_{ki}}
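
In code, one iteration of this procedure for one-dimensional Gaussians (where each covariance :math:`S_k` degenerates to a scalar variance) can be sketched as follows. This is a minimal illustration of the two steps, not the implementation used by the class::

    #include <cmath>
    #include <vector>

    // One E-step/M-step pass over the samples x of an m-component 1-D mixture.
    // w, mean and var hold the current estimates of pi_k, a_k and S_k
    // and are updated in place.
    static void emIteration( const std::vector<double>& x, std::vector<double>& w,
                             std::vector<double>& mean, std::vector<double>& var )
    {
        const double PI = 3.141592653589793;
        const size_t N = x.size(), m = w.size();
        std::vector<double> alpha(N*m);

        // E-step: alpha[i*m+k] is the probability p_{i,k} of sample i
        // to belong to mixture k under the current parameter estimates
        for( size_t i = 0; i < N; i++ )
        {
            double s = 0;
            for( size_t k = 0; k < m; k++ )
            {
                double d = x[i] - mean[k];
                alpha[i*m+k] = w[k]*std::exp(-0.5*d*d/var[k])/std::sqrt(2*PI*var[k]);
                s += alpha[i*m+k];
            }
            for( size_t k = 0; k < m; k++ )
                alpha[i*m+k] /= s;
        }

        // M-step: refine the weights, means and variances
        for( size_t k = 0; k < m; k++ )
        {
            double sa = 0, sx = 0, sd = 0;
            for( size_t i = 0; i < N; i++ )
            {
                sa += alpha[i*m+k];
                sx += alpha[i*m+k]*x[i];
            }
            mean[k] = sx/sa;
            for( size_t i = 0; i < N; i++ )
            {
                double d = x[i] - mean[k];
                sd += alpha[i*m+k]*d*d;
            }
            var[k] = sd/sa;
            w[k] = sa/N;
        }
    }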

Alternatively, the algorithm may start with the M-step when the initial values for
:math:`p_{i,k}` can be provided. Another alternative when
:math:`p_{i,k}` are unknown is to use a simpler clustering algorithm to pre-cluster the input samples and thus obtain initial
:math:`p_{i,k}`. Often (including in machine learning) the
:ocv:func:`kmeans` algorithm is used for that purpose, as shown in the sketch below.
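
A hedged sketch of this initialization (``samples`` is assumed to be a ``CV_32FC1`` matrix with one sample per row, and ``nclusters`` the desired number of mixtures; both names are placeholders for the example) converts the hard k-means labels into one-hot initial probabilities and starts :ocv:func:`EM::trainM` from the M-step::

    Mat labels, centers;
    kmeans(samples, nclusters, labels,
           TermCriteria(TermCriteria::COUNT+TermCriteria::EPS, 10, 1.0),
           3, KMEANS_PP_CENTERS, centers);

    // one-hot initial probabilities p_{i,k} built from the k-means labels
    Mat probs0 = Mat::zeros(samples.rows, nclusters, CV_64FC1);
    for( int i = 0; i < samples.rows; i++ )
        probs0.at<double>(i, labels.at<int>(i)) = 1.;

    EM model(nclusters);
    model.trainM(samples, probs0); // start with the M-step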

One of the main problems of the EM algorithm is the large number
of parameters to estimate. The majority of the parameters reside in
covariance matrices, which have
:math:`d \times d` elements each,
where
:math:`d` is the feature space dimensionality. However, in
many practical problems the covariance matrices are close to diagonal
or even to
:math:`\mu_k * I`, where
:math:`I` is an identity matrix and
:math:`\mu_k` is a mixture-dependent "scale" parameter. So, a robust computation
scheme could start with harder constraints on the covariance
matrices and then use the estimated parameters as an input for a less
constrained optimization problem (often a diagonal covariance matrix is
already a good enough approximation).

**References:**

*
    Bilmes98 J. A. Bilmes. *A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models*. Technical Report TR-97-021, International Computer Science Institute and Computer Science Division, University of California at Berkeley, April 1998.

EM
--
.. ocv:class:: EM : public Algorithm

The class implements the EM algorithm as described at the beginning of this section. It is inherited from :ocv:class:`Algorithm`.


EM::EM
------
The constructor of the class

.. ocv:function:: EM::EM(int nclusters=EM::DEFAULT_NCLUSTERS, int covMatType=EM::COV_MAT_DIAGONAL, const TermCriteria& termCrit=TermCriteria(TermCriteria::COUNT+TermCriteria::EPS, EM::DEFAULT_MAX_ITERS, FLT_EPSILON) )

.. ocv:pyfunction:: cv2.EM([nclusters[, covMatType[, termCrit]]]) -> <EM object>

    :param nclusters: The number of mixture components in the Gaussian mixture model. The default value of the parameter is ``EM::DEFAULT_NCLUSTERS=5``. Some EM implementations can determine the optimal number of mixtures within a specified value range, but that is not the case in ML yet.

    :param covMatType: Constraint on the covariance matrices, which defines the type of matrices. Possible values are:

        * **EM::COV_MAT_SPHERICAL** A scaled identity matrix :math:`\mu_k * I`. Only one parameter :math:`\mu_k` has to be estimated for each matrix. The option may be used in special cases, when the constraint is relevant, or as a first step in the optimization (for example, when the data is preprocessed with PCA). The results of such a preliminary estimation may be passed again to the optimization procedure, this time with ``covMatType=EM::COV_MAT_DIAGONAL`` (see the sketch after this parameter list).

        * **EM::COV_MAT_DIAGONAL** A diagonal matrix with positive diagonal elements. The number of free parameters is ``d`` for each matrix. This is the most commonly used option, yielding good estimation results.

        * **EM::COV_MAT_GENERIC** A symmetric positive-definite matrix. The number of free parameters in each matrix is about :math:`d^2/2`. It is not recommended to use this option, unless there is a fairly accurate initial estimation of the parameters and/or a huge number of training samples.

    :param termCrit: The termination criteria of the EM algorithm. The EM algorithm can be terminated by the number of iterations ``termCrit.maxCount`` (number of M-steps) or when the relative change of the likelihood logarithm is less than ``termCrit.epsilon``. The default maximum number of iterations is ``EM::DEFAULT_MAX_ITERS=100``.
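
As an illustration of the staged scheme mentioned in the ``covMatType`` description, the following sketch (assuming ``samples`` is a ``CV_32FC1`` matrix with one sample per row and ``nclusters`` holds the number of mixtures) seeds a diagonal-constraint run with the spherical estimates; the parameter names ``weights``, ``means``, and ``covs`` are the read-only parameters listed in the EM::get section below::

    // Stage 1: rough estimation under the strongest covariance constraint
    EM em1(nclusters, EM::COV_MAT_SPHERICAL);
    em1.train(samples);

    // Stage 2: refine under a diagonal constraint, seeding EM::trainE
    // with the spherical estimates read through the Algorithm interface
    EM em2(nclusters, EM::COV_MAT_DIAGONAL);
    em2.trainE(samples, em1.getMat("means"),
               em1.getMatVector("covs"), em1.getMat("weights"));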

EM::train
---------
Estimates the Gaussian mixture parameters from a sample set.

.. ocv:function:: bool EM::train(InputArray samples, OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray())

.. ocv:function:: bool EM::trainE(InputArray samples, InputArray means0, InputArray covs0=noArray(), InputArray weights0=noArray(), OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray())

.. ocv:function:: bool EM::trainM(InputArray samples, InputArray probs0, OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray())

.. ocv:pyfunction:: cv2.EM.train(samples[, logLikelihoods[, labels[, probs]]]) -> retval, logLikelihoods, labels, probs

.. ocv:pyfunction:: cv2.EM.trainE(samples, means0[, covs0[, weights0[, logLikelihoods[, labels[, probs]]]]]) -> retval, logLikelihoods, labels, probs

.. ocv:pyfunction:: cv2.EM.trainM(samples, probs0[, logLikelihoods[, labels[, probs]]]) -> retval, logLikelihoods, labels, probs

    :param samples: Samples from which the Gaussian mixture model will be estimated. It should be a one-channel matrix, each row of which is a sample. If the matrix does not have the ``CV_64F`` type, it will be converted to an internal matrix of that type for further computation.

    :param means0: Initial means :math:`a_k` of mixture components. It is a one-channel matrix of :math:`nclusters \times dims` size. If the matrix does not have the ``CV_64F`` type, it will be converted to an internal matrix of that type for further computation.

    :param covs0: The vector of initial covariance matrices :math:`S_k` of mixture components. Each of the covariance matrices is a one-channel matrix of :math:`dims \times dims` size. If the matrices do not have the ``CV_64F`` type, they will be converted to internal matrices of that type for further computation.

    :param weights0: Initial weights :math:`\pi_k` of mixture components. It should be a one-channel floating-point matrix with :math:`1 \times nclusters` or :math:`nclusters \times 1` size.

    :param probs0: Initial probabilities :math:`p_{i,k}` of sample :math:`i` to belong to mixture component :math:`k`. It is a one-channel floating-point matrix of :math:`nsamples \times nclusters` size.

    :param logLikelihoods: The optional output matrix that contains a likelihood logarithm value for each sample. It has :math:`nsamples \times 1` size and ``CV_64FC1`` type.

    :param labels: The optional output "class label" for each sample: :math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample). It has :math:`nsamples \times 1` size and ``CV_32SC1`` type.

    :param probs: The optional output matrix that contains posterior probabilities of each Gaussian mixture component given each sample. It has :math:`nsamples \times nclusters` size and ``CV_64FC1`` type.

The three versions of the training method differ in the initialization of the Gaussian mixture model parameters and the start step:

* **train** - Starts with the Expectation step. Initial values of the model parameters will be estimated by the k-means algorithm.

* **trainE** - Starts with the Expectation step. You need to provide the initial means :math:`a_k` of the mixture components. Optionally you can pass the initial weights :math:`\pi_k` and covariance matrices :math:`S_k` of the mixture components.

* **trainM** - Starts with the Maximization step. You need to provide the initial probabilities :math:`p_{i,k}` to use this option.

The methods return ``true`` if the Gaussian mixture model was trained successfully; otherwise, they return ``false``.

Unlike many of the ML models, EM is an unsupervised learning algorithm and it does not take responses (class labels or function values) as input. Instead, it computes the
*Maximum Likelihood Estimate* of the Gaussian mixture parameters from an input sample set and stores all the parameters inside the class:
:math:`p_{i,k}` in ``probs``,
:math:`a_k` in ``means``,
:math:`S_k` in ``covs[k]``,
:math:`\pi_k` in ``weights``, and optionally computes the output "class label" for each sample:
:math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample).

The trained model can be used further for prediction, just like any other classifier. The trained model is similar to the
:ocv:class:`CvNormalBayesClassifier`.
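
For illustration, here is a minimal, hedged training sketch on synthetic data (the data generation and all constants are assumptions of the example, not part of the API)::

    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>
    using namespace cv;

    // two 2-D Gaussian blobs, one sample per row
    Mat samples(200, 2, CV_32FC1);
    randn(samples.rowRange(0, 100), Scalar::all(0.), Scalar::all(1.));
    randn(samples.rowRange(100, 200), Scalar::all(5.), Scalar::all(1.));

    EM model(2, EM::COV_MAT_DIAGONAL);
    Mat logLikelihoods, labels, probs;
    bool ok = model.train(samples, logLikelihoods, labels, probs);
    // labels: index of the most probable component per sample (CV_32SC1)
    // probs:  posterior probabilities p_{i,k} (CV_64FC1)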

EM::predict
-----------
Returns a likelihood logarithm value and an index of the most probable mixture component for the given sample.

.. ocv:function:: Vec2d EM::predict(InputArray sample, OutputArray probs=noArray()) const

.. ocv:pyfunction:: cv2.EM.predict(sample[, probs]) -> retval, probs

    :param sample: A sample for classification. It should be a one-channel matrix of :math:`1 \times dims` or :math:`dims \times 1` size.

    :param probs: Optional output matrix that contains posterior probabilities of each component given the sample. It has :math:`1 \times nclusters` size and ``CV_64FC1`` type.

The method returns a two-element ``double`` vector. The zeroth element is the likelihood logarithm value for the given sample, and the first element is the index of the most probable mixture component.
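
Continuing the hedged training sketch from EM::train (the query point itself is an assumption of the example), a single prediction could look like this::

    Mat sample = (Mat_<double>(1, 2) << 4.5, 5.2);
    Mat posteriors; // 1 x nclusters, CV_64FC1
    Vec2d res = model.predict(sample, posteriors);
    double logLikelihood = res[0];       // likelihood logarithm for the sample
    int bestComponent = cvRound(res[1]); // most probable mixture component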

EM::isTrained
-------------
Returns ``true`` if the Gaussian mixture model was trained.

.. ocv:function:: bool EM::isTrained() const

.. ocv:pyfunction:: cv2.EM.isTrained() -> retval

EM::read, EM::write
-------------------
See :ocv:func:`Algorithm::read` and :ocv:func:`Algorithm::write`.

EM::get, EM::set
----------------
See :ocv:func:`Algorithm::get` and :ocv:func:`Algorithm::set`. The following parameters are available:

* ``"nclusters"``
* ``"covMatType"``
* ``"maxIters"``
* ``"epsilon"``
* ``"weights"`` *(read-only)*
* ``"means"`` *(read-only)*
* ``"covs"`` *(read-only)*
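
For example, the trained mixture parameters can be read back through the generic :ocv:func:`Algorithm::get` interface (a sketch, assuming ``model`` is a trained ``EM`` instance)::

    Mat weights = model.getMat("weights");              // pi_k
    Mat means = model.getMat("means");                  // a_k, nclusters x dims
    std::vector<Mat> covs = model.getMatVector("covs"); // S_k, dims x dims each
    int nclusters = model.getInt("nclusters");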