Hand Gesture recognition using
Abstract— With recent technological
advancements in the field of artificial intelligence, human gesture recognition
has gained special interest in the fields of computer vision and human-computer
interaction. Gesture recognition is a difficult problem because of
variations in subject location and size in the frame. Subject occlusion also
hinders image recognition. Recognizing human gestures is, at its core, a
pattern recognition problem. This paper proposes a method to recognize
time-varying human gestures from continuous video streams. It uses the hidden Markov
model (HMM) for human hand gesture recognition. The method can be applied to
gestures by simply converting them into sequential symbols.
models; neural networks; vector quantization.
Human hand gestures are non-verbal actions
used to express information and can be instantly recognized by humans. Automatic
gesture recognition from continuous video streams has a variety of potential
applications, from smart surveillance to human-machine interaction.
Here, hand gesture recognition is performed automatically
as a two-phase process:
feature extraction and classification.
II. FEATURE EXTRACTION AND CLASSIFICATION
Feature extraction and
classification is a field in which captured objects are assigned to a number
of classes. These objects are referred to as patterns and generally consist of
speech, text or any other kind of signal. In this paper, the patterns are hand gestures.
The process starts when the user provides an input in
the form of a gesture. The collected input data serves as the
training data for the next step, i.e. feature extraction. Training is done using the
hidden Markov model. Finally, classification is performed using a feature
classifier and the output is generated.
III. HIDDEN MARKOV MODEL
Before discussing the hidden Markov model, it is necessary to understand its foundation,
i.e. the Markov process. A Markov
process consists of two parts:
a finite number of states, and
the probabilities of moving between those states.
The Markov model
provides a powerful way to simulate real-world processes, provided a small
number of assumptions are met (i.e. a
fixed set of states, fixed transition probabilities, and the possibility of
getting from any state to any other state through a series of transitions).
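The two parts of a Markov process can be made concrete with a short simulation. The following is a minimal Python sketch, not part of the original system: the three state labels and all transition probabilities are hypothetical values chosen only for illustration.

```python
import random

def simulate_markov_chain(states, transitions, start, steps, seed=0):
    """Walk a Markov chain for `steps` transitions and return the visited states."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        current = path[-1]
        # The next state depends only on the current state (Markov property).
        weights = [transitions[current][nxt] for nxt in states]
        path.append(rng.choices(states, weights=weights, k=1)[0])
    return path

# Hypothetical hand-motion chain; labels and probabilities are made up.
STATES = ["rest", "raise", "wave"]
TRANSITIONS = {
    "rest":  {"rest": 0.6, "raise": 0.4, "wave": 0.0},
    "raise": {"rest": 0.1, "raise": 0.5, "wave": 0.4},
    "wave":  {"rest": 0.3, "raise": 0.0, "wave": 0.7},
}

path = simulate_markov_chain(STATES, TRANSITIONS, "rest", steps=10)
print(path)
```

Although "rest" cannot reach "wave" in one step, it can through "raise", so every state remains reachable from every other state through a series of transitions.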
An HMM
is a collection of finite states connected by transitions.
Each state
has two sets of probabilities: a transition probability, and either a
discrete output probability distribution or a continuous output probability
density function.
The application of
HMMs to gesture recognition was motivated by their use in speech recognition.
There are many similarities between gesture and speech recognition, so
HMMs can be applied here as well. Gestures vary with subject position,
symmetry and size in the frame.
The gestures
to be recognized must be defined in advance. Here, where sign language is used,
this means fixing the vocabulary of signs beforehand.
Next, each gesture
is described by its own HMM, and the structures of
the transition function and the output probability are estimated.
We then pre-process the
data collected during the initial steps. Pre-processing involves the short-time Fourier
transform and vector quantization. The gestures
are defined through the collected training data, so this data needs to be
represented in a compact form.
The parameters of each
HMM are estimated using the Baum-Welch algorithm. It is needed because the
state paths in an HMM are hidden and the estimation equations cannot be solved analytically.
It iteratively converges to a local maximum likelihood estimate, i.e. it attempts to find the model
that assigns the training data the highest likelihood. The probability of being in
state ‘i’ at time ‘t’ is given by:
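In the usual forward-backward notation (the symbols here are an assumption, since the text does not define them), this state posterior is commonly written as

\[ \gamma_t(i) = P(q_t = i \mid O, \lambda) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)} \]

where \alpha_t(i) is the forward variable, \beta_t(i) the backward variable, O the observation sequence, \lambda the model and N the number of states. The following is a minimal Python sketch of that computation for a discrete-output HMM. The two-state model and its probabilities are made-up illustration values, not parameters from this work.

```python
def forward(A, B, pi, obs):
    """alpha[t][i] = P(o_1..o_t, q_t = i | model)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(n)) * B[i][o]
                      for i in range(n)])
    return alpha

def backward(A, B, obs):
    """beta[t][i] = P(o_{t+1}..o_T | q_t = i, model)."""
    n = len(A)
    beta = [[1.0] * n]  # beta at the final time step is 1 for every state
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def state_posteriors(A, B, pi, obs):
    """gamma[t][i] = P(q_t = i | obs, model)."""
    alpha = forward(A, B, pi, obs)
    beta = backward(A, B, obs)
    gamma = []
    for a_t, b_t in zip(alpha, beta):
        norm = sum(a * b for a, b in zip(a_t, b_t))  # equals P(obs | model)
        gamma.append([a * b / norm for a, b in zip(a_t, b_t)])
    return gamma

# Hypothetical 2-state, 2-symbol model (illustrative values only).
A = [[0.7, 0.3], [0.4, 0.6]]    # A[i][j]: transition probability i -> j
B = [[0.9, 0.1], [0.2, 0.8]]    # B[i][k]: probability that state i emits symbol k
PI = [0.6, 0.4]                 # initial state distribution
OBS = [0, 1, 0]                 # quantized observation symbols

GAMMA = state_posteriors(A, B, PI, OBS)
for t, row in enumerate(GAMMA):
    print(t, [round(p, 4) for p in row])
```

Baum-Welch uses these posteriors in its re-estimation step. For long sequences the products underflow, so practical implementations rescale alpha and beta or work in log space.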
Then, gestures are
decoded and recognized using the Viterbi algorithm. This algorithm finds
the most likely sequence of hidden states in an HMM.
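As an illustration of the decoding step, the following is a minimal Python sketch of the Viterbi algorithm for a discrete-output HMM, written in log space to avoid underflow. The model parameters and observation sequence are hypothetical values, not ones estimated in this work.

```python
import math

def _log(x):
    """Logarithm that maps zero probability to minus infinity."""
    return math.log(x) if x > 0 else float("-inf")

def viterbi(A, B, pi, obs):
    """Return (best_path, log_prob) of the most likely hidden state sequence."""
    n = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(n)]
    backptr = []
    for o in obs[1:]:
        step_ptr, new_delta = [], []
        for i in range(n):
            # Best predecessor of state i at this time step.
            best_j = max(range(n), key=lambda j: delta[j] + _log(A[j][i]))
            step_ptr.append(best_j)
            new_delta.append(delta[best_j] + _log(A[best_j][i]) + _log(B[i][o]))
        delta = new_delta
        backptr.append(step_ptr)
    # Trace the back-pointers from the best final state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step_ptr in reversed(backptr):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path, delta[last]

# Hypothetical 2-state, 2-symbol model (illustrative values only).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
PI = [0.6, 0.4]

best_path, best_logp = viterbi(A, B, PI, [0, 0, 1, 1])
print(best_path, math.exp(best_logp))
```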
The figures after
classification are as follows:
Windows 7 or
higher, Python 3 or higher, NumPy 1.14, SciPy 1.0.0, OpenCV 3.4.0.
V. FEATURE EXTRACTION, VECTOR QUANTIZATION
The main goal of
the feature extraction step is to simplify recognition by summarizing vast
amounts of image data and then obtaining the unique properties that define the
In the initial
steps, the observation is transformed into a pattern, whose
distinct components are called features. A classifier is needed in the next
step. It divides the feature space into regions such that each corresponds
to a pattern class.
Vector quantization is a lossy data compression method based on the principle of
block coding. It aims to minimize the distortion over all
samples, and it is preferred because it can achieve lower distortion
than scalar quantization at the same rate.
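The idea can be sketched in a few lines of Python. This is a minimal illustration, assuming the codebook is refined with Lloyd's algorithm (k-means); the text does not say how the codebook is built, and the feature vectors and initial codewords below are made up.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def quantize(vectors, codebook):
    """Map each feature vector to the index of its nearest codeword."""
    return [min(range(len(codebook)),
                key=lambda k: squared_distance(v, codebook[k]))
            for v in vectors]

def train_codebook(vectors, codebook, iterations=10):
    """Refine an initial codebook with Lloyd's algorithm (k-means)."""
    codebook = [list(c) for c in codebook]
    for _ in range(iterations):
        symbols = quantize(vectors, codebook)
        for k in range(len(codebook)):
            members = [v for v, s in zip(vectors, symbols) if s == k]
            if members:  # an unused codeword is left where it is
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def mean_distortion(vectors, codebook):
    """Average squared error between each vector and its codeword."""
    symbols = quantize(vectors, codebook)
    return sum(squared_distance(v, codebook[s])
               for v, s in zip(vectors, symbols)) / len(vectors)

# Hypothetical 2-D feature vectors forming two clusters.
FEATURES = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
INITIAL = [(0.0, 0.0), (1.0, 1.0)]

CODEBOOK = train_codebook(FEATURES, INITIAL)
SYMBOLS = quantize(FEATURES, CODEBOOK)
print(SYMBOLS)
print(mean_distortion(FEATURES, INITIAL), mean_distortion(FEATURES, CODEBOOK))
```

The resulting symbol indices form the discrete observation sequence that a discrete-output HMM consumes.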
The final step
is recognition. We use the Viterbi algorithm to find
the most likely sequence of states. Recognition is made by analyzing the
initial and final states together. For a successful match, the
initial state of the provided gesture is the same as the initial state of the
recognition model, and the same holds for the final states.
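Since each gesture in the vocabulary has its own HMM, one common way to reach a recognition decision is to score the observed symbol sequence against every gesture model and pick the best-scoring one. The following Python sketch takes that approach using the scaled forward algorithm. This scoring rule is an assumption and differs from the state comparison described above, and the gesture names "swipe" and "push" and all probabilities are hypothetical.

```python
import math

def log_likelihood(model, obs):
    """Scaled forward algorithm: returns log P(obs | model)."""
    A, B, pi = model["A"], model["B"], model["pi"]
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                     for i in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob

def recognize(models, obs):
    """Return the best-scoring gesture name and all log-likelihood scores."""
    scores = {name: log_likelihood(m, obs) for name, m in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical left-to-right gesture models over two quantized symbols.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "swipe": {"pi": [1.0, 0.0], "A": LEFT_TO_RIGHT,
              "B": [[0.9, 0.1], [0.1, 0.9]]},
    "push":  {"pi": [1.0, 0.0], "A": LEFT_TO_RIGHT,
              "B": [[0.1, 0.9], [0.9, 0.1]]},
}

best, scores = recognize(MODELS, [0, 0, 1, 1])
print(best, scores)
```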
This paper produces a model for hand gesture recognition using the hidden Markov model.
Hidden Markov models were originally used in speech recognition learning
algorithms; the results show that the HMM can be successfully applied to hand
gesture recognition as well. The results can be further improved by using
better gesture segmentation and image noise filtering. The proposed method can
be used in developing various learning algorithms in fields such as robotics and
human-computer interaction.