Background of Virtual Valipilla

In this series of posts I'm going to cover the project background: the techniques and tools that can be used for the air-finger-writing process. This post covers the gesture-based character recognition techniques that can be used for the implementation.

BACKGROUND

Air-writing recognition enables a user to input text by writing in the air. Unlike conventional pen-based handwriting, air-finger-writing renders characters written with a fingertip or the tip of a stylus on a virtual plane, without haptic feedback. There is no physical plane to write on and no direct pen-up/pen-down information. In other words, air-writing is uni-stroke, which sets it apart from ordinary handwriting. Therefore, conventional offline handwriting recognition techniques cannot be applied directly. This type of research deals with interpreting the data while it is being generated, which is quite different from the process of a typical OCR system. So most of the research in this area aims at finding better techniques to recognize this uni-stroke writing on a virtual plane. Put simply, it is like building a gesture-based writing tool, and that is the most common use case in this research domain. Our focus, however, is not on building a writing tool but on building a learn-to-write tool.

Gesture Based Character Recognition

In building this learn-to-write tool, we consider a single character rather than a whole word. However, this single character has to be verified: it must be written correctly, in the proper way, and with correct component proportions. Also, we cannot expect characters to be uni-stroke only; they can contain multiple strokes.

When designing a recognizer, a trade-off is usually made between personalization and generality. The two extremes are user-dependent and user-independent gesture recognition. Even with a predefined gesture vocabulary, robust user-independent gesture recognition can be very challenging due to the large variations among different users.

With these aspects in mind, we surveyed the following approaches to find a suitable one for the online recognition of a character in a stream of 3D coordinate points from finger gestures.

  •  Dynamic Time Warping

DTW is an effective algorithm based on dynamic programming. By treating the input as a time series of 3D positions, Dynamic Time Warping (DTW) can be used to recognize characters. Identifying characters in a time series requires data to train and test on. Vikram et al.[1] proposed an approach in which the algorithm uses DTW to search the training database for similar character-writing sequences. Their first goal is a similarity search algorithm, as depicted in the pseudocode. The similarity search sweeps across the data time series, checking every subsequence against the candidate and returning the best match. Both the candidate and all subsequences are z-normalized in the process.
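To make the sweep concrete, here is a minimal sketch of a z-normalized sliding-window similarity search. It is not the pseudocode from [1]: the function names are my own, the stream is a single coordinate axis rather than full 3D points, and a plain Euclidean distance stands in as the default measure so the block stays self-contained. The `distance` argument is where a DTW function would be passed instead.

```python
import math

def z_normalize(seq):
    """Rescale a 1-D sequence to zero mean and unit variance."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)  # flat segment carries no shape information
    return [(v - mean) / std for v in seq]

def euclidean(a, b):
    """Stand-in distance (assumption); swap in a DTW distance here."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(stream, candidate, distance=euclidean):
    """Sweep a window of len(candidate) over stream.

    Returns (start_index, distance) of the closest z-normalized window.
    """
    m = len(candidate)
    cand = z_normalize(candidate)
    best_start, best_dist = -1, math.inf
    for start in range(len(stream) - m + 1):
        window = z_normalize(stream[start:start + m])
        d = distance(window, cand)
        if d < best_dist:
            best_start, best_dist = start, d
    return best_start, best_dist

# Toy data: the candidate has the same shape as stream[3:6], at 10x the scale.
stream = [0, 0, 0, 1, 3, 1, 0, 0]
candidate = [10, 30, 10]
print(best_match(stream, candidate))
```

Because both sides are z-normalized, the candidate matches the bump in the stream even though it is ten times larger.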

DTW Process


The dynamic time warping algorithm is used as a similarity metric between vectors. It is a generalization of the Euclidean distance metric but chooses the closest point within a certain time window, rather than creating a one-to-one mapping of points. When the time window is 0, DTW reduces to Euclidean distance. 
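The windowed behaviour can be sketched in a few lines. This is a minimal textbook-style DTW with a band constraint, assuming each sample is a tuple of coordinates; the function name and the toy trajectories are my own and not taken from [1].

```python
import math

def dtw_distance(a, b, window):
    """DTW distance between two sequences of points (tuples of coordinates).

    `window` is the maximum allowed index offset between matched samples.
    """
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # the band must be wide enough to reach the end
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] matched again
                                 cost[i][j - 1],      # b[j-1] matched again
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])

# The same stroke, written one sample later in b than in a.
a = [(0, 0, 0), (1, 1, 0), (2, 2, 0), (1, 1, 0), (0, 0, 0), (0, 0, 0)]
b = [(0, 0, 0), (0, 0, 0), (1, 1, 0), (2, 2, 0), (1, 1, 0), (0, 0, 0)]
print(dtw_distance(a, b, window=0))  # one-to-one mapping: plain Euclidean distance
print(dtw_distance(a, b, window=1))  # 0.0, the one-sample delay is absorbed
```

With a window of 0 only the diagonal of the cost table is reachable, which is why the result equals the Euclidean distance for sequences of equal length.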


Letter Stroke Distance Measuring Techniques


This kind of character recognition technique gives its best results when the handwriting belongs to a frequent writer. In fact, DTW is well suited to personalized gesture recognition. For our requirement, however, we are focusing on language learners at different proficiency levels and of different ages, so the time people take to write a letter may vary significantly.

  • Hidden Markov Model

The statistical approaches to this problem use a Hidden Markov Model (HMM), or a combination of an HMM and a neural network, to recognize characters. The HMM is efficient at modeling a time series with spatial and temporal variations, and has been successfully applied to gesture recognition.
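As a rough illustration of how an HMM can be used as a classifier, the sketch below scores an observation sequence under one model per character with the scaled forward algorithm and picks the most likely one. Everything here is an assumption for illustration: the observations are discrete symbols (say, 0 = "down" and 1 = "right"), and the two toy models with their probabilities are made up, not trained.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob

def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Hypothetical left-to-right models with two states each.
start = [1.0, 0.0]
trans = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "down-then-right": (start, trans, [[0.9, 0.1], [0.1, 0.9]]),
    "right-then-down": (start, trans, [[0.1, 0.9], [0.9, 0.1]]),
}
print(classify([0, 0, 0, 1, 1], models))
```

In a real recognizer the model parameters would be trained from samples of each character rather than written by hand.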

Depending on the tracking technology in use, the features (observations) for the HMMs may vary and can include the position, the moving direction, the acceleration, etc. Chen[2] proposed an HMM-based recognizer that uses the two-dimensional (2D) position and velocity on the xy-plane as the feature vector for the HMMs. The raw sensor signals may need proper normalization to make the recognizer scale and speed invariant, or quantization to handle the variations of gestures, especially in the user-independent case.
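A minimal sketch of such a feature pipeline could look like this. It is not the pipeline from [2]: the bounding-box scaling, the frame-to-frame velocity, and the 8-direction quantization are all assumptions chosen to keep the example short.

```python
import math

def normalize_scale(points):
    """Center the stroke on its centroid and scale its larger side to 1."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    size = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / size, (y - cy) / size) for x, y in points]

def position_velocity_features(points):
    """Per-sample feature vector (x, y, vx, vy) on the xy-plane.

    Velocity is the difference to the previous sample; the first sample gets 0.
    """
    pts = normalize_scale(points)
    feats = []
    for i, (x, y) in enumerate(pts):
        px, py = pts[i - 1] if i > 0 else pts[0]
        feats.append((x, y, x - px, y - py))
    return feats

def quantize_direction(vx, vy, n_bins=8):
    """Map a velocity to a direction code; 0 is +x, codes grow counter-clockwise."""
    angle = math.atan2(vy, vx) % (2 * math.pi)
    bin_width = 2 * math.pi / n_bins
    return int((angle + bin_width / 2) // bin_width) % n_bins

def direction_codes(points, n_bins=8):
    """Discrete observation sequence that drops the magnitude of the velocity."""
    feats = position_velocity_features(points)
    return [quantize_direction(vx, vy, n_bins)
            for _, _, vx, vy in feats[1:] if (vx, vy) != (0.0, 0.0)]

stroke = [(0, 0), (10, 0), (10, 10)]
print(position_velocity_features(stroke))
print(direction_codes(stroke))
```

Scaling by the bounding box makes a small and a large copy of the same stroke produce the same features, and the direction codes ignore how fast each segment was drawn.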

  • Artificial Neural Network

Much of the research uses an Artificial Neural Network (ANN) as the classifier in the gesture recognition process. An ANN can be trained with known examples of a problem before it is tested for its inference capability on unknown instances of the problem. It is able to generalize from the given data set and, as a result, can predict new outcomes from past trends. The system proposed by Joshi et al.[3] feeds preprocessed data (smoothed, duplicate-removed, and spatially normalized) into the ANN model. They used a back-propagation neural network (NN) with one input layer, two hidden layers, and one output layer for handwriting recognition, as shown.


Artificial Neural Network


The more training cycles, the lower the resulting mean square error, which in general indicates a better test result. However, more training cycles also mean more training time. A very important feature of these networks is their adaptive nature, where programming is replaced by learning in solving problems. They are robust and fault tolerant, and can recall full patterns from partial, incomplete, or noisy patterns.
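To see the trade-off between training cycles and error, here is a small back-propagation sketch with one input layer, two hidden layers, and one output layer. It is not the network from [3]: the layer sizes, learning rate, sigmoid activation, and the two toy "strokes" (three flattened xy points each) are assumptions made only for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class MLP:
    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        # layers[l][j] holds the weights into unit j; the last entry is the bias.
        self.layers = [
            [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])
        ]

    def forward(self, x):
        """Return the activations of every layer, input first."""
        acts = [list(x)]
        for layer in self.layers:
            inputs = acts[-1] + [1.0]
            acts.append([sigmoid(sum(w * v for w, v in zip(row, inputs)))
                         for row in layer])
        return acts

    def predict(self, x):
        return self.forward(x)[-1]

    def train_step(self, x, target, lr):
        """One back-propagation update; returns the error before the update."""
        acts = self.forward(x)
        out = acts[-1]
        # Squared-error gradient through the sigmoid (constant folded into lr).
        deltas = [(o - t) * o * (1.0 - o) for o, t in zip(out, target)]
        for l in range(len(self.layers) - 1, -1, -1):
            layer, inputs = self.layers[l], acts[l] + [1.0]
            below = []
            if l > 0:  # propagate the error using the weights before the update
                below = [sum(row[i] * d for row, d in zip(layer, deltas)) * a * (1.0 - a)
                         for i, a in enumerate(acts[l])]
            for row, d in zip(layer, deltas):
                for i, v in enumerate(inputs):
                    row[i] -= lr * d * v
            deltas = below
        return sum((o - t) ** 2 for o, t in zip(out, target)) / len(out)

def train(net, samples, epochs, lr=0.5):
    """Return the mean square error recorded for each training cycle."""
    history = []
    for _ in range(epochs):
        errors = [net.train_step(x, t, lr) for x, t in samples]
        history.append(sum(errors) / len(errors))
    return history

samples = [
    ([0.0, 0.5, 0.5, 0.5, 1.0, 0.5], [1.0, 0.0]),  # horizontal stroke
    ([0.5, 0.0, 0.5, 0.5, 0.5, 1.0], [0.0, 1.0]),  # vertical stroke
]
net = MLP([6, 4, 4, 2])
history = train(net, samples, epochs=2000)
print("first cycle:", history[0], "last cycle:", history[-1])
```

Printing `history` at a few points shows how the error shrinks as the number of cycles grows, at the cost of more training time.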

  • Hilbert Warping Method

Ishida et al.[4] proposed an alignment method for gesture-based handwriting recognition called Hilbert Warping, as a solution to the problems that occur in DTW. According to them, DTW has a drawback for classification tasks because it always looks for the best alignment to the reference sequences of all categories. They suggest that misclassification can occur due to over-fitting to incorrect categories.

In their proposed method, the input sequence is aligned to the reference sequences by phase-synchronization of the analytic signals, and then classified by comparing the cumulative distances. A major benefit of this method is that over-fitting to sequences of incorrect categories is restricted. The proposed method exhibited high recognition accuracy in finger-writing character recognition.

  • Data-driven Template Matching

The $1 recognizer and its variants $N and $P [5] are based on template matching. Unlike DTW, which relies on dynamic programming, these algorithms process the trajectory with re-sampling, rotation, and scaling and then match the point-paths with the reference templates. These recognizers are simple to implement, computationally inexpensive, and require only a few training samples to function properly for personalized recognition. However, for user-independent recognition, a significant number of templates is needed to cover the range of variations. The research space still lacks work that uses the $-family (dollar family) for character recognition, since most research aims to build a writing tool, which focuses on words and sentences rather than a single character. The $-family is best suited to recognizing a single character or symbol written with a finger gesture.
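The re-sample, rotate, scale, and match steps can be sketched as follows. This is a simplified take on the $1 idea and not the published pseudocode: it skips the search over candidate rotation angles, it assumes the 3D finger points have already been projected to 2D, and the function names and the toy "L" and "Z" templates are my own.

```python
import math

def path_length(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def resample(pts, n=32):
    """Resample the path into n points spaced equally along its length."""
    interval = path_length(pts) / (n - 1)
    pts = list(pts)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # q starts the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # rounding can drop the final point
        out.append(pts[-1])
    return out[:n]

def centroid(pts):
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

def normalize(pts, n=32):
    """Resample, rotate by the indicative angle, scale to a unit box, center."""
    pts = resample(pts, n)
    cx, cy = centroid(pts)
    angle = math.atan2(cy - pts[0][1], cx - pts[0][0])  # first point to centroid
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    pts = [((x - cx) * cos_a - (y - cy) * sin_a,
            (x - cx) * sin_a + (y - cy) * cos_a) for x, y in pts]
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    w = (max(xs) - min(xs)) or 1.0
    h = (max(ys) - min(ys)) or 1.0
    pts = [(x / w, y / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def path_distance(a, b):
    """Average distance between corresponding points of two normalized paths."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def recognize(points, templates):
    """Return (name, distance) of the closest template."""
    candidate = normalize(points)
    scores = {name: path_distance(candidate, normalize(t))
              for name, t in templates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]

templates = {
    "L": [(0, 0), (0, 10), (10, 10)],
    "Z": [(0, 0), (10, 0), (0, 10), (10, 10)],
}
drawn = [(7 + 3 * x, -2 + 3 * y) for x, y in templates["L"]]  # bigger, shifted "L"
print(recognize(drawn, templates))
```

A single template per character is enough for this toy case; covering many writers would need more templates per character, as noted above.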

State-of-the-art gesture recognition techniques, such as HMMs, feature-based statistical classifiers, or mixtures of classifiers, typically require significant technical knowledge to understand and to develop for new platforms, or knowledge from other fields (e.g. graph theory). The $-family, in contrast, offers low-cost, easy-to-understand, and easy-to-implement, yet high-performing, gesture recognition approaches. It involves only simple geometric computations and straightforward internal representations. Furthermore, the algorithms are highly accessible through the published pseudocode, which developers can use for their own platforms.

References

[1] Sharad Vikram, Lei Li, and Stuart Russell, "Writing and sketching in the air, recognizing and controlling on the fly," in ACM Conference on Human Factors in Computing Systems (CHI), 2013.
[2] Mingyu Chen, "Universal Motion Based Control," Ph.D. dissertation, School of Electrical and Computer Engineering, Georgia Institute of Technology, 2013.
[3] Aditya G. Joshi, Darshana V. Kolte, Ashish V. Bandgar, Aakash S. Kadalak, and N. S. Patil, "Touchless Writer a hand gesture recognizer for English characters," in Proceedings of 22nd IRF International Conference, Pune, India, January 2015.
[4] Hiroyuki Ishida, Tomokazu Takahashi, Ichiro Ide, and Hiroshi Murase, "A Hilbert warping method for handwriting gesture recognition," Pattern Recognition, vol. 43, ISSN 0031-3203, pp. 2799-2806, August 2010.
[5] Radu-Daniel Vatavu, Lisa Anthony, and Jacob O. Wobbrock, "Gestures as Point Clouds: A $P Recognizer for User Interface Prototypes," in Proceedings of the 14th ACM International Conference on Multimodal Interaction, Santa Monica, California, USA, 2012, pp. 273-280.







So which one should I use??? Stay Tuned...
