Department of Computer Applications, Visvesvaraya Technological University, Angadi Institute of Technology & Management, Belgaum, Karnataka 590008, India. Email: [email protected]
Avinash A Malawade
Department of Computer Applications, Visvesvaraya Technological University, Angadi Institute of Technology & Management, Belgaum, Karnataka 590008, India. Email: [email protected]
Seema G Itagi
Department of Computer Applications, Visvesvaraya Technological University, Angadi Institute of Technology & Management, Belgaum, Karnataka 590008, India. Email: [email protected]
Visit for more related articles at International Journal of Advancements in Technology
OCR, pre-processing, image extraction and classification.
The development of a high-performance OCR algorithm has become essential, and several researchers have undertaken OCR research with this aim.
The purpose of an OCR system is to identify and analyze a document image by dividing the page into lines, subdividing the lines into words, and then the words into characters. These characters are compared with image patterns to predict the most probable characters. Characters can be recognized either from printed documents or from handwritten documents.
In particular, handwritten Kannada OCR is more complicated than related work on other scripts, because Kannada numerals have more angles (curves). The challenges that researchers face during recognition arise from the curves in the numerals, the number of strokes and holes, slanted numerals, different writing styles and so on.
The steps involved in character recognition are pre-processing, segmentation, feature extraction and classification. Three types of features can be analyzed: statistical, structural and hybrid.
Researchers have come up with many approaches to character recognition, and many of them are surveyed in this paper. The challenges and issues that still remain open for future research are also surveyed.
This paper is organized as follows. Pre-processing techniques are surveyed in Section 2. Section 3 illustrates the various segmentation techniques available. Feature extraction methods are explained in Section 4. The various classification approaches are explained in Section 5. Section 6 presents the scope of our research in this area and the conclusion.
A number of tasks must be completed before character recognition can be performed. A handwritten document must be scanned and converted into a suitable format for processing. Pre-processing consists of several sub-processes that clean the document image and make it suitable for accurate recognition. The sub-processes involved in pre-processing are listed below:
i. Binarization
ii. Noise reduction
iii. Normalization
iv. Skew correction, thinning and slant removal
Binarization is a method of transforming a grayscale image into a black-and-white image through thresholding. Alternatively, Otsu's method may be used to perform histogram-based thresholding and obtain the binarized image automatically. Most researchers use thresholding to separate the foreground from the background. In this method the threshold is fixed at a value between the foreground and background gray levels. Histogram-based thresholding can also be used to identify the local gray-value contrast of the image, which helps to extract text from low-quality documents.
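As an illustration of the histogram-based thresholding mentioned above, the following is a minimal sketch of Otsu's method using NumPy. It assumes an 8-bit grayscale image stored as a `uint8` array; the function names `otsu_threshold` and `binarize` are hypothetical and are not taken from any of the surveyed works.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level that maximizes the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)               # weight of the class with values <= t
    mu = np.cumsum(prob * levels)      # cumulative mean up to t
    mu_total = mu[-1]
    w1 = 1.0 - w0                      # weight of the class with values > t
    valid = (w0 > 0) & (w1 > 0)        # skip thresholds that leave a class empty
    sigma_b = np.zeros(256)
    sigma_b[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(sigma_b))

def binarize(gray):
    """Return 1 for pixels brighter than the Otsu threshold, 0 otherwise."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)
```

For dark ink on light paper, the foreground strokes are the pixels where the result is 0; invert the output if the later stages expect strokes to be 1.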
Digital images are prone to many types of noise. Noise in a document image is often due to poorly photocopied pages. Median filtering, Wiener filtering and morphological operations can be used to remove noise. Median filters replace the intensity of each pixel of the character image with the median of its neighborhood, whereas Gaussian filters are used to smooth images.
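The median filtering step can be sketched as follows. This is a simplified 3 × 3 version written with NumPy only; the window size and the edge-replication padding are assumptions made for the example, and `median_filter_3x3` is a hypothetical name.

```python
import numpy as np

def median_filter_3x3(img):
    """Replace each pixel with the median of its 3x3 neighborhood."""
    padded = np.pad(img, 1, mode="edge")   # replicate border pixels
    h, w = img.shape
    # Stack the nine shifted copies of the image, then take the per-pixel median.
    windows = np.stack([padded[r:r + h, c:c + w]
                        for r in range(3) for c in range(3)])
    return np.median(windows, axis=0).astype(img.dtype)
```

An isolated speck of salt-and-pepper noise is removed because it never forms the majority of any 3 × 3 window, while larger strokes survive.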
Normalization is the process of converting images of arbitrary size into a standard size. Bicubic interpolation, linear size normalization and Java Image Class normalization techniques can be used to obtain standard-sized images. The ROI extraction method is used to obtain a single structural element from the image. In many works, input images are normalized to a size of 40 × 40 after finding the bounding box of each handwritten image, for further processing.
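The bounding-box and size-normalization step described above might look like the sketch below. It assumes a binary image with strokes equal to 1, and it uses nearest-neighbor resampling for brevity rather than the bicubic interpolation cited in the text; `normalize_size` is a hypothetical name.

```python
import numpy as np

def normalize_size(binary, size=40):
    """Crop to the bounding box of the strokes and resample to size x size."""
    rows = np.flatnonzero(binary.any(axis=1))
    cols = np.flatnonzero(binary.any(axis=0))
    if rows.size == 0:                      # blank image: nothing to crop
        return np.zeros((size, size), dtype=binary.dtype)
    crop = binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    h, w = crop.shape
    # Nearest-neighbor index maps from the target grid back to the crop.
    r_idx = np.arange(size) * h // size
    c_idx = np.arange(size) * w // size
    return crop[np.ix_(r_idx, c_idx)]
```

Note that this stretches the character to fill the whole 40 × 40 frame, so the aspect ratio is not preserved.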
Skew Correction, Thinning and Slant Removal
Thinning is a pre-processing step that reduces a character to a single-pixel-wide image so that the handwritten character can be recognized more easily. It is applied repeatedly, leaving only a pixel-wide linear representation of the character. The cumulative scalar product (CSP) of windowed text blocks with Gabor filters has been used for thinning. Morphology-based thinning algorithms and other thinning algorithms have also been used to thin the character images and obtain a better symbol representation. Skeletonization is the process of stripping away as many pixels of a pattern as possible without affecting its general shape. Skew is inevitably introduced into the document image during scanning. Fourier spectrum and normalization techniques are used to correct slant, stroke angle, width and vertical scaling.
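To make the idea of iterative thinning concrete, here is a sketch of the classical Zhang-Suen thinning algorithm. This particular algorithm is chosen only as a well-known example and is not necessarily the one used in the surveyed works; the input is assumed to be a binary array with strokes equal to 1, and `zhang_suen_thin` is a hypothetical name.

```python
import numpy as np

def zhang_suen_thin(binary):
    """Iteratively peel boundary pixels until a one-pixel-wide skeleton remains."""
    img = np.pad(binary.astype(np.uint8), 1)   # zero border simplifies neighbor access
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            # Neighbors of each interior pixel, clockwise starting from north.
            p2, p3, p4 = img[:-2, 1:-1], img[:-2, 2:], img[1:-1, 2:]
            p5, p6, p7 = img[2:, 2:], img[2:, 1:-1], img[2:, :-2]
            p8, p9 = img[1:-1, :-2], img[:-2, :-2]
            ring = [p2, p3, p4, p5, p6, p7, p8, p9]
            b = sum(ring)                        # number of stroke neighbors
            # Number of 0 -> 1 transitions when walking once around the ring.
            a = sum((ring[i] == 0) & (ring[(i + 1) % 8] == 1) for i in range(8))
            if step == 0:
                cond = (p2 * p4 * p6 == 0) & (p4 * p6 * p8 == 0)
            else:
                cond = (p2 * p4 * p8 == 0) & (p2 * p6 * p8 == 0)
            remove = (img[1:-1, 1:-1] == 1) & (b >= 2) & (b <= 6) & (a == 1) & cond
            if remove.any():
                img[1:-1, 1:-1][remove] = 0      # delete all flagged pixels at once
                changed = True
    return img[1:-1, 1:-1]
```

Each pass deletes only boundary pixels whose removal does not disconnect the stroke, which is why the general shape of the character is preserved.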
The segmentation process splits the document image into lines, words and characters. Segmentation of handwritten documents is more complex than that of typewritten documents. Histogram profiles and connected component analysis are used for line segmentation. In this process, paragraph spacing is checked to identify paragraphs. The histogram of the image is used to detect the width of the horizontal lines. A special space detection technique has been used for word segmentation. The histogram method is used both to detect the width of the words and to convert the image into glyphs.
Vertical histogram profile methods are used to find the spacing within lines and so identify word boundaries. The region probe algorithm is used to extract individual characters from the image. Modified cross counting techniques, histogram profiles and connected component analysis are also applied to the segmentation problem.
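The histogram (projection) profile approach to segmentation can be sketched as follows. The example assumes a binary image with strokes equal to 1 and treats any completely empty row or column as a gap; real handwriting usually needs a tolerance for touching or overlapping lines. The function names are hypothetical.

```python
import numpy as np

def split_runs(profile):
    """Return (start, end) pairs of consecutive non-zero runs; end is exclusive."""
    on = np.concatenate(([0], (np.asarray(profile) > 0).astype(np.int8), [0]))
    edges = np.diff(on)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))

def segment_lines(binary):
    """Text lines are runs of rows whose horizontal profile is non-zero."""
    return split_runs(binary.sum(axis=1))

def segment_columns(line_img):
    """Words or characters are runs of columns whose vertical profile is non-zero."""
    return split_runs(line_img.sum(axis=0))
```

A typical use is to call `segment_lines` on the page, slice out each line, and then call `segment_columns` on the slice to obtain word or character boundaries.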
There are three classes of feature extraction: statistical features, structural features and hybrid features. Statistical techniques use quantitative measurements, whereas structural techniques use qualitative measurements for feature extraction. In the hybrid approach, the two techniques are combined and used for recognition.
The scale-invariant feature transform (SIFT) is used to convert the character image into a set of local features; 128-dimensional SIFT features are identified from the character image. In another approach, the image is converted to a two-tone image and then into a frame. The frame points obtained from the frame are used to form vectors, and the prototype is obtained from the normalized feature vector (NFV).
In the zone-based method, the pixel density is calculated for each zone and is then used to represent the features. The height and width of the character pixels are counted using the binary variation encoding approach. The process halts when the top row and the full width have been reached. A feature is extracted and a binary flag is set according to the approach.
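A minimal sketch of zone-based pixel density features is given below. It assumes a binary character image with strokes equal to 1 and a 4 × 4 grid of zones; the grid size and the name `zone_densities` are assumptions made for the example.

```python
import numpy as np

def zone_densities(binary, zones=4):
    """Split the image into zones x zones cells and return each cell's stroke density."""
    h, w = binary.shape
    r_edges = np.linspace(0, h, zones + 1).astype(int)
    c_edges = np.linspace(0, w, zones + 1).astype(int)
    feats = []
    for i in range(zones):
        for j in range(zones):
            zone = binary[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]]
            # Density = fraction of stroke pixels in the zone (0 for an empty zone).
            feats.append(zone.mean() if zone.size else 0.0)
    return np.array(feats)
```

For a 40 × 40 normalized image this yields a 16-dimensional feature vector, one density value per 10 × 10 zone, in row-major order.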
All images are scaled to the same height and width using bilinear interpolation. The Sobel edge detection algorithm is used to remove the unwanted portions. Octal graphs are used to derive structural features such as end points, holes, length, shape and curvature of individual strokes.
The boundaries of the images are traced using the eight-neighbor adjacency method. The approach scans until it finds the boundary of the image. Fourier descriptors are then used to find the coefficients and obtain the total number of boundaries. These invariant descriptors are given as input to a neural network for further classification.
The Hough transform is used to detect horizontal and vertical lines. Branches and their positions are analyzed using a separate algorithm. Bilinear interpolation is used to extract features such as slant and strip.
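The Fourier descriptor step can be illustrated with the sketch below. It assumes the boundary has already been traced and is supplied as an ordered list of (x, y) points traversed counter-clockwise, and it uses one common normalization (drop the DC term, take magnitudes, divide by the first harmonic); the surveyed works may normalize differently. `fourier_descriptors` is a hypothetical name.

```python
import numpy as np

def fourier_descriptors(boundary, n_coeffs=4):
    """Return translation-, scale-, rotation- and start-point-invariant descriptors.

    Assumes a counter-clockwise traversal, so the first harmonic is non-zero.
    """
    pts = np.asarray(boundary, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]          # boundary as a complex signal
    coeffs = np.fft.fft(z)
    # Skipping coeffs[0] removes translation; magnitudes remove rotation
    # and the choice of starting point.
    mags = np.abs(coeffs[1:n_coeffs + 1])
    return mags / mags[0]                   # dividing by |c1| removes scale
```

Two boundaries of the same shape at different positions and sizes produce the same descriptor vector, which is what makes these values suitable as classifier inputs.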
The extracted features are given as input to the classification process. Existing systems classify the character features using approaches such as the k-nearest neighbor method, fuzzy systems, neural networks and so on. In all of these approaches, a bag of key points obtained from the feature extraction stage is used for classification.
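Of the classifiers listed above, the k-nearest neighbor approach is the simplest to sketch. The example below assumes feature vectors stored as rows of a NumPy array, Euclidean distance, and majority voting; the value k = 3 and the name `knn_predict` are assumptions made for the example.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training vectors closest to query."""
    dists = np.linalg.norm(train_x - query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

In a numeral recognizer, `train_x` would hold feature vectors such as zone densities and `train_y` the corresponding digit labels.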
A considerable body of research exists on Kannada handwritten numeral recognition. However, there is no standard solution that identifies all Kannada numerals with reasonable accuracy. Different approaches have been used in each phase of the recognition process, whereas each approach provides a solution for only a few numeral sets. Challenges still remain in the recognition of normal as well as abnormal writing, slanted numerals, similarly shaped numerals, curves and so on.
The following key challenges can be explored further in our future research work.
• Curves in the Kannada numeral
• Significant variation in writing styles.
• Difficulties faced in viewing angles, shadows and unique fonts.
The authors are grateful to Dr. R. S. Hegadi, Solapur University, Solapur, for his valuable guidance, and to Dr. Nagaraja, Director of Angadi Institute of Technology and Management, Belgaum, for his constant support for researchers.
 Shanthi N and Duraiswami K, “Performance comparison of different image size for recognizing unconstrained handwritten Tamil character using SVM”, Journal of Computer Science vol-3(9): page (3) 760-764, 2007
 Jagadeesh Kumar R, Prabhakar R and Suresh R.M, “Off-line cursive handwritten Tamil characters recognition”, International Conferences on Security Technology, page(s): 159-164, 2008
 Shanthi N and Duraiswami K, “A novel SVM based handwritten Tamil character recognition system”, Springer, Pattern Analysis &Applications, Vol-13, No.2, 173-180, 2010
 Ramanathan R, Ponmathavan S, Thaneshwaran L, Arun S. Nair and Valliappan N, “Tamil font recognition using Gabor and support vector machines”, International Conference on Advances in Computing, Control & Telecommunication Technologies, page(s):613-615, 2009
 Sigappi A.N, Palanivel S and Ramalingam V, “Handwritten document retrieval system for Tamil language”, Int. J of Computer Application, vol-31, 2011
 Suresh Kumar C and Ravichandran T, “Handwritten Tamil character recognition using RCS algorithms”, Int. J. of Computer Applications,(0975-8887) volume-8-no.8, October 2010
 Bremananth R and Prakash A, “Tamil numerals identification”, International Conference on Advances in Recent Technologies in Communication and Computing, page(s):620-622, 2009
 Stuti Asthana, Farha Haneef and Rakesh K Bhujade, “Handwritten multiscript numeral recognition using artificial neural networks”, Int. J. of Soft Computing and Engineering ISSN:2231-2307, volume-1, Issue-1, March 2011
 Sutha J and RamaRaj N, “Neural network based offline Tamil handwritten character recognition system”, International Conference on Computational Intelligence and Multimedia vol: 2, pages: 446-450, 2007
 Rajashekararadhya S.V and Vanaja Ranjan P, “Efficient zone based feature extraction algorithm for handwritten numeral recognition of four popular south Indian scripts”. Int. J. of Theoretical and Applied Information Technology, pages: 1171-1181, 2008
 Paulpandian T and Ganpathy V, “Translation and scale invariant recognition of handwritten Tamil characters using hierarchical neural networks”, Circuits and Systems, IEEE Int. Sym., vol.4, 2439-2441, 1993
 Rajashekararadhya S.V, Vanaja Ranjan P and Manjunath Aradhya V.N “Isolated handwritten Kannada and Tamil numeral recognition a novel approach”, First International Conference on Emerging Trends in Engineering and Technology, page(s): 1192-1195, 2008
 Subashini A and Kodikara N.D, “A novel SIFT based codebook generation for handwritten Tamil character recognition”, 6th IEEE Int. Conf. on Industrial and Information Systems (ICIIS), page(s):261-264, 2011
 Venkatesh J and Suresh Kumar C, “Tamil handwritten character recognition using Kohonon’s self organizing map”, Int. J. of Computer Science and Network Security, Vol.9 No.12, Dec 2009
 Jagadeesh Kumar R and Prabhakar R, “Accuracy augmentation of Tamil OCR using algorithm fusion”, Int. J. of Computer Science and Network Security, VOL.8 No5, May 2008
 Bhattacharya U, Ghosh S.K and Parui, “A two stage recognition scheme for handwritten Tamil characters”, Ninth International Conference on Document Analysis and Recognition, Vol: 1 page(s):511-515, 2007
 Suresh R.M, “Printed and handwritten Tamil characters recognition using Fuzzy technique”, Pro. Of the Int. Multi Conference of Engineers and Computer Scientists, vol 1, 19-2, March, 2008
 Sarveswaran K and Ratnaweera, “An adaptive technique for handwritten Tamil character recognition”, International Conference on Intelligent and Advanced Systems, page(s):151-156, 2007
 Indra Gandhi R and Iyakutti K, “An attempt to recognize handwritten Tamil character using Kohonen SOM”, Int. J. of Advanced Networking and Applications, Volume: 01 Issue: 03 Pages: 188-192, 2009
 Banumathi P and Nasira G.M, “Handwritten Tamil character recognition using artificial neural networks”, International Conference on Process Automation, Control and Computing (PACC), page(s): 1-5, 2011
 Jagadeesh Kumar R and Prabhakar R, “An improved handwritten Tamil character recognition system using octal graph”, Int. J. of Computer Science, ISSN 1549-3636, Vol 4 (7): 509-516, 2008
 Akshay Apte and Harshad Gado, “Tamil character recognition using structural features”, 2010
 Hewavitharana S and Fernando H.C, “A two stage classification approach to Tamil handwritten recognition”, Tamil Internet, California, USA, 2002
 S.V Rajashekaradhya and Dr. P. Vanaja Ranjan, “Handwritten numeral/Mixed numeral recognition of south Indian scripts: The zone based feature extraction method”, Journal of Theoretical and Applied Information Technology, page(s)63-79, Vol:7.No.1,2009
 B.V Dhandra, Mallikarjun Hangarge and Gururaj Mukarambi, “Spatial features for multi font/font size Kannada numerals and vowels recognition”
 B.V Dhandra, R.G Benne and Mallikarjun Hangarge, “A single euler number feature for multi-font multi-size Kannada numeral recognition”, Recent Trends in Information Technology(RTIT-2009), pp101-106
 B.V Dhandra, R.G Benne and Mallikarjun Hangarge, “Multi-font multi-size Kannada numerals recognition based on structural features”, Emerging Trends in information Technology, page(s)193-199, 2007
 B.V Dhandra, R.G Benne and Mallikarjun Hangarge, “Printed and handwritten Kannada numerals recognition using directional stroke and directional density with KNN”, International Journal of Machine Intelligence (IJMI), pp121-125, Volume: 3, Issue: 3, 2011
 B.V Dhandra, Gururaj Mukarambi and Mallikarjun Hangarge, “A script independent approach for handwritten bilingual Kannada and Telugu digits recognition”, International Journal of Machine Intelligence (IJMI), pp155-159, Volume: 3, Issue: 3, 2011
 B.V Dhandra, Gururaj Mukarambi and Mallikarjun Hangarge, “Zone based features for handwritten and printed mixed Kannada digits recognition”, International Conference on VLSI, Communication and Instrumentation (ICVCI), pp5-9, 2011
 G.G Rajput, Rajeswari Horakeri, Sidramappa Chandrakant, “Printed and handwritten mixed Kannada numerals recognition using SVM”, International Journal on Computer Science and Engineering (IJCSE), vol: 2, No.5, pp1622-1626, 2010
 Ashoka H.N, Manjaiah D.H and Rabindranath Bera,”Zone based feature extraction and statistical classification technique for Kannada handwritten numeral recognition”, International Journal on Computer Science and Engineering (IJCSE), Vol: 3, No.10, pp476-482, 2012
 K.S Prassana Kumar,” Optical character recognition (OCR) for Kannada numerals using left bottom 1/4th segment minimum feature extraction”, Int. Journal of Computer Technology and Application, Vol: 3(1), pp 221-225, 2012
 S.V Rajashekararadhya and P. Vanaja Ranjan, “Handwritten numeral recognition of Kannada script”, Proceedings of the International Workshop on Machine Intelligence Research, pp 80-86, 2009
 B.V Dhandra, Gururaj Mukarambi and Mallikarjun Hangarge, “Kannada and English numeral recognition system”, International Journal of Computer Applications (0975-8887), Vol: 26, No.9, pp 17-22, 2011
 B.V Dhandra, R.G Benne and Mallikarjun Hangarge, “Kannada, Telugu and Devanagiri handwritten numeral recognition with probabilistic neural network: A script independent approach”, International Journal of Computer Applications (0975-8887), Vol: 26, No.9, pp 11-16, 2011
 G.G Rajput, Rajeswari Horakeri, Sidramappa Chandrakant, “Printed and handwritten Kannada numeral recognition using crack codes and Fourier descriptors plate”, IJCA Special issue on Recent Trends in Image Processing and Pattern Recognition, pp 53-58, 2010
 B.V Dhandra, R.G Benne and Mallikarjun Hangarge, “Kannada, Telugu and Devanagiri handwritten numeral recognition with probabilistic neural network: A novel approach”, IJCA Special issue on Recent Trends in Image Processing and Pattern Recognition, pp 83-88, 2010
 Dinesh Acharya U, N.V Subba Reddy and Krishnamoorthi Makkithaya,” Multilevel classifiers in recognition of handwritten Kannada numerals”, World Academy of Science, Engineering and Technology, pp 278-283, 2008