
Advancements in Optical Character Recognition (OCR) for India Scripts: A Review

Jagin M. Patel, Bharat C. Patel, Manish M. Kayasth

Section: Review Paper, Product Type: Journal Paper
Vol. 7, Issue 1, pp. 23-29, Feb 2019


Online published on Feb 28, 2019


Copyright © Jagin M. Patel, Bharat C. Patel, Manish M. Kayasth. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
 


How to Cite this Paper


IEEE Style Citation: Jagin M. Patel, Bharat C. Patel, Manish M. Kayasth, "Advancements in Optical Character Recognition (OCR) for India Scripts: A Review," International Journal of Scientific Research in Computer Science and Engineering, Vol.7, Issue.1, pp.23-29, 2019.

MLA Style Citation: Jagin M. Patel, Bharat C. Patel, Manish M. Kayasth. "Advancements in Optical Character Recognition (OCR) for India Scripts: A Review." International Journal of Scientific Research in Computer Science and Engineering 7.1 (2019): 23-29.

APA Style Citation: Jagin M. Patel, Bharat C. Patel, Manish M. Kayasth (2019). Advancements in Optical Character Recognition (OCR) for India Scripts: A Review. International Journal of Scientific Research in Computer Science and Engineering, 7(1), 23-29.

BibTex Style Citation:
@article{Patel_2019,
author = {Jagin M. Patel and Bharat C. Patel and Manish M. Kayasth},
title = {Advancements in Optical Character Recognition (OCR) for India Scripts: A Review},
journal = {International Journal of Scientific Research in Computer Science and Engineering},
issue_date = {2 2019},
volume = {7},
number = {1},
month = {2},
year = {2019},
issn = {2347-2693},
pages = {23--29},
url = {https://www.isroset.org/journal/IJSRCSE/full_paper_view.php?paper_id=3138},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.isroset.org/journal/IJSRCSE/full_paper_view.php?paper_id=3138
TI  - Advancements in Optical Character Recognition (OCR) for India Scripts: A Review
T2  - International Journal of Scientific Research in Computer Science and Engineering
AU  - Jagin M. Patel
AU  - Bharat C. Patel
AU  - Manish M. Kayasth
PY  - 2019
DA  - 2019/02/28
PB  - IJCSE, Indore, INDIA
SP  - 23
EP  - 29
IS  - 1
VL  - 7
SN  - 2347-2693
ER  -

  
  

Abstract :
Optical character recognition (OCR) has been studied extensively, and a large body of literature on it has accumulated over the past few decades. India is a linguistically diverse country with numerous scripts, including Devanagari, Bengali, Tamil, Gujarati, and many more. Many commercial OCR systems are now on the market, but most of them target languages such as English, Chinese, and Japanese. Although India has many major scripts, comparatively few studies address character recognition for Indian languages. This review discusses recent research and developments in OCR techniques for Indian scripts, covering both printed and handwritten text recognition.

Key-Words / Index Term :
OCR survey, Indian script OCR

