An Efficient Context-dependent Lexical Information Detection using Word Embeddings and Deep Machine Learning Classifiers for Unstructured Textual Contents

Amit Shukla, Rajendra Gupta

Section: Research Paper, Product Type: Journal Paper
Vol. 11, Issue 5, pp. 54-59, Oct 2023

Online published on Oct 31, 2023

Copyright © Amit Shukla, Rajendra Gupta. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Amit Shukla, Rajendra Gupta, “An Efficient Context-dependent Lexical Information Detection using Word Embeddings and Deep Machine Learning Classifiers for Unstructured Textual Contents,” International Journal of Scientific Research in Computer Science and Engineering, Vol.11, Issue.5, pp.54-59, 2023.

MLA Style Citation: Amit Shukla, Rajendra Gupta. "An Efficient Context-dependent Lexical Information Detection using Word Embeddings and Deep Machine Learning Classifiers for Unstructured Textual Contents." International Journal of Scientific Research in Computer Science and Engineering 11.5 (2023): 54-59.

APA Style Citation: Amit Shukla, Rajendra Gupta (2023). An Efficient Context-dependent Lexical Information Detection using Word Embeddings and Deep Machine Learning Classifiers for Unstructured Textual Contents. International Journal of Scientific Research in Computer Science and Engineering, 11(5), 54-59.

BibTex Style Citation:
@article{Shukla_2023,
author = {Amit Shukla and Rajendra Gupta},
title = {An Efficient Context-dependent Lexical Information Detection using Word Embeddings and Deep Machine Learning Classifiers for Unstructured Textual Contents},
journal = {International Journal of Scientific Research in Computer Science and Engineering},
issue_date = {10 2023},
volume = {11},
number = {5},
month = {10},
year = {2023},
issn = {2347-2693},
pages = {54-59},
url = {https://www.isroset.org/journal/IJSRCSE/full_paper_view.php?paper_id=3286},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.isroset.org/journal/IJSRCSE/full_paper_view.php?paper_id=3286
TI - An Efficient Context-dependent Lexical Information Detection using Word Embeddings and Deep Machine Learning Classifiers for Unstructured Textual Contents
T2 - International Journal of Scientific Research in Computer Science and Engineering
AU - Shukla, Amit
AU - Gupta, Rajendra
PY - 2023
DA - 2023/10/31
PB - IJCSE, Indore, INDIA
SP - 54
EP - 59
IS - 5
VL - 11
SN - 2347-2693
ER -

Abstract:
The term "context-dependent" refers to a type of word representation that enables machine learning algorithms to distinguish between words that have similar meanings. It is a feature-learning technique that applies probabilistic models, dimension reduction, or neural networks to the word co-occurrence matrix in order to map words to real-valued vectors. In this study, we address the problem of recognizing context-dependent lexical information in unstructured data containers. We investigate a method that employs word embeddings for automatic detection of context and relevant features, together with a deep neural network for classification. Using publicly accessible tweet and image datasets, we also present an alternative model that uses Conventional Machine Learning (CML) classifiers and a rule-based model. The proposed method outperforms the alternatives from earlier research. Context-dependent Lexical Information Detection (CLID) is analysed in terms of four aspects on the basis of Context-Centred Extraction of Concepts (CCEC). The proposed word-embedding method, CCEC, benefits from the ability of neural-network methods to encode textual information by converting meaningful text into numeric values.
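
To make the idea of mapping words to real-valued vectors concrete, the following is a minimal sketch, not the authors' implementation. It builds raw co-occurrence vectors from a toy corpus and compares them with cosine similarity. The function names, the window size, and the toy sentences are assumptions made for illustration only; the dimension-reduction and neural-network steps mentioned in the abstract are deliberately left out.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Return {word: vector}, where each vector holds co-occurrence counts.

    `sentences` is a list of token lists. Two words co-occur when they
    appear within `window` positions of each other in the same sentence.
    """
    vocab = sorted({w for s in sentences for w in s})
    counts = defaultdict(Counter)
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[w][s[j]] += 1
    # One row of the co-occurrence matrix per word, columns ordered by vocab.
    return {w: [float(counts[w][c]) for c in vocab] for w in vocab}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, chosen only to illustrate the idea.
toy_corpus = [
    "my card number is private".split(),
    "my phone number is private".split(),
    "the weather is nice today".split(),
]
vectors = cooccurrence_vectors(toy_corpus)
sim_related = cosine(vectors["card"], vectors["phone"])
sim_unrelated = cosine(vectors["card"], vectors["weather"])
```

In this toy setting, "card" and "phone" share the same neighbouring words, so their vectors are closer to each other than either is to "weather". A classifier would then take such vectors, or a reduced-dimension version of them, as input features.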

Keywords / Index Terms:
Context-dependent Lexical Information, Word Embeddings, Deep ML Classifier, Unstructured Textual Contents
