
Outlier Detection Based on Clustering Over Sensed Data Using Hadoop

V. Jain

Section: Research Paper, Product Type: Isroset-Journal
Vol.1, Issue.2, pp.45-50, Mar-2013


Online published on Apr 30, 2013


Copyright © V. Jain. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
 


How to Cite this Paper

IEEE Style Citation: V. Jain, “Outlier Detection Based on Clustering Over Sensed Data Using Hadoop,” International Journal of Scientific Research in Computer Science and Engineering, Vol.1, Issue.2, pp.45-50, 2013.

MLA Style Citation: V. Jain "Outlier Detection Based on Clustering Over Sensed Data Using Hadoop." International Journal of Scientific Research in Computer Science and Engineering 1.2 (2013): 45-50.

APA Style Citation: V. Jain, (2013). Outlier Detection Based on Clustering Over Sensed Data Using Hadoop. International Journal of Scientific Research in Computer Science and Engineering, 1(2), 45-50.

BibTex Style Citation:
@article{Jain_2013,
author = {V. Jain},
title = {Outlier Detection Based on Clustering Over Sensed Data Using Hadoop},
journal = {International Journal of Scientific Research in Computer Science and Engineering},
issue_date = {3 2013},
volume = {1},
Issue = {2},
month = {3},
year = {2013},
issn = {2347-2693},
pages = {45-50},
url = {https://www.isroset.org/journal/IJSRCSE/full_paper_view.php?paper_id=326},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
UR - https://www.isroset.org/journal/IJSRCSE/full_paper_view.php?paper_id=326
TI - Outlier Detection Based on Clustering Over Sensed Data Using Hadoop
T2 - International Journal of Scientific Research in Computer Science and Engineering
AU - V. Jain
PY - 2013
DA - 2013/04/30
PB - IJCSE, Indore, INDIA
SP - 45-50
IS - 2
VL - 1
SN - 2347-2693
ER -


Abstract:
Outliers, regarded as noisy data in statistics, have become an important problem studied across diverse research fields and application domains. Many outlier detection techniques have been developed for specific application domains, while others are more generic. Outlier detection aims to find patterns in data that do not conform to expected behaviour. It is used in a wide variety of applications, such as military surveillance for enemy activities, intrusion detection in cyber security, fraud detection for credit cards, insurance, or health care, and fault detection in safety-critical systems. In our work, we identify the need for an outlier detection solution for large amounts of sensed data in order to optimize data mining. Sensed data is the output of sensor nodes, consisting of the real values they record. Existing solutions provide outlier detection only for static datasets, using clustering algorithms on datasets of normal size. In our work, we have developed an outlier detection system that detects outliers in the Intel sensed dataset using the clustering algorithms DBSCAN and K-Means. The experimental study was performed using a Java application and a Hadoop system.

Keywords / Index Terms:
Outlier detection, Clustering, Hadoop
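
To illustrate the general idea of clustering-based outlier detection described in the abstract, the following is a minimal single-machine sketch of DBSCAN over one-dimensional readings, where points that end up in no cluster are reported as outliers. It is not the paper's implementation and does not use Hadoop or MapReduce. The class name, the sample readings, and the parameter values `eps = 1.0` and `minPts = 3` are assumptions chosen for illustration only; the readings are made up and are not taken from the Intel dataset.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative 1-D DBSCAN: readings assigned to no cluster are treated as outliers. */
class DbscanOutlierSketch {
    static final int UNVISITED = 0;
    static final int NOISE = -1;

    /** Indices of all readings within eps of reading p (including p itself). */
    static List<Integer> regionQuery(double[] values, int p, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int q = 0; q < values.length; q++) {
            if (Math.abs(values[p] - values[q]) <= eps) {
                result.add(q);
            }
        }
        return result;
    }

    /** Returns one label per reading: a cluster id (1, 2, ...) or NOISE (-1). */
    static int[] dbscan(double[] values, double eps, int minPts) {
        int[] labels = new int[values.length]; // every entry starts as UNVISITED
        int cluster = 0;
        for (int i = 0; i < values.length; i++) {
            if (labels[i] != UNVISITED) {
                continue;
            }
            List<Integer> neighbors = regionQuery(values, i, eps);
            if (neighbors.size() < minPts) {
                labels[i] = NOISE; // may later be relabelled as a border point
                continue;
            }
            cluster++;
            labels[i] = cluster;
            // Grow the cluster from the core point's neighborhood.
            List<Integer> seeds = new ArrayList<>(neighbors);
            for (int k = 0; k < seeds.size(); k++) {
                int j = seeds.get(k);
                if (labels[j] == NOISE) {
                    labels[j] = cluster; // border point: joins but is not expanded
                }
                if (labels[j] != UNVISITED) {
                    continue;
                }
                labels[j] = cluster;
                List<Integer> jNeighbors = regionQuery(values, j, eps);
                if (jNeighbors.size() >= minPts) {
                    seeds.addAll(jNeighbors); // j is a core point: keep expanding
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hypothetical temperature readings; two of them are far from the rest.
        double[] readings = {19.8, 20.1, 20.3, 19.9, 20.0, 45.2, 20.2, -3.0};
        int[] labels = dbscan(readings, 1.0, 3);
        List<Double> outliers = new ArrayList<>();
        for (int i = 0; i < readings.length; i++) {
            if (labels[i] == NOISE) {
                outliers.add(readings[i]);
            }
        }
        System.out.println("Outliers: " + outliers);
    }
}
```

With these assumed inputs, the six readings near 20 form a single cluster and the program prints `Outliers: [45.2, -3.0]`. A K-Means variant would instead flag readings whose distance to the nearest centroid exceeds a chosen threshold.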

