Abstract
Breast cancer is among the diseases that cause the most deaths each year. It is the second most common cause of mortality among women, and in Canada the majority of these deaths are attributed to a lack of available information about the disease. Breast cancer is also among the most treatable cancers, and early detection and thorough screening are associated with a higher patient survival rate. To find a reliable approach to predicting breast cancer, this paper presents a study of breast cancer prediction based on machine learning techniques. The study compares models trained on patient clinical records to find one that accurately predicts the likelihood of breast cancer. Four machine learning models are used: kNN (k-Nearest Neighbour), SVM (Support Vector Machine), ANN (Artificial Neural Network), and the Naive Bayes classifier. Two commonly used test data sets, the Wisconsin Breast Cancer Database (1991) and Wisconsin Diagnostic Breast Cancer (1995), are used to assess the performance of these models. The 10-fold cross-validation approach is used to estimate each model's test error. The dataset for this study was downloaded from Kaggle.com. It contains 569 records (357 benign and 212 malignant) and 32 attributes: the ID number, the diagnosis (M = malignant, B = benign), and 30 real-valued input features. The findings show the trade-offs among these techniques and give a thorough assessment of the models. In practical use, the feature identification results are expected to help doctors and patients detect breast cancer early.
Key-Words / Index Term
Breast Cancer, Support Vector Machine, Random Forest, k-Nearest Neighbor, Artificial Neural Network
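As a rough illustration of the 10-fold cross-validation procedure named in the abstract, the following Python sketch estimates the test error of a kNN classifier. It is a minimal sketch under stated assumptions, not the study's implementation: the helper names (knn_predict, k_fold_indices, cv_error, make_toy_data), the choice of k = 5, the Euclidean distance, and the synthetic two-feature data are all hypothetical stand-ins for the real 30-feature records.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def k_fold_indices(n, n_folds=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into n_folds nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_error(X, y, k=5, n_folds=10, seed=0):
    """Mean misclassification rate over the held-out folds."""
    fold_errors = []
    for fold in k_fold_indices(len(X), n_folds, seed):
        held_out = set(fold)
        # Train on every record that is not in the current held-out fold.
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        wrong = sum(knn_predict(train_X, train_y, X[i], k) != y[i] for i in fold)
        fold_errors.append(wrong / len(fold))
    return sum(fold_errors) / n_folds


def make_toy_data(n_per_class=60, seed=1):
    """Synthetic two-feature data (assumption: placeholder for real records)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in (("B", 0.0), ("M", 3.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"10-fold CV error (kNN, k=5): {cv_error(X, y):.3f}")
```

Each record is held out exactly once, so the averaged fold error uses every record for testing while never testing a model on data it was trained on. The same loop could wrap any of the other classifiers in place of knn_predict.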