From Web to Insights: Automating and Optimizing Job Data Collection with Selenium
Pramiti Tewari, Utkarsh Gupta, Samriddhi Tripathi, Ajay Kumar
Section: Research Paper, Product Type: Journal-Paper
Vol.11, Issue.4, pp.1-7, Dec-2024
Online published on Dec 31, 2024
Copyright © Pramiti Tewari, Utkarsh Gupta, Samriddhi Tripathi, Ajay Kumar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
How to Cite this Paper
IEEE Style Citation: Pramiti Tewari, Utkarsh Gupta, Samriddhi Tripathi, Ajay Kumar, “From Web to Insights: Automating and Optimizing Job Data Collection with Selenium,” World Academics Journal of Engineering Sciences, Vol.11, Issue.4, pp.1-7, 2024.
MLA Style Citation: Pramiti Tewari, Utkarsh Gupta, Samriddhi Tripathi, Ajay Kumar "From Web to Insights: Automating and Optimizing Job Data Collection with Selenium." World Academics Journal of Engineering Sciences 11.4 (2024): 1-7.
APA Style Citation: Pramiti Tewari, Utkarsh Gupta, Samriddhi Tripathi, Ajay Kumar, (2024). From Web to Insights: Automating and Optimizing Job Data Collection with Selenium. World Academics Journal of Engineering Sciences, 11(4), 1-7.
BibTex Style Citation:
@article{Tewari_2024,
author = {Pramiti Tewari and Utkarsh Gupta and Samriddhi Tripathi and Ajay Kumar},
title = {From Web to Insights: Automating and Optimizing Job Data Collection with Selenium},
journal = {World Academics Journal of Engineering Sciences},
issue_date = {12 2024},
volume = {11},
number = {4},
month = {12},
year = {2024},
issn = {2347-2693},
pages = {1-7},
url = {https://www.isroset.org/journal/WAJES/full_paper_view.php?paper_id=3758},
publisher = {IJCSE, Indore, INDIA},
}
RIS Style Citation:
TY - JOUR
UR - https://www.isroset.org/journal/WAJES/full_paper_view.php?paper_id=3758
TI - From Web to Insights: Automating and Optimizing Job Data Collection with Selenium
T2 - World Academics Journal of Engineering Sciences
AU - Pramiti Tewari
AU - Utkarsh Gupta
AU - Samriddhi Tripathi
AU - Ajay Kumar
PY - 2024
DA - 2024/12/31
PB - IJCSE, Indore, INDIA
SP - 1
EP - 7
IS - 4
VL - 11
SN - 2347-2693
ER -
Abstract:
This research explores web scraping as an efficient method for data extraction, focusing on job postings collected from LinkedIn with Selenium. By automating interactions with dynamic web elements, the study extracts job data such as job titles, companies, and addresses from recent postings. It also addresses challenges such as dynamic content handling and anti-bot mechanisms while respecting the legal norms and ethical considerations of data mining. The Python modules BeautifulSoup, Scrapy, and Selenium are reviewed as candidates for the automated script, with emphasis on Selenium's scalability, robustness, efficiency, and adaptability to real-world scenarios such as multi-page navigation, error handling, and regular updates. The approach highlights the potential of web scraping for data mining and effective analysis, offering an ethical basis for data-driven approaches.
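The multi-page navigation and error handling mentioned in the abstract could look roughly like the Python sketch below. It is not the authors' script, which is not available on this page. The names `JobPosting`, `collect_jobs`, and `make_selenium_loader`, the URL template, and every CSS selector are hypothetical placeholders. The Selenium calls used (`WebDriverWait`, `expected_conditions.presence_of_all_elements_located`, `find_elements`, `find_element`) are standard Selenium 4 APIs, imported lazily so the pagination logic can be exercised without a browser.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class JobPosting:
    """One scraped row: the three fields named in the abstract."""
    title: str
    company: str
    location: str


def collect_jobs(load_page: Callable[[int], List[JobPosting]],
                 max_pages: int,
                 retries: int = 2,
                 delay_s: float = 1.0) -> List[JobPosting]:
    """Walk result pages in order, retry pages that raise, drop duplicates.

    Stops early at the first page that yields no rows (or keeps failing).
    """
    seen = set()
    jobs: List[JobPosting] = []
    for page in range(max_pages):
        rows: List[JobPosting] = []
        for attempt in range(retries + 1):
            try:
                rows = load_page(page)
                break
            except Exception:
                # Assumption: any failure (timeout, stale element) is retried.
                if attempt < retries:
                    time.sleep(delay_s)
        if not rows:
            break
        for job in rows:
            if job not in seen:
                seen.add(job)
                jobs.append(job)
        time.sleep(delay_s)  # polite pause between pages
    return jobs


def make_selenium_loader(driver, url_template: str, card_css: str,
                         title_css: str, company_css: str,
                         location_css: str, timeout_s: float = 10.0):
    """Build a page loader backed by a Selenium WebDriver.

    All selectors and the URL template are caller-supplied placeholders.
    """
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def text_of(card, css: str) -> str:
        return card.find_element(By.CSS_SELECTOR, css).text.strip()

    def load_page(page: int) -> List[JobPosting]:
        driver.get(url_template.format(page=page))
        # Explicit wait: dynamic content may render after the initial load.
        WebDriverWait(driver, timeout_s).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, card_css))
        )
        cards = driver.find_elements(By.CSS_SELECTOR, card_css)
        return [JobPosting(text_of(c, title_css),
                           text_of(c, company_css),
                           text_of(c, location_css)) for c in cards]

    return load_page
```

To use it, one would create a driver (for example `webdriver.Chrome()`), pass it to `make_selenium_loader` with selectors for a site whose terms permit automated access, and hand the returned loader to `collect_jobs`. Keeping the pagination loop independent of the browser makes the retry and de-duplication behaviour easy to check with a stub loader.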
Keywords / Index Terms:
Web Scraping, Selenium, Python Automation, Dynamic Content Handling, Pagination, WebDriver