From Web to Insights: Automating and Optimizing Job Data Collection with Selenium

Pramiti Tewari, Utkarsh Gupta, Samriddhi Tripathi, Ajay Kumar

Section: Research Paper, Product Type: Journal-Paper
Vol.11, Issue.4, pp.1-7, Dec-2024

Online published on Dec 31, 2024

Copyright © Pramiti Tewari, Utkarsh Gupta, Samriddhi Tripathi, Ajay Kumar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Citation

IEEE Style Citation: Pramiti Tewari, Utkarsh Gupta, Samriddhi Tripathi, Ajay Kumar, “From Web to Insights: Automating and Optimizing Job Data Collection with Selenium,” World Academics Journal of Engineering Sciences, Vol.11, Issue.4, pp.1-7, 2024.

MLA Citation

MLA Style Citation: Pramiti Tewari, Utkarsh Gupta, Samriddhi Tripathi, Ajay Kumar "From Web to Insights: Automating and Optimizing Job Data Collection with Selenium." World Academics Journal of Engineering Sciences 11.4 (2024): 1-7.

APA Citation

APA Style Citation: Pramiti Tewari, Utkarsh Gupta, Samriddhi Tripathi, Ajay Kumar (2024). From Web to Insights: Automating and Optimizing Job Data Collection with Selenium. World Academics Journal of Engineering Sciences, 11(4), 1-7.

BibTex Citation

BibTex Style Citation:
@article{Tewari_2024,
author = {Pramiti Tewari and Utkarsh Gupta and Samriddhi Tripathi and Ajay Kumar},
title = {From Web to Insights: Automating and Optimizing Job Data Collection with Selenium},
journal = {World Academics Journal of Engineering Sciences},
issue_date = {12 2024},
volume = {11},
number = {4},
month = {12},
year = {2024},
issn = {2347-2693},
pages = {1-7},
url = {https://www.isroset.org/journal/WAJES/full_paper_view.php?paper_id=3758},
publisher = {IJCSE, Indore, INDIA},
}

RIS Citation

RIS Style Citation:
TY  - JOUR
UR  - https://www.isroset.org/journal/WAJES/full_paper_view.php?paper_id=3758
TI  - From Web to Insights: Automating and Optimizing Job Data Collection with Selenium
T2  - World Academics Journal of Engineering Sciences
AU  - Pramiti Tewari
AU  - Utkarsh Gupta
AU  - Samriddhi Tripathi
AU  - Ajay Kumar
PY  - 2024
DA  - 2024/12/31
PB  - IJCSE, Indore, INDIA
SP  - 1
EP  - 7
IS  - 4
VL  - 11
SN  - 2347-2693
ER  - 

Abstract :
This research explores web scraping as an efficient method for data extraction, focusing on job postings collected from LinkedIn with Selenium. By automating interactions with dynamic web elements, the study extracts job data such as job titles, companies, and addresses from recent job postings. It also addresses challenges such as dynamic content handling and anti-bot mechanisms while observing the legal norms and ethical considerations of data mining. The Python libraries BeautifulSoup, Scrapy, and Selenium are reviewed as candidates for the automation script, with emphasis on Selenium's scalability, robustness, efficiency, and adaptability to real-world scenarios such as multi-page navigation, error handling, and regular updates. The approach highlights the potential of web scraping to support data mining and effective analysis, offering an ethical solution for data-driven work.

Key-Words / Index Term :
Web Scraping, Selenium, Python Automation, Dynamic Content Handling, Pagination, WebDriver
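
The workflow summarized in the abstract (waiting for dynamic content, extracting title, company, and location, moving across pages, and tolerating missing fields) could be sketched roughly as follows. This is a minimal illustration and not the authors' script: the full text is not available here, so every CSS selector, the `JobPosting` class, and the function names are hypothetical placeholders rather than LinkedIn's actual markup or the paper's code. Only standard Selenium 4 calls are used (`find_elements`, `WebDriverWait`, `expected_conditions`).

```python
import time
from dataclasses import dataclass

# The string value of selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"

# Hypothetical selectors: placeholders, not taken from the paper or any real site.
CARD_SEL = "div.job-card"
TITLE_SEL = ".job-title"
COMPANY_SEL = ".company-name"
LOCATION_SEL = ".job-location"
NEXT_SEL = "a.next-page"


@dataclass(frozen=True)
class JobPosting:
    title: str
    company: str
    location: str


def first_text(element, selector):
    """Return the stripped text of the first match, or '' if the field is absent."""
    # find_elements returns an empty list instead of raising when nothing matches.
    matches = element.find_elements(CSS, selector)
    return matches[0].text.strip() if matches else ""


def extract_page(driver):
    """Collect one JobPosting per job card currently rendered on the page."""
    postings = []
    for card in driver.find_elements(CSS, CARD_SEL):
        title = first_text(card, TITLE_SEL)
        if not title:
            continue  # skip cards that lack a title
        postings.append(
            JobPosting(title, first_text(card, COMPANY_SEL), first_text(card, LOCATION_SEL))
        )
    return postings


def scrape_jobs(url, max_pages=3, timeout=10, pause=2.0):
    """Visit up to max_pages result pages and return de-duplicated postings."""
    # Imported lazily so the parsing helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    jobs = []
    try:
        driver.get(url)
        for _ in range(max_pages):
            try:
                # Dynamic content: wait until at least one card is in the DOM.
                WebDriverWait(driver, timeout).until(
                    EC.presence_of_element_located((CSS, CARD_SEL))
                )
            except TimeoutException:
                break  # nothing rendered in time: stop instead of crashing
            jobs.extend(extract_page(driver))
            next_links = driver.find_elements(CSS, NEXT_SEL)
            if not next_links:
                break  # last page reached
            next_links[0].click()
            time.sleep(pause)  # fixed delay between pages to limit request rate
    finally:
        driver.quit()
    return list(dict.fromkeys(jobs))  # drop duplicates, keep first-seen order
```

Using `find_elements` for optional fields keeps a single missing company or location from aborting the run, which is one simple way to get the error handling the abstract mentions. A call such as `scrape_jobs("https://example.com/jobs")` (a placeholder URL) would need a local Chrome installation, and the selectors would have to be adapted to the target site and its terms of use.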
