Case Study
HR Tech and Recruitment
Generative AI & ML

Scalable Cloud-based Web Crawling And Scraping Solution for AI Hiring Platform

About This Project

The client needed to automate the search, retrieval, and storage of publicly available data from multiple websites. This work was being done manually and required a dedicated resource to manage it end to end. They needed a crawling and scraping solution that searches and extracts data from multiple sources and uploads it to the client database for further processing.

Services

Generative AI & ML

About the Client

The client is an AI-based recruitment platform that enables talent discovery and personalized interaction based on organization alignment and profile matching. Their data-driven hiring solution helps companies spot candidates who best fit their needs and are likely to move, and then approach them through personalized interaction.

Understanding the Challenge

A data pipeline and data warehousing solution was needed to manage the movement and transformation of data, as well as quick retrieval and analysis for reporting and decision making.

Velotio developed a crawling and scraping solution that automates the searching and extraction of data from multiple sources and uploads the data to the client database for further processing. The solution crawls, extracts, and stores data based on pre-specified rules. It also makes it possible to specify the kinds of URLs to crawl and the types of data to be extracted and stored in the database. The time intervals for the crawling/extraction process and the quantity of data extracted can be specified as required.
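To make the idea of pre-specified rules concrete, here is a minimal sketch of how one such rule might be represented. It is not the client's actual configuration: the class name `CrawlRule`, its fields, and the helper `select_urls` are all hypothetical, chosen only to mirror the four things the text says can be specified (URL kinds, data types, time interval, and quantity of data).

```python
import re
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class CrawlRule:
    """One pre-specified crawl rule. All field names are hypothetical."""
    url_pattern: str                            # regex describing the kind of URLs to crawl
    fields: List[str]                           # data types to extract and store
    interval: timedelta = timedelta(hours=24)   # how often the crawl repeats
    max_pages: int = 100                        # cap on pages handled per run

    def matches(self, url: str) -> bool:
        # A URL is in scope if the rule's pattern is found in it
        return re.search(self.url_pattern, url) is not None


def select_urls(rule: CrawlRule, candidates: List[str]) -> List[str]:
    """Keep only URLs the rule allows, up to the per-run cap."""
    allowed = [u for u in candidates if rule.matches(u)]
    return allowed[: rule.max_pages]
```

A rule built with `url_pattern=r"^https://example\.com/profiles/"` and `max_pages=2` would keep the first two matching profile URLs from a candidate list and drop everything else.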

“Velotio quickly adapts to our ever-changing needs. They have excellent business acumen with the ability to prioritize deliverables, while continually exceeding our expectations.”

CTO, Washington DC-based startup

How We Made It Happen

The solution was organized into three layers: Crawler, Data Extractor, and Backend API Layer. Domain-specific rules and intelligence were used to crawl, extract, and store data. Basic NLP and machine learning were also leveraged to reduce the effort of scraping each platform's website.

  • Crawler: Crawls specific websites and platforms, following domain-specific rules to extract data. The data is then uploaded to cloud storage (Amazon S3 or equivalent) for processing. The crawler runs multiple spiders (processes), most likely customized for each kind of platform, and runs periodically to keep the content up to date.
  • Extractor: The data extractor takes HTML page content and applies generic as well as platform- or website-specific rules to get the relevant data, which is then saved to a database.
  • API Layer: A REST API server provides access to the data in persistent storage over HTTP.
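The extractor step described above can be sketched as follows. This is an illustration using only Python's standard-library `html.parser`, not the project's actual code. The rule format (field name mapped to a tag and CSS class) and the names `RuleExtractor` and `extract` are assumptions made for this example.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Tuple

# Hypothetical rule format: field name -> (tag, CSS class) of the element holding the value
Rules = Dict[str, Tuple[str, str]]


class RuleExtractor(HTMLParser):
    """Collects the text of elements that match a platform-specific rule."""

    def __init__(self, rules: Rules) -> None:
        super().__init__()
        self.rules = rules
        self.record: Dict[str, str] = {}
        self._current: Optional[str] = None  # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name, (rule_tag, rule_class) in self.rules.items():
            if tag == rule_tag and rule_class in classes:
                self._current = name
                self.record.setdefault(name, "")

    def handle_data(self, data):
        if self._current is not None:
            self.record[self._current] += data

    def handle_endtag(self, tag):
        # Stop capturing at the first closing tag of the matched element's type
        if self._current is not None and tag == self.rules[self._current][0]:
            self._current = None


def extract(html: str, rules: Rules) -> Dict[str, str]:
    """Apply the rules to one HTML page and return a record ready to be stored."""
    parser = RuleExtractor(rules)
    parser.feed(html)
    parser.close()
    return {field: text.strip() for field, text in parser.record.items()}
```

Keeping the rules as data rather than code means a new platform could, in principle, be supported by adding a rule set instead of a new parser. The sketch does not handle an element nested inside another element with the same tag; a production extractor would more likely rely on a full HTML parsing library.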

Database: MySQL was used as the relational database providing persistent storage for the data. MySQL's full-text search feature was used to support the search APIs. The solution described above was delivered as a Docker container.
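As a rough illustration of how a search API might lean on MySQL for text search, the sketch below builds a parameterized query. It assumes the feature in question is MySQL full-text search (a FULLTEXT index queried with `MATCH ... AGAINST`); the function name `build_search_query` and the table and column names in the usage note are hypothetical.

```python
import re
from typing import List, Tuple

# Identifiers cannot be passed as query parameters, so they are validated instead
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_search_query(
    table: str, columns: List[str], term: str, limit: int = 20
) -> Tuple[str, Tuple[str, ...]]:
    """Return (sql, params) for a MySQL full-text search.

    Assumes a FULLTEXT index already exists on exactly these columns.
    """
    for ident in [table, *columns]:
        if not _IDENT.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    cols = ", ".join(columns)
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE MATCH({cols}) AGAINST (%s IN NATURAL LANGUAGE MODE) "
        f"LIMIT {int(limit)}"
    )
    # The search term stays a bound parameter so the driver escapes it
    return sql, (term,)
```

For example, `build_search_query("profiles", ["title", "summary"], "python engineer", 5)` yields a statement and parameter tuple that can be handed to a DB-API cursor's `execute` method in drivers that use the `%s` placeholder style.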

How Velotio Made a Difference

Automated the entire process of searching and storing content in the database.

Significantly reduced manual effort and freed up the resource previously dedicated to the process.

Simplified profile management with automated and regular data upload.
