How to Perform Web Scraping of Popular Classifieds Sites?

A classified website is an online advertising platform that plays a pivotal role in promoting products and services. It connects buyers and sellers in a single place. Classified portals have several sections devoted to jobs, resumes, housing, personals, services, items wanted, community, and more.

Structured or unstructured data plays a significant role in generating growth and innovation for your business. In this era of Big Data, web scraping is crucial for every industry, and classified sites are no exception.

Classified portals are generally organized so that users can search relevant categories and sub-categories. Scraping data from popular classified sites extracts important information that can benefit business owners and buyers.

Some vital information extracted from classified sites includes ad title, ad ID, description, current price, posting date, specification, category, subcategory, image, city, state, etc.
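
One way to keep these fields together is a small record type. The sketch below is only an illustration: the class name, the field names, and the sample title and price are assumptions rather than a fixed schema, and the ad ID is the one that appears later in this article.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class ClassifiedAd:
    """One scraped ad. Field names are illustrative, not a fixed schema."""
    ad_id: str
    title: str
    description: str = ""
    price: Optional[str] = None       # kept as text; price formats vary by site
    posted_at: Optional[str] = None
    category: Optional[str] = None
    subcategory: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    image_url: Optional[str] = None
    specifications: dict = field(default_factory=dict)


# Placeholder values, only to show how a record is built and serialized.
example = ClassifiedAd(ad_id="qsKeK", title="Sample ad", price="100")
record = asdict(example)
```

Converting the record with `asdict` gives a plain dictionary that can be written to JSON or CSV.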

Importance of Web Scraping Classified Sites

By leveraging an automated scraper, businesses can find classifieds at a fair price while focusing on other business activities instead of collecting information manually. Scraping lets you extract competitors’ pricing, contact numbers, and emails, which can support marketing outreach to customers.

By scraping popular classified websites, businesses can collect large volumes of data quickly and save the time and effort of manual collection. The extracted information is also helpful in several other business-related areas.

With the classified extractor services from iWeb Data Scraping, you can quickly obtain competitors’ contact information and connect with them. You can also promote your business to your customers and target consumers.

However, as with several other internet marketing methods, the primary consideration in data scraping is legality.

Why Scrape Classified Sites?

The reasons for scraping classified websites vary. The most popular ones are:

  • Research/Analytical: Writing reports requires specific data. Whether you are an investigative journalist or a student, you can extract the posts in a given section and analyze their data.
  • Personal: If you are looking for a new car, extract the classified data to compare pricing, models, and locations of used cars.
  • For Profit: Scraping classified site data lets you extract data on items you wish to buy and resell.
  • For Business: Extracting data helps in lead generation. You can easily find those who require your products and services and offer them directly.

Here, we are scraping the OLX classified site. We will choose a random category, i.e., Children’s clothing. First, open the category page in a Chrome tab and enable the developer tools. Then go to the Elements tab, select the tool for inspecting page items, and click on the section with the first item to see the selected node in the HTML code in the ‘Elements’ tab.

Here, we can use td.offer as the CSS selector, but first we need to verify it. Press CTRL + F while you are in the HTML code of the ‘Elements’ tab and type the selector in the search bar. If everything is right, you will see 44 matching elements.

Next, we expand the HTML elements and find the required link. The selector for links to ad pages is td.offer a.link.detailsLink. Check that it also matches 44 links. For better compatibility, we can use the shorter a.link.detailsLink selector.
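
The same check can be approximated outside the browser. The sketch below uses only Python's standard library and mimics the a.link.detailsLink selector by looking for anchor tags that carry both classes. The sample markup is made up and stands in for a downloaded category page; the real page structure may differ.

```python
from html.parser import HTMLParser


class AdLinkCollector(HTMLParser):
    """Collects href values of <a> tags that have both 'link' and 'detailsLink' classes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        if href and "link" in classes and "detailsLink" in classes:
            self.links.append(href)


# Made-up listing markup (assumption), not copied from the real site.
SAMPLE_LISTING = """
<table>
  <tr><td class="offer"><a class="link detailsLink" href="/ad/one.html">One</a></td></tr>
  <tr><td class="offer"><a class="link detailsLink" href="/ad/two.html">Two</a></td></tr>
  <tr><td class="offer"><a class="thumb" href="/ad/two.html">img</a></td></tr>
</table>
"""

collector = AdLinkCollector()
collector.feed(SAMPLE_LISTING)
ad_links = collector.links
```

On a real category page, the length of `ad_links` should equal the number of matches reported by the developer tools search.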

Now, check the paginator, where we will find the link to the next page. The selector we obtained is a[data-cy="page-link-next"]. Make sure that exactly one element on the page matches this selector.
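
As a rough illustration of this step, the following standard-library sketch looks for an anchor whose data-cy attribute equals page-link-next and returns its absolute URL only when exactly one such element exists. The paginator markup and the category URL are placeholders chosen for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextPageFinder(HTMLParser):
    """Records the href of every <a data-cy="page-link-next"> element."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("data-cy") == "page-link-next":
            self.matches.append(attributes.get("href"))


def find_next_page(html, base_url):
    """Return the absolute next-page URL, or None unless exactly one link matches."""
    finder = NextPageFinder()
    finder.feed(html)
    if len(finder.matches) != 1 or not finder.matches[0]:
        return None
    return urljoin(base_url, finder.matches[0])


# Placeholder markup and URL (assumptions) used to exercise the function.
SAMPLE_PAGINATOR = '<div><a data-cy="page-link-next" href="?page=2">Next</a></div>'
BASE_URL = "https://www.olx.ua/some-category/"
next_url = find_next_page(SAMPLE_PAGINATOR, BASE_URL)
```

Returning None when the selector matches zero or several elements makes it easy to notice that the page layout has changed.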

To navigate through the category pages, we will use a link pool. The scraper will look like this:

[Image: scraper configuration for navigating the category pages]
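
Independently of the tool shown in the screenshot, the link pool idea can be sketched as a queue of pending page URLs plus a set of visited ones. In this sketch, `fetch_page` is a hypothetical caller-supplied function and `FAKE_SITE` is an in-memory stand-in for the website, so the loop can be tried without network access. The scraping tool's own link pool may behave differently.

```python
from collections import deque


def crawl_category(start_url, fetch_page, max_pages=50):
    """Walk category pages through a pool of pending links.

    fetch_page(url) is supplied by the caller (hypothetical here) and returns
    a tuple (ad_links, next_page_url_or_None) for one category page.
    """
    pool = deque([start_url])
    seen = set()
    collected = []
    while pool and len(seen) < max_pages:
        url = pool.popleft()
        if url in seen:
            continue
        seen.add(url)
        ad_links, next_url = fetch_page(url)
        collected.extend(ad_links)
        if next_url and next_url not in seen:
            pool.append(next_url)
    return collected


# In-memory stand-in for the site: page name -> (ad links, next page).
FAKE_SITE = {
    "page1": (["/ad/1", "/ad/2"], "page2"),
    "page2": (["/ad/3"], None),
}

all_ads = crawl_category("page1", lambda url: FAKE_SITE[url])
```

The `max_pages` limit is a safety net so that a paginator that loops back on itself cannot keep the crawl running indefinitely.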

Next, we describe the data collection logic for the ad page. To do this, we first open any ad and find the CSS selectors:

  • Page ad block selector: div#offer_active
  • Ad Title: h1
  • Address: address > p
  • Ad Id: em > small
  • Date and time of ad placement: em
  • Table with details: table.details
  • Description: div#textContent
  • Image: div#photo-gallery-opener > img
  • Price: div.price-label
  • Seller Name: div.offeruser__details h4
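
To show how such selectors turn into field values, here is a deliberately small standard-library sketch. It understands only the simplest selector shapes from the list (tag, tag#id, and tag.class), so the selectors with combinators, such as address > p, would need a full CSS selector engine instead. The helper names and the sample ad markup are assumptions made for the example.

```python
from html.parser import HTMLParser


class FirstMatchText(HTMLParser):
    """Collects the text of the first element matching tag, tag#id, or tag.class."""

    def __init__(self, wanted_tag, wanted_attr=None, wanted_value=None):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.wanted_attr = wanted_attr
        self.wanted_value = wanted_value
        self.depth = 0        # nesting depth inside the matched element
        self.done = False
        self.parts = []

    def _matches(self, tag, attrs):
        if tag != self.wanted_tag:
            return False
        if self.wanted_attr is None:
            return True
        found = dict(attrs).get(self.wanted_attr) or ""
        if self.wanted_attr == "class":
            return self.wanted_value in found.split()
        return found == self.wanted_value

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == self.wanted_tag:
                self.depth += 1   # nested element with the same tag name
        elif self._matches(tag, attrs):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.wanted_tag:
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def select_text(html, selector):
    """Return whitespace-normalized text for a simple selector, or None."""
    if "#" in selector:
        tag, value = selector.split("#", 1)
        attr = "id"
    elif "." in selector:
        tag, value = selector.split(".", 1)
        attr = "class"
    else:
        tag, attr, value = selector, None, None
    parser = FirstMatchText(tag, attr, value)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split()) or None


# Made-up ad markup (assumption) that reuses three selectors from the list.
SAMPLE_AD = """
<div id="offer_active">
  <h1> Winter jacket </h1>
  <div class="price-label"><strong>250</strong></div>
  <div id="textContent">Warm jacket,<br> size 104.</div>
</div>
"""
SELECTORS = {"title": "h1", "description": "div#textContent", "price": "div.price-label"}
ad_record = {name: select_text(SAMPLE_AD, css) for name, css in SELECTORS.items()}
```

Keeping the selectors in one dictionary means a layout change on the site only requires updating that mapping, not the extraction loop.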

We will first code the part of the scraper that collects the actual data from the ad page.

[Image: scraper code for collecting data from the ad page]

We get the following dataset record:

[Image: resulting dataset record]

Now, we want to collect the phone numbers. For this, open the page with the ad, open the developer tools, and go to the Network tab. Within the tab, we only want to view XHR requests. Click the button that clears all requests, and then click the ‘Show Phone’ button.

Now, open the request and check its address and the type of data it sends.

The URL that we have is

https://www.olx.ua/ajax/misc/contact/phone/qsKeK/ and the parameter pt is cda38f1d74d6e50f6f5a248ea2578ba04d44b58ccb6648718ce825a15dd1c036494b2cd1c6cb27762a8de30f5f58676149a11ee8a228998fd7f6b8cde5bb83a9

From the above link, we can see that we require the ad ID qsKeK and the parameter pt to imitate such requests. This parameter is in the page's JavaScript, and we can extract it using a regular expression. We will make certain changes in the scraper to collect the phone number and then add a snippet.

[Image: updated scraper code that extracts the pt parameter and requests the phone number]
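
The regular-expression step can be sketched as follows. Two details are assumptions, because the article does not spell them out: the JavaScript variable is taken to be named phoneToken, and pt is sent as a query-string parameter. The token in the sample is a short placeholder, not a real one.

```python
import re
from urllib.parse import urlencode

# Assumption: the token is assigned to a JavaScript variable named phoneToken.
TOKEN_PATTERN = re.compile(r"phoneToken\s*=\s*['\"]([0-9a-f]+)['\"]")


def build_phone_url(page_html, ad_id):
    """Return the phone request URL for an ad, or None if no token is found."""
    match = TOKEN_PATTERN.search(page_html)
    if match is None:
        return None
    # Assumption: pt travels in the query string of the request.
    query = urlencode({"pt": match.group(1)})
    return f"https://www.olx.ua/ajax/misc/contact/phone/{ad_id}/?{query}"


# Placeholder page fragment with a shortened, made-up token.
SAMPLE_PAGE = "<script>var phoneToken = 'abc123def456';</script>"
phone_url = build_phone_url(SAMPLE_PAGE, "qsKeK")
```

If the pattern stops matching, the function returns None rather than building a request with an empty token.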

If we run the scraper in debug mode, we see the following structure:

[Image: response structure shown in debug mode]

We will use the body_safe > value CSS selector to collect the phone number. Add it to the web scraper to obtain the following:

[Image: final scraper code with phone number collection]
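
The body_safe > value selector suggests that the response carries a field named value. Assuming the response body is JSON with that field (an assumption, since the raw response is not shown here), reading the phone number outside the scraping tool could look like this sketch. The sample body is a placeholder.

```python
import json


def parse_phone(response_body):
    """Return the phone number from a JSON body with a 'value' field, else None."""
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    value = payload.get("value")
    if not isinstance(value, str):
        return None
    return value.strip() or None


# Placeholder response body, not a real phone number.
phone = parse_phone('{"value": " 000 000 00 00 "}')
```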

Conclusion: Thus, the scraper works well for collecting the data we require from the OLX classified site.

For more information, contact iWeb Data Scraping now! You can also reach us for all your web scraping service and mobile app data scraping service requirements.
