How to Scrape Job Postings Data from Indeed.com with Python?


If you're looking for a job or a job change but not finding the right posting even though you search frequently, web scraping is a handy shortcut. In this blog post, you'll learn how to pull job posts from a top job listing site so you can find the one that suits you best. Since the shortcut exists, there's no need to do the search the hard way.

The shortcut is web data scraping using Python, where you can create a scraper that fits your requirements. Let's walk through the process of scraping job postings data from Indeed.com.

What Is Web Scraping For Job Postings Using Python?

Web scraping for job postings is the process of automatically gathering data from job portals like Indeed, which cuts the time you spend hunting for jobs manually. Having a tool that surfaces the right vacancies quickly makes your life easier. Once you have customized job vacancy data for your role on Indeed, you just need to go and apply in your own style.

Understanding the URL and Job Post Page Structure on Indeed

The URL plays an important role in data extraction.

  • q= begins the search string for the "what" field on the page, and the + sign separates the search terms.
  • The salary figure is URL-encoded. The salary component begins with %24 (the encoded dollar sign), followed by the digits before the first comma, and %2C (the encoded comma) splits the rest of the figure. For example, %2420%2C000 stands for $20,000.
  • To find the city of interest, look for &l=, which begins the location string.

Most importantly, &start= in the URL marks the search result at which the page begins. This understanding of the URL structure is a huge plus when you build the scraper to gather data from multiple pages.
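The URL components described above can be assembled in code. The following is a minimal sketch using only the Python standard library; the base address, the helper name build_search_url, and the sample search terms are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumed search endpoint; check the address bar of a real search to confirm it.
BASE_URL = "https://www.indeed.com/jobs"


def build_search_url(what, where, start=0):
    """Build a search URL from the q=, l= and start= components."""
    params = {"q": what, "l": where, "start": start}
    # urlencode escapes each value with quote_plus by default:
    # spaces become "+", "$" becomes %24 and "," becomes %2C.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("data scientist $20,000", "New York", start=10)
print(url)
# https://www.indeed.com/jobs?q=data+scientist+%2420%2C000&l=New+York&start=10
```

Changing only the start value gives you the URL of each further results page.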

If you're a newbie, you can get to know the HTML tags and code structure by inspecting an element in Chrome. However, you don't need to know the code in detail. Knowing the page structure and component hierarchy is enough.


Getting Started with the Scraper on Indeed

So far, we've studied the page and URL structure. Now, let's build the code to extract the job posting data of your choice from Indeed. Note that you'll also import the time module to pause between requests, so the scraper doesn't overload the server during data extraction.
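One way to use the time module for this is to sleep between consecutive requests. The sketch below is only an illustration: fetch_politely and its fetch argument are hypothetical names, and the one-second default delay is an arbitrary choice rather than a documented limit.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request after the first one.
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

Here fetch can be any function that downloads one page, which keeps the delay logic separate from the download logic.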

To begin with, you can target a single page to grab the data you want.
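A minimal sketch of that single-page step follows. It assumes the third-party beautifulsoup4 package (the variable name soup used in this article suggests it, but the package is not named here) and downloads the page with the standard library. The function names and the User-Agent value are assumptions.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def parse_page(html):
    """Turn raw HTML text into a searchable BeautifulSoup tree."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url):
    """Download one results page and return it as parsed HTML."""
    # Assumed header: some sites reject requests without a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_page(html)


# Example call (makes a real network request, so it is left commented out):
# soup = fetch_soup("https://www.indeed.com/jobs?q=data+scientist&l=New+York")
# print(soup.prettify())
```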


Once you run the code for that page, the output is the page's HTML.

At this point, you've gathered all the page details in the variable soup. To capture the necessary data, you have to dig further into its tags and subtags.

Getting Fundamental Data Elements

There are 5 basic elements in each job posting.

  • Job Title
  • Company Name
  • Location
  • Salary
  • Job Summary

For example, if the results page you explore shows 24 job posts, your code should generate 24 different items. If the code gives fewer, go back to the reference page and check what's wrong.
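This count check is easy to automate. The helper below is a hypothetical sketch, not part of any library: it takes the expected number of posts and the lists you extracted, and reports which lists came up short or long.

```python
def check_counts(expected, **fields):
    """Return the names of the fields whose item count differs from expected."""
    return [name for name, items in fields.items() if len(items) != expected]


mismatched = check_counts(
    3,
    titles=["Data Scientist", "Data Analyst", "ML Engineer"],
    companies=["Acme Corp", "Globex"],  # one company is missing
)
print(mismatched)  # ['companies']
```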

Getting a Job Title from Indeed

You can see that each job post sits entirely within a <div> tag with the attribute class="row result".

Additionally, you can observe that the job titles are in <a> tags with a title attribute that holds the title text. You can read the value of a tag's attribute with tag["attribute"], which gives you the job title for each and every job posting.

Once you understand the above, capturing the job titles comes down to 3 steps.

  • Capture all the <div> tags whose class includes "row".
  • Find the <a> tags that have the attribute "data-tn-element":"jobTitle".
  • For each of these <a> tags, read the value of the "title" attribute.
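The three steps can be sketched as follows, assuming the beautifulsoup4 package. SAMPLE_HTML is hypothetical markup written to mirror the structure described here, not a copy of a real Indeed page, and extract_job_titles is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mirrors the structure described in the text.
SAMPLE_HTML = """
<div class="row result">
  <a data-tn-element="jobTitle" title="Data Scientist" href="#">Data Scientist</a>
</div>
<div class="row result">
  <a data-tn-element="jobTitle" title="Data Analyst" href="#">Data Analyst</a>
</div>
"""


def extract_job_titles(soup):
    titles = []
    # Step 1: every <div> whose class list includes "row".
    for div in soup.find_all("div", attrs={"class": "row"}):
        # Step 2: the <a> tags marked as job titles.
        for a in div.find_all("a", attrs={"data-tn-element": "jobTitle"}):
            # Step 3: read the value of the title attribute.
            titles.append(a["title"])
    return titles


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(extract_job_titles(soup))  # ['Data Scientist', 'Data Analyst']
```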

Getting Company Name on Indeed Job Postings

Grabbing company names is a bit trickier. Most of them appear in <span> tags with class company, but some are instead slotted in class result-link-source.

To extract the company info from both places, you are going to use if/else statements. In addition, to delete the white space around the name of the company, call .strip() on the extracted text at the end of the code.
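Here is a minimal sketch of that if/else logic, assuming the beautifulsoup4 package. The sample markup is hypothetical, and appending None when neither class is present is a choice made here to keep one entry per job post.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one post uses class "company", the other the fallback class.
SAMPLE_HTML = """
<div class="row result">
  <span class="company">
    Acme Corp
  </span>
</div>
<div class="row result">
  <span class="result-link-source">  Globex  </span>
</div>
"""


def extract_companies(soup):
    companies = []
    for div in soup.find_all("div", attrs={"class": "row"}):
        company = div.find("span", attrs={"class": "company"})
        if company is not None:
            companies.append(company.get_text().strip())
        else:
            # Fall back to the second place a company name can appear.
            fallback = div.find(attrs={"class": "result-link-source"})
            # None keeps the list aligned with the job posts (an assumption).
            companies.append(fallback.get_text().strip() if fallback is not None else None)
    return companies


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(extract_companies(soup))  # ['Acme Corp', 'Globex']
```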


Getting Location

Job locations are placed in <span> tags. These tags are at times nested within each other: the location text could sit directly in a tag with the attribute class="location", or in a nested tag with "itemprop":"addressLocality". However, a simple loop can retrieve the necessary data for you.
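A simple loop of this kind might look like the sketch below, again assuming beautifulsoup4 and hypothetical sample markup. Because get_text() also collects text from nested tags, the same loop covers both the plain and the nested case.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one plain location and one nested inside a second <span>.
SAMPLE_HTML = """
<div class="row result">
  <span class="location">New York, NY</span>
</div>
<div class="row result">
  <span class="location"><span itemprop="addressLocality">Brooklyn, NY</span></span>
</div>
"""


def extract_locations(soup):
    locations = []
    for span in soup.find_all("span", attrs={"class": "location"}):
        # get_text() includes the text of any nested tags.
        locations.append(span.get_text().strip())
    return locations


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(extract_locations(soup))  # ['New York, NY', 'Brooklyn, NY']
```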


Getting Salary

Extracting salary data is one of the most challenging tasks for job postings on Indeed, because most companies don't include a salary range in the job post. Only a few companies share it. Salary data can also appear in several places on the job portal, so your Python code has to check each of them. If a company doesn't provide salary info, use the placeholder Nothing Found for that job post.
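The sketch below shows the general shape of such code, assuming beautifulsoup4. The two places it checks (a <nobr> tag and a <span> with class salary) are assumptions chosen for illustration; inspect the live page to find where salaries actually appear.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two assumed salary locations and one post with no salary.
SAMPLE_HTML = """
<div class="row result"><nobr>$90,000 a year</nobr></div>
<div class="row result"><span class="salary">$45 an hour</span></div>
<div class="row result"><a title="Intern" href="#">Intern</a></div>
"""


def extract_salaries(soup):
    salaries = []
    for div in soup.find_all("div", attrs={"class": "row"}):
        # First assumed location, then the second if the first is absent.
        node = div.find("nobr")
        if node is None:
            node = div.find("span", attrs={"class": "salary"})
        # The placeholder keeps one entry per job post.
        salaries.append(node.get_text().strip() if node is not None else "Nothing Found")
    return salaries


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(extract_salaries(soup))  # ['$90,000 a year', '$45 an hour', 'Nothing Found']
```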


Getting Job Summary

This is the last part of the process, where you need to grab the job summary. It is not possible to find it for every position, because it is not available in the HTML code of the given Indeed page. To grab this data, you have to use Selenium.

Wrapping Up

Are you looking for a job but unable to find the right one? Data scraping using Python, which we have covered in this article, can help. IWeb Data Scraping is always happy to help you find the right job with its web scraping services.
