Data Collection by Web Scraping with Python

  • Why is Web Scraping Used?
  • What Is Web Scraping?
  • Why is Python Good For Web Scraping?
  • Libraries used for Web Scraping
  • Web Scraping Example: Scraping GitHub Search Website

Why is Web Scraping Used?

  • Price Comparison: Services such as ParseHub use web scraping to collect data from online shopping websites and use it to compare the prices of products.
  • Email address gathering: Many companies that use email as a marketing medium use web scraping to collect email addresses and then send bulk emails.
  • Social Media Scraping: Web scraping is used to collect data from Social Media websites such as Twitter to find out what’s trending.
  • Research and Development: Web scraping is used to collect large sets of data (statistics, general information, temperature, etc.) from websites, which are then analyzed and used to carry out surveys or for R&D.
  • Job listings: Details about job openings and interviews are collected from different websites and then listed in one place so that they are easily accessible to the user.

What is Web Scraping?


Why is Python Good for Web Scraping?

  • Ease of Use: Python is simple to write. Statements do not need semicolons “;” and blocks do not need curly braces “{}”, which keeps the code uncluttered and easy to work with.
  • Large Collection of Libraries: Python has a huge collection of libraries, such as NumPy, Matplotlib, and Pandas, which provide methods and services for various purposes. This makes it suitable both for web scraping and for further manipulation of the extracted data.
  • Dynamically Typed: In Python, you do not have to declare data types for variables; you can use a variable directly wherever it is required. This saves time when writing code.
  • Easily Understandable Syntax: Python syntax is easy to understand, mainly because reading Python code is similar to reading a statement in English. It is expressive and readable, and the indentation used in Python helps the reader distinguish between different scopes and blocks in the code.
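
To make the points about dynamic typing and indentation concrete, here is a small sketch. The function name describe and the sample prices are made up for illustration and are not part of any scraping library.

```python
# No type declarations: the same name can hold values of different types.
count = 3          # an int
count = "three"    # now a str; Python checks types at run time


def describe(prices):
    """Return a short summary of a list of prices (hypothetical helper)."""
    # Indentation alone marks the function body, the loop, and the branch.
    cheap = []
    for price in prices:
        if price < 100:
            cheap.append(price)
    return f"{len(cheap)} of {len(prices)} prices are under 100"


summary = describe([40, 250, 99])
print(summary)
```

There are no semicolons, braces, or type annotations here, yet the structure of the code stays clear from the indentation.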

Libraries used for Web Scraping

  • BeautifulSoup: Beautiful Soup is a Python package for parsing HTML and XML documents. It creates a parse tree that makes it easy to extract data.
  • Pandas: Pandas is a library used for data manipulation and analysis. It is used to organize the extracted data and store it in the desired format.
  • Requests: Requests is a widely used third-party Python library for making HTTP requests to a specified URL.
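
As a rough sketch of how these three libraries usually fit together, the snippet below parses a small HTML fragment with Beautiful Soup and loads the result into a Pandas DataFrame. The markup, the class names (product, name, price), and the helper names fetch_html and parse_products are all assumptions made up for this illustration; a real site will use different markup, and fetch_html is defined but not called so the example runs without a network connection.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Laptop A</span><span class="price">45000</span></li>
  <li class="product"><span class="name">Laptop B</span><span class="price">52000</span></li>
</ul>
"""


def fetch_html(url: str) -> str:
    """Download a page with Requests and return its HTML text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text


def parse_products(html: str) -> pd.DataFrame:
    """Extract product names and prices from HTML with Beautiful Soup."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="product"):
        rows.append({
            "name": item.find("span", class_="name").get_text(strip=True),
            "price": int(item.find("span", class_="price").get_text(strip=True)),
        })
    return pd.DataFrame(rows, columns=["name", "price"])


products = parse_products(SAMPLE_HTML)
print(products)
```

For a live page you would pass the result of fetch_html(url) to parse_products instead of the sample string, and a call such as products.to_csv("products.csv", index=False) would store the table in CSV format. Check a site's terms of use and robots.txt before scraping it.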

Web Scraping Example: Scraping Flipkart Website



