Hands on with Scraping and Plotting Data in Python

Recently, I needed to scrape some content from a website (Wikipedia, in this case) so that I could use the data elsewhere. What I did with that data is another story. While working on this, I came across a Python library called “BeautifulSoup”. There are other libraries as well, such as “Scrapy”, but for this task I chose BeautifulSoup. It is a good library for scraping content from a web page within a couple of minutes.

Introduction to BeautifulSoup

Beautiful Soup is a Python library for extracting data from HTML and XML files. It commonly saves programmers hours or days of work by providing Pythonic idioms for iterating over, searching, and modifying the parse tree.
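
To make the three idioms mentioned above concrete, here is a minimal sketch. The HTML snippet and the variable names are made up for illustration and are not taken from the Wikipedia page used later.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical HTML snippet used only for illustration.
html = """
<html><body>
  <h1 id="title">Fruit list</h1>
  <ul>
    <li class="item">Apple</li>
    <li class="item">Banana</li>
    <li class="item sold-out">Cherry</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching: find() returns the first match, find_all() returns every match.
title = soup.find("h1", id="title").get_text(strip=True)
items = [li.get_text(strip=True) for li in soup.find_all("li", class_="item")]

# Iterating: walk the direct children of <ul>, keeping only tags
# (the whitespace between tags shows up as text nodes).
ul = soup.find("ul")
child_tags = [child.name for child in ul.children if isinstance(child, Tag)]

# Modifying: change a tag's text and attributes in place.
sold_out = soup.find("li", class_="sold-out")
sold_out.string = "Cherry (sold out)"
sold_out["data-available"] = "false"

print(title)
print(items)
print(child_tags)
print(sold_out)
```

Working on a small inline string like this is a convenient way to try out a selector before pointing it at a live page.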

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Download the page and parse it.
URL = 'https://en.wikipedia.org/wiki/World_Happiness_Report#2019_report'
material = requests.get(URL)
soup = BeautifulSoup(material.content, 'html.parser')

# Collect every table whose class attribute is "wikitable sortable".
tables = soup.find_all('table', class_='wikitable sortable')

Screenshot: the class name of the table on the Wikipedia page.
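
One detail of the class lookup is worth a quick check: when the class_ string contains a space, Beautiful Soup compares it against the whole class attribute value. The sketch below uses three hypothetical tables (the ids a, b, c are invented for the example) to compare that lookup with a single-class search and with a CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three tables whose class attributes differ slightly.
html = """
<table id="a" class="wikitable sortable"></table>
<table id="b" class="sortable wikitable"></table>
<table id="c" class="wikitable sortable plainrowheaders"></table>
"""
soup = BeautifulSoup(html, "html.parser")

# A class_ string with a space must equal the full attribute value.
exact = [t["id"] for t in soup.find_all("table", class_="wikitable sortable")]

# A single class name matches any table that carries that class.
single = [t["id"] for t in soup.find_all("table", class_="wikitable")]

# A CSS selector requires both classes but ignores order and extra classes.
css = [t["id"] for t in soup.select("table.wikitable.sortable")]

print(exact, single, css)
```

If the page's markup gains an extra class or reorders the existing ones, the exact-string lookup stops matching, so the CSS selector may be the more forgiving choice.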

# Pick the table whose first two headings match the columns we want.
for table in tables:
    ths = table.find_all('th')
    headings = [th.text.strip() for th in ths]
    if headings[:2] == ['Country or region', 'Score']:
        break
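# Collect the first two cells of every data row, then build the DataFrame once.
# (DataFrame.append was removed in pandas 2.0.)
rows = []
for tr in table.find_all('tr'):
    tds = tr.find_all('td')
    if not tds:
        continue  # header rows contain only <th> cells
    country_name, score = [td.text.strip() for td in tds[:2]]
    rows.append({'Country or region': country_name, 'Score': score})
df = pd.DataFrame(rows, columns=['Country or region', 'Score'])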
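# pyplot is imported under the name "plot" to match the calls below.
import matplotlib.pyplot as plot

# Scores are scraped as strings, so convert them before plotting.
df['Score'] = df['Score'].astype(float)
df.plot.bar(x='Country or region', y='Score', rot=90, title="Distribution of Happiness score per Country", legend=False)
plot.ylabel("Happiness Score")
plot.xlabel("Countries")
plot.show()
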
Plot of Happiness Score vs Countries

Putting it all together, the complete script:

import matplotlib.pyplot as plot
import pandas as pd
import requests
from bs4 import BeautifulSoup

URL = 'https://en.wikipedia.org/wiki/World_Happiness_Report#2019_report'
material = requests.get(URL)
soup = BeautifulSoup(material.content, 'html.parser')
tables = soup.find_all('table', class_='wikitable sortable')

# Pick the table whose first two headings match the columns we want.
for table in tables:
    ths = table.find_all('th')
    headings = [th.text.strip() for th in ths]
    if headings[:2] == ['Country or region', 'Score']:
        break

# Collect the rows in a list (DataFrame.append was removed in pandas 2.0).
rows = []
for tr in table.find_all('tr'):
    tds = tr.find_all('td')
    if not tds:
        continue  # header rows contain only <th> cells
    country_name, score = [td.text.strip() for td in tds[:2]]
    rows.append({'Country or region': country_name, 'Score': score})
df = pd.DataFrame(rows, columns=['Country or region', 'Score'])

df['Score'] = df['Score'].astype(float)
df.plot.bar(x='Country or region', y='Score', rot=90, title="Distribution of Happiness score per Country", legend=False)
plot.ylabel("Happiness Score")
plot.xlabel("Countries")
plot.show()
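
Since a live page can change over time, it can help to try the extraction steps offline first. The following is only a sketch under stated assumptions: SAMPLE_HTML is a made-up stand-in for the Wikipedia page with placeholder values, and find_table and table_to_frame are hypothetical helper names, not part of Beautiful Soup or pandas.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; names and values are placeholders.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Year</th><th>Edition</th></tr>
  <tr><td>2019</td><td>7th</td></tr>
</table>
<table class="wikitable sortable">
  <tr><th>Country or region</th><th>Score</th><th>Notes</th></tr>
  <tr><td>Country A</td><td>7.5</td><td></td></tr>
  <tr><td>Country B</td><td>6.25</td><td></td></tr>
  <tr><td>Country C</td><td>n/a</td><td></td></tr>
</table>
"""

WANTED = ["Country or region", "Score"]


def find_table(soup, wanted):
    """Return the first table whose leading headings equal `wanted`, else None."""
    for table in soup.find_all("table", class_="wikitable sortable"):
        headings = [th.get_text(strip=True) for th in table.find_all("th")]
        if headings[: len(wanted)] == wanted:
            return table
    return None


def table_to_frame(table, columns):
    """Collect the first len(columns) cells of every data row into a DataFrame."""
    rows = []
    for tr in table.find_all("tr"):
        tds = tr.find_all("td")
        if len(tds) < len(columns):
            continue  # header rows and short rows have too few <td> cells
        rows.append([td.get_text(strip=True) for td in tds[: len(columns)]])
    return pd.DataFrame(rows, columns=columns)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_table(soup, WANTED)
df = table_to_frame(table, WANTED)

# errors="coerce" turns unparseable scores into NaN instead of raising.
df["Score"] = pd.to_numeric(df["Score"], errors="coerce")
df = df.dropna(subset=["Score"]).reset_index(drop=True)
print(df)
```

Returning None when no table matches makes a missing table explicit, whereas a plain for loop with break silently leaves the last table selected. Using pd.to_numeric with errors="coerce" is a more tolerant alternative to astype(float) when a cell might not hold a number.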

Python, DevOps, Cryptography, Infra-Structure, Automation. https://syedsaadahmed.com/