SerpApi Demo Project: Walmart Coffee Exploratory Data Analysis

Intro

This Python demo project shows how to use SerpApi's Walmart Search Engine Results API and how the extracted data can be used for exploratory data analysis.

SerpApi is an API that lets you receive structured JSON data from 26 search engines in 10 languages.

The project is available on GitHub if you want to explore it on your own: dimitryzub/walmart-stores-coffee-analysis

We'll cover:

  1. Extracting data from Walmart organic results.
  2. Extracting data from all store pages using SerpApi pagination.
  3. Extracting data from 500 Walmart stores. SerpApi provides Walmart Stores Locations as JSON, with 4,460 stores in total.
  4. My full exploratory data analysis process.

📌Note: the following script extracts data synchronously, which is slower.

SerpApi provides an async parameter that lets you submit many more searches without waiting for each SerpApi response. The searches are processed on the SerpApi backend, and their results can be retrieved for up to 31 days.

If you're interested in extracting large amounts of data (10,000+ requests) in under 5 hours, sign up for our blog so you don't miss it.

Extracting Walmart Data

Explanation

First, we need to install the libraries:

$ pip install pandas matplotlib seaborn jinja2 google-search-results
Library | Purpose
pandas | Powerful data analysis and manipulation tool.
matplotlib | A comprehensive Python library for visualizing data.
seaborn | A high-level interface built on top of matplotlib for drawing attractive and informative statistical graphics.
jinja2 | Python template engine. Used for styling pandas tables.
google-search-results | SerpApi Python API wrapper.

Next, we need to import the libraries:

from serpapi import WalmartSearch
from urllib.parse import (parse_qsl, urlsplit)
import pandas as pd
import os, re, json
Library | Purpose
urllib.parse | For SerpApi pagination purposes.
os | To read the secret API key from an environment variable. Don't expose your API key publicly.
re | To extract parts of the data from a string.
json | Mostly for pretty-printing.

Next, we read the Walmart stores JSON file:

# to get store ID
store_ids = pd.read_json('<path_to>/walmart-stores.json')

Next, we create a constant list of coffee types. It will be used later to detect the coffee type from a listing's title:

COFFEE_TYPES = [
    'Espresso',
    '...',
    'Black Ivory Coffee'
]
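To see how this list is used, here is a minimal sketch of the substring check that the extraction loop performs, wrapped in a hypothetical detect_coffee_types() helper and run on a small, made-up subset of COFFEE_TYPES. It also shows a side effect of plain substring matching: overlapping entries such as 'Espresso' and 'Espresso roast' both match the same title, so one listing can count toward several types.

```python
# Small illustrative subset; the real COFFEE_TYPES list is much longer.
SAMPLE_COFFEE_TYPES = ['Espresso', 'Espresso roast', 'Dark Roast', 'Medium Roast', 'Arabica']


def detect_coffee_types(title, coffee_types):
    """Hypothetical helper: return every type whose lowercase name appears in the title, comma-separated."""
    title = title.lower()
    return ','.join(t.lower() for t in coffee_types if t.lower() in title)


print(detect_coffee_types('death wish coffee, organic, fair-trade, espresso roast ground, 14oz, bag', SAMPLE_COFFEE_TYPES))
# espresso,espresso roast
```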

After that, we iterate over store_ids and pass each store ID to the store_id SerpApi query parameter:

for store_id in store_ids['store_id']:
    params = {
        'api_key': os.getenv('API_KEY'),  # serpapi api key
        'engine': 'walmart',              # search engine
        'device': 'desktop',              # device type
        'query': 'coffee',                # search query
        'store_id': store_id
    }

    search = WalmartSearch(params)       # where data extraction happens

In the next step, we create a list to temporarily store the extracted data (create it once, before the store loop, so listings from every store are kept) and a page number counter, which is for demonstration purposes only:

data = []
page_num = 0

The next step is to create a while loop that iterates over all of the store's result pages:

while True:
    results = search.get_json()      # JSON -> Python dict

After that, we check whether SerpApi returned an error or an empty results page, and break if so:

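# stop on an API error (e.g. an invalid key, where 'search_information' may be missing) or when no results are returned
if 'error' in results or results.get('search_information', {}).get('organic_results_state') == 'Fully empty':
    print(results.get('error'))
    break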

Increment a page counter and print it:

page_num += 1
print(f'Current page: {page_num}')

After that, we need to extract data and standardize it as much as possible:

for result in results.get('organic_results', []):
    title = (result.get('title') or '').lower()  # '' if the listing has no title

    # https://regex101.com/r/h0jTPG/1
    weight_match = re.search(r'(\d+-[Ounce|ounce]+|\d{1,3}\.?\d{1,2}[\s|-]?[ounce|oz]+)', title)
    weight = weight_match.group(1) if weight_match else 'not mentioned'

    # ounce -> grams (1 oz = 28.34952 g)
    # formula: https://www.omnicalculator.com/conversion/g-to-oz#how-to-convert-grams-to-ounces
    # https://regex101.com/r/wWMUQd/1
    if weight != 'not mentioned':
        weight_formatted_to_gramms = round(float(re.sub(r'[^\d.]+', '', weight)) * 28.34952, 1)
    else:
        weight_formatted_to_gramms = None  # don't reuse the previous listing's weight

    # loop through COFFEE_TYPES and check if the title contains each coffee type
    # matches are joined into one string with a ',' separator
    coffee_type = ','.join([i for i in COFFEE_TYPES if i.lower() in title])
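A note on the weight regex: inside square brackets, [ounce|oz]+ is a character class that matches any run of the characters o, u, n, c, e, z and |, not the words "ounce" or "oz". As a result, a title ending in "22 ct" matches as "22 c" and would be read as 22 ounces. Below is a sketch of an alternative, assumed pattern that requires the whole unit word, wrapped in a hypothetical weight_in_grams() helper. It is not the pattern that produced the dataset analyzed below.

```python
import re

OUNCE_TO_GRAMS = 28.34952  # grams per ounce, same factor as above

# Assumed alternative pattern: a number, optional spaces/hyphen, then "oz", "ounce" or "ounces" as a whole word.
WEIGHT_PATTERN = re.compile(r'(\d+(?:\.\d+)?)\s*-?\s*(?:ounces?|oz)\b')


def weight_in_grams(title):
    """Return the first ounce weight found in the title, converted to grams, or None if there is none."""
    match = WEIGHT_PATTERN.search(title.lower())
    if match is None:
        return None
    return round(float(match.group(1)) * OUNCE_TO_GRAMS, 1)


print(weight_in_grams('folgers classic roast ground coffee, 40.3-ounce'))  # 1142.5
print(weight_in_grams('black rifle coffee just black single-serve pods, medium roast, 22 ct'))  # None
```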

After that, we append the data to the temporary list:

data.append({
    'title': title,
    'coffee_type': coffee_type.lower(),
    'rating': result.get('rating'),
    'reviews': result.get('reviews'),
    'seller_name': result.get('seller_name').lower() if result.get('seller_name') else result.get('seller_name'),
    'thumbnail': result.get('thumbnail'),
    'price': (result.get('primary_offer') or {}).get('offer_price'),  # None if the listing has no offer
    'weight': weight,
    'weight_formatted_to_gramms': weight_formatted_to_gramms
})

Then we check whether the serpapi_pagination key contains a next page. If it does, we paginate by updating the search parameters; otherwise, we break out of the while loop:

# check if the next page key is present in the JSON
# if present -> split URL in parts and update to the next page
if 'next' in results.get('serpapi_pagination', {}):
    search.params_dict.update(dict(parse_qsl(urlsplit(results.get('serpapi_pagination', {}).get('next')).query)))
else:
    break
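The pagination step works because the next URL already carries the query parameters for the following page. A small offline sketch with a made-up next URL (the store_id value and parameter set are illustrative assumptions; the URL SerpApi actually returns may differ) shows what dict(parse_qsl(urlsplit(...).query)) produces and how update() merges it into the existing parameters. All parsed values are strings, and keys the URL doesn't carry stay untouched.

```python
from urllib.parse import parse_qsl, urlsplit

# Made-up example of results['serpapi_pagination']['next']; real URLs may differ.
next_url = 'https://serpapi.com/search.json?device=desktop&engine=walmart&page=2&query=coffee&store_id=1234'

# Split the URL, keep only the query string, and turn its key/value pairs into a dict.
next_params = dict(parse_qsl(urlsplit(next_url).query))
print(next_params)
# {'device': 'desktop', 'engine': 'walmart', 'page': '2', 'query': 'coffee', 'store_id': '1234'}

# The loop applies the same update() to search.params_dict; a plain dict stands in for it here.
params_dict = {'engine': 'walmart', 'query': 'coffee', 'store_id': '1234'}
params_dict.update(next_params)
```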

Finally, after all the looping is done, we need to save the data. I chose JSON and CSV, using the pandas to_json() and to_csv() methods:

pd.DataFrame(data=data).to_json('coffee-listings-from-all-walmart-stores.json', orient='records')
pd.DataFrame(data=data).to_csv('coffee-listings-from-all-walmart-stores.csv', index=False)

Full code

There's also a code example in the online IDE (Replit) so you can play around:

from serpapi import WalmartSearch
from urllib.parse import (parse_qsl, urlsplit)
import pandas as pd
import os, re, json


store_ids = pd.read_json('<path_to>/walmart-stores.json')

COFFEE_TYPES = [
    'Espresso',
    'Double Espresso',
    'Red Eye',
    'caramel flavored',
    'caramel',
    'colombian',
    'french',
    'italian',
    'Black Eye',
    'Americano',
    'Long Black',
    'Macchiato',
    'Long Macchiato',
    'Cortado',
    'Breve',
    'Cappuccino',
    'Flat White',
    'Cafe Latte',
    'Mocha',
    'Vienna',
    'Affogato',
    'gourmet coffee',
    'Cafe au Lait',
    'Iced Coffee',
    'Drip Coffee',
    'French Press',
    'Cold Brew Coffee',
    'Pour Over Coffee',
    'Cowboy Coffee',
    'Turkish Coffee',
    'Percolated Coffee',
    'Infused Coffee',
    'Vacuum Coffee',
    'Moka Pot Coffee',
    'Espresso Coffee',
    'Antoccino',
    'Cafe Bombon',
    'Latte',
    'City roast',
    'American roast',
    'Half City roast',
    'New England roast',
    'Light City roast',
    'Blonde roast',
    'Cinnamon roast',
    'Breakfast roast',
    'Full City roast',
    'Continental roast',
    'High roast',
    'New Orleans roast',
    'Espresso roast',
    'Viennese roast',
    'European roast',
    'French roast',
    'Italian roast',
    'Galao',
    'Caffe Americano',
    'Cafe Cubano',
    'Cafe Zorro',
    'Doppio',
    'Espresso Romano',
    'Guillermo',
    'Ristretto',
    'Cafe au lait ',
    'Coffee with Espresso',
    'Dead eye',
    'Botz',
    'Nitro Coffee',
    'Bulletproof Coffee',
    'Black tie',
    'Red tie',
    'Dirty chai latte',
    'Yuenyeung',
    "Bailey's Irish Cream and Coffee",
    'Caffè Corretto',
    'Rüdesheimer Kaffee',
    'Pharisee',
    'Barraquito',
    'Carajillo',
    'Irish coffee',
    'Melya',
    'Espressino',
    'Caffè Marocchino',
    'Café miel',
    'Cafe Borgia',
    'Café de olla',
    'Café Rápido y Sucio',
    'Coffee with a flavor shot',
    'Caffè Medici',
    'Egg coffee',
    'Kopi susu',
    'Vienna Coffee',
    'Iced lattes',
    'Iced mochas',
    'Ca phe sua da',
    'Eiskaffee',
    'Frappé',
    'Freddo Espresso',
    'Freddo Cappuccino',
    'Mazagran',
    'Palazzo',
    'Ice Shot',
    'Shakerato',
    'Instant Coffee',
    'Canned Coffee',
    'Coffee Milk',
    'South Indian Coffee',
    'Pocillo',
    'Arabica',
    'Robusta Beans',
    'Liberica Beans',
    'Excelsa Beans',
    'White Roast Coffee',
    'Light Roast',
    'Medium Roast',
    'Medium-Dark Roast',
    'medium dark',
    'medium dark roast',
    'Classic Roast',
    'black silk ground coffee',
    'black rifle coffee',
    'Dark Roast',
    'Kopi Luwak Coffee',
    'Jacu Bird Coffee',
    'Black Ivory Coffee'
]

data = []                                 # collects listings from all stores

for store_id in store_ids['store_id']:
    params = {
        'api_key': os.getenv('API_KEY'),  # serpapi api key
        'engine': 'walmart',              # search engine
        'device': 'desktop',              # device type
        'query': 'coffee',                # search query
        'store_id': store_id
    }

    search = WalmartSearch(params)       # where data extraction happens
    print(f'Current store: {store_id}')  # show progress without printing the API key

    page_num = 0

    while True:
        results = search.get_json()      # JSON -> Python dict

        # stop on an API error or when no results are returned
        if 'error' in results or results.get('search_information', {}).get('organic_results_state') == 'Fully empty':
            print(results.get('error'))
            break

        page_num += 1
        print(f'Current page: {page_num}')

        for result in results.get('organic_results', []):
            title = (result.get('title') or '').lower()  # '' if the listing has no title

            # https://regex101.com/r/h0jTPG/1
            weight_match = re.search(r'(\d+-[Ounce|ounce]+|\d{1,3}\.?\d{1,2}[\s|-]?[ounce|oz]+)', title)
            weight = weight_match.group(1) if weight_match else 'not mentioned'

            # ounce -> grams (1 oz = 28.34952 g)
            # formula: https://www.omnicalculator.com/conversion/g-to-oz#how-to-convert-grams-to-ounces
            # https://regex101.com/r/wWMUQd/1
            if weight != 'not mentioned':
                weight_formatted_to_gramms = round(float(re.sub(r'[^\d.]+', '', weight)) * 28.34952, 1)
            else:
                weight_formatted_to_gramms = None  # don't reuse the previous listing's weight

            coffee_type = ','.join([i for i in COFFEE_TYPES if i.lower() in title])

            data.append({
                'title': title,
                'coffee_type': coffee_type.lower(),
                'rating': result.get('rating'),
                'reviews': result.get('reviews'),
                'seller_name': result.get('seller_name').lower() if result.get('seller_name') else result.get('seller_name'),
                'thumbnail': result.get('thumbnail'),
                'price': (result.get('primary_offer') or {}).get('offer_price'),  # None if no offer
                'weight': weight,
                'weight_formatted_to_gramms': weight_formatted_to_gramms
            })

        # check if the next page key is present in the JSON
        # if present -> split URL in parts and update to the next page
        if 'next' in results.get('serpapi_pagination', {}):
            search.params_dict.update(dict(parse_qsl(urlsplit(results.get('serpapi_pagination', {}).get('next')).query)))
        else:
            break

pd.DataFrame(data=data).to_json('coffee-listings-from-all-walmart-stores.json', orient='records')
pd.DataFrame(data=data).to_csv('coffee-listings-from-all-walmart-stores.csv', index=False)

Exploratory Analysis

Key Takeaways

  1. The most popular coffee seller is Walmart.
  2. The most popular coffee type is medium roast.
  3. A higher weight (grams) doesn't necessarily mean a higher price.
    • A lighter coffee may cost more than a heavier one.
  4. The highest coffee weight is 2835 grams (2.8 kg).
  5. "Folgers classic roast ground coffee" has 15k+ reviews, the most in the dataset.
  6. ~300-500 grams is the most frequent weight.
  7. The highest coffee price is $77 (Lavazza perfetto single-serve k-cup).

Questions to Answer

I wrote these questions at the beginning of the exploration to track my progress. Keep in mind that they only reflect my personal interest.

  • What coffee title has the most reviews?
  • What coffee title has the highest rating?
  • What is the most popular seller?
  • What coffee title has the highest/lowest price?
  • What is the total weight in grams?
  • What coffee title has the highest/lowest weight (grams)?
  • What is the most popular coffee type?
  • What is the most frequent coffee weight (grams)?
  • Higher weight (grams) = bigger price?
  • Lower weight (grams) = lower price?

Process

Install the libraries and use the %matplotlib inline magic command, which sets matplotlib's backend to inline so plots are rendered inside the notebook:

%pip install pandas matplotlib seaborn jinja2 # install libraries
%matplotlib inline

Next, we import pandas and use read_csv() to load the CSV file extracted earlier with SerpApi's Walmart Search Engine Results API:

import pandas as pd

coffee_df = pd.read_csv('/workspace/serpapi-demo-projects/walmart-coffee-analysis/data/coffee-listings-from-all-walmart-stores.csv')
coffee_df.head()
  | title | coffee_type | rating | reviews | seller_name | thumbnail | price | weight | weight_formatted_to_gramms
0 | folgers classic roast ground coffee, 40.3-ounce | classic roast | 3.8 | 93 | walmart.com | i5.walmartimages.com/asr/1fbbd523-8554... | 13.92 | 40.3-ounce | 1142.5
1 | café bustelo, espresso style dark roast ground... | espresso,dark roast | 4.7 | 914 | walmart.com | i5.walmartimages.com/asr/99a53df0-0471... | 3.76 | 10 oz | 283.5
2 | folgers classic roast ground coffee, medium ro... | medium roast,classic roast | 4.4 | 740 | walmart.com | i5.walmartimages.com/asr/e6aba325-608e... | 9.97 | 25.9 ounce | 734.3
3 | maxwell house original roast ground coffee, 42... | NaN | 4.8 | 1321 | walmart.com | i5.walmartimages.com/asr/a5be9586-b75d... | 9.92 | 42.5 oz | 1204.9
4 | great value classic roast medium ground coffee... | classic roast | 4.7 | 1598 | walmart.com | i5.walmartimages.com/asr/de42310c-4cd6... | 9.98 | 48 oz | 1360.8

Next, I checked the overall DataFrame info, mainly to see whether each column's Dtype is correct:

coffee_df.info()

Outputs:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1400 entries, 0 to 1399
Data columns (total 9 columns):
  #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
  0   title                       1400 non-null   object 
  1   coffee_type                 1121 non-null   object 
  2   rating                      1400 non-null   float64
  3   reviews                     1400 non-null   int64  
  4   seller_name                 1400 non-null   object 
  5   thumbnail                   1400 non-null   object 
  6   price                       1400 non-null   float64
  7   weight                      1400 non-null   object 
  8   weight_formatted_to_gramms  1400 non-null   float64
dtypes: float64(3), int64(1), object(5)
memory usage: 98.6+ KB

Next, I looked at the numeric summary statistics, such as the mean, min, and max values, to check whether they make sense:

coffee_df.describe()
  | rating | reviews | price | weight_formatted_to_gramms
count | 1400.000000 | 1400.000000 | 1400.000000 | 1400.000000
mean | 3.982643 | 440.853571 | 14.041343 | 621.391786
std | 1.518037 | 879.351997 | 10.257832 | 369.564693
min | 0.000000 | 0.000000 | 0.000000 | 0.000000
25% | 4.300000 | 16.000000 | 7.950000 | 340.200000
50% | 4.600000 | 136.000000 | 12.735000 | 567.000000
75% | 4.800000 | 604.500000 | 16.990000 | 850.500000
max | 5.000000 | 15148.000000 | 77.090000 | 2835.000000
coffee_df.shape

# (1400, 9)

Here I took a quick look at the correlation between the numeric columns using DataFrame.corr():

import seaborn as sns
import matplotlib.pyplot as plt  # pyplot provides plt.title(), plt.xlabel(), plt.show() used below

ax = sns.heatmap(coffee_df.corr(numeric_only=True), annot=True, cmap='Blues')
ax.set_title('Correlation between variables')

Here we can see a very slight correlation between the coffee's weight and its price, which is pretty logical:

[Figure: heatmap titled "Correlation between variables"]
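To put a number on "very slight", you could compare Pearson correlation (linear, sensitive to outliers such as the $77 listing) with Spearman rank correlation. The sketch below uses only the five rows shown by coffee_df.head() above, so its numbers say nothing about the full dataset; on the real data you would use coffee_df instead of sample. Spearman is computed by hand as the Pearson correlation of the ranks.

```python
import pandas as pd

# The five rows shown by coffee_df.head() above (price in USD, weight in grams).
sample = pd.DataFrame({
    'price': [13.92, 3.76, 9.97, 9.92, 9.98],
    'weight_formatted_to_gramms': [1142.5, 283.5, 734.3, 1204.9, 1360.8],
})

# Pearson: strength of a linear relationship.
pearson = sample['price'].corr(sample['weight_formatted_to_gramms'])

# Spearman: Pearson correlation of the ranks, so only the ordering of values matters.
spearman = sample['price'].rank().corr(sample['weight_formatted_to_gramms'].rank())

print(f'Pearson: {pearson:.2f}, Spearman: {spearman:.2f}')
```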

What coffee title has the most reviews?

Answer: folgers classic roast ground coffee, medium ro... / 15148 reviews

What coffee title has the highest rating?

Answer: community coffee café special decaf medium-dark roast coffee single-serve cups 36 ct box compatible with keurig 2.0 k-cup brewers / rating of 5 with 108 reviews, the most reviews among listings rated 5

coffee_df.query('reviews == reviews.max()')[['title', 'reviews']]
title reviews
46 folgers classic roast ground coffee, medium ro... 15148
coffee_df.query('rating == rating.max()')[['title', 'rating', 'reviews']].sort_values(
    by='reviews', ascending=False
).style.hide(axis='index').background_gradient(cmap='Blues')

title rating reviews
community coffee café special decaf medium-dark roast coffee single-serve cups 36 ct box compatible with keurig 2.0 k-cup brewers 5.000000 108
cameron's coffee jamaican me crazy ground coffee, light roast, 12 oz 5.000000 29
café bustelo ground coffee, dark roast, 6-ounce brick 5.000000 20
vispak zlatna bosnian coffee, 35.2 oz 5.000000 11
black rifle ready-to-drink coffee, espresso with cream, 11oz, can 5.000000 8
eldorado espresso brick 9 oz 5.000000 7
black rifle ready-to-drink coffee, espresso mocha, 11oz, can 5.000000 6
death wish coffee, organic, fair trade, pumpkin chai ground coffee, 12 oz, bag 5.000000 4
death wish coffee, organic, fair-trade, espresso roast ground, 14oz, bag 5.000000 4
kauai coffee na pali coast k-cup coffee pods, dark roast, 24 ct 5.000000 4
papanicholas coffee hawaiian island blend whole bean 2lb bag 5.000000 3
yaucono ground coffee 10 oz can 5.000000 3
verena street nine mile sunset ground coffee, dark roast, 32 ounces 5.000000 3
folgers 100% colombian coffee, medium roast ground coffee, 22.6 ounce canister 5.000000 3
verena street cow tipper flavored ground coffee, medium roast, 12 ounces 5.000000 2
black rifle coffee just black single-serve pods, medium roast, 22 ct 5.000000 2
folgers gourmet supreme ground coffee, 22.6 ounce canisters 5.000000 2
seattle's best premeasured coffee packs, signature-level 3, 2 oz packet, 18/box 5.000000 2
ruta maya organic coffee dark roast 2.2 lbs. 5.000000 2
starbucks stainless with coffee core everyday gift 5.000000 2
cafe el morro espresso dark roast caffeinated ground coffee, 8.8 oz 5.000000 2
café la carreta cuban espresso coffee 10 oz brick 5.000000 2
folgers half caff ground coffee, medium roast, 22.6-ounce 5.000000 2
gold coffee company 100% arabica morning blend ground coffee, dark roast, 10 oz 5.000000 2
postum coffee alternative roasted wheat 5.000000 2
papanicholas coffee cinnamon hazelnut ground 12oz bag 5.000000 2
folgers breakfast blend ground coffee, smooth & mild coffee, 33.7 ounce canister 5.000000 2
hawaii coffee lion coffee, 10 oz 5.000000 1
arabica ground 100% coffee 126 pk 5.000000 1
papanicholas italian expresso 24ct 5.000000 1
the coffee bean & tea leaf house blend light roast single serve coffee for keurig brewers, 1 box of 16 (16 total pods) 5.000000 1
kaladi brothers coffee coffee wb red goat 16 oz 5.000000 1
black rifle coffee spirit of '76, medium roast, ground coffee, 12 oz 5.000000 1
ruta maya organic coffee medium roast 2.2 pounds 5.000000 1
lion coffee, vanilla macadamia flavor light roast - ground coffee, 24 ounce bag 5.000000 1
dunkin' 10 ounce cold roast & ground coffee each 5.000000 1
arabica ground 100% coffee 126 pk 5.000000 1
arabica ground 100% coffee 126 pk 5.000000 1
cafe aroma espresso ground coffee, dark roast caffeinated, 8.8 oz 5.000000 1
black rifle coffee spirit of '76 single-serve pods, medium roast, 22 ct 5.000000 1
arabica ground 100% coffee 126 pk 5.000000 1
arabica ground 100% coffee 126 pk 5.000000 1
arabica ground 100% coffee 126 pk 5.000000 1
stumptown coffee roasters organic blend whole bean coffee, dark roast, 12 oz 5.000000 1
black rifle coffee just black single-serve pods, medium roast, 44 ct 5.000000 1

What is the most popular seller?

Answer: Walmart

# value_counts() counts how many times each seller appears
# head(10) keeps only the top 10 sellers
# sort_values() sorts ascending, so the largest bar ends up at the top of the barh plot
# and plot the data

ax = coffee_df['seller_name'].value_counts().head(10).sort_values().plot(kind='barh', figsize=(13,5))
ax.bar_label(ax.containers[0])

# set labels on the plot's own axes (plot(figsize=...) creates a new figure)
ax.set_title('Most Popular Coffee Seller on Walmart')
ax.set_xlabel('Number of listings')
plt.show()

[Figure: horizontal bar chart titled "Most Popular Coffee Seller on Walmart"]
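Raw listing counts can be easier to interpret next to each seller's share of all listings. A minimal sketch on a toy seller column (the names other than walmart.com are made up); value_counts(normalize=True) returns proportions instead of counts:

```python
import pandas as pd

# Toy seller column; on the real data use coffee_df['seller_name'].
sellers = pd.Series(['walmart.com', 'walmart.com', 'walmart.com', 'seller a', 'seller b'])

# normalize=True -> proportions; multiply by 100 for percentages.
share = sellers.value_counts(normalize=True).mul(100).round(1)
print(share)
```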

What is the total weight in grams?

Answer: 869,948.5 grams, which is about 870 kilograms 🤓

# sum of all the grams
grams = coffee_df['weight_formatted_to_gramms'].sum()
kilograms = round(grams / 1000)

print(f'Grams: {grams}\nKilograms: {kilograms}')
Grams: 869948.5
Kilograms: 870

What coffee title has the highest/lowest price?

Answer highest: lavazza perfetto single-serve k-cup® pods for keurig brewer, dark roast, 10-ct boxes (pack of 6) / price $77.09

Answer lowest:

  • folgers classic roast instant coffee, single serve packets / price $1
  • folgers classic decaf instant coffee crystals packets, 6 count / price $1
Highest price
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html
coffee_df.query('price == price.max()')[['title', 'price']]
title price
1275 lavazza perfetto single-serve k-cup® pods for ... 77.09
Lowest price
# condition with .min() won't work: coffee_df.loc[(coffee_df['price'] != 0) & (coffee_df['price'].min()), ['price']]

# != 0 to exclude 0 values
coffee_df.loc[(coffee_df['price'] != 0) & (coffee_df['price'] < 1.1), ['title', 'price']]
title price
1109 folgers classic roast instant coffee, single s... 1.0
1304 folgers classic decaf instant coffee crystals ... 1.0
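The commented-out attempt fails because .min() returns a single number rather than a boolean mask, so it can't serve as a row filter next to &. An alternative to the hard-coded < 1.1 threshold is to drop the 0 prices first and then compare against the minimum of the remaining rows. The sketch below uses a toy DataFrame with made-up titles; on the real data you would start from coffee_df.

```python
import pandas as pd

# Toy data with made-up titles; on the real data start from coffee_df.
df = pd.DataFrame({
    'title': ['listing with a missing price', 'instant coffee packets', 'ground coffee, 12 oz'],
    'price': [0.0, 1.0, 7.95],
})

# Exclude 0 prices first, then keep every row equal to the remaining minimum (ties included).
priced = df[df['price'] != 0]
cheapest = priced[priced['price'] == priced['price'].min()]
print(cheapest[['title', 'price']])
```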

What coffee title has the highest/lowest weight (grams)?

Answer highest:

Title | Weight
victor allen's coffee variety pack, 100 count, single serve coffee pods for keurig k-cup brewers | 2835.0 grams
eight o'clock the original medium roast k-cup coffee pods, 100 ct | 2835.0 grams
royal kona coffee for royalty chocolate macadamia nut, 10% kona coffee blend, all purpose grind | 2835.0 grams

Answer lowest:

Title | Weight
great value classic roast ground coffee pods, 0.31 oz, 48 count | 8.8 grams
great value classic roast ground coffee pods, 0.31 oz, 12 count | 8.8 grams
great value classic roast ground coffee pods, 0.31 oz, 96 count | 8.8 grams
Highest Weight
coffee_df.query('weight_formatted_to_gramms == weight_formatted_to_gramms.max()')[
    ['title', 'weight_formatted_to_gramms']
]
title weight_formatted_to_gramms
334 victor allen's coffee variety pack, 100 count,... 2835.0
986 eight o'clock the original medium roast k-cup ... 2835.0
987 royal kona coffee for royalty chocolate macada... 2835.0
Lowest Weight

There are 7 rows with 0.0 values because their weight wasn't extracted correctly.

coffee_df.query('weight_formatted_to_gramms == weight_formatted_to_gramms.min()')[
    'weight_formatted_to_gramms'
]

Outputs:

305     0.0
425     0.0
703     0.0
890     0.0
957     0.0
991     0.0
1188    0.0
Name: weight_formatted_to_gramms, dtype: float64

So I decided to check weights that aren't 0 and are lower than 10 grams:

coffee_df.loc[
    (coffee_df['weight_formatted_to_gramms'] != 0.0)
    & (coffee_df['weight_formatted_to_gramms'] < 10),
    ['title', 'weight_formatted_to_gramms'],
]
title weight_formatted_to_gramms
227 great value classic roast ground coffee pods, ... 8.8
289 great value classic roast ground coffee pods, ... 8.8
1317 great value classic roast ground coffee pods, ... 8.8

What is the most popular coffee type?

Answer (Top 3):

  1. Medium Roast
  2. Dark Roast
  3. Arabica

📌Note: The COFFEE_TYPES list contains 125 coffee types, so some types could be missing.

Take a look at the COFFEE_TYPES list in extraction.py.

Split coffee types and extend them to a new list
# https://pandas.pydata.org/docs/reference/api/pandas.Series.ffill.html
# forward-fill: a missing coffee_type takes the previous row's value (fillna(method='ffill') is deprecated)
coffee_types = coffee_df['coffee_type'].ffill()

coffee_types_new = []

# split by comma and extend the new list with split values
# https://stackoverflow.com/a/27886807/15164646
for coffee_type in coffee_types:
    coffee_types_new.extend(coffee_type.split(','))
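The split-and-extend loop can also be written with pandas string methods. In this sketch on a toy coffee_type column, dropna() skips listings where no coffee type was detected, and str.split(',') followed by explode() yields one row per type before counting. Note that this swaps in dropna() for the forward-fill used above: forward-filling copies the previous listing's types onto listings that have none, which inflates those counts (in the toy data, medium roast is counted 3 times instead of 2).

```python
import pandas as pd

# Toy stand-in for coffee_df['coffee_type']; None marks a listing with no detected type.
coffee_type = pd.Series(['medium roast,classic roast', None, 'espresso,dark roast', 'medium roast'])

# Drop missing values, split each string into a list, then give every list element its own row.
type_counts = coffee_type.dropna().str.split(',').explode().value_counts()
print(type_counts)

# For comparison: forward-filling copies the previous listing's types into the gap.
ffill_counts = coffee_type.ffill().str.split(',').explode().value_counts()
```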
Get the most frequent coffee type
coffee_types_new_series = pd.Series(coffee_types_new).dropna()

ax = coffee_types_new_series.value_counts().sort_values(ascending=True).plot(kind='barh', figsize=(15,10))
ax.bar_label(ax.containers[0]) # bar annotation

# set labels on the plot's own axes (plot(figsize=...) creates a new figure)
ax.set_title('Most Popular Coffee Types')
ax.set_ylabel('Coffee Type')
ax.set_xlabel('Number of Occurrences')

plt.show()

[Figure: horizontal bar chart titled "Most Popular Coffee Types"]

What is the most frequent coffee weight?

Answer: ~300-500 grams

sns.jointplot(data=coffee_df, x='price', y='weight_formatted_to_gramms', kind='hex')

[Figure: hexbin joint plot of price vs. weight_formatted_to_gramms]
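The "~300-500 grams" answer is read from the plot. A quick numeric cross-check is to bin the weights and count listings per bin. This sketch uses toy weights and an assumed 100 g bin width. pd.cut() uses right-closed bins by default, so 0.0 values (the incorrectly extracted weights) fall outside the first bin (0, 100], become NaN, and are left out of the counts.

```python
import pandas as pd

# Toy weights in grams; on the real data use coffee_df['weight_formatted_to_gramms'].
weights = pd.Series([283.5, 340.2, 340.2, 453.6, 311.8, 734.3, 1142.5, 8.8, 0.0])

# 100 g bins from 0 to 2900 g (the largest weight in the dataset is 2835 g).
bins = list(range(0, 3000, 100))

# Assign each weight to a bin, then count listings per bin (sorted from most to least frequent).
counts = pd.cut(weights, bins=bins).value_counts()
print(counts.head(3))
```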

Additional Plots
g = sns.PairGrid(coffee_df[['price', 'weight_formatted_to_gramms']], height=4)
g.map_upper(sns.histplot)
g.map_lower(sns.kdeplot, fill=True)
g.map_diag(sns.histplot, kde=True)

[Figure: pair grid of price and weight_formatted_to_gramms (histograms and KDE plots)]

sns.displot(
    coffee_df,
    x='price',
    y='weight_formatted_to_gramms',
    kind='kde',
    fill=True,
    thresh=0,
    levels=100,
    cmap='mako',
)

[Figure: filled KDE plot of price vs. weight_formatted_to_gramms]

Higher weight (grams) = bigger price?

Answer: it depends. In this dataset, weight and price are only slightly correlated.

Lower weight (grams) = lower price?

Answer: it depends, for the same reason.

sns.jointplot(data=coffee_df, x='price', y='weight_formatted_to_gramms', kind='reg')

[Figure: regression joint plot of price vs. weight_formatted_to_gramms]

ax = sns.scatterplot(
    data=coffee_df,
    x='price',
    y='weight_formatted_to_gramms',
    hue='price',
    size='price',
    sizes=(40, 300),
)

# https://stackoverflow.com/a/34579525/15164646
sns.move_legend(ax, 'upper left', bbox_to_anchor=(1, 1.02)) # 1 = X axis, 1.02 = Y axis of the legend.

[Figure: scatter plot of price vs. weight_formatted_to_gramms, colored and sized by price]
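One way to make "it depends" more concrete is to normalize price by weight. The sketch below adds a hypothetical price_per_100g column for the five rows shown by coffee_df.head() earlier (index 0-4) and skips 0 g rows to avoid dividing by zero. On these rows, the 10 oz café bustelo listing (index 1) has the lowest price tag but one of the highest prices per 100 g, while the 48 oz great value listing (index 4) is the cheapest per 100 g. Five rows are only an illustration, not a result about the full dataset.

```python
import pandas as pd

# Price (USD) and weight (grams) of the five rows from coffee_df.head(); index 0-4 matches that output.
sample = pd.DataFrame({
    'price': [13.92, 3.76, 9.97, 9.92, 9.98],
    'weight_formatted_to_gramms': [1142.5, 283.5, 734.3, 1204.9, 1360.8],
})

# Skip rows with 0 g (incorrectly extracted weight) to avoid dividing by zero.
valid = sample[sample['weight_formatted_to_gramms'] > 0].copy()
valid['price_per_100g'] = (valid['price'] / valid['weight_formatted_to_gramms'] * 100).round(2)

print(valid.sort_values('price_per_100g'))
```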

Conclusion

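In this blog post we've covered extracting Walmart organic results, paginating through all store pages with SerpApi, collecting coffee listings from multiple Walmart stores, and exploring the resulting data with pandas, matplotlib, and seaborn.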

Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞