Web scraping The Home Depot Product Info with Nodejs
What will be scraped
Full code
If you don't need an explanation, have a look at the full code example in the online IDE
import dotenv from "dotenv";
import { config, getJson } from "serpapi";
dotenv.config();
config.api_key = process.env.API_KEY; // your API key from serpapi.com
const engine = "home_depot_product"; // search engine
const params = {
  product_id: "318783386", // The Home Depot identifier of a product
};
const getResults = async () => {
  const json = await getJson(engine, params);
  return json.product_results;
};
getResults().then((result) => console.dir(result, { depth: null }));
Why use The Home Depot Product API from SerpApi?
Using an API generally solves all or most of the problems you might encounter while creating your own parser or crawler. From a web scraping perspective, our API helps with the most painful ones:
It bypasses blocks from supported search engines by solving CAPTCHAs and handling IP blocks.
There is no need to create a parser from scratch and maintain it.
There is no need to pay for proxies and CAPTCHA solvers.
There is no need to use browser automation when you need to extract large amounts of data faster.
Head to the Playground for a live and interactive demo.
Preparation
First, we need to create a Node.js* project and add the npm packages serpapi and dotenv.
To do this, in the directory with our project, open the command line and enter:
$ npm init -y
And then:
$ npm i serpapi dotenv
*If you don't have Node.js installed, you can download it from nodejs.org and follow the installation documentation.
SerpApi package is used to scrape and parse search engine results using SerpApi. Get search results from Google, Bing, Baidu, Yandex, Yahoo, Home Depot, eBay, and more.
The dotenv package is a zero-dependency module that loads environment variables from a .env file into process.env.
Next, we need to add a top-level "type" field with a value of "module" to our package.json file (that is, the entry "type": "module") to allow using ES6 modules in Node.js.
This completes the setup of the Node.js environment for our project, so we can move on to the step-by-step code explanation.
Code explanation
First, we need to import dotenv from the dotenv library, and config and getJson from the serpapi library:
import dotenv from "dotenv";
import { config, getJson } from "serpapi";
Then, we apply some config: call the dotenv config() method and set your SerpApi private API key on the global config object.
dotenv.config();
config.api_key = process.env.API_KEY; //your API key from serpapi.com
dotenv.config() will read your .env file, parse the contents, assign them to process.env, and return an object with a parsed key containing the loaded content or an error key if it failed.
config.api_key allows you to declare a global api_key value by modifying the config object.
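If the .env file is missing or the variable name is misspelled, process.env.API_KEY is simply undefined, and the problem only shows up later as a failed request. Below is a minimal sketch of an early check, assuming the key is stored under the name API_KEY as in the code above. readApiKey is a hypothetical helper for illustration, not part of dotenv or serpapi.

```javascript
// Hypothetical helper: returns the API key from an env-like object,
// or throws a descriptive error when it is missing or blank.
const readApiKey = (env) => {
  const key = (env.API_KEY ?? "").trim();
  if (key === "") {
    throw new Error("API_KEY is not set. Add API_KEY=<your key> to the .env file.");
  }
  return key;
};

// Usage with the setup from this post (after dotenv.config()):
// config.api_key = readApiKey(process.env);
```

Passing the environment in as an argument keeps the helper easy to try out with a plain object instead of the real process.env.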
Next, we specify the search engine and the search parameters needed for making a request:
const engine = "home_depot_product"; // search engine
const params = {
  product_id: "318783386", // The Home Depot identifier of a product
};
You can use the following search parameters:
product_id: The Home Depot identifier of a product.
delivery_zip: ZIP or postal code, used to filter shipping products by the selected area.
store_id: store ID, used to filter the products by a specific store only.
no_cache: forces SerpApi to fetch The Home Depot product results even if a cached version is already present. A cache is served only if the query and all parameters are exactly the same. The cache expires after 1 hour. Cached searches are free and are not counted towards your searches per month. It can be set to false (default) to allow results from the cache, or true to disallow results from the cache. The no_cache and async parameters should not be used together.
async: defines the way you want to submit your search to SerpApi. It can be set to false (default) to open an HTTP connection and keep it open until you get your search results, or true to just submit your search to SerpApi and retrieve the results later. In the latter case, you'll need to use our Searches Archive API to retrieve your results. The async and no_cache parameters should not be used together, and async should not be used on accounts with Ludicrous Speed enabled.
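The optional parameters go into the same params object as product_id. Below is a minimal sketch, assuming a hypothetical helper named buildProductParams and placeholder ZIP and store values chosen only for illustration. It also enforces the rule above that no_cache and async should not be combined.

```javascript
// Hypothetical helper: assembles params for the "home_depot_product" engine.
const buildProductParams = (productId, options = {}) => {
  const { delivery_zip, store_id, no_cache = false, async: asyncMode = false } = options;
  if (no_cache && asyncMode) {
    throw new Error("no_cache and async should not be used together");
  }
  const params = { product_id: String(productId) };
  // Optional filters are only added when the caller provides them.
  if (delivery_zip !== undefined) params.delivery_zip = String(delivery_zip);
  if (store_id !== undefined) params.store_id = String(store_id);
  if (no_cache) params.no_cache = true;
  if (asyncMode) params.async = true;
  return params;
};

// Placeholder values for illustration only, not real store data.
const exampleParams = buildProductParams("318783386", { delivery_zip: "04401", store_id: "1234" });
console.log(exampleParams);
```

The resulting object can be passed to getJson in place of the params object shown earlier.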
Next, we declare the getResults function, which gets the data and returns it:
const getResults = async () => {
...
};
In this function, we get json with the results and return product_results from the received json:
const json = await getJson(engine, params);
return json.product_results;
And finally, we run the getResults function and print all the received information to the console with the console.dir method, which accepts an options object for changing the default output (here, depth: null prints nested objects in full):
getResults().then((result) => console.dir(result, { depth: null }));
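If the request fails or the product ID does not exist, the response may not contain product_results, and getResults would then resolve to undefined. Below is a minimal sketch of a guard, assuming that a failed response describes the problem in a top-level error field (check the response you actually receive). pickProductResults is a hypothetical helper for illustration.

```javascript
// Hypothetical helper: returns product_results or throws with a readable message.
const pickProductResults = (json) => {
  if (json && json.product_results) return json.product_results;
  // Assumption: a failed response carries its reason in an "error" field.
  const reason = json && json.error ? json.error : "no product_results in response";
  throw new Error(`Product lookup failed: ${reason}`);
};

// Inside getResults this would replace the direct property access:
// return pickProductResults(await getJson(engine, params));
```

Throwing here turns a silent undefined into a rejection that can be handled with .catch() on the getResults() call.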
Output
{
"product_id":"318783386",
"title":"2500-Watt Recoil Start Ultra-Light Portable Gas and Propane Powered Dual Fuel Inverter Generator with CO Shield",
"description":"The Champion Power Equipment 201122 2500-Watt Portable Inverter Generator with CO Shield is ideal for camping or tailgating. Weighing in at an ultralight 39 pounds, this model is one of the lightest 2500-watt inverters in the industry. Included are a covered 120V 20A household duplex outlet (5-20R) plus two handy 2.1A USB ports you can use to power your phone, laptop, or similar device. Just add oil (included 10W-30), and operate your Dual-Fuel generator right out of the box on gasoline or propane, and easily switch fuels with the fuel selector dial. When the 1.1-gallon tank of gasoline is full, the 79cc Champion engine produces 2500 starting watts and 1850 running watts and will run for up to 11.5 hours at 25% load. When using a 20-pound propane tank, it produces 2500 starting watts and 1665 running watts and will run for up to 34 hours at 25% load. CO Shield technology monitors the accumulation of carbon monoxide (CO), a poisonous gas produced by engine exhaust when the generator is running. If CO Shield detects unsafe elevated levels of CO gas, it automatically shuts off the engine. CO Shield is not a substitute for an indoor carbon monoxide alarm or for safe operation. DO NOT allow engine exhaust fumes to enter a confined area through windows, doors, vents or other openings. Generators must ALWAYS be used outdoors, far away from occupied buildings with engine exhaust pointed away from people and buildings. Meets the requirements of ANSI/PGMA G300-2018.",
"link":"https://www.homedepot.com/p/Champion-Power-Equipment-2500-Watt-Recoil-Start-Ultra-Light-Portable-Gas-and-Propane-Powered-Dual-Fuel-Inverter-Generator-with-CO-Shield-201122/318783386",
"upc":"817198025742",
"model_number":"201122",
"rating":"4.7821",
"reviews":"1606",
"price":899,
"highlights":[
"Ultra-quiet operation, Dual fuel flexibility (gas or propane)",
"Engine oil and oil funnel included",
"2500 starting watts and 1850 running watts"
],
"brand":{
"name":"Champion Power Equipment",
"link":"https://www.homedepot.com/b/Sports-Outdoors-Tailgating-Gear-Tailgating-Portable-Gas-Power/Champion-Power-Equipment/N-5yc1vZcbwtZ9xs"
},
"images":[
[
"https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_65.jpg",
"https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_100.jpg",
"https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_145.jpg",
"https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_300.jpg",
"https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_400.jpg",
"https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_600.jpg",
"https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_1000.jpg"
],
...and other images
],
"bullets":[
"Need help with service or repair? Champion Power Equipment is available to help 24 hours a day, 7 days a week. Call us at 1-877-338-0999.",
"Operate your 2500-Watt portable generator right out of the box on either gasoline or propane, plus at only 39 lbs., this inverter is 1 of the lightest 2500-Watt inverters in the industry",
"With an ultra-quiet 53 dBA from 23 ft., enjoy 2500-Watt starting watt, 1850-Watt running watt and up to 11.5-hours run time on gasoline and 1665-Watt running watt and up to 34-hours on propane",
"Optional, sold-separately parallel kit enables this inverter to connect with another 2500-Watt Champion inverter to double your output power",
"Includes a 120-Volt 20 Amp household duplex outlet (5-20R) with clean electricity (less than 3% THD) and 2 convenient USB ports",
"Includes 3-year limited warranty with free lifetime technical support from dedicated experts",
"<br /><br /><center><img src=\"https://inlinecontent.thdstatic.com/28I/CHAMPION POWER EQUIPMENT/Champinline.jpg\"></center><br />"
],
"info_and_guides":[
{
"title":"SDS",
"link":"https://images.thdstatic.com/catalog/pdfImages/15/15de2a59-eb0b-443f-85dd-f6b773d618df.pdf"
},
{
"title":"Replacement Part List",
"link":"https://images.thdstatic.com/catalog/pdfImages/f9/f999c0c7-2df4-432d-86e1-fff36b2d3de9.pdf"
},
{
"title":"Service and Repairs",
"link":"https://images.thdstatic.com/catalog/pdfImages/70/70c2dc77-a1d9-4d45-9610-a723523171ab.pdf"
},
{
"title":"Product Brochure",
"link":"https://images.thdstatic.com/catalog/pdfImages/c7/c767f0c7-bd5e-4b78-ad4d-0bf2272ca158.pdf"
},
{
"title":"Product Label in Spanish",
"link":"https://images.thdstatic.com/catalog/pdfImages/be/be815184-f129-40b7-829d-34d3629168fe.pdf"
},
{
"title":"Full Product Manual",
"link":"https://images.thdstatic.com/catalog/pdfImages/a5/a531c5cb-34f4-4d3f-8992-86ab7b774b1c.pdf"
}
],
"specifications":[
{
"key":"Details",
"value":[
{
"name":"Application",
"value":"Campsite,Recreation,Tailgating"
},
{
"name":"Built-in inverter",
"value":"Yes"
},
... and other details
]
},
{
"key":"Warranty / Certifications",
"value":[
{
"name":"Certifications and Listings",
"value":"CARB Compliant,CARB Compliant,EPA Approved,EPA Approved,FCC Listed"
},
{
"name":"Manufacturer Warranty",
"value":"3 Year Limited Warranty"
}
]
},
{
"key":"Dimensions",
"value":[
{
"name":"Product Height (in.)",
"value":"17.7 in"
},
{
"name":"Product Length (in.)",
"value":"17.3 in"
},
{
"name":"Product Width (in.)",
"value":"11.5 in"
}
]
}
],
"fulfillment":{
"countity":1385,
"store":"Bangor",
"options":[
{
"type":"Ship to Home",
"title":"Get it by",
"arrival_time":[
"Dec 26",
"Dec 26"
],
"bottom":"Free delivery"
},
{
"type":"Schedule delivery",
"title":"Not available for this item"
},
{
"type":"Ship to store",
"title":"Pickup",
"arrival_time":[
"Dec 22",
"Dec 28"
],
"bottom":"FREE"
}
]
}
}
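The specifications field is a list of groups, each holding name/value pairs, so looking up a single value takes two nested searches. Below is a minimal sketch that flattens it into one lookup object, assuming every group has the key/value shape shown in the output above. flattenSpecifications is a hypothetical helper, and the sample is abbreviated from that output.

```javascript
// Hypothetical helper: turns [{ key, value: [{ name, value }] }] into { name: value }.
const flattenSpecifications = (specifications = []) => {
  const flat = {};
  for (const group of specifications) {
    for (const { name, value } of group.value ?? []) {
      flat[name] = value; // a later group overwrites an earlier one on a name clash
    }
  }
  return flat;
};

// Abbreviated sample taken from the output above.
const sample = [
  { key: "Details", value: [{ name: "Built-in inverter", value: "Yes" }] },
  { key: "Dimensions", value: [{ name: "Product Width (in.)", value: "11.5 in" }] },
];
console.log(flattenSpecifications(sample)["Product Width (in.)"]); // prints: 11.5 in
```

The group names (the key fields) are dropped in this sketch. If two groups could contain the same name, prefixing each name with its group key would avoid the overwrite.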
How to extract products results and then extract product data
You can get products (with product_id) using our Web scraping The Home Depot Search with Nodejs blog post and then get the product data with the extracted product_id. Or you can use this simple code example, which shows how to get products and then fetch the info for all of them:
import dotenv from "dotenv";
import { config, getJson } from "serpapi";
dotenv.config();
config.api_key = process.env.API_KEY; // your API key from serpapi.com
const getResults = async () => {
  const productsInfo = [];
  const { products } = await getJson("home_depot", { q: "refrigerator" });
  for (const { product_id } of products) {
    const { product_results } = await getJson("home_depot_product", { product_id });
    productsInfo.push(product_results);
  }
  return productsInfo;
};
getResults().then((result) => console.dir(result, { depth: null }));
First, we get and destructure products from the JSON received from the getJson function (we use the "home_depot" search engine and a params object with the search query (q) "refrigerator").
Then, we use a for...of loop, destructure product_id from each received product, and get and destructure product_results from the JSON received from the getJson function (this time we use the "home_depot_product" search engine and a params object with product_id). Finally, we add the received results to the productsInfo array (using the push() method).
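The loop sends one home_depot_product request for every product in the search results, so a broad query can use up many searches. Below is a minimal sketch of trimming the list first, assuming each search result carries a product_id as in the code above. selectProductIds and the limit of 5 are hypothetical choices for illustration.

```javascript
// Hypothetical helper: unique product IDs, in original order, capped at `limit`.
const selectProductIds = (products = [], limit = 5) => {
  const ids = products.map(({ product_id }) => product_id).filter(Boolean);
  return [...new Set(ids)].slice(0, limit);
};

// In getResults, the loop could then iterate over the trimmed list:
// for (const product_id of selectProductIds(products, 5)) { ... }
```

Removing duplicates before the loop also avoids fetching the same product twice when it appears more than once in the search results.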
Links
If you want other functionality added to this blog post or if you want to see some projects made with SerpApi, write me a message.
Add a Feature Request or a Bug