Web scraping The Home Depot Search with Node.js

A step-by-step tutorial on creating a Home Depot Search web scraper in Node.js.

What will be scraped


Full code

If you don't need an explanation, have a look at the full code example in the online IDE.

import dotenv from "dotenv";
import { config, getJson } from "serpapi";
import readline from "node:readline/promises";
import { stdin as input, stdout as output } from "node:process";

dotenv.config();
const rl = readline.createInterface({ input, output });
config.api_key = process.env.API_KEY; //your API key from serpapi.com

const engine = "home_depot"; // search engine
const resultsLimit = 40; // hardcoded limit for demonstration purpose
const params = {
  q: "inverter generatot", // Parameter defines the search query
  page: 1, // Value is used to get the items on a specific page
};

const getSearchRefinement = async () => {
  const preResults = await getJson(engine, params);
  let fixedQuery;
  if (preResults?.search_information?.spelling_fix) {
    fixedQuery = preResults.search_information.spelling_fix;
  }
  return { query: fixedQuery, filters: preResults.filters };
};

const applyNewSearchParams = async ({ query, filters }) => {
  if (query) {
    const answer = await rl.question(`Do you want to change the search query from "${params.q}" to "${query}"? y/n: `);
    if (answer.toLowerCase() === "y") {
      params.q = query;
      console.log(`Now the search query is: "${query}"`);
    } else {
      console.log(`The search query didn't change`);
    }
  }
  if (filters) {
    const appliedFilters = [];
    let tokens = "";
    for (const filter of filters) {
      const answer = await rl.question(`Do you want to apply some filter from "${filter.key}" category? y/n: `);
      if (answer.toLowerCase() === "y") {
        for (const filterValue of filter.value) {
          const answer = await rl.question(`Do you want to apply "${filterValue.name}" filter from "${filter.key}" category? y/n: `);
          if (answer.toLowerCase() === "y") {
            tokens += `${filterValue.value},`;
            appliedFilters.push(`${filter.key}: ${filterValue.name}`);
          }
        }
      }
    }
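    if (tokens) {
      params.hd_filter_tokens = tokens.slice(0, -1); // remove the trailing comma
      console.log(`Applied filters: ${appliedFilters.join("; ")}`);
    }
  }
  rl.close(); // close the prompt even when there are no filters, so the process can exit
};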

const getResults = async () => {
  const results = [];
  while (true) {
    const json = await getJson(engine, params);
    if (json.products) {
      results.push(...json.products);
      params.page += 1;
    } else break;
    if (results.length >= resultsLimit) break;
  }
  return results;
};

getSearchRefinement()
  .then(applyNewSearchParams)
  .then(getResults)
  .then((result) => console.dir(result, { depth: null }));

Why use The Home Depot API from SerpApi?

Using an API generally solves most of the problems you might run into when creating your own parser or crawler. From a web scraping perspective, our API can help solve the most painful problems:

  • Bypass blocks from supported search engines by solving CAPTCHAs and handling IP blocks.

  • No need to create a parser from scratch and maintain it.

  • No need to pay for proxies and CAPTCHA solvers.

  • No need to use browser automation, even when you need to extract large amounts of data quickly.

Head to the Playground for a live and interactive demo.

Preparation

First, we need to create a Node.js* project and add the npm packages serpapi and dotenv.

To do this, in the directory with our project, open the command line and enter:

$ npm init -y

And then:

$ npm i serpapi dotenv

*If you don't have Node.js installed, you can download it from nodejs.org and follow the installation documentation.

  • The serpapi package is used to scrape and parse search engine results using SerpApi. It can get search results from Google, Bing, Baidu, Yandex, Yahoo, Home Depot, eBay, and more.

  • The dotenv package is a zero-dependency module that loads environment variables from a .env file into process.env.

Next, we need to add a top-level "type" field with a value of "module" to our package.json file. This lets us use ES6 modules (import statements) in Node.js:

"type": "module"

That completes the Node.js environment setup for our project. Now let's move on to the step-by-step code explanation.

Code explanation

First, we need to import the following:

  • dotenv from the dotenv library.

  • config and getJson from the serpapi library.

  • readline from the built-in node:readline/promises module.

  • stdin and stdout (aliased as input and output) from the built-in node:process module.

import dotenv from "dotenv";
import { config, getJson } from "serpapi";
import readline from "node:readline/promises";
import { stdin as input, stdout as output } from "node:process";

Then, we apply some configuration. We call the dotenv config() method and call the readline createInterface() method with input and output, assigning the result to the rl constant. We also set your SerpApi private API key on the global config object:

dotenv.config();
const rl = readline.createInterface({ input, output });
config.api_key = process.env.API_KEY; //your API key from serpapi.com

  • dotenv.config() reads your .env file, parses its contents, and assigns them to process.env. It returns an object with a parsed key containing the loaded content, or an error key if it failed.

  • readline.createInterface() creates a new readlinePromises.Interface instance. In this tutorial we use its question() method, which prints a prompt and returns a Promise that resolves with the user's answer.

  • config.api_key allows you to declare a global api_key value by modifying the config object. Here the value comes from the API_KEY environment variable, so your .env file should contain a line such as API_KEY=your_api_key.
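If the .env file is missing or the variable name is misspelled, process.env.API_KEY is undefined. The problem then only surfaces later as a failed request. The sketch below adds a fail-fast check instead. requireEnv is a hypothetical helper, not part of dotenv or serpapi, and it assumes it is called after dotenv.config() has populated process.env.

```javascript
// Hypothetical helper (not part of dotenv or serpapi): read a required environment
// variable and fail fast with a readable message if it is missing or empty.
function requireEnv(name) {
  const value = process.env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing environment variable ${name}. Add a line like ${name}=your_value to your .env file.`);
  }
  return value;
}

// Usage in the tutorial, right after dotenv.config():
// config.api_key = requireEnv("API_KEY");
```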

Next, we define the necessary search parameters:

  • The search engine.

  • How many results we want to receive (the resultsLimit constant).

  • The params object with the q and page parameters for the request.

📌Note: I deliberately misspelled the search query to demonstrate how the Home Depot Spell Check API works.

const engine = "home_depot"; // search engine
const resultsLimit = 40; // hardcoded limit for demonstration purpose
const params = {
  q: "inverter generatot", // Parameter defines the search query
  page: 1, // Value is used to get the items on a specific page
};

You can use the following search parameters:

  • q parameter defines the search query. You can use anything that you would use in a regular The Home Depot search.

  • hd_sort parameter defines how results are sorted. It can be set to: top_sellers (Top Sellers), price_low_to_high (Price Low to High), price_high_to_low (Price High to Low), top_rated (Top Rated), best_match (Best Match).

  • hd_filter_tokens is used to pass filter tokens separated by commas. Filter tokens can be obtained from the API response.

  • delivery_zip defines the ZIP postal code used to filter shipping products by a selected area.

  • store_id defines the store ID used to filter products by a specific store only.

  • lowerbound defines lower bound for price in USD.

  • upperbound defines upper bound for price in USD.

  • nao defines the offset for product results. A single page contains 24 products, so the first page offset is 0, the second is 24, the third is 48, and so on.

  • page value is used to get the items on a specific page (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.).

  • ps determines the number of items per page. In some scenarios, Home Depot overrides the ps value. By default, Home Depot returns 24 results.

  • no_cache parameter forces SerpApi to fetch The Home Depot Search results even if a cached version is already present. A cache is served only if the query and all parameters are exactly the same. The cache expires after 1h. Cached searches are free and are not counted towards your searches per month. It can be set to false (default) to allow results from the cache, or true to disallow results from the cache. no_cache and async parameters should not be used together.

  • async parameter defines the way you want to submit your search to SerpApi. It can be set to false (default) to open an HTTP connection and keep it open until you get your search results. It can also be set to true to just submit your search to SerpApi and retrieve the results later; in that case, you'll need to use our Searches Archive API to retrieve your results. async and no_cache parameters should not be used together. async should not be used on accounts with Ludicrous Speed enabled.
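A few of these parameters can be combined into one params object. The sketch below shows the nao arithmetic (offset = (page - 1) × 24) and checks hd_sort against the listed values. The parameter names come from the list above. The buildSearchParams and naoForPage helpers and their argument names are hypothetical, and the 24-products-per-page figure is an assumption that Home Depot may override (see ps).

```javascript
// Hypothetical helpers: build a params object for getJson("home_depot", params)
// using parameter names from the list above. Not part of the serpapi package.
const HD_SORT_OPTIONS = ["top_sellers", "price_low_to_high", "price_high_to_low", "top_rated", "best_match"];
const DEFAULT_PAGE_SIZE = 24; // assumed page size; Home Depot may override it

// nao offset for a 1-based page number: (page - 1) * pageSize
function naoForPage(page, pageSize = DEFAULT_PAGE_SIZE) {
  if (!Number.isInteger(page) || page < 1) throw new RangeError("page must be an integer >= 1");
  return (page - 1) * pageSize;
}

function buildSearchParams({ q, sort, minPrice, maxPrice, page = 1 }) {
  if (!q) throw new Error("q (the search query) is required");
  const params = { q, nao: naoForPage(page) };
  if (sort !== undefined) {
    // Reject values that are not in the documented hd_sort list
    if (!HD_SORT_OPTIONS.includes(sort)) throw new Error(`Unsupported hd_sort value: ${sort}`);
    params.hd_sort = sort;
  }
  if (minPrice !== undefined) params.lowerbound = minPrice; // USD
  if (maxPrice !== undefined) params.upperbound = maxPrice; // USD
  return params;
}

const example = buildSearchParams({ q: "inverter generator", sort: "price_low_to_high", maxPrice: 1000, page: 3 });
console.log(example.nao); // 48 (two pages of 24 products are skipped)
```

This sketch pages with nao, while the tutorial's own code uses page. If the page size is overridden, the computed offset is only a best guess.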

Next, we declare the getSearchRefinement function, which makes a preliminary request. It checks whether the response contains search_information.spelling_fix and stores it in fixedQuery. It then returns fixedQuery together with the filters available for this search:

const getSearchRefinement = async () => {
  const preResults = await getJson(engine, params);
  let fixedQuery;
  if (preResults?.search_information?.spelling_fix) {
    fixedQuery = preResults.search_information.spelling_fix;
  }
  return { query: fixedQuery, filters: preResults.filters };
};
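The decision logic inside getSearchRefinement can be separated from the network call and tried without an API key. In the sketch below, extractRefinement is a hypothetical name. The mock response is hand-written: its shape is inferred from the fields the tutorial's code reads (search_information.spelling_fix, and filters with key, name and value). The filter names and tokens are placeholders, not real Home Depot data.

```javascript
// Pure version of getSearchRefinement's logic (hypothetical name), without the API call
function extractRefinement(preResults) {
  return {
    // spelling_fix is only present when the engine suggests a correction
    query: preResults?.search_information?.spelling_fix,
    // filters may be absent, so callers should check before iterating
    filters: preResults?.filters,
  };
}

// Hand-written mock response: only the fields the tutorial reads, placeholder tokens
const mockResponse = {
  search_information: { spelling_fix: "inverter generator" },
  filters: [
    {
      key: "Brand",
      value: [
        { name: "Brand A", value: "PLACEHOLDER_TOKEN_1" },
        { name: "Brand B", value: "PLACEHOLDER_TOKEN_2" },
      ],
    },
  ],
};

const refinement = extractRefinement(mockResponse);
console.log(refinement.query); // inverter generator
console.log(extractRefinement({ search_information: {} }).query); // undefined
```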

Next, we declare the applyNewSearchParams function, which lets the user apply the corrected query and the filters. We destructure the function's argument object into query and filters:

const applyNewSearchParams = async ({ query, filters }) => {
  ...
};

In this function, we first check whether query is present. If it is, we use the question() method to ask the user in the console whether to change the search query, and wait for the answer. If the user answers 'y', we set the new search query on the params object:

if (query) {
  const answer = await rl.question(`Do you want to change the search query from "${params.q}" to "${query}"? y/n: `);
  if (answer.toLowerCase() === "y") {
    params.q = query;
    console.log(`Now the search query is: "${query}"`);
  } else {
    console.log(`The search query didn't change`);
  }
}
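The same y/n pattern appears three times in this function, so it can be wrapped in a small helper. In the sketch below, confirm is a hypothetical name. Unlike the tutorial's check, it also accepts "yes" and ignores surrounding spaces. The ask argument is any function returning a Promise of a string. With node:readline/promises you could pass (prompt) => rl.question(prompt).

```javascript
// Hypothetical helper: ask a y/n question and resolve to a boolean.
// `ask` is any function returning Promise<string>, e.g. (prompt) => rl.question(prompt).
async function confirm(ask, prompt) {
  const answer = await ask(`${prompt} y/n: `);
  // More lenient than the tutorial's check: trims whitespace and also accepts "yes"
  const normalized = String(answer).trim().toLowerCase();
  return normalized === "y" || normalized === "yes";
}

// Usage inside applyNewSearchParams (sketch):
// if (query && (await confirm((p) => rl.question(p), `Change the search query from "${params.q}" to "${query}"?`))) {
//   params.q = query;
// }
```

Injecting ask keeps the helper independent of readline. You can therefore exercise it with a stub function instead of a real terminal.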

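Next, we ask similar questions about the received filters. Using for...of loops, we go through each filter category. If the user wants to use that category, we then go through each of its values. For every accepted value, we append its token and a comma to the tokens string, and push a readable label to appliedFilters: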

if (filters) {
  const appliedFilters = [];
  let tokens = "";
  for (const filter of filters) {
    const answer = await rl.question(`Do you want to apply some filter from "${filter.key}" category? y/n: `);
    if (answer.toLowerCase() === "y") {
      for (const filterValue of filter.value) {
        const answer = await rl.question(`Do you want to apply "${filterValue.name}" filter from "${filter.key}" category? y/n: `);
        if (answer.toLowerCase() === "y") {
          tokens += `${filterValue.value},`;
          appliedFilters.push(`${filter.key}: ${filterValue.name}`);
        }
      }
    }
  }
  ...
}

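Next, if the user chose at least one filter, we remove the trailing comma from the tokens string and set it as the hd_filter_tokens value in the params object. We also print the applied filters. Finally, after the if (filters) block, we close the readline interface with the rl.close() method. Closing it outside that block matters. Otherwise, a search that returns no filters would leave the interface open, and the process could keep waiting on stdin:

  if (tokens) {
    params.hd_filter_tokens = tokens.slice(0, -1); // remove the trailing comma
    console.log(`Applied filters: ${appliedFilters.join("; ")}`);
  }
} // end of the if (filters) block
rl.close(); // always runs, so the process can exit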
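Another way to build the token string is to collect tokens in an array and join them with commas, which leaves no trailing comma to trim. The sketch below is a synchronous simplification. collectFilterTokens and isSelected are hypothetical names, and isSelected stands in for the tutorial's async rl.question prompts, skipping the category-level question. The filter data uses placeholder tokens in the { key, value: [{ name, value }] } shape the tutorial's loop reads.

```javascript
// Hypothetical helper: gather chosen filter tokens into an array and join them,
// instead of `tokens += ...` followed by `tokens.slice(0, -1)`.
function collectFilterTokens(filters, isSelected) {
  const tokens = [];
  const applied = [];
  for (const filter of filters ?? []) {
    for (const option of filter.value ?? []) {
      if (isSelected(filter.key, option.name)) {
        tokens.push(option.value);
        applied.push(`${filter.key}: ${option.name}`);
      }
    }
  }
  // An empty string means "no filters chosen", matching the tutorial's `if (tokens)` check
  return { hdFilterTokens: tokens.join(","), applied };
}

// Placeholder filter data with made-up tokens
const sampleFilters = [
  { key: "Brand", value: [{ name: "Brand A", value: "TOKEN_A" }, { name: "Brand B", value: "TOKEN_B" }] },
  { key: "Fuel Type", value: [{ name: "Dual Fuel", value: "TOKEN_C" }] },
];

const { hdFilterTokens, applied } = collectFilterTokens(sampleFilters, (key, name) => name !== "Brand B");
console.log(hdFilterTokens); // TOKEN_A,TOKEN_C
console.log(applied); // [ 'Brand: Brand A', 'Fuel Type: Dual Fuel' ]
```

With the interactive prompts, isSelected becomes async and the loop needs await, but the array-and-join idea stays the same.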

Next, we declare the getResults function, which collects results from all pages (using pagination) and returns them:

const getResults = async () => {
  ...
};

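In this function, we declare an empty results array. We then use a while loop to request each page, add its products to results, and increment params.page to move to the next page.

The loop stops (using break) when a page has no products, or once the number of collected results reaches resultsLimit. The function then returns the array with results. Because products are added a whole page at a time, the array can end up larger than resultsLimit. For example, with 24 products per page and a limit of 40, it would hold 48 after two full pages: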

const results = [];
while (true) {
  const json = await getJson(engine, params);
  if (json.products) {
    results.push(...json.products);
    params.page += 1;
  } else break;
  if (results.length >= resultsLimit) break;
}
return results;
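The sketch below checks the stopping rules without network access by injecting the API call as a function. collectPages and fetchPage are hypothetical names. With serpapi, fetchPage could be (page) => getJson("home_depot", { ...params, page }), which also avoids mutating the shared params object. Unlike the tutorial, this version trims the result to exactly limit items, and it treats an empty products array as the last page. The fake API below returns made-up data.

```javascript
// Pagination loop with the API call injected as `fetchPage` (hypothetical names)
async function collectPages(fetchPage, limit, firstPage = 1) {
  const results = [];
  for (let page = firstPage; results.length < limit; page += 1) {
    const json = await fetchPage(page);
    if (!json?.products || json.products.length === 0) break; // no more results
    results.push(...json.products);
  }
  // Whole pages are appended, so trim the overshoot (e.g. 48 collected for a limit of 40)
  return results.slice(0, limit);
}

// Fake API: 3 pages of 24 products each, then an empty response (made-up data)
const fakeFetchPage = async (page) =>
  page <= 3 ? { products: Array.from({ length: 24 }, (_, i) => ({ position: (page - 1) * 24 + i + 1 })) } : {};

collectPages(fakeFetchPage, 40).then((items) => console.log(items.length)); // 40
```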

Finally, we run the getSearchRefinement function, and use Promise chaining to run the applyNewSearchParams and getResults functions. Then we print all the received information to the console with the console.dir method. It accepts an options object, and here depth: null tells it to print nested objects at any depth instead of truncating them:

getSearchRefinement()
  .then(applyNewSearchParams)
  .then(getResults)
  .then((result) => console.dir(result, { depth: null }));
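The chain above has no .catch(), so a failed request (for example, an invalid API key) ends as an unhandled promise rejection. Here is a sketch of the same pipeline written with async/await and a catch block. The three stubs are hypothetical stand-ins for getSearchRefinement, applyNewSearchParams and getResults, and the last one simulates a failure.

```javascript
// Hypothetical stubs standing in for the tutorial's three functions
const refineStub = async () => ({ query: "inverter generator", filters: undefined });
const applyStub = async () => {};
const failingResultsStub = async () => {
  throw new Error("Invalid API key"); // simulate a failed request
};

// Same steps as the Promise chain, with error handling
async function main({ refine, apply, fetchResults }) {
  try {
    const refinement = await refine();
    await apply(refinement);
    const results = await fetchResults();
    console.dir(results, { depth: null }); // print nested objects fully
    return results;
  } catch (error) {
    // Report the failure instead of leaving an unhandled rejection
    console.error(`Scraping failed: ${error.message}`);
    return null;
  }
}

main({ refine: refineStub, apply: applyStub, fetchResults: failingResultsStub }); // logs "Scraping failed: Invalid API key"
```

With the real functions, the call would be main({ refine: getSearchRefinement, apply: applyNewSearchParams, fetchResults: getResults }). In a real script you might also set process.exitCode = 1 in the catch block. Also make sure the readline interface gets closed there: if the first request fails, applyNewSearchParams never runs.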

Output

[
  {
    "position": 1,
    "product_id": "318783386",
    "title": "2500-Watt Recoil Start Ultra-Light Portable Gas and Propane Powered Dual Fuel Inverter Generator with CO Shield",
    "thumbnails": [
      [
        "https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_65.jpg",
        "https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_100.jpg",
        "https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_145.jpg",
        "https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_300.jpg",
        "https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_400.jpg",
        "https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_600.jpg",
        "https://images.thdstatic.com/productImages/95951f39-703a-4efc-b12b-ad3a3276e73c/svn/champion-power-equipment-inverter-generators-201122-64_1000.jpg"
      ],
      [
        "https://images.thdstatic.com/productImages/27667698-e4fc-488f-b713-7f51e83d5b5b/svn/champion-power-equipment-inverter-generators-201122-e4_65.jpg",
        "https://images.thdstatic.com/productImages/27667698-e4fc-488f-b713-7f51e83d5b5b/svn/champion-power-equipment-inverter-generators-201122-e4_100.jpg",
        "https://images.thdstatic.com/productImages/27667698-e4fc-488f-b713-7f51e83d5b5b/svn/champion-power-equipment-inverter-generators-201122-e4_145.jpg",
        "https://images.thdstatic.com/productImages/27667698-e4fc-488f-b713-7f51e83d5b5b/svn/champion-power-equipment-inverter-generators-201122-e4_300.jpg",
        "https://images.thdstatic.com/productImages/27667698-e4fc-488f-b713-7f51e83d5b5b/svn/champion-power-equipment-inverter-generators-201122-e4_400.jpg",
        "https://images.thdstatic.com/productImages/27667698-e4fc-488f-b713-7f51e83d5b5b/svn/champion-power-equipment-inverter-generators-201122-e4_600.jpg",
        "https://images.thdstatic.com/productImages/27667698-e4fc-488f-b713-7f51e83d5b5b/svn/champion-power-equipment-inverter-generators-201122-e4_1000.jpg"
      ]
    ],
    "link": "https://www.homedepot.com/p/Champion-Power-Equipment-2500-Watt-Recoil-Start-Ultra-Light-Portable-Gas-and-Propane-Powered-Dual-Fuel-Inverter-Generator-with-CO-Shield-201122/318783386",
    "serpapi_link": "https://serpapi.com/search.json?delivery_zip=04401&engine=home_depot_product&product_id=318783386&store_id=2414",
    "model_number": "201122",
    "brand": "Champion Power Equipment",
    "collection": "https://www.homedepot.com/collection/Outdoors/Champion-Inverter-Generators-Accessories-Collection/Family-311405669?omsid=318783386",
    "rating": 4.7821,
    "reviews": 1606,
    "price": 899,
    "delivery": {
      "free": true,
      "free_delivery_threshold": false
    },
    "pickup": {
      "free_ship_to_store": true
    }
  }
]
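The full product objects are verbose, so a small post-processing step can help when you only need a few fields. The sketch below keeps some fields and sorts by price. summarizeProducts and priceKey are hypothetical names, and the field names come from the sample output above. The first sample entry is trimmed from that output; the other two are made-up placeholders. Products without a numeric price are moved to the end.

```javascript
// Hypothetical post-processing step: keep a few fields and sort by price (ascending)
const priceKey = (product) => (typeof product.price === "number" ? product.price : Number.POSITIVE_INFINITY);

function summarizeProducts(products) {
  return products
    .map(({ title, brand, price, rating, reviews }) => ({ title, brand, price, rating, reviews }))
    .sort((a, b) => {
      const pa = priceKey(a);
      const pb = priceKey(b);
      return pa === pb ? 0 : pa < pb ? -1 : 1;
    });
}

// First entry trimmed from the output above; the other two are made-up placeholders
const sampleProducts = [
  {
    position: 1,
    title: "2500-Watt Recoil Start Ultra-Light Portable Gas and Propane Powered Dual Fuel Inverter Generator with CO Shield",
    brand: "Champion Power Equipment",
    rating: 4.7821,
    reviews: 1606,
    price: 899,
  },
  { position: 2, title: "Placeholder product A", brand: "Example Brand", rating: 4.2, reviews: 10, price: 499 },
  { position: 3, title: "Placeholder product B without a price", brand: "Example Brand" },
];

console.log(summarizeProducts(sampleProducts).map((p) => p.price)); // [ 499, 899, undefined ]
```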

If you want other functionality added to this blog post or if you want to see some projects made with SerpApi, write me a message.


Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞