How To Scrape Multiple Websites With One Spider

Lately, I’ve come across a scraping job where I needed to extract the same kind of information from multiple websites. The task was to create a spider that scrapes price data for certain products from various ecommerce sites, and each scraped item needed a unique id (uuid). So I decided to write a single spider that handles all the websites rather than a separate spider for each one.

Because I wanted to make it work with one single spider, I had to write product name and price selectors specifically for each website, though for some websites I could use the same selector.

The product URLs were predefined, so I didn’t have to deal with pagination, link following, and the like; I just put a ton of URLs into start_urls. When Scrapy processes a request, the parse callback figures out which website is currently being scraped and chooses the name and price selectors accordingly.

Choosing the Right Selector For Each Website

So I pass a shop meta parameter in each request so I can tell which shop’s website is being parsed. Then, with a bunch of IFs (which is ugly, but whatever), I assign the right name and price selectors for the item that is about to be populated.
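Here’s a minimal sketch of that pattern. The shop names, URLs, and selectors below are invented for illustration; only the meta-parameter-plus-IFs structure is what’s described above:

```python
import uuid

import scrapy


class PriceSpider(scrapy.Spider):
    name = "prices"

    # Hypothetical product URLs; the real list was much longer.
    start_urls = [
        "https://shop-a.example.com/product/123",
        "https://shop-b.example.com/item/456",
    ]

    def start_requests(self):
        for url in self.start_urls:
            # Tag each request with its shop so parse() knows which
            # selectors to apply (detecting the shop by domain is an
            # assumption; any URL-to-shop mapping would work).
            shop = "shop_a" if "shop-a" in url else "shop_b"
            yield scrapy.Request(url, callback=self.parse, meta={"shop": shop})

    def parse(self, response):
        shop = response.meta["shop"]

        # The ugly-but-working IF chain: pick selectors per shop.
        if shop == "shop_a":
            name = response.css("h1.product-title::text").extract_first()
            price = response.css("span.price::text").extract_first()
        else:
            name = response.css("h1#product-name::text").extract_first()
            price = response.css("div.cost strong::text").extract_first()

        yield {
            "id": str(uuid.uuid4()),  # unique id for each scraped item
            "shop": shop,
            "name": name.strip() if name else None,
            "price": price.strip() if price else None,
        }
```

With many more shops, the IF chain could be replaced by a dict mapping each shop to its pair of selectors, but for a handful of sites the IFs are easy enough to follow.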

It’s not a fancy solution, I guess, but it gets the job done, which was the priority here.

As a quick off-topic note, I’ve updated my scrapy-templates GitHub repo. I modified all the pagination code in the templates because of the latest Scrapy release: the recommended way to create requests is now the response.follow method.
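For reference, response.follow accepts relative URLs directly, so there’s no need to join them against response.url by hand. A minimal pagination sketch (the selectors here are hypothetical):

```python
import scrapy


class PaginatedSpider(scrapy.Spider):
    name = "paginated"
    start_urls = ["https://example.com/products"]

    def parse(self, response):
        # Follow each product link; response.follow resolves
        # relative hrefs against the current page's URL.
        for href in response.css("a.product-link::attr(href)").extract():
            yield response.follow(href, callback=self.parse_product)

        # Follow the next page, if there is one.
        next_page = response.css("a.next-page::attr(href)").extract_first()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

    def parse_product(self, response):
        yield {"name": response.css("h1::text").extract_first()}
```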