Building a Web Scraping Based SaaS Business, part 1

This is not a tutorial on how to scrape the web. This is something new I’m trying out. I’ve decided to document the whole process of creating my new “business”. And as you might guess from the post’s title, it is based on data gathered online. That’s why I’m gonna…

Crawling with Scrapy – AJAX Forms and Infinite Scrolling

AJAX stands for Asynchronous JavaScript And XML (nowadays usually with JSON instead of XML). With AJAX, websites can send and receive data from the server in the background, without reloading the whole page. This technique became really popular because it makes loading data from the server more convenient. In…
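
As a rough illustration of why this matters for scraping: the data an AJAX call fetches usually arrives as JSON, which a scraper can parse directly instead of digging through HTML. The sketch below uses only the Python standard library. The payload, its field names (`items`, `name`, `next_page`) and the helper `parse_ajax_payload` are all made up for the example and do not come from any real site.

```python
import json

# Hypothetical JSON body, standing in for what an AJAX endpoint might return.
SAMPLE_PAYLOAD = '{"items": [{"name": "Laptop"}, {"name": "Phone"}], "next_page": 2}'


def parse_ajax_payload(body):
    """Return (item names, next page number or None) from a JSON response body."""
    data = json.loads(body)
    names = [item["name"] for item in data.get("items", [])]
    return names, data.get("next_page")


names, next_page = parse_ajax_payload(SAMPLE_PAYLOAD)
print(names, next_page)
```

A missing `next_page` key comes back as `None`, which is one simple way a crawler could tell that an infinite-scroll listing has run out of pages.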

Crawling with Scrapy – Javascript Generated Content

It’s really hard to find a modern website that doesn’t use JavaScript. It just makes it easier to create dynamic and fancy websites. When you want to scrape JavaScript-generated content from a website, you will realize that Scrapy or other web scraping libraries cannot run JavaScript code while…

Crawling with Scrapy – Download Images

One of the most useful features of Scrapy is that it can download and process images. For example, in the e-commerce world, retail companies use web scraping to make use of online product data. Scraping images is necessary in order to match competitors’ products with their own products….

Crawling with Scrapy – Crawling Settings

Scrapy provides a convenient way to customize the crawling settings of your scraper, including the core mechanism, pipelines and spiders. When you create a new Scrapy project with the scrapy startproject command, you will find a settings.py file. Here you can customize your scraper’s settings. Scrapy Settings: Let’s examine the key…
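
To give a flavour of what lives in settings.py, here is a small sketch. The setting names are standard Scrapy settings, but the values and the `myproject` names are placeholders picked for illustration, not recommendations.

```python
# settings.py (illustrative values only)

BOT_NAME = "myproject"  # hypothetical project name

# Respect the robots.txt rules of the sites being crawled.
ROBOTSTXT_OBEY = True

# Minimum wait between consecutive requests to the same site, in seconds.
DOWNLOAD_DELAY = 1.0

# Maximum number of requests Scrapy performs concurrently.
CONCURRENT_REQUESTS = 8

# Enabled item pipelines and their order (lower numbers run first).
ITEM_PIPELINES = {
    "myproject.pipelines.MyProjectPipeline": 300,  # hypothetical pipeline class
}
```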

Crawling with Scrapy – Login to Websites

There are situations when you have to be logged in to access the data you are after. When using Scrapy, this should not discourage you, because Scrapy deals with login forms and cookies easily. Be aware that when you need to log in to reach the data, it is not accessible…

Crawling with Scrapy – Scrapy Cloud

As I always say, web scraping is really useful and sometimes unavoidable. Making raw web data useful is very important nowadays. If you’ve followed my Scrapy tutorial series, you already know how to scrape hundreds of thousands of pages with Scrapy. (If you haven’t, click the link.) Another great…

Crawling with Scrapy – How to Debug Your Spider

When you write software, it’s obvious that sooner or later there will be a function or method that doesn’t work as you expected or doesn’t work at all. It’s the same when you code a web scraper and it doesn’t scrape a piece of data or the response you…

Crawling with Scrapy – ItemLoader

Item Loaders are used to populate your items. Earlier, you learnt how to create Scrapy Items and store your scraped data in them. Essentially, Item Loaders provide a way to populate these Items and run any input or output processing you want along the way. Maybe you need to parse the scraped…
