Building a Web Scraping Based SaaS Business, part 3

Hey, I’m back with a new business documentation post. Last time we talked about how I validated my idea without starting to code. In this one, I’m gonna go (sort of) deep on the technical side: what programming languages I use for what. Which web framework I…

Building a Web Scraping Based SaaS Business, part 2

In the opening post of this series about building my business I gave you a quick overview of what this blog post series will be about. In this one I want to be as specific and detailed as possible about how I validated the idea before writing any piece of…

Building a Web Scraping Based SaaS Business, part 1

This is not a tutorial on how to scrape the web. This is something new I’m trying out. I’ve decided to document the whole process of creating my new “business”. And as you would guess from the post’s title, it is based on data gathered online. That’s why I’m gonna…

Crawling with Scrapy – AJAX Forms and Infinite Scrolling

AJAX stands for Asynchronous JavaScript And XML (nowadays the data is usually JSON instead of XML). With AJAX, websites can send and receive data from the server in the background, without reloading the whole page. This technique became really popular because it makes loading data from the server more convenient. In…

Crawling with Scrapy – JavaScript Generated Content

It’s really hard to find a modern website which doesn’t use JavaScript. It just makes it easier to create dynamic and fancy websites. When you want to scrape JavaScript-generated content from a website you will realize that Scrapy or other web scraping libraries cannot run JavaScript code while…

Crawling with Scrapy – Download Images

One of the most useful features of Scrapy is that it can download and process images. For example, in the ecommerce world, retail companies use web scraping to make use of online product data. Scraping images is necessary in order to match competitors’ products with their own products….

Crawling with Scrapy – Crawling Settings

Scrapy provides a convenient way to customize the crawling settings of your scraper, including the core mechanism, pipelines and spiders. When you create a new Scrapy project with the scrapy startproject command you will find a settings.py file. Here you can customize your scraper’s settings. Scrapy Settings Let’s examine the key…

Crawling with Scrapy – Login to Websites

There are situations when you have to be logged in to access the data you are after. This should not discourage you when using Scrapy, because Scrapy deals with login forms and cookies easily. Be aware that when you need to log in to reach the data it is not accessible…

Crawling with Scrapy – Scrapy Cloud

As I always say, web scraping is really useful and sometimes unavoidable. Making raw web data useful is very important nowadays. If you’ve followed my Scrapy tutorial series you already know how to scrape hundreds of thousands of pages with Scrapy. (If you haven’t, click the link.) Another great…

Crawling with Scrapy – How to Debug Your Spider

When you write software, it’s obvious that sooner or later there will be a function or method which doesn’t work as you expected or doesn’t work at all. It’s the same when you code a web scraper and it doesn’t scrape a piece of data or the response you…