Installation

Prerequisites

The prerequisites for Scrapy Django Dashboard are as follows:

For the scheduling mechanism, install django-celery 3.3.1.

Note

Due to compatibility issues, the selected versions of celery 3.1, kombu 3.0.37 and django-celery 3.3.1 are included in the repository's root directory. I have also applied a quick fix to the bundled kombu 3.0.37 package to work around this well-known error:

TypeError: __init__() missing 1 required positional argument: 'on_delete'

To learn more about the on_delete argument in the Django ORM, see the Django documentation.
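For context, Django 2.0 made on_delete a required argument of ForeignKey and OneToOneField, so older packages that omit it raise the error above when their models are loaded. The following minimal sketch reproduces the shape of that error with a stand-in class. It is not Django's real ForeignKey, and the names ForeignKey, CASCADE and "Owner" are hypothetical placeholders chosen so the example runs without Django installed.

```python
# Stand-in for a field class whose constructor requires on_delete.
# This is NOT Django's ForeignKey; it only mimics the required argument.
class ForeignKey:
    def __init__(self, to, on_delete):
        self.to = to
        self.on_delete = on_delete


CASCADE = "CASCADE"  # hypothetical placeholder for a deletion policy


def try_legacy_call():
    """Call the constructor the old way, without on_delete."""
    try:
        ForeignKey("Owner")
    except TypeError as exc:
        # The message names the missing argument, as in the error above.
        return str(exc)
    return None


legacy_error = try_legacy_call()

# Passing on_delete explicitly is the shape of the fix.
fixed_field = ForeignKey("Owner", on_delete=CASCADE)
```

In real Django code the equivalent fix is to pass a deletion policy explicitly, for example models.ForeignKey(Owner, on_delete=models.CASCADE).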

For scraping images, install Pillow (a PIL fork) 5.4.1.

For JavaScript rendering, install Scrapy-Splash 0.7.2 and Splash (optional).

Manual Installation

Clone the source code with git.

git clone https://github.com/0xboz/scrapy_django_dashboard.git

Note

RECOMMENDATION: Run the code in a virtualenv. For the purposes of this documentation, let us use pyenv to cherry-pick the local Python interpreter, create a virtual environment for the sample project, and finally install all required packages listed in requirements.txt.

If you are running Debian OS, you are in luck. You can install pyenv with a simple script.

sudo apt install -y curl && curl https://raw.githubusercontent.com/0xboz/install_pyenv_on_debian/master/install.sh | bash

If you are planning to uninstall pyenv sometime in the future, run this command:

curl https://raw.githubusercontent.com/0xboz/install_pyenv_on_debian/master/uninstall.sh | bash

Install Python 3.7.7 and set it as the local default interpreter.

pyenv install 3.7.7
pyenv local 3.7.7

Create a virtualenv with pyenv-virtualenv.

pyenv virtualenv venv

Activate this virtualenv.

pyenv activate venv
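To confirm that the activation took effect before installing packages, a small check along these lines may help. It is a sketch, not part of the project: the helper names in_virtualenv and describe_interpreter are hypothetical, and the detection relies only on attributes of the standard sys module.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Legacy virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def describe_interpreter():
    """Summarize the interpreter version and whether a virtualenv is active."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return {"version": version, "virtualenv": in_virtualenv()}


if __name__ == "__main__":
    print(describe_interpreter())
```

Run with the activated interpreter, the reported version should match the one selected with pyenv, and the virtualenv flag should be True.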

Install all the required packages.

(venv) pip install -r requirements.txt

To exit the virtual environment, deactivate it.

(venv) pyenv deactivate

Splash (Optional)

Scrapy Django Dashboard supports Splash (a JavaScript rendering service).

Install Splash (see Splash Installation Instructions).

Tested versions:

  • Splash 1.8
  • Splash 2.3
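One way to check that a Splash instance is reachable is to query its _ping endpoint. The sketch below assumes Splash is listening on its default port 8050 on localhost; the function name splash_is_alive and the default base_url are hypothetical and should be adjusted to your setup.

```python
import http.client
import urllib.error
import urllib.request


def splash_is_alive(base_url="http://localhost:8050", timeout=3.0):
    """Return True if a Splash instance answers on its _ping endpoint."""
    url = base_url.rstrip("/") + "/_ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, malformed URL, or a non-2xx reply:
        # treat all of these as "not reachable".
        return False


if __name__ == "__main__":
    print("Splash reachable:", splash_is_alive())
```

The function returns False rather than raising, so it can be used as a simple guard in a setup script.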

Once Splash is up and running, install Scrapy-Splash.

(venv) pip install scrapy-splash

Refer to the configuration section of the Scrapy-Splash GitHub page for further instructions.
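As a rough guide, the Scrapy-Splash README describes settings along the following lines for a Scrapy project's settings.py. Treat this as a sketch to check against the version you install; the SPLASH_URL value is an assumption about where your Splash instance is listening.

```python
# settings.py (sketch): Scrapy-Splash integration settings.

# Assumed address of the running Splash instance; adjust to your setup.
SPLASH_URL = "http://localhost:8050"

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Spider middleware that avoids storing duplicate Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Duplicate filter that takes Splash arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

If your project already defines DOWNLOADER_MIDDLEWARES or SPIDER_MIDDLEWARES, merge these entries into the existing dictionaries instead of replacing them.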

To customize Splash args, use DSCRAPER_SPLASH_ARGS (see: Settings).
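A customization might look like the sketch below. It assumes that DSCRAPER_SPLASH_ARGS is a plain dictionary defined in the project's settings module and passed to Splash as rendering arguments; the values shown are illustrative, not project defaults, so check the Settings page before relying on them.

```python
# settings.py (sketch): arguments passed to Splash for rendered pages.
# The values below are illustrative assumptions, not project defaults.
DSCRAPER_SPLASH_ARGS = {
    "wait": 1.0,    # seconds Splash waits after the page loads
    "timeout": 30,  # seconds before Splash gives up on rendering
}
```

A longer wait gives scripts on the page more time to finish, at the cost of slower scraping.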

Splash can later be activated in the Django Admin dashboard.

Note

Rendering a website requires more resources than working with plain HTML text, so turn on the Splash feature only when necessary.