Installation
Prerequisites
The prerequisites for Scrapy Django Dashboard are as follows:
- Python 3.7.7
- Django 3.0.6
- Django Grappelli 2.14.2
- Scrapy 2.1.0
- scrapy-djangoitem 1.1.1
- Python JSONPath RW 1.4.0
- Python-Future 0.17.1 (Easy, clean, reliable Python 2/3 compatibility)
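The pinned versions can be compared against what is actually installed with a short script. The following is a minimal sketch, not part of Scrapy Django Dashboard itself. The distribution names in `PINNED` are assumed to be the PyPI names of the packages listed above, and `installed_version` and `find_mismatches` are hypothetical helper names introduced here for illustration.

```python
"""Compare installed package versions against the pins listed above."""
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Python 3.7 needs the importlib_metadata backport for the same API.
    from importlib_metadata import PackageNotFoundError, version

# Assumed PyPI distribution names, pinned to the versions from the list above.
PINNED = {
    "Django": "3.0.6",
    "django-grappelli": "2.14.2",
    "Scrapy": "2.1.0",
    "scrapy-djangoitem": "1.1.1",
    "jsonpath-rw": "1.4.0",
    "future": "0.17.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means every listed package matches its pin. The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.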
For the scheduling mechanism, install django-celery 3.3.1.
Note
Because of compatibility issues, the selected versions of celery 3.1, kombu 3.0.37, and django-celery 3.3.1 are bundled in the root directory of the repository. I have also made a quick fix in the kombu 3.0.37 package to circumvent this well-known issue:
TypeError: __init__() missing 1 required positional argument: 'on_delete'
To learn more about the on_delete argument in the Django ORM, see the Django documentation.
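For context, Django 2.0 made on_delete a required argument of ForeignKey and OneToOneField, so code written for older Django versions that omits it fails with the error above. The sketch below reproduces the failure in plain Python so it runs without a Django project. The `ForeignKey` class and `CASCADE` constant here are hypothetical stand-ins, not the real Django objects, and `try_build` is a helper introduced only for this illustration.

```python
class ForeignKey:
    """Hypothetical stand-in mirroring the Django >= 2.0 ForeignKey(to, on_delete) signature."""

    def __init__(self, to, on_delete):
        self.to = to
        self.on_delete = on_delete


CASCADE = "CASCADE"  # placeholder for django.db.models.CASCADE


def try_build(*args, **kwargs):
    """Return the ForeignKey, or the TypeError message if construction fails."""
    try:
        return ForeignKey(*args, **kwargs)
    except TypeError as exc:
        return str(exc)


# Old-style call without on_delete: yields the TypeError message.
old_style = try_build("Author")
# Call with an explicit on_delete: succeeds.
new_style = try_build("Author", on_delete=CASCADE)

print(old_style)
```

In a real model, the equivalent fix is to pass an explicit deletion rule, for example `on_delete=models.CASCADE`, to every ForeignKey and OneToOneField.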
For scraping images, install Pillow (PIL fork) 5.4.1.
For JavaScript rendering, install Scrapy-Splash 0.7.2 and Splash (optional).
Manual Installation
Clone the source code with git:
git clone https://github.com/0xboz/scrapy_django_dashboard.git
Note
RECOMMENDATION: Run the code in a virtualenv. For the purposes of this documentation, let us use pyenv to cherry-pick the local Python interpreter, create a virtual environment for the sample project, and finally install all required packages listed in requirements.txt.
If you are running Debian OS, you are in luck. You can install pyenv with a simple script.
sudo apt install -y curl && curl https://raw.githubusercontent.com/0xboz/install_pyenv_on_debian/master/install.sh | bash
If you are planning to uninstall pyenv sometime in the future, run this command:
curl https://raw.githubusercontent.com/0xboz/install_pyenv_on_debian/master/uninstall.sh | bash
Install Python and set it as the default interpreter locally.
pyenv install 3.7.7
pyenv local 3.7.7
Create a virtualenv with pyenv-virtualenv.
pyenv virtualenv venv
Activate this virtualenv.
pyenv activate venv
Install all the required packages.
(venv) pip install -r requirements.txt
To exit this virtual environment, deactivate it.
(venv) pyenv deactivate
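Before running pip install, it can help to confirm that the interpreter in use really belongs to the activated virtualenv. The following is a minimal sketch, assuming the environment was created through Python's built-in venv mechanism (which sets `sys.prefix` apart from `sys.base_prefix`). The function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter prefix differs from the base installation prefix."""
    if prefix is None:
        prefix = sys.prefix
    if base_prefix is None:
        # Fall back to sys.prefix if base_prefix is not defined by this interpreter.
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return prefix != base_prefix


if __name__ == "__main__":
    state = "inside" if in_virtualenv() else "outside"
    print(f"Running {state} a virtual environment ({sys.prefix})")
```

The optional parameters are there only so the comparison can be checked with made-up paths. In normal use, call `in_virtualenv()` with no arguments.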
Splash (Optional)
Scrapy Django Dashboard supports Splash (a JavaScript rendering service).
Install Splash (see Splash Installation Instructions).
Tested versions:
- Splash 1.8
- Splash 2.3
Once Splash is up and running, install Scrapy-Splash:
(venv) pip install scrapy-splash
Refer to the Scrapy-Splash GitHub configuration page for further instructions.
To customize Splash args, use DSCRAPER_SPLASH_ARGS (see: Settings).
Splash can later be activated in the Django Admin dashboard.
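As a rough orientation, the Scrapy settings described in the Scrapy-Splash README look like the fragment below. This is a sketch, not the project's shipped configuration. `SPLASH_URL` assumes a Splash instance listening locally on port 8050, and the `DSCRAPER_SPLASH_ARGS` value is only an illustrative example built on Splash's `wait` render argument. Check the Scrapy-Splash README and the Settings page for the values that apply to your setup.

```python
# settings.py (sketch): middleware names and priorities follow the Scrapy-Splash README.

# Assumption: Splash is running locally on port 8050.
SPLASH_URL = "http://localhost:8050"

# Enable the Splash downloader middlewares and adjust HttpCompressionMiddleware priority.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Deduplicate Splash arguments to save space in the request queue.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Use a dupefilter that is aware of Splash request arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"

# Illustrative value only: seconds Splash waits after page load before rendering.
DSCRAPER_SPLASH_ARGS = {"wait": 0.5}
```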
Note
Rendering a website requires more resources than working with plain HTML text, so turn on the Splash feature only when necessary.