
⚠ Upgrade shub to the latest version via: pip install shub --upgrade 

You can define the requirements that you want to deploy to Scrapinghub via your local project's scrapinghub.yml. If you don't have this file in your project folder, run shub deploy to generate it.

After that, create a requirements.txt file (if you haven't yet) and add your dependencies to it, one dependency per line:

js2xml==0.2.1
extruct==0.1.0
requests==2.6.0

Note: you should always pin a specific version for each of your requirements, as shown in the example above. See the warning below for details.
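As a quick sanity check before deploying, you could scan your requirements file for unpinned entries. This is a hypothetical helper, not part of shub; it simply flags lines that lack an exact `==` pin (URL-based requirements are skipped, since they pin a revision in the URL itself):

```python
import re

# Matches "name==version" exactly; anything else is considered unpinned.
PIN_RE = re.compile(r"^[A-Za-z0-9._-]+==[^=]+$")

def unpinned(lines):
    """Return the requirement lines that do not pin an exact version."""
    bad = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "://" in line:
            continue  # skip blanks, comments, and URL-based requirements
        if not PIN_RE.match(line):
            bad.append(line)
    return bad

print(unpinned(["js2xml==0.2.1", "extruct", "requests>=2.0"]))
# -> ['extruct', 'requests>=2.0']
```

Anything this returns should be pinned down to an exact version before you run shub deploy.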

After creating the file, add the requirements_file setting to scrapinghub.yml, pointing to your project's requirements.txt path:

projects:
  default: 12345
requirements_file: requirements.txt

Now, when you run shub deploy again, it will deploy your project's dependencies to Scrapinghub too.

Things to keep in mind

Don't Set Requirements in Editable Mode

Scrapinghub doesn't support package installation in editable mode (also known as setuptools develop mode), so if your requirements.txt contains entries like the following:

js2xml
-e https://github.com/scrapinghub/extruct/archive/10cbb3a.zip#egg=extruct

please consider changing it to

js2xml
https://github.com/scrapinghub/extruct/archive/10cbb3a.zip#egg=extruct
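If you want to catch editable-mode entries automatically, a small check like the one below could run before each deploy. This is a hypothetical helper (not part of shub) that flags lines starting with pip's `-e`/`--editable` flags:

```python
def editable_entries(lines):
    """Return requirement lines installed in editable (develop) mode,
    which Scrapinghub does not support."""
    flags = ("-e ", "-e=", "--editable ", "--editable=")
    return [line.strip() for line in lines if line.strip().startswith(flags)]

reqs = [
    "js2xml",
    "-e https://github.com/scrapinghub/extruct/archive/10cbb3a.zip#egg=extruct",
]
print(editable_entries(reqs))
# -> ['-e https://github.com/scrapinghub/extruct/archive/10cbb3a.zip#egg=extruct']
```

For each line it reports, dropping the leading `-e ` (as shown above) is usually enough to make the entry deployable.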

Specify Each Requirement Version

The build process aggressively caches requirements, so pointing to a non-specific version of a requirement is a bad idea: you can't be sure which version of the code is actually going to be built.

So this is BAD:

js2xml
extruct
requests
git+git://github.com/scrapinghub/extruct#egg=extruct

This is GOOD:

js2xml==0.2.1
extruct==0.1.0
requests==2.6.0
git+git://github.com/scrapinghub/extruct@10cbb3a#egg=extruct

And this is faster and BETTER:

js2xml==0.2.1
extruct==0.1.0
requests==2.6.0
https://github.com/scrapinghub/extruct/archive/10cbb3a.zip#egg=extruct

You can learn more about the requirements.txt file format here.

This video demonstrates how to deploy dependencies to Scrapy Cloud:
