⚠ Note: You will need shub, the Scrapinghub command-line client, to deploy projects to Scrapy Cloud. Install it if you have not done so yet. If you already have it installed, make sure you have the latest version:

$ pip install shub --upgrade
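
If you want to confirm which version ended up installed, the client has a version command that prints it:

$ shub version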

The next step is to deploy your Scrapy project to Scrapy Cloud. You will need your API key and the numeric ID of your Scrapy Cloud project. You can find both of these on your project’s Code & Deploys page. First, run:

$ shub login

to save your API key to a local file (~/.scrapinghub.yml). You can delete it from there at any time via shub logout.
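
For reference, the saved file is a small YAML document. With a placeholder in place of your real key, its contents look roughly like this:

$ cat ~/.scrapinghub.yml
apikeys:
  default: <your-api-key>

Next, run: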

$ shub deploy

to be guided through a wizard that will set up the project configuration file (scrapinghub.yml) for you. After you complete the wizard, your project will be uploaded to Scrapy Cloud. You can re-trigger deployment (without having to go through the wizard again) anytime via another call to shub deploy.
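
For a simple project, the generated scrapinghub.yml can be as short as a single line mapping the default target to your numeric project ID (99830 matches the example job IDs shown below; yours will differ):

$ cat scrapinghub.yml
project: 99830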

Now you can schedule your spider to run on Scrapy Cloud:

$ shub schedule quotes-toscrape
Spider quotes-toscrape scheduled, job ID: 99830/1/1
Watch the log on the command line:
    shub log -f 1/1
or print items as they are being scraped:
    shub items -f 1/1
or watch it running in Scrapinghub's web interface:
    https://app.scrapinghub.com/p/99830/job/1/1
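
If your spider takes arguments, you can pass them at scheduling time through shub schedule's -a option. For example, assuming the spider defines a hypothetical tag argument:

$ shub schedule quotes-toscrape -a tag=humor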

Then watch it run. Replace 1/1 with the job ID that shub schedule printed; you can leave out the project ID:

$ shub log -f 1/1
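
You can also save the scraped items to a local file: shub items writes them to standard output, one JSON object per line, so redirecting it produces a JSON Lines file:

$ shub items 1/1 > items.jl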

Alternatively, you can go to your project page and schedule the spider from the Scrapy Cloud web interface.

Once the job has finished, or while it is still running, you can click on it to review the scraped data and other information about the run.
