If you need to provide data to a spider within a given project, you can use the API or the python-scrapinghub library to store the data in collections.
You can use collections to store an arbitrary number of records which are indexed by a key. Projects often use them as a single location to write data from multiple jobs.
The example below shows how you can create a collection and add some data:
$ curl -u APIKEY: -X POST -d '{"_key": "first_name", "value": "John"}{ "_key": "last_name", "value": "Doe"}' https://storage.zyte.com/collections/79855/s/form_filling
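When the records come from a script rather than being typed by hand, the request body can be generated instead. The following is a minimal Python sketch of that idea. The helper name `build_collection_payload` is hypothetical and not part of any Zyte library, and separating records with a newline is an assumption (the curl call above simply writes the objects back to back).

```python
import json


def build_collection_payload(fields):
    """Serialize a dict of field values into one JSON record per line.

    Each record uses the "_key"/"value" layout from the curl example.
    Hypothetical helper; the newline separator is an assumption.
    """
    records = ({"_key": key, "value": value} for key, value in fields.items())
    return "\n".join(json.dumps(record) for record in records)


payload = build_collection_payload({"first_name": "John", "last_name": "Doe"})
print(payload)
```

The resulting string could then be passed as the POST body in place of the hand-written JSON.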
To retrieve the data, you would then simply do:
$ curl -u APIKEY: -X GET "https://storage.zyte.com/collections/79855/s/form_filling?key=first_name&key=last_name"
{"value":"John"}
{"value":"Doe"}
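The response shown above contains one JSON object per line, and the request repeats the `key` parameter once per requested record. As a rough illustration, the Python sketch below builds such a URL and parses a response body of that shape. The function names `collection_url` and `parse_records` are hypothetical, and no request is actually sent.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://storage.zyte.com/collections"


def collection_url(project_id, name, keys=()):
    """Build the store URL, repeating the `key` parameter once per key."""
    url = f"{BASE_URL}/{project_id}/s/{name}"
    if keys:
        url += "?" + urlencode([("key", key) for key in keys])
    return url


def parse_records(body):
    """Parse a response body that holds one JSON object per line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


url = collection_url(79855, "form_filling", ["first_name", "last_name"])
# Sample body copied from the response shown above.
records = parse_records('{"value":"John"}\n{"value":"Doe"}\n')
```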
And finally, you can delete the data by sending a DELETE request:
$ curl -u APIKEY: -X DELETE "https://storage.zyte.com/collections/79855/s/form_filling"
Using python-scrapinghub programmatically
As mentioned before, the python-scrapinghub library can be used to handle the API calls programmatically. Here's some sample code that shows how to use the library within a simple Python script:
from scrapinghub import ScrapinghubClient

API_KEY = 'APIKEY'
PROJECT_ID = '12345'
COLLECTION = 'collection-name'

client = ScrapinghubClient(API_KEY)
project = client.get_project(PROJECT_ID)
collection = project.collections.get_store(COLLECTION)

# Store a record under the given key
collection.set({
    '_key': '002d050ee3ff6192dcbecc4e4b4457d7',
    'value': '1447221694537'
})

collection.get('002d050ee3ff6192dcbecc4e4b4457d7')  # Returns {'value': '1447221694537'}
collection.iter()  # Returns a generator object
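To tie this back to the form-filling example, the sketch below wraps the `set` and `get` calls from the script above in two small helpers. It assumes only that the store object exposes those two methods and that `get` returns a dict with a `value` entry, as the comment in the script indicates. The names `save_fields`, `load_fields` and `InMemoryStore` are hypothetical; `InMemoryStore` is a stand-in so the helpers can be tried without an API key, not part of python-scrapinghub.

```python
def save_fields(store, fields):
    """Write each field as its own record, keyed by the field name."""
    for key, value in fields.items():
        store.set({"_key": key, "value": value})


def load_fields(store, keys):
    """Read records back into a plain dict of key -> value."""
    return {key: store.get(key)["value"] for key in keys}


class InMemoryStore:
    """Hypothetical stand-in for a collection store, for offline use only."""

    def __init__(self):
        self._records = {}

    def set(self, item):
        # Keep everything except "_key" as the stored record.
        record = dict(item)
        self._records[record.pop("_key")] = record

    def get(self, key):
        return self._records[key]


store = InMemoryStore()
save_fields(store, {"first_name": "John", "last_name": "Doe"})
result = load_fields(store, ["first_name", "last_name"])
print(result)
```

With a real project, the object returned by `project.collections.get_store(COLLECTION)` would be passed in place of `InMemoryStore()`.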
You can find more information about the library's full API in its documentation.