
Create Collection


Create a Collection to run up to 15,000 Requests on a schedule (up to 100 Requests when adding requests with include_html=true). You can have up to 10,000 Collections set up on your Rainforest API account at any one time.

Collections are created by making an HTTP POST to the /collections endpoint. Parameters can be supplied either as x-www-form-urlencoded parameters or as a JSON object in the POST body.

Using include_html=true in Collections

When adding requests with include_html=true to a Collection, the maximum number of requests per Collection is 100 (rather than 15,000), because including the HTML in the response makes the Collection Result Sets much larger. The limit keeps Result Set files at a manageable size. If you need to run a large number of requests with include_html=true, split them across multiple 100-request Collections.
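As a minimal sketch of that split, the Python below batches a list of requests into Collection-sized groups using the two limits stated on this page. The helper name split_into_collections is hypothetical, and so is the shape of the request objects; adding each batch to its own Collection is not covered here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Limits stated on this page: 15,000 requests per Collection,
# or 100 when the requests use include_html=true.
MAX_REQUESTS = 15_000
MAX_REQUESTS_WITH_HTML = 100


def split_into_collections(requests: Sequence[T], include_html: bool) -> List[List[T]]:
    """Split requests into batches that each fit within a single Collection."""
    size = MAX_REQUESTS_WITH_HTML if include_html else MAX_REQUESTS
    return [list(requests[i:i + size]) for i in range(0, len(requests), size)]
```

For example, 250 requests with include_html=true produce three batches of 100, 100 and 50 requests, one per Collection.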



Example

POST /collections

Let's start with a simple example: we'll create a Collection that runs every day at 9am and 5pm. We'll use the notification_email parameter to specify an email address to notify when the Collection completes, the notification_as_csv parameter to request the Collection results in CSV format, and the destination_ids parameter to have the Result Sets generated by the Collection uploaded to the Destination with ID "ABCDEFG".
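A minimal Python sketch of this request, using only the standard library. It assumes the base URL https://api.rainforestapi.com/collections (confirm it against your account documentation), and the Collection name and email address are placeholders. The api_key goes on the querystring and the remaining parameters are sent as a JSON body, as described under Collection Parameters.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint; check it against your Rainforest API documentation.
BASE_URL = "https://api.rainforestapi.com/collections"


def build_create_collection_request(api_key: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /collections request with a JSON body."""
    # api_key travels on the querystring; everything else goes in the body.
    url = f"{BASE_URL}?{urlencode({'api_key': api_key})}"
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_collection(api_key: str, params: dict) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_create_collection_request(api_key, params)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Daily at 9am and 5pm, CSV download link by email, upload to Destination "ABCDEFG".
params = {
    "name": "My First Collection",  # placeholder name
    "schedule_type": "daily",
    "schedule_hours": [9, 17],
    "notification_email": "you@example.com",  # placeholder address
    "notification_as_csv": True,
    "destination_ids": ["ABCDEFG"],
}
```

Calling create_collection("demo", params) would send the request. Building the request separately makes it easy to inspect without network access; look for the Collection id in the returned JSON.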



Rainforest API responds with JSON confirming the Collection has been successfully created. Note the Collection id: you'll use it in subsequent calls to update, delete and add Requests to the Collection.




Scheduling Collections

Collections can be scheduled on a monthly, weekly, daily or Per X Minutes basis. You can also set a Collection to be started manually, either via the Rainforest API Dashboard or the Start Collection API.

A Collection's schedule is defined by the schedule_type parameter. The values of schedule_type, and their associated parameters, are shown below:

Note: Collection schedules are executed in the timezone set up on your profile.

schedule_type / Scheduling Parameters

monthly
  schedule_days_of_month: Array of days of the month to run the Collection, e.g. [1,20] for the 1st & 20th
  schedule_hours: Array of hours of the day to run the Collection, e.g. [9,17] for 9am & 5pm

weekly
  schedule_days_of_week: Array of days of the week (as integers) to run the Collection, where 0=Sunday and 6=Saturday, e.g. [0,2] for Sunday & Tuesday
  schedule_hours: Array of hours of the day to run the Collection, e.g. [9,17] for 9am & 5pm

daily
  schedule_hours: Array of hours of the day to run the Collection, e.g. [9,17] for 9am & 5pm

minutes
  schedule_minutes: How often to run the Collection, e.g. every_5_minutes. Optionally refine it with schedule_hours and either schedule_days_of_week or schedule_days_of_month

manual
  When schedule_type=manual the Collection must be started manually, either via the Rainforest API Dashboard or the Start Collection API.
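The table above can be encoded as a small payload builder. This is a sketch, not part of any Rainforest API client: build_schedule is a hypothetical helper, it treats schedule_hours as required for monthly, weekly and daily schedules, and the 1 to 31 range for days of the month is an assumption.

```python
from typing import Iterable, List, Optional


def _check_ints(values: Optional[Iterable[int]], low: int, high: int, name: str) -> List[int]:
    """Validate integers against an inclusive range; return them sorted and de-duplicated."""
    values = list(values or [])
    if not values:
        raise ValueError(f"{name} must be a non-empty list")
    for value in values:
        if not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{name} values must be integers in [{low}, {high}], got {value!r}")
    return sorted(set(values))


def build_schedule(schedule_type: str,
                   hours: Optional[Iterable[int]] = None,
                   days_of_week: Optional[Iterable[int]] = None,
                   days_of_month: Optional[Iterable[int]] = None) -> dict:
    """Return the scheduling parameters for a POST /collections body (hypothetical helper)."""
    if schedule_type == "manual":
        # Manual Collections are started from the Dashboard or the Start Collection API.
        return {"schedule_type": "manual"}
    if schedule_type not in ("monthly", "weekly", "daily"):
        raise ValueError(f"unsupported schedule_type: {schedule_type!r}")
    # Assumption: every non-manual schedule needs at least one hour between 0 and 23.
    params = {
        "schedule_type": schedule_type,
        "schedule_hours": _check_ints(hours, 0, 23, "schedule_hours"),
    }
    if schedule_type == "weekly":
        # 0=Sunday ... 6=Saturday
        params["schedule_days_of_week"] = _check_ints(days_of_week, 0, 6, "schedule_days_of_week")
    elif schedule_type == "monthly":
        # Assumption: valid days of the month are 1-31.
        params["schedule_days_of_month"] = _check_ints(days_of_month, 1, 31, "schedule_days_of_month")
    return params
```

The returned dict can be merged into the rest of the POST /collections body; for example, build_schedule("daily", hours=[9, 17]) yields the scheduling part of the 9am and 5pm example.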

Collection Parameters

POST /collections

The following parameters can be used when creating a Collection. The api_key parameter should be supplied as a querystring parameter on the request URL.

Parameters other than api_key should be supplied in either x-www-form-urlencoded or JSON format, in the request body.

Parameter

Required

Description

api_key

required

The API key for your account. Supply it on the request querystring, e.g. api_key=demo

name

required

The name of the Collection.

enabled

optional

Determines whether the Collection is enabled or not. Disabled Collections do not automatically start on their schedule.

Parameter Values

  true: Collection is enabled
  false: Collection is disabled

schedule_type

optional

Determines the type of schedule the Collection uses.

Parameter Values

  monthly: The Collection runs on a monthly schedule; use schedule_days_of_month to set the days of the month and schedule_hours to set the hours on those days.
  weekly: The Collection runs on a weekly schedule; use schedule_days_of_week to set the days of the week and schedule_hours to set the hours on those days.
  daily: The Collection runs on a daily schedule; use schedule_hours to set the hours of the day the Collection is run.
  minutes: The Collection runs on a Per X Minutes schedule; use schedule_minutes to set how often it runs. You may optionally use schedule_hours and either schedule_days_of_week or schedule_days_of_month to restrict the Per X Minutes schedule to specific hours of the day, days of the week or days of the month.
  manual: The Collection is run manually; use the Rainforest API Dashboard or the Start Collection API to start it.

priority

optional

Determines the priority of the Collection. When multiple Collections are queued, Rainforest API starts them in priority order. Learn more about priorities.

Parameter Values (from highest to lowest priority)

  highest
  high
  normal
  low
  lowest
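To illustrate what starting queued Collections "in priority order" means, here is a small Python model of a queue. It is only a model: start_order is a hypothetical helper, and it assumes Collections with equal priority keep their queue order (Python's sorted is stable), which this page does not state.

```python
from typing import List, Tuple

# Documented priority values, ranked from first to last to be started.
PRIORITY_RANK = {"highest": 0, "high": 1, "normal": 2, "low": 3, "lowest": 4}


def start_order(queued: List[Tuple[str, str]]) -> List[str]:
    """Given (collection_name, priority) pairs in queue order, return names in start order."""
    # sorted() is stable, so equal priorities keep their queue order (an assumption about the service).
    ranked = sorted(queued, key=lambda item: PRIORITY_RANK[item[1]])
    return [name for name, _ in ranked]
```

For example, queuing a normal, a highest, a low and another normal Collection in that order would start the highest one first and the low one last.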

schedule_days_of_month

optional

Determines the days of the month the Collection is run when schedule_type=monthly or schedule_type=minutes, expressed as an array of integers, e.g. [1,3,20] for the 1st, 3rd and 20th of the month.

Note: When schedule_days_of_month is combined with schedule_type=minutes, the Collection runs at the frequency set in schedule_minutes on the days of the month set in schedule_days_of_month. For example, with schedule_type=minutes, schedule_minutes=every_5_minutes and schedule_days_of_month=[1,3,20], your Collection would run every 5 minutes on the 1st, 3rd and 20th of the month. When schedule_type=minutes you can use either schedule_days_of_month or schedule_days_of_week, not both.

schedule_days_of_week

optional

Determines the days of the week the Collection is run when schedule_type=weekly or schedule_type=minutes, expressed as an array of integers where 0=Sunday and 6=Saturday, e.g. [0,2] for Sunday & Tuesday.

Note: When schedule_days_of_week is combined with schedule_type=minutes, the Collection runs at the frequency set in schedule_minutes on the days of the week set in schedule_days_of_week. For example, with schedule_type=minutes, schedule_minutes=every_5_minutes and schedule_days_of_week=[0,2], your Collection would run every 5 minutes on Sunday & Tuesday. When schedule_type=minutes you can use either schedule_days_of_week or schedule_days_of_month, not both.

schedule_hours

optional

Determines the hours of the day the Collection is run when schedule_type=minutes, schedule_type=monthly, schedule_type=weekly or schedule_type=daily. Expressed as an array of integers between 0 (midnight) and 23 (11pm), e.g. [9,17] for 9am & 5pm.

Note: Collection schedules are executed in the timezone set up on your profile.

Note: When schedule_hours is combined with schedule_type=minutes, the Collection runs at the frequency set in schedule_minutes throughout the hours set in schedule_hours. For example, with schedule_type=minutes, schedule_minutes=every_5_minutes and schedule_hours=[10,11,12], your Collection would run every 5 minutes between 10am and 1pm. If you want a schedule_type=minutes Collection to run continuously 24/7, omit the schedule_hours parameter.

schedule_minutes

optional

Determines the minutes frequency that the Collection is run when schedule_type=minutes .

Parameter Values

  every_minute: Run the Collection every minute
  every_5_minutes: Run the Collection every 5 minutes
  every_10_minutes: Run the Collection every 10 minutes
  every_15_minutes: Run the Collection every 15 minutes
  every_20_minutes: Run the Collection every 20 minutes
  every_25_minutes: Run the Collection every 25 minutes
  every_30_minutes: Run the Collection every 30 minutes
  every_hour: Run the Collection every hour


Note: When using schedule_type=minutes, Collections run at the selected frequency 24/7 unless specific hours are set in schedule_hours, specific days of the week in schedule_days_of_week, or specific days of the month in schedule_days_of_month.
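The Per X Minutes rules above can be checked before sending a request. In this sketch, build_minutes_schedule and estimated_runs_per_day are hypothetical helpers. The run-count estimate assumes runs start at minute 0 of each selected hour and repeat at the chosen interval within that hour; this page does not say how runs are aligned (which matters for every_25_minutes), so treat the numbers as estimates only.

```python
from typing import Iterable, Optional

# Interval in minutes for each documented schedule_minutes value.
INTERVALS = {
    "every_minute": 1,
    "every_5_minutes": 5,
    "every_10_minutes": 10,
    "every_15_minutes": 15,
    "every_20_minutes": 20,
    "every_25_minutes": 25,
    "every_30_minutes": 30,
    "every_hour": 60,
}


def build_minutes_schedule(schedule_minutes: str,
                           hours: Optional[Iterable[int]] = None,
                           days_of_week: Optional[Iterable[int]] = None,
                           days_of_month: Optional[Iterable[int]] = None) -> dict:
    """Build schedule_type=minutes parameters, enforcing the documented rules."""
    if schedule_minutes not in INTERVALS:
        raise ValueError(f"unknown schedule_minutes: {schedule_minutes!r}")
    hours = list(hours or [])
    days_of_week = list(days_of_week or [])
    days_of_month = list(days_of_month or [])
    if days_of_week and days_of_month:
        raise ValueError("use schedule_days_of_week or schedule_days_of_month, not both")
    params = {"schedule_type": "minutes", "schedule_minutes": schedule_minutes}
    # Leaving out schedule_hours and both day filters means the Collection runs 24/7.
    if hours:
        params["schedule_hours"] = hours
    if days_of_week:
        params["schedule_days_of_week"] = days_of_week
    if days_of_month:
        params["schedule_days_of_month"] = days_of_month
    return params


def estimated_runs_per_day(schedule_minutes: str, hours: Optional[Iterable[int]] = None) -> int:
    """Estimate runs on an active day. ASSUMPTION: runs start at :00 of each selected hour."""
    interval = INTERVALS[schedule_minutes]
    runs_per_hour = -(-60 // interval)  # ceiling division: minutes 0, interval, 2*interval, ... below 60
    hours = list(hours or [])
    active_hours = len(set(hours)) if hours else 24
    return runs_per_hour * active_hours
```

Under that assumption, the schedule_hours example (every_5_minutes with hours [10,11,12]) gives 12 runs per hour over 3 hours, or 36 runs a day, while the same frequency with no schedule_hours gives 288.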

destination_ids

optional

Specifies an array of Destination IDs (i.e. Amazon S3 Buckets, Google Cloud Storage buckets, Microsoft Azure Blob Storage or Alibaba Cloud OSS Buckets) that Result Sets from this Collection are uploaded to. Destination IDs can be retrieved through the Dashboard or via the Destinations API.

notification_email

optional

The email address to send notifications to when new Result Sets are available for this Collection.

notification_webhook

optional

The URL Rainforest API will send a webhook POST to when new Result Sets are available for this Collection. Should be a fully qualified URL, e.g. https://yourserver.com/rainforestapi
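For local testing, a minimal receiver for these webhook POSTs can be built with Python's standard library. This page does not describe the webhook body, so the sketch assumes JSON and falls back to storing the raw text; the port, start_receiver and the module-level received list are illustrative choices. A real endpoint would be served at the fully qualified URL you register.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Payloads captured by the handler, kept in memory for inspection.
received = []


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            # Assumption: the webhook body is JSON.
            payload = json.loads(raw)
        except ValueError:
            payload = raw.decode("utf-8", errors="replace")
        received.append({"path": self.path, "payload": payload})
        # Acknowledge receipt with an empty 200 response.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def start_receiver(host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Start the receiver on a background thread and return the server."""
    server = HTTPServer((host, port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call start_receiver() to begin listening and server.shutdown() to stop; each POST that arrives is appended to received.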

notification_as_json

optional

Determines whether Rainforest API sends download links (in the email notification if notification_email is set, or in the body of the webhook POST if notification_webhook is set) for the Collection Result Sets in JSON format, or not.

If destination_ids are configured for this Collection then this setting determines whether Result Set files in JSON format are uploaded to the Destinations on Collection completion.

Parameter Values

  true: JSON output format enabled / sent
  false: JSON output format disabled / not sent

notification_as_jsonlines

optional

Determines whether Rainforest API sends download links (in the email notification if notification_email is set, or in the body of the webhook POST if notification_webhook is set) for the Collection Result Sets in JSON Lines format, or not.

If destination_ids are configured for this Collection then this setting determines whether Result Set files in JSON Lines format are uploaded to the Destinations on Collection completion.

Parameter Values

  true: JSON Lines output format enabled / sent
  false: JSON Lines output format disabled / not sent

notification_as_csv

optional

Determines whether Rainforest API sends download links (in the email notification if notification_email is set, or in the body of the webhook POST if notification_webhook is set) for the Collection Result Sets in CSV format, or not. To set the CSV fields included in the Result Set, use the notification_csv_fields parameter.

If destination_ids are configured for this Collection then this setting determines whether Result Set files in CSV format are uploaded to the Destinations on Collection completion.

Parameter Values

  true: CSV output format enabled / sent
  false: CSV output format disabled / not sent

notification_csv_fields

optional

Determines the fields that are returned when notification_as_csv=true. Should be specified as a comma-separated list of fields in nested-field dot notation. For more information see the CSV Fields Reference.
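One way to assemble the comma-separated, dot-notation list is to walk a sample of nested result data. In this sketch the field names in the sample are hypothetical (consult the CSV Fields Reference for the real ones), and lists are treated as leaf values, since this page does not say how array fields are addressed.

```python
from typing import Any, Dict, Iterator


def dot_paths(obj: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield the dot-notation path of every leaf value in a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Descend into non-empty nested objects.
            yield from dot_paths(value, path)
        else:
            # Scalars, lists and empty dicts are treated as leaves.
            yield path


def csv_fields(sample: Dict[str, Any]) -> str:
    """Join the leaf paths into a notification_csv_fields value."""
    return ",".join(dot_paths(sample))


# Hypothetical sample shape, for illustration only.
sample = {
    "request_parameters": {"type": "product"},
    "product": {"title": "Example", "buybox_winner": {"price": {"value": 9.99}}},
}
```

Here csv_fields(sample) returns request_parameters.type,product.title,product.buybox_winner.price.value; in practice you would keep only the paths you want in the export.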

requests_type

optional

Determines whether this Collection is locked to only allow requests of the specified type to be added to it.

Locking a Collection to a specific request type has several benefits. It lets Rainforest API automatically select, by default, the CSV fields appropriate to the requests in the Collection when exporting a Result Set in CSV format. It can also help organize your account when you have many Collections set up.

To allow requests of any type to be added, either omit requests_type entirely or set it to mixed.

When requests_type is set to a specific type, the requests_type_locked=true property is present when retrieving the Collection, letting you know that the Collection is locked.

If you leave the Collection unlocked (i.e. by setting requests_type=mixed), then requests_type_locked=false will be present when retrieving the Collection, and requests_type will be set either to mixed (if the Collection contains requests of mixed types) or to a requests_type value automatically detected from the type of the requests subsequently added to the Collection.

Note: The requests_type cannot be changed after the Collection is created.

For more information see the Collection Locking documentation.

Valid values for requests_type are:

Parameter Values

  mixed: Allow requests of any type to be added to this Collection.
  product: Lock the Collection to allow only requests of type 'product' to be added.
  reviews: Lock the Collection to allow only requests of type 'reviews' to be added.
  offers: Lock the Collection to allow only requests of type 'offers' to be added.
  search: Lock the Collection to allow only requests of type 'search' to be added.
  bestsellers: Lock the Collection to allow only requests of type 'bestsellers' to be added.
  category: Lock the Collection to allow only requests of type 'category' to be added.
  deals: Lock the Collection to allow only requests of type 'deals' to be added.
  also_bought: Lock the Collection to allow only requests of type 'also_bought' to be added.
  stock_estimation: Lock the Collection to allow only requests of type 'stock_estimation' to be added.
  questions: Lock the Collection to allow only requests of type 'questions' to be added.
  question_answers: Lock the Collection to allow only requests of type 'question_answers' to be added.
  seller_profile: Lock the Collection to allow only requests of type 'seller_profile' to be added.
  seller_feedback: Lock the Collection to allow only requests of type 'seller_feedback' to be added.
  seller_products: Lock the Collection to allow only requests of type 'seller_products' to be added.
  reviewer_profile: Lock the Collection to allow only requests of type 'reviewer_profile' to be added.
  autocomplete: Lock the Collection to allow only requests of type 'autocomplete' to be added.
  author_page: Lock the Collection to allow only requests of type 'author_page' to be added.
  shop_by_look: Lock the Collection to allow only requests of type 'shop_by_look' to be added.
  asin_to_gtin: Lock the Collection to allow only requests of type 'asin_to_gtin' to be added.
  store: Lock the Collection to allow only requests of type 'store' to be added.
  charts: Lock the Collection to allow only requests of type 'charts' to be added.
  sales_estimation: Lock the Collection to allow only requests of type 'sales_estimation' to be added.
  formats_editions: Lock the Collection to allow only requests of type 'formats_editions' to be added.
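The locking rules above can be modelled client-side, for example to catch mismatches before adding requests. This is a sketch: can_add is a hypothetical helper that reads the requests_type and requests_type_locked properties from a Collection as returned when retrieving it. It checks requests_type_locked rather than requests_type, because an unlocked Collection may report an auto-detected requests_type.

```python
# Request types accepted by requests_type, as listed above (mixed is the unlocked setting).
REQUEST_TYPES = frozenset({
    "product", "reviews", "offers", "search", "bestsellers", "category", "deals",
    "also_bought", "stock_estimation", "questions", "question_answers",
    "seller_profile", "seller_feedback", "seller_products", "reviewer_profile",
    "autocomplete", "author_page", "shop_by_look", "asin_to_gtin", "store",
    "charts", "sales_estimation", "formats_editions",
})


def can_add(collection: dict, request_type: str) -> bool:
    """Return True if a request of request_type may be added to the collection."""
    if request_type not in REQUEST_TYPES:
        raise ValueError(f"unknown request type: {request_type!r}")
    if not collection.get("requests_type_locked", False):
        # Unlocked Collections accept any type, whatever requests_type currently reports.
        return True
    return collection.get("requests_type") == request_type
```

For example, a Collection locked to product rejects a reviews request, while an unlocked Collection whose requests_type was auto-detected as product still accepts it.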



Updated 11 Aug 2024