Create Collection
Create a Collection to run up to 15,000 Requests (100 when adding requests with include_html=true set) on a schedule. You can have up to 10,000 Collections set up on your Rainforest API account at any one time.
Collections are created by making an HTTP POST to the /collections endpoint. POST parameters can be supplied either as x-www-form-urlencoded parameters or as a JSON object in the request body.
Using include_html=true in Collections
When adding requests with include_html=true to a Collection, the maximum number of requests per Collection is 100 (rather than 15,000) because including the HTML within the response makes the Collection Result Sets much larger. The limit is in place to ensure Result Set files are of a manageable size. If you need to run a large number of requests, all with include_html=true, split the requests across multiple 100-request Collections.
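The splitting described above can be sketched in Python as follows. This is only an illustration: the request dictionaries and their url field are hypothetical placeholders, not the actual request schema, and the helper name split_into_batches is made up for this example.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Limit stated above for Collections whose requests use include_html=true.
MAX_HTML_REQUESTS_PER_COLLECTION = 100


def split_into_batches(
    requests: Sequence[T],
    batch_size: int = MAX_HTML_REQUESTS_PER_COLLECTION,
) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size requests."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(requests), batch_size):
        yield list(requests[start:start + batch_size])


# Hypothetical request definitions; the field names are illustrative only.
example_requests = [
    {"url": f"https://example.com/item/{i}", "include_html": True}
    for i in range(250)
]

# Each batch would then be added to its own Collection.
batches = list(split_into_batches(example_requests))
batch_sizes = [len(batch) for batch in batches]
```

With 250 requests, this yields three batches, so three Collections would be needed.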
POST /collections
Let's start with a simple example; we'll create a Collection that is set to run every day at 9am and 5pm. We'll use the notification_email parameter to specify an email address to send notifications to (when the Collection completes), the notification_as_csv parameter to request the Collection results be delivered in CSV format and the destination_ids parameter to request that the Result Sets generated by running the Collection be uploaded to the specified Destination ID of "ABCDEFG":
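A minimal Python sketch of such a request is shown here, using only the standard library. Several details are assumptions rather than facts from this page: the base URL https://api.rainforestapi.com/collections, the Collection name, the email address, and the use of a JSON boolean for notification_as_csv. The helper build_create_collection_request is a hypothetical name, and it only builds the request so that it can be inspected before sending.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL; confirm it against your account documentation.
COLLECTIONS_URL = "https://api.rainforestapi.com/collections"


def build_create_collection_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /collections with a JSON body."""
    # api_key goes on the querystring; everything else goes in the body.
    url = COLLECTIONS_URL + "?" + urllib.parse.urlencode({"api_key": api_key})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = {
    "name": "My First Collection",          # hypothetical name
    "schedule_type": "daily",
    "schedule_hours": [9, 17],              # 9am and 5pm
    "notification_email": "john.smith@example.com",  # hypothetical address
    "notification_as_csv": True,            # assumed to be a JSON boolean
    "destination_ids": ["ABCDEFG"],
}
request = build_create_collection_request("demo", body)

# To actually send the request, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```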
Rainforest API responds with the following JSON confirming the Collection has been successfully created. Note the Collection id; you'll use it in subsequent calls to update or delete the Collection and to add Requests to it.
Collections can be scheduled on a monthly, weekly or daily basis. You can also set a Collection to be started manually, either via the Rainforest API Dashboard or the Start Collection API.
A Collection's schedule is defined by the schedule_type parameter. The values of schedule_type and their associated parameters are shown below:
Collection schedules are executed in the timezone set up on your profile.
schedule_type | Scheduling Parameters |
---|---|
monthly | schedule_days_of_month, schedule_hours |
weekly | schedule_days_of_week, schedule_hours |
daily | schedule_hours |
manual | When schedule_type=manual the Collection must be manually started, either via the Rainforest API Dashboard or the Start Collection API. |
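As a rough client-side check, the pairing of schedule_type values with their scheduling parameters could be sketched as below. The mapping is inferred from the parameter descriptions in this reference and is not an official validation rule. The function name check_schedule is hypothetical, and the sketch assumes schedule_type is always given explicitly.

```python
from typing import Any, Dict, List

# Parameters each schedule_type is expected to be paired with (inferred, not official).
EXPECTED_PARAMS = {
    "monthly": ("schedule_days_of_month", "schedule_hours"),
    "weekly": ("schedule_days_of_week", "schedule_hours"),
    "daily": ("schedule_hours",),
    "minutes": ("schedule_minutes",),
    "manual": (),
}


def check_schedule(params: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    schedule_type = params.get("schedule_type")
    if schedule_type not in EXPECTED_PARAMS:
        return [f"unknown schedule_type: {schedule_type!r}"]

    problems = []
    for name in EXPECTED_PARAMS[schedule_type]:
        if name not in params:
            problems.append(f"{schedule_type} schedule is missing {name}")

    # With schedule_type=minutes, days of week and days of month are mutually exclusive.
    if (
        schedule_type == "minutes"
        and "schedule_days_of_week" in params
        and "schedule_days_of_month" in params
    ):
        problems.append(
            "minutes schedule: use schedule_days_of_week or schedule_days_of_month, not both"
        )

    # Hours run from 0 (midnight) to 23 (11pm); weekdays from 0 (Sunday) to 6 (Saturday).
    for hour in params.get("schedule_hours", []):
        if not 0 <= hour <= 23:
            problems.append(f"schedule_hours value out of range: {hour}")
    for day in params.get("schedule_days_of_week", []):
        if not 0 <= day <= 6:
            problems.append(f"schedule_days_of_week value out of range: {day}")
    return problems
```

Running a check like this before calling the API can catch an inconsistent schedule early, although the API's own validation remains authoritative.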
POST /collections
The following parameters can be used when creating a Collection. The api_key parameter should be supplied as a querystring parameter on the request URL.
Parameters other than api_key should be supplied in either x-www-form-urlencoded or JSON format, in the request body.
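The split between querystring and body can be illustrated with a form-encoded variant, sketched here in Python. The base URL, Collection name and email address are assumptions, and build_form_request is a hypothetical helper. Because this page does not say how array parameters are encoded in x-www-form-urlencoded bodies, the sketch sticks to scalar fields.

```python
import urllib.parse
import urllib.request

# Assumed base URL; confirm it against your account documentation.
COLLECTIONS_URL = "https://api.rainforestapi.com/collections"


def build_form_request(api_key: str, fields: dict) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST to /collections."""
    # Only api_key is placed on the querystring.
    url = COLLECTIONS_URL + "?" + urllib.parse.urlencode({"api_key": api_key})
    # All other parameters are form-encoded into the request body.
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


form_request = build_form_request(
    "demo",
    {
        "name": "Manual Collection",              # hypothetical name
        "schedule_type": "manual",
        "notification_email": "ops@example.com",  # hypothetical address
    },
)
```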
Parameter | Required | Description |
---|---|---|
api_key | required | The API key for your account. Supply it on the request querystring, e.g. api_key=demo |
name | required | The name of the Collection. |
enabled | optional | Determines whether the Collection is enabled or not. Disabled Collections do not automatically start on their schedule. Parameter Values
|
schedule_type | optional | Determines the type of schedule the Collection uses. Parameter Values
|
priority | optional | Determines the priority of the Collection. When multiple Collections are queued, Rainforest API starts them in priority order. Learn more about priorities . Parameter Values
|
schedule_days_of_month | optional | Determines the days of the month the Collection is run when schedule_type=monthly or schedule_type=minutes, expressed as an array of integers, e.g. [1,3,20] for the 1st, 3rd and 20th of the month. Note: When using schedule_days_of_month in combination with schedule_type=minutes, the Collection is executed at the per-minute frequency set in schedule_minutes on the days of the month set in schedule_days_of_month. For example, if you set schedule_type=minutes, schedule_minutes=every_5_minutes and schedule_days_of_month=[1,3,20], your Collection would be run every 5 minutes on the 1st, 3rd and 20th of the month. When schedule_type=minutes you can use either schedule_days_of_month or schedule_days_of_week, not both simultaneously. |
schedule_days_of_week | optional | Determines the days of the week the Collection is run when schedule_type=weekly or schedule_type=minutes, expressed as an array of integers where 0=Sunday and 6=Saturday, e.g. [0,2] for Sunday & Tuesday. Note: When using schedule_days_of_week in combination with schedule_type=minutes, the Collection is executed at the per-minute frequency set in schedule_minutes on the days of the week set in schedule_days_of_week. For example, if you set schedule_type=minutes, schedule_minutes=every_5_minutes and schedule_days_of_week=[0,2], your Collection would be run every 5 minutes on Sunday & Tuesday. When schedule_type=minutes you can use either schedule_days_of_week or schedule_days_of_month, not both simultaneously. |
schedule_hours | optional | Determines the hours of the day the Collection is run when schedule_type=minutes, schedule_type=monthly, schedule_type=weekly or schedule_type=daily. Expressed as an array of integers between 0 (midnight) and 23 (11pm), e.g. [9,17] for 9am & 5pm. Note: Collection schedules are executed in the timezone set up on your profile. Note: When using schedule_hours in combination with schedule_type=minutes, the Collection is executed at the per-minute frequency set in schedule_minutes throughout the hours set in schedule_hours. For example, if you set schedule_type=minutes, schedule_minutes=every_5_minutes and schedule_hours=[10,11,12], your Collection would be run every 5 minutes between the hours of 10am and 1pm. If you want a schedule_type=minutes Collection to run continuously 24/7, omit the schedule_hours parameter. |
schedule_minutes | optional | Determines the minutes frequency that the Collection is run when schedule_type=minutes . Parameter Values
Note When using schedule_type=minutes Collections are run at the selected frequency 24/7 unless specific hours are chosen in the schedule_hours parameter, specific days of the week are chosen in the schedule_days_of_week parameter or specific days of the month are chosen in the schedule_days_of_month parameter. |
destination_ids | optional | Specifies an array of Destination IDs (i.e. Amazon S3 Buckets, Google Cloud Storage buckets, Microsoft Azure Blob Storage or Alibaba Cloud OSS Buckets) that Result Sets from this Collection are uploaded to. Destination IDs can be retrieved through the Dashboard or via the Destinations API. |
notification_email | optional | The email address to send notifications to when new Result Sets are available for this Collection. |
notification_webhook | optional | The URL Rainforest API will send a webhook POST to when new Result Sets are available for this Collection. Should be a fully qualified URL, i.e. https://yourserver.com/rainforestapi |
notification_as_json | optional | Determines whether Rainforest API sends download links (in the email notification if notification_email is set, or in the body of the webhook POST if notification_webhook is set) for the Collection Result Sets in JSON format, or not. If destination_ids are configured for this Collection then this setting determines whether Result Set files in JSON format are uploaded to the Destinations on Collection completion. Parameter Values
|
notification_as_jsonlines | optional | Determines whether Rainforest API sends download links (in the email notification if notification_email is set, or in the body of the webhook POST if notification_webhook is set) for the Collection Result Sets in JSON Lines format, or not. If destination_ids are configured for this Collection then this setting determines whether Result Set files in JSON Lines format are uploaded to the Destinations on Collection completion. Parameter Values
|
notification_as_csv | optional | Determines whether Rainforest API sends download links (in the email notification if notification_email is set, or in the body of the webhook POST if notification_webhook is set) for the Collection Result Sets in CSV format, or not. To set the CSV fields included in the Result Set use the notification_csv_fields parameter. If destination_ids are configured for this Collection then this setting determines whether Result Set files in CSV format are uploaded to the Destinations on Collection completion. Parameter Values
|
notification_csv_fields | optional | Determines the fields that are returned when notification_as_csv=true. Should be specified as a comma-separated list of fields in nested-field dot notation. For more information see the CSV Fields Reference. |
requests_type | optional | Determines whether this Collection is locked to only allow requests of the specified type to be added to it. Locking a Collection to a specific request type has several benefits, including allowing Rainforest API to automatically select the CSV fields appropriate to the requests in the Collection by default when exporting a Result Set in CSV mode. It can also help organize your account when you have many Collections set up. To allow any request type to be added, either omit requests_type entirely or set it to mixed. When set, the requests_type_locked=true property is present when retrieving the Collection, letting you know that the Collection is locked. If you leave the Collection unlocked (i.e. by setting requests_type=mixed) then requests_type_locked=false will be present when retrieving the Collection and requests_type will be set to either mixed (when the Collection contains mixed request types) or a requests_type value automatically detected from the type of requests subsequently added to the Collection. Note: The requests_type cannot be changed after the Collection is created. For more information see the Collection Locking documentation. Valid values for requests_type are: Parameter Values
|