Create Collection
Create a collection to run up to 15,000 requests (100 when adding requests with include_html=true set) on a schedule. You can have up to 10,000 collections set up on your Rainforest API account at any one time.

Collections are created by making an HTTP POST to the /collections endpoint. POST parameters can be supplied either as x-www-form-urlencoded parameters or by POSTing a JSON object to the /collections endpoint.

Using include_html=true in Collections

When adding requests with include_html=true to a collection, the maximum number of requests per collection is 100 (rather than 15,000). Including the HTML within the response makes the collection result sets much larger, so the limit is in place to keep result set files at a manageable size. If you need to run a large number of requests, all with include_html=true, split the requests across multiple 100-request collections.

Example

POST /collections

Let's start with a simple example: we'll create a collection that is set to run every day at 9am and 5pm. We'll use the notification_email parameter to specify an email address to send notifications to (when the collection completes), the notification_as_csv parameter to request that the collection results be delivered in CSV format, and the destination_ids parameter to request that the result sets generated by running the collection be uploaded to the destination with the ID "abcdefg".

cURL

```shell
$ curl "https://api.rainforestapi.com/collections?api_key=demo" \
  -d name="my first collection" \
  -d enabled=true \
  -d schedule_type="daily" \
  -d priority="normal" \
  -d schedule_hours="9,17" \
  -d destination_ids="abcdefg" \
  -d notification_email="john.smith@example.com" \
  -d notification_as_csv=true
```

Node.js

```javascript
const axios = require('axios');

// parameters for the new collection, sent as a JSON body
const body = {
  name: 'my first collection',
  enabled: true,
  schedule_type: 'daily',
  priority: 'normal',
  schedule_hours: [9, 17],
  destination_ids: ['abcdefg'],
  notification_email: 'john.smith@example.com',
  notification_as_csv: true
};

axios.post('https://api.rainforestapi.com/collections?api_key=demo', body)
  .then(response => {
    const apiResponse = response.data;
    console.log('collection created ' + JSON.stringify(apiResponse, null, 2));
  })
  .catch(error => {
    console.log(error);
  });
```

Python

```python
import json

import requests

# parameters for the new collection, sent as a JSON body
body = {
    "name": "my first collection",
    "enabled": True,
    "schedule_type": "daily",
    "priority": "normal",
    "schedule_hours": [9, 17],
    "destination_ids": ["abcdefg"],
    "notification_email": "john.smith@example.com",
    "notification_as_csv": True
}

api_result = requests.post('https://api.rainforestapi.com/collections?api_key=demo', json=body)
api_response = api_result.json()
print("collection created", json.dumps(api_response))
```

PHP

```php
<?php
# parameters for the new collection, sent as x-www-form-urlencoded
$body = http_build_query([
  "name" => "my first collection",
  "enabled" => true,
  "schedule_type" => "daily",
  "priority" => "normal",
  "schedule_hours" => "9,17",
  "destination_ids" => "abcdefg",
  "notification_email" => "john.smith@example.com",
  "notification_as_csv" => true
]);

$ch = curl_init('https://api.rainforestapi.com/collections?api_key=demo');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
# the following options are required if you're using an outdated OpenSSL version
# more details: https://www.openssl.org/blog/blog/2021/09/13/LetsEncryptRootCertExpire/
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_TIMEOUT, 180);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POSTFIELDS, $body);

$json = curl_exec($ch);
curl_close($ch);

$api_result = json_decode($json, true);
print_r($api_result);
echo "collection created", PHP_EOL;
?>
```

Rainforest API responds with the following JSON, confirming the collection has been successfully created. Note the collection id: you'll use it in subsequent calls to update (https://docs.trajectdata.com/rainforestapi/collections-api/collections/update), delete (https://docs.trajectdata.com/rainforestapi/collections-api/collections/delete) and add requests (https://docs.trajectdata.com/rainforestapi/collections-api/requests/create) to the collection.

```json
{
  "request_info": {
    "success": true
  },
  "collection": {
    "id": "f994420d",
    "created_at": "2020-01-01T00:00:00.000Z",
    "name": "my first collection",
    "schedule_type": "daily",
    "priority": "normal",
    "schedule_hours": [9, 17],
    "destination_ids": ["abcdefg"],
    "enabled": true,
    "status": "idle",
    "request_total_count": 0,
    "request_page_count": 0,
    "next_result_set_id": 1,
    "results_count": 0,
    "notification_email": "john.smith@example.com",
    "notification_as_json": false,
    "notification_as_csv": true,
    "request_type": "mixed",
    "request_type_locked": false
  }
}
```

Scheduling Collections

Collections can be scheduled on a monthly, weekly or daily basis (a per-X-minutes schedule is also available, see schedule_type under Collection Parameters). You can also set a collection to be started manually, either via the Rainforest API Dashboard (https://app.rainforestapi.com/collections) or the Start Collection API (https://docs.trajectdata.com/rainforestapi/collections-api/collections/start).

A collection's schedule is defined by the schedule_type parameter. The values of schedule_type, and the other associated parameters, are shown below.

Collection schedules are executed in the timezone set up on your profile (https://app.rainforestapi.com/profile).

schedule_type and scheduling parameters:

monthly
    schedule_days_of_month: array of days of the month to run the collection, e.g. [1,20] for the 1st & 20th.
    schedule_hours: array of hours of the day to run the collection, e.g. [9,17] for 9am & 5pm.

weekly
    schedule_days_of_week: array of days of the week (as integers) to run the collection, where 0=Sunday and 6=Saturday, e.g. [0,2] for Sunday & Tuesday.
    schedule_hours: array of hours of the day to run the collection, e.g. [9,17] for 9am & 5pm.

daily
    schedule_hours: array of hours of the day to run the collection, e.g. [9,17] for 9am & 5pm.

manual
    When schedule_type=manual the collection must be manually started, either via the Rainforest API Dashboard (https://app.rainforestapi.com/collections) or the Start Collection API (https://docs.trajectdata.com/rainforestapi/collections-api/collections/start).

Collection Parameters

POST /collections

The following parameters can be used when creating a collection. The api_key parameter should be supplied as a querystring parameter on the request URL. Parameters other than api_key should be supplied in the request body, in either x-www-form-urlencoded or JSON format.

api_key (required)
    The API key for your account. Should be supplied on the request querystring, e.g. api_key=demo.

name (required)
    The name of the collection.

enabled (optional)
    Determines whether the collection is enabled or not. Disabled collections do not automatically start on their schedule.
    Parameter values:
      true: collection is enabled
      false: collection is disabled

schedule_type (optional)
    Determines the type of schedule the collection uses.
    Parameter values:
      monthly: collection is run on a monthly schedule. Use schedule_days_of_month to determine which days of the month and schedule_hours to determine which hours on those days.
      weekly: collection is run on a weekly schedule. Use schedule_days_of_week to determine which days of the week and schedule_hours to determine which hours on those days.
      daily: collection is run on a daily schedule. Use schedule_hours to determine which hours of the day the collection is run.
      minutes: collection is run on a per-X-minutes schedule. Use schedule_minutes to determine the frequency, in minutes, at which the collection is run. You may optionally use schedule_hours, schedule_days_of_week and schedule_days_of_month to restrict the per-X-minutes schedule to certain hours of the day, days of the week or days of the month.
      manual: collection is run manually. Use the Rainforest API Dashboard (https://app.rainforestapi.com/collections) or the Start Collection API (https://docs.trajectdata.com/rainforestapi/collections-api/collections/start) to start the collection.

priority (optional)
    Determines the priority of the collection. When multiple collections are queued, Rainforest API starts them in priority order. Learn more about priorities (https://docs.trajectdata.com/rainforestapi/collections-api/priorities).
    Parameter values:
      highest
      high
      normal
      low
      lowest

schedule_days_of_month (optional)
    Determines the days of the month the collection is run when schedule_type=monthly or schedule_type=minutes, expressed as an array of integers, e.g. [1,3,20] for the 1st, 3rd and 20th of the month.
    Note: when using schedule_days_of_month in combination with schedule_type=minutes, the collection is executed at the per-minute frequency set in schedule_minutes on the days of the month set in schedule_days_of_month. For example, if you set schedule_type=minutes, schedule_minutes=every_5_minutes and schedule_days_of_month=[1,3,20], your collection would be run every 5 minutes on the 1st, 3rd and 20th of the month.
    When schedule_type=minutes you can use either schedule_days_of_month or schedule_days_of_week, not both simultaneously.

schedule_days_of_week (optional)
    Determines the days of the week the collection is run when schedule_type=weekly or schedule_type=minutes, expressed as an array of integers where 0=Sunday and 6=Saturday, e.g. [0,2] for Sunday & Tuesday.
    Note: when using schedule_days_of_week in combination with schedule_type=minutes, the collection is executed at the per-minute frequency set in schedule_minutes on the days of the week set in schedule_days_of_week. For example, if you set schedule_type=minutes, schedule_minutes=every_5_minutes and schedule_days_of_week=[0,2], your collection would be run every 5 minutes on Sunday & Tuesday.
    When schedule_type=minutes you can use either schedule_days_of_week or schedule_days_of_month, not both simultaneously.

schedule_hours (optional)
    Determines the hours of the day the collection is run when schedule_type=minutes, schedule_type=monthly, schedule_type=weekly or schedule_type=daily, expressed as an array of integers between 0 (midnight) and 23 (11pm), e.g. [9,17] for 9am & 5pm.
    Note: collection schedules are executed in the timezone set up on your profile (https://app.rainforestapi.com/profile).
    Note: when using schedule_hours in combination with schedule_type=minutes, the collection is executed at the per-minute frequency set in schedule_minutes throughout the hours set in schedule_hours. For example, if you set schedule_type=minutes, schedule_minutes=every_5_minutes and schedule_hours=[10,11,12], your collection would be run every 5 minutes between the hours of 10am and 1pm. If you want a schedule_type=minutes collection to run continuously, 24/7, you should omit the schedule_hours parameter.

schedule_minutes (optional)
    Determines the frequency, in minutes, at which the collection is run when schedule_type=minutes.
    Parameter values:
      every_minute: run the collection every minute
      every_5_minutes: run the collection every 5 minutes
      every_10_minutes: run the collection every 10 minutes
      every_15_minutes: run the collection every 15 minutes
      every_20_minutes: run the collection every 20 minutes
      every_25_minutes: run the collection every 25 minutes
      every_30_minutes: run the collection every 30 minutes
      every_hour: run the collection every hour
    Note: when using schedule_type=minutes, collections are run at the selected frequency 24/7 unless specific hours are chosen in the schedule_hours parameter, specific days of the week are chosen in the schedule_days_of_week parameter, or specific days of the month are chosen in the schedule_days_of_month parameter.

destination_ids (optional)
    Specifies an array of destination IDs (e.g. Amazon S3 buckets, Google Cloud Storage buckets, Microsoft Azure Blob Storage or Alibaba Cloud OSS buckets) that result sets from this collection are uploaded to. Destination IDs can be retrieved through the Dashboard or via the Destinations API.

notification_email (optional)
    The email address to send notifications to when new result sets (https://docs.trajectdata.com/rainforestapi/collections-api/results/list) are available for this collection.

notification_webhook (optional)
    The URL Rainforest API will send a webhook (https://docs.trajectdata.com/rainforestapi/collections-api/collections/webhook) POST to when new result sets (https://docs.trajectdata.com/rainforestapi/collections-api/results/list) are available for this collection. Should be a fully qualified URL, e.g. https://yourserver.com/rainforestapi.

notification_as_json (optional)
    Determines whether Rainforest API sends download links for the collection result sets (https://docs.trajectdata.com/rainforestapi/collections-api/results/list) in JSON format. The links are sent in the email notification if notification_email is set, or in the body of the webhook (https://docs.trajectdata.com/rainforestapi/collections-api/collections/webhook) POST if notification_webhook is set.
    If destination_ids are configured for this collection, this setting also determines whether result set files in JSON format are uploaded to the destinations on collection completion.
    Parameter values:
      true: JSON output format enabled / sent
      false: JSON output format disabled / not sent

notification_as_jsonlines (optional)
    Determines whether Rainforest API sends download links for the collection result sets (https://docs.trajectdata.com/rainforestapi/collections-api/results/list) in JSON Lines format. The links are sent in the email notification if notification_email is set, or in the body of the webhook (https://docs.trajectdata.com/rainforestapi/collections-api/collections/webhook) POST if notification_webhook is set.
    If destination_ids are configured for this collection, this setting also determines whether result set files in JSON Lines format are uploaded to the destinations on collection completion.
    Parameter values:
      true: JSON Lines output format enabled / sent
      false: JSON Lines output format disabled / not sent

notification_as_csv (optional)
    Determines whether Rainforest API sends download links for the collection result sets (https://docs.trajectdata.com/rainforestapi/collections-api/results/list) in CSV format. The links are sent in the email notification if notification_email is set, or in the body of the webhook (https://docs.trajectdata.com/rainforestapi/collections-api/collections/webhook) POST if notification_webhook is set. To set the CSV fields included in the result set, use the notification_csv_fields parameter.
    If destination_ids are configured for this collection, this setting also determines whether result set files in CSV format are uploaded to the destinations on collection completion.
    Parameter values:
      true: CSV output format enabled / sent
      false: CSV output format disabled / not sent

notification_csv_fields (optional)
    Determines the fields that are returned when notification_as_csv=true. Should be specified as a comma-separated list of fields (in nested-field, dot-notation format). For more information see the CSV Fields Reference (https://docs.trajectdata.com/rainforestapi/product-data-api/reference/csv-fields).

requests_type (optional)
    Determines whether this collection is locked to only allow requests of the specified type to be added to it.
    Locking a collection to a specific request type has several benefits. It allows Rainforest API to select, by default, the CSV fields appropriate to the requests in the collection when exporting a result set in CSV mode. It can also help organize your account when you have many collections set up.
    To allow requests of any type to be added, either omit requests_type entirely or set it to mixed.
    When a specific type is set, the requests_type_locked=true property is present when retrieving the collection, letting you know that the collection is locked. If you leave the collection unlocked (e.g. by setting requests_type=mixed) then requests_type_locked=false will be present when retrieving the collection, and requests_type will be set either to mixed (when the collection contains requests of mixed types) or to a requests_type value automatically detected from the type of requests that have subsequently been added to the collection.
    Note: the requests_type cannot be changed after the collection is created.
    For more information see the Collection Locking documentation (https://docs.trajectdata.com/rainforestapi/collections-api/locking).
    Valid values for requests_type are:
      mixed: allow requests of any type to be added to this collection
      product: lock the collection to allow only requests of type 'product' to be added
      reviews: lock the collection to allow only requests of type 'reviews' to be added
      offers: lock the collection to allow only requests of type 'offers' to be added
      search: lock the collection to allow only requests of type 'search' to be added
      bestsellers: lock the collection to allow only requests of type 'bestsellers' to be added
      category: lock the collection to allow only requests of type 'category' to be added
      deals: lock the collection to allow only requests of type 'deals' to be added
      also_bought: lock the collection to allow only requests of type 'also_bought' to be added
      stock_estimation: lock the collection to allow only requests of type 'stock_estimation' to be added
      questions: lock the collection to allow only requests of type 'questions' to be added
      question_answers: lock the collection to allow only requests of type 'question_answers' to be added
      seller_profile: lock the collection to allow only requests of type 'seller_profile' to be added
      seller_feedback: lock the collection to allow only requests of type 'seller_feedback' to be added
      seller_products: lock the collection to allow only requests of type 'seller_products' to be added
      reviewer_profile: lock the collection to allow only requests of type 'reviewer_profile' to be added
      autocomplete: lock the collection to allow only requests of type 'autocomplete' to be added
      author_page: lock the collection to allow only requests of type 'author_page' to be added
      shop_by_look: lock the collection to allow only requests of type 'shop_by_look' to be added
      asin_to_gtin: lock the collection to allow only requests of type 'asin_to_gtin' to be added
      store: lock the collection to allow only requests of type 'store' to be added
      charts: lock the collection to allow only requests of type 'charts' to be added
      sales_estimation: lock the collection to allow only requests of type 'sales_estimation' to be added
      formats_editions: lock the collection to allow only requests of type 'formats_editions' to be added

Next Steps

Update Collections (https://docs.trajectdata.com/rainforestapi/collections-api/collections/update)
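
Appendix: checking scheduling parameters client-side

The scheduling rules described on this page (hours between 0 and 23, days of the week between 0 and 6, no mixing of days-of-month and days-of-week on a minutes schedule, and the 100-request cap when include_html=true) can be checked before a request is sent. The following is a minimal sketch only, not part of the Rainforest API or any official client: the helper names build_collection_body and split_into_collections are hypothetical, the rules are limited to those stated above, and the API itself may enforce additional constraints.

```python
from typing import Any, Dict, List, Optional

# Values taken from the parameter descriptions on this page.
SCHEDULE_TYPES = {"monthly", "weekly", "daily", "minutes", "manual"}
SCHEDULE_MINUTES = {
    "every_minute", "every_5_minutes", "every_10_minutes", "every_15_minutes",
    "every_20_minutes", "every_25_minutes", "every_30_minutes", "every_hour",
}
MAX_REQUESTS = 15_000           # per-collection limit
MAX_REQUESTS_WITH_HTML = 100    # per-collection limit with include_html=true


def build_collection_body(
    name: str,
    schedule_type: str,
    schedule_hours: Optional[List[int]] = None,
    schedule_days_of_week: Optional[List[int]] = None,
    schedule_days_of_month: Optional[List[int]] = None,
    schedule_minutes: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate scheduling options and return a JSON-ready body."""
    if schedule_type not in SCHEDULE_TYPES:
        raise ValueError(f"unknown schedule_type: {schedule_type!r}")
    if any(not 0 <= hour <= 23 for hour in schedule_hours or []):
        raise ValueError("schedule_hours must be integers between 0 and 23")
    if any(not 0 <= day <= 6 for day in schedule_days_of_week or []):
        raise ValueError("schedule_days_of_week must be integers between 0 and 6")
    # The page does not state a range for schedule_days_of_month, so none is checked.
    if schedule_type == "minutes":
        if schedule_minutes not in SCHEDULE_MINUTES:
            raise ValueError(f"unknown schedule_minutes: {schedule_minutes!r}")
        if schedule_days_of_week and schedule_days_of_month:
            raise ValueError(
                "use either schedule_days_of_week or schedule_days_of_month, not both"
            )

    body: Dict[str, Any] = {"name": name, "schedule_type": schedule_type}
    optional = {
        "schedule_hours": schedule_hours,
        "schedule_days_of_week": schedule_days_of_week,
        "schedule_days_of_month": schedule_days_of_month,
        "schedule_minutes": schedule_minutes,
    }
    # Only include the options that were actually provided.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


def split_into_collections(requests_list: List[Any], include_html: bool) -> List[List[Any]]:
    """Hypothetical helper: chunk requests so each chunk fits in one collection."""
    limit = MAX_REQUESTS_WITH_HTML if include_html else MAX_REQUESTS
    return [requests_list[i:i + limit] for i in range(0, len(requests_list), limit)]
```

The dictionary returned by build_collection_body could then be passed as the json= argument in the Python example above, and each chunk from split_into_collections would go into its own collection.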