Introduction
In this blog, I am going show how to use the Salesforce Bulk API query features to fetch a large amount of data from the salesforce.Use the bulk query to efficiently query large data sets and reduce the number of API requests. A bulk query can retrieve up to 15 GB of data, divided into 15 1-GB files. The data formats supported are CSV, XML, and JSON.When a bulk query is processed, Salesforce attempts to execute the query. If the query doesn’t execute within the standard 2-minute timeout limit, the job fails and a QUERY_TIMEOUT error is returned. In this case, rewrite a simpler query and resubmit the batch. If the query succeeds, Salesforce attempts to retrieve the results. If the results exceed the 1-GB file size limit or take longer than 10 minutes to retrieve, the completed results are cached and another attempt is made. After 15 attempts, the job fails and the error message “Retried more than fifteen times” is returned. In this case, consider using the PK Chunking header to split the query results into smaller chunks. If the attempts succeed, the results are returned and stored for seven days. The following flowchart depicts how bulk queries are processed.
Walk Through a Bulk Query Sample
Step 1: Log In Using the SOAP API
The Bulk API doesn’t provide a login operation, so you must use SOAP API to log in.Create a text file called login.txt containing the following text and replace the username and password
<?xml version="1.0" encoding="utf-8" ?> <env:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"> <env:Body> <n1:login xmlns:n1="urn:partner.soap.sforce.com"> <n1:username>your_username</n1:username> <n1:password>your_password</n1:password> </n1:login> </env:Body> </env:Envelope>
Run the following command from the cURL command line
curl https://login.salesforce.com/services/Soap/u/42.0 -H "Content-Type: text/xml; charset=UTF-8" -H "SOAPAction: login" -d @login.txt
Which will return the an XML response that includes <sessionId> and elements. We are going to use the sessionId for the subsequent request authentication.
<?xml version="1.0" encoding="UTF-8"?> <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns="urn:partner.soap.sforce.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <soapenv:Body> <loginResponse> <result> <metadataServerUrl>https://asdasdasdasdasd-dev-ed.my.salesforce.com/services/Soap/m/42.0/00D6A000002EfbH</metadataServerUrl> <passwordExpired>false</passwordExpired> <sandbox>false</sandbox> <serverUrl>https://asdasdasdasdasd-dev-ed.my.salesforce.com/services/Soap/u/42.0/00D6A000002EfbH</serverUrl> <sessionId>00D6A000002EfbH!AQwAQJsqae03YeAqwkWXd03FO97eEAfNpCiqoAbaiIQoDFzW3qqmWDL9TvQHDsXVRlMIYzH_Sevb9c_ZGJX76wD_pNGI7mnc</sessionId> <userId>0056A000001dwToQAI</userId> <userInfo> <accessibilityMode>false</accessibilityMode> <chatterExternal>false</chatterExternal> <currencySymbol>$</currencySymbol> <orgAttachmentFileSizeLimit>5242880</orgAttachmentFileSizeLimit> <orgDefaultCurrencyIsoCode>USD</orgDefaultCurrencyIsoCode> <orgDefaultCurrencyLocale>en_US</orgDefaultCurrencyLocale> <orgDisallowHtmlAttachments>false</orgDisallowHtmlAttachments> <orgHasPersonAccounts>false</orgHasPersonAccounts> <organizationId>00D6A000002EfbHUAS</organizationId> <organizationMultiCurrency>false</organizationMultiCurrency> <organizationName>Alumni Center Services</organizationName> <profileId>00e6A000001YVOQQA4</profileId> <roleId xsi:nil="true" /> <sessionSecondsValid>7200</sessionSecondsValid> <userDefaultCurrencyIsoCode xsi:nil="true" /> <userEmail>rajamohanvakati@gmail.com</userEmail> <userFullName>Rajamohan vakati</userFullName> <userId>0056A000001dwToQAI</userId> <userLanguage>en_US</userLanguage> <userLocale>en_US</userLocale> <userName>rvakati@spring18.com</userName> <userTimeZone>America/Los_Angeles</userTimeZone> <userType>Standard</userType> <userUiSkin>Theme3</userUiSkin> </userInfo> </result> </loginResponse> </soapenv:Body> </soapenv:Envelope>
Step 2.Create a Job
Create a file called create-job.xml containing the following text.
<?xml version="1.0" encoding="UTF-8"?> <jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload"> <operation>query</operation> <object>Contact</object> <concurrencyMode>Parallel</concurrencyMode> <contentType>CSV</contentType> </jobInfo>
Using a command-line window, execute the following cURL command to create a job. Replace the session Id which you got in step 1.
curl -H "X-SFDC-Session: 00D6A000002EfbH!AQwAQJsqae03YeAqwkWXd03FO97eEAfNpCiqoAbaiIQoDFzW3qqmWDL9TvQHDsXVRlMIYzH_Sevb9c_ZGJX76wD_pNGI7mnc" -H "Content-Type: application/xml; charset=UTF-8" -d @create-job.xml https://asdasdasdasdasd-dev-ed.my.salesforce.com/services/async/42.0/job
You will get the response which will return job information.
<?xml version="1.0" encoding="UTF-8"?><jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload"> <id>7506A00000496mnQAA</id> <operation>query</operation> <object>Contact</object> <createdById>0056A000001dwToQAI</createdById> <createdDate>2018-03-27T18:01:13.000Z</createdDate> <systemModstamp>2018-03-27T18:01:13.000Z</systemModstamp> <state>Open</state> <concurrencyMode>Parallel</concurrencyMode> <contentType>CSV</contentType> <numberBatchesQueued>0</numberBatchesQueued> <numberBatchesInProgress>0</numberBatchesInProgress> <numberBatchesCompleted>0</numberBatchesCompleted> <numberBatchesFailed>0</numberBatchesFailed> <numberBatchesTotal>0</numberBatchesTotal> <numberRecordsProcessed>0</numberRecordsProcessed> <numberRetries>0</numberRetries> <apiVersion>42.0</apiVersion> <numberRecordsFailed>0</numberRecordsFailed> <totalProcessingTime>0</totalProcessingTime> <apiActiveProcessingTime>0</apiActiveProcessingTime> <apexProcessingTime>0</apexProcessingTime> </jobInfo>
Step 3: Add a Batch to the Job
Create a file called query.txt to contain the SOQL query statement. Bulk Query will fetch the data based on this Query
SELECT Id, FirstName,LastName,Email FROM Contact
Using a command-line window, execute the following cURL command to add a batch to the job. Replace JobId which you hot in step 2
curl -d @query.txt -H "X-SFDC-Session: 00D6A000002EfbH!AQwAQJsqae03YeAqwkWXd03FO97eEAfNpCiqoAbaiIQoDFzW3qqmWDL9TvQHDsXVRlMIYzH_Sevb9c_ZGJX76wD_pNGI7mnc" -H "Content-Type: text/csv; charset=UTF-8" https://asdasdasdasdasd-dev-ed.my.salesforce.com/services/async/42.0/job/7506A00000496mnQAA/batch
You will get the batch Id and batch information as response shown below.
<?xml version="1.0" encoding="UTF-8"?><batchInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload"> <id>7516A000003RFDpQAO</id> <jobId>7506A00000496mnQAA</jobId> <state>Queued</state> <createdDate>2018-03-27T18:02:21.000Z</createdDate> <systemModstamp>2018-03-27T18:02:21.000Z</systemModstamp> <numberRecordsProcessed>0</numberRecordsProcessed> <numberRecordsFailed>0</numberRecordsFailed> <totalProcessingTime>0</totalProcessingTime> <apiActiveProcessingTime>0</apiActiveProcessingTime> <apexProcessingTime>0</apexProcessingTime> </batchInfo>
Step 4: Check the Status of the Job and Batch
Using a command-line window, execute the following cURL command to check the job status.
curl -H "X-SFDC-Session: 00D6A000002EfbH!AQwAQJsqae03YeAqwkWXd03FO97eEAfNpCiqoAbaiIQoDFzW3qqmWDL9TvQHDsXVRlMIYzH_Sevb9c_ZGJX76wD_pNGI7mnc" https://asdasdasdasdasd-dev-ed.my.salesforce.com/services/async/42.0/job/7506A00000496mnQAA
You will get the below response along with the job status
<?xml version="1.0" encoding="UTF-8"?><jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload"> <id>7506A00000496mnQAA</id> <operation>query</operation> <object>Contact</object> <createdById>0056A000001dwToQAI</createdById> <createdDate>2018-03-27T18:01:13.000Z</createdDate> <systemModstamp>2018-03-27T18:01:13.000Z</systemModstamp> <state>Open</state> <concurrencyMode>Parallel</concurrencyMode> <contentType>CSV</contentType> <numberBatchesQueued>0</numberBatchesQueued> <numberBatchesInProgress>0</numberBatchesInProgress> <numberBatchesCompleted>1</numberBatchesCompleted> <numberBatchesFailed>0</numberBatchesFailed> <numberBatchesTotal>1</numberBatchesTotal> <numberRecordsProcessed>74</numberRecordsProcessed> <numberRetries>0</numberRetries> <apiVersion>42.0</apiVersion> <numberRecordsFailed>0</numberRecordsFailed> <totalProcessingTime>0</totalProcessingTime> <apiActiveProcessingTime>0</apiActiveProcessingTime> <apexProcessingTime>0</apexProcessingTime> </jobInfo>
Using a command-line window, execute the following cURL command to check the batch status.
curl -H "X-SFDC-Session: 00D6A000002EfbH!AQwAQJsqae03YeAqwkWXd03FO97eEAfNpCiqoAbaiIQoDFzW3qqmWDL9TvQHDsXVRlMIYzH_Sevb9c_ZGJX76wD_pNGI7mnc" https://asdasdasdasdasd-dev-ed.my.salesforce.com/services/async/42.0/job/7506A00000496mnQAA/batch/7516A000003RFDpQAO >batchress.txt
Which will return the batch status
<?xml version="1.0" encoding="UTF-8"?><batchInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload"> <id>7516A000003RFDpQAO</id> <jobId>7506A00000496mnQAA</jobId> <state>Completed</state> <createdDate>2018-03-27T18:02:21.000Z</createdDate> <systemModstamp>2018-03-27T18:02:21.000Z</systemModstamp> <numberRecordsProcessed>74</numberRecordsProcessed> <numberRecordsFailed>0</numberRecordsFailed> <totalProcessingTime>0</totalProcessingTime> <apiActiveProcessingTime>0</apiActiveProcessingTime> <apexProcessingTime>0</apexProcessingTime> </batchInfo>
Step 5: Retrieve the Results
Using the command-line window, execute the following cURL command to retrieve the batch result list.
curl -H "X-SFDC-Session: 00D6A000002EfbH!AQwAQJsqae03YeAqwkWXd03FO97eEAfNpCiqoAbaiIQoDFzW3qqmWDL9TvQHDsXVRlMIYzH_Sevb9c_ZGJX76wD_pNGI7mnc" https://asdasdasdasdasd-dev-ed.my.salesforce.com/services/async/42.0/job/7506A00000496mnQAA/batch/7516A000003RFDpQAO/result >finalRes.txt
Salesforce returns an XML response with data such as the following.
<?xml version="1.0" encoding="UTF-8"?> <result-list xmlns="http://www.force.com/2009/06/asyncapi/dataload"> <result>7526A000001yxM0</result> </result-list>
Using the command-line window, execute the following cURL command to retrieve the results of the query in the result.csv file .
curl -H "X-SFDC-Session: 00D6A000002EfbH!AQwAQJsqae03YeAqwkWXd03FO97eEAfNpCiqoAbaiIQoDFzW3qqmWDL9TvQHDsXVRlMIYzH_Sevb9c_ZGJX76wD_pNGI7mnc" https://asdasdasdasdasd-dev-ed.my.salesforce.com/services/async/42.0/job/7506A00000496mnQAA/batch/7516A000003RFDpQAO/result/7526A000001yxM0 >result.csv
Step 6: Close the Job
Create a file called close-job.xml containing the following text.
?xml version="1.0" encoding="UTF-8"?> <jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload"> <state>Closed</state> </jobInfo>
Using a command-line window, execute the following cURL command to close the job.
curl -H "X-SFDC-Session: 00D6A000002EfbH!AQwAQJsqae03YeAqwkWXd03FO97eEAfNpCiqoAbaiIQoDFzW3qqmWDL9TvQHDsXVRlMIYzH_Sevb9c_ZGJX76wD_pNGI7mnc" -H "Content-Type: text/csv; charset=UTF-8" -d @close-job.xml https://asdasdasdasdasd-dev-ed.my.salesforce.com/services/async/42.0/job/7506A00000496mnQAA
Salesforce returns an XML response with data such as the following.
<?xml version="1.0" encoding="UTF-8"?><jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload"> <id>7506A00000496mnQAA</id> <operation>query</operation> <object>Contact</object> <createdById>0056A000001dwToQAI</createdById> <createdDate>2018-03-27T18:01:13.000Z</createdDate> <systemModstamp>2018-03-27T18:01:13.000Z</systemModstamp> <state>Closed</state> <concurrencyMode>Parallel</concurrencyMode> <contentType>CSV</contentType> <numberBatchesQueued>0</numberBatchesQueued> <numberBatchesInProgress>0</numberBatchesInProgress> <numberBatchesCompleted>1</numberBatchesCompleted> <numberBatchesFailed>0</numberBatchesFailed> <numberBatchesTotal>1</numberBatchesTotal> <numberRecordsProcessed>74</numberRecordsProcessed> <numberRetries>0</numberRetries> <apiVersion>42.0</apiVersion> <numberRecordsFailed>0</numberRecordsFailed> <totalProcessingTime>0</totalProcessingTime> <apiActiveProcessingTime>0</apiActiveProcessingTime> <apexProcessingTime>0</apexProcessingTime> </jobInfo>