External Data Operations on Salesforce Analytics Using Mulesoft Salesforce Analytics Connector Part 3

This post is a continuation of the series dedicated to Salesforce Analytics integration using MuleSoft's Salesforce Analytics Connector.

By Rahul Kumar · Jun. 26, 18 · Analysis

If you have missed reading Part 1 and Part 2 of the series, make sure you read them first.

We already covered some scenarios in the previous posts. In this post, we will use the upsert and delete operations on a Salesforce Analytics dataset.

Scenario 3: Creating a Dataset With a Primary Key and Inserting/Updating Records in the Dataset

This is a sample metadata file that I have created for our example:

{
  "fileFormat": {
    "charsetName": "UTF-8",
    "fieldsDelimitedBy": ",",
    "fieldsEnclosedBy": "\"",
    "linesTerminatedBy": "\n",
    "numberOfLinesToIgnore": 1
  },
  "objects": [{
    "connector": "CSV",
    "fullyQualifiedName": "uniquedataforWave_csv",
    "label": "Unique Data for Wave",
    "name": "uniquedataforWave_csv",
    "fields": [{
      "fullyQualifiedName": "CustomerID",
      "name": "CustomerID",
      "type": "Text",
      "label": "Customer ID"
    }, {
      "fullyQualifiedName": "PurchaseID",
      "name": "PurchaseID",
      "type": "Text",
      "label": "Purchase ID"
    }, {
      "fullyQualifiedName": "ItemsPurchased",
      "name": "ItemsPurchased",
      "type": "Text",
      "label": "Items Purchased",
      "isMultiValue": true,
      "multiValueSeparator": ";"
    }, {
      "fullyQualifiedName": "PurchaseAmount",
      "label": "Purchase Amount",
      "name": "PurchaseAmount",
      "defaultValue": "0",
      "type": "Numeric",
      "precision": 10,
      "scale": 2,
      "format": "$#,##0.00"
    }, {
      "fullyQualifiedName": "PurchaseDate",
      "name": "PurchaseDate",
      "type": "Date",
      "label": "Purchase Date",
      "format": "MM/dd/yyyy",
      "firstDayOfWeek": -1,
      "fiscalMonthOffset": 0,
      "isYearEndFiscalYear": true
    }, {
      "fullyQualifiedName": "PrimaryKeyField",
      "name": "PrimaryKeyField",
      "type": "Text",
      "label": "PrimaryKeyField",
      "isUniqueId": true,
      "description": "Combination of Purchase ID and Purchase Date"
    }]
  }]
}

Please note that I have declared a primary key, "PrimaryKeyField," by using the attribute "isUniqueId" and setting it to true. This primary key is what enables us to upsert and delete records in the existing dataset. To create the dataset and upload data into Salesforce Wave with this metadata, we will use the UPSERT operation; APPEND can't be used when a primary key is declared. For an UPSERT operation, Wave identifies the records already present in the dataset and updates them with the new values, while the loaded records that are not yet present in the dataset are inserted.
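
As a side note (this transform is not part of the flow below; it is a minimal DataWeave sketch assuming the source records carry PurchaseID and a yyyy-MM-dd PurchaseDate string), the composite key described in the metadata could be derived like this:

%dw 1.0
%output application/java
---
// Hypothetical helper transform: derive PrimaryKeyField as "<PurchaseID>-<dd/MM/yyyy>"
payload map {
    CustomerID: $.CustomerID,
    PurchaseID: $.PurchaseID,
    ItemsPurchased: $.ItemsPurchased,
    PurchaseAmount: $.PurchaseAmount,
    PurchaseDate: $.PurchaseDate as :date {format: "yyyy-MM-dd"},
    PrimaryKeyField: $.PurchaseID ++ "-" ++ (($.PurchaseDate as :date {format: "yyyy-MM-dd"}) as :string {format: "dd/MM/yyyy"})
}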

   <sub-flow name="salesforce-analytics-batch-upsert-Sub_Flow">
        <set-variable variableName="dataSetContainerName" value="${dataSetContainerName}" doc:name="Variable : DataSetContainerName" doc:description="DataSet Container Name - Salesforce ID or Developer Name of the App in which Dataset is to be created"/>
        <enricher source="#[payload]" target="#[flowVars.datasetid]" doc:name="Message Enricher" doc:description="Get the Salesforce ID of the Dataset Created in a variable.">
            <sfdc-analytics:create-data-set config-ref="Salesforce_Analytics_Cloud__Basic_authentication2" operation="UPSERT" description="Sample data Set" label="Data Set 2" dataSetName="demodataset2" edgemartContainer="#[flowVars.dataSetContainerName]" type="metadata\uniqueDataForWave.json:RELATIVE" doc:name="Salesforce Analytics Cloud : Upsert DataSet"/>
        </enricher>
        <dw:transform-message doc:name="Create Sample Data for DataSet">
            <dw:set-payload><![CDATA[%dw 1.0
%output application/java
---
[
{"CustomerID":"CustomerID1","PurchaseID":"PurchaseID1","ItemsPurchased":["Item1","Item7","Item9"],"PurchaseAmount":1.1,"PurchaseDate":"2018-06-23"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID1-23/06/2018"},
{"CustomerID":"CustomerID2","PurchaseID":"PurchaseID2","ItemsPurchased":["Item2"],"PurchaseAmount":2.2,"PurchaseDate":"2018-06-24"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID2-24/06/2018"},
{"CustomerID":"CustomerID3","PurchaseID":"PurchaseID3","ItemsPurchased":["Item6","Item10"],"PurchaseAmount":3.3,"PurchaseDate":"2018-06-25"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID3-25/06/2018"},
{"CustomerID":"CustomerID4","PurchaseID":"PurchaseID4","ItemsPurchased":["Item4","Item2"],"PurchaseAmount":4.4,"PurchaseDate":"2018-06-26"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID4-26/06/2018"},
{"CustomerID":"CustomerID5","PurchaseID":"PurchaseID5","ItemsPurchased":["Item9","Item1"],"PurchaseAmount":5.5,"PurchaseDate":"2018-06-27"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID5-27/06/2018"},
{"CustomerID":"CustomerID6","PurchaseID":"PurchaseID6","ItemsPurchased":["Item6"],"PurchaseAmount":6.6,"PurchaseDate":"2018-06-28"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID6-28/06/2018"},
{"CustomerID":"CustomerID7","PurchaseID":"PurchaseID7","ItemsPurchased":["Item2","Item4"],"PurchaseAmount":7.7,"PurchaseDate":"2018-06-29"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID7-29/06/2018"},
{"CustomerID":"CustomerID8","PurchaseID":"PurchaseID8","ItemsPurchased":["Item1"],"PurchaseAmount":8.8,"PurchaseDate":"2018-06-30"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID8-30/06/2018"},
{"CustomerID":"CustomerID9","PurchaseID":"PurchaseID9","ItemsPurchased":["Item9","Item6"],"PurchaseAmount":9.9,"PurchaseDate":"2018-07-01"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID9-01/07/2018"},
{"CustomerID":"CustomerID10","PurchaseID":"PurchaseID10","ItemsPurchased":["Item1","Item8","Item9","Item4"],"PurchaseAmount":10.1,"PurchaseDate":"2018-07-02"as :date {format : "yyyy-MM-dd"},"PrimaryKeyField":"PurchaseID10-02/07/2018"}
]]]></dw:set-payload>
        </dw:transform-message>
        <batch:execute name="salesforce-analytics-appBatch2" doc:name="Batch Execute"/>
    </sub-flow>

    <batch:job name="salesforce-analytics-appBatch2">
        <batch:process-records>
            <batch:step name="Parts_Upload_Batch_Step">
                <batch:commit size="5" doc:name="Batch Commit">
                    <sfdc-analytics:upload-external-data config-ref="Salesforce_Analytics_Cloud__Basic_authentication2" type="metadata\uniqueDataForWave.json:RELATIVE" dataSetId="#[flowVars.datasetid]" doc:name="Salesforce Analytics Cloud : Upload Data Part">
                        <sfdc-analytics:payload ref="#[payload]"/>
                    </sfdc-analytics:upload-external-data>
                </batch:commit>
            </batch:step>
        </batch:process-records>
        <batch:on-complete>
            <sfdc-analytics:start-data-processing config-ref="Salesforce_Analytics_Cloud__Basic_authentication2" dataSetId="#[flowVars.datasetid]" doc:name="Salesforce Analytics Cloud : Trigger Data Processing" doc:description="Trigger the processing of the data that was uploaded in parts so far. Once the data processing is triggered, the status can be monitored in Data Manager."/>
        </batch:on-complete>
    </batch:job>

For the sake of illustration, I have kept the batch commit size at 5; in real use cases, the size should be larger. We have to choose a size keeping in mind that each "upload-external-data" operation creates an InsightsExternalDataPart object, which can accommodate up to 10 MB of data.
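
As a rough sizing sketch (the figures here are illustrative assumptions, not values from the connector documentation): if an average row serializes to about 200 bytes of CSV, one 10 MB part can hold roughly 10,485,760 / 200 ≈ 50,000 rows, so a commit size in the low tens of thousands keeps the number of data parts small while staying well under the per-part limit.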

Also, note that the sample records created in the DataWeave transform above are all unique. After the batch job completes, a dataset gets created and the records are inserted into the Analytics dataset. If the dataset already existed, the records are upserted (updated if a record with the same primary key exists; inserted if it does not).

For a dataset declared with a primary key, we have to be aware of how upsert actually works in Salesforce Analytics Cloud. As I mentioned in the previous posts, after the "start-data-processing" operation is invoked through MuleSoft's connector, a job gets created in Salesforce Analytics, and that job performs the actual upsert/delete/append/overwrite. For an upsert operation, Salesforce Analytics does not allow duplicate records, i.e., records with the same primary key; the entire job fails in such cases.

For example, suppose 2 out of 10 records were passed with the same primary key, "PurchaseID1-23/06/2018." After the Mule batch job completes, the Salesforce Analytics job will fail, complaining that "Something went wrong while executing the digest node: Primary Key validation has failed for dimension PrimaryKeyField. It contains the following non-unique value(s): PurchaseID1-23/06/2018." Even the eight unique records will be rejected. So, design the interface so that only unique primary key values are passed to Wave in any one invocation (it makes no difference whether the duplicate records sit in the same data part or are scattered across different data parts).
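
One defensive option (not part of the original flow; a minimal sketch assuming the payload is the list of records built in the transform above) is to drop duplicate keys with DataWeave's distinctBy before the batch job runs:

%dw 1.0
%output application/java
---
// Keep only the first record seen for each PrimaryKeyField value
payload distinctBy $.PrimaryKeyField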

Also, keep in mind that every use of "start-data-processing" creates a Salesforce Analytics job, which is a costly operation: a Salesforce organization generally has a very small daily quota for these jobs.

Scenario 4: Deleting Records in the DataSet

For deleting records in the dataset, the dataset's name needs to be configured in the "Create Data Set" or "Upload External Data into New Dataset and Start Processing" operation (whichever is used), and the DELETE sub-operation needs to be selected.

    <sub-flow name="salesforce-analytics-batch-delete-Sub_Flow">
        <set-variable variableName="dataSetContainerName" value="${dataSetContainerName}" doc:name="Variable : DataSetContainerName" doc:description="DataSet Container Name - Salesforce ID or Developer Name of the App in which Dataset is to be created"/>
        <enricher source="#[payload]" target="#[flowVars.datasetid]" doc:name="Message Enricher" doc:description="Get the Salesforce ID of the Dataset Created in a variable.">
            <sfdc-analytics:create-data-set config-ref="Salesforce_Analytics_Cloud__Basic_authentication2" operation="DELETE" description="Sample data Set" label="Data Set 2" dataSetName="demodataset2" edgemartContainer="#[flowVars.dataSetContainerName]" type="metadata\uniqueDataForWave.json:RELATIVE" doc:name="Salesforce Analytics Cloud : Delete DataSet"/>
        </enricher>
        <dw:transform-message doc:name="Create Sample List of Records to be Deleted">
            <dw:set-payload><![CDATA[%dw 1.0
%output application/java
---
[
{"PrimaryKeyField":"PurchaseID1-23/06/2018"},
{"PrimaryKeyField":"PurchaseID2-24/06/2018"},
{"PrimaryKeyField":"PurchaseID5-27/06/2018"},
{"PrimaryKeyField":"PurchaseID7-29/06/2018"},
{"PrimaryKeyField":"PurchaseID10-02/07/2018"}
]]]></dw:set-payload>
        </dw:transform-message>
        <batch:execute name="salesforce-analytics-appBatch2" doc:name="Batch Execute"/>
    </sub-flow>

    <batch:job name="salesforce-analytics-appBatch2">
        <batch:process-records>
            <batch:step name="Parts_Upload_Batch_Step">
                <batch:commit size="10000" doc:name="Batch Commit">
                    <sfdc-analytics:upload-external-data config-ref="Salesforce_Analytics_Cloud__Basic_authentication2" type="metadata\uniqueDataForWave.json:RELATIVE" dataSetId="#[flowVars.datasetid]" doc:name="Salesforce Analytics Cloud : Upload Data Part">
                        <sfdc-analytics:payload ref="#[payload]"/>
                    </sfdc-analytics:upload-external-data>
                </batch:commit>
            </batch:step>
        </batch:process-records>
        <batch:on-complete>
            <sfdc-analytics:start-data-processing config-ref="Salesforce_Analytics_Cloud__Basic_authentication2" dataSetId="#[flowVars.datasetid]" doc:name="Salesforce Analytics Cloud : Trigger Data Processing" doc:description="Trigger the processing of the data that was uploaded in parts so far. Once the data processing is triggered, the status can be monitored in Data Manager."/>
        </batch:on-complete>
    </batch:job>

Note that, for delete, we have to pass the primary key as the only field. Using the primary key, Analytics will delete the record if it exists. As with upsert, we have to pass a unique set of primary keys in this case as well.
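
For example (a minimal sketch, assuming the inbound payload already carries a PrimaryKeyField on every record), a DataWeave transform could reduce the records to just their unique keys before the upload:

%dw 1.0
%output application/java
---
// Emit only the primary key for each record to delete, with duplicates removed
(payload map {PrimaryKeyField: $.PrimaryKeyField}) distinctBy $.PrimaryKeyField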

The source code of the above scenarios can be found here.

Please note that this post applies only to Mule 3. For Mule 4, there is a Salesforce Analytics module instead of a connector; I will write a post on it later.

References

1. External Data API

2. Analytics External Data Format Reference

3. MuleSoft Documentation

4. Salesforce Analytics Glossary
