Securely Access Azure SQL Database from Azure Synapse
At the time of writing, there is no linked service or AAD pass-through support with the Azure SQL connector via Azure Synapse Analytics.
The Apache Spark connector for Azure SQL Database (and SQL Server) enables these databases to be used as input data sources and output data sinks for Apache Spark jobs. You can use the connector in Azure Synapse Analytics for big data analytics on real-time transactional data and to persist results for ad-hoc queries or reporting.
At the time of writing, there is no linked service or AAD pass-through support with the Azure SQL connector via Azure Synapse Analytics. But you can use other options, such as Azure Active Directory authentication or direct SQL authentication (username and password-based). A secure way of doing this is to store the Azure SQL Database credentials in Azure Key Vault (as a Secret), which is what this short blog post covers.
- Create an Azure Key Vault and add a Secret to store the Azure SQL Database connectivity info.
- Create a Linked Service for your Azure Key Vault in Azure Synapse Workspace.
- Grant the Azure Synapse workspace managed service identity appropriate permissions on Azure Key Vault.
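As a minimal sketch of what the Secret from the first step might contain, the helper below assembles a JDBC connection string in the shape the Azure portal shows for SQL authentication. It assumes the secret holds the full JDBC URL, including the username and password. The function name `build_jdbc_url` and its parameters are hypothetical, introduced only for illustration.

```python
def build_jdbc_url(server, database, user, password, port=1433):
    """Assemble a JDBC URL for Azure SQL Database (SQL authentication).

    Assumption: `server` is the logical server name without the
    `.database.windows.net` suffix. A password containing `;` or other
    special characters may need escaping, which this sketch does not handle.
    """
    return (
        f"jdbc:sqlserver://{server}.database.windows.net:{port};"
        f"database={database};"
        f"user={user};"
        f"password={password};"
        "encrypt=true;"
        "trustServerCertificate=false;"
        "hostNameInCertificate=*.database.windows.net;"
        "loginTimeout=30;"
    )
```

The resulting string is what you would paste into the Secret value in Azure Key Vault. Avoid printing or logging it, since it carries the password in plain text.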
To retrieve secrets from Azure Key Vault, the recommended way is to create a Linked Service to your Azure Key Vault. Also, make sure that the Synapse workspace managed service identity (MSI) has Secret Get privileges on your Azure Key Vault. This will let Synapse authenticate to Azure Key Vault using the Synapse workspace managed service identity.
You can also authenticate using your user Azure Active Directory credential.
Create a Linked Service in Azure Synapse Workspace:
Grant appropriate access for Azure Synapse workspace service managed identity to your Azure Key Vault:
Choose the Get permission on Secret:
Search for the Synapse Workspace Managed Service Identity — it’s the same name as that of the workspace
Add the policy:
Click Save to confirm:
How To Use
I will be using pyspark in Synapse Spark pools as an example.
Synapse uses Azure Active Directory (AAD) passthrough by default for authentication between resources. If you need to connect to a resource using other credentials, use the TokenLibrary directly — this simplifies the process of retrieving SAS tokens, AAD tokens, connection strings, and secrets stored in a linked service or from an Azure Key Vault.
For example, to access data from the SalesLT.Customer table (part of the AdventureWorks sample database), you can use the following:
url = TokenLibrary.getSecret("<Azure Key Vault name>", "<Secret name>", "<Linked Service name>")
dbtable = "SalesLT.Customer"

customers = spark.read \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", dbtable) \
    .load()

print(customers.count())
customers.show(5)
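Since the connector also works as an output sink, the same URL can be reused to persist results. The following is a sketch only, assuming the secret holds a JDBC URL with credentials embedded. The helper name `write_table` and the target table `dbo.CustomerSample` are hypothetical and not part of the original example.

```python
SQL_SPARK_FORMAT = "com.microsoft.sqlserver.jdbc.spark"


def write_table(df, url, dbtable, mode="append"):
    """Write a Spark DataFrame to an Azure SQL Database table.

    `url` is the JDBC connection string retrieved from Key Vault;
    `mode` is a standard Spark save mode such as "append" or "overwrite".
    """
    (
        df.write
        .format(SQL_SPARK_FORMAT)
        .mode(mode)
        .option("url", url)
        .option("dbtable", dbtable)
        .save()
    )


# Usage inside a Synapse notebook (hypothetical target table):
# url = TokenLibrary.getSecret("<Azure Key Vault name>", "<Secret name>", "<Linked Service name>")
# write_table(customers.limit(100), url, "dbo.CustomerSample", mode="overwrite")
```

Keeping the URL as a parameter means the secret is fetched once per notebook session and never hard-coded in the helper itself.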
That’s all there is to it!
Published at DZone with permission of Abhishek Gupta, DZone MVB. See the original article here.