Microsoft Azure Synapse Analytics: Scaling Hurdles and Limitations
Azure Synapse faces serious challenges limiting its use in the Enterprise Data space, impacting performance and functionality.
Azure Synapse Analytics is a strong tool for processing large amounts of data, but it has some scaling challenges that can slow things down as your data grows, along with a few built-in restrictions that can limit what you’re able to do and affect both performance and overall functionality. So, while Synapse is powerful, it’s important to be aware of these potential roadblocks as you plan your projects.
Data Distribution and Skew
Data skew remains a significant performance bottleneck in Synapse Analytics. Poor distribution key selection can lead to:
- 80-90% of data concentrated on 10% of nodes
- Hotspots during query execution
- Excessive data movement via TempDB
You can check for data skew by examining how rows spread across the 60 distributions, identified by `distribution_id` (only at maximum scale do distributions map 1:1 to compute nodes).
Note that `distribution_id` is not a queryable column on user tables; the simplest documented way to see per-distribution row counts is `DBCC PDW_SHOWSPACEUSED`:

```sql
-- Returns one row per distribution for the table,
-- including ROW_COUNT and DISTRIBUTION_ID columns.
DBCC PDW_SHOWSPACEUSED('dbo.table_name');
```
If a few `distribution_id` values show a much higher `row_count` than others, this indicates skew.
To mitigate this:
- Use high-cardinality columns as the distribution key for even distribution
- Monitor skew using `DBCC PDW_SHOWSPACEUSED`
- Redistribute tables with `CREATE TABLE AS SELECT` (CTAS)
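The CTAS redistribution in the last bullet can be sketched as follows; the table name `dbo.FactSales` and the distribution key `OrderId` are hypothetical placeholders for your own skewed table and a higher-cardinality column:

```sql
-- Rebuild the table distributed on a higher-cardinality key via CTAS,
-- then swap the old and new tables with a rename.
CREATE TABLE dbo.FactSales_new
WITH (DISTRIBUTION = HASH(OrderId), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT * FROM dbo.FactSales;

RENAME OBJECT dbo.FactSales     TO FactSales_old;
RENAME OBJECT dbo.FactSales_new TO FactSales;
```

CTAS is a fully parallelized operation in dedicated SQL pools, so rebuilding the table this way is usually far faster than row-by-row reloads.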
Resource Management and Scaling
1. SQL Pools
You do not have any control over the built-in serverless pool configuration. For a dedicated SQL pool, the defaults are:
- Maximum DWU: Gen1: DW6000, Gen2: DW30000c
- Scaling requires manual intervention using SQL commands
To manually scale your dedicated SQL pool, use the following `ALTER DATABASE` command:

```sql
ALTER DATABASE [your_database]
MODIFY (SERVICE_OBJECTIVE = 'DW1000c');
```
When you scale a Synapse pool, it goes into “Scaling” mode for a little while, and once it’s done, it switches back to “Online” and is ready to use.
Key Points
- Scaling is not automatic, so you must run the command yourself.
- The SQL pool must be online to scale.
- You can also scale using PowerShell or the Azure portal, but the SQL command is a direct way to do it.
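One way to check from T-SQL whether the scale operation has finished is the database status property (the database name below is a placeholder):

```sql
-- Returns 'Scaling' while the operation runs and 'Online' when complete.
SELECT DATABASEPROPERTYEX('your_database', 'Status') AS pool_status;
```

This is handy in scripts that need to wait for the pool to come back online before submitting queries.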
2. Apache Spark Pools
- Scale-up triggers if resource utilization exceeds capacity for 1 minute.
- Scale-down requires 2 minutes of underutilization.
3. Integration Runtimes
- Manual scaling is done through the Azure portal, not from the Synapse workspace.
4. Concurrency Limits
- Maximum of 128 concurrent queries; excess queries are queued.
- Concurrent open sessions: 1024 for DWU1000c and higher, 512 for DWU500c and lower.
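You can watch the concurrency queue with a dedicated-pool DMV; queued requests show up with a `Suspended` status. A minimal monitoring sketch:

```sql
-- Requests waiting in the concurrency queue appear as 'Suspended'.
SELECT request_id, status, submit_time, command
FROM sys.dm_pdw_exec_requests
WHERE status = 'Suspended'
ORDER BY submit_time;
```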
Query and Data Limitations
1. SQL Feature Gaps
- No support for triggers, cross-database queries, or geospatial data types
- Limited use of expressions such as `GETDATE()` or `SUSER_SNAME()`
- No `FOR XML`/`FOR JSON` clauses or cursor support
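Because cursors are unsupported, row-by-row logic usually needs a set-based rewrite. A hypothetical example (table and column names are made up): instead of looping over customers with a cursor to flag heavy buyers, a single set-based `UPDATE` does the same work:

```sql
-- Set-based alternative to a cursor loop: flag every customer
-- with more than 100 orders in one statement.
UPDATE dbo.Customers
SET IsHighVolume = 1
WHERE CustomerId IN (
    SELECT CustomerId
    FROM dbo.Orders
    GROUP BY CustomerId
    HAVING COUNT(*) > 100
);
```

Set-based statements also parallelize across distributions, which a cursor never would.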
2. Data Size Restrictions
- Source table row size limited to 7,500 bytes for Azure Synapse Link for SQL
- LOB data > 1 MB not supported in initial snapshots for certain data types
3. Query Constraints
- Maximum 4,096 columns per row in SELECT results
- Up to 32 nested subqueries in a SELECT statement
- JOIN limited to 1,024 columns
4. View Limitations
- Maximum of 1,024 columns in a view. If you have more columns, the view must be restructured.
Error: CREATE TABLE failed because column 'VolumeLable' in table 'QTable' exceeds the maximum of 1024 columns.
To get around this, break your view up into a few smaller ones, each within the 1,024-column limit, and include the key column in each so they can be joined back together. For example:

```sql
-- First view: the key plus columns 1 to 1023
CREATE VIEW dbo.BigTable_Part1 AS
SELECT PrimaryKey, col1, col2, ..., col1023
FROM dbo.BigTable;

-- Second view: the key plus the remaining columns
CREATE VIEW dbo.BigTable_Part2 AS
SELECT PrimaryKey, col1024, col1025, ..., col1100
FROM dbo.BigTable;

-- Combine the views
SELECT *
FROM dbo.BigTable_Part1 p1
JOIN dbo.BigTable_Part2 p2
    ON p1.PrimaryKey = p2.PrimaryKey;
```
Limited Data Format Support
- ORC and Avro, common file formats in enterprise data, are not supported. Moving to Parquet or Delta Lake format is recommended.
- Synapse integrates a very old version of Delta Lake, which does not support critical features like column mapping and column renaming.
Access Limitations
When you try to set up Azure Synapse Link for SQL, you might run into an error if the database owner isn’t linked to a valid login. Basically, the system needs the database owner to be tied to a real user account to work properly. If it’s not, Synapse Link can’t connect and throws an error.
Workaround
To fix this, make sure the database owner is set to a real user that actually has a login. You can do this with a quick command:

```sql
ALTER AUTHORIZATION ON DATABASE::[YourDatabaseName] TO [ValidLogin];
```

Replace `YourDatabaseName` with your actual database name and `ValidLogin` with the name of a valid server-level login. This command changes the ownership of the database to the specified login, ensuring that the database owner is properly mapped and authenticated.
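You can confirm whether the owner is properly mapped before (or after) changing it with a quick join against the principal catalogs:

```sql
-- A NULL login_name for dbo indicates the orphaned-owner problem.
SELECT dp.name AS owner_name, sp.name AS login_name
FROM sys.database_principals AS dp
LEFT JOIN sys.server_principals AS sp
    ON dp.sid = sp.sid
WHERE dp.name = 'dbo';
```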
Performance Optimization Challenges
1. Indexing Issues
- Clustered Columnstore Index (CCI) degradation due to frequent updates or low memory
- Outdated statistics leading to suboptimal query plans
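Both issues are addressed with routine maintenance. A minimal sketch, assuming a hypothetical table `dbo.FactSales`:

```sql
-- Rebuild the clustered columnstore index to re-compress
-- open and fragmented rowgroups after heavy updates.
ALTER INDEX ALL ON dbo.FactSales REBUILD;

-- Refresh statistics so the optimizer sees current row counts.
UPDATE STATISTICS dbo.FactSales;
```

Scheduling these after large loads keeps CCI compression healthy and query plans accurate.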
2. TempDB Pressure
- Data movement from skew or incompatible joins can quickly fill TempDB
- Maximum TempDB size: 399 GB per DW100c
3. IDENTITY Column Behavior
- Values are generated independently across the 60 distributions, so they are unique but non-sequential
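A minimal sketch (table and column names are hypothetical): the `IDENTITY` values below stay unique, but because each of the 60 distributions allocates from its own range, you cannot rely on them being contiguous or ordered by insert time:

```sql
-- OrderId values will be unique but not sequential,
-- since each distribution draws from its own number range.
CREATE TABLE dbo.Orders
(
    OrderId    INT IDENTITY(1, 1) NOT NULL,
    CustomerId INT NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerId));
```

If you need a strictly sequential surrogate key, it has to be assigned outside the pool or computed after load.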
Backup and Recovery Limitations
- No offline .BAK or .BACPAC backups with data
- Limited to 7-day retention or creating database copies (incurring costs)
Conclusion
Azure Synapse Analytics is a powerful tool for handling big data, but it’s not without its quirks. You’ll run into some scaling headaches and built-in limits that can slow things down or make certain tasks tricky. To get the best performance, you’ve got to be smart about how you distribute your data, manage resources, and optimize your queries.
Keeping an eye on things and tuning regularly helps avoid bottlenecks and keeps everything running smoothly. Basically, it’s great — but you’ll need to work around some bumps along the way to make it really shine.
Published at DZone with permission of Vamshidhar Morusu. See the original article here.