Optimizing Financial Data Pipelines: Accelerating OneStream-to-Snowflake Exports by 85%
A case study on optimizing financial data pipelines by refactoring legacy SQL batch inserts into a "Stage and Copy" architecture.
Join the DZone community and get the full member experience.
Join For FreeIn the world of Enterprise Performance Management (EPM), the "Financial Close" is a race against the clock. As an Architect, my goal is to ensure that when the FP&A team finishes their forecast in OneStream, that data is available in our Snowflake Data Warehouse immediately for downstream analytics.
Recently, we encountered a significant bottleneck. Exporting a medium-sized forecast dataset (~500K records) from OneStream to Snowflake was taking over 8 minutes. This latency was unacceptable for our executive team, who needed near real-time "What-If" scenario analysis.
By refactoring our data integration pattern from a "Batch Insert" model to a "Stage and Copy" model, we reduced this runtime to under 1 minute — an 85% performance improvement. Here is the technical breakdown of how we architected this solution using VB.NET.
The Bottleneck: The "Batch Insert" Anti-Pattern
Initially, our export process relied on a standard SQL injection pattern. We would pull the data from the OneStream cube into a DataTable, loop through the rows, and construct massive SQL INSERT statements.
Even though we attempted to optimize this by chunking the inserts (sending 16,000 rows per batch), the approach was fundamentally flawed for cloud data warehousing.
The "Before" Code Logic
The legacy function worked roughly like this:
- Iterate through the
DataTable. - Concatenate string values into a massive SQL command (
INSERT INTO Table VALUES (...), (...);). - Execute the command against Snowflake over the network every 16k rows.
' LEGACY APPROACH (Slow ~8 Minutes)
' Loop thru datatable (chunk INSERTS into n record intervals)
For Each dr As DataRow In dt.Rows
i = i + 1
' Concatenate row values
cvValue = "( "
For Each dc As DataColumn In dt.Columns
cvValue = cvValue & "'" & (dr(dc.ColumnName).ToString & "',")
Next
' Append to batch buffer
cvInsert.append(cvValue & ",")
' Execute batch every 16,384 rows
If ((i Mod insertCounter) = 0 ) Then
sqlInsert.Append("INSERT INTO " & sTableName & " VALUES ")
sqlInsert.Append(cvInsert)
' Network Call to Snowflake (High Latency)
brapi.Database.ExecuteActionQuery(dbConnApp, sqlInsert.ToString, True, True)
End If
Next
Why This Failed
- Network latency: Even with batching, dozens of round-trip network calls were required.
- Parsing overhead: Snowflake had to parse and plan an insert statement for every batch.
- Transaction logging: Each batch was treated as an individual transaction, increasing write overhead.
The Solution: The "Stage and COPY INTO" Pattern
Snowflake (and most cloud data warehouses) is optimized for bulk ingestion. The fastest way to load data isn't INSERT; it is COPY INTO.
We rewrote the Extender Rule to follow a three-step architecture:
- Stream to disk: Write the
DataTableto a local CSV file on the OneStream application server using a bufferedStreamWriter. - Stage the file: Upload the CSV to a Snowflake Internal Stage using a
PUTcommand. - Bulk load: Execute a single
COPY INTOcommand to ingest the entire file instantly.
Step 1: High-Performance CSV Writing
We moved away from string concatenation and used a StreamWriter with a 1MB buffer. This minimizes memory pressure on the OneStream server, even when processing large datasets.
' NEW APPROACH (Fast ~1 Minute)
' 1. Write CSV to disk with Buffered Stream
Using writer As New StreamWriter(fullFilePath, False, Encoding.UTF8, 65536)
Dim sb As New StringBuilder(1048576) ' ~1MB buffer
Dim counter As Integer = 0
For Each r As DataRow In dt.Rows
Dim fields(dt.Columns.Count - 1) As String
' Loop columns and escape quotes
For i As Integer = 0 To dt.Columns.Count - 1
Dim field As String = If(r(i) Is DBNull.Value, "", r(i).ToString())
If field.Contains("""") Then field = field.Replace("""", """""")
fields(i) = """" & field & """"
Next
sb.AppendLine(String.Join(",", fields))
counter += 1
' Flush every 10k rows to avoid Large Object Heap fragmentation
If counter Mod 10000 = 0 Then
writer.Write(sb.ToString())
sb.Clear()
End If
Next
' Final flush
If sb.Length > 0 Then writer.Write(sb.ToString())
End Using
Step 2 & 3: Upload and Copy
Once the file is on disk, we utilize a helper function to push the file to the Snowflake Internal Stage (@DB_NAME.Schema_Name.STAGE_Name) and trigger the load.
' 2. Upload file to Snowflake internal stage
snowflakeHelper.TransferFileToSnowflake(si, fullFilePath, fileNameOnly)
' 3. TRUNCATE + COPY INTO using the staged file
Using dbConnApp As DbConnInfo = BRApi.Database.CreateExternalDbConnInfo(si, connString)
' Truncate Target
Dim truncateSql As String = "TRUNCATE TABLE " & sTableName & ";"
BRApi.Database.ExecuteSql(dbConnApp, truncateSql, True)
' Execute Bulk Load (Single Transaction)
Dim copySql As New StringBuilder()
copySql.AppendLine("COPY INTO " & sTableName)
copySql.AppendLine("FROM @DB_NAME.Schema_Name.STAGE_Name/" & fileNameOnly)
copySql.AppendLine("FILE_FORMAT = (FORMAT_NAME = 'CSV_NOHEADER');")
BRApi.Database.ExecuteSql(dbConnApp, copySql.ToString(), True)
' Clean up stage
Dim removeSql As String = "REMOVE @DB_NAME.Schema_Name.STAGE_Name/" & fileNameOnly & ";"
BRApi.Database.ExecuteSql(dbConnApp, removeSql, True)
End Using
The Results
By shifting the heavy lifting from the application layer (iterative inserts) to the database layer (bulk copy), the performance gains were immediate.
| Metric | Legacy "Batch Insert" | New "Stage + Copy" | Improvement |
| Runtime | ~8 Minutes | ~1 Minute | 87.5% Faster |
| Network Calls | 50+ | 3 (Put, Copy, Remove) | 94% Reduction |
| Complexity | High (String parsing loops) | Low (Stream + SQL) | Easier Maintenance |
Conclusion
When integrating EPM systems like OneStream with cloud data warehouses, it’s tempting to rely on legacy loops and standard SQL inserts. However, mimicking the native behavior of the target system, in this case, Snowflake's bulk loading capabilities can yield order-of-magnitude performance improvements.
For Architects, the lesson is clear: don't just move data; optimize the transport mechanism for the destination environment.
Opinions expressed by DZone contributors are their own.
Comments