Optimizing Financial Data Pipelines: Accelerating OneStream-to-Snowflake Exports by 85%

A case study on optimizing financial data pipelines by refactoring legacy SQL batch inserts into a "Stage and Copy" architecture.

Sridhar Mannava

Jan. 13, 26 · Analysis

Likes (3)

Comment

Save

976 Views

In the world of Enterprise Performance Management (EPM), the "Financial Close" is a race against the clock. As an Architect, my goal is to ensure that when the FP&A team finishes their forecast in OneStream, that data is available in our Snowflake Data Warehouse immediately for downstream analytics.

Recently, we encountered a significant bottleneck. Exporting a medium-sized forecast dataset (~500K records) from OneStream to Snowflake was taking over 8 minutes. This latency was unacceptable for our executive team, who needed near real-time "What-If" scenario analysis.

By refactoring our data integration pattern from a "Batch Insert" model to a "Stage and Copy" model, we reduced this runtime to under 1 minute — an 85% performance improvement. Here is the technical breakdown of how we architected this solution using VB.NET.

The Bottleneck: The "Batch Insert" Anti-Pattern

Initially, our export process relied on a standard SQL injection pattern. We would pull the data from the OneStream cube into a DataTable, loop through the rows, and construct massive SQL INSERT statements.

Even though we attempted to optimize this by chunking the inserts (sending 16,000 rows per batch), the approach was fundamentally flawed for cloud data warehousing.

The "Before" Code Logic

The legacy function worked roughly like this:

Iterate through the DataTable.
Concatenate string values into a massive SQL command (INSERT INTO Table VALUES (...), (...);).
Execute the command against Snowflake over the network every 16k rows.

    VB.NET
   
 

   ' LEGACY APPROACH (Slow ~8 Minutes)
' Loop thru datatable (chunk INSERTS into n record intervals)
For Each dr As DataRow In dt.Rows
    i = i + 1
    ' Concatenate row values
    cvValue = "( "
    For Each dc As DataColumn In dt.Columns
        cvValue = cvValue & "'" & (dr(dc.ColumnName).ToString & "',")
    Next
    
    ' Append to batch buffer
    cvInsert.append(cvValue & ",")
    
    ' Execute batch every 16,384 rows
    If ((i Mod insertCounter) = 0 ) Then
        sqlInsert.Append("INSERT INTO " & sTableName & " VALUES ")
        sqlInsert.Append(cvInsert)
        ' Network Call to Snowflake (High Latency)
        brapi.Database.ExecuteActionQuery(dbConnApp, sqlInsert.ToString, True, True)
    End If 
Next
  

Why This Failed

Network latency: Even with batching, dozens of round-trip network calls were required.
Parsing overhead: Snowflake had to parse and plan an insert statement for every batch.
Transaction logging: Each batch was treated as an individual transaction, increasing write overhead.

The Solution: The "Stage and COPY INTO" Pattern

Snowflake (and most cloud data warehouses) is optimized for bulk ingestion. The fastest way to load data isn't INSERT; it is COPY INTO.

We rewrote the Extender Rule to follow a three-step architecture:

Stream to disk: Write the DataTable to a local CSV file on the OneStream application server using a buffered StreamWriter.
Stage the file: Upload the CSV to a Snowflake Internal Stage using a PUT command.
Bulk load: Execute a single COPY INTO command to ingest the entire file instantly.

Step 1: High-Performance CSV Writing

We moved away from string concatenation and used a StreamWriter with a 1MB buffer. This minimizes memory pressure on the OneStream server, even when processing large datasets.

    VB.NET
   
 

   ' NEW APPROACH (Fast ~1 Minute)
' 1. Write CSV to disk with Buffered Stream
Using writer As New StreamWriter(fullFilePath, False, Encoding.UTF8, 65536)
    Dim sb As New StringBuilder(1048576) ' ~1MB buffer
    Dim counter As Integer = 0

    For Each r As DataRow In dt.Rows
        Dim fields(dt.Columns.Count - 1) As String
        ' Loop columns and escape quotes
        For i As Integer = 0 To dt.Columns.Count - 1
            Dim field As String = If(r(i) Is DBNull.Value, "", r(i).ToString())
            If field.Contains("""") Then field = field.Replace("""", """""")
            fields(i) = """" & field & """"
        Next
        
        sb.AppendLine(String.Join(",", fields))
        counter += 1
        
        ' Flush every 10k rows to avoid Large Object Heap fragmentation
        If counter Mod 10000 = 0 Then
            writer.Write(sb.ToString())
            sb.Clear()
        End If
    Next
    ' Final flush
    If sb.Length > 0 Then writer.Write(sb.ToString())
End Using
  

Step 2 & 3: Upload and Copy

Once the file is on disk, we utilize a helper function to push the file to the Snowflake Internal Stage (@DB_NAME.Schema_Name.STAGE_Name) and trigger the load.

    VB.NET
   
 

   ' 2. Upload file to Snowflake internal stage
snowflakeHelper.TransferFileToSnowflake(si, fullFilePath, fileNameOnly)

' 3. TRUNCATE + COPY INTO using the staged file
Using dbConnApp As DbConnInfo = BRApi.Database.CreateExternalDbConnInfo(si, connString)
    
    ' Truncate Target
    Dim truncateSql As String = "TRUNCATE TABLE " & sTableName & ";"
    BRApi.Database.ExecuteSql(dbConnApp, truncateSql, True)

    ' Execute Bulk Load (Single Transaction)
    Dim copySql As New StringBuilder()
    copySql.AppendLine("COPY INTO " & sTableName)
    copySql.AppendLine("FROM @DB_NAME.Schema_Name.STAGE_Name/" & fileNameOnly)
    copySql.AppendLine("FILE_FORMAT = (FORMAT_NAME = 'CSV_NOHEADER');")

    BRApi.Database.ExecuteSql(dbConnApp, copySql.ToString(), True)
    
    ' Clean up stage
    Dim removeSql As String = "REMOVE @DB_NAME.Schema_Name.STAGE_Name/" & fileNameOnly & ";"
    BRApi.Database.ExecuteSql(dbConnApp, removeSql, True)
End Using
  

The Results

By shifting the heavy lifting from the application layer (iterative inserts) to the database layer (bulk copy), the performance gains were immediate.

Metric	Legacy "Batch Insert"	New "Stage + Copy"	Improvement
Runtime	~8 Minutes	~1 Minute	87.5% Faster
Network Calls	50+	3 (Put, Copy, Remove)	94% Reduction
Complexity	High (String parsing loops)	Low (Stream + SQL)	Easier Maintenance

Conclusion

When integrating EPM systems like OneStream with cloud data warehouses, it’s tempting to rely on legacy loops and standard SQL inserts. However, mimicking the native behavior of the target system, in this case, Snowflake's bulk loading capabilities can yield order-of-magnitude performance improvements.

For Architects, the lesson is clear: don't just move data; optimize the transport mechanism for the destination environment.

Data (computing) sql

Opinions expressed by DZone contributors are their own.

Related

Trending