Checklist: Convert Row-by-Row to Bulk Processing w/FORALL and BULK COLLECT

DZone 's Guide to

Checklist: Convert Row-by-Row to Bulk Processing w/FORALL and BULK COLLECT

How to to fix a loop in your code that executes non-query DML statements.

· Database Zone ·
Free Resource

Scenario: You have a loop in your code that executes one or more non-query DML statements inside of it. It is running too slowly and you need to make it go faster. Can FORALL with BULK COLLECT help? Follow this checklist for a successful conversion.

1. Analyze and Document All Aspects of Your Transaction

Behavior post-conversion must be the same. Of particular importance is to be clear on what happens when an error occurs. Do you stop immediately? Do you log the error and continue? If multiple DML statements inside the loop, how does an error in one statement affect what happens in the other statements? Do you commit at the end of the process, with each iteration of the loop?

2. Build a Regression Test From the Above Analysis

Sorry, but it must be said: the best (only?) way to ensure that your bulk-ified logic is error-free is to turn your documentation of current behavior into a regression test to also run against the new code. Check out SQL Developer's unit testing feature, utPLSQL or another framework.

3. Make Sure You Need to Use PL/SQL's Bulk Processing Features

Can you do everything you need in "pure" SQL? Can you leverage DBMS_PARALLEL_EXECUTE or pipelined table functions? Generally, you need a PL/SQL-driven solution if you need to perform complex procedural steps on each row of data. So you do that first in PL/SQL on collections, then push back to the database.

4. Implement Bulk Processing Phases as Nested Subprograms

There are three basic "phases" with bulk processing:

  1. Load collections, usually with BULK COLLECT.

  2. Modify collections (complex procedural logic). That's why you can't use "pure" SQL.

  3. Push collections to table(s) with FORALL. One for each DML statement.

Bulk code can get very long and complex, so the best approach is to encapsulate each step inside its own subprogram and keep the main body short and readable. And if any of your subprograms has an OUT or IN OUT collection type parameter, don't forget to add the NOCOPY hint!

5. Use BULK COLLECT LIMIT and Soft-code the Limit.

Avoid "unlimited" SELECT BULK COLLECT INTO statements, since you could use too much PGA memory (now or in the future). Instead, declare a cursor, and within a simple loop FETCH BULK COLLECT INTO with a LIMIT clause. Soft-code the limit, preferably as a parameter to the subprogram. And keep in mind:

  • A default of 100 is a good start but be ready to experiment with larger numbers.
  • To terminate the loop, use EXIT WHEN your_collection.COUNT = 0; immediately after FETCH or EXIT WHEN your_cursor%NOTFOUND; just before END LOOP statement.

6. Determine How to Communicate Between FORALL Steps

If your loop has > 1 DML statement, it will have > 1 FORALL statement. An error in one DML statement often affects what happens in a subsequent DML statement. So when you switch to bulk, you must communicate the impact of failures in one FORALL to another, later one. SAVE EXCEPTIONS and SQL%BULK_EXCEPTIONS will be key for this.

7. FORALL Bind Arrays Sparse or Selectively Using Bind Array Elements?

BULK COLLECT always fills sequentially, but sometimes you will have sparse arrays or need to selectively use elements in your bind array for DML processing. In this case, switch from FORALL indx IN low .. high to FORALL indx IN INDICES OF or FORALL indx IN VALUES OF.

Links to resources on bulk processing:

pl/sql ,performance

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}