SELECT AI Query Integration Using Oracle Autonomous Database 26AI and OpenAI

A custom framework testing Oracle 26ai’s ability to convert natural language into SQL using the 22 TPC-H benchmark. With no prompt engineering, it achieves high accuracy.

Sanjay Mishra

Apr. 01, 26 · Analysis

Likes (0)

Comment

Save

3.6K Views

This is one of the first studies on Oracle 26 AI, which was launched in January 2026. Instead of writing complex SQL queries, Oracle Autonomous Database introduced Oracle 26ai, which uses natural-language programming techniques to write SQL queries. As part of this research paper, I am examining the capabilities of the SELECT AI. I will be examining the accuracy and latency of LLMs.

Setup

I used approximately 10 GB of data generated by my Python script (the underlying tables are not different from what is available in TPC-H and Kaggle). I inserted my own data to enable this in a faster way. Please refer to https://github.com/sanmish4ds/oracle26ai-eval/blob/main/data_setup.py.

I have also created an Oracle Autonomous Database environment, which is free, and I found it online easy enough for a 5–10 minute setup:

Sign up for Oracle Cloud.
Go to Autonomous Database in the OCI console.
Click "Create Autonomous Database."
Select the "Always Free" tier.
Choose "Data Warehouse" workload.
Version: 26ai.
Download wallet (credentials).
You'll get a wallet.zip file with connection info.

Once you follow the above and log in, create an OpenAI or any LLM key. I used OpenAI.

      SQL
     
 

     BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'OPENAI_CRED',
    username        => 'openai',
    password        => 'sk-your-openai-api-key-here'  -- Get from openai.com
  );
END;
/
    

Configure Select AI to use OpenAI.

      SQL
     
 

     BEGIN
  DBMS_CLOUD_AI.CREATE_PROFILE(
    profile_name => 'EVAL_PROFILE',
    attributes   => '{"provider": "openai",
                      "credential_name": "OPENAI_CRED",
                      "model": "gpt-4",
                      "temperature": 0.1,
                      "max_tokens": 1000}'
  );
END;
/
    

Set as the default profile.

      SQL
     
     BEGIN
  DBMS_CLOUD_AI.SET_PROFILE(profile_name => 'EVAL_PROFILE');
END;
/

Test it works!

      SQL
     
 

     SELECT DBMS_CLOUD_AI.GENERATE(
         prompt       => 'show me all customer ',
         profile_name => 'OPENAI_GPT4',
         action       => 'runsql'
       ) FROM DUAL;
    

Refer to the following GitHub file to further see the file https://github.com/sanmish4ds/oracle26ai-eval/blob/main/Data-Setup-Help-Sqls-26ai.sql.

I have created the NL_SQL_TEST_QUERIES table with 22 TPC-H benchmark queries (simple/medium/complex) and stored the ground truth SQL results as a comparison baseline. We are all set to test how SELECT AI works and the accuracy with which it creates SQL queries.

Here is the ground truth Queries link on GitHub: https://github.com/sanmish4ds/oracle26ai-eval/blob/main/TPCH_22_QUERIES.txt.

Execution

Here is the Python project structure:

I created a Python orchestrator (main.py) that runs two experiments:

Accuract test
Latency test

Accuracy Test

The accuracy test compares AI-generated SQL result counts with ground truth.

Step-by-Step

Load ground truth questions: Gets 22 test questions ordered Q1-Q22
For each question:
- Generate SQL: Asks Oracle AI to write SQL from the English question (using DBMS_CLOUD_AI)
- Run AI SQL: Run the AI-generated SQL and counts results using COUNT(*)
- Run ground truth SQL: Run the ground truth SQL and counts results
- Compare: Checks if both returned the same number of rows
Smart pattern detection:
- If row counts of both AI and Ground Truth match → marks as successful.
- Also recognize equivalent SQL patterns (e.g., ROWNUM vs. FETCH FIRST). Sometimes the LLM chooses a different—and even better—approach for generating queries; for example, it may avoid using COUNT(*) when counting rows.
Save results: Saves to a CSV file with columns: query_id, question, AI SQL, row counts, success status, and visualizes the comparison.

Accuracy Test Summary and Result Visualization

Total queries tested: 22

By SQL complexity: The table below shows that, even though queries are simpler, the LLM is not giving an exact match, so we probably need a better prompt.

Category	Success Rate	Exact Match	Semantic Match	Total Queries
Complex	90.0%	90.0%	90.0%	10
Medium	100.0%	100.0%	100.0%	8
Simple	100.0%	75.0%	75.0%	4
Overall Total	95.5%	90.9%	90.9%	22

Accuracy Metrics Visually Presented by Graphs

Latency Test

The latency test measures LLM generation time and Oracle execution time.

Step-by-Step

Load questions: Gets 22 ground truth test questions
For each question (3-stage timing):
- Stage 1: LLM generation time (thinking)
  - Asks Oracle AI to generate SQL from English
  - Stops before running the SQL
  - Records how long AI took to think and write the SQL
  - Result: llm_latency_ms (e.g., 2500ms = 2.5 seconds)
- Stage 2: Oracle execution time (doing)
  - Runs the ground truth SQL
  - Records database query execution time
  - Result: oracle_exe_ms (e.g., 500ms = 0.5 seconds)
- Stage 3: Ground truth execution
  - Runs the correct SQL to compare timing
  - Records for comparison
Calculate metrics:
- Total time = LLM time + Oracle execution time
- Overhead ratio = LLM time ÷ Oracle time (how much slower is LLM vs. SQL?)
- Example: If AI takes 3000ms and database takes 1000ms → Ratio = 3x (AI is 3x slower than database)
Save and report:
- Saves to CSV with columns: query_id, llm_latency_ms, oracle_exe_ms, total_latency_ms, overhead_ratio
- Prints statistics: Mean, median, P95, P99 latencies
- Shows breakdown: "Average LLM: 3500ms, Average Oracle: 500ms, Ratio: 7x"

Latency Test Summary and Result Visualization

Latency analysis: LLM is doing the heavy lifting, as shown in the table below.

Metric	Latency (ms)
Mean (Average)	3,906.64 ms
Median	3,872.82 ms
P95 (95th Percentile)	4,812.40 ms
P99 (99th Percentile)	5,018.30 ms

Latency breakdown: The table below explains that the time spent during SQL by the LLM generative phase is significantly more time-intensive than the execution phase.

Component	Avg. Time (ms)	% of Total Latency
LLM Generation	3,691.35 ms	94.5%
Oracle Execution	215.29 ms	5.5%
Total	3,906.64 ms	100%

Visualization Using a Graph

Conclusion

I built a comprehensive framework to measure Oracle Database 26ai's capability to generate SQL from natural language, utilizing 22 TPC-H benchmark queries. While this technology is highly promising and can save developers, engineers, and data scientists significant time, thorough validation remains essential.

One of the most critical observations in my research is the model's ability to generate efficient queries; for instance, where the ground truth was a broad COUNT(*), the AI intelligently selected only the specific columns required.

Do note that even the latency metrics indicate the LLM takes time to generate SQL, it is important to understand that a developer would likely take longer to manually draft a complete, complex query.

I kept prompt engineering outside the current scope of my research, as it has the potential to drive even higher accuracy, making SELECT AI of Oracle 26ai an invaluable tool for software engineers.

Refer to the GitHub code repo for full functional code: https://github.com/sanmish4ds/oracle26ai-eval/tree/main.

AI Oracle Database sql

Opinions expressed by DZone contributors are their own.

Related

Trending

SELECT AI Query Integration Using Oracle Autonomous Database 26AI and OpenAI

A custom framework testing Oracle 26ai’s ability to convert natural language into SQL using the 22 TPC-H benchmark. With no prompt engineering, it achieves high accuracy.

Setup

Execution

Accuracy Test

Step-by-Step

Accuracy Test Summary and Result Visualization

Accuracy Metrics Visually Presented by Graphs

Latency Test

Step-by-Step

Latency Test Summary and Result Visualization

Visualization Using a Graph

Conclusion

Related

Partner Resources