Develop With OCI Real-Time Speech Transcription and Oracle Database NL2SQL/Select AI To Speak With Your Data

Use a combination of real-time speech-to-text, natural language to SQL, and Gen AI, to talk with your data just as you would to a person and in real-time.

Paul Parkinson

Aug. 06, 24 · Tutorial

Likes (4)

Comment

Save

5.2K Views

Speak in your natural language, ask questions about your data, and have the answers returned to you in your natural language as well: that's the objective, and what I'll show in this quick blog and, as always, provide full src repos for as well. I'll leave the use cases up to you from there.

You can learn more about these Oracle Database features here for the free cloud version and here for the free container/image version. Also, you can check out the Develop with Oracle AI and Database Services: Gen, Vision, Speech, Language, and OML workshop, which explains how to create this application and numerous other examples as well as the GitHub repos that contain all the src code.

Now, let's get into it. First, I'll show the setup for the Select AI database side (which, in turn, calls Gen AI service), then the OCI Real-time Speech AI Transcription service, and finally the front-end Python app that brings it all together.

Oracle Database NL2SQL/Select AI (With Gen AI)

While Oracle Database version 23ai contains a number of AI features such as vector search, RAG, Spatial AI, etc., NL2SQL/Select AI was introduced in version 19.

We have a stateless Python application, so we'll be making calls to:

DBMS_CLOUD_AI.GENERATE(
            prompt       => :prompt,
            profile_name => :profile_name,
            action       => :action)

Let's look at each of these three arguments.

The prompt is the natural language string, starting with "select ai."
The profile_name is the name of the AI Profile created in the database for OCI Generative AI (or whatever AI service is being used) with the credential info and optionally an object_list with meta info about the data. Oracle Autonomous Database supports models from OCI Generative AI, Azure OpenAI, OpenAI, and Cohere. In our sample app, we use the Llama 3 model provided by OCI Generative AI. Here is an example code to create_profile:

    PLSQL
   
 

   dbms_cloud_admin.enable_resource_principal(username  => 'MOVIESTREAM');

dbms_cloud_ai.create_profile(
        profile_name => 'genai',
        attributes =>       
            '{"provider": "oci",
            "credential_name": "OCI$RESOURCE_PRINCIPAL",
            "comments":"true",            
            "object_list": [
                {"owner": "MOVIESTREAM", "name": "GENRE"},
                {"owner": "MOVIESTREAM", "name": "CUSTOMER"},
                {"owner": "MOVIESTREAM", "name": "PIZZA_SHOP"},
                {"owner": "MOVIESTREAM", "name": "STREAMS"},            
                {"owner": "MOVIESTREAM", "name": "MOVIES"},
                {"owner": "MOVIESTREAM", "name": "ACTORS"}
             ]
            }'
        );
  

Finally, the action is one of four options for the interaction/prompt and type/format of the answers that are returned to you from Oracle Database's Select AI feature. In our sample app, we use narrate; however, we could use others.
- narrate returns the reply as a narration in natural language.
- chat as a chat exchange in natural language
- showsql returns the raw SQL for the answer/query.
- runsql gets the SQL and then runs it and returns the raw query results.

OCI Real-Time Speech Transcription

OCI Real-time Speech Transcription is expected to be released within the month and includes Whisper model multilingual support with diarization capabilities.

Using this service simply requires that certain policies are created to provide access for a given user/compartment/group/tenancy. These can be specified at various levels and would generally be more restricted than the following but this gives a list of the resources needed.

    Plain Text
   
 

   allow any-user to manage ai-service-speech-family in tenancy
allow any-user to manage object-family in tenancy
allow any-user to read tag-namespaces in tenancy
allow any-user to use ons-family in tenancy
allow any-user to manage cloudevents-rules in tenancy
allow any-user to use virtual-network-family in tenancy
allow any-user to manage function-family in tenancy
  

The options for accessing the service from an external client are essentially the same as accessing any OCI/cloud service. In this case, we use an OCI config file and generate a security_token using the following.

    Shell
   
   oci session authenticate ; oci iam region list --config-file /Users/YOURHOMEDIR/.oci/config --profile MYSPEECHAIPROFILE --auth security_token

From there it's just a matter of using the preferred SDK client libraries to call the speech service. In our case, we are using Python.

The Python App

Here is the output of our application where we can see:

A printout of the words (natural language) spoken into the microphone and transcribed by the real-time transcription service.
The trigger of a Select AI command, with "narrate" action, in response to the user saying "select ai."
The results of the call to the Oracle database Select AI function returned in natural language.

Let's take the application step by step.

First, we see the Python imports:

asyncio event processing loop
getpass to get the database and wallet/ewallet.pem passwords from the application prompt
pyaudio for processing microphone events/sound
oracledb thin driver for accessing the Oracle database and making Select AI calls
oci sdk core and speech libraries for real-time speech transcription calls

    Python
   
 

   import asyncio
import getpass

import pyaudio
import oracledb
import oci
from oci.config import from_file
from oci.auth.signers.security_token_signer import SecurityTokenSigner
from oci.ai_speech_realtime import (
    RealtimeClient,
    RealtimeClientListener,
    RealtimeParameters,
)
  

Then we see the main loop where sound from the microphone is fed to the OCI teal-time speech transcription API client and to the cloud services via WebSocket. The client is created by specifying the OCI config mentioned earlier along with the URL of the speech service and the compartment ID.

def message_callback(message):
    print(f"Received message: {message}")
realtime_speech_parameters: RealtimeParameters = RealtimeParameters()
realtime_speech_parameters.language_code = "en-US"
realtime_speech_parameters.model_domain = (
    realtime_speech_parameters.MODEL_DOMAIN_GENERIC
)
realtime_speech_parameters.partial_silence_threshold_in_ms = 0
realtime_speech_parameters.final_silence_threshold_in_ms = 2000
realtime_speech_parameters.should_ignore_invalid_customizations = False
realtime_speech_parameters.stabilize_partial_results = (
    realtime_speech_parameters.STABILIZE_PARTIAL_RESULTS_NONE
)
realtime_speech_url = "wss://realtime.aiservice.us-phoenix-1.oci.oraclecloud.com"
client = RealtimeClient(
    config=config,
    realtime_speech_parameters=realtime_speech_parameters,
    listener=SpeechListener(),
    service_endpoint=realtime_speech_url,
    signer=authenticator(),
    compartment_id="ocid1.compartment.oc1..MYcompartmentID",
)
loop = asyncio.get_event_loop()
loop.create_task(send_audio(client))
loop.create_task(check_idle())
loop.run_until_complete(client.connect())
if stream.is_active():
    stream.close()

If the transcribed speech contains "select ai", the application waits for 2 seconds, and if there is no further speech, takes the command from "select ai" on, and sends it over to the database server using the Oracle Python driver. The following is the code for the connection creation and execution of this using DBMS_CLOUD_AI.GENERATE (prompt, profile_name, action) described earlier.

    Python
   
 

   pw = getpass.getpass("Enter database user password:")
# Use this when making a connection with a wallet
connection = oracledb.connect(
    user="moviestream",
    password=pw,
    dsn="selectaidb_high",
    config_dir="/Users/pparkins/Downloads/Wallet_SelectAIDB",
    wallet_location="/Users/pparkins/Downloads/Wallet_SelectAIDB"
)

def executeSelectAI():
    global cummulativeResult
    print(f"executeSelectAI called cummulative result: {cummulativeResult}")
    # for example prompt => 'select ai I am looking for the top 5 selling movies for the latest month please',
    query = """SELECT DBMS_CLOUD_AI.GENERATE(
                prompt       => :prompt,
                profile_name => 'openai_gpt35',
                action       => 'narrate')
            FROM dual"""
    with connection.cursor() as cursor:
        cursor.execute(query, prompt=cummulativeResult)
        result = cursor.fetchone()
        if result and isinstance(result[0], oracledb.LOB):
            text_result = result[0].read()
            print(text_result)
        else:
            print(result)
    # Reset cumulativeResult after execution
    cummulativeResult = ""
  

Video

A walkthrough of this content can also be viewed here:

Concluding Notes

The next logical step of course is to add text-to-speech (TTS) functionality for the reply and OCI has a new service for that as well. I'll post an updated example including this in the near future.

Thank you for reading and please do not hesitate to contact me with any questions or feedback you may have. I'd love to hear from you.

AI Oracle Database Data (computing) Python (language) generative AI

Opinions expressed by DZone contributors are their own.

Related

Trending