Microsoft Fabric AI Functions: A Practical Overview for Data Engineers

The article reviews and compares AI capabilities in Microsoft Fabric's Spark Notebooks, focusing on accuracy and performance.

Iurii Iurchenko

Apr. 15, 26 · Analysis

In this article, we focus on AI functions in Microsoft Fabric Spark Notebooks and take an in-depth look at the most interesting of them. The emphasis is on their practical use in real-world scenarios and on assessing production readiness, so we also validate their accuracy and performance. Code is included to support those who want to check how these AI functions work in their own environment with minimal development time.

AI Functions in Microsoft Fabric: My Overview

What are the AI functions in Microsoft Fabric Notebooks? Their purpose is to make the data-processing capabilities of large language models easy to use in Fabric's Spark Notebooks. As a result, with minimal code and configuration, a wide range of people, from data analysts to data engineers, can use them to achieve practical results.

Having a Spark or Pandas dataframe is a prerequisite. So, the first step is always to load data into a dataframe. We will use a Spark dataframe for our observation.

Next, let us look at the functions available. At the time of writing, the Microsoft documentation lists nine of them. We will focus on three in particular: ai.analyze_sentiment(), ai.extract(), and ai.generate_response().

The rest include ai.classify(), ai.fix_grammar(), ai.summarize(), ai.translate() for text processing, ai.similarity() for identifying similar text cohorts, and ai.embed() for building a vector representation of the text. We will skip observing them in this article, given their self-descriptive nature and their common ground with the rest.

To start using these functions, we may need to apply the following steps:

Define a source data frame (Spark or Pandas).
Import the necessary library (Spark or Pandas).
Apply the function to the dataframe, which will add the necessary columns to it with the LLM-processed data.

Now, we may start observing how those functions work.

AI Functions: Practical Use Cases

These functions may be used to make data in a data warehouse more structured, enriched, and valuable for further insights. On the other hand, AI functions may introduce some ambiguity in their responses, given their nature. It is important to treat AI-generated data as partly subjective and to check it before relying on it.

Now, let us observe some practical scenarios, where generally AI functions may be useful:

Scenario 1: We want to understand user sentiment from their responses to specific product SKUs, further classify them, observe trends, and improve the process that triggered these results. For that purpose, we may use sentiment analysis, categorization, or even custom prompting.
Scenario 2: We want to unify information across international data sets, applying translation functionality.
Scenario 3: We want to build a quick summary report for different business domains in the KPI report by feeding the necessary information. We may, for example, apply a summarization functionality.

Now that we have covered the fundamentals, we can move on to the practical implementation steps.

Setting Up the Test Dataset

We will use a test data set with 30 default rows. This data set will have a predefined ID, the review itself, and predefined sentiment and its reason, to further compare with the AI-generated version. Sentiment may have three values here:

1 – negative
2 – neutral
3 – positive

The reason column holds the main predefined reason for the sentiment, whether positive or negative. For neutral reviews, it has no real value and is set to NA.

    Python
   
 

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("review", StringType(), False),
    StructField("sentiment", IntegerType(), False),
    StructField("reason", StringType(), False),
])

data = [
    (1, "Absolutely love this phone! The battery easily lasts two full days even with heavy usage.", 3, "Battery life"),
    (2, "The phone dies by noon every single day. I have to carry a charger everywhere I go.", 1, "Battery drain"),
    (3, "Decent phone overall. The screen is fine and calls are clear enough. Nothing exciting.", 2, "NA"),
    (4, "The camera on this phone is unreal. Night mode photos look like they were taken by a professional.", 3, "Camera quality"),
    (5, "Photos are grainy and washed out even in daylight. The camera app is slow to launch too.", 1, "Poor camera"),
    (6, "It's an okay phone for the price. Handles basic apps but lags a bit with multitasking.", 2, "NA"),
    (7, "This AMOLED display is stunning! Colors are vibrant and the 120Hz refresh rate makes scrolling buttery smooth.", 3, "Display quality"),
    (8, "The phone gets scorching hot after just ten minutes of gaming. Almost too hot to hold.", 1, "Overheating"),
    (9, "Average mid-range phone. Camera is passable in good light but nothing impressive in the dark.", 2, "NA"),
    (10, "Blazing fast performance! Apps open instantly and I can run multiple heavy games without any stutter.", 3, "Performance"),
    (11, "Constant app crashes and random reboots since the last update. The UI freezes at least twice a day.", 1, "Software bugs"),
    (12, "Solid phone for everyday use. Battery gets me through the day if I'm not gaming too much.", 2, "NA"),
    (13, "The build quality is premium. The titanium frame feels incredible and it survived multiple drops without a scratch.", 3, "Build quality"),
    (14, "The screen cracked from a tiny drop off my nightstand. The back glass shattered even inside a case.", 1, "Fragile build"),
    (15, "Not bad for a budget phone. The screen is decent and it handles social media apps just fine.", 2, "NA"),
    (16, "I'm blown away by the battery. Three days on a single charge with moderate use. Best I've ever had!", 3, "Battery life"),
    (17, "Every photo has a weird yellow tint. The selfie camera makes me look like a completely different person.", 1, "Poor camera"),
    (18, "The phone works. It makes calls, sends texts, and browses the web. Nothing more to say really.", 2, "NA"),
    (19, "The 200MP camera captures insane detail. Zoom shots from across the stadium looked crystal clear.", 3, "Camera quality"),
    (20, "Phone heats up just from watching YouTube. It throttles so badly that even scrolling becomes laggy.", 1, "Overheating"),
    (21, "Mid-tier performance for a mid-tier price. Acceptable but you feel the slowdown with heavier apps.", 2, "NA"),
    (22, "The 4K OLED screen is the best I've ever seen on a phone. HDR content looks absolutely breathtaking.", 3, "Display quality"),
    (23, "Bluetooth keeps disconnecting, the fingerprint sensor fails half the time, and notifications arrive late.", 1, "Software bugs"),
    (24, "A perfectly adequate phone. Does everything you need without standing out in any particular way.", 2, "NA"),
    (25, "This phone handles everything I throw at it. Video editing, 3D gaming, heavy multitasking — zero lag.", 3, "Performance"),
    (26, "The battery barely lasts four hours of screen time. By lunchtime I'm desperately looking for an outlet.", 1, "Battery drain"),
    (27, "It's fine. Camera takes decent photos in daylight and the battery lasts about a day. Standard stuff.", 2, "NA"),
    (28, "The ceramic back and stainless steel frame feel absolutely luxurious. This phone is built like a tank.", 3, "Build quality"),
    (29, "The plastic body feels flimsy and cheap. The back panel creaks when you press it. Embarrassing at this price.", 1, "Fragile build"),
    (30, "Reasonable phone for the money. Not the fastest or prettiest but gets the job done without issues.", 2, "NA"),
]

# `spark` is the session that a Fabric notebook provides by default.
df = spark.createDataFrame(data, schema=schema)
  

As a next step, we are importing the necessary library. For Spark, we will use this:

    Python
   
import synapse.ml.spark.aifunc as aifunc

And now, we are ready to apply our AI functions to check how they work.

Sentiment Analysis in Action

By running that code:

    Python
   
sentiment = df.ai.analyze_sentiment(input_col="review", output_col="sentiment_ai")
sentiment.show()

We will receive this:

    Plain Text
   
 

+---+--------------------+---------+---------------+------------+------------------------------+
| id|              review|sentiment|         reason|sentiment_ai|review_analyze_sentiment_error|
+---+--------------------+---------+---------------+------------+------------------------------+
|  1|Absolutely love t...|        3|   Battery life|    positive|                          NULL|
|  2|The phone dies by...|        1|  Battery drain|    negative|                          NULL|
|  3|Decent phone over...|        2|             NA|     neutral|                          NULL|
|  4|The camera on thi...|        3| Camera quality|    positive|                          NULL|
|  5|Photos are grainy...|        1|    Poor camera|    negative|                          NULL|
|  6|It's an okay phon...|        2|             NA|     neutral|                          NULL|
......
  

As we may see, the accuracy is relatively high: more than 90% of the AI-generated sentiment_ai labels match the predefined sentiment values, with 1, 2, and 3 read as negative, neutral, and positive.
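
The comparison between the numeric sentiment column and the text labels in sentiment_ai can be sketched in plain Python. This is a minimal illustration, not the code used for the measurement above: the names LABEL_TO_CODE and sentiment_accuracy are hypothetical, and the input is assumed to be a list of (expected code, AI label) pairs collected from the two columns. Any label outside the three known values is counted as a mismatch.

```python
# Hypothetical helper: compares the predefined numeric sentiment codes with
# the text labels found in the sentiment_ai column.
LABEL_TO_CODE = {"negative": 1, "neutral": 2, "positive": 3}


def sentiment_accuracy(rows):
    """Return the share of rows whose AI label matches the expected code.

    `rows` is an iterable of (expected_code, ai_label) pairs.
    Labels that are missing or unknown count as mismatches.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to evaluate")
    matches = sum(
        1
        for expected, label in rows
        if LABEL_TO_CODE.get((label or "").strip().lower()) == expected
    )
    return matches / len(rows)


# The six rows visible in the output above: (sentiment, sentiment_ai).
sample = [
    (3, "positive"),
    (1, "negative"),
    (2, "neutral"),
    (3, "positive"),
    (1, "negative"),
    (2, "neutral"),
]
print(f"Accuracy on the visible rows: {sentiment_accuracy(sample):.0%}")
```

In a notebook, the same check could be expressed with Spark column expressions. The plain-Python form is used here only to keep the sketch self-contained.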

Label Extraction: What the AI Actually Returns

Now, let us try to proactively extract the information about the reasons for the review. For that purpose, we may use that code:

    Python
   
df_entities = df.ai.extract(labels=["Battery drain", "Poor camera", "Overheating", "Software bugs", "Fragile build", "Other"], input_col="review")
df_entities.show()

It returns the following:

    Plain Text
   
 

+---+--------------------+---------+---------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
| id|              review|sentiment|         reason|       Battery drain|         Poor camera|         Overheating|       Software bugs|       Fragile build|               Other|review_extract_error|
+---+--------------------+---------+---------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|  1|Absolutely love t...|        3|   Battery life|                NULL|                NULL|                NULL|                NULL|                NULL|                NULL|                NULL|
|  2|The phone dies by...|        1|  Battery drain|[phone dies by noon]|                NULL|                NULL|                NULL|                NULL|                NULL|                NULL|
|  3|Decent phone over...|        2|             NA|                NULL|                NULL|                NULL|                NULL|                NULL|[Decent phone ove...|                NULL|
|  4|The camera on thi...|        3| Camera quality|                NULL|                NULL|                NULL|                NULL|                NULL|                NULL|                NULL|
|  5|Photos are grainy...|        1|    Poor camera|                NULL|[photos grainy an...|                NULL|[camera app slow ...|                NULL|                NULL|                NULL|
|  6|It's an okay phon...|        2|             NA|                NULL|                NULL|                NULL|                NULL|                NULL|[okay phone price...|                NULL|
...
  

My observation shows that the accuracy here is much lower. The reason is that we tried to categorize the text with an extraction function, and extraction does not work that way: in the output above, review 5 matched two labels at once, and the neutral reviews were placed under Other. The function may be good at pulling very specific information out of plain text, such as phone numbers and addresses, but it cannot derive higher-level information in one step. We should take that into account.
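
To make the mismatch concrete, here is a small sketch that tries to turn the extraction output into a single category per review. It uses only the information visible in the output above, namely which label columns were non-NULL for reviews 1 to 6. The extracted text itself is left out because it is truncated in the display. The names non_null_labels and as_single_category are hypothetical.

```python
# Which label columns were non-NULL per review id, copied from the
# ai.extract() output shown above (extracted text omitted).
non_null_labels = {
    1: set(),
    2: {"Battery drain"},
    3: {"Other"},
    4: set(),
    5: {"Poor camera", "Software bugs"},
    6: {"Other"},
}

# Predefined reasons for the same reviews, from the test dataset.
expected_reason = {
    1: "Battery life",
    2: "Battery drain",
    3: "NA",
    4: "Camera quality",
    5: "Poor camera",
    6: "NA",
}


def as_single_category(labels):
    """Return the only matched label, or None when zero or several matched."""
    return next(iter(labels)) if len(labels) == 1 else None


for review_id, labels in non_null_labels.items():
    predicted = as_single_category(labels)
    print(review_id, expected_reason[review_id], "->", predicted)
```

Only review 2 maps cleanly to its predefined reason. Review 5 yields two candidate labels, and the neutral reviews end up under Other rather than NA, which is why extraction output is hard to treat as a classification.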

Custom Information Extraction With AI Functions

Now, let us solve the challenge of extracting the reason by using a slightly different function from Fabric's tool belt:

    Python
   
 

responses = df.ai.generate_response(
    prompt="""Analyze this phone review and return ONLY a JSON object with three fields:
- "sentiment": one of "Positive", "Negative", or "Neutral"
- "score": integer 1 (negative) to 3 (positive)
- "reason": one of these categories ONLY:
    Positive: "Battery life", "Camera quality", "Display quality", "Performance", "Build quality"
    Negative: "Battery drain", "Poor camera", "Overheating", "Software bugs", "Fragile build"
    Neutral: "NA"

Review: {review}

Return ONLY the JSON object, no explanation. """,
    is_prompt_template=True,
    output_col="reason_ai"
)

responses.show()
  

This code returns the following:

    Plain Text
   
 

+---+--------------------+---------+---------------+--------------------+-----------------------+
| id|              review|sentiment|         reason|           reason_ai|generate_response_error|
+---+--------------------+---------+---------------+--------------------+-----------------------+
|  1|Absolutely love t...|        3|   Battery life|```json\n{\n  "se...|                   NULL|
|  2|The phone dies by...|        1|  Battery drain|```json\n{\n  "se...|                   NULL|
|  3|Decent phone over...|        2|             NA|```json\n{\n  "se...|                   NULL|
|  4|The camera on thi...|        3| Camera quality|```json\n{\n  "se...|                   NULL|
|  5|Photos are grainy...|        1|    Poor camera|```json\n{\n  "se...|                   NULL|
|  6|It's an okay phon...|        2|             NA|```json\n{\n  "se...|                   NULL|
|  7|This AMOLED displ...|        3|Display quality|```json\n{\n  "se...|                   NULL|
...
  

First of all, the accuracy of problem identification with this type of function is excellent. Custom prompting lets us define buckets and classify text as needed, provided that the prompt is tailored to the task. We may also observe that the code above uses the parameter is_prompt_template=True, which lets us inject values from other columns into the prompt through placeholders such as {review}.
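
As a rough analogy for what a prompt template does, the sketch below fills a {review} placeholder from one row by using plain Python string formatting. It is only an illustration of the idea: Fabric performs the substitution internally, and the names PROMPT_TEMPLATE and render_prompt are hypothetical. The prompt and the review text are shortened versions of the ones used above.

```python
# Plain-Python analogy only; this is not how Fabric implements templates.
PROMPT_TEMPLATE = (
    "Analyze this phone review and return ONLY a JSON object.\n"
    "Review: {review}"
)


def render_prompt(template, row):
    """Fill {column} placeholders with the values of one row (a dict)."""
    return template.format(**row)


# One row of the dataframe represented as a dict (review text shortened).
row = {"id": 2, "review": "The phone dies by noon every single day."}
print(render_prompt(PROMPT_TEMPLATE, row))
```

Each row produces its own prompt, so the model sees one review at a time together with the fixed instructions.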

The problem we may see is that the output in the reason_ai column is not quite what we want: the JSON is wrapped in a Markdown code fence labeled json. To solve this problem, we may just introduce one more parameter:

    Python
   
   response_format="json_object"

After that, the result will be pure JSON, which we may easily parse into a Spark struct or map column (for example, with from_json) and then use its individual fields. We may also define a JSON schema if needed.

So, the full code may look like:

    Python
   
 

responses = df.ai.generate_response(
    prompt="""Analyze this phone review and return ONLY a JSON object with three fields:
- "sentiment": one of "Positive", "Negative", or "Neutral"
- "score": integer 1 (negative) to 3 (positive)
- "reason": one of these categories ONLY:
    Positive: "Battery life", "Camera quality", "Display quality", "Performance", "Build quality"
    Negative: "Battery drain", "Poor camera", "Overheating", "Software bugs", "Fragile build"
    Neutral: "NA"
Review: {review}
Return ONLY the JSON object, no explanation. """,
    is_prompt_template=True,
    output_col="reason_ai",
    response_format="json_object",
)
responses.show()
  

And it will return a pure JSON in the column "reason_ai":

    Plain Text
   
 

+---+--------------------+---------+---------------+--------------------+-----------------------+
| id|              review|sentiment|         reason|           reason_ai|generate_response_error|
+---+--------------------+---------+---------------+--------------------+-----------------------+
|  1|Absolutely love t...|        3|   Battery life|{\n  "sentiment":...|                   NULL|
|  2|The phone dies by...|        1|  Battery drain|{\n  "sentiment":...|                   NULL|
|  3|Decent phone over...|        2|             NA|{\n  "sentiment":...|                   NULL|
...
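
Once the column holds JSON text, each value can be parsed and checked against the allowed categories before it is used downstream. The sketch below does this in plain Python and also tolerates the fenced form seen earlier, in case response_format is not set. The function name parse_response and the sample strings are made up for illustration. They are not actual model output.

```python
import json
import re

ALLOWED_REASONS = {
    "Battery life", "Camera quality", "Display quality", "Performance",
    "Build quality", "Battery drain", "Poor camera", "Overheating",
    "Software bugs", "Fragile build", "NA",
}

# Three backticks, built indirectly so the marker is easy to reuse.
FENCE = "`" * 3
# Matches an optional Markdown fence (with or without "json") around the payload.
_FENCED = re.compile(
    r"^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"$", re.DOTALL
)


def parse_response(raw):
    """Parse one reason_ai value into a dict, or return None if unusable."""
    text = (raw or "").strip()
    match = _FENCED.match(text)
    if match:
        text = match.group(1)  # keep only the content inside the fence
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    reason = parsed.get("reason")
    if not isinstance(reason, str) or reason not in ALLOWED_REASONS:
        return None  # reject categories the prompt did not allow
    return parsed


# Illustrative inputs, not real model output.
plain = '{"sentiment": "Negative", "score": 1, "reason": "Battery drain"}'
fenced = FENCE + "json\n" + plain + "\n" + FENCE

print(parse_response(plain)["reason"])
print(parse_response(fenced)["reason"])
```

Returning None for anything unexpected keeps bad rows visible, so they can be counted or retried instead of silently flowing into reports.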
  

Performance: Speed and Limits

Now, let us look at the critical implementation aspects of AI functions in Microsoft Fabric, starting with performance. To measure it, I replicated the source dataframe 100 times, which yielded 3,000 rows, and ran the query three times to get an average. For context, all these tests ran on an F2 Fabric capacity.

    Python
   
from functools import reduce
from pyspark.sql import DataFrame

df_large = reduce(DataFrame.unionAll, [df] * 100)
df_large.cache()

print(f"Original count: {df.count()}")
print(f"Multiplied count: {df_large.count()}")

# Original count: 30
# Multiplied count: 3000

Then, I ran the following code to get the timings:

    Python
   
 

import time
start = time.time()

responses_large = df_large.ai.generate_response(
    prompt="""Analyze this phone review and return ONLY a JSON object with three fields:
- "sentiment": one of "Positive", "Negative", or "Neutral"
- "score": integer 1 (negative) to 3 (positive)
- "reason": one of these categories ONLY:
    Positive: "Battery life", "Camera quality", "Display quality", "Performance", "Build quality"
    Negative: "Battery drain", "Poor camera", "Overheating", "Software bugs", "Fragile build"
    Neutral: "NA"

Review: {review}

Return ONLY the JSON object, no explanation. """,
    is_prompt_template=True,
    output_col="reason_ai"
)
display(responses_large)

end = time.time()
print(f"Execution time: {end - start:.2f} seconds")

# Execution time: 46.28 seconds
# Execution time: 83.23 seconds
# Execution time: 64.10 seconds
  

So, we may conclude that the average processing time is about 64.5 seconds for 3,000 rows, or roughly 46 rows per second, although the individual runs varied widely, from 46 to 83 seconds. According to the official documentation, AI functions now support up to 200 parallel executions of the prompts. With a larger data volume, this parallelism may potentially yield higher throughput, because the time required to answer each individual request is spread across many concurrent calls.
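
The arithmetic behind these figures is simple enough to reproduce. The sketch below uses the three timings reported above. The variable names are illustrative.

```python
# Timings of the three runs reported above, in seconds.
run_times_s = [46.28, 83.23, 64.10]
row_count = 3000

average_s = sum(run_times_s) / len(run_times_s)
rows_per_second = row_count / average_s

# Throughput of the slowest and the fastest run, to show the spread.
slowest_rps = row_count / max(run_times_s)
fastest_rps = row_count / min(run_times_s)

print(f"Average time: {average_s:.2f} s")
print(f"Average throughput: {rows_per_second:.1f} rows/s")
print(f"Range: {slowest_rps:.1f} to {fastest_rps:.1f} rows/s")
```

With only three runs and such a wide spread, the average is best read as an order-of-magnitude estimate rather than a stable benchmark.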

Now, let us look at these functions from a different angle: configuration. According to the official documentation, many parameters of these AI functions are configurable, including:

Which LLM to use (including custom ones from Microsoft Foundry)
Concurrency level
Temperature of the LLM response (it defines how creative or conservative the output is)
Other parameters

Generally, it means that we may use different models.

Are Fabric's AI Functions Worth Using?

Summarizing the previous observations, we may conclude that AI functions in Microsoft Fabric are mature enough to start using in production. They do have some limitations that stem from the nature of LLMs. Compared with built-in functions, or even with custom UDFs, AI functions are very slow, which means we need to use them selectively, ideally on the most critical or already aggregated information. When these conditions are met, AI functions may bring significant power to enrich our data with additional insights.

Opinions expressed by DZone contributors are their own.
