Harnessing Generative AI in Data Analysis With PandasAI
By applying generative models, PandasAI can understand and respond to human-like queries, execute complex data manipulations, and generate visual representations.
Ever wish your data would analyze itself? Well, we are one step closer to that day. PandasAI is a Python library that extends the popular Pandas library with generative AI, so that much of the routine work of data analysis can be driven by plain-language requests instead of hand-written code.
By applying generative models like OpenAI's GPT-3.5, PandasAI can understand and respond to human-like queries, execute complex data manipulations, and generate visual representations. Data analysis and AI combine to create insights that open new avenues for businesses and researchers.
This tutorial will explore how to use this powerful library for various tasks. Let’s get started!
Setting up PandasAI
To set up PandasAI, install it with pip:
pip install pandasai
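To interact with OpenAI's models, you'll need an API key. If you don't have one, you can sign up for an account on the OpenAI platform and generate a key there. The following code initializes an instance of PandasAI with OpenAI. It uses the PandasAI class from the library's early releases; later releases changed the interface, so check the documentation for the version you install: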
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
# Pass your OpenAI API token to the LLM wrapper.
# Replace "YOUR_API_KEY" with the key you generated.
llm = OpenAI(api_token='YOUR_API_KEY')
# Initialize PandasAI with the OpenAI LLM.
pandas_ai = PandasAI(llm, verbose=True, conversational=False)
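Hardcoding the key works for a quick test, but it is easy to leak through version control. Here is a minimal sketch that reads the key from an environment variable instead. The helper name load_api_key and the variable name OPENAI_API_KEY are illustrative assumptions, not something PandasAI requires:

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Both this helper and the default variable name are illustrative
    assumptions; PandasAI itself does not require them.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage with the tutorial's OpenAI wrapper (not executed here):
# llm = OpenAI(api_token=load_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the API call.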
Generative AI: A Brief Overview
Generative AI is a subset of artificial intelligence that creates new data similar to an existing dataset. Unlike discriminative models, which classify or make predictions based on given data, generative models can produce new content. Generative AI can be applied to text, images, and complex data structures.
For data analysis, generative AI can synthesize realistic datasets for training models, fill in missing data points, and even assist in generating analytical reports. Because it can learn and mimic patterns in data, it is well suited to these tasks.
How PandasAI Uses Generative AI for Data Cleaning
PandasAI uses generative AI to automate and enhance the data-cleaning process. Rather than manually identifying and fixing errors, you can use natural language prompts to instruct the AI to clean your data.
For example, you can ask it to "remove duplicate entries" or "fill missing values," and the AI engine will generate a cleaned dataset, saving you valuable time and effort.
Let’s create a data frame with some missing values and ask PandasAI to preprocess it:
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", None, "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, None, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [None, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})
response = pandas_ai.run(df, "Preprocess this dataframe for me")
print(response)
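Because the AI decides what "preprocess" means, it helps to have a plain Pandas baseline to compare its answer against. The sketch below is one possible reading of the prompt, not what PandasAI necessarily does: it removes duplicate rows, drops rows with no country name, and fills missing numbers with the column median. The preprocess function and these cleaning choices are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", None, "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, None, 1745433788416,
            1181205135360, 1607402389504, 1490967855104, 4380756541440,
            14631844184064],
    "happiness_index": [None, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})


def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical baseline: dedupe, drop unnamed rows, impute medians."""
    cleaned = frame.drop_duplicates()
    # A row without a country name cannot be identified, so drop it.
    cleaned = cleaned.dropna(subset=["country"]).copy()
    # Fill the remaining numeric gaps with each column's median.
    for column in ["gdp", "happiness_index"]:
        cleaned[column] = cleaned[column].fillna(cleaned[column].median())
    return cleaned.reset_index(drop=True)


cleaned = preprocess(df)
print(cleaned.isna().sum())  # count of missing values left in each column
```

If the AI's result differs from this baseline, that is not necessarily a mistake, but it is a cue to check which cleaning rules it applied.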
Feature Engineering With the Help of Generative AI
Creating new features manually in a dataset can be a tedious task. You can instruct the AI engine to generate new features based on existing data columns.
For example, the following snippet asks PandasAI to derive new attributes from the existing columns:
response = pandas_ai.run(df, "Create new features from this data")
print(response)
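To make "new features" concrete, here is a hand-written sketch of the kind of derived columns such a prompt might produce. The add_features function and the three column names are hypothetical, and PandasAI may choose entirely different features. The demo frame reuses three complete rows from the data above:

```python
import pandas as pd


def add_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical examples of features derived from gdp and happiness_index."""
    out = frame.copy()
    # Rescale GDP to trillions so the numbers are easier to read.
    out["gdp_trillions"] = out["gdp"] / 1e12
    # Each country's share of the total GDP in the frame.
    out["gdp_share"] = out["gdp"] / out["gdp"].sum()
    # Rank 1 is the happiest country in the frame.
    out["happiness_rank"] = out["happiness_index"].rank(ascending=False)
    return out


sample = pd.DataFrame({
    "country": ["United Kingdom", "France", "Italy"],
    "gdp": [2891615567872, 2411255037952, 1745433788416],
    "happiness_index": [7.16, 6.66, 6.38],
})
features = add_features(sample)
print(features)
```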
Intelligent Data Visualization Through Generative AI
PandasAI improves data visualization by using generative AI to recommend the most fitting visual representations for your dataset. Instead of puzzling over which chart or graph to use, you can get tailored suggestions that help you make the most out of your data.
For example:
response = pandas_ai.run(df, "Which data visualization do you recommend for this data?")
print(response)
In response, PandasAI renders the data using the chart type the AI engine considers the best fit.
Real-Life Use Case: Generative AI in Financial Forecasting
Let’s look at a real-life use case: financial forecasting. Beyond analyzing past stock price data, generative approaches can be used to simulate future scenarios based on market trends, company performance, and global events.
We can use generative models to create a range of possible future stock prices, considering volatility and other market indicators. This comprehensive, forward-looking approach allows investors and analysts to better prepare for financial outcomes, making generative AI an invaluable asset in financial forecasting.
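To give a feel for what "a range of possible future stock prices" looks like, here is a small sketch using a classical Monte Carlo simulation based on geometric Brownian motion. Note that this is a standard statistical technique swapped in for illustration, not PandasAI or a language model, and the starting price, drift, and volatility are made-up values rather than market data:

```python
import numpy as np


def simulate_prices(start_price, annual_drift, annual_volatility,
                    days=63, n_paths=1000, seed=0):
    """Simulate daily price paths with geometric Brownian motion.

    Returns an array of shape (n_paths, days). All parameters are
    illustrative assumptions, not estimates from real market data.
    """
    rng = np.random.default_rng(seed)
    dt = 1 / 252  # one trading day as a fraction of a trading year
    shocks = rng.standard_normal((n_paths, days))
    log_returns = ((annual_drift - 0.5 * annual_volatility ** 2) * dt
                   + annual_volatility * np.sqrt(dt) * shocks)
    # Cumulative log returns turn daily moves into price paths.
    return start_price * np.exp(np.cumsum(log_returns, axis=1))


paths = simulate_prices(start_price=100.0, annual_drift=0.05,
                        annual_volatility=0.2)
# Summarize the spread of simulated prices on the final day.
low, median, high = np.percentile(paths[:, -1], [5, 50, 95])
print(f"5th pct: {low:.2f}, median: {median:.2f}, 95th pct: {high:.2f}")
```

The gap between the 5th and 95th percentiles widens as volatility or the horizon grows, which is the "range of outcomes" an analyst would prepare for.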
Pandas vs. PandasAI: The Generative AI Edge
While Pandas is a well-known library many people use for data manipulation and analysis, PandasAI takes it further by integrating generative AI capabilities. With traditional Pandas, you might write code to filter, transform, and visualize data, but you're restricted to the data you already have.
PandasAI, on the other hand, can generate new insights and visualizations and even manipulate data based on natural language prompts. The generative AI engine can provide analytics that would be difficult to code manually. Imagine asking your data, "What is the potential revenue for the next quarter?" and receiving a generated report as an answer — this is the power of PandasAI.
Note: We’ve gone over various prompts that PandasAI accepts. If you try your own creative prompts, be warned that some may throw errors. A helpful thread for debugging one such error is the issue titled: Crash "Invalid input data. Must be a Pandas or Polars data frame" on the "row" question.
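Since some prompts will fail, it can be convenient to wrap calls so that one bad prompt does not stop a whole notebook. The sketch below is a hypothetical helper, not part of PandasAI: safe_run accepts any object with a run(frame, prompt) method, checks that the input really is a Pandas data frame, and returns the error as text instead of raising:

```python
import pandas as pd


def safe_run(runner, frame, prompt):
    """Run a prompt and return (result, error) instead of raising.

    `runner` is any object with a run(frame, prompt) method, such as the
    pandas_ai instance from this tutorial. This helper is hypothetical.
    """
    if not isinstance(frame, pd.DataFrame):
        return None, f"Expected a pandas DataFrame, got {type(frame).__name__}"
    try:
        return runner.run(frame, prompt), None
    except Exception as exc:  # AI-generated code can fail in many ways
        return None, f"{type(exc).__name__}: {exc}"


# Usage with the tutorial's objects (not executed here):
# result, error = safe_run(pandas_ai, df, "Which country has the highest GDP?")
# if error:
#     print("Prompt failed:", error)
```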
Conclusion
PandasAI isn't just another data manipulation tool; it's a monumental step in data analysis thanks to its generative AI capabilities. It transcends the limitations of traditional analytics frameworks by not just working with your data but understanding it to generate new insights.
From filling gaps in datasets to forecasting financial markets, the possibilities are endless. As we move towards a future where data is increasingly complex, the ability to generate meaningful insights from it becomes crucial. PandasAI provides a glimpse into that future, an opportunity you will want to explore.
Additional Resources
- Another interesting tutorial on PandasAI: PandasAI Library from OpenAI
- The official documentation: PandasAI
Opinions expressed by DZone contributors are their own.