Harnessing Generative AI in Data Analysis With PandasAI

By applying generative models, PandasAI can understand and respond to human-like queries, execute complex data manipulations, and generate visual representations.

By Deepanshu Lulla · Sep. 12, 23 · Tutorial

Ever wish your data would analyze itself? Well, we are one step closer to that day. PandasAI is a groundbreaking tool that significantly streamlines data analysis. This Python library expands on the capabilities of the popular Pandas library with the help of generative AI, making automated yet sophisticated data analysis a reality.

By applying generative models like OpenAI's GPT-3.5, PandasAI can understand and respond to human-like queries, execute complex data manipulations, and generate visual representations. Data analysis and AI combine to create insights that open new avenues for businesses and researchers.

This tutorial will explore how to use this powerful library for various tasks. Let’s get started!

Setting up PandasAI

To set up PandasAI, install it with pip:

pip install pandasai

To interact with OpenAI's models, you'll need an API key. If you don’t have one, you can sign up for an account on the OpenAI platform and generate a key there. The following code initializes an instance of PandasAI with OpenAI:

Python
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

# Create the OpenAI LLM wrapper with your API token.
# Replace "YOUR_API_KEY" with your generated API key.
llm = OpenAI(api_token='YOUR_API_KEY')

# Initialize an instance of PandasAI that uses the OpenAI LLM.
pandas_ai = PandasAI(llm, verbose=True, conversational=False)
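
Hard-coding the key in a script makes it easy to leak through version control. A minimal sketch of one alternative is to read it from an environment variable. The helper name `load_api_key` and the variable name `OPENAI_API_KEY` are assumptions made for illustration, not part of PandasAI:

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper: the default variable name is an assumption.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


# Usage with the setup code above (assumes the variable is already exported):
# llm = OpenAI(api_token=load_api_key())
```

The helper fails early with a clear message when the variable is missing, instead of sending an empty token to the API.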


Generative AI: A Brief Overview

Generative AI is a subset of artificial intelligence that creates new data similar to an existing dataset. Unlike discriminative models, which classify or make predictions based on given data, generative models can produce new content. Generative AI can be applied to text, images, and complex data structures.

For data analysis, generative AI can synthesize realistic datasets for training models, fill in missing data points, and even assist in generating analytical reports. Its ability to learn and reproduce patterns in data is what makes it useful for these tasks.

How PandasAI Uses Generative AI for Data Cleaning

PandasAI uses generative AI to automate and enhance the data-cleaning process. Rather than manually identifying and fixing errors, you can use natural language prompts to instruct the AI to clean your data. 

For example, you can ask it to "remove duplicate entries" or "fill missing values," and the AI engine will generate a cleaned dataset, saving you valuable time and effort.
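
For comparison, here is roughly what those two prompts correspond to in plain Pandas. This is a hand-written sketch on a small made-up frame, not the code PandasAI generates; the choice to drop rows without a country and to fill `gdp` with the median is an assumption:

```python
import pandas as pd

# Small illustrative frame: one duplicated row and two missing values.
raw = pd.DataFrame({
    "country": ["France", "France", "Italy", "Spain", None],
    "gdp": [2.4e12, 2.4e12, None, 1.2e12, 1.6e12],
})

# "Remove duplicate entries": keep the first copy of each identical row.
deduped = raw.drop_duplicates()

# A row without a country name cannot be identified, so drop it.
named = deduped.dropna(subset=["country"])

# "Fill missing values": replace missing GDP with the median of the known values.
cleaned = named.fillna({"gdp": named["gdp"].median()}).reset_index(drop=True)

print(cleaned)
```

Writing the steps out makes the cleaning policy explicit, which is useful when checking whatever the AI engine returns.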

Let’s create a data frame with some missing values:

Python
df = pd.DataFrame({
"country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", None, "Australia", "Japan", "China"],
"gdp": [19294482071552, 2891615567872, 2411255037952, None, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
"happiness_index": [None, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})


Now we can ask PandasAI to preprocess the data with the following prompt:

Python
response = pandas_ai.run(df, "Preprocess this dataframe for me")
print(response)


The output shows that the data has been cleaned.

Feature Engineering With the Help of Generative AI

Creating new features manually in a dataset can be a tedious task. You can instruct the AI engine to generate new features based on existing data columns. 

For example, the following prompt asks the AI to derive new attributes from the existing columns, which can broaden the scope of your analysis:

Python
response = pandas_ai.run(df, "Create new features from this data")
print(response)


In the output, the new feature created by the AI is a happiness rank. The AI put two and two together and recognized that the countries could be ranked by their happiness index and GDP!
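
A rank feature of this kind can also be written by hand. The sketch below uses a few of the rows from the data frame above and ranks by happiness index only; the column name `happiness_rank` is an assumption and may differ from what PandasAI produces:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["United Kingdom", "France", "Germany", "Japan"],
    "happiness_index": [7.16, 6.66, 7.07, 5.87],
})

# Rank 1 is the happiest country; tied values share the smallest rank number.
# astype(int) assumes no missing values remain in happiness_index.
df["happiness_rank"] = (
    df["happiness_index"].rank(ascending=False, method="min").astype(int)
)

print(df.sort_values("happiness_rank"))
```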

Intelligent Data Visualization Through Generative AI

PandasAI improves data visualization by using generative AI to recommend the most fitting visual representations for your dataset. Instead of puzzling over which chart or graph to use, you can get tailored suggestions that help you make the most out of your data.

For example:

Python
response = pandas_ai.run(df, "Which data visualization do you recommend for this data?")
print(response)


The output shows the data visualized in the way the AI engine judges best.
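
If you want to draw a chart yourself, a simple bar chart is one reasonable choice for a categorical column paired with a numeric one. The sketch below assumes Matplotlib is installed and uses a few rows from the data frame above; it is not the chart PandasAI generates, and the output file name is arbitrary:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "country": ["United Kingdom", "France", "Italy", "Japan"],
    "happiness_index": [7.16, 6.66, 6.38, 5.87],
})

# One bar per country, with bar height equal to the happiness index.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(df["country"], df["happiness_index"])
ax.set_ylabel("Happiness index")
ax.set_title("Happiness index by country")
fig.tight_layout()

# Save the chart to the system temp directory and release the figure.
chart_path = os.path.join(tempfile.gettempdir(), "happiness_by_country.png")
fig.savefig(chart_path)
plt.close(fig)
```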


Real-Life Use Case: Generative AI in Financial Forecasting

Let’s look at a real-life use case of PandasAI. It can go beyond just analyzing past stock price data; it can simulate future scenarios based on market trends, company performance, and global events. 

We can use generative models to create a range of possible future stock prices, considering volatility and other market indicators. This comprehensive, forward-looking approach allows investors and analysts to better prepare for financial outcomes, making generative AI an invaluable asset in financial forecasting.
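
The idea of a range of possible future prices can be illustrated with a classical Monte Carlo simulation. To be clear, the sketch below is not PandasAI and not a generative AI model: it simulates geometric Brownian motion with NumPy, and the starting price, drift, and volatility are made-up values chosen only for illustration:

```python
import numpy as np


def simulate_prices(start_price, annual_drift, annual_volatility,
                    days=252, n_paths=1000, seed=0):
    """Simulate daily price paths with geometric Brownian motion."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / 252  # one trading day as a fraction of a year
    shocks = rng.standard_normal((n_paths, days))
    # Daily log returns: drift term plus a volatility-scaled random shock.
    log_returns = ((annual_drift - 0.5 * annual_volatility ** 2) * dt
                   + annual_volatility * np.sqrt(dt) * shocks)
    # Cumulative log returns turn into price levels along each path.
    return start_price * np.exp(np.cumsum(log_returns, axis=1))


# Hypothetical inputs: price 100, 5% annual drift, 20% annual volatility.
paths = simulate_prices(start_price=100.0, annual_drift=0.05, annual_volatility=0.2)
low, median, high = np.percentile(paths[:, -1], [5, 50, 95])
print(f"5th-95th percentile after one year: {low:.2f} to {high:.2f} (median {median:.2f})")
```

Each row of `paths` is one simulated year of prices, so the percentiles of the last column give a rough band of outcomes under these assumptions.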

Pandas vs. PandasAI: The Generative AI Edge

While Pandas is a well-known library many people use for data manipulation and analysis, PandasAI takes it further by integrating generative AI capabilities. With traditional Pandas, you might write code to filter, transform, and visualize data, but you're restricted to the data you already have. 

PandasAI, on the other hand, can generate new insights and visualizations and even manipulate data based on natural language prompts. The generative AI engine can provide analytics that would be difficult to code manually. Imagine asking your data, "What is the potential revenue for the next quarter?" and receiving a generated report as an answer — this is the power of PandasAI.

Note: We’ve gone over several prompts that PandasAI accepts. If you try out your own creative prompts, be warned that some may throw errors. Here’s a link to a helpful thread for debugging that issue: Crash "Invalid input data. Must be a Pandas or Polars data frame" on the "row" question.

Conclusion

PandasAI isn't just another data manipulation tool; it's a monumental step in data analysis thanks to its generative AI capabilities. It transcends the limitations of traditional analytics frameworks by not just working with your data but understanding it to generate new insights. 

From filling gaps in datasets to forecasting financial markets, the possibilities are endless. As we move towards a future where data is increasingly complex, the ability to generate meaningful insights from it becomes crucial. PandasAI provides a glimpse into that future, an opportunity you will want to explore.

Additional Resources

  • Another interesting tutorial on PandasAI: PandasAI Library from OpenAI
  • The official documentation: PandasAI