Artificial intelligence (AI) and machine learning (ML) are two fields that work together to create computer systems capable of perception, recognition, decision-making, and translation. Separately, AI is the ability of a computer system to mimic human intelligence through math and logic, while ML builds on AI by developing methods that "learn" through experience rather than through explicit instruction. In the AI/ML Zone, you'll find resources ranging from tutorials to use cases that will help you navigate this rapidly growing field.
In the information age, dealing with huge PDFs happens on a day-to-day basis. Most of the time, I have found myself drowning in a sea of text, struggling to find the information I wanted or needed as I read page after page. But what if I could ask questions about the PDF and recover not only the relevant information but also the page contents? That's where the Retrieval-Augmented Generation (RAG) technique comes into play. By combining these cutting-edge technologies, I have created a locally hosted application that allows you to chat with your PDFs, ask questions, and receive all the necessary context. Let me walk you through the full process of building this kind of application!

What Is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation, or RAG, is a method designed to improve the performance of an LLM by incorporating extra information on a given topic. This information reduces uncertainty and offers more context, helping the model answer questions more accurately. When building a basic RAG system, there are two main components to focus on: Data Indexing and Data Retrieval and Generation. Data Indexing enables the system to store and/or search for documents whenever needed. Data Retrieval and Generation is where these indexed documents are queried, the required data is pulled out, and answers are generated from it.

Data Indexing

Data Indexing comprises four key stages:

- Data loading: This initial stage involves ingesting PDFs, audio files, videos, etc. into a unified format for the next phases.
- Data splitting: The next step is to divide the content into manageable segments: segmenting the text into coherent sections or chunks that retain the context and meaning.
- Data embeddings: In this stage, the text chunks are transformed into numerical vectors. This transformation is done using embedding techniques that capture the semantic essence of the content.
- Data storing: The last step is storing the generated embeddings, typically in a vector store.

Data Retrieval and Generation

Retrieval:

- Embedding the query: Transforming the user's query into an embedding so it can be compared for similarity with the document embeddings.
- Searching the vector store: The vector store contains vectors of different chunks of documents. By comparing the query embedding with the stored ones, the system determines which chunks are the most relevant to the query. Such comparison is often done by computing cosine similarity or another similarity metric.
- Selecting top-k chunks: The system takes the k chunks closest to the query embedding based on the similarity scores obtained.

Generation:

- Combining context and query: The top-k chunks provide the necessary context related to the query. When combined with the user's original question, they give the LLM a comprehensive input from which to generate the output.

Now that we have more context about it, let's jump into the action!

RAG for PDF Documents

Prerequisites

Everything is described in this GitHub repository. There is also a Docker file to test the full application. I have used the following libraries:

- LangChain: A framework for developing applications using Large Language Models (LLMs). It provides the instruments and approaches needed to control and coordinate LLMs wherever they are applied.
- PyPDF: This is used for loading and processing PDF documents.
While PyMuPDF is known for its speed, I faced several compatibility issues with it when setting up the Docker environment.

- FAISS (Facebook AI Similarity Search): A library used for fast similarity search and clustering of dense vectors. FAISS excels at fast nearest-neighbor search, so it is a perfect fit when dealing with vector embeddings such as document chunks. I decided to use it instead of a vector database for simplicity.
- Streamlit: Employed for building the user interface of the application. Streamlit allows for the rapid development of interactive web applications, making it an excellent choice for creating a seamless user experience.

Data Indexing

Load the PDF document:

```python
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(pdf_docs)
pdf_data = loader.load()
```

Split it into chunks. I have used a chunk size of 1,000 characters:

```python
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=150,
    length_function=len
)
docs = text_splitter.split_documents(pdf_data)
```

I have used the OpenAI embedding model and loaded the embedded chunks into the FAISS vector store:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings(api_key=open_ai_key)
db = FAISS.from_documents(docs, embeddings)
```

I have configured the retriever to return only the top 3 relevant chunks:

```python
retriever = db.as_retriever(search_kwargs={'k': 3})
```

Data Retrieval and Generation

Using the RetrievalQA chain from LangChain, I have created the full retrieval and generation system, linked to the FAISS retriever configured above:

```python
from langchain.chains import RetrievalQA
from langchain import PromptTemplate
from langchain_openai import ChatOpenAI

model = ChatOpenAI(api_key=open_ai_key)

custom_prompt_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer below and nothing else.
Helpful answer:
"""

prompt = PromptTemplate(template=custom_prompt_template,
                        input_variables=['context', 'question'])

qa = RetrievalQA.from_chain_type(llm=model,
                                 chain_type="stuff",
                                 retriever=retriever,
                                 return_source_documents=True,
                                 chain_type_kwargs={"prompt": prompt})
```

Streamlit

I have used Streamlit to create an application where you can upload your own documents and start the RAG process with them. The only parameter required is your OpenAI API key. I used the book "Cloud Data Lakes for Dummies" as the example for the conversation shown in the following image.

Conclusion

At a time when information is available in overwhelming volume, the ability to engage in meaningful conversations with documents can go a long way toward saving time when mining large PDF documents for valuable information. With the help of Retrieval-Augmented Generation, we can filter out unwanted information and focus on what actually matters. This implementation offers a naive RAG solution; however, the possibilities for optimizing it are enormous. By using different RAG techniques, it may be possible to further refine aspects such as embedding models, document chunking methods, and retrieval algorithms. I hope this article is as fun for you to read as it was for me to create!
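As a rough complement to the Streamlit section above, here is a minimal sketch of how such a front end might be wired together. The layout (a sidebar uploader plus a question box) and the `build_qa_chain` helper are illustrative assumptions, not the author's exact code from the repository.

```python
import tempfile

import streamlit as st
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA


def build_qa_chain(pdf_path: str, api_key: str) -> RetrievalQA:
    """Run the indexing and retrieval steps described above for a single PDF."""
    docs = CharacterTextSplitter(
        separator="\n", chunk_size=1000, chunk_overlap=150, length_function=len
    ).split_documents(PyPDFLoader(pdf_path).load())
    db = FAISS.from_documents(docs, OpenAIEmbeddings(api_key=api_key))
    return RetrievalQA.from_chain_type(
        llm=ChatOpenAI(api_key=api_key),
        chain_type="stuff",
        retriever=db.as_retriever(search_kwargs={"k": 3}),
        return_source_documents=True,
    )


st.title("Chat with your PDF")
open_ai_key = st.sidebar.text_input("OpenAI API Key", type="password")
uploaded_file = st.sidebar.file_uploader("Upload a PDF", type="pdf")

if uploaded_file and open_ai_key:
    # Persist the upload so PyPDFLoader can read it from disk.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(uploaded_file.read())
        qa = build_qa_chain(tmp.name, open_ai_key)

    question = st.text_input("Ask a question about the document")
    if question:
        result = qa.invoke({"query": question})
        st.write(result["result"])
        for doc in result["source_documents"]:
            st.caption(f"Source page: {doc.metadata.get('page')}")
```

In a real application you would cache the chain (for example in `st.session_state`) so the PDF is not re-indexed on every interaction.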
In recent years, there have been significant advancements in language models. This progress is a result of extensive training and tuning on billions of parameters, along with benchmarking for commercial use. The origins of this work can be traced back to the 1950s, when research in Natural Language Understanding and Processing began. This article aims to provide an overview of the history and evolution of language models over the last 70 years. It will also examine the currently available Large Language Models (LLMs), including their architecture, tuning parameters, enterprise readiness, system configurations, and more, to gain a high-level understanding of their training and inference processes. This exploration will allow us to appreciate the progress in this field and assess the options available for commercial use. Finally, we will delve into the environmental impact of deploying these models, including their power consumption and carbon footprint, and understand the measures organizations are taking to mitigate these effects.

Brief History of the Advancement of NLU/NLP Over the Last 70-Plus Years

Around the 1950s, Claude Shannon invented the field of information theory. His work focused on the problem of encoding messages to be transmitted, and it introduced concepts like entropy and redundancy in language that became foundational for NLP and computational linguistics. In 1957, Noam Chomsky provided theories on syntax and grammar that offered a formal structure for understanding natural languages. This work influenced early computational linguistics and the development of formal grammars for language processing.

Among the early computational models, Hidden Markov Models (HMMs, early 1960s) and n-gram models (early 1980s) paved the way for advances in understanding natural language from a computational point of view. Hidden Markov Models were used for statistical modeling of sequences, crucial for tasks like speech recognition, and provided a probabilistic framework for modeling language sequences. N-gram models, on the other hand, used fixed-length sequences of words to predict the next word in a sequence. They were simple yet effective and remained a standard for language modeling for many years.

Next in line were advancements in neural networks and embeddings. In the 1990s, early neural network models like recurrent neural networks (RNNs) and long short-term memory (LSTM) networks were developed. These models allowed for learning patterns in sequential data, a key requirement for language modeling. Later, techniques like Latent Semantic Analysis (LSA) and, subsequently, Word2Vec (Mikolov et al., 2013) allowed for dense vector representations of words. Word embeddings captured semantic relationships between words, which significantly improved various NLP tasks.

By this time, data was exploding across industries, and some of the key modern-day foundational models began to evolve. In 2014, the attention mechanism, introduced by Bahdanau et al., allowed models to focus on relevant parts of the input sequence. It significantly improved machine translation and set the stage for more complex architectures.
Then, one of the major breakthroughs surfaced in 2017 with the research paper "Attention Is All You Need" by Vaswani et al., which introduced the Transformer architecture. The transformer model introduced a fully attention-based mechanism, removing the need for recurrence. Transformers enabled parallel processing of data, leading to more efficient training and superior performance on a wide range of NLP tasks.

Generative Pre-trained Transformers (GPT) marked a significant milestone in NLP with GPT-1 in 2018, introduced by Radford et al. This model leveraged the concept of pre-training on a large corpus of text followed by fine-tuning on specific tasks, resulting in notable improvements across numerous NLP applications and establishing GPT's architecture as a cornerstone in the field. In the same year, BERT (Bidirectional Encoder Representations from Transformers) by Devlin et al. revolutionized NLP by introducing a bidirectional transformer model that considers the context from both sides of a word, setting new performance benchmarks and popularizing transformer-based models.

Subsequent developments saw GPT-2 in 2019, which scaled up the GPT-1 model significantly, demonstrating the power of unsupervised pre-training on even larger datasets and generating coherent, contextually relevant text. GPT-3, released in 2020 with 175 billion parameters, showcased remarkable few-shot and zero-shot learning capabilities, highlighting the potential of large-scale language models for diverse applications, from creative writing to coding assistance. Following BERT, derivatives like RoBERTa, ALBERT, and T5 emerged, offering various adaptations and improvements tailored for specific tasks, enhancing training efficiency, reducing parameters, and optimizing task-specific performance.

Progression of Large Language Models

The following table provides a brief snapshot of the progression in the space of LLMs. It is not a comprehensive list, but it provides high-level insights on the type of model, the developer of that model, the underlying architecture, parameters, type of training data, potential applications, enterprise worthiness, and bare-minimum system specifications to utilize them.
| Model | Developer | Architecture | Parameters | Training Data | Applications | First Release | Enterprise Worthiness | System Specifications |
|---|---|---|---|---|---|---|---|---|
| BERT | Google | Transformer (Encoder) | 340 million (large) | Wikipedia, BooksCorpus | Sentiment analysis, Q&A, named entity recognition | Oct-18 | High | GPU (e.g., NVIDIA V100), 16GB RAM, TPU |
| GPT-2 | OpenAI | Transformer | 1.5 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Feb-19 | Medium | GPU (e.g., NVIDIA V100), 16GB RAM |
| XLNet | Google/CMU | Transformer (Autoregressive) | 340 million (large) | BooksCorpus, Wikipedia, Giga5 | Text generation, Q&A, sentiment analysis | Jun-19 | Medium | GPU (e.g., NVIDIA V100), 16GB RAM |
| RoBERTa | Facebook | Transformer (Encoder) | 355 million (large) | Diverse internet text | Sentiment analysis, Q&A, named entity recognition | Jul-19 | High | GPU (e.g., NVIDIA V100), 16GB RAM |
| DistilBERT | Hugging Face | Transformer (Encoder) | 66 million | Wikipedia, BooksCorpus | Sentiment analysis, Q&A, named entity recognition | Oct-19 | High | GPU (e.g., NVIDIA T4), 8GB RAM |
| T5 | Google | Transformer (Encoder-Decoder) | 11 billion (large) | Colossal Clean Crawled Corpus (C4) | Text generation, translation, summarization, Q&A | Oct-19 | High | GPU (e.g., NVIDIA V100), 16GB RAM, TPU |
| ALBERT | Google | Transformer (Encoder) | 223 million (xxlarge) | Wikipedia, BooksCorpus | Sentiment analysis, Q&A, named entity recognition | Dec-19 | Medium | GPU (e.g., NVIDIA V100), 16GB RAM |
| CTRL | Salesforce | Transformer | 1.6 billion | Diverse internet text | Controlled text generation | Sep-19 | Medium | GPU (e.g., NVIDIA V100), 16GB RAM |
| GPT-3 | OpenAI | Transformer | 175 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Jun-20 | High | Multi-GPU setup (e.g., 8x NVIDIA V100), 96GB RAM |
| ELECTRA | Google | Transformer (Encoder) | 335 million (large) | Wikipedia, BooksCorpus | Text classification, Q&A, named entity recognition | Mar-20 | Medium | GPU (e.g., NVIDIA V100), 16GB RAM |
| ERNIE | Baidu | Transformer | 10 billion (version 3) | Diverse Chinese text | Text generation, Q&A, summarization (focused on Chinese) | Mar-20 | High | GPU (e.g., NVIDIA V100), 16GB RAM |
| Megatron-LM | NVIDIA | Transformer | 8.3 billion | Diverse internet text | Text generation, Q&A, summarization | Oct-19 | High | Multi-GPU setup (e.g., 8x NVIDIA V100), 96GB RAM |
| BlenderBot | Facebook | Transformer (Encoder-Decoder) | 9.4 billion | Conversational datasets | Conversational agents, dialogue systems | Apr-20 | High | GPU (e.g., NVIDIA V100), 16GB RAM |
| Turing-NLG | Microsoft | Transformer | 17 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Feb-20 | High | Multi-GPU setup (e.g., 8x NVIDIA V100), 96GB RAM |
| Megatron-Turing NLG | Microsoft/NVIDIA | Transformer | 530 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Oct-20 | High | Multi-GPU setup (e.g., 8x NVIDIA A100), 320GB RAM |
| GPT-4 | OpenAI | Transformer | ~1.7 trillion (estimate) | Diverse internet text | Text generation, Q&A, translation, summarization | Mar-23 | High | Multi-GPU setup (e.g., 8x NVIDIA A100), 320GB RAM |
| Dolly 2.0 | Databricks | Transformer | 12 billion | Databricks-generated data | Text generation, Q&A, translation, summarization | Apr-23 | High | GPU (e.g., NVIDIA A100), 40GB RAM |
| LLaMA | Meta | Transformer | 65 billion (LLaMA 2) | Diverse internet text | Text generation, Q&A, translation, summarization | Jul-23 | High | Multi-GPU setup (e.g., 8x NVIDIA A100), 320GB RAM |
| PaLM | Google | Transformer | 540 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Apr-22 | High | Multi-GPU setup (e.g., 8x NVIDIA A100), 320GB RAM |
| Claude | Anthropic | Transformer | Undisclosed | Diverse internet text | Text generation, Q&A, translation, summarization | Mar-23 | High | Multi-GPU setup (e.g., 8x NVIDIA A100), 320GB RAM |
| Chinchilla | DeepMind | Transformer | 70 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Mar-22 | High | GPU (e.g., NVIDIA A100), 40GB RAM |
| Bloom | BigScience | Transformer | 176 billion | Diverse internet text | Text generation, Q&A, translation, summarization | Jul-22 | High | Multi-GPU setup (e.g., 8x NVIDIA A100), 320GB RAM |

Large Language Models Power Consumption and Carbon Footprint

While we are leveraging the huge potential and benefits that LLMs provide across various segments of industry, it is also important to understand the implications they pose for overall computational resources, particularly their power consumption and carbon footprint. The power consumption and carbon footprint of training large language models have become significant concerns due to their resource-intensive nature. Here's an overview of these issues based on various studies and estimates.

Training and Inference Costs

Training large language models such as GPT-3, which has 175 billion parameters, requires significant computational resources. Typically, this process involves the use of thousands of GPUs or TPUs over weeks or months. Utilizing these models in real-world applications, known as inference, also consumes substantial power, especially when deployed at scale.

Estimates of Energy Consumption

For GPT-3, training consumes approximately 1,287 MWh of power, while training BERT (base) is estimated to require 650 kWh, and BERT (large) about 1,470 kWh.

Carbon Footprint

The carbon footprint of training these models varies depending on the energy source and the efficiency of the data center. The use of renewable energy sources can significantly reduce the carbon impact.

- GPT-3: The estimated carbon emissions for training GPT-3 are around 552 metric tons of CO2e (carbon dioxide equivalent), assuming an average carbon intensity of electricity.
- BERT: Training BERT (large) is estimated to emit approximately 1.9 metric tons of CO2e.

To provide some context, a widely cited study suggested that training a large language model could have a carbon footprint equivalent to the lifetime emissions of five average cars in the United States.

Factors Influencing Energy Consumption and Carbon Footprint

The energy consumption and carbon footprint of large language models (LLMs) are influenced by several high-level factors. Firstly, model size is crucial; larger models with more parameters demand significantly more computational resources, leading to higher energy consumption and carbon emissions. Training duration also impacts energy use, as longer training periods naturally consume more power. The efficiency of the hardware (e.g., GPUs, TPUs) used for training is another key factor; more efficient hardware can substantially reduce overall energy requirements. Additionally, data center efficiency plays a significant role, with efficiency measured by Power Usage Effectiveness (PUE). Data centers with lower PUE values are more efficient, reducing the energy needed for cooling and other non-computational operations. Lastly, the source of electricity powering these data centers greatly affects the carbon footprint. Data centers utilizing renewable energy sources have a considerably lower carbon footprint compared to those relying on non-renewable energy. These factors combined determine the environmental impact of training and running LLMs.
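As a back-of-the-envelope check on the energy and emissions figures above, the CO2e estimate follows directly from the training energy and an assumed grid carbon intensity. The sketch below uses an illustrative intensity of 0.429 kg CO2e per kWh (an assumption for demonstration, not a figure from the article) to show how an estimate in the range of the ~552 metric tons quoted for GPT-3 can be reproduced.

```python
def training_emissions_tons(energy_mwh: float, kg_co2e_per_kwh: float) -> float:
    """Estimate training emissions in metric tons of CO2e.

    energy_mwh: total training energy in megawatt-hours
    kg_co2e_per_kwh: carbon intensity of the electricity used
    """
    kwh = energy_mwh * 1_000           # 1 MWh = 1,000 kWh
    kg_co2e = kwh * kg_co2e_per_kwh
    return kg_co2e / 1_000             # 1 metric ton = 1,000 kg


# GPT-3: ~1,287 MWh at an assumed average grid intensity of 0.429 kg CO2e/kWh
print(training_emissions_tons(1_287, 0.429))   # -> ~552 metric tons CO2e

# The same calculation with a low-carbon grid (assumed ~0.05 kg CO2e/kWh) illustrates
# why the electricity source matters so much for the final footprint.
print(training_emissions_tons(1_287, 0.05))    # -> ~64 metric tons CO2e
```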
Efforts To Mitigate Environmental Impact

To mitigate the energy consumption and carbon footprint of large language models, several strategies can be employed. Developing more efficient training algorithms can reduce computational demands, thus lowering energy use. Innovations in hardware, such as more efficient GPUs and TPUs, can also decrease power requirements for training and inference. Utilizing renewable energy sources for data centers can significantly cut the carbon footprint. Techniques like model pruning, quantization, and distillation can optimize model size and power needs without compromising performance. Additionally, cloud-based services and shared resources can enhance hardware utilization and reduce idle times, leading to better energy efficiency.

Recent Efforts and Research

Several recent efforts have focused on understanding and reducing the environmental impact of language models:

- Green AI: Researchers advocate for transparency in reporting the energy and carbon costs of AI research, as well as prioritizing efficiency and sustainability.
- Efficiency studies: Studies like "Energy and Policy Considerations for Deep Learning in NLP" (Strubell et al., 2019) provide detailed analyses of energy costs and suggest best practices for reducing environmental impact.
- Energy-aware AI development: Initiatives to incorporate energy efficiency into the development and deployment of AI models are gaining traction, promoting sustainable AI practices.

In summary, while large language models offer significant advancements in NLP, they also pose challenges in terms of energy consumption and carbon footprint. Addressing these issues requires a multi-faceted approach involving more efficient algorithms, advanced hardware, renewable energy, and a commitment to sustainable practices in AI development.
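The mitigation strategies above mention quantization as one lever for shrinking models and their inference cost. As a generic, hedged illustration (not something covered in this article), post-training dynamic quantization in PyTorch looks roughly like this:

```python
import torch
import torch.nn as nn

# Stand-in model; in practice this would be a trained network with large Linear layers.
model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))

# Dynamic quantization stores Linear weights as int8 and dequantizes on the fly,
# which shrinks the serialized model and typically lowers CPU inference cost.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

torch.save(model.state_dict(), "model_fp32.pt")
torch.save(quantized.state_dict(), "model_int8.pt")  # noticeably smaller on disk
```

Pruning and distillation follow the same spirit: trade a small amount of accuracy for a large reduction in compute, memory, and therefore energy.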
Poor data quality can cause inaccurate analysis and decision-making in information-driven systems. Machine learning (ML) classification algorithms have emerged as efficient tools for addressing a wide range of data quality issues by automatically finding and correcting anomalies in datasets. There are various methods and strategies for applying ML classifiers to tasks such as data purification, outlier identification, missing value imputation, and record linkage. The evaluation criteria and performance analysis methodologies used to measure the efficacy of machine learning models in resolving data quality issues are also evolving.

Overview of Machine Learning Classification Techniques

Machine learning classification techniques are critical for recognizing patterns and making projections from input data. Four popular methods are Naive Bayes, Support Vector Machines (SVM), Random Forest, and Neural Networks. Each strategy has unique advantages and disadvantages.

Naive Bayes

A probabilistic model based on Bayes' theorem that assumes feature independence given the class label. Naive Bayes is renowned for its simplicity as well as its efficacy. Its ability to handle enormous, high-dimensional datasets makes it a popular choice for a variety of applications. Furthermore, it performs well in text classification problems due to the intrinsic sparsity of text data. Naive Bayes is capable of effectively handling both numerical and categorical features. However, its "naive" assumption of feature independence may restrict its usefulness in some cases.

Support Vector Machines (SVM)

SVM seeks the ideal boundary, or hyperplane, that maximizes the margin between classes in high-dimensional space. SVM's versatility stems from its ability to handle nonlinearly separable data using kernel functions. Large datasets and high-dimensional data benefit greatly from SVM. However, choosing suitable kernel types and optimizing the relevant parameters can be difficult during implementation. Furthermore, SVM's operation in high-dimensional feature spaces limits its interpretability.

Random Forest

An ensemble approach that combines several decision trees to improve overall prediction accuracy. Random Forest lowers variance by aggregating the results of individual trees and offers feature importance estimates. This approach supports both numerical and categorical features. While Random Forest produces excellent results, overfitting may occur if the number of trees surpasses a sensible threshold.

Neural Networks

Neural Networks mimic the structure and functionality of the human brain. They learn sophisticated patterns and relationships in data via interlinked nodes. Their strength rests in their ability to recognize complicated structures, which makes them valuable for a wide variety of applications. In contrast to the other methods, constructing and training Neural Networks requires significant computational resources and time. Furthermore, their opaque character makes interpretation difficult.

Understanding the differences between Naive Bayes, Support Vector Machines, Random Forests, and Neural Networks allows programmers to choose the best technique for their specific use case. The choice is influenced by data size, dimensionality, complexity, interpretability, and available processing resources. Naive Bayes, due to its simplicity and efficacy, may be suitable for text categorization jobs. By contrast, SVM's robustness to nonlinearly separable data makes it an excellent contender for specialized applications. Meanwhile, Random Forest improves accuracy and minimizes volatility. Finally, although Neural Networks need significant resources and are less interpretable, they display exceptional capabilities in recognizing complicated patterns.
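To ground this comparison, here is a small, hedged sketch (not from the article) that fits three of the discussed classifiers on a two-category slice of scikit-learn's 20 Newsgroups text data; the dataset, feature settings, and hyperparameters are illustrative assumptions only.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

categories = ["sci.space", "rec.autos"]            # small two-class problem for illustration
train = fetch_20newsgroups(subset="train", categories=categories)
test = fetch_20newsgroups(subset="test", categories=categories)

# Turn raw text into sparse TF-IDF features
vectorizer = TfidfVectorizer(max_features=20_000)
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

classifiers = {
    "Naive Bayes": MultinomialNB(),
    "Linear SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, clf in classifiers.items():
    clf.fit(X_train, train.target)
    accuracy = accuracy_score(test.target, clf.predict(X_test))
    print(f"{name}: {accuracy:.3f}")
```

On sparse, high-dimensional text features the two linear models are typically fast and strong, while the Random Forest is slower to train, which mirrors the trade-offs described above.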
Methodologies and Approaches in ML Classification for Data Quality Improvement

Machine learning (ML) classification algorithms are crucial for enhancing data quality since they can automatically detect and rectify inconsistent or erroneous data points in large datasets. Recently, there has been a significant increase in interest in new procedures and approaches to tackle the difficulties presented by the growing complexity and volume of data. This post examines notable machine learning classification approaches that aim to improve data quality, investigating their essential characteristics and practical uses.

Active Learning (AL)

AL is a widely used method that combines human expertise with machine learning algorithms to continuously improve the performance of a classifier through iterative refinement. Active learning commences by manually categorizing a limited number of cases and subsequently training the classifier on this initial dataset. The system then chooses ambiguous cases, namely those whose true labels are still undetermined, and seeks human verification. Once the ground-truth labels are acquired, the classifier enhances its knowledge base and continues to assign labels to new uncertain cases until it reaches convergence. This interactive learning approach enables the system to progressively enhance its comprehension of the underlying data distribution while decreasing the need for human intervention.

Deep Learning (DL)

A very promising machine learning classification technique that utilizes artificial neural networks (ANNs) inspired by the structure and operation of biological neurons. Deep learning models can autonomously acquire hierarchical feature representations from unprocessed data by applying multiple layers of nonlinear transformations. Deep learning is highly proficient at processing intricate data formats, such as images, sound, and text, which allows it to achieve cutting-edge performance in a wide range of applications.

Ensemble Learning (EL)

A robust classification approach in machine learning that combines numerous weak learners to form a strong classifier. Ensemble learning methods, such as Random Forest, Gradient Boosting, and AdaBoost, create a variety of decision trees or other base models using subsets of the given data. During prediction, each individual base model contributes a vote, and the final output is chosen by combining or aggregating these votes. Ensemble learning models generally achieve higher accuracy and resilience than individual base learners because they can capture complementary patterns in the data.

Feature Engineering (FE)

A crucial part of ML classification pipelines, feature engineering involves transforming raw data into meaningful representations that can be used as input for ML models. Feature extraction techniques, such as Bag of Words, TF-IDF, and Word Embeddings, aim to retain significant semantic connections between data pieces. Bag of Words represents text data as binary vectors indicating the presence or absence of certain terms, while TF-IDF weights terms based on their frequency distribution across texts. Word Embeddings, such as Word2Vec and Doc2Vec, map words or complete documents into compact vector spaces while maintaining their semantic relationships.
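The difference between the first two representations is easy to see on a toy corpus. The snippet below is a generic illustration (not from the article): `CountVectorizer(binary=True)` produces the presence/absence Bag of Words vectors, while `TfidfVectorizer` reweights terms by how informative they are across documents.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "data quality matters",
    "machine learning improves data quality",
    "quality data improves machine learning",
]

# Bag of Words: binary presence/absence vectors
bow = CountVectorizer(binary=True)
bow_matrix = bow.fit_transform(corpus)
print(bow.get_feature_names_out())
print(bow_matrix.toarray())

# TF-IDF: the same vocabulary, but terms are weighted by how informative they are
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(corpus).toarray().round(2))
```

Word embeddings such as Word2Vec go a step further and learn dense vectors from context, so that semantically related words end up close together in the vector space.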
Evaluation metrics are crucial instruments for quantifying the effectiveness of machine learning classification systems and objectively assessing their performance. Common evaluation metrics include Precision, Recall, F1 Score, and Accuracy. Precision is the ratio of correctly predicted positive instances to all predicted positive instances, while Recall calculates the percentage of real positive cases that are accurately identified. The F1 Score is the harmonic mean of Precision and Recall, providing a balanced evaluation that accounts for both false negatives and false positives. Accuracy measures the proportion of correctly classified cases out of the total number of samples.

Conclusion

ML classification algorithms offer valuable approaches for tackling the difficulties of upholding high data quality in today's constantly changing data environments. Techniques such as Active Learning, Deep Learning, Ensemble Learning, Feature Engineering, and sound evaluation metrics are constantly expanding the limits of what can be achieved in data analysis and modeling. By adopting these processes and approaches, firms can uncover concealed insights, reduce risks, and make well-informed decisions based on dependable and precise data.
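To make the evaluation metrics described above concrete, here is a small, hedged illustration (the labels are made up) showing how each metric is computed for a binary data-quality check in which 1 marks a record flagged as problematic.

```python
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

# Hypothetical ground truth and model predictions: 1 = problematic record, 0 = clean record
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP) = 4 / 5
print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN) = 4 / 5
print("F1 score: ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
print("Accuracy: ", accuracy_score(y_true, y_pred))   # correct predictions / total = 8 / 10
```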
Snowflake Cortex is a suite of machine learning (ML) and artificial intelligence (AI) capabilities that lets businesses leverage the power of computing on their data. Its ML functions, such as FORECAST, TOP_INSIGHTS, and ANOMALY_DETECTION, allow analysts to run built-in machine learning models on their data through SQL statements. Using these functions, data and business analysts can produce estimations and recommendations and identify abnormalities within their data without knowing Python or other programming languages and without having to understand how the underlying models are built.

- FORECAST: The SNOWFLAKE.ML.FORECAST function enables businesses to forecast metrics based on historical performance. You can use it to forecast future demand, pipeline generation, sales, and revenue over a period.
- ANOMALY_DETECTION: The SNOWFLAKE.ML.ANOMALY_DETECTION function helps flag outliers using both unsupervised and supervised learning models. It can be used to identify spikes in your key performance indicators and track abnormal trends.
- TOP_INSIGHTS: The SNOWFLAKE.ML.TOP_INSIGHTS function enables analysts to root-cause the most significant contributors to a particular metric of interest. It can help you track drivers such as the demand channels driving your sales or the agents dragging your customer satisfaction down.

In this article, I will focus on the FORECAST function, using it to implement a time series forecasting model that estimates sales for a superstore based on its historical sales.

Data Setup and Exploration

For the purposes of this article, we will use historical Superstore sales data along with a holiday calendar. The following code block creates both tables used in this article and can be used to visualize the historical sales data.

```sql
CREATE OR REPLACE TABLE superstore.superstore_ml_functions.superstore_sales(
    Order_Date DATE,
    Segment VARCHAR(16777216),
    Region VARCHAR(16777216),
    Category VARCHAR(16777216),
    Sub_Category VARCHAR(16777216),
    Sales NUMBER(17,0)
);

CREATE OR REPLACE TABLE superstore.superstore_ml_functions.us_calender(
    Date DATE,
    HOLIDAY VARCHAR(16777216)
);

SELECT * FROM superstore.superstore_ml_functions.superstore_sales
WHERE category = 'Technology';
```

Having explored the historical sales, I will train the forecast model on the last 12 months of sales. The following code creates the training data table.

```sql
CREATE OR REPLACE TABLE superstore_sales_last_year AS (
    SELECT
        to_timestamp_ntz(Order_Date) AS timestamp,
        Segment,
        Category,
        Sub_Category,
        Sales
    FROM superstore_sales
    WHERE Order_Date > (SELECT max(Order_Date) - interval '1 year' FROM superstore_sales)
    GROUP BY all
);
```

Train the Forecast Model

The SNOWFLAKE.ML.FORECAST SQL function can be used to train a forecast model on historical data. In this section, we create a view to serve as the training dataset for technology sales and then train the model.

```sql
CREATE OR REPLACE VIEW technology_sales AS (
    SELECT timestamp, sum(Sales) AS Sales
    FROM superstore_sales_last_year
    WHERE category = 'Technology'
    GROUP BY timestamp
);

CREATE OR REPLACE SNOWFLAKE.ML.FORECAST technology_forecast (
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'technology_sales'),
    TIMESTAMP_COLNAME => 'TIMESTAMP',
    TARGET_COLNAME => 'SALES'
);

SHOW SNOWFLAKE.ML.FORECAST;
```

Creating and Visualizing the Forecasts

Having trained the forecast model, let's use the following code block to create predictions for the next 90 days.
```sql
CALL technology_forecast!FORECAST(FORECASTING_PERIODS => 90);

-- Run immediately after the above call to store results!
CREATE OR REPLACE TABLE technology_predictions AS (
    SELECT * FROM TABLE(RESULT_SCAN(-1))
);

SELECT timestamp, sales, NULL AS forecast
FROM technology_sales
WHERE timestamp > '2023-01-01'
UNION
SELECT TS AS timestamp, NULL AS sales, forecast
FROM technology_predictions
ORDER BY timestamp ASC;
```

The yellow trend line in the resulting chart visualizes the predictions for the next 90 days.

Conclusion

In this article, we explored the SNOWFLAKE.ML.FORECAST function to build a forecast model for superstore sales prediction: we visualized the historical data, created the necessary training datasets, built the forecast model, and visualized the estimations. As a next step, I would recommend continuing to explore the Snowflake Cortex framework to build multiple forecast models across dimensions, and to try the built-in anomaly detection and top insights functions.
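As a pointer for that next step, here is a hedged sketch of how the anomaly detection counterpart might look, following the same pattern as the forecast model above. The object name is arbitrary, and the empty LABEL_COLNAME (indicating unsupervised training) follows the pattern in Snowflake's documentation; verify the exact syntax against the current Cortex ML docs before relying on it.

```sql
-- Train an unsupervised anomaly detection model on the same technology sales view
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION technology_anomalies (
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'technology_sales'),
    TIMESTAMP_COLNAME => 'TIMESTAMP',
    TARGET_COLNAME => 'SALES',
    LABEL_COLNAME => ''          -- empty string = unsupervised training
);

-- Flag unusual points in the series
CALL technology_anomalies!DETECT_ANOMALIES(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'technology_sales'),
    TIMESTAMP_COLNAME => 'TIMESTAMP',
    TARGET_COLNAME => 'SALES'
);
```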
As developers and engineers, we constantly seek ways to streamline our workflows, increase productivity, and solve complex problems efficiently. With the advent of advanced language models like ChatGPT, we now have powerful tools to assist us in our daily tasks. By leveraging the capabilities of ChatGPT, we can craft prompts that enhance our productivity and creativity, making us more effective problem solvers and innovators. In this article, we'll explore 10 ChatGPT prompts tailored specifically for developers and engineers to boost their productivity and streamline their workflow.

Code Refactoring Suggestions

Here is the sample prompt: "I have code that needs refactoring. Can you provide suggestions to improve its readability and efficiency? Here is the code: <paste or write code here>"

Use ChatGPT to generate recommendations for refactoring code snippets, such as identifying redundant lines, suggesting better variable names, or proposing alternative algorithms to optimize performance. Please refer to the screenshot below for a sample response.

Troubleshooting Assistance

Here is the sample prompt: "I'm encountering an error message [insert error message here] in my code. Can you help me troubleshoot and find a solution?"

This prompt will help you troubleshoot bugs or issues in your code. It may take a couple of iterations to really nail down the problem, but this is a good starting prompt.

API Documentation Retrieval

Here is the prompt: "I'm working with the [insert API name] API. Can you provide me with relevant documentation or usage examples?"

This is really helpful when you are working with new systems or platforms; instead of reading all the documentation, you can ask ChatGPT to retrieve the useful information for you in a summarized way.

Design Pattern Recommendations

Here is the prompt: "I'm designing a new software component. Here is the requirement: [put your requirement here]. What design pattern would you recommend for implementing [insert functionality]?"

This prompt requires a good level of detail, but it can help you identify some of the best design patterns for your problem set.

Algorithm Optimization Techniques

Here is the prompt: "I'm implementing [insert algorithm name]. Are there any optimization techniques or best practices I should consider?"

This is not limited to algorithms; you can use it with code as well. In short, this prompt will help you optimize your algorithm or code.

Code Review Feedback

Here is the prompt: "I've written a new feature. Can you review my code and provide feedback on potential improvements? Here is the code: [insert code here]"

ChatGPT can provide some really good feedback about your code. You may or may not act on all of that feedback, but it can certainly be a good starting point.

Library or Framework Recommendations

Here is the prompt: "I'm starting a new project. Can you recommend a suitable [insert programming language] library or framework for [insert functionality]?"

ChatGPT can suggest popular libraries, frameworks, and tools based on the programming language and desired functionality, enabling you to make informed technology choices.

Technical Documentation Summaries

Here is the prompt: "I need a summary of the [insert technology or concept] technical documentation. Can you provide a concise overview?"

This is my most-used prompt: I summarize the technical documentation and read the gist, which has certainly improved my productivity.
Code Snippet Generation

Here is the prompt: "I need a code snippet for [insert functionality or task]. Can you generate a sample code snippet?"

This is a good prompt for generating starter code. But be sure not to just copy the code and use it as-is; be cautious with code generated by LLMs, as it can contain security flaws and bugs.

Project Planning and Task Prioritization

Here is the prompt: "I'm planning my project roadmap. Can you suggest a prioritized list of tasks based on [insert project requirements or constraints]?"

ChatGPT can analyze project requirements, dependencies, and deadlines to generate a prioritized task list, helping you effectively manage project timelines and deliverables.

Conclusion

Incorporating ChatGPT prompts into your development workflow can significantly enhance productivity, creativity, and problem-solving capabilities. By leveraging ChatGPT's natural language understanding and generation capabilities, developers and engineers can streamline tasks such as code refactoring, troubleshooting, documentation retrieval, and project planning. By integrating ChatGPT into your toolkit, you empower yourself to tackle challenges more effectively and unlock new levels of innovation in your projects.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Cloud Native: Championing Cloud Development Across the SDLC.

2024 and the dawn of cloud-native AI technologies marked a significant jump in computational capabilities. We're experiencing a new era where artificial intelligence (AI) and platform engineering converge to transform cloud computing landscapes. AI is now merging with cloud computing, and we're experiencing an age where AI transcends traditional boundaries, offering scalable, efficient, and powerful solutions that learn and improve over time. Platform engineering provides the backbone for these AI systems to operate seamlessly within cloud environments. This shift entails designing, implementing, and managing the software platforms that serve as the fertile ground for AI applications to flourish. Together, the integration of AI and platform engineering in cloud-native environments is not just an enhancement but a transformative force, redefining the very fabric of how services are delivered, consumed, and evolved in the digital cosmos.

The Rise of AI in Cloud Computing

Azure and Google Cloud are pivotal solutions in cloud computing technology, each offering a robust suite of AI capabilities that cater to a wide array of business needs. Azure brings to the table its AI Services and Azure Machine Learning, a collection of AI tools that enable developers to build, train, and deploy AI models rapidly, leveraging its vast cloud infrastructure. Google Cloud, on the other hand, shines with its AI Platform and AutoML, which simplify the creation and scaling of AI products and integrate seamlessly with Google's data analytics and storage services. These platforms empower organizations to integrate intelligent decision-making into their applications, optimize processes, and provide insights that were once beyond reach.

A quintessential case study that illustrates the successful implementation of AI in the cloud is that of the Zoological Society of London (ZSL), which utilized Google Cloud's AI to tackle the biodiversity crisis. ZSL's "Instant Detect" system harnesses AI on Google Cloud to analyze vast amounts of images and sensor data from wildlife cameras across the globe in real time. This system enables rapid identification and categorization of species, transforming the way conservation efforts are conducted by providing precise, actionable data and leading to more effective protection of endangered species. Implementations such as ZSL's not only showcase the technical prowess of cloud AI capabilities but also underscore their potential to make a significant positive impact on critical global issues.

Platform Engineering: The New Frontier in Cloud Development

Platform engineering is a multifaceted discipline that refers to the strategic design, development, and maintenance of software platforms to support more efficient deployment and application operations. It involves creating a stable and scalable foundation that gives developers the tools and capabilities needed to develop, run, and manage applications without the complexity of maintaining the underlying infrastructure. The scope of platform engineering spans the creation of internal development platforms, automation of infrastructure provisioning, implementation of continuous integration and continuous deployment (CI/CD) pipelines, and ensuring the platforms' reliability and security. In cloud-native ecosystems, platform engineers play a pivotal role.
They are the architects of the digital landscape, responsible for constructing the robust frameworks upon which applications are built and delivered. Their work involves creating abstractions on top of cloud infrastructure to provide a seamless development experience and operational excellence.

Figure 1. Platform engineering from the top down

Platform engineers enable teams to focus on creating business value by abstracting away complexities related to environment configurations, resource scaling, and service dependencies. They guarantee that the underlying systems are resilient and self-healing and can be deployed consistently across various environments. The convergence of DevOps and platform engineering with AI tools is an evolution that is reshaping the future of cloud-native technologies. DevOps practices are enhanced by AI's ability to predict, automate, and optimize processes. AI tools can analyze data from development pipelines to predict potential issues, automate root cause analyses, and optimize resources, leading to improved efficiency and reduced downtime. Moreover, AI can drive intelligent automation in platform engineering, enabling proactive scaling, self-tuning of resources, and personalized developer experiences. This synergy creates a dynamic environment where the speed and quality of software delivery are continually advancing, setting the stage for more innovative and resilient cloud-native applications.

Synergies Between AI and Platform Engineering

AI-augmented platform engineering introduces a layer of intelligence to automate processes, streamline operations, and enhance decision-making. Machine learning (ML) models, for instance, can parse through massive datasets generated by cloud platforms to identify patterns and predict trends, allowing for real-time optimizations. AI can automate routine tasks such as network configurations, system updates, and security patches; these automations not only accelerate the workflow but also reduce human error, freeing up engineers to focus on more strategic initiatives. There are various examples of AI-driven automation in cloud environments, such as intelligent systems that analyze application usage patterns and automatically adjust computing resources to meet demand without human intervention. The significant cost savings and performance improvements provide exceptional value to an organization. AI-operated security protocols can autonomously monitor and respond to threats more quickly than traditional methods, significantly enhancing the security posture of the cloud environment.

Predictive analytics and ML are particularly transformative in platform optimization. They allow for anticipatory resource management, where systems can forecast loads and scale resources accordingly. ML algorithms can optimize data storage, intelligently archiving or retrieving data based on usage patterns and access frequencies.

Figure 2. AI resource autoscaling

Moreover, AI can oversee and adjust platform configurations, ensuring that the environment is continuously refined for optimal performance. These predictive capabilities are not limited to resource management; they also extend to predicting application failures, user behavior, and even market trends, providing insights that can inform strategic business decisions.
The proactive nature of predictive analytics means that platform engineers can move from reactive maintenance to a more visionary approach, crafting platforms that are not just robust and efficient but also self-improving and adaptive to future needs.

Changing Landscapes: The New Cloud Native

The landscape of cloud native and platform engineering is rapidly evolving, particularly with leading cloud service providers like Azure and Google Cloud. This evolution is largely driven by the growing demand for more scalable, reliable, and efficient IT infrastructure, enabling businesses to innovate faster and respond to market changes more effectively. In the context of Azure, Microsoft has been investing heavily in Azure Kubernetes Service (AKS) and serverless offerings, aiming to provide more flexibility and ease of management for cloud-native applications. Azure's emphasis on DevOps, through tools like Azure DevOps and Azure Pipelines, reflects a strong commitment to streamlining the development lifecycle and enhancing collaboration between development and operations teams. Azure's focus on hybrid cloud environments, with Azure Arc, allows businesses to extend Azure services and management to any infrastructure, fostering greater agility and consistency across different environments.

Google Cloud, meanwhile, has been leveraging its expertise in containerization and data analytics to enhance its cloud-native offerings. Google Kubernetes Engine (GKE) stands out as a robust, managed environment for deploying, managing, and scaling containerized applications using Google's infrastructure. Google Cloud's approach to serverless computing, with products like Cloud Run and Cloud Functions, offers developers the ability to build and deploy applications without worrying about the underlying infrastructure. Google's commitment to open-source technologies and its leading-edge work in AI and ML integrate seamlessly into its cloud-native services, providing businesses with powerful tools to drive innovation. Both Azure and Google Cloud are shaping the future of cloud native and platform engineering by continuously adapting to technological advancements and changing market needs. Their focus on Kubernetes, serverless computing, and seamless integration between development and operations underlines a broader industry trend toward more agile, efficient, and scalable cloud environments.

Implications for the Future of Cloud Computing

AI is set to revolutionize cloud computing, making cloud-native technologies more self-sufficient and efficient. Advanced AI will oversee cloud operations, enhancing performance and cost-effectiveness while enabling services to self-correct. Yet integrating AI presents ethical challenges, especially concerning data privacy and decision-making bias, and poses risks requiring solid safeguards. As AI reshapes cloud services, sustainability will be key; future AI must be energy efficient and environmentally friendly to ensure responsible growth.

Kickstarting Your Platform Engineering and AI Journey

To effectively adopt AI, organizations must nurture a culture oriented toward learning and prepare by auditing their IT setup, pinpointing AI opportunities, and establishing data management policies. Further:

- Upskilling in areas such as machine learning, analytics, and cloud architecture is crucial.
- Launching AI integration through targeted pilot projects can showcase the potential and inform broader strategies.
- Collaborating with cross-functional teams and selecting cloud providers with compatible AI tools can streamline the process.
- Balancing innovation with consistent operations is essential for embedding AI into cloud infrastructures.

Conclusion

Platform engineering with AI integration is revolutionizing cloud-native environments, enhancing their scalability, reliability, and efficiency. By enabling predictive analytics and automated optimization, AI ensures cloud resources are effectively utilized and services remain resilient. Adopting AI is crucial for future-proofing cloud applications, and it necessitates foundational adjustments and a commitment to upskilling. The advantages include staying competitive and quickly adapting to market shifts. As AI evolves, it will further automate and refine cloud services, making a continued investment in AI a strategic choice for forward-looking organizations.

This is an excerpt from DZone's 2024 Trend Report, Cloud Native: Championing Cloud Development Across the SDLC.
Traditional machine learning (ML) models and AI techniques often suffer from a critical flaw: they lack uncertainty quantification. These models typically provide point estimates without accounting for the uncertainty surrounding their predictions. This limitation undermines the ability to assess the reliability of the model's output. Moreover, traditional ML models are data-hungry, often require correctly labeled data, and, as a result, tend to struggle with problems where data is limited. Furthermore, these models lack a systematic framework for incorporating expert domain knowledge or prior beliefs. Without the ability to leverage domain-specific insights, a model might overlook crucial nuances in the data and fail to perform up to its potential. ML models are becoming more complex and opaque, while there is a growing demand for more transparency and accountability in decisions derived from data and AI.

Probabilistic Programming: A Solution for Addressing These Challenges

Probabilistic programming provides a modeling framework that addresses these challenges. At its core lies Bayesian statistics, a departure from the frequentist interpretation of statistics.

Bayesian Statistics

In frequentist statistics, probability is interpreted as the long-run relative frequency of an event. Data is considered random and the result of sampling from a fixed, defined distribution; hence, noise in measurement is associated with sampling variation. Frequentists believe that probability exists and is fixed, and that infinitely repeated experiments converge to that fixed value. Frequentist methods do not assign probability distributions to parameters, and their interpretation of uncertainty is rooted in the long-run frequency properties of estimators rather than in explicit probabilistic statements about parameter values. In Bayesian statistics, probability is interpreted as a measure of uncertainty in a particular belief. Data is considered fixed, while the unknown parameters of the system are regarded as random variables and are modeled using probability distributions. Bayesian methods capture uncertainty within the parameters themselves and hence offer a more intuitive and flexible approach to uncertainty quantification.

Frequentist vs. Bayesian Statistics [1]

Probabilistic Machine Learning

In frequentist ML, model parameters are treated as fixed and estimated through Maximum Likelihood Estimation (MLE), where the likelihood function quantifies the probability of observing the data given the statistical model. MLE seeks point estimates of the parameters that maximize this probability. To implement MLE:

- Assume a model and the underlying model parameters.
- Derive the likelihood function based on the assumed model.
- Optimize the likelihood function to obtain point estimates of the parameters.

Hence, frequentist models, which include deep learning, rely on optimization, usually gradient-based, as their fundamental tool. By contrast, Bayesian methods model the unknown parameters and their relationships with probability distributions and use Bayes' theorem to compute and update these probabilities as new data is obtained.

Bayes' Theorem: "Bayes' rule tells us how to derive a conditional probability from a joint, conditioning tells us how to rationally update our beliefs, and updating beliefs is what learning and inference are all about" [2]. This is a simple but powerful equation.
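For reference, Bayes' theorem in its standard form is

$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$$

where $\theta$ denotes the unknown parameters and $D$ the observed data; the four terms correspond to the posterior, the likelihood, the prior, and the marginal likelihood described below.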
- Prior: the initial belief about the unknown parameters.
- Likelihood: the probability of the data given the assumed model.
- Marginal likelihood: the model evidence, which acts as a normalizing coefficient.
- Posterior: our updated beliefs about the parameters, incorporating both prior knowledge and the observed evidence.

In Bayesian machine learning, inference is the fundamental tool. The distribution of parameters represented by the posterior distribution is used for inference, offering a more comprehensive understanding of uncertainty.

Bayesian update in action: The plot below illustrates the posterior distribution for a simple coin-toss experiment across various sample sizes and with two distinct prior distributions. This visualization provides insight into how the combination of different sample sizes and prior beliefs influences the resulting posterior distributions.

Impact of Sample Size and Prior on Posterior Distribution
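A figure like the one described above can be reproduced in a few lines of code. The sketch below is an illustration rather than the code behind the original plot (the particular priors and sample sizes are assumptions): with a Beta prior and a Binomial likelihood, the posterior is available in closed form as Beta(a + heads, b + tails).

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
true_p = 0.7                                   # unknown "true" probability of heads
priors = {"Flat Beta(1, 1)": (1, 1), "Skeptical Beta(10, 10)": (10, 10)}
sample_sizes = [5, 50, 500]

theta = np.linspace(0, 1, 500)
fig, axes = plt.subplots(1, len(sample_sizes), figsize=(12, 3), sharey=True)

for ax, n in zip(axes, sample_sizes):
    heads = rng.binomial(n, true_p)            # simulate n coin tosses
    for label, (a, b) in priors.items():
        # Conjugate update: Beta(a, b) prior + Binomial data -> Beta(a + heads, b + tails)
        posterior = stats.beta(a + heads, b + n - heads)
        ax.plot(theta, posterior.pdf(theta), label=label)
    ax.set_title(f"n = {n} tosses")
    ax.set_xlabel("theta (probability of heads)")

axes[0].set_ylabel("posterior density")
axes[0].legend()
plt.tight_layout()
plt.show()
```

With only a handful of tosses the two priors yield visibly different posteriors; with hundreds of tosses the data dominates and the posteriors converge, which is exactly the behavior the figure illustrates.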
"Stan differs from BUGS and JAGS in two primary ways. First, Stan is based on a new imperative probabilistic programming language that is more flexible and expressive than the declarative graphical modeling languages underlying BUGS or JAGS, in ways such as declaring variables with types and supporting local variables and conditional statements. Second, Stan’s Markov chain Monte Carlo (MCMC) techniques are based on Hamiltonian Monte Carlo (HMC), a more efficient and robust sampler than Gibbs sampling or Metropolis-Hastings for models with complex posteriors" [6]. BayesDB: BayesDB is a probabilistic programming platform designed for large-scale data analysis and probabilistic database querying. It enables users to perform probabilistic inference on relational databases using SQL-like queries [7] PyMC3: PyMC3 is a Python library for Probabilistic Programming that offers an intuitive and flexible interface for building and analyzing probabilistic models. It leverages advanced sampling algorithms such as Hamiltonian Monte Carlo (HMC) and Automatic Differentiation Variational Inference (ADVI) for inference [8]. TensorFlow Probability: "TensorFlow Probability (TFP) is a Python library built on TensorFlow that makes it easy to combine probabilistic models and deep learning on modern hardware (TPU, GPU)" [9]. Pyro: "Pyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend. Pyro enables flexible and expressive deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling" [10]. These languages share a common workflow, outlined below: Model definition: The model defines the processes governing data generation, latent parameters, and their interrelationships. This step requires careful consideration of the underlying system and the assumptions made about its behavior. Prior distribution specification: Define the prior distributions for the unknown parameters within the model. These priors encode the practitioner's beliefs, domain, or prior knowledge about the parameters before observing any data. Likelihood specification: Describe the likelihood function, representing the probability distribution of observed data conditioned on the unknown parameters. The likelihood function quantifies the agreement between the model predictions and the observed data. Posterior distribution inference: Use a sampling algorithm to approximate the posterior distribution of the model parameters given the observed data. This typically involves running Markov Chain Monte Carlo (MCMC) or Variational Inference (VI) algorithms to generate samples from the posterior distribution. Case Study: Forecasting Stock Index Volatility In this case study, we will employ Bayesian modeling techniques to forecast the volatility of a stock index. Volatility here measures the degree of variation in a stock's price over time and is a crucial metric for assessing the risk associated with a particular stock. Data: For this analysis, we will utilize historical data from the S&P 500 stock index. The S&P 500 is a widely used benchmark index that tracks the performance of 500 large-cap stocks in the United States. By examining the percentage change in the index's price over time, we can gain insights into its volatility. 
S&P 500 — Share Price and Percentage Change From the plot above, we can see that the time series — price change between consecutive days has: Constant Mean Changing variance over time, i.e., the time series exhibits heteroscedasticity Modeling Heteroscedasticity: "In statistics, a sequence of random variables is homoscedastic if all its random variables have the same finite variance; this is also known as homogeneity of variance. The complementary notion is called heteroscedasticity, also known as heterogeneity of variance" [11]. Auto-regressive Conditional Heteroskedasticity (ARCH) models are specifically designed to address heteroscedasticity in time series data. Bayesian vs. Frequentist Implementation of ARCH Model The key benefits of Bayesian modeling include the ability to incorporate prior information and quantify uncertainty in model parameters and predictions. These are particularly useful in settings with limited data and when prior knowledge is available. In conclusion, Bayesian modeling and probabilistic programming offer powerful tools for addressing the limitations of traditional machine-learning approaches. By embracing uncertainty quantification, incorporating prior knowledge, and providing transparent inference mechanisms, these techniques empower data scientists to make more informed decisions in complex real-world scenarios. References Fornacon-Wood, I., Mistry, H., Johnson-Hart, C., Faivre-Finn, C., O'Connor, J.P. and Price, G.J., 2022. Understanding the differences between Bayesian and frequentist statistics. International journal of radiation oncology, biology, physics, 112(5), pp.1076-1082. Van de Meent, J.W., Paige, B., Yang, H. and Wood, F., 2018. An Introduction to Probabilistic Programming. arXiv preprint arXiv:1809.10756. Markov chain Monte Carlo Spiegelhalter, D., Thomas, A., Best, N. and Gilks, W., 1996. BUGS 0.5: Bayesian inference using Gibbs sampling manual (version ii). MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK, pp.1-59. Hornik, K., Leisch, F., Zeileis, A. and Plummer, M., 2003. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of DSC (Vol. 2, No. 1). Carpenter, B., Gelman, A., Hoffman, M.D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M.A., Guo, J., Li, P. and Riddell, A., 2017. Stan: A probabilistic programming language. Journal of statistical software, 76. BayesDB PyMC TensorFlow Probability Pyro AI Homoscedasticity and heteroscedasticity Introduction to ARCH Models pymc.GARCH11
Motivation and Background Why is it important to build interpretable AI models? The future of AI is in enabling humans and machines to work together to solve complex problems. Organizations are attempting to improve process efficiency and transparency by combining AI/ML technology with human review. In recent years, with the advancement of AI, AI-specific regulations have emerged, for example, Good Machine Learning Practices (GMLP) in the pharma industry and Model Risk Management (MRM) in finance, alongside broader regulations addressing data privacy such as the EU's GDPR and California's CCPA. Similarly, internal compliance teams may also want to interpret a model's behavior when validating decisions based on model predictions. For instance, underwriters want to learn why a specific loan application was tagged as suspicious by an ML model. Overview What is interpretability? In the ML context, interpretability refers to trying to backtrack which factors contributed to an ML model making a certain prediction. As shown in the graph below, simpler models are easier to interpret but often produce lower accuracy compared to complex models like Deep Learning and transformer-based models, which can capture non-linear relations in the data and often have high accuracy. Loosely defined, there are two types of explanations: Global explanation: Explains the model at an overall level to understand which features have contributed the most to the output. For example, in a finance setting where the use case is to build an ML model to identify customers who are most likely to default, some of the most influential features for making that decision are the customer's credit score, total number of credit cards, revolving balance, etc. Local explanation: This enables you to zoom in on a particular data point and observe the behavior of the model in that neighborhood. For example, in a sentiment classification use case for movie reviews, certain words in the review may have a higher impact on the outcome than others, e.g., "I have never watched something as bad." What is a transformer model? A transformer model is a neural network that tracks relationships in sequential input, such as the words in a sentence, to learn context and subsequent meaning. Transformer models use an evolving set of mathematical approaches, called attention or self-attention, to find minute relationships between even distant data elements in a series. Refer to Google's publication for more information. Integrated Gradients Integrated Gradients (IG) is an Explainable AI technique introduced in the paper Axiomatic Attribution for Deep Networks. The paper assigns an attribution value to each input feature, which tells us how much that input contributed to the final prediction. IG is a local method and a popular interpretability technique due to its broad applicability to any differentiable model (e.g., text, image, structured data), ease of implementation, computational efficiency relative to alternative approaches, and theoretical justifications. Integrated gradients represent the integral of gradients with respect to the inputs along the path from a given baseline to the input. The integral can be approximated using a Riemann sum or the Gauss-Legendre quadrature rule. Formally, it can be described as follows.
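Reproducing the defining equation from the Axiomatic Attribution paper, the integrated gradients along the i-th dimension of an input x, relative to a baseline x', are (alpha is the scaling coefficient and F is the model):

\[ \mathrm{IntegratedGrads}_i(x) = (x_i - x'_i) \times \int_{\alpha=0}^{1} \frac{\partial F\bigl(x' + \alpha (x - x')\bigr)}{\partial x_i}\, d\alpha \]

In practice, the integral is approximated by summing gradients evaluated at a small number of points along the straight-line path from x' to x.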
The cornerstones of this approach are two fundamental axioms, namely sensitivity and implementation invariance. More information can be found in the original paper. Use Case Now let's see in action how the Integrated Gradients method can be applied using the Captum package. We will be fine-tuning a question-answering BERT (Bidirectional Encoder Representations from Transformers) model on the SQuAD dataset using the transformers library from Hugging Face; review the notebook for a detailed walkthrough. Steps Load the tokenizer and pre-trained BERT model, in this case, bert-base-uncased. Next, compute attributions w.r.t. the BertEmbeddings layer. To do so, define baselines/references and numericalize both the baselines and the inputs.

Python
def construct_whole_bert_embeddings(input_ids, ref_input_ids,
                                    token_type_ids=None, ref_token_type_ids=None,
                                    position_ids=None, ref_position_ids=None):
    # Embed both the actual inputs and the baseline (reference) inputs.
    input_embeddings = model.bert.embeddings(input_ids,
                                             token_type_ids=token_type_ids,
                                             position_ids=position_ids)
    ref_input_embeddings = model.bert.embeddings(ref_input_ids,
                                                 token_type_ids=ref_token_type_ids,
                                                 position_ids=ref_position_ids)
    return input_embeddings, ref_input_embeddings

Now, let's define the question-answer pair as an input to our BERT model:

Python
question = "What is important to us?"
text = "It is important to us to include, empower and support humans of all kinds."

Generate the corresponding baselines/references for the question-answer pair. The next step is to make predictions; one option is to use LayerIntegratedGradients and compute the attributions with respect to BertEmbeddings. LayerIntegratedGradients represents the integral of gradients with respect to the layer inputs/outputs along the straight-line path from the layer activations at the given baseline to the layer activations at the input.

Python
start_scores, end_scores = predict(input_ids,
                                   token_type_ids=token_type_ids,
                                   position_ids=position_ids,
                                   attention_mask=attention_mask)

print('Question: ', question)
print('Predicted Answer: ', ' '.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores) + 1]))

lig = LayerIntegratedGradients(squad_pos_forward_func, model.bert.embeddings)

Output: Question: What is important to us? Predicted Answer: to include , em ##power and support humans of all kinds

Visualize the attributions for each word token in the input sequence using a helper function:

Python
# Storing a couple of samples in an array for visualization purposes
start_position_vis = viz.VisualizationDataRecord(
    attributions_start_sum,
    torch.max(torch.softmax(start_scores[0], dim=0)),
    torch.argmax(start_scores),
    torch.argmax(start_scores),
    str(ground_truth_start_ind),
    attributions_start_sum.sum(),
    all_tokens,
    delta_start)

print('\033[1m', 'Visualizations For Start Position', '\033[0m')
viz.visualize_text([start_position_vis])

print('\033[1m', 'Visualizations For End Position', '\033[0m')
viz.visualize_text([end_position_vis])

From the results above, we can tell that for predicting the start position, our model focuses more on the question side, more specifically on the tokens 'what' and 'important'. It also has a slight focus on the token sequence 'to us' on the text side. In contrast, for predicting the end position, our model focuses more on the text side and has relatively high attribution on the last end-position token 'kinds'.
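For completeness, the attributions_start_sum and delta_start values passed to the visualizer above are produced by calling lig.attribute and then collapsing the embedding dimension. A minimal sketch, assuming the notebook's helper names (squad_pos_forward_func, the ref_* tensors) and that the forward function's last argument selects the start (0) or end (1) position, as in Captum's SQuAD tutorial:

Python
# Attribute the start-position prediction to the input token embeddings.
attributions_start, delta_start = lig.attribute(
    inputs=input_ids,
    baselines=ref_input_ids,
    additional_forward_args=(token_type_ids, position_ids, attention_mask, 0),
    return_convergence_delta=True)

def summarize_attributions(attributions):
    # Sum over the embedding dimension and normalize to one score per token.
    attributions = attributions.sum(dim=-1).squeeze(0)
    return attributions / torch.norm(attributions)

attributions_start_sum = summarize_attributions(attributions_start)
# attributions_end_sum and delta_end are computed the same way with the
# position selector set to 1.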
Conclusion This blog describes how explainable AI techniques like Integrated Gradients can be used to make a deep learning NLP model interpretable by highlighting positive and negative word influences on the outcome of the model. References Axiomatic Attribution for Deep Networks Model Interpretability for PyTorch Towards Better Understanding of Gradient-Based Attribution Methods for Deep Neural Networks
Healthcare has ushered in a transformative era dominated by artificial intelligence (AI) and machine learning (ML), which are now central to data analytics and operational utilities. The transformative power of AI and ML is unlocking unprecedented value by rapidly converting vast datasets into actionable insights. These insights not only enhance patient care and streamline treatment processes but also pave the way for groundbreaking medical discoveries. With the precision and efficiency brought by AI and ML, diagnoses and treatment strategies become significantly more accurate and effective, accelerating the pace of medical research and marking a fundamental shift in healthcare. Benefits of AI in Healthcare AI and ML will influence the healthcare industry's entire ecosystem: everything from more accurate diagnostic procedures to personalized treatment recommendations and operational efficiency can benefit from them. AI technologies give healthcare providers real-time data analytics, predictive analysis, and decision-support capabilities that enable a more proactive and highly personalized approach to patient care. For instance, AI algorithms can increase diagnostic accuracy through the study of images, while ML models help analyze historical data to predict a patient's outcomes and thereby inform the treatment approach used. Machine Learning in Health Data Analysis Machine learning lies at the heart of the revolution in health data, providing powerful tools that identify patterns and predict future outcomes based on historical data. Of prime importance are the algorithms that forecast disease progression, improve treatment methodologies, and streamline healthcare delivery. These findings enable more personalized medicine, with better strategies for slowing disease progression and improving patient care. Most importantly, ML algorithms optimize healthcare operations through thorough analysis of trends such as patient admission levels and resource utilization, streamlining hospital workflows to improve service delivery. Example: Predicting Patient Admissions With Logistic Regression Explanation Data loading: Load your data from a CSV file. Replace 'patient_data.csv' with the path to your actual data file. Feature selection: Only the features relevant to the hospital admission target, such as age, blood pressure, heart rate, and previous admissions, are selected. Data splitting: Split the data into training and testing sets to evaluate model performance. Feature scaling: Rescale the features so that the model considers all features equally, because logistic regression is sensitive to feature scaling. Model training: Train a logistic regression model using the training data and make admission predictions on the test set. Evaluation: Evaluate the model based on accuracy, a confusion matrix, and a detailed classification report on the test set to validate its admission predictions.
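A minimal sketch of this workflow is shown below. The file name, column names, and the binary 'admitted' target are illustrative assumptions and should be adapted to your actual dataset.

Python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Data loading: replace 'patient_data.csv' with the path to your actual data file.
data = pd.read_csv("patient_data.csv")

# Feature selection: columns assumed for illustration.
features = ["age", "blood_pressure", "heart_rate", "previous_admissions"]
X = data[features]
y = data["admitted"]  # assumed binary target: 1 = admitted, 0 = not admitted

# Data splitting: hold out a test set to evaluate the model.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling: logistic regression is sensitive to feature scales.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Model training and prediction.
model = LogisticRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# Evaluation: accuracy, confusion matrix, and a detailed classification report.
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))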
Natural Language Processing in Health Data Analysis Natural Language Processing (NLP) is another critical capability, allowing the extraction of useful information from clinical notes, patient feedback, and medical journals. NLP tools help analyze and interpret the overwhelming text data produced in health settings daily, easing access to the appropriate information. This capability is especially valuable for supporting clinical decisions and research, providing fast insights from existing patient records and literature and improving the speed and accuracy of medical diagnostics and patient management. Example: Deep Learning Model for Disease Detection in Medical Imaging Explanation ImageDataGenerator: Automatically adjusts the image data during training for augmentation (such as rotation, width shift, and height shift), which helps the model generalize better from limited data. flow_from_directory: Loads images directly from a directory structure, resizing them as necessary and applying the transformations specified in ImageDataGenerator. Model architecture: In sequence, the model uses several convolutional (Conv2D) and pooling (MaxPooling2D) layers. Convolutional layers help the model learn features in the images, and pooling layers reduce the dimensionality of each feature map. Dropout: This layer randomly sets a fraction of the input units to 0 at each update during training, which helps to prevent overfitting. Flatten: Converts the pooled feature maps to a single column passed to the densely connected layers. Dense: Fully connected layers whose neurons take input from the features in the data. The final layer uses a sigmoid activation function to give a binary classification output. Compilation and training: The model is compiled with a binary cross-entropy loss function, which is suitable for this classification task, and the given optimizer, and is then trained using the .fit method on the data from train_generator, with validation via validation_generator. Saving the model: Save the trained model for later use, whether for deployment in medical diagnostic applications or further refinement.
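Here is a minimal sketch matching that description. The directory paths, image size, and layer sizes are illustrative assumptions:

Python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# Data augmentation: small random transforms help the model generalize from limited data.
train_datagen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=15,
                                   width_shift_range=0.1, height_shift_range=0.1)
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

# Directory layout assumed: one subfolder per class (e.g., 'normal' and 'disease').
train_generator = train_datagen.flow_from_directory(
    "data/train", target_size=(150, 150), batch_size=32, class_mode="binary")
validation_generator = val_datagen.flow_from_directory(
    "data/val", target_size=(150, 150), batch_size=32, class_mode="binary")

# Model architecture: stacked convolution + pooling blocks, then dense layers.
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dropout(0.5),                    # helps prevent overfitting
    Dense(128, activation="relu"),
    Dense(1, activation="sigmoid"),  # binary classification output
])

# Compilation and training with binary cross-entropy and the Adam optimizer.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_generator, epochs=10, validation_data=validation_generator)

# Saving the model for deployment or further refinement.
model.save("disease_detection_model.h5")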
Deep Learning in Health Data Analysis Deep learning is a sophisticated branch of machine learning used for analyzing highly complex data structures with deep neural networks. The technology has proven helpful in areas such as medical imaging, where deep learning models detect and diagnose diseases from images with a level of precision that is sometimes higher than that exhibited by human experts. In genomics, deep learning aids in parsing and understanding genetic sequences, offering insights central to personalized medicine and treatment planning. Example: Deep Learning for Genomic Sequence Classification Explanation Data preparation: We simulate sequence data where each base of the DNA sequence (A, C, G, T) is represented as a one-hot encoded vector, meaning each base is converted into a vector of four elements. The sequences and corresponding labels (binary classification) are randomly generated for demonstration. Model architecture and Conv1D layers: These convolutional layers are specifically useful for sequence data (like time series or genetic sequences); they process data in a way that respects its temporal or sequential nature. MaxPooling1D layers: These layers reduce the spatial size of the representation, decreasing the number of parameters and computation in the network, and hence help to prevent overfitting. Flatten layer: This layer flattens the output from the convolutional and pooling layers to be used as input to the densely connected layers. Dense layers: These are fully connected layers. Dropout between these layers reduces overfitting by preventing complex co-adaptations on the training data. Compilation and training: The model is compiled with the 'adam' optimizer and 'categorical_crossentropy' loss function, typical for multi-class classification tasks. It is trained using the .fit method, and performance is validated on a separate test set. Evaluation: After training, the model's performance is evaluated on the test set to see how well it can generalize to new, unseen data. AI Applications in Diagnostics and Treatment Planning AI has dramatically improved the speed and accuracy of diagnosing diseases by scanning medical images, genetic indicators, and patient histories for even the most minor signs of disease. In addition, AI algorithms help develop personalized treatment regimens by filtering through enormous amounts of treatment data and patient responses to provide tailored care, optimizing therapeutic effectiveness while minimizing side effects. Challenges and Ethical Considerations in AI and Health Data Analysis Integrating AI and ML in healthcare also brings ethical considerations. The main areas of concern are data privacy, algorithmic bias, and transparent decision-making processes, all essential for the proper, responsible use of AI in healthcare. It is necessary to ensure the safety and protection of patient data, and any deployment should guard against bias so as not to lose trust and fairness in service delivery. Conclusion The future of health is promising, with AI and ML technologies bringing new sophistication to the spectrum of analytical tools, from augmented reality (AR) in surgical procedures to virtual health assistants powered by AI. These advances will make better diagnosis and treatment possible while ensuring smooth operations, ultimately contributing to more tailor-made and effective patient care. As AI/ML technologies continue to develop and be integrated, healthcare delivery will change through more efficient, accurate, and patient-centered service provision. This also means that several regulatory constraints need to be considered in addition to the business and technical challenges discussed.
In this article, we'll explore how to build intelligent AI agents using Azure OpenAI and Semantic Kernel (Microsoft's C# SDK). You can combine it with OpenAI, Azure OpenAI, Hugging Face, or any other model. We'll cover the fundamentals, dive into implementation details, and provide practical code examples in C#. Whether you're a beginner or an experienced developer, this guide will help you harness the power of AI for your applications. What Is Semantic Kernel? In Kevin Scott's talk on "The era of the AI copilot," he showcased how Microsoft's Copilot system uses a mix of AI models and plugins to enhance user experiences. At the core of this setup is an AI orchestration layer, which allows Microsoft to combine these AI components to create innovative features for users. For developers looking to create their own copilot-like experiences using AI plugins, Microsoft has introduced Semantic Kernel. Semantic Kernel is an open-source framework that enables developers to build intelligent agents by providing a common interface for various AI models and algorithms. The Semantic Kernel SDK lets you integrate the power of large language models (LLMs) into your own applications: developers can send prompts to LLMs, use the results in their applications, and potentially craft their own copilot-like experiences. It allows developers to focus on building intelligent applications without worrying about the underlying complexities of AI models. Semantic Kernel is built on top of the .NET ecosystem and provides a robust and scalable platform for building intelligent apps/agents. Figure courtesy of Microsoft Key Features of Semantic Kernel Modular architecture: Semantic Kernel has a modular architecture that allows developers to easily integrate new AI models and algorithms. Knowledge graph: Semantic Kernel provides a built-in knowledge graph that enables developers to store and query complex relationships between entities. Machine learning: Semantic Kernel supports various machine learning algorithms, including classification, regression, and clustering. Natural language processing: Semantic Kernel provides natural language processing capabilities, including text analysis and sentiment analysis. Integration with external services: Semantic Kernel allows developers to integrate with external services, such as databases and web services. Let's dive into writing some intelligent code using the Semantic Kernel C# SDK. I will present it in steps so it is easy to follow along. Step 1: Setting up the Environment First, let's set up our environment. You will need to install the following to follow along: .NET 8 or later Semantic Kernel SDK (available on NuGet) Your preferred IDE (Visual Studio, Visual Studio Code, etc.) Azure OpenAI access Step 2: Creating a New Project in VS Open Visual Studio and create a blank .NET 8 console application. Step 3: Install NuGet References Right-click on the project and select Manage NuGet Packages to install the latest versions of the following two NuGet packages: 1) Microsoft.SemanticKernel 2) Microsoft.Extensions.Configuration.Json Note: To avoid hardcoding the Azure OpenAI key and endpoint, I store them as key-value pairs in appsettings.json; using the second package, I can easily retrieve them by key. Step 4: Create and Deploy an Azure OpenAI Model Once you have obtained access to the Azure OpenAI service, log in to the Azure portal or Azure OpenAI Studio to create an Azure OpenAI resource.
The screenshots below are from the Azure portal. You can also create an Azure OpenAI service resource using the Azure CLI by running the following command:

PowerShell
az cognitiveservices account create -n <nameoftheresource> -g <Resourcegroupname> -l <location> --kind OpenAI --sku s0 --subscription <subscriptionID>

You can also see your resource in Azure OpenAI Studio by navigating to the resources page and selecting the resource that was created. Deploy a Model Azure OpenAI includes several types of base models, as shown in the studio when you navigate to the Deployments tab. You can also create your own custom models by using existing base models as per your requirements. Let's use the deployed GPT-35-turbo model and see how to consume it from Azure OpenAI Studio. Fill in the details and click Create. Once the model is deployed, grab the Azure OpenAI key and endpoint and add them to appsettings.json as the AzureOpenAI:DeploymentModel, AzureOpenAI:Endpoint, and AzureOpenAI:ApiKey values read by the code below. Step 5: Create Kernel in the Code Step 6: Create a Plugin to Call the Azure OpenAI Model Step 7: Use Kernel To Invoke the LLM Models Once you run the program by pressing F5, you will see the response generated from the Azure OpenAI model. Complete Code

C#
using Microsoft.Extensions.Configuration;
using Microsoft.SemanticKernel;

var config = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json", optional: true, reloadOnChange: true)
    .Build();

// Build the kernel with an Azure OpenAI chat completion service (Steps 5 and 6).
var builder = Kernel.CreateBuilder();
builder.Services.AddAzureOpenAIChatCompletion(
    deploymentName: config["AzureOpenAI:DeploymentModel"] ?? string.Empty,
    endpoint: config["AzureOpenAI:Endpoint"] ?? string.Empty,
    apiKey: config["AzureOpenAI:ApiKey"] ?? string.Empty);

var semanticKernel = builder.Build();

// Step 7: Use the kernel to invoke the LLM with a prompt.
Console.WriteLine(await semanticKernel.InvokePromptAsync("Give me shopping list for cooking Sushi"));

Conclusion By combining LLMs with Semantic Kernel, you'll create intelligent applications that go beyond simple keyword matching. Experiment, iterate, and keep learning to build remarkable apps that truly understand and serve your needs.
Tuhin Chattopadhyay
CEO at Tuhin AI Advisory and Professor of Practice,
JAGSoM
Yifei Wang
Senior Machine Learning Engineer,
Meta
Austin Gil
Developer Advocate,
Akamai
Tim Spann
Principal Developer Advocate,
Zilliz