Lessons From a Decade of Generative AI
To understand the future of generative AI, it's helpful to look at where it came from and the challenges and opportunities that will evolve with the technology.
With the recent buzz around generative AI, led by the likes of ChatGPT and Bard, businesses are increasingly seeking to understand the use cases for the technology. It’s a great time for instigating conversations around the power of AI, but generative AI is nothing new. Generative modeling (i.e., generative AI) has been advancing rapidly behind the scenes for more than a decade, propelled by three major factors: the development of open-source software libraries such as TensorFlow in 2015 and PyTorch in 2016; innovations in neural network architectures and training methods; and hardware improvements such as graphics processing units (GPUs) and tensor processing units (TPUs) that facilitate training and inference on massive neural networks.
In this article, I’ll explain what generative models are, how they got to where they are today, and how they should be used, as well as explore their limitations.
What Are Generative Models, and Where Did They Come From?
Generative models learn the distribution of the training data so that they can sample, or produce, synthetic data that is statistically similar to the original. This is a two-step process: first, the model is trained on a large static data set; second, the trained model is sampled to obtain new data points. The benefit of this two-step process is that once the model is trained, new data can be generated cheaply and at scale.
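As a toy illustration of this train-then-sample pattern (a sketch for intuition, not any particular production model), the simplest possible generative model just fits a parametric distribution to the data and then draws from it:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in "training data": 10,000 points from a source the model must learn.
training_data = rng.normal(loc=5.0, scale=2.0, size=10_000)

# Step 1 (training): learn the distribution -- here, just its mean and std.
mu, sigma = training_data.mean(), training_data.std()

# Step 2 (sampling): cheaply generate new synthetic points at any scale.
synthetic = rng.normal(loc=mu, scale=sigma, size=1_000)
```

Real generative models replace the single Gaussian with a deep neural network, but the economics are the same: training is expensive and done once, while sampling afterward is cheap and repeatable.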
While early generative models were relatively simple, such as Hidden Markov models, Naïve Bayes, or Gaussian mixtures, the introduction of GPUs into mainstream machine learning around 2010 allowed for more flexible generative models based on deep neural networks. New well-provisioned research labs such as DeepMind (2010), Google Brain (2011), and Facebook AI Research (2013) also began opening around this time, with OpenAI coming along slightly later towards the end of 2015, further fueling the development of deep learning and thus generative modeling. During this time, many new architectures began to appear, such as variational autoencoders (VAEs, 2013) and generative adversarial networks (GANs, 2014), which produced state-of-the-art results in generating images.
To facilitate both the development and deployment of these more complex models, Google released the open-source library TensorFlow in 2015, which was followed shortly afterward by PyTorch from Facebook in 2016. These libraries made deep learning accessible to a wide range of practitioners and researchers, leading to the rapid development of new models and new applications.
One of these breakthrough models was the Transformer, a deep learning architecture that appeared in 2017 and now forms the basis of all current state-of-the-art language models, such as GPT-4. Two transformer-based models that followed in 2018 were BERT (Bidirectional Encoder Representations from Transformers) from Google and GPT (Generative Pretrained Transformer) from OpenAI. Both were designed as general-purpose language models capable of a variety of tasks, from text classification and sentiment analysis to language translation. Another breakthrough, appearing in 2019 and inspired by thermodynamics, was the diffusion model for generating images.
As of today, diffusion models and transformers are the dominant approaches for text-to-image generation and language modeling, respectively, achieving state-of-the-art results. For example, ChatGPT, released in 2022, and the more advanced GPT-4, released this year (2023), use a transformer architecture, while Stable Diffusion and Midjourney are both diffusion-based models. Over the last couple of years, the trend in generative AI has been to train ever-larger models with more parameters to achieve better results. These feats of engineering, such as GPT-4 and Midjourney v5, rely on a combination of improved hardware, well-developed software libraries, and efficient deep neural network architectures (i.e., transformers), and have become so popular in part because they are easy to use and accessible to the general public.
Applications of Generative Models
As generative models produce more compelling results and become increasingly available to the public through easy-to-use APIs, they have become suitable for a variety of applications. For images, most of these applications revolve around some form of content creation and design. A notorious example of applied generative modeling is the rise of deepfakes. While deepfakes have legitimate uses in the film and advertising industries, they can also be used nefariously to spread misinformation. For language models such as ChatGPT, Bard, and GPT-4, applications include text summarization, translation, and completion, which are particularly useful for marketing content and internal communications.
On the more technical side, language models such as Codex and GitHub Copilot have been used successfully to generate code, speeding up development and aiding programmers. Effectively instructing these models, however, is an art in itself, known as prompt engineering.
Challenges and Risks To Consider
The fundamental risk with current generative models is that they are black-box models with uncontrollable output. This problem can manifest itself in several different ways, such as:
- There is no way to explicitly prevent these models from producing offensive or graphic text and images. There still needs to be a human in the loop to filter out inappropriate material.
- Generative models may return substantial portions of the training data, leading to both privacy and copyright concerns. This issue has been highlighted in the recent lawsuit brought against Stability AI by Getty Images.
- Information returned from language models may be inaccurate or misleading, as the model has no way of fact-checking its own output. These models should therefore not be relied upon to produce content in high-stakes situations such as medical, financial, or legal matters. Likewise, for code-generation tools such as GitHub Copilot, care should be taken before putting generated code into production, as it may contain missed edge cases or bugs that can break a production pipeline.
These are just a few examples of the risks of working with generative models. To mitigate them, generative models should be used in collaboration with humans, who can monitor the output and correct results when needed.
The Future of Generative AI
It’s safe to say that the future of generative AI will continue to be driven by the same forces that have brought it this far. Hardware and software improvements will increase the capacity of models that we are able to train. New innovations in architecture and training will inevitably appear, leading to jumps in performance with new state-of-the-art models. Moreover, with new opportunities come new challenges. Copyright and intellectual property laws will need to be adapted, and there will likely be further privacy concerns about what data is used to train these models as AI and data regulations evolve. Deepfake technology will also continue to mature, allowing for more advanced methods of spreading misinformation and fake content. Despite these challenges, the future of generative AI remains bright, with the potential to revolutionize industries from healthcare to film to finance.
Opinions expressed by DZone contributors are their own.