How StackOverflow Is Adapting in the Face of Generative AI
OverflowAI revolutionizes content creation by streamlining the process through its powerful generative AI loop.
StackOverflow, the most commonly used platform by software developers for programming support, has been through a rough ride lately. Despite an impressive 69% of questions answered, StackOverflow’s traffic has been in decline. Similarweb’s data shows that their traffic dropped 14% year over year (StackOverflow says it’s closer to 5%). Either way, the trend is downward and is explained primarily by the emergence of AI coding products like ChatGPT and GitHub Copilot. These products have meaningful code-writing capabilities and can therefore provide programming support that is, at least in part, as good as what StackOverflow offers. Ironically, several of the large language models (LLMs) behind these AI products were trained using scraped StackOverflow data.
The company has gotten pretty harsh media coverage with these developments. Business Insider, in their article Death by LLM, wrote:
Welcome to the future of the internet in an AI world. Online communities like Stack Overflow and Wikipedia thrived as hubs for experts and curious browsers to come together and share information freely. Now, these digital meeting places are being pillaged by big tech companies prowling for human data to train their large language models.
The new products emerging from this generative AI boom are putting the future of these online forums in doubt. The chatbots answer questions clearly, automatically, and often pleasantly — so humans don’t need to deal with other humans to get information.
In the midst of all this attention, StackOverflow has played a steady hand and articulated its two-pronged approach to addressing this challenge:
- A few weeks back, they announced that they will start charging large AI developers who use the platform’s 50M+ questions and answers for model training (we dug into this issue in the data scraping article earlier).
- Last week, they launched the OverflowAI product, which is a set of actually useful generative AI features that can help kick off their second innings — we will focus on this today.
In this article, we’ll dive deep into:
- AI code writing tools disrupting StackOverflow.
- What OverflowAI does.
- Underlying trends from the StackOverflow strategy.
AI Code Writing Tools Disrupting StackOverflow
There are several AI code writing and editing tools available in the market today. These are either independent products (like OpenAI Codex, ChatGPT, Google Bard) or products that are natively integrated inside existing platforms (like GitHub Copilot, Replit Ghostwriter, Amazon CodeWhisperer). They have a broad range of capabilities, including code generation, code editing, autocomplete, and debugging.
The products with native distribution (like GitHub Copilot) are at a large advantage because they can operate seamlessly within environments that programmers already use today, and we will see more products attempting to get plugged into existing environments. For example, CodeGPT has a plugin that lets developers use the product from within Visual Studio Code (a popular code editing tool).
Existing AI code-writing tools are good at certain tasks. For example, this Reddit thread captures feedback from several web developers about GitHub Copilot — the overarching theme is that the product is useful in a subset of situations where developers have to write net new code and don’t want to spend time writing from scratch. Even for those situations, it’s often hit or miss.
The reason is not surprising. Conceptually, large language models (LLMs) take in a ton of data and generate output on the basis of this construct: in a particular context, for the question you asked, what is the most likely word/text to follow the previous word? It’s essentially calculating the probability of a word following another and generating output based on that. Despite this construct, given the amount of data that’s gone into training these models, the results for the more general ChatGPT use cases (like drafting an email or summarizing a page) have been nothing short of impressive. But it’s important to remember that language models, by design, have limited analytical/math capabilities. In other words, when you ask the model, “What is 2+2?” it may give you the right answer — not because it knows math but because it has seen that text pattern before in its training data.
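To make the "most likely next word" construct concrete, here is a minimal sketch of a toy bigram model in Python. This is a deliberate simplification and not how production LLMs are built (they use neural networks over much longer contexts); the corpus and the function names are made up for illustration. The point is that the model "answers" the arithmetic question only because it has counted that pattern in its training text.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    follow_counts = defaultdict(Counter)
    words = text.lower().split()
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def next_word_probabilities(model, word):
    """Turn the raw follow counts for `word` into probabilities."""
    counts = model.get(word.lower())
    if not counts:
        return {}  # the model has never seen this word
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}


def predict_next(model, word):
    """Return the most probable next word, or None for an unseen word."""
    probabilities = next_word_probabilities(model, word)
    if not probabilities:
        return None
    return max(probabilities, key=probabilities.get)


# Hypothetical training text: "is four" appears twice, "is five" once.
corpus = "two plus two is four . two plus three is five . two plus two is four ."
model = train_bigram_model(corpus)

print(predict_next(model, "is"))     # the continuation seen most often after "is"
print(predict_next(model, "seven"))  # None: no pattern to fall back on
```

The toy model picks "four" after "is" purely because that pairing is the most frequent one in its data, and it has nothing to say about a word it has never seen.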
Similarly, when it comes to code generation, the model does not really “know” the underlying concepts behind programming but is predicting results based on its training with a ton of text data. The consequence of this is the GitHub Copilot feedback above — it is sometimes good at generating the base code you need, but its ability to actually understand code, debug, and provide you explanations is limited. This will get better over time, but it’s hard to say if it will ever get to the point of high accuracy/high reliability.
StackOverflow CEO Prashanth Chandrasekar describes it succinctly:
One problem with modern LLM systems is that they will provide incorrect answers with the same confidence as correct ones and will ‘hallucinate’ facts and figures if they feel they fit the pattern of the answer a user seeks.
At some point, you’re going to need to know what you’re building. You may have to debug it and have no idea what was just built, and it’s hard to skip the learning journey by taking shortcuts.
This is the opportunity for StackOverflow — their traffic drop may be permanent, and it’s very likely that programmers come to StackOverflow less often for simpler questions (e.g., they might not visit StackOverflow anymore for an off-the-shelf sorting algorithm). But where the product can shine is: 1) providing high accuracy / high-reliability answers to more complex questions that language models might not have the capability to answer, and 2) providing answers to questions in new technologies/problem spaces that the models have not had previous data to train on. OverflowAI is designed to directly tap into this opportunity.
What OverflowAI Does
There are three key facets they are betting on — direct answers to questions, usability from within developer environments, and supercharging knowledge within enterprises.
OverflowAI Search provides direct answers to users in a Q&A format (similar to ChatGPT) but provides several links to actual StackOverflow posts. Besides helping create trust, this also provides users with the opportunity to go deeper where the answer provided by AI does not fully solve the user’s problem. This strikes the delicate balance of giving a direct answer when the question is simple but also guiding the user along a more exploratory path for difficult questions.
![Overflow AI Search (Source: captured from OverflowAI demo video)](https://dz2cdn1.dzone.com/storage/temp/17149251-1692515723500.png)
If the user is not satisfied with the responses, they can enter a chat-like interface to ask follow-up questions. If none of the answers are satisfactory, they can ask StackOverflow to draft a question on their behalf, ready to be posted to the Q&A forum. This experience also spares users the fairly common situation where the question they are about to post has already been answered.
![Automatic question draft (Source: captured from OverflowAI demo video)](https://dz2cdn1.dzone.com/storage/temp/17149252-1692515811988.png)
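The flow described above (answer with links to source posts, otherwise fall back to a drafted question) can be sketched roughly as follows. This is a hypothetical illustration, not OverflowAI's actual implementation: the `Post` type, the `example.com` URLs, and the simple keyword-overlap matching are all assumptions standing in for whatever search and generation the real product uses.

```python
from dataclasses import dataclass

# Small, hypothetical stop-word list so common words do not count as matches.
STOP_WORDS = {"how", "do", "i", "a", "in", "the", "to", "is", "my"}


@dataclass
class Post:
    title: str
    url: str
    answer: str


def tokenize(text):
    """Lowercase the text and return its set of non-trivial words."""
    words = (w.strip("?.,!").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}


def search(posts, question, min_overlap=2):
    """Return posts whose titles share enough keywords, best match first."""
    question_words = tokenize(question)
    scored = []
    for post in posts:
        overlap = len(question_words & tokenize(post.title))
        if overlap >= min_overlap:
            scored.append((overlap, post))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in scored]


def answer_or_draft(posts, question):
    """Answer with source links when possible, else draft a new question."""
    matches = search(posts, question)
    if matches:
        return {
            "kind": "answer",
            "text": matches[0].answer,
            "sources": [post.url for post in matches],
        }
    return {"kind": "draft", "title": question.strip(), "sources": []}


# Hypothetical knowledge base of already-answered posts.
posts = [
    Post("How do I reverse a list in Python", "https://example.com/q/1",
         "Use reversed(items) or items[::-1]."),
    Post("How do I merge two dictionaries in Python", "https://example.com/q/2",
         "Use {**a, **b}, or a | b on Python 3.9+."),
]

known = answer_or_draft(posts, "How do I reverse a list in Python?")
unknown = answer_or_draft(posts, "Why does my Rust borrow checker reject this closure?")
print(known["kind"], known["sources"])
print(unknown["kind"], unknown["title"])
```

Returning the source URLs alongside the answer is what lets the user verify it or dig deeper, and the draft branch only triggers when nothing already on the platform matches.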
The product also doubles down on usability by making all of this capability available from Visual Studio Code through an extension. This helps StackOverflow compete more effectively with natively integrated coding assistants by letting developers get answers from within their coding environments (instead of having to context switch and search from a browser).
![Extension inside Visual Studio Code (Source: captured from OverflowAI demo video).](https://dz2cdn1.dzone.com/storage/temp/17149253-1692515844474.png)
In addition to this, for enterprise customers, OverflowAI is creating the ability to plug in several different sources of information within a company (internal Q&A, wiki pages, document repositories) to provide a cohesive Q&A experience for developers. Being able to utilize internal and StackOverflow data, and more importantly, exposing this easily in a Q&A type interface, can be a big productivity boost for engineering organizations. They also intend to launch a Slack integration as a seamless interface to expose this capability.
What’s impressive about OverflowAI’s product approach is that it takes the company’s core asset (answers to difficult questions), exposes those answers in a highly usable interface wherever the users are (whether on Slack or within developer environments), and, in turn, creates a loop where users can leverage generative AI to submit new questions.
Underlying Trends From the StackOverflow Strategy
StackOverflow is not exactly a public company — they are owned by Prosus, which is, in turn, part of a bigger holding company, Naspers, which is publicly traded. Therefore, it’s hard to get clean revenue data, but a report from Prosus published in May 2022 sheds some light:
- The company made ~$89M in revenue in 2022, split 50–50 between the enterprise product StackOverflow for Teams and Reach products (advertising and employer branding).
- From 2021 to 2022, StackOverflow for Teams revenue was +69% while Reach products revenue was -12% (there could have been extraneous factors that impacted 2022 revenue, like slower hiring).
![](https://dz2cdn1.dzone.com/storage/temp/17149254-1692515883498.png)
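As a back-of-the-envelope check, the figures above can be used to estimate what the 2021 revenue mix looked like. This sketch assumes the 50–50 split is exact and that the growth rates apply to full-year revenue; neither is confirmed by the report excerpt, so treat the outputs as rough estimates rather than reported numbers.

```python
def implied_prior_year(current_revenue, growth_rate):
    """Back out last year's revenue from this year's revenue and its growth rate."""
    return current_revenue / (1 + growth_rate)


total_2022 = 89.0              # ~$89M, from the figures above
teams_2022 = total_2022 / 2    # assumption: the split is exactly 50-50
reach_2022 = total_2022 / 2

teams_2021 = implied_prior_year(teams_2022, 0.69)    # Teams grew 69%
reach_2021 = implied_prior_year(reach_2022, -0.12)   # Reach shrank 12%
total_2021 = teams_2021 + reach_2021
overall_growth = total_2022 / total_2021 - 1

print(f"Teams 2021: ~${teams_2021:.1f}M")        # ~$26.3M
print(f"Reach 2021: ~${reach_2021:.1f}M")        # ~$50.6M
print(f"Total 2021: ~${total_2021:.1f}M")        # ~$76.9M
print(f"Overall growth: ~{overall_growth:.1%}")  # ~15.7%
```

Under these assumptions, Reach would have been roughly two-thirds of revenue in 2021 before dropping to half in 2022, which is consistent with the shift toward the enterprise product discussed below.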
This revenue data, combined with what the OverflowAI product does, points to a few clear trends in where StackOverflow is headed in the world of generative AI (these trends can also be extended to other Q&A platforms):
- Their advertising business, whose success is directly tied to traffic, is in decline. This isn’t necessarily dire and just points towards a broader trend. There will likely be fewer eyeballs/page views because consumers will directly get answers to easier questions (which is good), and therefore, advertising becomes a less critical source of revenue.
- StackOverflow will continue to be a valuable source of answers for difficult questions, and the volume of questions and answers will continue to grow with the company’s generative AI push to automatically draft/submit questions. In addition, it’s also likely that if StackOverflow can keep the content engine running, the quality of content on the platform will improve, as repetitive/easy questions will no longer be the highest volume of content.
- StackOverflow will double down on building experiences where they can deliver the most value to users (like OverflowAI Search and the Visual Studio Code extension) and focus on product lines where customers are willing to pay for these superior experiences (e.g., StackOverflow for Teams).
- Data licensing programs, where they charge AI companies for training on their data, will accelerate.
The trends all point towards a direction where StackOverflow is successfully pivoting to the next phase of the company, and the company has made the right product/business investments to weather what was a potential disruption. In addition, they have also done valuable community service and laid out a playbook for other Q&A platforms to leverage. Overall, I’m optimistic about the direction they are headed toward and that this will ignite a thriving content ecosystem in the future.
Published at DZone with permission of Vignesh Balagopalakrishnan. See the original article here.