Hugging Face Is the New GitHub for LLMs
Hugging Face is becoming the "GitHub" for large language models (LLMs). Hugging Face offers tools that simplify LLM development and deployment.
Large language models (LLMs) have taken the tech industry by storm in recent years, unleashing new frontiers of innovation and disrupting everything from search to customer service. Underpinning this revolution in artificial intelligence are open ecosystems like GitHub and Hugging Face, which enable developers and companies to build, deploy and scale LLMs rapidly. Just as GitHub has become the go-to platform for software development and collaboration, Hugging Face is now emerging as the de facto hub for all things related to LLMs.
The Rise of Large Language Models
LLMs like GPT-3, BERT and PaLM have captured the imagination of the tech world with their ability to generate human-like text, answer questions, summarize documents and even write code based on simple text prompts. According to a McKinsey report, investments in natural language processing startups focusing on LLMs ballooned from $100 million in 2020 to over $1.5 billion in 2021.
This surge of interest stems from LLMs' versatility in tackling diverse AI challenges. For instance, OpenAI's ChatGPT excels at conversational tasks while tools like Cohere's Generative NLP API summarize texts and moderate content. LLMs are transforming how businesses operate, spurring everything from intelligent search to automated customer support.
McKinsey estimates that LLMs could create $200 billion to $300 billion in annual economic value by 2025 just within the US economy. Tech giants in the US like Google, Meta and Microsoft as well as startups are racing to tap into the potential of LLMs. But building, deploying and iterating on LLMs require specialized infrastructure and tooling.
GitHub's Pivotal Role in Software Collaboration
To understand Hugging Face's growing significance as a hub for LLMs, it is instructive to examine the indispensable role GitHub has played in software development. Launched in 2008, GitHub built a hosting and collaboration platform on top of Git, the open source version control system, and made it the default workflow for source code management.
Today, GitHub hosts over 200 million code repositories and serves over 83 million developers. It offers developers tools to collaborate, review code, track issues and release software. GitHub has become integral to how software teams operate, as Microsoft's $7.5 billion acquisition of the company in 2018 underscored.
According to Stack Overflow's 2021 survey, over 90% of developers use GitHub. The platform's social coding capabilities have broken down barriers in software development. Developers can tap into open source projects to accelerate builds. Companies use GitHub's enterprise offerings to streamline coding workflows. GitHub is deeply embedded in developer culture and shapes how the software community creates, scales and deploys code.
Hugging Face Emerges as the Go-To Platform for LLMs
Just as GitHub spurred open source development, Hugging Face is spearheading the open ecosystem approach to LLMs. Founded in 2016, Hugging Face began by focusing on natural language processing. It later shifted its focus to open source tooling and created the Transformers library, which unifies different LLM architectures such as BERT and GPT-2 behind standardized APIs.
This library democratized access to LLMs by abstracting away the complexities of working with them. Today, Hugging Face has become a vibrant community with over 200,000 users. Its main offerings are:
- Model Hub: A repository of over 100,000 AI models, ranging from LLMs to related models such as OpenAI's CLIP and Meta's BlenderBot. It lowers barriers to utilizing LLMs.
- Tokenizers: A library of fast tokenizers that split text into tokens and encode them as the numeric IDs LLMs consume. Critical for preprocessing data.
- Datasets: Carefully curated datasets to train and evaluate LLMs.
- Spaces: An MLOps platform to deploy, monitor and scale LLM-powered apps.
- Infinite: A wiki-style dataset based on GPT models to generate answers to natural language queries.
This suite of tools tackles the full LLM development lifecycle, from discovery to deployment. Hugging Face is also building out integrations with platforms like Streamlit, enabling no-code LLM experimentation.
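To make the Tokenizers entry above concrete, here is a toy sketch of what "tokenize and encode" means. It uses plain whitespace splitting with a fixed vocabulary and an unknown-token fallback; this is an illustrative simplification, not the subword algorithms (such as BPE or WordPiece) that production tokenizers use. The function names and the `[UNK]` placeholder are assumptions chosen for the example.

```python
# Toy illustration only: real tokenizers use subword algorithms such as BPE or WordPiece.
from typing import Dict, List

UNK = "[UNK]"  # hypothetical placeholder for words missing from the vocabulary


def build_vocab(corpus: List[str]) -> Dict[str, int]:
    """Assign an integer id to every distinct lowercase word, reserving id 0 for UNK."""
    vocab = {UNK: 0}
    for sentence in corpus:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Split text on whitespace and map each word to its id (the UNK id if unseen)."""
    return [vocab.get(word, vocab[UNK]) for word in text.lower().split()]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Map ids back to words using the inverted vocabulary."""
    inverse = {idx: word for word, idx in vocab.items()}
    return " ".join(inverse[i] for i in ids)


vocab = build_vocab(["hugging face hosts models", "models need tokenized text"])
ids = encode("Hugging Face hosts datasets", vocab)
print(ids)                 # [1, 2, 3, 0]
print(decode(ids, vocab))  # hugging face hosts [UNK]
```

The word "datasets" never appeared in the toy corpus, so it falls back to id 0. Subword tokenizers exist largely to avoid this loss by breaking unseen words into smaller known pieces.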
Hugging Face has raised $100 million in funding so far, reflecting its soaring prominence. Its valuation quintupled to $2 billion over the past year. Top AI labs and companies internationally are partnering with Hugging Face as well.
The GitHub of LLMs
Hugging Face's expansive hub of models, datasets and development tools has earned it the moniker of "GitHub for LLMs." Its Model Hub serves as the starting point for anyone wanting to work with LLMs. Developers can find optimized implementations of models like Meta AI's OPT-175B there.
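As a rough sketch of how model discovery on the Hub can be scripted, the snippet below builds a search query against the Hub's public HTTP endpoint using only the standard library. The endpoint `https://huggingface.co/api/models`, the query parameters `search`, `limit` and `filter`, and the `id` field in the response are assumptions about the current public API; the function names are hypothetical.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint for listing models on the Hub.
HUB_API = "https://huggingface.co/api/models"


def build_search_url(query: str, limit: int = 5, tag: Optional[str] = None) -> str:
    """Build a model-search URL; `tag` optionally narrows results (e.g. a task tag)."""
    params = {"search": query, "limit": limit}
    if tag is not None:
        params["filter"] = tag
    return f"{HUB_API}?{urlencode(params)}"


def search_models(query: str, limit: int = 5, tag: Optional[str] = None) -> List[str]:
    """Fetch matching model ids. Requires network access."""
    with urlopen(build_search_url(query, limit, tag), timeout=10) as response:
        results = json.load(response)
    # Each entry is assumed to carry an "id" such as "<owner>/<model-name>".
    return [entry.get("id", "") for entry in results]


# Building the URL needs no network access, so it is safe to run anywhere.
print(build_search_url("opt", limit=5, tag="text-generation"))
```

Calling `search_models("opt", limit=5, tag="text-generation")` would then return a short list of model ids to inspect further. The official `huggingface_hub` client offers a higher-level way to run the same kind of query.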
Developers can then access these models through Hugging Face's Transformers library with very little code. This drastically lowers the barriers to using cutting-edge LLMs. Companies no longer have to build their own LLMs from scratch. Instead, they can take pretrained LLMs from the Hub and fine-tune them for custom use cases in areas like search and analytics.
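A minimal sketch of what that access can look like, assuming the `transformers` package and a backend such as PyTorch are installed. It relies on the library's `pipeline` helper with the small `gpt2` checkpoint as a stand-in for a larger model; the wrapper functions `load_generator` and `complete` are hypothetical names introduced for this example.

```python
from typing import Callable, Dict, List


def load_generator(model_id: str = "gpt2"):
    """Load a text-generation pipeline for a Hub model id (downloads on first use)."""
    # Imported lazily so the rest of this module works without transformers installed.
    from transformers import pipeline

    return pipeline("text-generation", model=model_id)


def complete(
    prompt: str,
    generator: Callable[..., List[Dict[str, str]]],
    max_new_tokens: int = 20,
) -> str:
    """Run the generator on a prompt and return the first generated text."""
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"]
```

With network access, `complete("Hugging Face is", load_generator())` returns the prompt followed by a short model-written continuation. Passing the generator in as an argument keeps the model choice swappable: any other text-generation checkpoint id from the Hub can be substituted for `gpt2`.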
Spaces enables collaboratively building, testing and deploying LLM applications. Combined with Hugging Face's open datasets and active community forums, it replicates core elements of GitHub's open source ethos tailored for LLMs.
Leo Zhao, a machine learning engineer at a major US tech company, encapsulates how deeply Hugging Face has embedded itself into LLM workflows:
“Hugging Face is our first stop whenever we need an LLM for a new project. Their Model Hub has a huge taxonomy of options to choose from. We can immediately tokenize and feed data to models with just a few lines of code. Spaces makes it easy to scale model training on GPU clusters. It really is a one-stop platform for everything related to LLMs.”
The GitHub analogy also applies to how Hugging Face has fostered a collaborative community around LLMs. Its forums have become a vital source of knowledge and support for thousands of LLM developers and users. Hugging Face further cultivates this community through its popular LLM conference, democratizing access to the latest advancements.
Overcoming LLM Adoption Challenges
Hugging Face is proving instrumental in helping companies overcome key barriers to adopting LLMs. According to a McKinsey study, the top challenge organizations face with LLMs is assessing value and identifying use cases. Hugging Face alleviates this by centralizing a wide selection of LLMs and recommended fine-tuning datasets.
Furthermore, putting LLMs into production poses complex data and infrastructure problems. Hugging Face's end-to-end platform, spanning model access through deployment, smooths out these roadblocks for enterprises.
The financial investment required to build and run LLMs at scale has also deterred adoption. Hugging Face reduces costs by providing easy access to pretrained models. Spaces further optimizes expenditures via its serverless architecture and support for scalable cloud hardware like TPUs. For smaller teams and startups, this can make experimenting with large LLMs viable.
Transforming the Future with LLMs
Looking ahead, Hugging Face seems poised to continue growing as the hub for LLMs. Its community already surpasses popular AI forums. More developers and companies are relying on tools like the Transformers library and Tokenizers in their production pipelines.
LLMs will drive seismic changes in areas like marketing, sales and finance. McKinsey envisions LLMs could automate 30% to 45% of current work activities, creating major societal impacts. Platforms like Hugging Face that lower barriers to LLM innovation will be central to realizing their transformative potential.
Just as GitHub accelerated software engineering, Hugging Face is enabling developers and businesses to tap into LLMs' capabilities more rapidly and effectively. For the growing LLM-powered economy, Hugging Face represents the gateway to the future. Its comprehensive platform could catalyze new markets and unlock human-AI collaboration at scale, ushering in the next era of technological progress.
Published at DZone with permission of Arvind Bhardwaj. See the original article here.