Programmatic Brand Extraction: Pulling Logos, Colors, and Assets from Any URL
OpenBrand is an open-source library that extracts structured brand assets from any URL - available as an npm package, API, or AI agent skill.
Here’s a problem I kept running into: I need a company’s brand assets (their logo, their colors, maybe a hero image) and there’s no API for it.
You’re building a white-label dashboard. Or a proposal generator. Or an integration that sends branded emails on behalf of customers. Every time, you end up on their website, right-clicking “Inspect Element,” eyedropping hex codes, and downloading a pixelated PNG from their footer. It’s tedious, it breaks when they redesign, and it doesn’t scale.
So I built OpenBrand, an open-source library that extracts brand assets from any URL. Give it a website, get back structured JSON with logos, colors, and backdrop images. No API key needed if you run it as a library.
The Problem Is Harder Than It Looks
You might think: “Just scrape the <link rel='icon'> and call it a day.” But favicons are often only 16x16 or 32x32 pixels. That’s not a logo; that’s a logo for ants.
Real brand extraction needs to handle:
Logo detection. Companies put their logos in wildly different places. Some use an <svg> in the header. Some use an <img> with a class like .site-logo or .brand. Some only have it as an Open Graph image in their <meta> tags. Some have it nowhere obvious, and you need to check their favicon manifest for higher-resolution variants.
Color extraction. The brand’s primary color might be in CSS custom properties (--brand-primary), in computed styles on key elements, in their stylesheet as the most-used non-white/non-black color, or embedded in their logo SVG. And you need to distinguish between “the brand color” and “the color they use for body text.”
Backdrop images. Hero images, background gradients, Open Graph images — these are useful for building branded experiences, but they’re scattered across different DOM locations and meta tags.
The point is: there’s no standard for where brands put their assets. Every website is a snowflake.
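To make the color problem concrete, here is a minimal sketch of one possible heuristic: count the hex colors in a stylesheet and drop near-grayscale values, so text and background colors don't crowd out the brand color. The helper names (`normalizeHex`, `isNeutral`, `rankBrandColors`) and the tolerance value are my own assumptions for illustration, not OpenBrand's actual algorithm.

```javascript
// Hypothetical helpers, not part of OpenBrand: rank candidate brand colors
// found in raw CSS text by frequency, skipping near-grayscale values.

// Expand #abc to #aabbcc and lowercase, so equal colors compare equal.
function normalizeHex(hex) {
  const h = hex.slice(1).toLowerCase();
  return "#" + (h.length === 3 ? [...h].map((c) => c + c).join("") : h);
}

// Treat a color as neutral when its RGB channels are nearly equal
// (white, black, and the grays typically used for text and backgrounds).
function isNeutral(hex, tolerance = 16) {
  const n = parseInt(hex.slice(1), 16);
  const r = (n >> 16) & 255;
  const g = (n >> 8) & 255;
  const b = n & 255;
  return Math.max(r, g, b) - Math.min(r, g, b) <= tolerance;
}

function rankBrandColors(css) {
  const counts = new Map();
  // Matches 3- or 6-digit hex colors only; rgb()/hsl() values are ignored,
  // and an ID selector that looks like hex (e.g. #fab) would also match.
  for (const [raw] of css.matchAll(/#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/g)) {
    const hex = normalizeHex(raw);
    if (!isNeutral(hex)) counts.set(hex, (counts.get(hex) ?? 0) + 1);
  }
  // Most frequent color first.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([hex, count]) => ({ hex, count }));
}
```

Frequency alone still can't tell a brand color from a CTA color, which is why this only ranks candidates rather than picking a winner.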
How OpenBrand Works
OpenBrand uses server-side HTML scraping with Cheerio and image analysis with Sharp. No headless browser, no Puppeteer — just direct HTTP requests and intelligent heuristics. Here’s the approach:
import * as cheerio from 'cheerio';
// Fetch the page HTML with a browser-like User-Agent
const html = await fetch('https://stripe.com', {
  headers: { 'User-Agent': 'Mozilla/5.0 ...' }
}).then(r => r.text());
// Parse with Cheerio (jQuery-like DOM API for Node.js)
const $ = cheerio.load(html);
// Run extraction heuristics across the parsed markup
For sites that block direct requests, it falls back to Jina Reader, a service that renders pages and returns clean content.
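A rough sketch of what that fallback flow could look like, assuming Jina Reader is reached by prefixing the target URL with `https://r.jina.ai/`. The function name `fetchPageWithFallback`, the injectable `fetchImpl` parameter, and the exact conditions that trigger the fallback are assumptions for illustration; OpenBrand's internals may differ.

```javascript
// Sketch of a direct-fetch-then-fallback flow. The function name and the
// fallback conditions are assumptions, not OpenBrand's internals.
const JINA_READER_PREFIX = "https://r.jina.ai/";

async function fetchPageWithFallback(url, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(url, {
      // Placeholder User-Agent, elided the same way as in the snippet above.
      headers: { "User-Agent": "Mozilla/5.0 ..." },
    });
    if (res.ok) return { source: "direct", body: await res.text() };
  } catch {
    // Network errors fall through to the reader service.
  }
  // Blocked (non-2xx) or failed: retry through the reader service.
  const res = await fetchImpl(JINA_READER_PREFIX + url);
  if (!res.ok) {
    throw new Error(`Both direct fetch and fallback failed for ${url}`);
  }
  return { source: "jina", body: await res.text() };
}
```

Returning the `source` alongside the body lets later steps know whether they are looking at raw HTML or the reader's cleaned-up output.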
The extraction pipeline runs in this order:
- Logos – Check <svg> elements in header/nav, <img> elements with logo-related classes/IDs, <link rel="icon"> manifest for high-res variants, Open Graph/Twitter card images as fallback
- Colors – Extract theme-color meta tags, parse manifest.json, sample dominant colors from logo images using Sharp
- Backdrops – Find Open Graph images, hero/banner images, background images on key sections
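The ordered fallback in the logo step can be sketched as a priority-ranked list of strategies, where the first one that finds anything wins. This is a simplified illustration: the strategy names and `findLogo` are hypothetical, only two of the sources are shown, and regexes stand in for the Cheerio queries so the example has no dependencies.

```javascript
// Hypothetical sketch of a priority-ranked strategy list. Each strategy
// takes the page HTML and returns an array of candidate logo URLs.
// Regexes stand in for real DOM queries to keep this dependency-free.
const logoStrategies = [
  {
    name: "img-with-logo-class",
    run: (html) =>
      [...html.matchAll(/<img\b[^>]*class="[^"]*\b(?:logo|brand)\b[^"]*"[^>]*>/gi)]
        .map(([tag]) => /\bsrc="([^"]+)"/i.exec(tag)?.[1])
        .filter(Boolean),
  },
  {
    name: "open-graph-image",
    run: (html) =>
      [...html.matchAll(/<meta\b[^>]*property="og:image"[^>]*>/gi)]
        .map(([tag]) => /\bcontent="([^"]+)"/i.exec(tag)?.[1])
        .filter(Boolean),
  },
];

// Walk the strategies in order and return the first non-empty result.
function findLogo(html, strategies = logoStrategies) {
  for (const { name, run } of strategies) {
    const candidates = run(html);
    if (candidates.length > 0) return { strategy: name, candidates };
  }
  return null; // nothing matched; the caller decides what to do next
}
```

Keeping the strategies in a plain array makes the priority order explicit, and adding a new heuristic for an edge case is a matter of inserting one more entry.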
The library returns structured data:
import { extractBrandAssets } from "openbrand";
const result = await extractBrandAssets("https://stripe.com");
if (result.ok) {
console.log(result.data.brand_name); // "Stripe"
console.log(result.data.logos); // LogoAsset[] - SVGs, PNGs with URLs and dimensions
console.log(result.data.colors); // ColorAsset[] - hex values with context
console.log(result.data.backdrop_images); // BackdropAsset[] - hero images, backgrounds
}
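As one example of consuming that result, the sketch below turns the colors array into CSS custom properties. It assumes each ColorAsset exposes its value as a `hex` string, which the comment above suggests but I haven't confirmed here; `colorsToCssVariables` and the `--brand-color` prefix are hypothetical names, so check the library's type definitions before relying on them.

```javascript
// Assumption: each ColorAsset exposes its value as a `hex` string.
// Check the library's type definitions for the real field names.
function colorsToCssVariables(colors, prefix = "--brand-color") {
  const lines = colors
    .filter((c) => typeof c.hex === "string") // skip entries without a hex value
    .map((c, i) => `  ${prefix}-${i + 1}: ${c.hex};`);
  return `:root {\n${lines.join("\n")}\n}`;
}

// Usage with a successful result from extractBrandAssets:
//   const css = colorsToCssVariables(result.data.colors);
```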
Three Ways to Use It
As an npm package (no API key, runs on your server):
npm add openbrand
import { extractBrandAssets } from "openbrand";
const result = await extractBrandAssets("https://linear.app");
Lightweight and fast — no browser process to manage. Good for build scripts, CI pipelines, serverless functions, or backend services.
As an API (free API key from openbrand.sh):
curl "https://openbrand.sh/api/extract?url=https://stripe.com" \
-H "Authorization: Bearer your_api_key"
Good for client-side apps or anywhere you want a simple HTTP call.
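If you're calling the API from JavaScript rather than curl, something like the following should work. The endpoint and Authorization header come from the curl example; the helper names and error handling are my own, and the response shape is left untouched because it isn't documented here. One detail worth getting right is percent-encoding the target URL, which matters as soon as it carries its own query string.

```javascript
// Endpoint and header are taken from the curl example above; the helper
// names and error handling are illustrative.
function buildExtractRequest(targetUrl, apiKey) {
  const endpoint = new URL("https://openbrand.sh/api/extract");
  // searchParams.set percent-encodes the target URL, including any
  // query string of its own.
  endpoint.searchParams.set("url", targetUrl);
  return {
    url: endpoint.href,
    options: { headers: { Authorization: `Bearer ${apiKey}` } },
  };
}

async function extractViaApi(targetUrl, apiKey, fetchImpl = fetch) {
  const { url, options } = buildExtractRequest(targetUrl, apiKey);
  const res = await fetchImpl(url, options);
  if (!res.ok) throw new Error(`OpenBrand API returned HTTP ${res.status}`);
  return res.json(); // returned as-is; the response shape is not assumed here
}
```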
As an agent skill (for Claude Code, Cursor, Codex, Gemini CLI):
npx skills add ethanjyx/openbrand
Then just ask your AI agent: “Extract brand assets from linear.app.” This is probably the most interesting distribution channel — 40+ AI coding agents can use it as a tool.
What I Got Wrong (And What I’d Do Differently)
Some honest takes on the tradeoffs:
Static HTML has limits. We don’t execute JavaScript, so sites that render most of their markup client-side (heavy single-page apps) may not expose all their brand assets in the initial HTML. In practice, this matters less than you’d think - logos, favicons, OG tags, and most brand-relevant markup live in static HTML. For the few sites where it fails, the Jina Reader fallback helps. We chose speed and simplicity over completeness.
Logo detection is fuzzy. There’s no semantic HTML tag for “this is the company’s logo.” Heuristics work well for ~85% of sites but break on unusual layouts. Some sites put their logo in a <div> with a background image. Some use CSS mask-image. The current approach has a priority-ranked list of strategies, but it’s not perfect.
Color extraction conflates brand color with design system color. A company might use blue as its brand color but green for its primary CTA buttons. OpenBrand currently returns both without distinguishing between them. This is a known limitation - brand identity and UI design tokens overlap but aren’t identical.
Rate limiting. If you’re extracting from many URLs, you need to be respectful. The API has rate limits built in, but the npm package doesn’t throttle — that’s your responsibility.
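Since throttling is left to you with the npm package, here is a minimal sketch of one way to do it: process URLs one at a time with a fixed pause between calls. `extractAllThrottled` and `sleep` are hypothetical helpers, and the one-second default is an arbitrary choice, not a documented limit.

```javascript
// Hypothetical wrapper: process URLs sequentially with a pause between calls.
// `extract` is any async function, e.g. extractBrandAssets from openbrand.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function extractAllThrottled(urls, extract, delayMs = 1000) {
  const results = [];
  for (const [i, url] of urls.entries()) {
    if (i > 0) await sleep(delayMs); // pause between requests, not before the first
    try {
      results.push({ url, result: await extract(url) });
    } catch (error) {
      // Record the failure and keep going rather than aborting the batch.
      results.push({ url, error });
    }
  }
  return results;
}

// Usage:
//   const all = await extractAllThrottled(urls, extractBrandAssets, 2000);
```

Sequential processing is the conservative choice. If the URLs are spread across many different hosts, a small concurrency limit would be faster without hammering any single site.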
Where This Is Actually Useful
Real use cases I’ve seen or built:
- White-label SaaS: Automatically theme a customer’s dashboard using their brand colors on first login
- Proposal/invoice generators: Pull the client’s logo and colors to brand documents without asking them to upload assets
- Competitive analysis tools: Track how competitors’ branding evolves over time
- AI agents: Give LLMs the ability to “see” a brand without manual configuration — useful for generating branded content, emails, or presentations
- Design system bootstrapping: Start a new project by extracting the brand’s existing visual language
Try It
The repo is at github.com/ethanjyx/openbrand. MIT licensed.
The fastest way to see if it works for your use case:
npm add openbrand
node -e "
import('openbrand').then(async ({extractBrandAssets}) => {
const r = await extractBrandAssets('https://your-target-site.com');
if (r.ok) console.log(JSON.stringify(r.data, null, 2));
else console.error(r.error);
});
"
If you find sites where the extraction breaks, open an issue — the heuristics improve with every edge case.