Build an AI Browser Agent With LLMs, Playwright, Browser Use

A guide on how to build an AI browser agent using LLMs, Playwright, and Browser Use to automate web interactions, extract data, and navigate sites efficiently.

By Kailash Pathak
Feb. 07, 25 · Tutorial

Browser Use is a tool that enables AI agents (such as OpenAI’s GPT models or other large language models) to control web browsers in an automated way. It bridges the gap between AI capabilities and real-world browser interactions, so that AI systems can navigate websites, extract data, fill out forms, click buttons, and more, just as a human user would.

The primary goal of Browser Use is to make websites accessible and actionable for AI agents by abstracting away the complexities of browser automation. Instead of requiring developers to write intricate scripts to locate and interact with webpage elements, Browser Use simplifies this process by extracting all interactive elements (like buttons, input fields, links, etc.) and providing a structured interface for AI agents to interact with.

Key Characteristics of Browser Use

AI-Driven Automation

Browser Use leverages AI to understand and interact with web pages. For example, it can analyze the content of a webpage, identify relevant actions (like clicking a button or filling out a form), and execute those actions autonomously.

Vision + HTML Extraction

It combines visual understanding (recognizing elements on the screen) with HTML structure extraction (parsing the underlying code of a webpage). This dual approach ensures that AI agents can interact with both static and dynamic web elements, even if they don’t have clear identifiers like IDs or classes.

Multi-Tab Management

Browser Use can handle multiple browser tabs simultaneously, allowing AI agents to perform complex workflows that involve interacting with several web pages at once.

Element Tracking

The tool tracks the exact actions performed by the AI agent (e.g., clicking a button or filling out a form) and can replicate those actions consistently, even if the website layout changes slightly. This is particularly useful for creating self-healing tests in QA automation.

Custom Actions

Users can extend Browser Use by adding custom actions, such as saving data to files, performing database operations, sending notifications, or handling human input during specific steps in the automation process.
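One common way to support this kind of extension is a registry that maps action names to functions. The sketch below illustrates that idea using only the Python standard library. It is not the Browser Use API: the names `ACTIONS`, `action`, `save_to_file`, and `run_action` are hypothetical and chosen for illustration.

```python
from typing import Callable, Dict

# Hypothetical registry: maps an action name to the function that implements it.
ACTIONS: Dict[str, Callable[..., str]] = {}


def action(name: str):
    """Decorator that registers a function as a named custom action."""
    def register(func):
        ACTIONS[name] = func
        return func
    return register


@action("save_to_file")
def save_to_file(path: str, text: str) -> str:
    # Example custom action: write extracted text to a file.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return f"saved {len(text)} characters to {path}"


def run_action(name: str, **kwargs) -> str:
    """Looks up a registered action by name and calls it with the given arguments."""
    if name not in ACTIONS:
        raise KeyError(f"unknown action: {name}")
    return ACTIONS[name](**kwargs)
```

With a registry like this, an agent only needs to emit an action name and its arguments, for example `run_action("save_to_file", path="result.txt", text="...")`.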

Self-Correcting

Browser Use includes intelligent error handling and automatic recovery mechanisms. If something goes wrong during automation (e.g., a missing element or a network timeout), the tool can detect the issue and attempt to recover automatically, ensuring that workflows continue without interruption.

Compatibility With Multiple LLMs

Browser Use supports various large language models (LLMs), including OpenAI’s GPT-4, Anthropic’s Claude, and Meta’s Llama 2. This flexibility allows users to choose the best AI model for their specific needs.

How Browser Use Works

Browser Use scans a webpage and extracts all interactive elements (buttons, input fields, links, forms, etc.). It then provides a structured representation of these elements that AI agents can understand and interact with.
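The extraction step can be pictured with a small sketch. This is not the Browser Use implementation. It is a minimal illustration that uses only Python’s standard `html.parser` on a static HTML string, and the names `INTERACTIVE_TAGS`, `InteractiveElementParser`, and `extract_interactive_elements` are hypothetical.

```python
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not Browser Use's actual list).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementParser(HTMLParser):
    """Collects interactive elements and assigns each a numeric index."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.elements.append(
                {"index": len(self.elements), "tag": tag, "attrs": dict(attrs)}
            )


def extract_interactive_elements(html):
    """Returns a list of dicts describing the interactive elements found in html."""
    parser = InteractiveElementParser()
    parser.feed(html)
    return parser.elements


page = """
<form action="/search">
  <input type="text" name="q" placeholder="Search">
  <button type="submit">Go</button>
</form>
<a href="/help">Help</a>
"""

for element in extract_interactive_elements(page):
    print(element)
```

Giving each element a stable index lets an agent refer to "element 1" in its reply instead of writing a selector by hand.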

AI Interaction

Once the interactive elements are identified, AI agents can perform actions like clicking buttons, filling out forms, navigating between pages, or extracting data. The AI agent can also analyze the content of the webpage and make decisions based on the information it finds.

Automation Workflows

Browser Use allows users to create complex automation workflows. For example, an AI agent could navigate through an e-commerce site, add items to a shopping cart, and complete a purchase — all without human intervention.

Error Handling and Recovery

If something goes wrong during the automation process (e.g., a missing element or a slow-loading page), Browser Use can detect the issue and attempt to recover automatically. This ensures that workflows continue smoothly, even in unpredictable environments.
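A simple form of this recovery is retrying a failed step after a short wait. The sketch below shows that pattern in plain Python. It is an illustration under stated assumptions, not how Browser Use handles errors internally, and `run_with_retry` and its parameters are hypothetical names.

```python
import time


def run_with_retry(step, attempts=3, delay_seconds=0.5):
    """Runs step(); on failure waits and retries, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as error:
            if attempt == attempts:
                raise
            print(f"attempt {attempt} failed ({error}); retrying")
            time.sleep(delay_seconds * attempt)  # wait a little longer each time
```

Here `step` is any zero-argument callable, such as a function that clicks a button and raises an exception if the element is missing.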

Installation Guide

Getting started with Browser Use is straightforward, but it requires some initial setup. This guide walks you through setting up the Browser Use WebUI locally on your machine.

Prerequisites

Before you begin, ensure that your system meets the following requirements:

  • Python 3.11 or higher. You can check your Python version by running the command:
    Shell
     
    python --version
    
  • Git. Git is required to clone the repository.
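Both prerequisites can also be checked from a short script. The following is a minimal sketch using only the standard library. The function name `check_prerequisites` is hypothetical, and the script only tests that Python is at least 3.11 and that a `git` executable is on the PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)


def check_prerequisites():
    """Returns a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python 3.11 or higher is required, found {found}")
    if shutil.which("git") is None:
        problems.append("git was not found on PATH")
    return problems


for problem in check_prerequisites():
    print(problem)
```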

Local Installation

Step 1: Clone the Repository

Shell
 
git clone https://github.com/browser-use/web-ui.git
cd web-ui


Step 2: Set Up Python Environment

We recommend using uv to manage the Python environment. On macOS or Linux, install it with:

Shell
 
curl -LsSf https://astral.sh/uv/install.sh | sh



1. Create a virtual environment. Run the following command to create a virtual environment with Python 3.11:
Shell
 
uv venv --python 3.11




2. Activate the virtual environment.
  • Windows (command prompt):
    Shell
     
    .venv\Scripts\activate
  • macOS/Linux:
    Shell
     
    source .venv/bin/activate

Once activated, you should see .venv in your terminal prompt, indicating that the virtual environment is active.
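If the prompt is ambiguous, the interpreter itself can confirm whether it is running inside a virtual environment. This small sketch relies on the standard `sys.prefix` and `sys.base_prefix` attributes, which differ inside a venv-style environment. The function name `in_virtual_environment` is hypothetical.

```python
import sys


def in_virtual_environment():
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_environment())
```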

Step 3: Install Dependencies

Now that your environment is set up, it’s time to install the necessary dependencies.

Install Python packages. Use the following command to install the required Python packages listed in requirements.txt:

Shell
 
uv pip install -r requirements.txt


Step 4: Install Playwright

Playwright is a browser automation library used by Browser Use.

To install it, run the command:

Shell
 
playwright install


Local Setup Guide for Browser Use WebUI

Once you’ve completed the installation steps for Browser Use, you can run the WebUI locally. This section covers launching the application, its command-line flags, and configuring a language model.

Running the WebUI

Start the Browser Use WebUI by running the following command:

Shell
 
python webui.py --ip 127.0.0.1 --port 7788


The WebUI provides several options to customize its behavior. Here’s a breakdown of the available flags:

  • --ip – the IP address to bind the WebUI to
    • Default – 127.0.0.1 (localhost)
  • --port – the port to bind the WebUI to
    • Default – 7788
  • --theme – the theme for the user interface
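To make the flags concrete, here is a minimal `argparse` sketch that accepts the same three options with the defaults listed above. It is an illustration only and is not taken from `webui.py`. The function name `build_parser` is hypothetical, and because no default is given for `--theme`, the sketch assumes none.

```python
import argparse


def build_parser():
    """Builds a parser for the three flags described in the list above."""
    parser = argparse.ArgumentParser(description="Sketch of the WebUI launch flags")
    parser.add_argument("--ip", default="127.0.0.1", help="IP address to bind to")
    parser.add_argument("--port", type=int, default=7788, help="port to bind to")
    # No default for --theme is stated in the article, so none is assumed here.
    parser.add_argument("--theme", default=None, help="theme for the user interface")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the example self-contained.
args = build_parser().parse_args(["--port", "8000"])
print(args.ip, args.port, args.theme)
```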

Accessing the WebUI

Once the WebUI is running, open your web browser and navigate to:

Plain Text
 
http://127.0.0.1:7788




Once the page loads, you should see the Browser Use interface, where you can interact with the tool and configure AI-driven browser automation tasks.
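If the page does not load, it can help to check whether anything is listening on the expected address. The sketch below attempts a TCP connection using the standard `socket` module. The function name `is_listening` is hypothetical, and the defaults mirror the `--ip` and `--port` values used above.

```python
import socket


def is_listening(host="127.0.0.1", port=7788, timeout=1.0):
    """Returns True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("WebUI reachable:", is_listening())
```

A result of `False` usually means the WebUI process is not running or was started with a different `--ip` or `--port`.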


LLM Configuration

In LLM Configuration, select a language model, e.g., Gemini, which offers a free API key.

Generate the API keys from the link attached below.

Generate the API keys


In the screenshot below, you can see we have added the API keys generated with the above link.

Run Agent

In Run agent, let's give the prompt "go to amazon.in and type 'Playwright' click search and give me the first URL."

Add the prompt

In the screenshot below, you can see that when we run the prompt, it will open the Chromium browser and interact with the whole DOM of the page.

Chromium browser

Next, the agent enters the value Playwright in the search box, as shown in the screenshot below.

The value Playwright is entered in the search box

In the screenshot below, you can see that it returns the first URL.

First URL

In the backend, you can see the logs: every action the agent performs is logged there.

Logs are executed

Result

In the result tab, you can see the final result, model action, model thoughts, trace file, and agent history.

Result tab

Video

You can download the video by clicking on the link provided. You can also find the video under the Recordings tab. When you play it, you will see all the steps the agent performed.

Below are some screenshots of the video.

Video screenshot (1/2)

Video screenshot (2/2)


Conclusion

The integration of LLMs, Playwright, and Browser Use is a step forward for browser automation and AI-driven workflows. Combining these tools lets you create intelligent browser agents that perform complex tasks with minimal human intervention, from automating repetitive processes to dynamic QA testing and real-time decision-making.

Reference

  • Browser Use