Setting Up Browser Use

Subhajeet Dey • June 18, 2025

Make websites accessible for AI agents 🤖. Browser Use aims to be the easiest way to connect your AI agents with the browser. One big challenge in AI has been getting agents to browse the web as effortlessly as humans do. That’s where Browser Use, an npm package, comes in. It gives large language models (LLMs) and AI agents the ability to navigate and interact with websites directly, which makes web access easier for AI and opens up possibilities in automation, data collection, and interactive applications.

Understanding Browser Use

Browser Use is an npm package that provides AI agents with the capability to navigate, interact with, and extract information from websites just as a human would. Rather than relying on complex API integrations or specialized web scraping tools, Browser Use offers a straightforward approach that mimics human browsing behavior, making websites accessible to AI without requiring site-specific customizations.

The key innovation of Browser Use lies in its ability to translate natural language instructions into browser actions. This means developers can simply tell their AI agents what to do on a website using plain language, and Browser Use handles the technical implementation behind the scenes.

Setting Up Browser Use

Getting started with Browser Use is remarkably straightforward. Let's walk through the setup process step by step:

Installation

First, you'll need to install the package using npm:

npm install browser-use

Basic Configuration

After installation, you'll need to set up the basic configuration for Browser Use. This involves creating a new instance of the BrowserUse class and configuring it with your preferred settings:

const { BrowserUse } = require('browser-use');

// Create a new instance with basic configuration
const browserUse = new BrowserUse({
  headless: false,  // Set to true if you don't need to see the browser window
  defaultViewport: null,
  args: ['--start-maximized']  // Start with a maximized window
});

// Initialize the browser and report any startup failure
async function initialize() {
  await browserUse.initialize();
  console.log('Browser has been initialized successfully');
}

initialize().catch((err) => console.error('Browser failed to initialize:', err));

This basic setup initializes a browser instance that your AI agent can control. The headless option determines whether the browser window is visible (useful for debugging) or runs in the background.
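
Since the headless choice usually differs between a developer machine and a server, it can help to derive the options from the environment. The following is a minimal sketch, not part of Browser Use itself: the helper name buildBrowserOptions and the HEADLESS variable are assumptions made for illustration, and the option keys are simply the ones used in the configuration above.

```javascript
// Hypothetical helper (not part of Browser Use): derive the constructor
// options shown above from environment variables.
function buildBrowserOptions(env = process.env) {
  // HEADLESS is an assumed variable name; any value other than "true"
  // keeps the browser window visible.
  const headless = env.HEADLESS === 'true';
  return {
    headless,
    defaultViewport: null,
    // A maximized window only matters when the browser is visible.
    args: headless ? [] : ['--start-maximized'],
  };
}

// Example: inspect the options for a server and for a local run.
console.log(buildBrowserOptions({ HEADLESS: 'true' }));
console.log(buildBrowserOptions({}));
```

The resulting object could then be passed to the BrowserUse constructor in place of the literal options.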

Implementing AI Agent Interactions

Now that we have our browser set up, let's explore how to implement interactions between an AI agent and websites using Browser Use:

Basic Navigation and Information Extraction

const { BrowserUse } = require('browser-use');
const { OpenAI } = require('openai');

// Initialize OpenAI client (read the key from the environment instead of hardcoding it)
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

// Initialize Browser Use
const browserUse = new BrowserUse({ headless: false });

async function runAIBrowserTask() {
  // Initialize browser
  await browserUse.initialize();
  
  // Navigate to a website
  await browserUse.navigate('https://example.com');
  
  // Get page content
  const pageContent = await browserUse.getPageContent();
  
  // Ask the AI to analyze the content
  const completion = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: "You are an assistant that analyzes web content and extracts key information."
      },
      {
        role: "user",
        content: `Analyze this webpage content and summarize the main points: ${pageContent}`
      }
    ]
  });
  
  console.log('AI Analysis:', completion.choices[0].message.content);
  
  // Close the browser
  await browserUse.close();
}

runAIBrowserTask();

In this example, we navigate to a website, extract its content, and then use an LLM (in this case, OpenAI's GPT-4) to analyze the information. This demonstrates the basic workflow of using Browser Use together with an AI model.
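
One practical concern with this workflow is that a full page can be far longer than what a model accepts in a single prompt. The sketch below shows one way to collapse whitespace and cap the text before building the messages array. The helper names truncateForPrompt and buildSummaryMessages and the 12000-character default are illustrative assumptions, not values taken from Browser Use or from any model's documentation.

```javascript
// Hypothetical helper: collapse whitespace and cap page text before
// embedding it in a prompt. maxChars is an arbitrary illustrative limit.
function truncateForPrompt(text, maxChars = 12000) {
  const collapsed = text.replace(/\s+/g, ' ').trim();
  if (collapsed.length <= maxChars) return collapsed;
  return collapsed.slice(0, maxChars) + ' [truncated]';
}

// Hypothetical helper: build the same two messages used in the example
// above, with the page content shortened first.
function buildSummaryMessages(pageContent, maxChars) {
  return [
    {
      role: 'system',
      content: 'You are an assistant that analyzes web content and extracts key information.',
    },
    {
      role: 'user',
      content: `Analyze this webpage content and summarize the main points: ${truncateForPrompt(pageContent, maxChars)}`,
    },
  ];
}

// Example: a tiny limit makes the truncation visible.
console.log(buildSummaryMessages('Example   Domain\n\nThis domain is for use in examples.', 20));
```

Cutting by character count is crude; a token-based limit would be more accurate, at the cost of an extra dependency.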

Interactive Form Filling

One of the most powerful features of Browser Use is the ability to interact with web forms. Here's how you might implement an AI that can fill out forms based on natural language instructions:

async function fillFormWithAI() {
  await browserUse.initialize();
  await browserUse.navigate('https://someformwebsite.com/contact');
  
  // Get the user's intent
  const userIntent = "I want to submit a question about pricing for enterprise packages. My name is John Smith, email is [email protected], and my question is about volume discounts.";
  
  // Ask the AI to determine what to fill in each form field
  const formFillingInstructions = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: "You are an assistant that helps fill out web forms. Given the form fields and user intent, provide the values to enter in each field."
      },
      {
        role: "user",
        content: `
          Form fields found:
          - Name (input#name)
          - Email (input#email)
          - Subject (select#subject with options: General, Support, Pricing, Other)
          - Message (textarea#message)
          
          User intent: ${userIntent}
          
          Provide JSON with field selectors and values to fill.
        `
      }
    ]
  });
  
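  // Parse the AI's response; the model may return text that is not valid JSON
  let fillInstructions;
  try {
    fillInstructions = JSON.parse(formFillingInstructions.choices[0].message.content);
  } catch (err) {
    await browserUse.close();
    throw new Error(`AI response was not valid JSON: ${err.message}`);
  }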
  
  // Fill the form based on AI instructions
  for (const [selector, value] of Object.entries(fillInstructions)) {
    await browserUse.fill(selector, value);
  }
  
  // Submit the form
  await browserUse.click('button[type="submit"]');
  
  console.log('Form submitted successfully');
  await browserUse.close();
}

fillFormWithAI();

This example demonstrates how an AI agent can intelligently fill out a form based on a user's natural language request, showcasing the power of combining LLMs with browser automation.
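
Because the fill loop trusts whatever the model returns, it is worth validating the reply before touching the page. Here is a minimal sketch under a few assumptions: the helper parseFillInstructions and the ALLOWED_SELECTORS list are hypothetical, the selectors are the ones listed in the prompt above, and the model is assumed to reply with a flat JSON object, possibly wrapped in a Markdown code fence.

```javascript
// Selectors taken from the form description in the prompt above.
const ALLOWED_SELECTORS = ['input#name', 'input#email', 'select#subject', 'textarea#message'];

// Three backticks, built at runtime so this snippet can itself be fenced.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE);

// Hypothetical helper: turn the model's reply into a safe selector -> value map.
function parseFillInstructions(reply, allowed = ALLOWED_SELECTORS) {
  // Models often wrap JSON in a code fence; unwrap it if present.
  const fenced = reply.match(FENCED_BLOCK);
  const raw = (fenced ? fenced[1] : reply).trim();

  const parsed = JSON.parse(raw); // throws SyntaxError on invalid JSON
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error('Expected a JSON object mapping selectors to values');
  }

  // Keep only known selectors with string values; drop everything else.
  const result = {};
  for (const [selector, value] of Object.entries(parsed)) {
    if (allowed.includes(selector) && typeof value === 'string') {
      result[selector] = value;
    }
  }
  return result;
}

// Example: an unexpected selector is silently discarded.
console.log(parseFillInstructions('{"input#name": "John Smith", "#unknown": "x"}'));
```

The allowlist keeps a confused or manipulated model from writing into fields the task never mentioned.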

Advanced Features and Capabilities

Browser Use extends beyond basic navigation and form filling. It offers a comprehensive suite of features that enable AI agents to perform complex web interactions:

Conditional Navigation

async function conditionalNavigation() {
  await browserUse.initialize();
  await browserUse.navigate('https://news-site.com');
  
  // Check if a specific element exists on the page
  const hasLoginPrompt = await browserUse.elementExists('.login-prompt');
  
  if (hasLoginPrompt) {
    // Handle login if needed
    await browserUse.click('.login-button');
    await browserUse.fill('#username', 'myusername');
    await browserUse.fill('#password', 'mypassword');
    await browserUse.click('#submit-login');
  }
  
  // Continue with the main task
  await browserUse.click('.top-story');
  
  // Extract the article content
  const articleContent = await browserUse.getTextContent('.article-body');
  console.log('Article content:', articleContent);
  
  await browserUse.close();
}
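
In the examples so far, close() only runs if every earlier step succeeds, so a missing element or a failed click would leave a browser process behind. A small wrapper with try/finally avoids that. This is a sketch: withBrowser is a hypothetical helper, and it relies only on the initialize() and close() calls already used in this article. The demo uses a stand-in object so it runs without a real browser.

```javascript
// Hypothetical wrapper: run a task and always close the browser afterwards,
// whether the task resolves or throws.
async function withBrowser(browser, task) {
  await browser.initialize();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Demo with a stand-in object that only logs, in place of a real instance.
const fakeBrowser = {
  initialize: async () => console.log('initialized'),
  close: async () => console.log('closed'),
};

withBrowser(fakeBrowser, async () => {
  throw new Error('element not found');
}).catch((err) => console.log('task failed:', err.message));
```

With a real instance, a function such as conditionalNavigation could be reduced to the task body and passed in as the second argument.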

Multi-step Workflows

Browser Use excels at handling complex, multi-step workflows that involve navigating through multiple pages and making decisions based on what's found:

async function researchProduct() {
  await browserUse.initialize();
  
  // Search for a product
  await browserUse.navigate('https://search-engine.com');
  await browserUse.fill('#search-box', 'best laptop 2025');
  await browserUse.press('Enter');
  
  // Analyze search results with AI
  const searchResults = await browserUse.getTextContent('.search-results');
  
  const analysisPrompt = `
    Analyze these search results and determine the top 3 websites to visit for reliable laptop reviews:
    ${searchResults}
  `;
  
  const analysis = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      { role: "system", content: "You're a research assistant helping find reliable product information." },
      { role: "user", content: analysisPrompt }
    ]
  });
  
  // Parse the AI's suggestions (extractURLsFromText is a helper you must define yourself)
  const topSites = extractURLsFromText(analysis.choices[0].message.content);
  
  // Visit each recommended site and gather information
  let allInformation = [];
  for (const site of topSites) {
    await browserUse.navigate(site);
    const pageInfo = await browserUse.getTextContent('main');
    allInformation.push({ site, content: pageInfo });
  }
  
  // Compile final research report with AI (generateResearchReport is a helper you must define yourself)
  const finalReport = await generateResearchReport(allInformation);
  console.log('Research Report:', finalReport);
  
  await browserUse.close();
}
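
The workflow above calls extractURLsFromText and generateResearchReport without defining them. One possible shape for both is sketched below. These are assumptions, not the author's implementations: the URL regex is deliberately simple, and generateResearchReport here takes a second argument, complete, an async function from prompt to text, so the sketch does not depend on a specific LLM client. To match the one-argument call in the article, you would bind that function in advance.

```javascript
// Hypothetical implementation: pull up to `limit` distinct http(s) URLs
// out of free-form model output.
function extractURLsFromText(text, limit = 3) {
  const matches = text.match(/https?:\/\/[^\s<>"')\]]+/g) || [];
  // Drop punctuation that commonly trails a URL in prose.
  const cleaned = matches.map((url) => url.replace(/[.,;:!?]+$/, ''));
  return [...new Set(cleaned)].slice(0, limit);
}

// Hypothetical implementation: build one prompt from the gathered pages
// and hand it to an injected completion function.
async function generateResearchReport(allInformation, complete, maxCharsPerSite = 4000) {
  const sections = allInformation.map(
    ({ site, content }) => `Source: ${site}\n${content.slice(0, maxCharsPerSite)}`
  );
  const prompt =
    'Write a short research report comparing the findings from these sources:\n\n' +
    sections.join('\n\n');
  return complete(prompt);
}

// Example: URLs embedded in a numbered recommendation list.
console.log(
  extractURLsFromText('1. https://a.example/reviews, 2. (https://b.example/laptops) 3. https://a.example/reviews.')
);
```

In the article's setup, complete could wrap openai.chat.completions.create and return choices[0].message.content.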

The introduction of tools like Browser Use represents a significant step toward truly autonomous AI agents that can interact with the web just as humans do. By providing a simple, intuitive interface that translates natural language instructions into browser actions, Browser Use is democratizing web access for AI systems of all types.

As this technology continues to evolve, we can expect to see increasingly sophisticated AI agents capable of performing complex tasks on the web, from research and data gathering to customer service and automated testing. The range of potential applications is wide, bounded mainly by the capabilities of the AI models themselves.

For developers looking to create more capable, web-aware AI agents, Browser Use offers an accessible entry point that eliminates much of the complexity traditionally associated with web automation. By focusing on natural language instructions rather than low-level browser manipulation, Browser Use allows developers to create more intuitive, user-friendly AI applications that can seamlessly interact with the web.