LumiTeh BUA
https://x.com/LumiTeh / https://www.lumiteh.com/ / https://github.com/LumiTeh-hub

Overview
Browser-Using Agents (BUA) follow the Computer-Using Agent (CUA) model popularized by OpenAI, but are extended to operate specifically in browser environments.
Traditional CUA models combine the vision and reasoning capabilities of LLMs to simulate control over computer interfaces and perform tasks. Browser-Using Agents focus solely on the browser as the primary interface, as giving AI agents access to the page DOM can significantly enhance their performance.
BUA is accessible via the bua/completions endpoint.
How it works
In addition to the screenshot and prompt used by traditional CUA models, BUA takes the page’s DOM as input to improve its understanding of, and reasoning about, web content. This is illustrated in the figure below.
Setting up your environment
Before you can use BUA, you need a browser environment that can capture screenshots and DOM snapshots of a given web page.
We recommend Playwright for this purpose. You can check out the library for an example implementation, in particular:
computer.screenshot()
computer.dom()
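As a rough illustration of what such an environment needs to provide, the sketch below captures both inputs from a Playwright page. It is not the library's implementation: `BrowserState`, `capture_state`, and `capture_from_url` are hypothetical names introduced here, and only Playwright's `page.screenshot()` and `page.content()` calls are real API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class BrowserState:
    """One observation of the page: a PNG screenshot plus serialized HTML."""

    screenshot_png: bytes
    dom_html: str


def capture_state(page: Any) -> BrowserState:
    """Capture a screenshot and a DOM snapshot from a Playwright-style page.

    `page` only needs the two methods used here. Playwright's sync `Page`
    provides both: `screenshot()` returns PNG bytes by default and
    `content()` returns the full HTML of the page.
    """
    return BrowserState(
        screenshot_png=page.screenshot(),
        dom_html=page.content(),
    )


def capture_from_url(url: str) -> BrowserState:
    """Open `url` in headless Chromium and capture its state.

    Requires Playwright and its browsers to be installed; the import is
    done lazily so the helpers above work without it.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        state = capture_state(page)
        browser.close()
        return state
```

Calling `capture_from_url("https://example.com")` would return one `BrowserState` whose two fields correspond to the screenshot and DOM inputs described above.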
Integrating the BUA loop
1. Send a request to the model
The first request contains the initial state of the environment: a screenshot of the page and the page's DOM.
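A first request body could be assembled along the following lines. This is only a sketch: the field names `goal`, `screenshot`, and `dom` are assumptions made for illustration, not the documented schema of the bua/completions endpoint, and `build_initial_request` is a hypothetical helper.

```python
import base64
import json


def build_initial_request(goal: str, screenshot_png: bytes, dom_html: str) -> str:
    """Serialize the initial environment state as a JSON request body.

    All field names below are hypothetical placeholders, not the real
    schema of the bua/completions endpoint.
    """
    body = {
        # Hypothetical field: the task the agent should accomplish.
        "goal": goal,
        # Raw bytes cannot be embedded in JSON, so the PNG is base64-encoded.
        "screenshot": base64.b64encode(screenshot_png).decode("ascii"),
        # Hypothetical field: the DOM snapshot as an HTML string.
        "dom": dom_html,
    }
    return json.dumps(body)
```

The resulting string can be sent with any HTTP client. Base64 is used here only because JSON has no binary type; the actual endpoint may expect a different encoding or an image URL.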
2. Receive a suggested action
The response will provide a sequence of actions to help achieve the specified goal. These actions may include clicking at a specific position, entering text, scrolling, or pausing as needed.
How you map the suggested actions to browser calls in code depends on your environment. If you use Playwright as your browser automation library, the bua-playwright-agent library (available on GitHub) already maps these actions to Playwright calls.
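For readers wiring this up by hand, a minimal dispatcher might look like the sketch below. The action dictionaries (`type`, `x`, `y`, `text`, `delta_x`, `delta_y`, `ms`) are a hypothetical format chosen for illustration, since the response schema is not specified here, and `apply_action` and `apply_actions` are hypothetical helpers rather than part of any LumiTeh library. The Playwright calls themselves (`page.mouse.click`, `page.keyboard.type`, `page.mouse.wheel`, `page.wait_for_timeout`) are real.

```python
from typing import Any, Dict, List


def apply_action(page: Any, action: Dict[str, Any]) -> None:
    """Execute one suggested action on a Playwright-style page.

    The action format is an assumption for this sketch; adapt the keys
    to whatever the model response actually contains.
    """
    kind = action["type"]
    if kind == "click":
        # Click at absolute viewport coordinates.
        page.mouse.click(action["x"], action["y"])
    elif kind == "type":
        # Type into whichever element currently has focus.
        page.keyboard.type(action["text"])
    elif kind == "scroll":
        # Scroll by a pixel delta; a missing axis defaults to 0.
        page.mouse.wheel(action.get("delta_x", 0), action.get("delta_y", 0))
    elif kind == "wait":
        # Pause for the given number of milliseconds (default: 1 second).
        page.wait_for_timeout(action.get("ms", 1000))
    else:
        raise ValueError(f"Unsupported action type: {kind!r}")


def apply_actions(page: Any, actions: List[Dict[str, Any]]) -> None:
    """Apply a sequence of suggested actions in order."""
    for action in actions:
        apply_action(page, action)
```

Raising on unknown action types, rather than skipping them, keeps the browser from drifting out of sync with what the model believes has happened.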