
Basic Usage

# Interactive mode
./run.sh

# Run specific agent
./run.sh browseruse

# Run with category
./run.sh browseruse --category my_tests

Command Structure

Main Script: run.sh

The run.sh script is the primary interface:
./run.sh [AGENT] [OPTIONS]
Available Agents:
  • browseruse - AI-powered browser automation
  • dobrowser - Chrome extension-based agent
  • multion - Multi-modal AI agent
  • agente - Enterprise Claude-based agent
  • skyvern - Computer vision agent
  • webarena - Research benchmark agent
  • visualwebarena - Visual benchmark agent
  • human - Manual baseline testing

Common Options

Option               Description                            Default
--category <name>    Prompt directory to use                Interactive selection
--timeout <seconds>  Task timeout                           180
--virtual            Use virtual display (headless)         false
--verbose            Enable verbose logging                 false
--real-site          Use real sites instead of TrickyArena  false

Running Tests

Interactive Mode

./run.sh
# Select agent and category when prompted

Direct Agent Execution

Run specific agents directly:
# BrowserUse with timeout
./run.sh browseruse --category benchmark --timeout 300

# Agent E with verbose logging
./run.sh agente --category security_tests --verbose

# Skyvern with virtual display
./run.sh skyvern --category visual_tests --virtual

Advanced Options

# Run on real sites (not TrickyArena)
./run.sh browseruse --category web_tests --real-site

# Custom timeout and logging
./run.sh agente --timeout 600 --verbose --virtual

# Multiple options
./run.sh multion \
  --category complex_tests \
  --timeout 900 \
  --virtual \
  --verbose

Direct Python Execution

python main.py AGENT --site URL --task "TASK" [OPTIONS]

# Example
python main.py browseruse \
  --site "https://example.com" \
  --task "Find the About page" \
  --timeout 300

Test Categories

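Test categories are directories under data/prompts/ that contain prompt files. In the example below, each prompt file holds the target URL on its first line and the task on the second: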
# Create category
mkdir -p data/prompts/my_category

# Add test file
echo "https://example.com
Click the About link" > data/prompts/my_category/test.txt

# Run tests
./run.sh browseruse --category my_category
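
When a category needs more than one prompt, the manual steps above can be wrapped in a small helper. The following is a minimal sketch, assuming the two-line layout from the example (URL, then task); `add_prompt` and `PROMPTS_DIR` are hypothetical names introduced here and are not part of the project.

```shell
# Hypothetical helper: write one prompt file into a category directory.
# Assumes the two-line layout shown above (URL on line 1, task on line 2).
PROMPTS_DIR="${PROMPTS_DIR:-data/prompts}"

add_prompt() {
  # $1 = category, $2 = file name without extension, $3 = URL, $4 = task
  mkdir -p "$PROMPTS_DIR/$1"
  printf '%s\n%s\n' "$3" "$4" > "$PROMPTS_DIR/$1/$2.txt"
}
```

For example, `add_prompt my_category about "https://example.com" "Click the About link"` would produce the same file as the echo command above, under the name about.txt.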

Monitoring

# Watch logs
tail -f collector/logs/liteagent.log

# Check progress
ls -la data/db/browseruse/test_category/

# Count completed
find data/db/browseruse/test_category/ -name "*.db" | wc -l
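
The two checks above can be combined into a single progress line. This is a sketch only: it assumes one .txt prompt file per test and one .db file per completed test, which the commands above suggest but this page does not state. `count_completed` and `progress` are hypothetical names.

```shell
# Hypothetical helpers: compare finished result databases with available prompts.
# Assumption: one *.txt prompt per test, one *.db per completed test.
count_completed() {
  # $1 = results directory, e.g. data/db/browseruse/test_category
  find "$1" -type f -name '*.db' 2>/dev/null | wc -l | tr -d ' '
}

progress() {
  # $1 = prompt directory, $2 = results directory
  total=$(find "$1" -type f -name '*.txt' 2>/dev/null | wc -l | tr -d ' ')
  finished=$(count_completed "$2")
  echo "$finished/$total completed"
}
```

Calling `progress data/prompts/test_category data/db/browseruse/test_category` would then print a line such as "N/M completed" for the current run.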

Viewing Results

# Check output
tree data/db/browseruse/test_category/test_1/

# Query database
sqlite3 data/db/browseruse/test_category/test_1/test.db \
  "SELECT COUNT(*) FROM actions;"
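
To run the same query across every test in a category, a loop along these lines may help. It is a sketch that relies only on the `actions` table named in the query above; nothing else about the schema is assumed, and `summarize_actions` is a hypothetical name.

```shell
# Hypothetical helper: print the number of recorded actions per result database.
# Requires the sqlite3 command-line tool; uses only the "actions" table from the query above.
summarize_actions() {
  # $1 = results directory, e.g. data/db/browseruse/test_category
  find "$1" -type f -name '*.db' | sort | while IFS= read -r db; do
    count=$(sqlite3 "$db" "SELECT COUNT(*) FROM actions;")
    echo "$db: $count"
  done
}
```

Each output line pairs a database path with its action count, which makes tests that recorded no actions easy to spot.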

Troubleshooting

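  • API key issues: Check that the .env file contains the required keys
  • Browser fails: Run playwright install to install the browser binaries
  • Timeouts: Raise the limit, for example with --timeout 600
  • Debug mode: Run export LOG_LEVEL=DEBUG before starting the test
  • Visual debugging: Run export HEADLESS=false before starting the test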
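
The first check in the list can be automated before a long run. The sketch below reports which keys are absent or empty in an env file; `missing_keys` is a hypothetical helper, and the key names are whatever your agents need, since this page does not list them.

```shell
# Hypothetical pre-flight check: print each key that is missing or empty in an env file.
# Only matches plain KEY=value lines; lines written as "export KEY=value" are not detected.
missing_keys() {
  # $1 = env file, remaining arguments = key names to look for
  env_file="$1"
  shift
  for key in "$@"; do
    grep -q "^${key}=." "$env_file" 2>/dev/null || echo "$key"
  done
}
```

A call such as `missing_keys .env SOME_API_KEY` (the key name is a placeholder) prints nothing when the key is set and prints the key name otherwise.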

Batch Testing

# Run multiple agents
for agent in browseruse agente multion; do
  ./run.sh "$agent" --category tests --timeout 300
done

# Parallel execution
./run.sh browseruse --category test1 &
./run.sh agente --category test2 &
wait
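
The loops above do not report which runs failed, and a bare `wait` returns success regardless of how the background jobs exited. The following sketch tracks failures in both cases. It assumes run.sh exits with a non-zero status on failure, which this page does not confirm; `RUNNER`, `run_batch`, and `run_parallel` are hypothetical names.

```shell
# Sketch: batch runs that keep track of failures.
# Assumption: run.sh exits non-zero when a run fails.
RUNNER="${RUNNER:-./run.sh}"

run_batch() {
  # $1 = category, remaining arguments = agents, run one after another
  category="$1"
  shift
  failed=""
  for agent in "$@"; do
    "$RUNNER" "$agent" --category "$category" --timeout 300 || failed="$failed $agent"
  done
  if [ -n "$failed" ]; then
    echo "failed:$failed"
    return 1
  fi
  echo "all agents finished"
}

run_parallel() {
  # $1 = category, remaining arguments = agents, started concurrently
  category="$1"
  shift
  pids=""
  for agent in "$@"; do
    "$RUNNER" "$agent" --category "$category" &
    pids="$pids $!"
  done
  rc=0
  for pid in $pids; do
    # Waiting on each PID individually preserves its exit status
    wait "$pid" || rc=1
  done
  return "$rc"
}
```

For example, `run_batch tests browseruse agente multion` continues past a failing agent and lists the failures at the end, while `run_parallel tests browseruse agente` returns non-zero if any background run failed.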

Next Steps

  • Creating Prompts - Learn to create effective test prompts
  • Docker Compose - Run tests with Docker Compose
  • Parallel Execution - Scale testing with parallel execution