Command Line Interface

Basic Usage

# Interactive mode
./run.sh

# Run specific agent
./run.sh browseruse

# Run with category
./run.sh browseruse --category my_tests

Command Structure

Main Script: `run.sh`

The `run.sh` script is the primary interface. The agent name comes first, followed by any options:

./run.sh [AGENT] [OPTIONS]

Available Agents:

browseruse - AI-powered browser automation
dobrowser - Chrome extension-based agent
multion - Multi-modal AI agent
agente - Enterprise Claude-based agent
skyvern - Computer vision agent
webarena - Research benchmark agent
visualwebarena - Visual benchmark agent
human - Manual baseline testing
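If you call `run.sh` from your own wrapper scripts, you may want to reject a misspelled agent name before anything starts. The sketch below is not part of the project. The helper name `is_known_agent` is hypothetical, and the agent list is copied by hand from the list above, so it has to be updated if agents are added.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of run.sh.
# Agent names copied by hand from the "Available Agents" list.
KNOWN_AGENTS=(browseruse dobrowser multion agente skyvern webarena visualwebarena human)

# Return 0 if $1 exactly matches one of the known agent names, 1 otherwise.
is_known_agent() {
  local candidate=$1 agent
  for agent in "${KNOWN_AGENTS[@]}"; do
    # Quoting the right-hand side forces a literal (non-glob) comparison.
    [[ $candidate == "$agent" ]] && return 0
  done
  return 1
}
```

For example, a wrapper could run `is_known_agent "$1" || { echo "unknown agent: $1" >&2; exit 1; }` before passing its arguments to `./run.sh "$@"`.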

Common Options

| Option | Description | Default |
|---|---|---|
| `--category <name>` | Prompt directory (test category) under data/prompts/ to run | Prompted interactively |
| `--timeout <seconds>` | Task timeout, in seconds | 180 |
| `--virtual` | Use a virtual display (headless) | false |
| `--verbose` | Enable verbose logging | false |
| `--real-site` | Use real sites instead of TrickyArena | false |
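To make the defaults in the table concrete, here is one way such flags could be parsed in Bash. This is an illustrative sketch, not the code inside `run.sh`. The function `parse_run_options` and the variables it sets are hypothetical, and it handles only the OPTIONS part that follows the agent name.

```shell
#!/usr/bin/env bash
# Hypothetical option parser mirroring the documented flags and defaults.
# Illustrative only; this is not the real run.sh implementation.
parse_run_options() {
  CATEGORY=""      # empty means "ask interactively"
  TIMEOUT=180      # seconds
  VIRTUAL=false
  VERBOSE=false
  REAL_SITE=false

  while (( $# > 0 )); do
    case $1 in
      --category)
        [[ -n ${2:-} ]] || { echo "--category needs a value" >&2; return 1; }
        CATEGORY=$2; shift 2 ;;
      --timeout)
        # Accept only a whole number of seconds.
        [[ ${2:-} =~ ^[0-9]+$ ]] || { echo "--timeout needs a whole number of seconds" >&2; return 1; }
        TIMEOUT=$2; shift 2 ;;
      --virtual)   VIRTUAL=true;   shift ;;
      --verbose)   VERBOSE=true;   shift ;;
      --real-site) REAL_SITE=true; shift ;;
      *) echo "unknown option: $1" >&2; return 1 ;;
    esac
  done
}
```

Rejecting a non-numeric `--timeout` in a parser like this surfaces a typo such as `--timeout 5m` before any test starts.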

Running Tests

Interactive Mode

./run.sh
# Select agent and category when prompted

Direct Agent Execution

Run specific agents directly:

# BrowserUse with timeout
./run.sh browseruse --category benchmark --timeout 300

# Agent E with verbose logging
./run.sh agente --category security_tests --verbose

# Skyvern with virtual display
./run.sh skyvern --category visual_tests --virtual

Advanced Options

# Run on real sites (not TrickyArena)
./run.sh browseruse --category web_tests --real-site

# Custom timeout and logging
./run.sh agente --timeout 600 --verbose --virtual

# Combine several options, one per line for readability
./run.sh multion \
  --category complex_tests \
  --timeout 900 \
  --virtual \
  --verbose
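When a script decides flags at runtime, collecting them in a Bash array keeps every option and value a separate word, even if a value contains spaces. The sketch below assumes, as a policy of our own rather than documented behavior, that you want `--virtual` whenever no X display is available (`DISPLAY` unset or empty). `build_run_args` is a hypothetical helper that prints one argument per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print run.sh arguments for an agent, category and
# timeout, one per line. Adds --virtual when DISPLAY is unset or empty
# (our own assumption about when a virtual display is wanted).
build_run_args() {
  local agent=$1 category=$2 timeout=${3:-180}
  local -a args=("$agent" --category "$category" --timeout "$timeout")
  if [[ -z ${DISPLAY:-} ]]; then
    args+=(--virtual)   # no X server available: ask for a virtual display
  fi
  printf '%s\n' "${args[@]}"
}
```

With Bash 4 or later, read the result into an array and pass it on: `mapfile -t args < <(build_run_args multion complex_tests 900) && ./run.sh "${args[@]}"`.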

Direct Python Execution

python main.py AGENT --site URL --task "TASK" [OPTIONS]

# Example
python main.py browseruse \
  --site "https://example.com" \
  --task "Find the About page" \
  --timeout 300
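`main.py` takes the site and the task as separate arguments, so a prompt file from a test category can be fed to it by splitting the file. A minimal sketch, assuming the layout used in the Test Categories example (URL on the first line, task on the remaining lines). `run_prompt_file` and the `PYTHON` override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run main.py on one prompt file.
# Assumed layout: line 1 = site URL, remaining lines = task text.
run_prompt_file() {
  local agent=$1 file=$2
  shift 2                        # anything left is passed through, e.g. --timeout 300
  local site task
  site=$(head -n 1 "$file")
  task=$(tail -n +2 "$file")
  if [[ -z $site || -z $task ]]; then
    echo "malformed prompt file: $file" >&2
    return 1
  fi
  # PYTHON lets you choose the interpreter; defaults to "python".
  "${PYTHON:-python}" main.py "$agent" --site "$site" --task "$task" "$@"
}
```

For example, `run_prompt_file browseruse data/prompts/my_category/test.txt --timeout 300` would run a single prompt without going through `run.sh`.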

Test Categories

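Each test category is a directory under data/prompts/ that holds one or more prompt files. In the example below, the prompt file gives the target URL on its first line and the task on its second: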

# Create category
mkdir -p data/prompts/my_category

# Add test file
echo "https://example.com
Click the About link" > data/prompts/my_category/test.txt

# Run tests
./run.sh browseruse --category my_category
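To see which values `--category` can take, you can list the subdirectories of data/prompts/. The sketch below also counts prompt files per category. It assumes prompt files end in `.txt`, as in the example above, and the helper name `list_categories` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<category> <number of .txt prompt files>"
# for each directory under the prompts root (default: data/prompts).
list_categories() {
  local root=${1:-data/prompts} dir restore
  local -a files
  restore=$(shopt -p nullglob) || true   # remember the caller's setting
  shopt -s nullglob                      # empty category -> empty array, not a literal "*.txt"
  for dir in "$root"/*/; do
    files=("$dir"*.txt)
    printf '%s %d\n' "$(basename "$dir")" "${#files[@]}"
  done
  eval "$restore"                        # put nullglob back the way it was
}
```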

Monitoring

# Watch logs
tail -f collector/logs/liteagent.log

# Check progress
ls -la data/db/browseruse/test_category/

# Count completed
find data/db/browseruse/test_category/ -name "*.db" | wc -l
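The commands above can be combined into a rough progress indicator. This sketch assumes, without confirmation from this page, that every finished prompt leaves exactly one `.db` file under data/db/AGENT/CATEGORY/ and that prompt files are `.txt` files. `category_progress` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "finished/total" for one agent and category.
# Assumes prompts are *.txt in <data>/prompts/<category>/ and that each
# finished prompt leaves one *.db file under <data>/db/<agent>/<category>/.
category_progress() {
  local agent=$1 category=$2 data=${3:-data}
  local prompts="$data/prompts/$category" results="$data/db/$agent/$category"
  local total=0 finished=0
  if [[ -d $prompts ]]; then
    total=$(find "$prompts" -maxdepth 1 -name '*.txt' | wc -l)
  fi
  if [[ -d $results ]]; then
    finished=$(find "$results" -name '*.db' | wc -l)
  fi
  # $(( )) drops the leading spaces some wc implementations print.
  printf '%d/%d\n' "$((finished))" "$((total))"
}
```

Run it in a loop such as `while sleep 30; do category_progress browseruse test_category; done` while a batch is in progress.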

Viewing Results

# Check output
tree data/db/browseruse/test_category/test_1/

# Query database
sqlite3 data/db/browseruse/test_category/test_1/test.db \
  "SELECT COUNT(*) FROM actions;"

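To compare runs, the same query can be repeated for every test in a category. A sketch, assuming each test directory contains a `test.db` with an `actions` table as in the example above. `action_counts` is a hypothetical helper, and it requires the `sqlite3` command-line shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<test dir> <rows in actions>" for every
# <results>/*/test.db file. Requires the sqlite3 CLI.
action_counts() {
  local results=$1 db count
  command -v sqlite3 >/dev/null || { echo "sqlite3 not found" >&2; return 1; }
  for db in "$results"/*/test.db; do
    [[ -f $db ]] || continue     # glob matched nothing
    count=$(sqlite3 "$db" 'SELECT COUNT(*) FROM actions;') || return 1
    printf '%s %s\n' "$(basename "$(dirname "$db")")" "$count"
  done
}
```

For example, `action_counts data/db/browseruse/test_category` prints one `test_N count` line per test directory.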
Troubleshooting

API key issues: check that the .env file contains the required API keys
Browser fails to launch: run playwright install
Timeouts: raise the limit, e.g. --timeout 600
Debug mode: run export LOG_LEVEL=DEBUG before starting tests
Visual debugging: run export HEADLESS=false to watch the browser window
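The first item can be checked before launching a long batch. The sketch below reports which variable names are missing, or have an empty value, in a .env file. This page does not list the required keys, so you supply the names. `missing_env_keys` is a hypothetical helper, and the check is a simple pattern match, not a full .env parser.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print each given key that has no
# "KEY=<non-blank value>" line (optionally prefixed by "export") in the file.
# Returns 0 if all keys are present, 1 if any is missing, 2 if no file.
missing_env_keys() {
  local env_file=$1 key
  shift
  local status=0
  if [[ ! -f $env_file ]]; then
    echo "no such file: $env_file" >&2
    return 2
  fi
  for key in "$@"; do
    if ! grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=[^[:space:]]" "$env_file"; then
      echo "$key"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_env_keys .env MY_PROVIDER_API_KEY` (a placeholder name) prints nothing and returns 0 when that key is set.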

Batch Testing

# Run multiple agents
for agent in browseruse agente multion; do
  ./run.sh "$agent" --category tests --timeout 300
done

# Parallel execution
./run.sh browseruse --category test1 &
./run.sh agente --category test2 &
wait
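In the parallel example, a bare `wait` returns 0 even if one of the runs failed, and both runs write to the same terminal. A sketch that writes each run's output to its own log file and checks each background job's exit status through its PID. `run_parallel`, the `RUN_SH` override and the log file naming are hypothetical, and it assumes `run.sh` reports failure through a non-zero exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "agent:category" pairs in parallel, one log file
# per pair, then wait on each PID. Returns 0 only if every run succeeded.
run_parallel() {
  local log_dir=$1; shift
  local -a pids=() names=()
  local pair agent category i failed=0
  mkdir -p "$log_dir"
  for pair in "$@"; do
    agent=${pair%%:*}
    category=${pair#*:}
    "${RUN_SH:-./run.sh}" "$agent" --category "$category" \
      >"$log_dir/$agent-$category.log" 2>&1 &
    pids+=("$!")
    names+=("$pair")
  done
  for i in "${!pids[@]}"; do
    # wait PID returns that job's own exit status.
    if ! wait "${pids[$i]}"; then
      echo "failed: ${names[$i]}" >&2
      failed=$((failed + 1))
    fi
  done
  (( failed == 0 ))
}
```

For example, `run_parallel logs browseruse:test1 agente:test2` mirrors the example above and returns non-zero if either run fails.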

Next Steps

Creating Prompts

Learn to create effective test prompts

Docker Compose

Running tests with Docker Compose

Parallel Execution

Scale testing with parallel execution
