Basic Usage
Command Structure
Main Script: run.sh
The run.sh script is the primary interface:
- browseruse - AI-powered browser automation
- dobrowser - Chrome extension-based
- multion - Multi-modal AI agent
- agente - Enterprise Claude-based agent
- skyvern - Computer vision agent
- webarena - Research benchmark agent
- visualwebarena - Visual benchmark agent
- human - Manual baseline testing
Common Options
Option | Description | Default |
---|---|---|
--category <name> | Prompt directory to use | Interactive selection |
--timeout <seconds> | Task timeout | 180 |
--virtual | Use virtual display (headless) | false |
--verbose | Enable verbose logging | false |
--real-site | Use real sites instead of TrickyArena | false |
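
The options in the table can be combined on a single command line. The sketch below only prints a candidate command instead of running it. It assumes the agent name is passed to run.sh as the first positional argument, which the text above does not confirm; build_cmd and my_category are hypothetical names used for illustration.

```shell
# Hypothetical helper: prints a run.sh command line without executing it.
# Assumption: the agent name is the first positional argument to run.sh.
build_cmd() {
  agent="$1"
  category="$2"
  timeout="${3:-180}"   # 180 is the documented default for --timeout
  printf './run.sh %s --category %s --timeout %s\n' "$agent" "$category" "$timeout"
}

# Print a candidate invocation for the browseruse agent with a longer timeout.
build_cmd browseruse my_category 600
```

Printing the command first makes it easy to check the flags before starting a real run. Flags without a value, such as --virtual or --verbose, could be appended in the same way.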
Running Tests
Interactive Mode
Direct Agent Execution
Run specific agents directly:
Advanced Options
Direct Python Execution
Test Categories
Each test category is a directory under data/prompts/ that contains prompt files.
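
Because categories are plain directories, the names that could be passed to --category can be listed with a short shell loop. This is a minimal sketch that assumes each category is a direct subdirectory of data/prompts/; list_categories is a hypothetical helper, not part of the project.

```shell
# Hypothetical helper: print the name of each subdirectory of a prompts root.
# Assumption: every direct subdirectory is one test category.
list_categories() {
  root="${1:-data/prompts}"
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # skip the unexpanded pattern when nothing matches
    basename "$dir"
  done
}

# List the categories in the default location (prints nothing if it is absent).
list_categories data/prompts
```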
Monitoring
Viewing Results
Troubleshooting
- API key issues: Check .env file has required keys
- Browser fails: Run playwright install
- Timeouts: Use --timeout 600 flag
- Debug mode: Set export LOG_LEVEL=DEBUG
- Visual debugging: Set export HEADLESS=false
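
The checks in the list above can be gathered into a small pre-flight script. This is a sketch under stated assumptions: has_key is a hypothetical helper, and EXAMPLE_API_KEY is a placeholder, since the list does not name the keys the agents actually require.

```shell
# Hypothetical helper: succeed if the env file defines a non-empty value for the key.
has_key() {
  file="$1"
  key="$2"
  [ -f "$file" ] && grep -Eq "^${key}=.+" "$file"
}

# EXAMPLE_API_KEY is a placeholder; substitute the key your agent needs.
if has_key .env EXAMPLE_API_KEY; then
  echo ".env: EXAMPLE_API_KEY is set"
else
  echo ".env: EXAMPLE_API_KEY is missing or empty"
fi

# Settings from the troubleshooting list: debug logging and a visible browser.
export LOG_LEVEL=DEBUG
export HEADLESS=false
```

Running the key check before a long test session surfaces a missing key immediately rather than partway through a task.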
Batch Testing
Next Steps
Creating Prompts
Learn to create effective test prompts
Docker Compose
Running tests with Docker Compose
Parallel Execution
Scale testing with parallel execution