Basic Usage
Command Structure
Main Script: run.sh
The run.sh script is the primary interface. It supports the following agents:
- browseruse - AI-powered browser automation
- dobrowser - Chrome extension-based
- multion - Multi-modal AI agent
- agente - Enterprise Claude-based agent
- skyvern - Computer vision agent
- webarena - Research benchmark agent
- visualwebarena - Visual benchmark agent
- human - Manual baseline testing
Common Options
| Option | Description | Default |
|---|---|---|
| --category <name> | Prompt directory to use | Interactive selection |
| --timeout <seconds> | Task timeout | 180 |
| --virtual | Use virtual display (headless) | false |
| --verbose | Enable verbose logging | false |
| --real-site | Use real sites instead of TrickyArena | false |
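
To show how the options in the table combine, here is a minimal sketch that assembles a run.sh command line and prints it without executing it. It assumes the agent name is passed as the first positional argument, which this page does not confirm; `build_run_cmd` and the category name `my_category` are hypothetical.

```shell
# Sketch: build a run.sh command line from the documented options.
# Assumption: the agent is the first positional argument to run.sh.
build_run_cmd() {
  agent="$1"
  category="$2"
  timeout="${3:-180}"   # 180 is the documented default timeout

  cmd="./run.sh $agent --timeout $timeout"
  # Omitting --category falls back to interactive selection, per the table.
  if [ -n "$category" ]; then
    cmd="$cmd --category $category"
  fi
  printf '%s\n' "$cmd"
}

# Print (do not run) a command for the browseruse agent with a longer timeout.
build_run_cmd browseruse my_category 600
```

Printing the command first makes it easy to check the flags before starting a long run.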
Running Tests
Interactive Mode
Direct Agent Execution
Run specific agents directly:
Advanced Options
Direct Python Execution
Test Categories
Test categories are directories in data/prompts/ with prompt files:
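
Since each category is a directory, one way to see which names are available for --category is to list the subdirectories of data/prompts/. The helper below is an illustrative sketch, not part of the project; `list_categories` is a hypothetical name.

```shell
# Sketch: print the name of each category directory under a prompts root.
# The default root (data/prompts) comes from the text above.
list_categories() {
  root="${1:-data/prompts}"
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # skip when the glob matches nothing
    basename "$dir"
  done
}

list_categories
```

Plain files in the prompts root are ignored, so only directories appear in the output.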
Monitoring
Viewing Results
Troubleshooting
- API key issues: Check that the .env file has the required keys
- Browser fails: Run playwright install
- Timeouts: Use the --timeout 600 flag
- Debug mode: Set export LOG_LEVEL=DEBUG
- Visual debugging: Set export HEADLESS=false
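
The API key check from the list above can be scripted. The sketch below reports which keys are missing from an env file. This page does not list the required key names, so `EXAMPLE_API_KEY` is a placeholder, and `missing_env_keys` is a hypothetical helper.

```shell
# Sketch: print every requested key that is not defined in the env file.
# Key names are placeholders; substitute the keys your agent needs.
missing_env_keys() {
  env_file="$1"
  shift
  for key in "$@"; do
    # A key counts as present if a line starts with KEY= or "export KEY=".
    if ! grep -Eq "^(export[[:space:]]+)?${key}=" "$env_file" 2>/dev/null; then
      printf '%s\n' "$key"
    fi
  done
}

# Prints EXAMPLE_API_KEY if .env is absent or does not define it.
missing_env_keys .env EXAMPLE_API_KEY
```

Empty output means every listed key is defined. The check does not verify that the values are valid.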
Batch Testing
Next Steps
Creating Prompts
Learn to create effective test prompts
Docker Compose
Running tests with Docker Compose
Parallel Execution
Scale testing with parallel execution
