
Overview

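LiteAgent stores all test results in a structured directory hierarchy, which makes it easier to analyze agent behavior and debug issues. Each test run produces several kinds of data: a SQLite action log, site and task descriptions, the agent's scratchpad, a screen recording, HTML snapshots, an rrweb session recording, and a Playwright trace.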

Directory Hierarchy

data/db/
└── {agent}/                    # Agent name (browseruse, agente, etc.)
    └── {category}/             # Test category (e.g., dark_patterns, benchmark)
        └── {task}_{count}/     # Individual test run
            ├── {task}.db                   # SQLite database
            ├── {task}_site.txt             # URL with dark patterns
            ├── {task}_task.txt             # Task description
            ├── {task}_commands.py          # Generated commands (optional)
            ├── scratchpad.txt              # Agent reasoning/notes
            ├── scratchpad.txt.bak         # Backup of scratchpad
            ├── video/
            │   └── {task}.mp4             # Screen recording
            ├── html/
            │   ├── initial.html           # Page at start
            │   ├── step_1.html            # After first action
            │   ├── step_2.html            # After second action
            │   └── final.html             # Final page state
            ├── rrweb/
            │   ├── {task}_rrweb_events.json      # Session replay data
            │   ├── {task}_rrweb_viewer.html      # Replay viewer
            │   └── {task}_serve_rrweb_viewer.py  # Local server script
            └── trace/
                └── {task}_trace.zip       # Playwright trace files
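
The layout above can be turned into a small path helper. This is a minimal sketch that only mirrors the naming shown in the tree; the function name `run_paths` and its arguments are hypothetical and not part of LiteAgent.

```python
from pathlib import Path


def run_paths(root, agent, category, task, count):
    """Build the expected artifact paths for one run (hypothetical helper).

    Follows the {agent}/{category}/{task}_{count}/ layout from the tree above.
    """
    run_dir = Path(root) / agent / category / f"{task}_{count}"
    return {
        "run_dir": run_dir,
        "database": run_dir / f"{task}.db",
        "site": run_dir / f"{task}_site.txt",
        "task": run_dir / f"{task}_task.txt",
        "scratchpad": run_dir / "scratchpad.txt",
        "video": run_dir / "video" / f"{task}.mp4",
        "html_dir": run_dir / "html",
        "rrweb_events": run_dir / "rrweb" / f"{task}_rrweb_events.json",
        "trace": run_dir / "trace" / f"{task}_trace.zip",
    }
```

For example, `run_paths("data/db", "browseruse", "benchmark", "laptop_search", 1)["database"]` points at `data/db/browseruse/benchmark/laptop_search_1/laptop_search.db`.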

File Types Explained

Core Files

  • Database File
  • Site & Task Files
  • Scratchpad
File: {task}.db
Format: SQLite database
Purpose: Complete interaction log
-- Main actions table
SELECT * FROM actions LIMIT 5;

-- Metadata table
SELECT * FROM metadata;
Contains every click, type, scroll, and navigation event with timestamps and element information.
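
Because the column layout of the `actions` and `metadata` tables is not listed here, it can help to inspect the schema before writing queries. The following is a sketch using only Python's standard `sqlite3` module; the helper name `describe_tables` is hypothetical.

```python
import sqlite3
from contextlib import closing


def describe_tables(db_path):
    """Return {table_name: [column names]} for every table in the database."""
    with closing(sqlite3.connect(db_path)) as conn:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
        return {
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
```

Calling `describe_tables("laptop_search.db")` on a run's database shows which columns are available before you build more specific queries.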

Media Files

  • Video Recording
  • HTML Snapshots
  • rrweb Recording
Directory: video/
File: {task}.mp4
Purpose: Visual playback of the entire test

High-quality screen recording showing exactly what the agent saw and did. Useful for:
  • Debugging failed tests
  • Demonstrating agent behavior
  • Understanding UI interactions

Debug Files

  • Trace Files
  • Commands
Directory: trace/
File: {task}_trace.zip
Purpose: Playwright debugging

Contains detailed browser traces for debugging:
  • Network requests
  • JavaScript execution
  • Performance metrics
  • Error details
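
Since the trace is a zip archive, its contents can be listed without opening a viewer. The sketch below uses Python's standard `zipfile` module; `summarize_trace` is a hypothetical helper, and the names of the entries inside a real trace depend on the Playwright version.

```python
import zipfile


def summarize_trace(trace_path, top=5):
    """Return (entry count, total uncompressed bytes, names of largest entries)."""
    with zipfile.ZipFile(trace_path) as archive:
        entries = archive.infolist()
    total_bytes = sum(entry.file_size for entry in entries)
    largest = sorted(entries, key=lambda entry: entry.file_size, reverse=True)[:top]
    return len(entries), total_bytes, [entry.filename for entry in largest]
```

This gives a quick check that a trace was written and shows which entries account for most of its size.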

Example Output Structure

Successful Test Run

data/db/browseruse/benchmark/laptop_search_1/
├── laptop_search.db              # 47 actions recorded
├── laptop_search_site.txt        # agenttrickydps.vercel.app/shop
├── laptop_search_task.txt        # Find and add gaming laptop to cart
├── scratchpad.txt               # Agent reasoning (2.3KB)
├── video/
│   └── laptop_search.mp4        # 2:34 duration, 1920x1080
├── html/
│   ├── initial.html             # Homepage HTML
│   ├── step_1.html              # After search
│   ├── step_2.html              # Product listing
│   ├── step_3.html              # Product details
│   └── final.html               # Cart page
├── rrweb/
│   ├── laptop_search_rrweb_events.json    # 1,247 events
│   ├── laptop_search_rrweb_viewer.html    # Replay interface
│   └── laptop_search_serve_rrweb_viewer.py
└── trace/
    └── laptop_search_trace.zip   # Playwright trace (15MB)
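
A plain alphabetical listing puts `step_10.html` before `step_2.html`, so reading snapshots in the order they were taken needs a numeric sort. This is a sketch assuming the `initial.html`, `step_N.html`, `final.html` naming shown above; `ordered_snapshots` is a hypothetical helper.

```python
import re
from pathlib import Path


def ordered_snapshots(html_dir):
    """Return snapshot paths as: initial.html, step_1..step_N, then the rest."""

    def sort_key(path):
        if path.name == "initial.html":
            return (0, 0, path.name)
        match = re.fullmatch(r"step_(\d+)\.html", path.name)
        if match:
            return (1, int(match.group(1)), path.name)
        # final.html, error_state.html, or any other snapshot sorts last
        return (2, 0, path.name)

    return sorted(Path(html_dir).glob("*.html"), key=sort_key)
```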

Failed Test Run

data/db/agente/dark_patterns/hidden_costs_3/
├── hidden_costs.db              # 23 actions, stopped early
├── hidden_costs_site.txt        # URL with dp=hc parameter
├── hidden_costs_task.txt        # Complete purchase task
├── scratchpad.txt              # Shows where agent got confused
├── video/
│   └── hidden_costs.mp4        # Shows point of failure
├── html/
│   ├── initial.html
│   ├── step_1.html
│   └── error_state.html        # Page when agent failed
└── rrweb/
    └── [rrweb files...]        # Can replay up to failure point
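
A failed run may leave some artifacts unwritten, as in the example above, which has no trace/ directory. The sketch below reports which of the expected files are absent; `missing_artifacts` is a hypothetical helper, and it assumes the directory name is `{task}_{count}` so that the task name can be recovered by dropping the final `_{count}` suffix.

```python
from pathlib import Path


def missing_artifacts(run_dir):
    """List expected artifact paths (relative to run_dir) that do not exist."""
    run_dir = Path(run_dir)
    # Assumption: directory is named {task}_{count}; strip the trailing count
    task = run_dir.name.rsplit("_", 1)[0]
    expected = [
        f"{task}.db",
        f"{task}_site.txt",
        f"{task}_task.txt",
        "scratchpad.txt",
        f"video/{task}.mp4",
        f"rrweb/{task}_rrweb_events.json",
        f"trace/{task}_trace.zip",
    ]
    return [rel for rel in expected if not (run_dir / rel).exists()]
```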

File Size Guidelines

Typical File Sizes

File Type        Small Test     Medium Test    Large Test
Database         50-200 KB      200-500 KB     500 KB-2 MB
Video            5-15 MB        15-50 MB       50-200 MB
HTML snapshots   100-500 KB     500 KB-2 MB    2-10 MB
rrweb events     200 KB-1 MB    1-5 MB         5-20 MB
Trace files      5-50 MB        50-200 MB      200 MB-1 GB
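
To compare a run against these ranges, you can total the bytes on disk per file extension. This is a minimal sketch using the standard library; `size_by_type` is a hypothetical helper and groups by extension (for example `.db`, `.mp4`, `.html`) rather than by the table's row labels.

```python
from collections import defaultdict
from pathlib import Path


def size_by_type(run_dir):
    """Return {file extension: total size in bytes} for one run directory."""
    totals = defaultdict(int)
    for path in Path(run_dir).rglob("*"):
        if path.is_file():
            totals[path.suffix or "(none)"] += path.stat().st_size
    return dict(totals)
```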

Storage Considerations

# Check directory sizes
du -sh data/db/*/

# Clean old test data
find data/db/ -name "*.mp4" -mtime +30 -delete  # Videos older than 30 days
find data/db/ -name "*_trace.zip" -mtime +7 -delete  # Traces older than 7 days
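
Because `-delete` is irreversible, it can be worth previewing which files would be removed first. The sketch below is a dry run in Python; `stale_files` is a hypothetical helper, and its cutoff is an approximation of `find -mtime`, which rounds file ages down to whole days.

```python
import time
from pathlib import Path


def stale_files(root, pattern, max_age_days, now=None):
    """List files under root matching pattern whose mtime is older than the cutoff.

    Nothing is deleted; the caller decides what to do with the result.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        path
        for path in Path(root).rglob(pattern)
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

For example, `stale_files("data/db", "*.mp4", 30)` lists the videos that the first `find` command would target.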

Accessing the Data

Quick File Access

# Find the most recent test run in a category
cd data/db/browseruse/test_category/
latest=$(ls -td -- */ | head -1)  # Most recently modified run directory

# Count the actions recorded in that run's database
sqlite3 "$(find "$latest" -name "*.db" | head -1)" "SELECT COUNT(*) FROM actions;"

# Play that run's video
vlc "$(find "$latest" -name "*.mp4" | head -1)"

Programmatic Access

import sqlite3
import glob
import os

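def get_latest_test(agent, category):
    # Each run stores its database as {task}.db, so match any .db file
    pattern = f"data/db/{agent}/{category}/*/*.db"
    db_files = glob.glob(pattern)

    if not db_files:
        return None

    # Pick the run whose database was modified most recently
    latest_db = max(db_files, key=os.path.getmtime)
    return os.path.dirname(latest_db)

def analyze_test_result(test_dir):
    # Locate the run's {task}.db inside the test directory
    db_files = glob.glob(os.path.join(test_dir, "*.db"))
    if not db_files:
        raise FileNotFoundError(f"No .db file found in {test_dir}")
    db_path = db_files[0]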

    with sqlite3.connect(db_path) as conn:
        # Get basic stats
        cursor = conn.execute("SELECT COUNT(*) FROM actions")
        action_count = cursor.fetchone()[0]

        # Get success status
        cursor = conn.execute(
            "SELECT value FROM metadata WHERE key='success'"
        )
        result = cursor.fetchone()
        success = result[0] if result else "unknown"

    return {
        "directory": test_dir,
        "actions": action_count,
        "success": success
    }
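
The same queries can be applied to every run in a category to get an overview. This is a self-contained sketch that relies only on the `actions` table and the `success` key in `metadata` used above; `summarize_category` and its `root` parameter are hypothetical.

```python
import glob
import os
import sqlite3
from contextlib import closing


def summarize_category(agent, category, root="data/db"):
    """Return one summary dict per run found under {root}/{agent}/{category}/."""
    rows = []
    pattern = os.path.join(root, agent, category, "*", "*.db")
    for db_path in sorted(glob.glob(pattern)):
        with closing(sqlite3.connect(db_path)) as conn:
            actions = conn.execute("SELECT COUNT(*) FROM actions").fetchone()[0]
            result = conn.execute(
                "SELECT value FROM metadata WHERE key='success'"
            ).fetchone()
        rows.append({
            "run": os.path.basename(os.path.dirname(db_path)),
            "actions": actions,
            "success": result[0] if result else "unknown",
        })
    return rows
```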

Output Management

Cleanup Scripts

#!/bin/bash
# cleanup_old_tests.sh

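# Remove test run directories not modified in over 30 days
# (-prune stops find from descending into a directory it has just deleted)
find data/db/ -type d -name "*_[0-9]*" -mtime +30 -prune -exec rm -rf {} \;

# Compress large trace files (zip archives are already compressed, so savings
# may be small; gunzip the file before opening the trace again)
find data/db/ -name "*_trace.zip" -size +100M -exec gzip {} \;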

# Archive successful tests older than 7 days
find data/db/ -name "*.db" -mtime +7 -exec python scripts/archive_test.py {} \;

Backup Strategy

# Daily backup without videos and trace files
tar -czf backups/liteagent_db_$(date +%Y%m%d).tar.gz \
    --exclude="*.mp4" \
    --exclude="*_trace.zip" \
    data/db/

# Weekly full backup
tar -czf backups/liteagent_full_$(date +%Y%m%d).tar.gz data/db/
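
The daily backup can also be written in Python when the `tar` exclude flags are not available. The sketch below uses the standard `tarfile` module and skips the same two file patterns; `backup_without_media` is a hypothetical helper.

```python
import tarfile


def backup_without_media(source_dir, archive_path):
    """Write a gzip-compressed tar of source_dir, skipping videos and traces."""

    def skip_media(info):
        # Returning None from a tarfile filter leaves the entry out
        if info.name.endswith(".mp4") or info.name.endswith("_trace.zip"):
            return None
        return info

    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, filter=skip_media)
```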

Parallel Test Organization

When running parallel tests, each gets a unique directory:
data/db/browseruse/parallel_test/
├── task_1/                     # First parallel instance
├── task_2/                     # Second parallel instance
├── task_3/                     # Third parallel instance
└── task_4/                     # Fourth parallel instance
Each directory follows the same structure, allowing independent analysis of each test run.
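
To process parallel runs in order, the `{task}_{count}` directory names can be parsed so that `task_10` sorts after `task_2`. This is a sketch; `list_runs` is a hypothetical helper, and it treats the last underscore-separated number as the count.

```python
import re
from pathlib import Path

# Matches {task}_{count}, where count is the trailing run number
RUN_DIR = re.compile(r"^(?P<task>.+)_(?P<count>\d+)$")


def list_runs(category_dir):
    """Return (task, count, path) tuples sorted by task name, then count."""
    runs = []
    for entry in Path(category_dir).iterdir():
        match = RUN_DIR.match(entry.name)
        if entry.is_dir() and match:
            runs.append((match["task"], int(match["count"]), entry))
    return sorted(runs, key=lambda run: (run[0], run[1]))
```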

Integration with Analysis Tools

Database Tools

# SQLite Browser (GUI)
sqlitebrowser data/db/browseruse/test/task_1/task.db

# Command line analysis
sqlite3 data/db/browseruse/test/task_1/task.db < analysis_queries.sql

Video Analysis

# Extract frames at intervals
ffmpeg -i data/db/browseruse/test/task_1/video/task.mp4 \
       -vf fps=1 frames/frame_%04d.png

# Create thumbnail
ffmpeg -i data/db/browseruse/test/task_1/video/task.mp4 \
       -ss 00:00:30 -vframes 1 thumbnail.png

rrweb Replay

# Start local server for replay
cd data/db/browseruse/test/task_1/rrweb/
python task_serve_rrweb_viewer.py

# Open http://localhost:8000 in browser
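
The events file can also be inspected without the viewer. The sketch below assumes `{task}_rrweb_events.json` holds a JSON array of rrweb event objects, each with a numeric `type` and a millisecond `timestamp`; that file layout is an assumption rather than something documented here, and `summarize_rrweb` is a hypothetical helper. The type names follow rrweb's `EventType` enum.

```python
import json
from collections import Counter

# Numeric event types as defined by rrweb's EventType enum
RRWEB_EVENT_TYPES = {
    0: "DomContentLoaded",
    1: "Load",
    2: "FullSnapshot",
    3: "IncrementalSnapshot",
    4: "Meta",
    5: "Custom",
    6: "Plugin",
}


def summarize_rrweb(events_path):
    """Count events per type and compute the recording span in milliseconds."""
    with open(events_path, encoding="utf-8") as handle:
        events = json.load(handle)  # assumption: a JSON array of event objects
    counts = Counter(
        RRWEB_EVENT_TYPES.get(event.get("type"), "Unknown") for event in events
    )
    timestamps = [event["timestamp"] for event in events if "timestamp" in event]
    duration_ms = max(timestamps) - min(timestamps) if timestamps else 0
    return {"events": len(events), "by_type": dict(counts), "duration_ms": duration_ms}
```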

Next Steps

Database Schema

Detailed database structure and queries

Video Analysis

Working with video recordings

Trace Debugging

Using Playwright traces for debugging