# Phase 5: Event Tracking, Business Dashboards, Per-User Stats

## Summary

Add event tracking infrastructure, revenue/conversion/retention dashboards, and per-user usage stats in the account portal. Three layers that build on each other.

## Layer 1: Events Table

Append-only `events` table in accounts-db (new migration in web-api):

```sql
CREATE TABLE events (
    id         BIGSERIAL PRIMARY KEY,
    user_sub   TEXT NOT NULL REFERENCES users(keycloak_sub),
    event_type TEXT NOT NULL,
    metadata   JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_events_type_created ON events (event_type, created_at);
CREATE INDEX idx_events_user ON events (user_sub, created_at);
```

### Event types

| event_type       | Where recorded                   | metadata                        |
|------------------|----------------------------------|---------------------------------|
| `signup`         | auth callback (first login)      | `{}`                            |
| `login`          | auth callback (every login)      | `{}`                            |
| `bouncer_create` | bouncer creation route           | `{"bouncer": ""}`               |
| `bouncer_delete` | bouncer deletion route           | `{"bouncer": ""}`               |
| `plan_upgrade`   | billing webhook                  | `{"from": "free", "to": "pro"}` |
| `plan_downgrade` | billing webhook                  | `{"from": "pro", "to": "free"}` |
| `paste_create`   | txt service (internal API call)  | `{"paste_id": ""}`              |
| `pic_upload`     | pics service (internal API call) | `{"pic_id": ""}`                |

Recording is inline at each call site; there is no async queue. A helper function `record_event(pool, user_sub, event_type, metadata)` keeps it simple.

For `paste_create` and `pic_upload`: txt/pics write to their own DBs, and web-api does not record these. They are tracked via Prometheus counters on the respective services, and per-user counts come from the internal usage API (Layer 3).
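The shape of the `record_event` helper mentioned above can be sketched dependency-free. Everything here is illustrative: the `Event` struct is invented for the example, and the statement plus its binds are rendered as a string rather than executed, since the real helper would run this through the web-api's Postgres pool.

```rust
// Illustrative sketch only: the real record_event would execute this
// statement against the accounts-db pool instead of rendering it.
struct Event<'a> {
    user_sub: &'a str,
    event_type: &'a str,
    metadata: &'a str, // JSON text, e.g. {"from": "free", "to": "pro"}
}

const INSERT_EVENT_SQL: &str =
    "INSERT INTO events (user_sub, event_type, metadata) VALUES ($1, $2, $3)";

fn record_event(ev: &Event) -> String {
    // Stand-in for the parameterized execute: statement plus its binds.
    format!(
        "{} -- binds: [{}, {}, {}]",
        INSERT_EVENT_SQL, ev.user_sub, ev.event_type, ev.metadata
    )
}

fn main() {
    let ev = Event {
        user_sub: "kc-sub-123",
        event_type: "plan_upgrade",
        metadata: r#"{"from": "free", "to": "pro"}"#,
    };
    println!("{}", record_event(&ev));
}
```

Keeping the statement a single parameterized insert is what makes inline recording at each call site cheap enough to skip an async queue.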
## Layer 2: Business Dashboards (Grafana)

### New Prometheus metrics

**Stripe MRR (300s background task in web-api):**

```
irc_now_mrr_cents                  gauge -- total MRR in cents
irc_now_subscriptions_active       gauge -- active subscription count
irc_now_subscriptions_canceled_30d gauge -- cancellations in last 30 days
```

Implementation: `stripe::Subscription::list()` with `status=active` to sum `plan.amount` for MRR. Canceled count via `status=canceled` with `created > now - 30d`.

**Event-derived metrics (60s background task in web-api):**

```
irc_now_signups_7d          gauge   -- signups in last 7 days
irc_now_logins_total        counter -- incremented on each login (already referenced in dashboard)
irc_now_conversion_rate_30d gauge   -- plan_upgrade events / signup events over 30 days
irc_now_churn_rate_30d      gauge   -- plan_downgrade events / active pro users over 30 days
```

Retention cohorts: monthly signup cohorts with the percentage still active (has `downstream_interacted_at` in soju within the last 30 days). Exposed as a gauge with a `{cohort="2026-01"}`-style label:

```
irc_now_retention_cohort{cohort} gauge -- % of cohort still active
```

**Fix active users:** replace the `last_login_at`-based `irc_now_active_users_24h` with the soju `downstream_interacted_at > now() - 24h` query. This is already computed in `record_soju_metrics` as `irc_now_bouncer_active_users`. Either rename it, or keep both and update the dashboards to use the soju-based one.

### New Grafana panels

Admin dashboard additions:

- MRR (stat panel, formatted as currency)
- Active subscriptions (stat)
- Signups this week (stat)
- Conversion rate 30d (stat, percentage)
- Churn rate 30d (stat, percentage)
- Retention by cohort (bar chart)
- Messages sent total (timeseries, already have the data)

## Layer 3: Per-User Portal Stats

### Internal usage APIs

**txt service** -- new endpoint `GET /api/usage/{sub}`:

```json
{
  "paste_count": 12,
  "storage_bytes": 45678
}
```

Requires an authenticated internal call (service-to-service).
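As a sketch of the payload above: the `Usage` struct and its hand-rolled serialization here are illustrative only, kept dependency-free for the example; a real handler would derive `serde::Serialize` instead.

```rust
// Illustrative sketch of the txt usage response; field names mirror the
// JSON example above.
struct Usage {
    paste_count: u64,
    storage_bytes: u64,
}

fn to_json(u: &Usage) -> String {
    // Hand-rolled serialization to keep the sketch self-contained.
    format!(
        r#"{{"paste_count": {}, "storage_bytes": {}}}"#,
        u.paste_count, u.storage_bytes
    )
}

fn main() {
    let u = Usage { paste_count: 12, storage_bytes: 45678 };
    println!("{}", to_json(&u));
}
```

The pics endpoint follows the same two-field shape, so both services can share one response convention.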
Use a shared secret, or just restrict by network policy (cluster-internal only, no Route).

**pics service** -- new endpoint `GET /api/usage/{sub}`:

```json
{
  "image_count": 8,
  "storage_bytes": 234567
}
```

Same pattern as txt.

### Dashboard page expansion

The dashboard template gets new data:

```rust
struct DashboardTemplate {
    page: Page,
    has_bouncer: bool,
    is_pro: bool,
    plan: String,
    // new fields:
    networks: Vec<Network>,           // name, address, enabled
    channels: Vec<String>,            // channel names
    messages_relayed: i64,            // total from soju "Message" table
    bouncer_uptime: Option<Duration>, // pod age from K8s
    paste_count: i64,
    paste_storage: i64, // bytes
    pic_count: i64,
    pic_storage: i64,   // bytes
}
```

(`Network` and `Duration` stand in for whatever concrete types the implementation defines.)

Data sources on page load:

- Networks/channels/messages: query the user's bouncer soju DB (existing pattern)
- Uptime: K8s pod creation timestamp for the bouncer deployment
- Paste/pic stats: HTTP calls to the txt/pics internal usage APIs

### Dashboard UI

Stats displayed in a grid of cards:

- Bouncer: network count, channel count, messages relayed, uptime
- Content: paste count, image count, total storage used

## Configuration changes

web-api needs new env vars:

- `TXT_INTERNAL_URL` -- e.g. `http://txt:8080`
- `PIC_INTERNAL_URL` -- e.g. `http://pics:8080`

These are cluster-internal service URLs; no auth is needed if they are restricted by NetworkPolicy or simply trusted within the namespace.

## Migration sequence

1. Apply the events table migration to accounts-db
2. Deploy txt/pics with the new `/api/usage/{sub}` endpoints
3. Deploy web-api with event recording, Stripe metrics, and the dashboard expansion
4. Update Grafana dashboards with the new panels

## Retention / cleanup

Events table: no automatic cleanup initially. At the current scale (a handful of users), growth is negligible. Add a retention policy (delete events older than 1 year) once the table exceeds ~1M rows.
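One small piece the dashboard cards above need is presenting the raw `storage_bytes` totals from the usage APIs in human-readable form. A dependency-free sketch; the helper name and unit thresholds are illustrative, not part of the plan:

```rust
// Illustrative helper: renders a byte count for the "total storage used"
// card, stepping through binary units until the value drops below 1024.
fn human_bytes(bytes: i64) -> String {
    const UNITS: [&str; 4] = ["B", "KiB", "MiB", "GiB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{} {}", bytes, UNITS[unit])
    } else {
        format!("{:.1} {}", value, UNITS[unit])
    }
}

fn main() {
    // paste_storage + pic_storage, using the example figures above
    let total: i64 = 45_678 + 234_567;
    println!("total storage: {}", human_bytes(total));
}
```

Summing the two services' `storage_bytes` in web-api keeps the card a single number even as more content services are added.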