# Critical Code Review: irc-now Full security and correctness review of the irc-now workspace. Findings organized by severity with specific file locations and fix guidance. --- ## Critical ### 1. Unauthenticated internal admin endpoints in txt and pics **Files:** - `crates/txt/src/main.rs:95-97` - `crates/pics/src/main.rs:142-144` - `crates/txt/src/routes/paste.rs:305-357` - `crates/pics/src/routes/image.rs:441-510` The `/api/internal/pastes/{id}/hide`, `/api/internal/pastes/{id}/unhide`, `/api/internal/pastes/{id}` (DELETE), and equivalent image routes are on the public router with zero authentication. Any unauthenticated client can hide, unhide, or permanently delete any paste or image by ID. **Fix:** Require a pre-shared secret header (`X-Internal-Token` matched against an env var) in a middleware or directly in handlers. --- ### 2. Stripe webhook HMAC comparison is not timing-safe **File:** `crates/web-api/src/stripe_util.rs:47` ```rust computed == expected_sig ``` Normal string equality short-circuits on the first mismatched byte. Exploitable as a timing side-channel to brute-force the HMAC signature and forge webhook events (plan upgrades/downgrades). **Fix:** Use `mac.verify_slice()` from the `hmac` crate: ```rust let expected_bytes = hex::decode(expected_sig)?; mac.verify_slice(&expected_bytes).is_ok() ``` --- ### 3. Lua sandbox limits silently not applied **File:** `crates/bot/src/lua/sandbox.rs:9,11` ```rust let _ = lua.set_memory_limit(MEMORY_LIMIT); let _ = lua.set_hook(...); ``` Both return values are suppressed. If the Lua build doesn't support the allocator hook, the memory limit silently doesn't apply -- a Lua script can allocate unbounded memory, causing OOM. Same for the instruction limit. **Fix:** Propagate errors: `lua.set_memory_limit(MEMORY_LIMIT)?;` and `lua.set_hook(...)?;` -- the function already returns `LuaResult`. --- ### 4. Config injection in soju.conf via CRD string fields **File:** `crates/soju-operator/src/resources/configmap.rs:29,32,36,51` `hostname`, `hostname_cloak`, `title`, and `auth oauth2` URL are formatted directly into soju.conf with no sanitization. A newline in any field injects arbitrary config directives (e.g., overriding `auth internal` to bypass OAuth2). **Fix:** Validate fields at admission time (CRD validation pattern regex) or reject values containing whitespace/newlines in the operator before writing config. --- ### 5. Ergo config injection via string-formatted YAML **File:** `crates/ergo-operator/src/resources/configmap.rs:50,57-59,119,132,140-141` The ergo YAML config is built by string formatting, not YAML serialization. `netname`, `max_sendq`, `oper_hash`, `oauth2_secret`, `vhost_regexp` -- all user-influenced values -- can break YAML structure with `"`, `\n`, or `'` characters. **Fix:** Use `serde_yaml` to construct and serialize the config struct instead of building YAML by hand. --- ### 6. `/metrics` endpoint publicly accessible with no auth **File:** `crates/web-api/src/main.rs:118` Prometheus metrics endpoint exposes MRR, subscription counts, user counts by plan, conversion/churn rates, per-bouncer stats to anyone who can reach `my.irc.now/metrics`. **Fix:** Bind metrics on a separate internal-only listener (e.g., `0.0.0.0:9090`), matching the pattern txt and pics already use. --- ### 7. IRC command injection in ergo_admin **File:** `crates/web-api/src/ergo_admin.rs:98,131` ```rust format!("NS SAREGISTER {username} {password}\r\n") ``` If `username` contains a space or `\r\n`, it splits/injects IRC commands executed as an IRC operator. **Fix:** Validate `username` matches `^[a-zA-Z0-9_-]{1,50}$` before use. --- ### 8. Image decompression bomb risk **File:** `crates/pics/src/routes/image.rs:202-210` 10 MB file size check exists, but `image::load_from_memory` has no decoded-size limit. A 1 MB PNG can expand to gigabytes in memory. The image is also decoded twice (once for dimensions, once for thumbnail). **Fix:** After getting dimensions, reject images where `width * height * 4 > MAX_DECODED_BYTES` (e.g., 100 MB). Combine the two decode passes. --- ## Important ### 9. IDOR: cross-user script deletion in bot **File:** `crates/bot/src/routes/script.rs:139-157` The `delete` handler verifies bot ownership but calls `db::delete_script(&state.db, &script_id)` without verifying the script belongs to that bot. An attacker can delete any script by brute-forcing nanoid(8) IDs. The `update` handler correctly checks `existing.bot_id != bot_id` -- the delete handler is missing this check. **Fix:** Add `AND bot_id = $2` to the DELETE query, or check `script.bot_id == bot_id` before deleting. --- ### 10. Unauthenticated internal APIs in bot service **File:** `crates/bot/src/main.rs:121` `/api/usage/{user_id}`, `/api/bots?user_id=` are publicly reachable with no auth. Same pattern in txt and pics `/api/usage/{sub}`. **Fix:** Require pre-shared secret header on internal API endpoints, or bind on a separate internal port. --- ### 11. Per-request soju Postgres connections never pooled **Files:** - `crates/web-api/src/routes/bouncer.rs:294-299` - `crates/web-api/src/routes/dashboard.rs:188-193` - `crates/web-api/src/routes/migrate.rs:184-191` - `crates/web-api/src/bouncer_watcher.rs:88-96` - `crates/web-api/src/business_metrics.rs:168-180` Each call opens a new `tokio_postgres::connect()` and spawns a detached connection task. Under load this creates unbounded connection growth against soju PostgreSQL instances. **Fix:** Create a per-bouncer connection pool with idle timeout, or at minimum abort the connection task when the client drops. --- ### 12. Ergo-operator has no finalizer -- PVC orphaned on deletion **File:** `crates/ergo-operator/src/controller.rs:36-44` No `finalizer()` call. PVCs persist indefinitely after ErgoNetwork deletion, consuming storage and leaking user data. **Fix:** Add `kube::runtime::finalizer` (matching the soju-operator pattern) with a cleanup handler that deletes the PVC. --- ### 13. Ergo PVC empty selector prevents dynamic provisioning **File:** `crates/ergo-operator/src/resources/pvc.rs:39` ```rust selector: Some(LabelSelector::default()), ``` An empty `selector` prevents dynamic provisioning on many provisioners. PVC will stay Pending forever. **Fix:** Remove line 39. The platform-operator's `build_minio_pvc` correctly omits `selector`. --- ### 14. Lua sandbox: `load` and `coroutine` globals not removed **File:** `crates/bot/src/lua/sandbox.rs:30-38` `load` allows compiling arbitrary Lua chunks. `coroutine.wrap`/`coroutine.yield` can bypass the instruction-count hook by yielding before the counter fires. **Fix:** Also remove `load`, `coroutine`, and `rawset`/`rawget` globals. --- ### 15. Soju DB tenant URI: `sslmode=disable` and password in ConfigMap **Files:** - `crates/soju-operator/src/db.rs:47` - `crates/soju-operator/src/resources/configmap.rs:47` All traffic between soju pods and PostgreSQL is unencrypted. The full DB URI including plaintext password is in a ConfigMap (not a Secret). OAuth2 client secret is also embedded in the config URL. **Fix:** Use `sslmode=require` at minimum. Move the DB URI to a Secret and mount it as a volume, or use env var injection. --- ### 16. No IRC line length limit in bot **File:** `crates/bot/src/manager.rs:199` `read_line` appends until `\n` or EOF with no size limit. A malicious IRC server can exhaust memory. **Fix:** Use `take(8192)` or check line length and disconnect on oversize. --- ### 17. `reqwest::Client::new()` created per-bot-reconnect **File:** `crates/bot/src/manager.rs:158` Each reconnect creates a new connection pool. Old pools leak via in-flight tokio tasks holding clones. **Fix:** Create one `reqwest::Client` at startup and share via `Arc`. --- ### 18. User-supplied filename not sanitized in S3 key (pics) **File:** `crates/pics/src/routes/image.rs:221-222` `filename` from multipart `file_name()` is used directly in the S3 key. Path traversal sequences like `../../` may be normalized differently by S3-compatible implementations. **Fix:** Sanitize to `[a-zA-Z0-9._-]` only. --- ### 19. TOCTOU race on storage/paste count limits **Files:** - `crates/pics/src/routes/image.rs:117-200` - `crates/txt/src/routes/paste.rs` (equivalent pattern) Storage query and file upload are not atomic. Concurrent uploads all see the same `used_bytes` and pass the check. **Fix:** Use `SELECT ... FOR UPDATE` or an advisory lock per user. --- ### 20. User-supplied path segments not URL-encoded in bot API calls **File:** `crates/web-api/src/routes/bot.rs:739,769` `form.key` from user input is interpolated into request URLs without percent-encoding. A key like `foo/bar` navigates to a different path on the bot service. **Fix:** Apply `urlencoding::encode()` at these call sites. --- ### 21. Stripe webhook handler has no idempotency guard **File:** `crates/web-api/src/routes/billing.rs:155-159` Stripe retries webhooks. Each replay fires another `plan_upgrade` event, skewing analytics. **Fix:** Add `WHERE plan != 'pro'` to the UPDATE and check `rows_affected()` before recording the event. --- ### 22. Ergo-operator missing config-hash annotation **File:** `crates/ergo-operator/src/resources/deployment.rs:158-160` No config hash in pod template annotations. Config changes (oper password rotation, OAuth2 secret change) don't trigger pod rollout. Pods run stale config until manually restarted. **Fix:** Add `irc.now/config-hash` annotation matching the soju-operator pattern. --- ### 23. Keycloak suspend creates split state **File:** `crates/web-api/src/routes/admin.rs:551-553` Suspension sets `suspended_at` in DB regardless of whether Keycloak disable succeeded. User appears suspended in portal but can still authenticate to soju, gamja, ergo via Keycloak. **Fix:** Fail the entire operation if Keycloak disable fails, or document the split-state risk. --- ### 24. `DefaultHasher` used for config hash is not stable across Rust versions **File:** `crates/soju-operator/src/controller.rs:143-149` `DefaultHasher` output can change between Rust releases, causing spurious Deployment rolling restarts on operator upgrade. **Fix:** Use SHA-256 via the `sha2` crate. --- ### 25. `provision_tenant_db` TOCTOU race on role creation **File:** `crates/soju-operator/src/db.rs:66-89` Separate `SELECT EXISTS` + `CREATE ROLE` can race between concurrent reconcile loops. **Fix:** Use `CREATE ROLE IF NOT EXISTS` (PostgreSQL 9.6+). --- ## Medium ### 26. `update_status` always overwrites `last_transition_time` **Files:** All operator controller.rs files Every 300-second reconcile patches status with a new timestamp even when nothing changed. Generates unnecessary API server load and audit log noise. **Fix:** Only update `lastTransitionTime` when the condition's `status` field actually transitions. --- ### 27. `unwrap()` on header value in htmx redirect **File:** `crates/web-api/src/htmx.rs:10` `HeaderValue::from_str(url).unwrap()` panics on non-ASCII characters. **Fix:** `.unwrap_or_else(|_| HeaderValue::from_static("/"))` --- ### 28. `is_signup` heuristic based on 5-second time window **File:** `crates/web-api/src/routes/auth.rs:165` Distinguishes signup from login by checking if `created_at` is within 5 seconds of now. Unreliable under load. **Fix:** Check `xmax = 0` from the `RETURNING` clause, or use separate INSERT/UPDATE queries. --- ### 29. `sanitize_bouncer_name` has no max length **File:** `crates/web-api/src/k8s.rs:5-13` No length limit. A 500-char input exceeds Kubernetes DNS label limits (63 chars). **Fix:** `.chars().take(63).collect()` and re-trim trailing hyphens. --- ### 30. Session deserialization errors silently swallowed **Files:** All `auth_guard.rs` files `session.get().await.unwrap_or(None)` converts deserialization errors to `None`, producing silent auth failures. **Fix:** Log the error before returning None. --- ### 31. Auth login handler panics on session insert failure **Files:** All `routes/auth.rs` files `.expect("session insert failed")` in handler code. Panics become 500s but with noisy stack traces. **Fix:** Map to a clean error/redirect. --- ### 32. `minio/minio:latest` and `ergo:stable` are mutable tags **Files:** - `crates/platform-operator/src/resources/deployment.rs:18` - Ergo-operator deployment.rs Mutable image tags cause unpredictable behavior changes on pod restart. **Fix:** Pin to specific version tags or digests. --- ### 33. Platform-operator cross-controller watch churn **File:** `crates/platform-operator/src/main.rs:38-116` All three controllers watch all Deployments/Services/Certificates cluster-wide. Each processes events for resources it doesn't own. **Fix:** Add label-selector-based watches to scope each controller's watch traffic. --- ### 34. `timer.every()` fires immediately and can flood the channel **File:** `crates/bot/src/lua/api.rs:220-230` No initial sleep. `every(0.001, fn)` fills the 64-capacity channel, silently dropping subsequent events. **Fix:** Add initial sleep matching the `after` pattern. Enforce a minimum interval. --- ### 35. Duplicated `get_keycloak_admin_token` function **Files:** - `crates/web-api/src/routes/profile.rs:187-220` - `crates/web-api/src/routes/admin.rs:39-72` Identical function bodies. Fixes in one won't reach the other. **Fix:** Extract to a shared helper module. --- ## Recommended fix order **Immediate (security, data loss risk):** 1. #1 -- Unauthenticated internal endpoints (txt/pics) 2. #2 -- Stripe HMAC timing attack 3. #7 -- IRC command injection 4. #9 -- IDOR script deletion 5. #3 -- Lua sandbox limits 6. #14 -- Lua sandbox escape via `load`/`coroutine` **Soon (significant exposure):** 7. #6 -- Public /metrics 8. #10 -- Unauthenticated bot internal APIs 9. #4, #5 -- Config injection (soju + ergo) 10. #8 -- Image decompression bomb 11. #12 -- Ergo finalizer 12. #13 -- Ergo PVC selector 13. #15 -- sslmode=disable + secrets in ConfigMap **Next sprint:** 14. Remaining important items (#16-#25) 15. Medium items (#26-#35)