Why this role exists
We’re building two distinct AI-first businesses:
- a long/short equity hedge fund (agent-driven research at scale)
- a prop trading operation running higher-frequency crypto strategies
Across both, the edge comes from tight iteration loops—propose → test → select → ship → monitor → iterate—and we’re hiring someone to contribute materially to making our systems fast, correct, and production-grade.
What you’ll work on (three tracks across two businesses)
Track A — Agent-driven long/short equity research (Hedge Fund)
- Scale evaluation for thousands of agent proposals (deterministic runs, artifact tracking, lineage)
- Build point-in-time and leakage-resistant feature/data workflows (validation, staleness checks, idempotent jobs)
- Improve scoring, baselines, and ablations so “wins” are real and reproducible
- Package signals behind clean interfaces; add telemetry + drift/performance dashboards
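To make the point-in-time requirement concrete, here is a minimal sketch (schema and values are illustrative, not from our stack): a decision at time `t` may only see the latest feature published at or before `t`.

```python
# Minimal point-in-time lookup: no future data may leak into a decision.
from bisect import bisect_right

feature_times = [1, 5, 9]          # publication timestamps, sorted
feature_values = [0.1, 0.2, 0.3]

def as_of(t):
    """Return the most recent feature value published at or before t."""
    i = bisect_right(feature_times, t) - 1
    if i < 0:
        return None                # nothing published yet: refuse, don't leak
    return feature_values[i]

print([as_of(t) for t in (0, 4, 6, 10)])  # [None, 0.1, 0.2, 0.3]
```

In production this is the semantics of an as-of join (e.g. a backward-strategy `join_asof` in Polars), but the invariant is the same: lookups never read past the decision timestamp.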
Track B — Statistical trading R&D with Alpha-Evolve workflows (Prop Trading)
- Harden the backtesting/simulation stack for higher-frequency statistical strategies
- Implement realistic cost models (fees/spreads/slippage), turnover/capacity constraints, strict time alignment
- Build the system for LLM-powered algorithm evolution: variant generation → evaluation → selection, with fitness metrics and constraints that guard against overfitting
- Build experiment tooling: comparisons, leaderboards, regression gates, promotion/rollback safety
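The generate → evaluate → select loop can be sketched as a toy deterministic hill-climb (all names and the objective are hypothetical; in practice "generate" is an LLM proposing code variants and "evaluate" is a backtest with overfitting penalties):

```python
# Toy evolution loop: propose variants, score them, keep the fittest.
def fitness(x):
    # Stand-in objective; a real fitness is a backtest score with
    # penalties for turnover, capacity, and overfitting risk.
    return -(x - 5) ** 2

candidate = 0
for _ in range(20):
    variants = [candidate - 1, candidate, candidate + 1]   # generate
    candidate = max(variants, key=fitness)                 # evaluate + select
print(candidate)  # 5
```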
Track C — Autonomous market making buildout (Prop Trading)
Market making is an unusually good fit for autonomous iteration because it naturally provides tight feedback: quote → get filled (or not) → observe P&L/inventory/adverse selection → update logic → repeat. The rapidly measurable rewards make it an ideal optimization target for long-running multi-agent coding systems.
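One step of that loop can be sketched as inventory-skewed quoting (all parameters are illustrative, not our production logic): the observed inventory feeds straight back into the next quote.

```python
# Toy quote update: skew quotes against inventory to mean-revert position.
def make_quotes(mid, inventory, half_spread=0.05, skew_per_unit=0.01):
    skew = -inventory * skew_per_unit   # long inventory -> quote lower to sell
    bid = mid + skew - half_spread
    ask = mid + skew + half_spread
    return round(bid, 4), round(ask, 4)

print(make_quotes(100.0, 0))   # (99.95, 100.05)
print(make_quotes(100.0, 3))   # (99.92, 100.02): skewed down to shed longs
```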
- Help build the multi-agent harness that can run for days or weeks: planners generate and refine tasks, workers implement, judges/CI gate merges and reset cycles (inspired by long-running agent coordination patterns)
- Wire changes into trading reality: commit → sim/paper → metrics → new tasks
- Strengthen coordination and safety: avoid duplicate work, prevent drift, ensure “done” means tested + observable
- Build guardrails: risk limits, kill-switches, monitoring, and post-trade diagnostics
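A guardrail like the kill-switch above reduces to a hard predicate checked on every cycle (thresholds here are illustrative placeholders, not real limits):

```python
# Kill-switch predicate: halt quoting on a drawdown or inventory breach.
def should_kill(pnl_drawdown, inventory, max_drawdown=10_000, max_inventory=50):
    """Return True if any hard risk limit is breached."""
    return pnl_drawdown >= max_drawdown or abs(inventory) > max_inventory

print(should_kill(2_000, 10))    # False: within limits
print(should_kill(12_000, 10))   # True: drawdown breach
```

The design point is that the check is dumb on purpose: no model, no averaging, so it cannot be argued with by the agents it supervises.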
Day-to-day
- Ship improvements that stay correct under scale (many assets, many runs, many variants)
- Profile and optimize Python hot paths (vectorization, IO/layout, Polars/NumPy where it counts)
- Write deterministic tests (unit + property-based) around point-in-time joins, feature lags, fills, and cost models
- Add guardrails that prevent leakage, stale data, and “too-good-to-be-true” results
- Partner with infra to move outputs to staging → live with metrics, alerts, and SLOs
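As a flavor of the deterministic tests above, here is a minimal check on a feature-lag helper (the `lag` function is a hypothetical stand-in): a 1-bar lag must never expose the current bar's value to the model.

```python
# Deterministic leakage test for feature lagging.
def lag(series, k=1):
    """Shift values forward by k steps, padding the head with None."""
    return [None] * k + series[:-k]

raw = [10, 11, 12, 13]
lagged = lag(raw)
assert lagged == [None, 10, 11, 12]
# At every index, the lagged value is either missing or strictly from the past.
assert all(v is None or v == raw[i - 1] for i, v in enumerate(lagged))
print("lag test passed")
```

Property-based variants (e.g. with Hypothesis) generalize the same invariant across randomly generated series.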
Must-haves
- Strong Python + practical SQL; you can ship robust systems, not just notebooks
- Experience contributing to end-to-end data/ML/quant pipelines (ingest → compute → test → deploy/operate)
- You understand evaluation correctness:
  - leakage prevention (point-in-time data, walk-forward, embargo/purged CV where applicable)
  - realistic transaction/borrow costs, turnover/capacity constraints
- Comfortable with Docker + CI; you treat reproducibility and auditability as product features
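To illustrate the walk-forward-with-embargo idea from the list above, here is a minimal split generator (window sizes are arbitrary for the example): each test window follows its train window, separated by an embargo gap so overlapping labels cannot leak across the boundary.

```python
# Walk-forward splits with an embargo gap between train and test.
def walk_forward(n, train=100, test=20, embargo=5):
    """Yield (train_indices, test_indices) pairs over n time steps."""
    splits = []
    start = 0
    while start + train + embargo + test <= n:
        tr = range(start, start + train)
        te = range(start + train + embargo, start + train + embargo + test)
        splits.append((tr, te))
        start += test              # roll forward by one test window
    return splits

splits = walk_forward(300)
print(len(splits))  # 9 folds over 300 bars
# Every test window starts strictly after its train window plus the embargo.
assert all(min(te) - max(tr) > 5 for tr, te in splits)
```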