System Foundation
Overview
The Strategy Execution Platform is designed with distinct layers, each with clear responsibilities. This separation of concerns makes the system easier to understand, maintain, and scale.
System Architecture
The 30,000-Foot View: Layers and Boundaries
Component-Level Breakdown
Core Design Decisions
Why Intent-Driven Architecture?
By separating the what (the strategy’s goal, or Intent) from the how (the ExecutionPlan), the platform gains immense flexibility. Strategy developers can focus purely on signal generation without worrying about the intricacies of gas management, slippage control across different venues, or transient network issues.
This abstraction allows the core platform to evolve its execution logic—such as introducing a new DEX, a more advanced routing algorithm, or MEV protection—without requiring any changes to existing strategies.
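The split between Intent and ExecutionPlan can be sketched as two separate data types. This is a minimal illustration, not the platform's actual schema: every field name, the `ExecutionStep` type, and both planner functions are assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Intent:
    """What the strategy wants. Carries no venue, gas, or routing detail."""
    intent_id: str
    sell_asset: str
    buy_asset: str
    sell_amount: Decimal
    max_slippage_bps: int  # hypothetical constraint field


@dataclass(frozen=True)
class ExecutionStep:
    """One concrete action on one venue (hypothetical shape)."""
    venue: str
    sell_asset: str
    buy_asset: str
    sell_amount: Decimal


@dataclass(frozen=True)
class ExecutionPlan:
    """How the platform will fulfil an Intent."""
    intent_id: str
    steps: tuple[ExecutionStep, ...]


def plan_single_venue(intent: Intent, venue: str) -> ExecutionPlan:
    """Simplest possible planner: send the whole amount to one venue."""
    step = ExecutionStep(venue, intent.sell_asset, intent.buy_asset, intent.sell_amount)
    return ExecutionPlan(intent.intent_id, (step,))


def plan_split(intent: Intent, venues: list[str]) -> ExecutionPlan:
    """Alternative planner: split the amount evenly across several venues."""
    share = intent.sell_amount / len(venues)
    steps = tuple(
        ExecutionStep(venue, intent.sell_asset, intent.buy_asset, share)
        for venue in venues
    )
    return ExecutionPlan(intent.intent_id, steps)
```

Both planners accept the same Intent, so swapping one for the other changes the plan without touching the strategy that produced the Intent.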
Why a Hybrid Python/Rust Backend?
This hybrid model offers the best of both worlds:
- Python’s Strengths: Rich ecosystem, ease of use, and rapid development cycle make it ideal for the system’s orchestration layer. This includes the API server (FastAPI), strategy framework, and high-level business logic where developer velocity is paramount.
- Rust’s Strengths: For performance-critical hot paths, Rust provides memory safety, fearless concurrency, and near-bare-metal speed. We delegate tasks like decoding blockchain transactions, aggregating order books, and running Dijkstra’s algorithm for route optimization to a Rust core compiled to a native Python module via PyO3.
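To make the route-optimization task concrete, here is Dijkstra's algorithm in plain Python. It illustrates the algorithm only; it is not the Rust core. The graph shape, the asset names, and the idea of a single non-negative "cost" per hop are assumptions for the example.

```python
import heapq


def cheapest_route(graph, source, target):
    """Dijkstra's algorithm over a dict-of-dicts graph with non-negative edge costs.

    Returns (total_cost, [nodes...]), or (inf, []) if target is unreachable.
    """
    best = {source: 0.0}       # cheapest known cost to reach each node
    previous = {}              # back-pointers used to rebuild the path
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break              # the first time target is popped, its cost is final
        if cost > best.get(node, float("inf")):
            continue           # stale heap entry, a cheaper one was already processed
        for neighbour, edge_cost in graph.get(node, {}).items():
            candidate = cost + edge_cost
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best[target], path


# Hypothetical per-hop costs (for example fees plus expected slippage).
graph = {
    "ETH": {"USDC": 30.0, "WBTC": 10.0},
    "WBTC": {"USDC": 12.0},
    "USDC": {},
}
cost, path = cheapest_route(graph, "ETH", "USDC")
print(cost, path)  # 22.0 ['ETH', 'WBTC', 'USDC']
```

The inner loop is tight, allocation-heavy in Python, and runs on every planning request, which is the kind of hot path the text proposes moving to Rust.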
Why Event Sourcing with CQRS?
In a system where auditability and reliability are paramount, overwriting data in place, as a traditional CRUD database does, destroys the history needed to explain how a record reached its current state. We chose Event Sourcing because it keeps an immutable, append-only log of every state change in the system.
How it Works:
Instead of storing the current state of an Intent, we store the sequence of events that describe its history.
- Traditional DB (intents table):
  - id: "abc", status: "COMPLETED"
- Event Sourced System (events table):
  - id: 1, aggregate_id: "abc", type: "IntentSubmitted"
  - id: 2, aggregate_id: "abc", type: "IntentValidated"
  - id: 3, aggregate_id: "abc", type: "PlanCreated"
  - id: 4, aggregate_id: "abc", type: "ExecutionCompleted"
Benefits:
- Full Audit Trail: We know exactly what happened and when
- Debugging Power: We can reproduce the exact state of the system at the time of a bug
- Temporal Queries: We can ask questions like “What did this portfolio look like yesterday?”
- Replayability: We can rebuild the state of any entity at any point in time
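Replay and temporal queries both come down to folding over the event log. The sketch below uses the four event types from the example above. The `recorded_at` field and the mapping from event type to status are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    id: int
    aggregate_id: str
    type: str
    recorded_at: datetime  # assumed field; the example rows above show only id, aggregate_id, type


# Assumed mapping from event type to the status it produces.
STATUS_AFTER = {
    "IntentSubmitted": "SUBMITTED",
    "IntentValidated": "VALIDATED",
    "PlanCreated": "PLANNED",
    "ExecutionCompleted": "COMPLETED",
}


def replay_status(events, aggregate_id, as_of=None):
    """Rebuild an intent's status by applying its events in id order.

    If as_of is given, events recorded after that instant are ignored,
    which answers "what was the status at time T?".
    """
    status = None
    for event in sorted(events, key=lambda e: e.id):
        if event.aggregate_id != aggregate_id:
            continue
        if as_of is not None and event.recorded_at > as_of:
            continue
        status = STATUS_AFTER.get(event.type, status)
    return status


def at(minute):
    """Helper for the demo: a UTC timestamp on an arbitrary day."""
    return datetime(2024, 1, 1, 10, minute, tzinfo=timezone.utc)


log = [
    Event(1, "abc", "IntentSubmitted", at(0)),
    Event(2, "abc", "IntentValidated", at(1)),
    Event(3, "abc", "PlanCreated", at(2)),
    Event(4, "abc", "ExecutionCompleted", at(3)),
]
print(replay_status(log, "abc"))               # COMPLETED
print(replay_status(log, "abc", as_of=at(1)))  # VALIDATED
```

Because the log is never modified, the same function answers both the current-state query and the historical one.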
Why NATS for Real-Time Messaging?
While Kafka is a common choice for event streaming, NATS was selected for its simplicity, high performance, and low operational overhead, which is a better fit for the real-time, low-latency messaging patterns within a trading core.
Key Features:
- Simplicity and Performance: Incredibly lightweight and fast, designed for high-throughput, low-latency messaging
- Flexible Topologies: Supports various communication patterns out of the box:
  - Pub/Sub: For broadcasting events to any interested service
  - Request/Reply: For when one service needs a direct answer from another
  - Queue Groups: For distributing work across a pool of services
- Resilience with JetStream: JetStream persists critical events that must not be lost and provides at-least-once delivery guarantees
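The difference between Pub/Sub and Queue Groups is easiest to see in terms of who receives each message. The class below is a toy in-memory model of those delivery semantics, not the NATS client API. The subject name is hypothetical. It picks queue-group members round-robin to keep the output deterministic, whereas a real NATS server chooses a member itself.

```python
from collections import defaultdict
from itertools import count


class ToyBroker:
    """In-memory stand-in that mimics two NATS delivery patterns; not a NATS client."""

    def __init__(self):
        self._plain = defaultdict(list)                        # subject -> [callback]
        self._groups = defaultdict(lambda: defaultdict(list))  # subject -> group -> [callback]
        self._turn = count()                                   # drives round-robin selection

    def subscribe(self, subject, callback, queue=None):
        """Register a plain subscriber, or a queue-group member if queue is given."""
        if queue is None:
            self._plain[subject].append(callback)
        else:
            self._groups[subject][queue].append(callback)

    def publish(self, subject, message):
        # Pub/Sub: every plain subscriber receives every message.
        for callback in self._plain[subject]:
            callback(message)
        # Queue Groups: exactly one member of each group receives the message.
        turn = next(self._turn)
        for members in self._groups[subject].values():
            members[turn % len(members)](message)


broker = ToyBroker()
audit_log, metrics, worker_1, worker_2 = [], [], [], []
broker.subscribe("intents.submitted", audit_log.append)
broker.subscribe("intents.submitted", metrics.append)
broker.subscribe("intents.submitted", worker_1.append, queue="processors")
broker.subscribe("intents.submitted", worker_2.append, queue="processors")

for n in range(4):
    broker.publish("intents.submitted", n)

print(audit_log, metrics)   # [0, 1, 2, 3] [0, 1, 2, 3]
print(worker_1, worker_2)   # [0, 2] [1, 3]
```

Request/Reply is omitted here; it pairs a published request with a single response on a private reply subject.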
Why Server-Side Rendering (SSR) with a Real-Time Overlay?
To provide the best user experience, the frontend dashboard uses a two-stage loading process:
- SSR for Fast Initial Load: The Next.js server pre-renders the initial dashboard view with a snapshot of essential data
- Client-Side Real-Time Overlay: Once the initial page is loaded, the client connects to the WebSocket stream and hydrates the application with live data
Why a Dual-State Frontend?
Our frontend architecture explicitly separates two types of state:
- Live/Client State (Zustand): Data pushed from the server in real-time (e.g., live market prices, intent status)
- Server/Cache State (TanStack Query): Data fetched on-demand (e.g., historical intents)
The two systems work together: a real-time event can invalidate the relevant TanStack Query cache keys, which triggers a refetch so that cached views do not go stale.
The Infrastructure Stack
TimescaleDB (Event Store)
A PostgreSQL extension optimized for time-series data. It suits our event store because every event carries a timestamp. It lets us run efficient time-range queries and manage the data lifecycle automatically, for example through retention policies.
Redis (Cache & Read Models)
A high-performance in-memory data store. We use it for two purposes: caching frequently accessed data (like asset details) and storing the CQRS read models, so that UI queries are served from memory instead of replaying the event store.
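A read model is a precomputed view that is updated as events arrive, so queries never have to replay the log. The sketch below uses a plain dict as a stand-in for Redis. The key format, the view fields, and the assumption that event ids increase within an aggregate are all illustrative choices, not the platform's actual layout.

```python
class IntentStatusProjection:
    """Keeps a query-friendly view of each intent up to date as events arrive."""

    def __init__(self, store=None):
        # A plain dict stands in for Redis in this sketch.
        self.store = {} if store is None else store

    def apply(self, event):
        """Fold one event (a dict with id, aggregate_id, type) into the view."""
        key = f"intent:{event['aggregate_id']}"  # hypothetical key format
        view = self.store.setdefault(
            key, {"status": None, "event_count": 0, "last_event_id": 0}
        )
        if event["id"] <= view["last_event_id"]:
            return  # already applied, so a redelivered event is harmless
        view["status"] = event["type"]
        view["event_count"] += 1
        view["last_event_id"] = event["id"]

    def get(self, aggregate_id):
        """Answer a UI query with a single lookup; no replay needed."""
        return self.store.get(f"intent:{aggregate_id}")


projection = IntentStatusProjection()
events = [
    {"id": 1, "aggregate_id": "abc", "type": "IntentSubmitted"},
    {"id": 2, "aggregate_id": "abc", "type": "IntentValidated"},
    {"id": 3, "aggregate_id": "abc", "type": "PlanCreated"},
    {"id": 4, "aggregate_id": "abc", "type": "ExecutionCompleted"},
]
for event in events:
    projection.apply(event)
projection.apply(events[1])  # simulate a duplicate delivery

print(projection.get("abc"))
# {'status': 'ExecutionCompleted', 'event_count': 4, 'last_event_id': 4}
```

The duplicate check matters because at-least-once delivery means the same event can reach the projector more than once.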
Ray.io (Distributed Compute)
A framework for scaling Python applications. We use Ray to distribute the computationally intensive work of the IntentProcessor, allowing us to process many intents in parallel across multiple cores or even multiple machines.
Docker (Containerization)
The entire platform, including its infrastructure dependencies, is containerized using Docker and orchestrated with Docker Compose for local development, ensuring a consistent and reproducible environment.
Component Breakdown Table
| Layer | Component | Responsibility | Technology |
|---|---|---|---|
| Strategy | Strategy Objects | Defines trading logic; generates Intent objects. | Python |
| Protocol | Intent Manager | Accepts, validates, queues, and tracks the lifecycle of intents. | Python |
| Protocol | Event Stream | Manages publishing and subscribing to system-wide events. | NATS, Python |
| Execution | Execution Planner | Decomposes intents into concrete ExecutionPlans; queries solvers. | Python, Rust |
| Execution | Execution Orchestrator | Executes plans step-by-step, interacting with market adapters. | Python |
| Execution | Solver Network | A competitive network to find optimal execution paths. | Rust |
| Market | Market Abstraction | Provides a unified interface to various market venues. | Python |
| Market | Venue Adapters | Concrete implementations for interacting with DEXs, CEXs, etc. | Python |
| Risk | Risk Engine | Performs pre-trade risk checks and portfolio-level analysis. | Python |
| State | State Coordinator | Persists events to the event store and updates read models. | Python, TimescaleDB, Redis |
| Settlement | Settlement Manager | Manages cross-chain settlement and reconciliation. | Python |
Next Steps
- Backend Engineers: Continue to Backend Engine to understand the core components and intent lifecycle
- Frontend Engineers: Jump to Frontend Dashboard to explore the UI architecture
- System Architects: Explore Integration & Operations for advanced concepts