┌─────────────────────────────────────────────────────────────────────┐
│ User Message Entry │
│ Feishu / QQ / NapCat / Web / CLI / Cron / !bang_command │
└─────────────┬───────────────────────────────────────────────────────┘
│ InboundMessage (cap=64)
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Agent.processMessage() │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ CommandRegistry│ → │ Pipeline │ → │ engine.Run() │ │
│ │ slash/! cmds │ │ middleware │ │ LLM ↔ Tool loop │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
│ │ │ │ │
│ │ system prompt context mgmt │
│ │ + memory injection ├─ Offload (Layer 1) │
│ │ + tool catalog ├─ Masking (Layer 2) │
│ │ + user message ├─ ContextEdit (L3) │
│ │ └─ Compress │
│ ▼ │
│ OutboundMessage → Dispatcher → Channel.Send() │
└─────────────────────────────────────────────────────────────────────┘
│
┌────────┴────────┐
▼ ▼
SQLite (.xbot/xbot.db) Vector DB (.xbot/vectordb/)
sessions/memory/config Archival memory embeddings
xbot has three independent executables:
| Binary | Entry File | Purpose |
|---|---|---|
| xbot (server) | main.go | Main service: LLM Agent engine + multi-channel message gateway |
| xbot-cli | cmd/xbot-cli/main.go | Standalone terminal chat UI (single user, direct filesystem access) |
| xbot-runner | cmd/runner/main.go | Remote sandbox runner: connects to server over WebSocket |
config.Load() → createLLM() → bus.NewMessageBus()
→ storage.MigrateIfNeeded() → OAuth init
→ tools.InitSandbox() → agent.New(cfg) ← sandbox initialized before Agent (sync.Once)
→ register tools → IndexGlobalTools()
→ channel.NewDispatcher(msgBus) → register channels
→ start Agent event loop and channels
xbot/
├── main.go # Server entry point
├── cmd/
│ ├── xbot-cli/ # Standalone CLI client
│ └── runner/ # Remote sandbox runner
├── config/ # Configuration system (JSON + env vars)
├── bus/ # Message bus (InboundMessage / OutboundMessage)
├── channel/ # Channel layer (abstract + implementations + Dispatcher)
├── agent/ # Agent core engine
├── llm/ # LLM abstraction (OpenAI / Anthropic / Proxy)
├── session/ # Multi-tenant session management
├── memory/ # Memory system (flat / letta)
├── storage/
│ ├── sqlite/ # SQLite data service layer
│ └── vectordb/ # Vector database (chromem-go)
├── tools/ # Tool system (including Worktree, config, tui_control)
├── plugin/ # Plugin system (RPC bridge, runtimes, extension points)
├── internal/
│ ├── runnerclient/ # Runner client (runner-side)
│ └── runnerproto/ # Runner protocol definitions
├── serverapp/ # Server core, RPC handlers
├── oauth/ # OAuth service
├── event/ # Event system (webhook receive + trigger routing)
├── cron/ # Scheduled task scheduler
├── crypto/ # Encryption utilities
├── prompt/ # Embedded prompt templates (go:embed)
├── version/ # Version info
├── agents/ # [runtime] Agent role definitions (~/.xbot/agents/), not a source package
└── web/ # Frontend static assets (React 19 + Vite + TailwindCSS 4 + Tiptap)
User Message
│
▼
Channel Implementation (feishu / qq / napcat / web / cli)
│ Parse platform message → construct bus.InboundMessage
│ Inject into bus.Inbound (cap=64 buffer)
▼
Agent.processMessage()
│ 1. Normalize SenderID (singleUser mode → "default")
│ 2. CommandRegistry.Match() — slash/! command dispatch
│ ├─ Concurrent()=true → goroutine (does not consume semaphore)
│ └─ Concurrent()=false → msgCh serial queue (cap=32)
│ 3. Get or create TenantSession
│ 4. Fetch message history + eager-save user message
│ 5. Pipeline.Run() → assemble system prompt + history + user message
│ 6. engine.Run()
▼
engine.Run() — core loop (see §2.2)
│
▼
OutboundMessage → bus.Outbound → Dispatcher.Run()
│
▼
Channel.Send(msg) → platform API
Interface:
type Channel interface {
	Name() string
	Start() error
	Stop()
	Send(msg bus.OutboundMessage) (string, error)
}
Dispatcher: reads from bus.Outbound, routes by msg.Channel to the corresponding Channel. Supports observer pattern and SendDirect() for synchronous sends.
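The routing described above can be sketched as follows. This is a minimal illustration, not xbot's actual Dispatcher: the `OutboundMessage` fields, the `Register` method, `ErrUnknownChannel`, and the `echoChannel` test double are all assumptions introduced for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// OutboundMessage stands in for bus.OutboundMessage (assumed fields).
type OutboundMessage struct {
	Channel string
	Content string
}

// Channel mirrors the interface above, using the local message type.
type Channel interface {
	Name() string
	Start() error
	Stop()
	Send(msg OutboundMessage) (string, error)
}

// ErrUnknownChannel is a hypothetical error for unroutable messages.
var ErrUnknownChannel = errors.New("unknown channel")

// Dispatcher routes each message to the channel named in msg.Channel.
type Dispatcher struct{ channels map[string]Channel }

func NewDispatcher() *Dispatcher { return &Dispatcher{channels: map[string]Channel{}} }

func (d *Dispatcher) Register(c Channel) { d.channels[c.Name()] = c }

// SendDirect delivers one message synchronously and returns the channel's message ID.
func (d *Dispatcher) SendDirect(msg OutboundMessage) (string, error) {
	c, ok := d.channels[msg.Channel]
	if !ok {
		return "", fmt.Errorf("%w: %q", ErrUnknownChannel, msg.Channel)
	}
	return c.Send(msg)
}

// Run drains the outbound queue until it is closed.
func (d *Dispatcher) Run(outbound <-chan OutboundMessage) {
	for msg := range outbound {
		if _, err := d.SendDirect(msg); err != nil {
			fmt.Println("dispatch error:", err)
		}
	}
}

// echoChannel is a toy channel that records what it was asked to send.
type echoChannel struct {
	name string
	sent []string
}

func (e *echoChannel) Name() string { return e.name }
func (e *echoChannel) Start() error { return nil }
func (e *echoChannel) Stop()        {}
func (e *echoChannel) Send(msg OutboundMessage) (string, error) {
	e.sent = append(e.sent, msg.Content)
	return fmt.Sprintf("%s-%d", e.name, len(e.sent)), nil
}

func main() {
	d := NewDispatcher()
	d.Register(&echoChannel{name: "cli"})
	out := make(chan OutboundMessage, 64)
	out <- OutboundMessage{Channel: "cli", Content: "hi"}
	out <- OutboundMessage{Channel: "missing", Content: "dropped"}
	close(out)
	d.Run(out)
}
```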
| Channel | Protocol | Reconnect Strategy |
|---|---|---|
| Feishu | Lark SDK (WebSocket) | SDK managed |
| QQ | Bot WebSocket | Exponential backoff (1s→60s), fast disconnect detection (3 in 5s → 60s cooldown), Intent downgrade, Resume support |
| NapCat | OneBot 11 WebSocket | Exponential backoff (same as QQ), 30s long-connection reset |
| Web | HTTP + WebSocket | Client-side reconnect; server-side WS Ping 30s, ReadDeadline 60s, offline ringBuffer(50) |
| CLI | Terminal | N/A |
CommandRegistry (agent/command.go): linear scan matching in registration order, first match wins.
Bang commands (! prefix): execute shell commands directly inside the sandbox, marked Concurrent()=true (does not consume semaphore). Timeout 120s, output > 4000 chars written to file.
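A minimal sketch of first-match-wins dispatch, assuming a hypothetical `Command` interface with `Name`, `Match`, and `Concurrent` methods and a prefix-based matcher; the real interface in agent/command.go may differ. It shows why registration order matters: a broad matcher registered early shadows anything registered after it.

```go
package main

import (
	"fmt"
	"strings"
)

// Command is a hypothetical, trimmed-down command interface.
type Command interface {
	Name() string
	Match(text string) bool
	Concurrent() bool
}

// prefixCommand matches any input that starts with a fixed prefix.
type prefixCommand struct {
	name       string
	prefix     string
	concurrent bool
}

func (c prefixCommand) Name() string           { return c.name }
func (c prefixCommand) Match(text string) bool { return strings.HasPrefix(text, c.prefix) }
func (c prefixCommand) Concurrent() bool       { return c.concurrent }

// CommandRegistry keeps commands in registration order.
type CommandRegistry struct{ cmds []Command }

func (r *CommandRegistry) Register(c Command) { r.cmds = append(r.cmds, c) }

// Match scans linearly and returns the first command that accepts the text.
func (r *CommandRegistry) Match(text string) (Command, bool) {
	for _, c := range r.cmds {
		if c.Match(text) {
			return c, true
		}
	}
	return nil, false
}

func main() {
	r := &CommandRegistry{}
	r.Register(prefixCommand{name: "compress", prefix: "/compress"})
	r.Register(prefixCommand{name: "bang", prefix: "!", concurrent: true})
	r.Register(prefixCommand{name: "any-slash", prefix: "/"})
	for _, in := range []string{"/compress", "!ls -la", "/other", "hello"} {
		if c, ok := r.Match(in); ok {
			fmt.Printf("%q -> %s (concurrent=%v)\n", in, c.Name(), c.Concurrent())
		} else {
			fmt.Printf("%q -> no command\n", in)
		}
	}
}
```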
Agent (agent/agent.go) is the system core, holding:
- bus: message bus reference
- multiSession: multi-tenant session management
- tools: tool registry
- contextManager: context manager (interface, hot-swappable at runtime)
- offloadStore: large result offload storage
- maskStore: observation mask storage
- contextEditor: context editor
- llmFactory: user-custom LLM factory
- skills / agents: skill/agent directories
- registryManager: marketplace/sharing
- settingsSvc: user settings
- interactiveSubAgents: interactive SubAgent session pool
- hookChain: tool execution hook chain (shared with SubAgents; replaced by agent/hooks.Manager)
- bgTaskMgr: background task manager
- pluginRegistry: plugin registry
engine.Run() in agent/engine.go is the unified agent loop, shared by the main Agent and SubAgents. Its internal sub-functions live in agent/engine_run.go (runState, compress, LLM calls, etc.); tool execution and SubAgent spawning live in agent/engine_wire.go.
RunConfig differentiated injection:
- Main Agent: ToolExecutor=full (session MCP + hooks), ProgressNotifier=sendMessage, ContextManager=global, Memory=enabled, EnableReadWriteSplit=true
- SubAgent: ToolExecutor=simplified, ProgressNotifier=nil, ContextManager=independent phase1, Memory=optional
Core loop logic (engine.go):
for i := 0; i < maxIter; i++ {
① maybeCompress() ← 75% token threshold + 5 round cooldown
② LLM Generate() ← perAttemptCtx 120s independent timeout
③ No tool_calls → exit loop
④ Tool execution (read/write split parallel or serial)
⑤ MaybeOffload() ← Layer 1: large result offload
⑥ InvalidateStaleReads() ← detect stale Read offloads
⑦ PurgeStaleMessages() ← replace stale offloads with warnings
⑧ DynamicContext injection ← stale detection results
⑨ System Reminder injection ← strip old reminder each round
⑩ Incremental persist: Session.AddMessage()
}
Token count recovery: from in-memory lastPromptTokens (in-process), or from tenant_state table (after restart). Written back to DB + memory after Run completes.
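Step ① of the loop combines a token threshold with a cooldown. The sketch below is one plausible reading, assuming the cooldown means "at least 5 rounds since the last compression"; the `compressGate` type and its fields are hypothetical names for the example.

```go
package main

import "fmt"

// compressGate decides whether a round should trigger compression.
type compressGate struct {
	maxTokens     int
	cooldown      int // rounds that must pass between compressions (5 in the text)
	lastRound     int
	hasCompressed bool
}

// ShouldCompress reports true when totalTokens >= 75% of maxTokens and the
// cooldown has elapsed. It records the round when it fires.
func (g *compressGate) ShouldCompress(round, totalTokens int) bool {
	// Integer form of totalTokens/maxTokens >= 0.75.
	if totalTokens*4 < g.maxTokens*3 {
		return false
	}
	if g.hasCompressed && round-g.lastRound < g.cooldown {
		return false
	}
	g.lastRound = round
	g.hasCompressed = true
	return true
}

func main() {
	g := &compressGate{maxTokens: 1000, cooldown: 5}
	tokens := []int{600, 760, 800, 820, 900, 950, 990}
	for round, t := range tokens {
		fmt.Printf("round %d tokens %d compress=%v\n", round, t, g.ShouldCompress(round, t))
	}
}
```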
Read/write split parallel (EnableReadWriteSplit):
- Phase 0: SubAgent concurrent (constrained by SubAgentSem)
- Phase 1: Read-only tools parallel (Read/Grep/Glob/WebSearch/ChatHistory, maxParallel=8)
- Phase 2: Write tools serial
Read-only tool set (hardcoded readOnlyTools map; new additions require manual updates).
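Phases 1 and 2 can be sketched with a buffered channel acting as the parallelism cap. This is an illustration under assumptions: `ToolCall`, `runSplit`, and the `exec` callback are hypothetical, Phase 0 (SubAgents) is omitted, and `exec` must be safe to call from several goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// ToolCall is a hypothetical, minimal tool invocation.
type ToolCall struct{ Name, Args string }

// readOnlyTools mirrors the hardcoded set named in the text.
var readOnlyTools = map[string]bool{
	"Read": true, "Grep": true, "Glob": true, "WebSearch": true, "ChatHistory": true,
}

const maxParallel = 8

// runSplit runs read-only calls concurrently (at most maxParallel at once),
// then runs the remaining calls one by one. Results keep the input order.
func runSplit(calls []ToolCall, exec func(ToolCall) string) []string {
	results := make([]string, len(calls))
	sem := make(chan struct{}, maxParallel)
	var wg sync.WaitGroup

	// Phase 1: read-only tools in parallel. Each goroutine writes its own index.
	for i, c := range calls {
		if !readOnlyTools[c.Name] {
			continue
		}
		wg.Add(1)
		sem <- struct{}{}
		go func(i int, c ToolCall) {
			defer wg.Done()
			defer func() { <-sem }()
			results[i] = exec(c)
		}(i, c)
	}
	wg.Wait()

	// Phase 2: write tools serially, in call order.
	for i, c := range calls {
		if !readOnlyTools[c.Name] {
			results[i] = exec(c)
		}
	}
	return results
}

func main() {
	calls := []ToolCall{{"Read", "a.go"}, {"FileReplace", "a.go"}, {"Grep", "TODO"}}
	out := runSplit(calls, func(c ToolCall) string { return c.Name + ":" + c.Args })
	fmt.Println(out)
}
```

Because writes only start after `wg.Wait()`, no write can overlap a read from the same batch.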
Executed in priority order, assembling system prompt parts into the SystemParts map, finally concatenated by key dictionary order:
| Order | Middleware | Priority | SystemParts Key | Responsibility |
|---|---|---|---|---|
| 1 | SystemPromptMiddleware | 0 | 00_base | Render prompt.md (hot reload) |
| 2 | ProjectContextMiddleware | 5 | 04_global_context, 05_project_context | Inject global (~/.xbot/AGENTS.md) + project (.xbot/AGENTS.md) instructions |
| 3 | ChannelPromptMiddleware | 5 | channel-specific (e.g. 05_channel_xxx) | Inject channel-specific system prompt parts |
| 4 | SkillsCatalogMiddleware | 100 | 10_skills | Available skills catalog |
| 5 | AgentsCatalogMiddleware | 110 | 15_agents | Available agents catalog |
| 6 | PermissionControlMiddleware | 115 | 14_perm_control | Read/write/tool permission control |
| 7 | MemoryMiddleware | 120 | 20_memory | Long-term memory (Recall) |
| 8 | SenderInfoMiddleware | 130 | 30_sender | Sender name |
| 9 | LanguageMiddleware | 135 | 32_language | Language instructions |
| 10 | pluginEnricherMiddleware | 150 | plugin_enrichers | Plugin-enriched system prompt content |
| 11 | UserMessageMiddleware | 200 | — | Timestamp + guide text |
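Concatenation by key order, rather than by middleware priority, has a visible consequence: `14_perm_control` lands before `15_agents` even though its middleware runs later, and a key without a numeric prefix such as `plugin_enrichers` sorts after all numbered keys. A minimal sketch, assuming a blank-line separator and a hypothetical `assemble` helper:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// assemble joins system prompt parts in byte-wise key order.
// The "\n\n" separator is an assumption made for this example.
func assemble(parts map[string]string) string {
	keys := make([]string, 0, len(parts))
	for k := range parts {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	for i, k := range keys {
		if i > 0 {
			b.WriteString("\n\n")
		}
		b.WriteString(parts[k])
	}
	return b.String()
}

func main() {
	parts := map[string]string{
		"20_memory":        "[memory]",
		"00_base":          "[base prompt]",
		"15_agents":        "[agents]",
		"14_perm_control":  "[permissions]",
		"plugin_enrichers": "[plugins]",
	}
	fmt.Println(assemble(parts))
}
```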
CacheHint: system prompt marked "static", converted to cache_control: {type: "ephemeral"} under Anthropic.
MultiTenantSession (session/multitenant.go): isolates sessions by (channel, chatID), internal cache supports 24h TTL expiry, no capacity limit.
TenantSession (session/tenant.go): holds tenantID, MemoryProvider, SessionMCPManager, CWD state.
Tenant isolation:
- IM users: (channel, chatID) → positive tenantID
- SubAgent: SHA-256 hash → negative tenantID
One-shot SubAgent: created via bus.InboundMessage{Channel="agent", AllowedTools, SystemPrompt, RoleName}. Optional capabilities: memory, send_message, spawn_agent.
Interactive SubAgent: persistent sessions stored in sync.Map, supporting multi-turn dialog, 30 min inactivity auto-cleanup. Key format: channel:chatID/roleName[:instance].
type ChatMessage struct {
Role string // "system" / "user" / "assistant" / "tool"
Content string // LLM-visible content
ReasoningContent string // DeepSeek/OpenAI reasoning models
ToolCallID/Name/Arguments // Tool call/result identifiers
ToolCalls []ToolCall // Assistant's tool call list
Detail string // Full content (persisted to DB, used by Web/CLI display)
DisplayOnly bool // Display only (not loaded into LLM context)
CacheHint string // "static" = cacheable across requests
}
The four layers form a defense-in-depth strategy, triggered at different points in engine.Run():
Single tool result too large
↓ exceeds 2000 tokens or 10240 bytes
Layer 1: Offload → replace with 📂 [offload:ol_xxxx] + summary
↓ auto-marked STALE when file modified
Too many old tool results
↓ triggered after each compress (token reaches 60%)
Layer 2: Masking → replace with 📂 [masked:mk_xxxx] + summary
↓ grouped by consecutive tool groups, keep recent active groups
Overall context too large
↓ totalTokens >= 75% maxTokens
Layer 3: Compress → LLM structured compression
LLM-initiated editing
↓ any time
Layer 4: ContextEdit → precise delete/truncate/replace
- Trigger: MaybeOffload(), when a single tool result exceeds the threshold
- Storage: filesystem, .xbot/offload_store/{sessionKey}/id.json
- Summary generation: rule-based summary per tool type (Read/Grep/Shell/Glob/default)
- Read special: computes ContentHash (SHA-256) for stale detection
- Stale detection: re-read file after each tool execution, compare hash; mismatch → replace with expiry warning
- Recall: offload_recall tool, supports pagination (offset/limit, max 16000 runes)
- Anti-recursion: results from offload_recall and recall_masked are not offloaded again
- Read offset protection: Read results with offset/limit are not offloaded
- SubAgent path: SubAgent offload data stored under parent session directory (RootSessionKey)
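The offload decision and the hash-based stale check can be sketched as below. The thresholds come from the text; everything else is an assumption for illustration: the function names, the `hasOffset` flag, and in particular `estimateTokens`, which uses a crude four-runes-per-token guess rather than a real tokenizer.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"unicode/utf8"
)

const (
	offloadTokenLimit = 2000
	offloadByteLimit  = 10240
)

// estimateTokens is a rough stand-in for a tokenizer (assumption: ~4 runes per token).
func estimateTokens(s string) int { return (utf8.RuneCountInString(s) + 3) / 4 }

// shouldOffload applies the size thresholds plus the two exemptions from the list above.
func shouldOffload(tool, result string, hasOffset bool) bool {
	if tool == "offload_recall" || tool == "recall_masked" {
		return false // anti-recursion: recalled content is never offloaded again
	}
	if tool == "Read" && hasOffset {
		return false // paginated reads are left inline
	}
	return len(result) > offloadByteLimit || estimateTokens(result) > offloadTokenLimit
}

// contentHash returns the hex SHA-256 of file content, recorded at offload time.
func contentHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// isStale reports whether the file changed since the hash was recorded.
func isStale(recorded string, current []byte) bool {
	return contentHash(current) != recorded
}

func main() {
	recorded := contentHash([]byte("package main\n"))
	fmt.Println("offload small result:", shouldOffload("Shell", "ok", false))
	fmt.Println("stale after edit:", isStale(recorded, []byte("package main // edited\n")))
}
```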
- Trigger: MaskOldToolResults(), called inside maybeCompress() (token reaches 60%)
- Grouping strategy: assistant[tool_calls] + consecutive tool messages = 1 group
- Retention strategy: keepGroups dynamically computed by ratio (3/5/8/12 groups)
- Active file protection: files touched in last 3 turns are not masked
- Collapse optimization: pure tool groups (assistant.Content empty) collapsed to a single message pair, reducing message count
- Storage: in-memory ObservationMaskStore, FIFO eviction, dual limit (200 entries / 2MB chars)
- Recall: recall_masked tool
- Post-compress cleanup: CleanOldEntries(cutoff) removes records prior to compression point
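The grouping and retention rules can be illustrated with index ranges over the message list. This sketch assumes a simplified `Message` type with a `HasToolCalls` flag and takes `keepGroups` as a plain parameter; how xbot derives that number from the token ratio is not modelled here.

```go
package main

import "fmt"

// Message is a simplified stand-in for a chat message.
type Message struct {
	Role         string
	HasToolCalls bool
}

// toolGroups returns [start, end) index pairs, one per group made of an
// assistant message with tool calls plus the tool messages that follow it.
func toolGroups(msgs []Message) [][2]int {
	var groups [][2]int
	for i := 0; i < len(msgs); {
		if msgs[i].Role == "assistant" && msgs[i].HasToolCalls {
			j := i + 1
			for j < len(msgs) && msgs[j].Role == "tool" {
				j++
			}
			groups = append(groups, [2]int{i, j})
			i = j
			continue
		}
		i++
	}
	return groups
}

// maskable returns the groups eligible for masking: all but the newest keepGroups.
func maskable(groups [][2]int, keepGroups int) [][2]int {
	if keepGroups < 0 {
		keepGroups = 0
	}
	if len(groups) <= keepGroups {
		return nil
	}
	return groups[:len(groups)-keepGroups]
}

func main() {
	msgs := []Message{
		{Role: "user"},
		{Role: "assistant", HasToolCalls: true}, {Role: "tool"}, {Role: "tool"},
		{Role: "assistant"},
		{Role: "user"},
		{Role: "assistant", HasToolCalls: true}, {Role: "tool"},
	}
	groups := toolGroups(msgs)
	fmt.Println("groups:", groups)
	fmt.Println("maskable with keepGroups=1:", maskable(groups, 1))
}
```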
- Tool: context_edit, called proactively by the LLM
- Operations: delete_turn / delete / truncate / replace (regex support)
- Safety: cannot edit system messages, cannot delete last 3 messages
Trigger conditions (four paths):
| Path | Condition | Method |
|---|---|---|
| Auto compress | totalTokens >= 75% maxTokens + 5 rounds cooldown | cm.Compress() |
| Input too long | LLM returns InputTooLong error | cm.ManualCompress() |
| Context window exceeded | finish_reason = context_window_exceeded | cm.Compress() |
| Manual /compress | User command | cm.ManualCompress() |
Phase 1 compression flow:
- Find tail cut point (last user/assistant message)
- Select messages to compress from tail backwards by token budget
- LLM call (max 10 multi-turn rounds, may call memory tools to organize memories)
- Build results: LLMView + SessionView (excluding system/display_only)
- Persistence: Session.Clear() + per-message AddMessage(SessionView)
- Cleanup: delete offload files and mask entries prior to compression point
SessionService: appends to session_messages table.
| Timing | Method |
|---|---|
| User message | eager-save before Run() |
| After each iteration | incremental persist messages[lastPersistedCount:] (skip system + strip reminder) |
| After compression | Session.Clear() + per-message AddMessage(SessionView) |
| After masking | UpdateMessageContent(idx) in-place update |
| After cancel | save interrupted iteration progress |
| After Run() returns | save final assistant reply |
GetHistory: uses user messages as turn boundaries, fetches messages starting from the Nth user message.
Single SQLite file, WAL mode, MaxOpenConns=1 (SQLite single-writer model), BusyTimeout=5000ms. Schema Version 21.
| Table | Purpose |
|---|---|
| tenants | Tenants (channel+chatID → tenantID) |
| session_messages | Session message history |
| tenant_state | Tenant state (last_consolidated, token counts) |
| long_term_memory | Long-term memory (flat mode) |
| event_history | Event history (Letta recall, FTS5 index) |
| user_profiles | User profiles |
| core_memory_blocks | Core memory blocks (Letta persona/human/working_context) |
| archival_memory | Archival memory (vector index BLOB) |
| cron_jobs | Scheduled tasks |
| runners | Runner management |
| shared_registry | Skill/Agent marketplace |
| web_users | Web user authentication |
| user_settings | User settings |
| user_llm_configs | User custom LLM configuration |
| user_token_usage | User token usage statistics |
| schema_version | Schema version |
Data migration: storage.MigrateIfNeeded() tracks version via schema_version table, executing incremental migrations step by step.
DB path: Server: {workDir}/.xbot/xbot.db, CLI: ~/.xbot/xbot.db. Data is not shared between the two modes.
MemoryProvider interface:
type MemoryProvider interface {
Recall(ctx, query) (string, error)
Memorize(ctx, MemorizeInput) (MemorizeResult, error)
Close() error
}
| Mode | Recall | Memorize | Use Case |
|---|---|---|---|
| flat | Returns long_term_memory full text directly | LLM compresses conversation → writes to long_term_memory | Simple scenarios |
| letta | Returns Core Memory structured blocks | LLM rethink → update Core + write to Archival | Complex long-term memory |
Memorize trigger timing (⚠️ NOT triggered after every conversation):
- /new command (session switch): ArchiveAll=true
- SubAgent exit: ArchiveAll=true
- During compression, called by LLM via memory tools
Letta Memory:
- Core Memory: 3 structured blocks (persona / human / working_context), injected into system prompt
- Archival Memory: embedding vector storage, retrieved on-demand via tools, with deduplication (search similarity > 0.5)
- Recall Memory: time-range-based session history search (FTS5)
Uses chromem-go embedded vector database, persisted to .xbot/vectordb/. Supports OpenAI and Ollama embedding providers, with automatic truncation of overlong content.
| Category | Tools |
|---|---|
| File Operations | Read, FileCreate, FileReplace, Glob, Grep, Cd, DownloadFile |
| Command Execution | Shell |
| Search | WebSearch, ChatHistory |
| Memory | memory_write, memory_list (flat), core_memory_, archival_memory_, recall_memory_search (letta) |
| Context Management | offload_recall, recall_masked, context_edit |
| SubAgent & Collaboration | SubAgent (one-shot + Interactive), CreateChat, SendMessage, Worktree |
| Skill/Agent | Skill, ManageTools, search_tools |
| AI-Native Config | config (AI reads/writes config), tui_control (AI operates TUI) |
| Cards | card_create, card_add_content, card_add_interactive, card_add_container, card_preview, card_send |
| Tasks | todo_write/todo_list (cross-session persistent), Cron, task_status, task_kill, task_read |
| Management | Logs, AskUser, oauth_authorize, EventTrigger |
| Feishu MCP | 20+ tools (Bitable, Knowledge Base, Docs, Drive) |
The old ToolHook/HookChain has been replaced by Manager in agent/hooks/.
17 lifecycle events: SessionStart, SessionEnd, UserPromptSubmit, PreToolUse, PostToolUse, PostToolUseFailure, PostToolBatch, PermissionRequest, PermissionDenied, SubAgentStart, SubAgentStop, AgentStop, AgentError, PreCompact, PostCompact, CronFired, WebhookReceived
4 handler types: command (shell command), http (HTTP POST), mcp_tool (MCP tool call), callback (Go function)
Decision priority (multi-handler conflicts): deny > defer > ask > allow
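One simple way to resolve conflicting handler results is to order the decisions numerically and keep the strictest. The `Decision` type and `merge` function below are hypothetical names, and treating "no handlers" as allow is an assumption of the sketch.

```go
package main

import "fmt"

// Decision values are ordered so that a larger value is stricter:
// allow < ask < defer < deny.
type Decision int

const (
	Allow Decision = iota
	Ask
	Defer
	Deny
)

func (d Decision) String() string {
	return [...]string{"allow", "ask", "defer", "deny"}[d]
}

// merge returns the strictest decision among all handler results.
// With no handlers it returns Allow (assumption).
func merge(decisions ...Decision) Decision {
	result := Allow
	for _, d := range decisions {
		if d > result {
			result = d
		}
	}
	return result
}

func main() {
	fmt.Println(merge(Allow, Ask))        // ask
	fmt.Println(merge(Ask, Defer, Allow)) // defer
	fmt.Println(merge(Allow, Deny, Ask))  // deny
}
```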
Configuration: three-layer merge (~/.xbot/hooks.json → <project>/.xbot/hooks.json → <project>/.xbot/hooks.local.json), project-level can be committed to git.
coreTools vs sessionActivated:
| Attribute | coreTools | sessionActivated |
|---|---|---|
| Appear in tool definitions | Always | When activated and not expired |
| Expiry | Never | Auto-expire after 5 consecutive rounds unused |
| Flat mode | Same as above | Register = RegisterCore, never expires |
- TickSession(): increment round on each new user message
- TouchTool(): refresh active time before tool execution
- DeactivateSession(): cleanup when session evicted from cache
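The round-based expiry can be modelled with a counter and a last-used map. This is a sketch under one interpretation of "5 consecutive rounds unused": a tool is dropped once the current round is 5 or more past its last use. The `sessionTools` type, `Activate`, and `Active` are hypothetical names.

```go
package main

import "fmt"

// sessionTools tracks session-activated tools for a single session.
type sessionTools struct {
	round    int
	maxIdle  int            // rounds a tool may stay unused (5 in the text)
	lastUsed map[string]int // tool name -> round of last activation or use
}

func newSessionTools() *sessionTools {
	return &sessionTools{maxIdle: 5, lastUsed: map[string]int{}}
}

// Activate makes a tool visible starting from the current round.
func (s *sessionTools) Activate(name string) { s.lastUsed[name] = s.round }

// TickSession advances the round counter; call once per new user message.
func (s *sessionTools) TickSession() { s.round++ }

// TouchTool refreshes a tool's idle timer just before it executes.
func (s *sessionTools) TouchTool(name string) {
	if _, ok := s.lastUsed[name]; ok {
		s.lastUsed[name] = s.round
	}
}

// Active reports whether the tool should still appear in tool definitions.
func (s *sessionTools) Active(name string) bool {
	last, ok := s.lastUsed[name]
	return ok && s.round-last < s.maxIdle
}

// DeactivateSession drops all state, as when the session leaves the cache.
func (s *sessionTools) DeactivateSession() { s.lastUsed = map[string]int{} }

func main() {
	s := newSessionTools()
	s.Activate("mcp_demo_search")
	for i := 1; i <= 6; i++ {
		s.TickSession()
		fmt.Printf("round %d active=%v\n", i, s.Active("mcp_demo_search"))
	}
}
```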
Global MCP: GetMCPCatalog() tool name prefix mcp_{server}_{tool}.
Session MCP: SessionMCPManager per-user independent, lazy-loaded (connect on first access), unload after 30 min inactivity.
Stub mechanism: unactivated MCP tools return Parameters() = nil (LLM invisible); activated = expose full schema. search_tools can fetch full schema for semantic search.
Sandbox interface: Name, Workspace, Exec, ReadFile, WriteFile, Stat, ReadDir, MkdirAll, Close.
SandboxRouter (tools/sandbox_router.go): unified sandbox entry, routes per-user to different backends. Implements Sandbox and SandboxResolver interfaces. Routing rules (per-user, determined by user_settings.active_runner):
- active_runner == "__docker__" → DockerSandbox (if enabled)
- active_runner == specific remote runner name → corresponding RemoteSandbox connection (if connected)
- Fallback: Remote → Docker → None
Supports simultaneously holding Docker and Remote instances (dual-mode), routes independently per user. Users can switch active runner in settings panel.
Routing strategy: Remote Runner → Docker → None priority.
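The routing rules and fallback order can be sketched as a pure function. The `Router` type and its fields are hypothetical, backends are represented as strings, and picking the alphabetically first connected runner during fallback is an assumption; the text does not say which remote is chosen when several are connected.

```go
package main

import (
	"fmt"
	"sort"
)

// Router holds what is known about the available backends.
type Router struct {
	DockerEnabled bool
	Remotes       map[string]bool // runner name -> currently connected
}

// Resolve maps a user's active_runner setting to a backend label.
func (r Router) Resolve(activeRunner string) string {
	switch {
	case activeRunner == "__docker__" && r.DockerEnabled:
		return "docker"
	case activeRunner != "__docker__" && r.Remotes[activeRunner]:
		return "remote:" + activeRunner
	}

	// Fallback order: Remote -> Docker -> None.
	names := make([]string, 0, len(r.Remotes))
	for name, connected := range r.Remotes {
		if connected {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	if len(names) > 0 {
		return "remote:" + names[0]
	}
	if r.DockerEnabled {
		return "docker"
	}
	return "none"
}

func main() {
	r := Router{DockerEnabled: true, Remotes: map[string]bool{"gpu-box": true, "laptop": false}}
	for _, setting := range []string{"__docker__", "gpu-box", "laptop", ""} {
		fmt.Printf("active_runner=%q -> %s\n", setting, r.Resolve(setting))
	}
}
```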
Remote Runner / Multi-Runner: RemoteSandbox supports multiple simultaneous runner connections, each with independent name and token. Runners connect via WebSocket, token auth (subtle.ConstantTimeCompare), runners table manages tokens. Users select runner by name in settings (active_runner). Sync configuration supports syncing skills/agents directories to runner side.
DockerSandbox: per-user independent container, stop (not rm) for reuse on next session, DinD mode path translation.
RemoteSandbox: WebSocket server waits for runner connections, token auth (subtle.ConstantTimeCompare), request-response pattern, supports stdio streaming output and runner-local LLM.
1. .env file (godotenv) ← lowest
2. config.json (LoadFromFile, merge non-zero fields)
3. Env var overrides (applyEnvOverrides) ← highest
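import (
	"os"
	"path/filepath"
)

func XbotHome() string {
	dir := os.Getenv("XBOT_HOME") // priority 1: env var
	if dir == "" {
		// priority 2: ~/.xbot (home lookup via os.UserHomeDir is assumed here)
		home, _ := os.UserHomeDir()
		dir = filepath.Join(home, ".xbot")
	}
	_ = os.MkdirAll(dir, 0o755) // best-effort create; error ignored as in the original
	return dir
}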
Used for: config.json, xbot.db, offload_store/, skills/, agents/.
- User settings: user_settings table, modified via Web/Feishu/TUI settings panels
- LLM subscriptions: user_llm_subscriptions table (Server mode) or config.json subscriptions array (CLI mode); single source of truth for LLM config. Supports Model Tiers (vanguard/balance/swift) and runtime switching
- AI-Native config: config tool lets Agent read/write config directly; tui_control tool lets Agent operate the TUI
- Context mode: hot-swappable at runtime (SetContextMode)
- MaxConcurrency: adjustable at runtime (settings panel)
| Parameter | Default | Env Var |
|---|---|---|
| Max attempts | 5 | LLM_RETRY_ATTEMPTS |
| Initial delay | 1s | LLM_RETRY_DELAY |
| Max delay | 30s | LLM_RETRY_MAX_DELAY |
| Single call timeout | 120s | LLM_RETRY_TIMEOUT |
- Backoff: exponential backoff + random jitter + 429 extra backoff (2^(min(n,4))s)
- perAttemptCtx: each retry creates a fresh 120s timeout context, does not inherit parent ctx deadline, but propagates cancellation signal
- Retryable: 429 / 5xx / network errors / timeout; non-retryable: context.Canceled / other 4xx
- Input too long: detects InputTooLong, forces compression, bypasses retry framework
- Retry notification: pushes progress message via RetryNotifyFunc: ⚠️ LLM request failed (rate limit), retrying 2/5 ...
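A sketch of the delay calculation, using the defaults from the table (1s initial, 30s max). Several details are assumptions: `n` is taken to be the 1-based retry number, jitter adds up to 50% of the capped delay, and the 429 surcharge is added after the cap. The jitter source is injected so the function stays deterministic in tests.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoff returns the wait before retry number attempt (1-based).
// jitter must return a value in [0, 1).
func backoff(attempt int, initial, maxDelay time.Duration, rateLimited bool, jitter func() float64) time.Duration {
	if attempt < 1 {
		attempt = 1
	}
	// Exponential growth: initial * 2^(attempt-1), capped at maxDelay.
	d := maxDelay
	if shift := attempt - 1; shift < 20 {
		if e := initial << uint(shift); e < maxDelay {
			d = e
		}
	}
	// Random jitter of up to half the delay (assumed shape).
	d += time.Duration(jitter() * float64(d) / 2)
	// Extra wait on HTTP 429: 2^(min(n,4)) seconds.
	if rateLimited {
		n := attempt
		if n > 4 {
			n = 4
		}
		d += time.Duration(1<<uint(n)) * time.Second
	}
	return d
}

func main() {
	for attempt := 1; attempt <= 5; attempt++ {
		plain := backoff(attempt, time.Second, 30*time.Second, false, rand.Float64)
		limited := backoff(attempt, time.Second, 30*time.Second, true, rand.Float64)
		fmt.Printf("attempt %d: %v (429: %v)\n", attempt, plain, limited)
	}
}
```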
| LLM Type | Default Concurrency Cap |
|---|---|
| Global LLM | 5 |
| User Custom LLM | 3 |
| SubAgent | 3 |
Controlled by LLMSemaphoreManager per-tenant semaphore. Released immediately after Generate completes (no defer).
- execErr: error converted to tool message "Error: <details>\nPlease fix the issue and try again.", loop continues
- IsError: hint "Do NOT retry the same command. Analyze the error, fix the root cause."
- Timeout: Shell default 120s (max 600s); on timeout auto-converts to background task (BgTaskManager.Adopt/Start)
- OAuth: matches OAuth flow, returns alternative content, enters wait-for-user-interaction state
Inbound and Outbound are both buffered channels with cap=64. There is no backpressure handling: senders block when the buffer is full. Web channel’s sendToClient() uses select + default for a non-blocking write, degrading to the offline ringBuffer(50) when the client's queue is full.
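The select + default pattern with a ring-buffer fallback looks roughly like this. The `client` and `ringBuffer` types are hypothetical, messages are plain strings, and the sketch is not safe for concurrent use without a mutex around the buffer.

```go
package main

import "fmt"

// ringBuffer keeps only the newest limit items.
type ringBuffer struct {
	items []string
	limit int
}

// Push appends an item, evicting the oldest one when the buffer is full.
func (r *ringBuffer) Push(s string) {
	if r.limit <= 0 {
		return
	}
	if len(r.items) == r.limit {
		copy(r.items, r.items[1:])
		r.items = r.items[:len(r.items)-1]
	}
	r.items = append(r.items, s)
}

// client models one web connection with a bounded send queue.
type client struct {
	send    chan string
	offline *ringBuffer
}

// sendToClient never blocks: if the queue is full, the message is parked
// in the offline buffer and false is returned.
func (c *client) sendToClient(msg string) bool {
	select {
	case c.send <- msg:
		return true
	default:
		c.offline.Push(msg)
		return false
	}
}

func main() {
	c := &client{send: make(chan string, 2), offline: &ringBuffer{limit: 50}}
	for i := 1; i <= 4; i++ {
		msg := fmt.Sprintf("msg-%d", i)
		fmt.Println(msg, "queued:", c.sendToClient(msg))
	}
	fmt.Println("offline:", c.offline.items)
}
```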
Tokens stored as plaintext by default (DB colocated with server process). If XBOT_ENCRYPTION_KEY is set, encryption still applies (AES-256-GCM, backward compatible with old data).
Encrypted data: OAuth access_token/refresh_token, user LLM api_key.
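AES-256-GCM with a pass-through for legacy plaintext can be sketched with the standard library. The storage format is an assumption made for the example: a hypothetical `enc:` prefix marks ciphertext, the random nonce is prepended to the sealed data, and the result is base64-encoded. xbot's actual on-disk format may differ.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
	"io"
	"strings"
)

// encPrefix is a hypothetical marker that distinguishes ciphertext from old plaintext rows.
const encPrefix = "enc:"

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt seals plaintext with a fresh random nonce.
func encrypt(key []byte, plaintext string) (string, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return "", err
	}
	sealed := gcm.Seal(nonce, nonce, []byte(plaintext), nil) // nonce || ciphertext || tag
	return encPrefix + base64.StdEncoding.EncodeToString(sealed), nil
}

// decrypt returns unprefixed values unchanged so that old plaintext data keeps working.
func decrypt(key []byte, stored string) (string, error) {
	if !strings.HasPrefix(stored, encPrefix) {
		return stored, nil
	}
	raw, err := base64.StdEncoding.DecodeString(strings.TrimPrefix(stored, encPrefix))
	if err != nil {
		return "", err
	}
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	if len(raw) < gcm.NonceSize() {
		return "", errors.New("ciphertext too short")
	}
	plain, err := gcm.Open(nil, raw[:gcm.NonceSize()], raw[gcm.NonceSize():], nil)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}

func main() {
	key := make([]byte, 32) // demo key only; a real key would come from XBOT_ENCRYPTION_KEY
	enc, _ := encrypt(key, "refresh-token")
	dec, _ := decrypt(key, enc)
	fmt.Println(strings.HasPrefix(enc, encPrefix), dec)
}
```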
| Mechanism | Implementation |
|---|---|
| Web login | bcrypt hash, in-memory sessions (30 days), HttpOnly + SameSite=Lax |
| Runner token | 256-bit random, subtle.ConstantTimeCompare, dual verification (tokenStore + authToken) |
| Admin | Single ADMIN_CHAT_ID (fallback to STARTUP_NOTIFY_CHAT_ID), Logs tool permission check |
| AllowFrom | Exact match whitelist (comma-separated), empty = allow all |
| OAuth CSRF | 16-byte random state token |
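Generating a 256-bit token and comparing it in constant time takes only a few lines with the standard library. The helper names and the hex encoding are assumptions for the sketch. Note that `subtle.ConstantTimeCompare` returns 0 immediately when the lengths differ, so only the content of equal-length tokens is protected from timing analysis.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// newToken returns 32 random bytes (256 bits) as a hex string.
func newToken() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// tokenMatches compares two tokens without leaking where they first differ.
func tokenMatches(presented, expected string) bool {
	return subtle.ConstantTimeCompare([]byte(presented), []byte(expected)) == 1
}

func main() {
	token, err := newToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("length:", len(token))
	fmt.Println("same token:", tokenMatches(token, token))
	fmt.Println("wrong token:", tokenMatches("guess", token))
}
```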
- PathGuard (internal/runnerclient/pathguard.go): filepath.EvalSymlinks resolves symlinks, then checks prefix
- Docker container names: regex validation ^[a-z0-9][a-z0-9_.-]{0,127}$
- Process groups: Setpgid=true, kill entire process group on timeout
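The first two checks can be sketched as follows. The container-name pattern is taken from the list above; `insideRoot` is a hypothetical helper that swaps the plain prefix check for `filepath.Rel`, which avoids the classic pitfall where `/data/workspace-evil` passes a string-prefix test against `/data/workspace`. Both paths must exist, because `filepath.EvalSymlinks` fails on missing files.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// containerName is the validation pattern quoted in the text.
var containerName = regexp.MustCompile(`^[a-z0-9][a-z0-9_.-]{0,127}$`)

// resolve makes a path absolute and follows every symlink in it.
func resolve(p string) (string, error) {
	abs, err := filepath.Abs(p)
	if err != nil {
		return "", err
	}
	return filepath.EvalSymlinks(abs)
}

// insideRoot reports whether target, after symlink resolution, is root or lies below it.
func insideRoot(root, target string) (bool, error) {
	realRoot, err := resolve(root)
	if err != nil {
		return false, err
	}
	realTarget, err := resolve(target)
	if err != nil {
		return false, err
	}
	rel, err := filepath.Rel(realRoot, realTarget)
	if err != nil {
		return false, err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return false, nil
	}
	return true, nil
}

func main() {
	for _, name := range []string{"xbot-user_1", "Xbot", "-leading-dash"} {
		fmt.Printf("%q valid=%v\n", name, containerName.MatchString(name))
	}
	ok, err := insideRoot(".", ".")
	fmt.Println("cwd inside cwd:", ok, err)
}
```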
Layer 1: Global concurrency semaphore (MaxConcurrency=3)
Layer 2: Per-Chat queue (cap=32, drop when full)
Layer 2.5: Per-user semaphore (custom LLM users, cap=1)
Layer 3: LLM Per-Tenant semaphore (Global=5, Personal=3, SubAgent=3)
Layer 4: LLM retry + single timeout (120s × 5 = max 600s)
Layer 5: Tool timeout (120s) + background task (24h safety ceiling)
| Component | Timeout |
|---|---|
| Shell tool | default 120s, max 600s |
| Docker commands | 30s (slow ops 120s) |
| Remote exec | 60s + 5s |
| Remote WebSocket | Ping 30s, Pong 60s |
| Web HTTP | Read 10s, Write 60s, Idle 120s |
| Web WebSocket | Ping 30s, ReadDeadline 60s |
| Background tasks | 24h safety ceiling, 50KB output truncation |
| LLM single call | 120s |
| LLM single call (Anthropic HTTP Client) | 300s |
- Development Guide — how to contribute
- Configuration — all config fields
- Built-in Tools — tool system details