Technical Architecture
CueMate adopts a modern microservices architecture to deliver a high-performance, highly available, and easily extensible intelligent interview training tool.
1. Overall Architecture Diagram
2. Layered Architecture
2.1 User Layer
2.1.1 Desktop Installer
macOS Platform:
Installation Package Types:
- Online Package (~670MB): Requires network connection to pull Docker images during installation
- Offline Package (~4.4GB): Includes all Docker images, ready to use out of the box
Core Responsibilities:
- Guide users through the initial installation
- Detect and install Docker Desktop
- Automatically deploy backend Docker services
- Manage system version updates
Workflow:
- Detect system environment (macOS version, chip architecture, available space)
- Select deployment mode (Local Mode/Distributed Mode)
- Check Docker Desktop status (not installed/installed/needs update)
- Detect port occupation (3001, 3002, 3003, 3004, 8000, 10095)
- Pull Docker images and start services (or configure remote server connection)
- Verify service health status
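The port-detection step above can be sketched as a small helper. The port list matches the six services named in this document; `findPortConflicts` is a hypothetical name, and the real installer would probe live sockets rather than take a precomputed set:

```typescript
// The six ports the installer checks before deploying services.
const REQUIRED_PORTS = [3001, 3002, 3003, 3004, 8000, 10095];

// Given the set of ports already in use on the machine, return the ones
// that would conflict with CueMate's services. (Illustrative sketch only;
// an installer would discover busy ports via the OS, e.g. by binding sockets.)
function findPortConflicts(
  busyPorts: Set<number>,
  required: number[] = REQUIRED_PORTS,
): number[] {
  return required.filter((port) => busyPorts.has(port));
}
```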
Windows Platform:
Under Development
The Windows version is currently under development; stay tuned.
If you have any suggestions or requirements for the Windows version, feel free to provide feedback through:
- GitHub Issues: https://github.com/cuemate-chat/cuemate/issues
- Email: nuneatonhydroplane@gmail.com
2.1.2 Desktop Client
macOS Platform:
Core Features:
- Global shortcuts and floating window
- Microphone audio capture
- System audio capture (AudioTee)
- Real-time speech recognition display
- Local text-to-speech (Piper TTS)
- System tray integration
Data Storage:
- Application data: ~/Library/Application Support/cuemate-desktop-client
- SQLite database: ~/Library/Application Support/cuemate-desktop-client/data/sqlite/cuemate.db
- Log files: ~/Library/Application Support/cuemate-desktop-client/data/logs
Windows Platform:
Under Development
The Windows version is currently under development; stay tuned.
If you have any suggestions or requirements for the Windows version, feel free to provide feedback through:
- GitHub Issues: https://github.com/cuemate-chat/cuemate/issues
- Email: nuneatonhydroplane@gmail.com
2.1.3 Main Window Application
Core Features:
- User registration and login
- Model configuration management
- Knowledge base document upload
- Preset question bank management
- Interview record viewing
- System settings configuration
- Data statistics analysis
2.2 Gateway Layer
2.2.1 Nginx Reverse Proxy
Runtime: Docker container (cuemate-web)
Port: 3004
Responsibilities:
- Serve web frontend static files
- API request routing and forwarding
- WebSocket connection proxy
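As an illustration, a minimal Nginx server block for this layer might look like the following. The upstream container names and location paths are assumptions for the sketch, not CueMate's actual configuration:

```nginx
server {
    listen 3004;

    # Serve the web frontend's static build
    root /usr/share/nginx/html;

    # Forward REST API calls to the Web API container (service name assumed)
    location /api/ {
        proxy_pass http://cuemate-web-api:3001;
    }

    # Proxy WebSocket connections, e.g. for speech recognition
    location /ws/ {
        proxy_pass http://cuemate-asr:10095;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```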
2.3 Application Service Layer
2.3.1 Web API Service
Runtime: Docker container (cuemate-web-api)
Port: 3001
Core Responsibilities:
- User authentication and authorization (JWT Token)
- Business logic processing
- Data persistence (SQLite)
- REST API interface provision
Main Functional Modules:
- User management (login, profile)
- Model configuration management (add, edit, delete, test)
- Knowledge base management (document upload, classification, retrieval)
- Interview record management (create, query, statistics)
- System settings management (notifications, theme, language)
2.3.2 LLM Router Service
Runtime: Docker container (cuemate-llm-router)
Port: 3002
Core Responsibilities:
- Unified LLM API interface
- Multi-model provider adaptation (24 providers)
- Streaming response handling (Server-Sent Events)
- Basic error handling and status monitoring
- Request timeout control
Supported LLM Providers: 24
- International Providers: OpenAI, Azure OpenAI, Anthropic, Google Gemini, AWS Bedrock
- Domestic Providers: Alibaba Cloud Bailian, Qwen (Tongyi Qianwen), Zhipu AI, Baichuan Intelligence, Baidu Qianfan, ByteDance Doubao, iFlytek Spark, Tencent Hunyuan, Tencent Cloud Knowledge Engine, Moonshot (Kimi), MiniMax, DeepSeek, SenseTime SenseNova, StepFun, SiliconFlow
- Local Models: Ollama, vLLM, Xinference, Regolo
Working Mechanism:
- Receive provider and model parameters from frontend
- Select corresponding adapter based on provider
- Call the respective LLM API and return results
- Log error information and status on failure
- Support both streaming and non-streaming call modes
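The adapter-selection step above follows a classic dispatch pattern. This is a minimal sketch, not CueMate's actual router code: the adapter interface, the `selectAdapter` name, and the reduced provider table are illustrative (the endpoints shown are the public OpenAI and default Ollama URLs):

```typescript
// Hypothetical shapes for a request and a provider adapter.
type ChatRequest = { provider: string; model: string; prompt: string };

interface LlmAdapter {
  endpoint(model: string): string;
}

// Two of the 24 providers, for illustration; real adapters would also
// shape auth headers, request bodies, and streaming behavior per provider.
const adapters: Record<string, LlmAdapter> = {
  openai: { endpoint: (_model) => "https://api.openai.com/v1/chat/completions" },
  ollama: { endpoint: (_model) => "http://localhost:11434/api/chat" },
};

// Pick the adapter matching the request's provider, or fail loudly.
function selectAdapter(req: ChatRequest): LlmAdapter {
  const adapter = adapters[req.provider];
  if (!adapter) throw new Error(`Unsupported provider: ${req.provider}`);
  return adapter;
}
```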
2.3.3 RAG Service
Runtime: Docker container (cuemate-rag-service)
Port: 3003
Core Responsibilities:
- Document parsing and chunking
- Text vectorization
- Semantic retrieval
- Answer enhancement generation
- Knowledge base version management
Workflow:
- Receive document upload (PDF, DOCX, Markdown, plain text)
- Extract text content and intelligent chunking
- Generate vectors using embedding model
- Store in ChromaDB vector database
- Provide semantic retrieval interface
- Enhance LLM answers with retrieval results
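The chunking step in the workflow above can be sketched as follows. This is a naive illustration, not the RAG service's actual algorithm: it splits on sentence boundaries and carries a character overlap between chunks to preserve context; the function name and parameter defaults are assumptions:

```typescript
// Split text into chunks of at most roughly maxLen characters, breaking at
// sentence boundaries (Latin and CJK punctuation) and carrying `overlap`
// trailing characters into the next chunk for context continuity.
function chunkText(text: string, maxLen = 500, overlap = 50): string[] {
  const sentences = text.split(/(?<=[.!?。！？])\s+/);
  const chunks: string[] = [];
  let current = "";
  for (const s of sentences) {
    if (current && current.length + s.length > maxLen) {
      chunks.push(current.trim());
      current = current.slice(-overlap); // overlap window from previous chunk
    }
    current += (current ? " " : "") + s;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

A production chunker would also respect document structure (headings, paragraphs) rather than raw sentence boundaries alone.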
2.4 Data Layer
2.4.1 SQLite Database
Location: Host file system
Storage Path: ~/Library/Application Support/cuemate-desktop-client/data/sqlite/cuemate.db
Stored Content:
- User account information
- Model configuration information
- Interview record data
- Knowledge base metadata
- System configuration parameters
2.4.2 ChromaDB Vector Database
Runtime: Docker container (cuemate-chroma)
Port: 8000
Storage Method: Docker Volume (chroma_data)
Stored Content:
- Document vector indexes
- Document original content
- Document metadata
- Similarity retrieval cache
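Semantic retrieval over these stored vectors ranks documents by similarity to the query embedding. Cosine similarity is one of the distance metrics ChromaDB supports; a sketch of the computation:

```typescript
// Cosine similarity between two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|), in [-1, 1]; 1 means identical direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```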
2.4.3 External LLM APIs
Call Method: HTTP/HTTPS API requests
Supported Services: 24 providers in total
International Providers (5):
- OpenAI (GPT series)
- Anthropic (Claude series)
- Azure OpenAI (Microsoft Azure hosted OpenAI models)
- Google Gemini (Google AI platform)
- AWS Bedrock (AWS multi-model hosting platform)
Domestic Providers (15):
- Moonshot (Kimi)
- Alibaba Cloud Bailian
- Qwen (Tongyi Qianwen)
- Zhipu AI (GLM series)
- DeepSeek
- Baidu Qianfan
- ByteDance Doubao (Volcengine)
- iFlytek Spark
- Tencent Hunyuan
- Tencent Cloud Knowledge Engine
- MiniMax
- StepFun
- SenseTime SenseNova
- Baichuan Intelligence
- SiliconFlow
Local Model Services (4):
- Ollama (local model runtime)
- vLLM (high-performance inference engine)
- Xinference (Xorbits inference framework)
- Regolo (local model service)
2.5 External Service Layer
2.5.1 Speech Recognition Service
Runtime: Docker container (cuemate-asr)
Port: 10095
Communication Protocol: WebSocket
Core Features:
- Runs locally, no cloud API required
- Supports Chinese and English recognition
- Real-time streaming recognition
- Low latency (< 200ms)
3. Data Flow
3.1 System Installation Flow
1. Download Installation Package
User → Official Website/Baidu Netdisk/GitHub Releases → Download DMG
2. Launch Installer
User → Open DMG → Drag to Applications folder → Launch
3. Environment Detection
Installer → Detect system environment (OS version, chip architecture, available disk space)
4. Docker Detection
Installer → Check Docker Desktop status → Guide installation if not installed
5. Port Detection
Installer → Check 6 port occupations → Prompt to resolve conflicts
6. Service Deployment
Installer → Pull Docker images → Start 6 containers
7. Health Check
Installer → Verify service status → Show installation complete
8. Launch Application
User → Open desktop client → Start using
3.2 Real-time Interview Training Flow
1. Audio Capture
Desktop Client → Capture microphone/system audio
2. Speech Recognition
Audio stream → WebSocket → cuemate-asr (10095) → Real-time text transcription
3. Question Understanding
Transcribed text → LLM Router (3002) → Extract question intent
4. Knowledge Retrieval
Question → RAG Service (3003) → ChromaDB (8000) → Relevant document fragments
5. Answer Generation
Question + Context → LLM Router → External LLM API → Generate answer
6. Streaming Return
LLM streaming output → Server-Sent Events → Desktop Client → Real-time display
7. Data Persistence
Interview record → Web API (3001) → SQLite database → Save
3.3 Knowledge Base Management Flow
1. Document Upload
User → Main Window Application (3004) → Upload PDF/Word/Markdown files
2. File Parsing
File → Web API (3001) → Save to temporary directory
3. Document Processing
File path → RAG Service (3003) → Extract text content
4. Intelligent Chunking
Long text → Chunk by semantic boundaries → Maintain context integrity
5. Vectorization
Text chunks → Embedding model → Generate vector representations
6. Store Index
Vectors + Original text + Metadata → ChromaDB (8000) → Persistent storage
7. Metadata Management
Document info (title, category, upload time) → Web API → SQLite database
8. Retrieval Validation
Test query → Verify retrieval effectiveness → Adjust parameters
3.4 Version Update Flow
1. Check for Updates
Desktop Client → Periodically check update server → Discover new version
2. Download Update Package
Desktop Client → Download from CDN/GitHub → Save to temporary directory
3. Verify Integrity
Update package → SHA256 verification → Ensure file integrity
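The integrity check in step 3 amounts to hashing the downloaded bytes and comparing against a published digest. A minimal sketch using Node's built-in crypto module (function names are illustrative):

```typescript
import { createHash } from "node:crypto";

// Compute the SHA256 digest of data as a lowercase hex string.
function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compare against the expected digest. (A hardened implementation would use
// a constant-time comparison such as crypto.timingSafeEqual.)
function verifyIntegrity(data: Buffer | string, expectedHex: string): boolean {
  return sha256Hex(data) === expectedHex.toLowerCase();
}
```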
4. Stop Services
Installer → Stop Docker containers → Backup current configuration
5. Replace Files
Installer → Extract update package → Replace application files and Docker images
6. Start Services
Installer → Start new version containers → Verify health status
7. Data Migration
Installer → Execute database migration scripts → Update version number
8. Restart Application
Desktop Client → Restart → Display new version
4. Technical Features
4.1 Microservices Architecture
Advantages:
- Independent service deployment, no mutual interference
- Horizontal scaling on demand, add service instances
- Fault isolation, single service failure doesn't affect the whole
- Flexible technology stack, choose the most suitable language and framework as needed
4.2 Containerized Deployment
Implementation: Docker + Docker Compose
Advantages:
- One-click start all services
- Environment consistency guarantee (development, testing, production)
- Resource isolation and limits
- Fast rollback and version switching
Service List:
- cuemate-web (Web frontend + Nginx)
- cuemate-web-api (Business API)
- cuemate-llm-router (LLM router)
- cuemate-rag-service (Knowledge base retrieval)
- cuemate-asr (Speech recognition)
- cuemate-chroma (Vector database)
4.3 Deployment Modes
Local Mode:
- Docker services and desktop client both run on local macOS
- Suitable for personal users
- Simple setup, works out of the box
- Data stored locally, maximum privacy
Distributed Mode:
- Docker services deployed to remote Linux server
- Desktop client runs on local macOS
- Connect via SSH (supports password or private key authentication)
- Suitable for team sharing or high-performance requirements
Deployment Comparison:
| Feature | Local Mode | Distributed Mode |
|---|---|---|
| Service Location | Local macOS | Remote Linux Server |
| Network Requirement | LAN only | Internet/Intranet |
| Hardware Requirement | High (runs Docker services locally) | Low (client only) |
| Data Location | Local | Server |
| Suitable For | Individual users | Team sharing |
4.4 Real-time Communication
Technical Solutions:
- WebSocket - Bidirectional real-time communication (speech recognition)
- Server-Sent Events - Unidirectional streaming push (LLM answers)
- HTTP/HTTPS - Standard API requests
Application Scenarios:
- Real-time speech-to-text (WebSocket)
- Streaming answer generation (SSE)
- Data query and modification (HTTP)
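On the SSE side, the client reads a stream of `data:` lines and reassembles the answer. A minimal parser for one received chunk (the `[DONE]` sentinel is a common end-of-stream convention, assumed here rather than confirmed for CueMate's protocol):

```typescript
// Extract the payloads from the `data:` lines of a Server-Sent Events chunk,
// dropping the conventional "[DONE]" end-of-stream marker.
function parseSseChunk(chunk: string): string[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice("data:".length).trim())
    .filter((data) => data !== "[DONE]");
}
```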
4.5 Hybrid Storage
Storage Solutions:
- SQLite - Structured data (users, configurations, records)
- ChromaDB - Vector data (document embeddings, semantic retrieval)
- File system - Log files, temporary files, uploaded files
Data Paths:
- SQLite: ~/Library/Application Support/cuemate-desktop-client/data/sqlite/
- ChromaDB: Docker Volume chroma_data
- Logs: ~/Library/Application Support/cuemate-desktop-client/data/logs/
4.6 Security Design
Security Measures:
- API Key encrypted storage (AES-256)
- JWT Token authentication
- HTTPS encrypted transmission (production environment)
- Fine-grained permission control (RBAC)
- Sensitive data stored locally (not uploaded to cloud)
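The AES-256 key storage mentioned above could be implemented along these lines. This sketch uses AES-256-GCM with an scrypt-derived key as one plausible approach; the fixed salt, function names, and encoding scheme are illustrative assumptions, not CueMate's actual design:

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Encrypt an API key with AES-256-GCM. Output: iv:authTag:ciphertext (hex).
// NOTE: the hard-coded salt is for illustration only; real code would use a
// per-installation random salt stored alongside the ciphertext.
function encryptKey(plaintext: string, passphrase: string): string {
  const key = scryptSync(passphrase, "illustrative-salt", 32);
  const iv = randomBytes(12); // GCM standard nonce length
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return [iv, cipher.getAuthTag(), ciphertext].map((b) => b.toString("hex")).join(":");
}

// Reverse the process; GCM's auth tag makes tampering detectable.
function decryptKey(blob: string, passphrase: string): string {
  const [ivHex, tagHex, dataHex] = blob.split(":");
  const key = scryptSync(passphrase, "illustrative-salt", 32);
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(ivHex, "hex"));
  decipher.setAuthTag(Buffer.from(tagHex, "hex"));
  return Buffer.concat([
    decipher.update(Buffer.from(dataHex, "hex")),
    decipher.final(),
  ]).toString("utf8");
}
```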
4.7 Observability
Monitoring Solutions:
- Unified log collection (categorized by level: info, warn, error)
- Service health checks (periodic service status probing)
- Error tracking and alerting (automatic exception logging)
- Usage statistics analysis (API call counts, token consumption)
5. Performance Optimization
5.1 Caching Strategy
Cached Content:
- LLM response cache (return directly for same questions, save costs)
- Vector retrieval result cache (accelerate repeated queries)
- Static file CDN acceleration (improve page load speed)
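A response cache like the one described above can be sketched as a map with TTL-based expiry. The class name, key scheme, and injectable clock are illustrative; a real cache would also bound its size and key on provider/model/prompt together:

```typescript
// TTL cache: entries expire ttlMs after insertion. The `now` parameters
// allow injecting a clock for testing; they default to wall time.
class ResponseCache {
  private store = new Map<string, { value: string; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  get(key: string, now = Date.now()): string | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now > entry.expiresAt) {
      this.store.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string, now = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```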
5.2 Asynchronous Processing
Asynchronous Tasks:
- Document vectorization (avoid blocking user operations)
- Log writing (batch writing, reduce IO)
5.3 Resource Limits
Limit Measures:
- Docker container resource quotas (CPU, memory)
- API request rate limiting (prevent abuse)
- File upload size limits (prevent server storage exhaustion)
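API rate limiting is commonly implemented with a token bucket; a minimal sketch (capacity and refill rate are illustrative parameters, and the injectable clock is for testability):

```typescript
// Token bucket: holds up to `capacity` tokens, refilled continuously at
// `refillPerSec`. Each request consumes one token; requests without a
// token available are rejected.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    now = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  tryAcquire(now = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```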
6. Extensibility Design
6.1 Horizontal Scaling
Scalable Services:
- LLM Router - Deploy multiple instances, Nginx load balancing
- RAG Service - Deploy multiple instances, parallel processing
- Web API - Deploy multiple instances, distribute request load
6.2 Vertical Scaling
Scaling Methods:
- Increase Docker container resource quotas
- Use more powerful servers
- GPU acceleration (vectorization and model inference)
6.3 Plugin System
Extension Capabilities:
- Support integrating third-party LLM providers
- Support custom embedding models
- Support custom prompt templates
- Support custom speech recognition engines
7. Technical Acknowledgments
CueMate's creation would not have been possible without the support of numerous excellent open-source projects and technical communities. We express our sincere gratitude to all developers and teams who have contributed to these technologies!
7.1 Frontend Technologies
Web Application Frameworks and Build Tools:
- React - Modern frontend framework open-sourced by Meta, making UI development more efficient
- TypeScript - JavaScript superset developed by Microsoft, providing a powerful type system
- Vite - Next-generation frontend build tool created by Evan You, excellent development experience
- Ant Design - Enterprise-grade UI component library open-sourced by Ant Group, beautifully designed and fully featured
- Tailwind CSS - Atomic CSS framework, making style development more flexible and efficient
Desktop Application Framework:
- Electron - Cross-platform desktop application framework open-sourced by GitHub, making it possible to build native apps with web technologies
7.2 Backend Technologies
Runtime and Frameworks:
- Node.js - High-performance JavaScript runtime, making backend development simpler
- Fastify - Ultra-fast web framework, excellent performance and rich plugin ecosystem
- pnpm - Fast, disk-space-efficient package manager
Databases:
- SQLite - The world's most widely used embedded database engine
- better-sqlite3 - High-performance Node.js SQLite3 binding library
- ChromaDB - Open-source vector database providing semantic retrieval capabilities for AI applications
7.3 AI Services
Speech Recognition:
- FunASR - Speech recognition toolkit open-sourced by Alibaba DAMO Academy, supporting real-time streaming recognition
- Piper TTS - Fast local neural network text-to-speech system
Audio Processing:
- AudioTee - System audio capture tool
Large Language Model Providers: thanks to all 24 supported providers listed in Section 2.4.3 for making multi-model support possible.
7.4 Development and Deployment Tools
Containerization and Orchestration:
- Docker - Containerization platform, making application deployment simpler and more reliable
- Docker Compose - Multi-container application orchestration tool
- Nginx - High-performance web server and reverse proxy
Document Parsing:
- pdf2json - PDF file parsing library
- mammoth.js - Word document parsing library
7.5 Open Source Community
Special thanks to:
- GitHub - Providing code hosting and collaboration platform
- npm - JavaScript package management ecosystem
- Stack Overflow - Developer community, solving countless technical challenges
- All contributors who submit Issues and Pull Requests on GitHub
7.6 Acknowledgment Statement
CueMate is built on the shoulders of giants. Every maintainer of open-source projects, every developer contributing code, and every community member providing technical support are important forces enabling CueMate's creation.
Our commitments:
- Comply with all open-source project license requirements
- Actively participate in the open-source community, give back to the technology ecosystem
- Continuously improve CueMate to provide better service to users
Once again, our most sincere thanks to all open-source projects and technical communities!
Related Pages
- System Requirements - Learn about runtime environment requirements
- Installation Guide - Start installing CueMate
- Feature Introduction - Learn about core features
- FAQ - Solve usage problems
