
Technical Architecture

CueMate adopts a modern microservices architecture to deliver a high-performance, highly available, and easily extensible intelligent interview training tool.

1. Overall Architecture Diagram

The architecture diagram comprises the following layers and components:

User Layer:

  • Desktop Installer (Electron): system detection, Docker management, service deployment, version updates
  • Desktop Client (Electron): audio capture, real-time recognition, floating window, local storage
  • Main Window App (React): account management, system config, data statistics, knowledge base

The user layer talks to the backend over WebSocket/HTTP/HTTPS, and the installer manages the Docker services.

Gateway Layer (Nginx)

Application Service Layer:

  • Web API (3001): user auth, business logic, data management
  • LLM Router (3002): model routing, streaming response, fallback strategy
  • RAG Service (3003): doc vectorization, semantic retrieval, answer enhancement

Data Layer:

  • SQLite database: user data, interview records, KB metadata, system config
  • ChromaDB vector database: vector index, document storage, similarity retrieval
  • External LLMs: OpenAI, Claude, Gemini, Chinese LLMs, local models

Voice Recognition Service:

  • cuemate-asr (10095): local voice recognition, Chinese/English support, real-time streaming

2. Layered Architecture

2.1 User Layer

2.1.1 Desktop Installer

macOS Platform:

Installation Package Types:

  • Online Package (~670MB): Requires network connection to pull Docker images during installation
  • Offline Package (~4.4GB): Includes all Docker images, ready to use out of the box

Core Responsibilities:

  1. Guide users through the initial installation
  2. Detect and install Docker Desktop
  3. Automatically deploy backend Docker services
  4. Manage system version updates

Workflow:

  1. Detect system environment (macOS version, chip architecture, available space)
  2. Select deployment mode (Local Mode/Distributed Mode)
  3. Check Docker Desktop status (not installed/installed/needs update)
  4. Detect port occupation (3001, 3002, 3003, 3004, 8000, 10095)
  5. Pull Docker images and start services (or configure remote server connection)
  6. Verify service health status
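
Step 4 of this workflow reduces to a simple check. A pure helper can sketch it (illustrative only; in practice the installer would discover the occupied set by attempting to bind each port, e.g. with node:net):

```typescript
// Ports required by the CueMate services (from the workflow above).
const REQUIRED_PORTS = [3001, 3002, 3003, 3004, 8000, 10095];

// Given the set of ports already in use on the machine, return the
// required ports that conflict and must be freed before deployment.
function findPortConflicts(required: number[], occupied: Set<number>): number[] {
  return required.filter((port) => occupied.has(port));
}
```

The installer would then prompt the user to resolve each returned conflict before pulling images and starting containers.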

Windows Platform:

Under Development

The Windows version is currently under development, please stay tuned.

If you have any suggestions or requirements for the Windows version, we welcome your feedback.

2.1.2 Desktop Client

macOS Platform:

Core Features:

  1. Global shortcuts and floating window
  2. Microphone audio capture
  3. System audio capture (AudioTee)
  4. Real-time speech recognition display
  5. Local text-to-speech (Piper TTS)
  6. System tray integration

Data Storage:

  • Application data: ~/Library/Application Support/cuemate-desktop-client
  • SQLite database: ~/Library/Application Support/cuemate-desktop-client/data/sqlite/cuemate.db
  • Log files: ~/Library/Application Support/cuemate-desktop-client/data/logs

Windows Platform:

Under Development

The Windows version is currently under development, please stay tuned.

If you have any suggestions or requirements for the Windows version, we welcome your feedback.

2.1.3 Main Window Application

Core Features:

  1. User registration and login
  2. Model configuration management
  3. Knowledge base document upload
  4. Preset question bank management
  5. Interview record viewing
  6. System settings configuration
  7. Data statistics analysis

2.2 Gateway Layer

2.2.1 Nginx Reverse Proxy

Runtime: Docker container (cuemate-web)

Port: 3004

Responsibilities:

  1. Serve web frontend static files
  2. API request routing and forwarding
  3. WebSocket connection proxy
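
The three responsibilities above might map to an nginx configuration along these lines (an illustrative fragment; the container names and ports come from this document, while the location paths are assumptions):

```nginx
server {
    listen 3004;

    # 1. Serve web frontend static files
    location / {
        root /usr/share/nginx/html;
        try_files $uri $uri/ /index.html;
    }

    # 2. Forward API requests to the Web API container
    location /api/ {
        proxy_pass http://cuemate-web-api:3001;
    }

    # 3. Proxy WebSocket connections (e.g. speech recognition)
    location /ws/ {
        proxy_pass http://cuemate-asr:10095;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```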

2.3 Application Service Layer

2.3.1 Web API Service

Runtime: Docker container (cuemate-web-api)

Port: 3001

Core Responsibilities:

  1. User authentication and authorization (JWT Token)
  2. Business logic processing
  3. Data persistence (SQLite)
  4. REST API interface provision

Main Functional Modules:

  1. User management (login, profile)
  2. Model configuration management (add, edit, delete, test)
  3. Knowledge base management (document upload, classification, retrieval)
  4. Interview record management (create, query, statistics)
  5. System settings management (notifications, theme, language)
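
The JWT authentication listed under core responsibilities can be sketched with Node's built-in crypto. Below is a minimal HS256 sign/verify pair (illustrative, not CueMate's actual implementation, which would typically also carry expiry and issuer claims):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const b64url = (buf: Buffer): string =>
  buf.toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");

// Sign a payload as an HS256 JWT: header.payload.signature.
function signToken(payload: object, secret: string): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const body = b64url(Buffer.from(JSON.stringify(payload)));
  const sig = b64url(createHmac("sha256", secret).update(`${header}.${body}`).digest());
  return `${header}.${body}.${sig}`;
}

// Recompute the signature and compare in constant time;
// return the payload, or null if the token was tampered with.
function verifyToken(token: string, secret: string): object | null {
  const [header, body, sig] = token.split(".");
  const expected = b64url(createHmac("sha256", secret).update(`${header}.${body}`).digest());
  const a = Buffer.from(sig), b = Buffer.from(expected);
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null;
  return JSON.parse(Buffer.from(body, "base64").toString());
}
```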

2.3.2 LLM Router Service

Runtime: Docker container (cuemate-llm-router)

Port: 3002

Core Responsibilities:

  1. Unified LLM API interface
  2. Multi-model provider adaptation (24 providers)
  3. Streaming response handling (Server-Sent Events)
  4. Basic error handling and status monitoring
  5. Request timeout control

Supported LLM Providers: 24

  • International Providers: OpenAI, Azure OpenAI, Anthropic, Google Gemini, AWS Bedrock
  • Domestic Providers: Alibaba Cloud Bailian, Qwen (Tongyi Qianwen), Zhipu AI, Baichuan Intelligence, Baidu Qianfan, ByteDance Doubao, iFlytek Spark, Tencent Hunyuan, Tencent Cloud Knowledge Engine, Moonshot (Kimi), MiniMax, DeepSeek, SenseTime SenseNova, StepFun, SiliconFlow
  • Local Models: Ollama, vLLM, Xinference, Regolo

Working Mechanism:

  1. Receive provider and model parameters from frontend
  2. Select corresponding adapter based on provider
  3. Call the respective LLM API and return results
  4. Log error information and status on failure
  5. Support both streaming and non-streaming call modes
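
Steps 1 and 2 of this mechanism amount to a provider-to-adapter lookup. A minimal sketch (the provider names come from this document; the adapter shape and stub endpoints are assumptions — real adapters would issue HTTP requests):

```typescript
type ChatRequest = { model: string; prompt: string; stream?: boolean };
type Adapter = (req: ChatRequest) => string; // stub: return the call that would be made

// Registry covering a few of the 24 supported providers (illustrative stubs).
const adapters: Record<string, Adapter> = {
  openai: (req) => `POST https://api.openai.com/v1/chat/completions model=${req.model}`,
  anthropic: (req) => `POST https://api.anthropic.com/v1/messages model=${req.model}`,
  ollama: (req) => `POST http://localhost:11434/api/chat model=${req.model}`,
};

// Select the adapter by provider name; fail loudly for unsupported providers
// so the error can be logged and surfaced (step 4 of the mechanism).
function route(provider: string, req: ChatRequest): string {
  const adapter = adapters[provider.toLowerCase()];
  if (!adapter) throw new Error(`Unsupported provider: ${provider}`);
  return adapter(req);
}
```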

2.3.3 RAG Service

Runtime: Docker container (cuemate-rag-service)

Port: 3003

Core Responsibilities:

  1. Document parsing and chunking
  2. Text vectorization
  3. Semantic retrieval
  4. Answer enhancement generation
  5. Knowledge base version management

Workflow:

  1. Receive document upload (PDF, DOCX, Markdown, plain text)
  2. Extract text content and intelligent chunking
  3. Generate vectors using embedding model
  4. Store in ChromaDB vector database
  5. Provide semantic retrieval interface
  6. Enhance LLM answers with retrieval results
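
Step 2's chunking can be sketched as packing paragraphs into size-bounded chunks (a toy version; the chunk size and boundary rules here are assumptions, not CueMate's actual parameters):

```typescript
// Split text on paragraph boundaries (blank lines) and pack consecutive
// paragraphs into chunks of at most maxLen characters, so each chunk
// keeps whole paragraphs and some local context.
function chunkText(text: string, maxLen = 500): string[] {
  const paras = text.split(/\n{2,}/).map((p) => p.trim()).filter(Boolean);
  const chunks: string[] = [];
  let current = "";
  for (const p of paras) {
    if (current && current.length + p.length + 2 > maxLen) {
      chunks.push(current); // current chunk is full; start a new one
      current = "";
    }
    current = current ? `${current}\n\n${p}` : p;
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each returned chunk would then be embedded (step 3) and stored in ChromaDB with its metadata (step 4).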

2.4 Data Layer

2.4.1 SQLite Database

Location: Host file system

Storage Path: ~/Library/Application Support/cuemate-desktop-client/data/sqlite/cuemate.db

Stored Content:

  1. User account information
  2. Model configuration information
  3. Interview record data
  4. Knowledge base metadata
  5. System configuration parameters

2.4.2 ChromaDB Vector Database

Runtime: Docker container (cuemate-chroma)

Port: 8000

Storage Method: Docker Volume (chroma_data)

Stored Content:

  1. Document vector indexes
  2. Document original content
  3. Document metadata
  4. Similarity retrieval cache

2.4.3 External LLM APIs

Call Method: HTTP/HTTPS API requests

Supported Services: 24 providers in total

International Providers (5):

  1. OpenAI (GPT series)
  2. Anthropic (Claude series)
  3. Azure OpenAI (Microsoft Azure hosted OpenAI models)
  4. Google Gemini (Google AI platform)
  5. AWS Bedrock (AWS multi-model hosting platform)

Domestic Providers (15):

  1. Moonshot (Kimi)
  2. Alibaba Cloud Bailian
  3. Qwen (Tongyi Qianwen)
  4. Zhipu AI (GLM series)
  5. DeepSeek
  6. Baidu Qianfan
  7. ByteDance Doubao (Volcengine)
  8. iFlytek Spark
  9. Tencent Hunyuan
  10. Tencent Cloud Knowledge Engine
  11. MiniMax
  12. StepFun
  13. SenseTime SenseNova
  14. Baichuan Intelligence
  15. SiliconFlow

Local Model Services (4):

  1. Ollama (local model runtime)
  2. vLLM (high-performance inference engine)
  3. Xinference (Xorbits inference framework)
  4. Regolo (local model service)

2.5 External Service Layer

2.5.1 Speech Recognition Service

Runtime: Docker container (cuemate-asr)

Port: 10095

Communication Protocol: WebSocket

Core Features:

  1. Runs locally, no cloud API required
  2. Supports Chinese and English recognition
  3. Real-time streaming recognition
  4. Low latency (< 200ms)

3. Data Flow

3.1 System Installation Flow

1. Download Installation Package
   User → Official Website/Baidu Netdisk/GitHub Releases → Download DMG

2. Launch Installer
   User → Open DMG → Drag to Applications folder → Launch

3. Environment Detection
   Installer → Detect system environment (macOS: version, chip architecture, available space; Windows: version, system architecture, available space)

4. Docker Detection
   Installer → Check Docker Desktop status → Guide installation if not installed

5. Port Detection
   Installer → Check whether the 6 required ports are in use → Prompt to resolve conflicts

6. Service Deployment
   Installer → Pull Docker images → Start 6 containers

7. Health Check
   Installer → Verify service status → Show installation complete

8. Launch Application
   User → Open desktop client → Start using

3.2 Real-time Interview Training Flow

1. Audio Capture
   Desktop Client → Capture microphone/system audio

2. Speech Recognition
   Audio stream → WebSocket → cuemate-asr (10095) → Real-time text transcription

3. Question Understanding
   Transcribed text → LLM Router (3002) → Extract question intent

4. Knowledge Retrieval
   Question → RAG Service (3003) → ChromaDB (8000) → Relevant document fragments

5. Answer Generation
   Question + Context → LLM Router → External LLM API → Generate answer

6. Streaming Return
   LLM streaming output → Server-Sent Events → Desktop Client → Real-time display

7. Data Persistence
   Interview record → Web API (3001) → SQLite database → Save
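
Step 6's streaming return arrives as Server-Sent Events: the client cuts the text stream into events separated by blank lines and extracts the `data:` payloads. A sketch of that parsing (not the client's actual parser):

```typescript
// Parse one SSE text chunk into the data payloads it carries.
// Events are separated by a blank line; multi-line data fields are joined.
function parseSSE(chunk: string): string[] {
  const payloads: string[] = [];
  for (const event of chunk.split(/\n\n/)) {
    const data = event
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart())
      .join("\n");
    if (data) payloads.push(data); // comment-only keep-alive events are dropped
  }
  return payloads;
}
```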

3.3 Knowledge Base Management Flow

1. Document Upload
   User → Main Window Application (3004) → Upload PDF/Word/Markdown files

2. File Parsing
   File → Web API (3001) → Save to temporary directory

3. Document Processing
   File path → RAG Service (3003) → Extract text content

4. Intelligent Chunking
   Long text → Chunk by semantic boundaries → Maintain context integrity

5. Vectorization
   Text chunks → Embedding model → Generate vector representations

6. Store Index
   Vectors + Original text + Metadata → ChromaDB (8000) → Persistent storage

7. Metadata Management
   Document info (title, category, upload time) → Web API → SQLite database

8. Retrieval Validation
   Test query → Verify retrieval effectiveness → Adjust parameters
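
The retrieval that step 8 validates boils down to similarity search over the stored vectors. ChromaDB does this at scale with indexes; a toy cosine-similarity top-k sketch shows the core operation:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the indices of the k stored vectors most similar to the query.
function topK(query: number[], stored: number[][], k: number): number[] {
  return stored
    .map((v, i) => ({ i, score: cosine(query, v) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((e) => e.i);
}
```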

3.4 Version Update Flow

1. Check for Updates
   Desktop Client → Periodically check update server → Discover new version

2. Download Update Package
   Desktop Client → Download from CDN/GitHub → Save to temporary directory

3. Verify Integrity
   Update package → SHA256 verification → Ensure file integrity

4. Stop Services
   Installer → Stop Docker containers → Backup current configuration

5. Replace Files
   Installer → Extract update package → Replace application files and Docker images

6. Start Services
   Installer → Start new version containers → Verify health status

7. Data Migration
   Installer → Execute database migration scripts → Update version number

8. Restart Application
   Desktop Client → Restart → Display new version
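
Step 3's integrity check is a straightforward digest comparison, sketched here with Node's built-in crypto (where the published checksum comes from is not described in this document):

```typescript
import { createHash } from "node:crypto";

// Compare the update package's SHA-256 digest against the published
// checksum before any files are replaced (step 3 of the update flow).
function verifyChecksum(data: Buffer, expectedHex: string): boolean {
  const digest = createHash("sha256").update(data).digest("hex");
  return digest === expectedHex.toLowerCase();
}
```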

4. Technical Features

4.1 Microservices Architecture

Advantages:

  1. Independent service deployment, no mutual interference
  2. Horizontal scaling on demand, add service instances
  3. Fault isolation, single service failure doesn't affect the whole
  4. Flexible technology stack, choose the most suitable language and framework as needed

4.2 Containerized Deployment

Implementation: Docker + Docker Compose

Advantages:

  1. One-click start all services
  2. Environment consistency guarantee (development, testing, production)
  3. Resource isolation and limits
  4. Fast rollback and version switching

Service List:

  1. cuemate-web (Web frontend + Nginx)
  2. cuemate-web-api (Business API)
  3. cuemate-llm-router (LLM router)
  4. cuemate-rag-service (Knowledge base retrieval)
  5. cuemate-asr (Speech recognition)
  6. cuemate-chroma (Vector database)

4.3 Deployment Modes

Local Mode:

  • Docker services and desktop client both run on local macOS
  • Suitable for personal users
  • Simple setup, works out of the box
  • Data stored locally, maximum privacy

Distributed Mode:

  • Docker services deployed to remote Linux server
  • Desktop client runs on local macOS
  • Connect via SSH (supports password or private key authentication)
  • Suitable for team sharing or high-performance requirements

Deployment Comparison:

| Feature | Local Mode | Distributed Mode |
| --- | --- | --- |
| Service Location | Local macOS | Remote Linux server |
| Network Requirement | LAN only | Internet/intranet |
| Hardware Requirement | High (needs Docker resources) | Low (runs only the client) |
| Data Location | Local | Server |
| Suitable For | Individual users | Team sharing |

4.4 Real-time Communication

Technical Solutions:

  1. WebSocket - Bidirectional real-time communication (speech recognition)
  2. Server-Sent Events - Unidirectional streaming push (LLM answers)
  3. HTTP/HTTPS - Standard API requests

Application Scenarios:

  1. Real-time speech-to-text (WebSocket)
  2. Streaming answer generation (SSE)
  3. Data query and modification (HTTP)

4.5 Hybrid Storage

Storage Solutions:

  1. SQLite - Structured data (users, configurations, records)
  2. ChromaDB - Vector data (document embeddings, semantic retrieval)
  3. File system - Log files, temporary files, uploaded files

Data Paths:

  • SQLite: ~/Library/Application Support/cuemate-desktop-client/data/sqlite/
  • ChromaDB: Docker Volume chroma_data
  • Logs: ~/Library/Application Support/cuemate-desktop-client/data/logs/

4.6 Security Design

Security Measures:

  1. API Key encrypted storage (AES-256)
  2. JWT Token authentication
  3. HTTPS encrypted transmission (production environment)
  4. Fine-grained permission control (RBAC)
  5. Sensitive data stored locally (not uploaded to cloud)
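
The API-key encryption at rest (measure 1) can be sketched with Node's built-in crypto. This uses AES-256-GCM, which also authenticates the ciphertext; key derivation and storage are omitted, and this is not CueMate's actual implementation:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypt a secret with AES-256-GCM. The random IV and the auth tag are
// stored alongside the ciphertext (hex, colon-separated) for later decryption.
function encryptSecret(plain: string, key: Buffer): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const enc = Buffer.concat([cipher.update(plain, "utf8"), cipher.final()]);
  return [iv, cipher.getAuthTag(), enc].map((b) => b.toString("hex")).join(":");
}

// Decrypt and verify; throws if the ciphertext or tag was tampered with.
function decryptSecret(stored: string, key: Buffer): string {
  const [iv, tag, enc] = stored.split(":").map((h) => Buffer.from(h, "hex"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(enc), decipher.final()]).toString("utf8");
}
```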

4.7 Observability

Monitoring Solutions:

  1. Unified log collection (categorized by level: info, warn, error)
  2. Service health checks (periodic service status probing)
  3. Error tracking and alerting (automatic exception logging)
  4. Usage statistics analysis (API call counts, token consumption)

5. Performance Optimization

5.1 Caching Strategy

Cached Content:

  1. LLM response cache (return directly for same questions, save costs)
  2. Vector retrieval result cache (accelerate repeated queries)
  3. Static file CDN acceleration (improve page load speed)

5.2 Asynchronous Processing

Asynchronous Tasks:

  1. Document vectorization (avoid blocking user operations)
  2. Log writing (batch writing, reduce IO)

5.3 Resource Limits

Limit Measures:

  1. Docker container resource quotas (CPU, memory)
  2. API request rate limiting (prevent abuse)
  3. File upload size limits (prevent server storage exhaustion)
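
API rate limiting (measure 2) is commonly implemented as a token bucket. A deterministic sketch with injected time, so the behavior is easy to reason about (CueMate's actual limiter is not described in this document):

```typescript
// Token bucket: at most `capacity` tokens, refilled at `ratePerSec`.
// A request is allowed iff a whole token is available when it arrives.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(private capacity: number, private ratePerSec: number, nowSec = 0) {
    this.tokens = capacity;
    this.last = nowSec;
  }

  allow(nowSec: number): boolean {
    // Refill proportionally to elapsed time, capped at capacity.
    this.tokens = Math.min(this.capacity, this.tokens + (nowSec - this.last) * this.ratePerSec);
    this.last = nowSec;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // caller would respond with HTTP 429
  }
}
```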

6. Extensibility Design

6.1 Horizontal Scaling

Scalable Services:

  1. LLM Router - Deploy multiple instances, Nginx load balancing
  2. RAG Service - Deploy multiple instances, parallel processing
  3. Web API - Deploy multiple instances, distribute request load

6.2 Vertical Scaling

Scaling Methods:

  1. Increase Docker container resource quotas
  2. Use more powerful servers
  3. GPU acceleration (vectorization and model inference)

6.3 Plugin System

Extension Capabilities:

  1. Support integrating third-party LLM providers
  2. Support custom embedding models
  3. Support custom prompt templates
  4. Support custom speech recognition engines

7. Technical Acknowledgments

CueMate's creation would not have been possible without the support of numerous excellent open-source projects and technical communities. We express our sincere gratitude to all developers and teams who have contributed to these technologies!

7.1 Frontend Technologies

Web Application Frameworks and Build Tools:

  • React - Modern frontend framework open-sourced by Meta, making UI development more efficient
  • TypeScript - JavaScript superset developed by Microsoft, providing a powerful type system
  • Vite - Next-generation frontend build tool created by Evan You, excellent development experience
  • Ant Design - Enterprise-grade UI component library open-sourced by Ant Group, beautifully designed and fully featured
  • Tailwind CSS - Atomic CSS framework, making style development more flexible and efficient

Desktop Application Framework:

  • Electron - Cross-platform desktop application framework open-sourced by GitHub, making it possible to build native apps with web technologies

7.2 Backend Technologies

Runtime and Frameworks:

  • Node.js - High-performance JavaScript runtime, making backend development simpler
  • Fastify - Ultra-fast web framework, excellent performance and rich plugin ecosystem
  • pnpm - Fast, disk-space-efficient package manager

Databases:

  • SQLite - The world's most widely used embedded database engine
  • better-sqlite3 - High-performance Node.js SQLite3 binding library
  • ChromaDB - Open-source vector database providing semantic retrieval capabilities for AI applications

7.3 AI Services

Speech Recognition:

  • FunASR - Speech recognition toolkit open-sourced by Alibaba DAMO Academy, supporting real-time streaming recognition
  • Piper TTS - Fast local neural network text-to-speech system

7.4 Development and Deployment Tools

Containerization and Orchestration:

  • Docker - Containerization platform, making application deployment simpler and more reliable
  • Docker Compose - Multi-container application orchestration tool
  • Nginx - High-performance web server and reverse proxy

7.5 Open Source Community

Special thanks to:

  • GitHub - Providing code hosting and collaboration platform
  • npm - JavaScript package management ecosystem
  • Stack Overflow - Developer community, solving countless technical challenges
  • All contributors who submit Issues and Pull Requests on GitHub

7.6 Acknowledgment Statement

CueMate is built on the shoulders of giants. Every maintainer of open-source projects, every developer contributing code, and every community member providing technical support are important forces enabling CueMate's creation.

Our commitments:

  1. Comply with all open-source project license requirements
  2. Actively participate in the open-source community, give back to the technology ecosystem
  3. Continuously improve CueMate to provide better service to users

Once again, our most sincere thanks to all open-source projects and technical communities!

Released under the GPL-3.0 License.