Technical Architecture
CueMate adopts a modern microservices architecture to deliver a high-performance, highly available, and easily extensible intelligent interview training tool.
1. Overall Architecture Diagram
2. Layered Architecture
2.1 User Layer
2.1.1 Desktop Installer
macOS Platform:
Installation Package Types:
- Online Package (~670MB): Requires network connection to pull Docker images during installation
- Offline Package (~4.4GB): Includes all Docker images, ready to use out of the box
Core Responsibilities:
- Guide users through the initial installation
- Detect and install Docker Desktop
- Automatically deploy backend Docker services
- Manage system version updates
Workflow:
- Detect system environment (macOS version, chip architecture, available space)
- Select deployment mode (Local Mode/Distributed Mode)
- Check Docker Desktop status (not installed/installed/needs update)
- Detect port occupation (3001, 3002, 3003, 3004, 8000, 10095)
- Pull Docker images and start services (or configure remote server connection)
- Verify service health status
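The port-detection step above can be sketched as a small helper. The port list matches the six services named in this document; `findPortConflicts` is a hypothetical name, and the real installer would probe live sockets rather than take a precomputed set:

```typescript
// The six ports the installer checks before deploying services.
const REQUIRED_PORTS = [3001, 3002, 3003, 3004, 8000, 10095];

// Given the set of ports already in use on the machine, return the ones
// that would conflict with CueMate's services. (Illustrative sketch only;
// an installer would discover busy ports via the OS, e.g. by binding sockets.)
function findPortConflicts(
  busyPorts: Set<number>,
  required: number[] = REQUIRED_PORTS,
): number[] {
  return required.filter((port) => busyPorts.has(port));
}
```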
Windows Platform:
Under Development
The Windows version is currently under development; stay tuned.
If you have any suggestions or requirements for the Windows version, feel free to provide feedback through:
- GitHub Issues: https://github.com/cuemate-chat/cuemate/issues
- Email: nuneatonhydroplane@gmail.com
2.1.2 Desktop Client
macOS Platform:
Core Features:
- Global shortcuts and floating window
- Microphone audio capture
- System audio capture (AudioTee)
- Real-time speech recognition display
- Local text-to-speech (Piper TTS)
- System tray integration
Data Storage:
- Application data: ~/Library/Application Support/cuemate-desktop-client
- SQLite database: ~/Library/Application Support/cuemate-desktop-client/data/sqlite/cuemate.db
- Log files: ~/Library/Application Support/cuemate-desktop-client/data/logs
Windows Platform:
Under Development
The Windows version is currently under development; stay tuned.
If you have any suggestions or requirements for the Windows version, feel free to provide feedback through:
- GitHub Issues: https://github.com/cuemate-chat/cuemate/issues
- Email: nuneatonhydroplane@gmail.com
2.1.3 Main Window Application
Core Features:
- User registration and login
- Model configuration management
- Knowledge base document upload
- Preset question bank management
- Interview record viewing
- System settings configuration
- Data statistics analysis
2.2 Gateway Layer
2.2.1 Nginx Reverse Proxy
Runtime: Docker container (cuemate-web)
Port: 3004
Responsibilities:
- Serve web frontend static files
- API request routing and forwarding
- WebSocket connection proxy
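As an illustration, a minimal Nginx server block for this layer might look like the following. The upstream container names and location paths are assumptions for the sketch, not CueMate's actual configuration:

```nginx
server {
    listen 3004;

    # Serve the web frontend's static build
    root /usr/share/nginx/html;

    # Forward REST API calls to the Web API container (service name assumed)
    location /api/ {
        proxy_pass http://cuemate-web-api:3001;
    }

    # Proxy WebSocket connections, e.g. for speech recognition
    location /ws/ {
        proxy_pass http://cuemate-asr:10095;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```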
2.3 Application Service Layer
2.3.1 Web API Service
Runtime: Docker container (cuemate-web-api)
Port: 3001
Core Responsibilities:
- User authentication and authorization (JWT Token)
- Business logic processing
- Data persistence (SQLite)
- REST API interface provision
Main Functional Modules:
- User management (login, profile)
- Model configuration management (add, edit, delete, test)
- Knowledge base management (document upload, classification, retrieval)
- Interview record management (create, query, statistics)
- System settings management (notifications, theme, language)
2.3.2 LLM Router Service
Runtime: Docker container (cuemate-llm-router)
Port: 3002
Core Responsibilities:
- Unified LLM API interface
- Multi-model provider adaptation (24 providers)
- Streaming response handling (Server-Sent Events)
- Basic error handling and status monitoring
- Request timeout control
Supported LLM Providers: 24
- International Providers: OpenAI, Azure OpenAI, Anthropic, Google Gemini, AWS Bedrock
- Domestic Providers: Alibaba Cloud Bailian, Qwen (Tongyi Qianwen), Zhipu AI, Baichuan Intelligence, Baidu Qianfan, ByteDance Doubao, iFlytek Spark, Tencent Hunyuan, Tencent Cloud Knowledge Engine, Moonshot (Kimi), MiniMax, DeepSeek, SenseTime SenseNova, StepFun, SiliconFlow
- Local Models: Ollama, vLLM, Xinference, Regolo
Working Mechanism:
- Receive provider and model parameters from frontend
- Select corresponding adapter based on provider
- Call the respective LLM API and return results
- Log error information and status on failure
- Support both streaming and non-streaming call modes
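The adapter-selection step above follows a classic dispatch pattern. This is a minimal sketch, not CueMate's actual router code: the adapter interface, the `selectAdapter` name, and the reduced provider table are illustrative (the endpoints shown are the public OpenAI and default Ollama URLs):

```typescript
// Hypothetical shapes for a request and a provider adapter.
type ChatRequest = { provider: string; model: string; prompt: string };

interface LlmAdapter {
  endpoint(model: string): string;
}

// Two of the 24 providers, for illustration; real adapters would also
// shape auth headers, request bodies, and streaming behavior per provider.
const adapters: Record<string, LlmAdapter> = {
  openai: { endpoint: (_model) => "https://api.openai.com/v1/chat/completions" },
  ollama: { endpoint: (_model) => "http://localhost:11434/api/chat" },
};

// Pick the adapter matching the request's provider, or fail loudly.
function selectAdapter(req: ChatRequest): LlmAdapter {
  const adapter = adapters[req.provider];
  if (!adapter) throw new Error(`Unsupported provider: ${req.provider}`);
  return adapter;
}
```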
2.3.3 RAG Service
Runtime: Docker container (cuemate-rag-service)
Port: 3003
Core Responsibilities:
- Document parsing and chunking
- Text vectorization
- Semantic retrieval
- Answer enhancement generation
- Knowledge base version management
Workflow:
- Receive document upload (PDF, DOCX, Markdown, plain text)
- Extract text content and intelligent chunking
- Generate vectors using embedding model
- Store in ChromaDB vector database
- Provide semantic retrieval interface
- Enhance LLM answers with retrieval results
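The chunking step in the workflow above can be sketched as follows. This is a naive illustration, not the RAG service's actual algorithm: it splits on sentence boundaries and carries a character overlap between chunks to preserve context; the function name and parameter defaults are assumptions:

```typescript
// Split text into chunks of at most roughly maxLen characters, breaking at
// sentence boundaries (Latin and CJK punctuation) and carrying `overlap`
// trailing characters into the next chunk for context continuity.
function chunkText(text: string, maxLen = 500, overlap = 50): string[] {
  const sentences = text.split(/(?<=[.!?。！？])\s+/);
  const chunks: string[] = [];
  let current = "";
  for (const s of sentences) {
    if (current && current.length + s.length > maxLen) {
      chunks.push(current.trim());
      current = current.slice(-overlap); // overlap window from previous chunk
    }
    current += (current ? " " : "") + s;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

A production chunker would also respect document structure (headings, paragraphs) rather than raw sentence boundaries alone.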
2.4 Data Layer
2.4.1 SQLite Database
Location: Host file system
Storage Path: ~/Library/Application Support/cuemate-desktop-client/data/sqlite/cuemate.db
Stored Content:
- User account information
- Model configuration information
- Interview record data
- Knowledge base metadata
- System configuration parameters
2.4.2 ChromaDB Vector Database
Runtime: Docker container (cuemate-chroma)
Port: 8000
Storage Method: Docker Volume (chroma_data)
Stored Content:
- Document vector indexes
- Document original content
- Document metadata
- Similarity retrieval cache
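Semantic retrieval over these stored vectors ranks documents by similarity to the query embedding. Cosine similarity is one of the distance metrics ChromaDB supports; a sketch of the computation:

```typescript
// Cosine similarity between two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|), in [-1, 1]; 1 means identical direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```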
2.4.3 External LLM APIs
Call Method: HTTP/HTTPS API requests
Supported Services: 24 providers in total
International Providers (5):
- OpenAI (GPT series)
- Anthropic (Claude series)
- Azure OpenAI (Microsoft Azure hosted OpenAI models)
- Google Gemini (Google AI platform)
- AWS Bedrock (AWS multi-model hosting platform)
Domestic Providers (15):
- Moonshot (Kimi)
- Alibaba Cloud Bailian
- Qwen (Tongyi Qianwen)
- Zhipu AI (GLM series)
- DeepSeek
- Baidu Qianfan
- ByteDance Doubao (Volcengine)
- iFlytek Spark
- Tencent Hunyuan
- Tencent Cloud Knowledge Engine
- MiniMax
- StepFun
- SenseTime SenseNova
- Baichuan Intelligence
- SiliconFlow
Local Model Services (4):
- Ollama (local model runtime)
- vLLM (high-performance inference engine)
- Xinference (Xorbits inference framework)
- Regolo (local model service)
2.5 External Service Layer
2.5.1 Speech Recognition Service
Runtime: Docker container (cuemate-asr)
Port: 10095
Communication Protocol: WebSocket
Core Features:
- Runs locally, no cloud API required
- Supports Chinese and English recognition
- Real-time streaming recognition
- Low latency (< 200ms)
3. Data Flow
3.1 System Installation Flow
1. Download Installation Package
User → Official Website/Baidu Netdisk/GitHub Releases → Download DMG
2. Launch Installer
User → Open DMG → Drag to Applications folder → Launch
3. Environment Detection
Installer → Detect system environment (OS version, chip architecture, available disk space)
4. Docker Detection
Installer → Check Docker Desktop status → Guide installation if not installed
5. Port Detection
Installer → Check 6 port occupations → Prompt to resolve conflicts
6. Service Deployment
Installer → Pull Docker images → Start 6 containers
7. Health Check
Installer → Verify service status → Show installation complete
8. Launch Application
User → Open desktop client → Start using
3.2 Real-time Interview Training Flow
1. Audio Capture
Desktop Client → Capture microphone/system audio
2. Speech Recognition
Audio stream → WebSocket → cuemate-asr (10095) → Real-time text transcription
3. Question Understanding
Transcribed text → LLM Router (3002) → Extract question intent
4. Knowledge Retrieval
Question → RAG Service (3003) → ChromaDB (8000) → Relevant document fragments
5. Answer Generation
Question + Context → LLM Router → External LLM API → Generate answer
6. Streaming Return
LLM streaming output → Server-Sent Events → Desktop Client → Real-time display
7. Data Persistence
Interview record → Web API (3001) → SQLite database → Save
3.3 Knowledge Base Management Flow
1. Document Upload
User → Main Window Application (3004) → Upload PDF/Word/Markdown files
2. File Parsing
File → Web API (3001) → Save to temporary directory
3. Document Processing
File path → RAG Service (3003) → Extract text content
4. Intelligent Chunking
Long text → Chunk by semantic boundaries → Maintain context integrity
5. Vectorization
Text chunks → Embedding model → Generate vector representations
6. Store Index
Vectors + Original text + Metadata → ChromaDB (8000) → Persistent storage
7. Metadata Management
Document info (title, category, upload time) → Web API → SQLite database
8. Retrieval Validation
Test query → Verify retrieval effectiveness → Adjust parameters
3.4 Version Update Flow
1. Check for Updates
Desktop Client → Periodically check update server → Discover new version
2. Download Update Package
Desktop Client → Download from CDN/GitHub → Save to temporary directory
3. Verify Integrity
Update package → SHA256 verification → Ensure file integrity
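The integrity check in step 3 amounts to hashing the downloaded bytes and comparing against a published digest. A minimal sketch using Node's built-in crypto module (function names are illustrative):

```typescript
import { createHash } from "node:crypto";

// Compute the SHA256 digest of data as a lowercase hex string.
function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compare against the expected digest. (A hardened implementation would use
// a constant-time comparison such as crypto.timingSafeEqual.)
function verifyIntegrity(data: Buffer | string, expectedHex: string): boolean {
  return sha256Hex(data) === expectedHex.toLowerCase();
}
```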
4. Stop Services
Installer → Stop Docker containers → Backup current configuration
5. Replace Files
Installer → Extract update package → Replace application files and Docker images
6. Start Services
Installer → Start new version containers → Verify health status
7. Data Migration
Installer → Execute database migration scripts → Update version number
8. Restart Application
Desktop Client → Restart → Display new version
4. Technical Features
4.1 Microservices Architecture
Advantages:
- Independent service deployment, no mutual interference
- Horizontal scaling on demand, add service instances
- Fault isolation, single service failure doesn't affect the whole
- Flexible technology stack, choose the most suitable language and framework as needed
4.2 Containerized Deployment
Implementation: Docker + Docker Compose
Advantages:
- One-click start all services
- Environment consistency guarantee (development, testing, production)
- Resource isolation and limits
- Fast rollback and version switching
Service List:
- cuemate-web (Web frontend + Nginx)
- cuemate-web-api (Business API)
- cuemate-llm-router (LLM router)
- cuemate-rag-service (Knowledge base retrieval)
- cuemate-asr (Speech recognition)
- cuemate-chroma (Vector database)
4.3 Deployment Modes
Local Mode:
- Docker services and desktop client both run on local macOS
- Suitable for personal users
- Simple setup, works out of the box
- Data stored locally, maximum privacy
Distributed Mode:
- Docker services deployed to remote Linux server
- Desktop client runs on local macOS
- Connect via SSH (supports password or private key authentication)
- Suitable for team sharing or high-performance requirements
Deployment Comparison:
| Feature | Local Mode | Distributed Mode |
|---|---|---|
| Service Location | Local macOS | Remote Linux Server |
| Network Requirement | LAN only | Internet/Intranet |
| Hardware Requirement | High (runs Docker services locally) | Low (client only) |
| Data Location | Local | Server |
| Suitable For | Individual users | Team sharing |
4.4 Real-time Communication
Technical Solutions:
- WebSocket - Bidirectional real-time communication (speech recognition)
- Server-Sent Events - Unidirectional streaming push (LLM answers)
- HTTP/HTTPS - Standard API requests
Application Scenarios:
- Real-time speech-to-text (WebSocket)
- Streaming answer generation (SSE)
- Data query and modification (HTTP)
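On the SSE side, the client reads a stream of `data:` lines and reassembles the answer. A minimal parser for one received chunk (the `[DONE]` sentinel is a common end-of-stream convention, assumed here rather than confirmed for CueMate's protocol):

```typescript
// Extract the payloads from the `data:` lines of a Server-Sent Events chunk,
// dropping the conventional "[DONE]" end-of-stream marker.
function parseSseChunk(chunk: string): string[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice("data:".length).trim())
    .filter((data) => data !== "[DONE]");
}
```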
4.5 Hybrid Storage
Storage Solutions:
- SQLite - Structured data (users, configurations, records)
- ChromaDB - Vector data (document embeddings, semantic retrieval)
- File system - Log files, temporary files, uploaded files
Data Paths:
- SQLite: ~/Library/Application Support/cuemate-desktop-client/data/sqlite/
- ChromaDB: Docker Volume chroma_data
- Logs: ~/Library/Application Support/cuemate-desktop-client/data/logs/
4.6 Security Design
Security Measures:
- API Key encrypted storage (AES-256)
- JWT Token authentication
- HTTPS encrypted transmission (production environment)
- Fine-grained permission control (RBAC)
- Sensitive data stored locally (not uploaded to cloud)
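The AES-256 key storage mentioned above could be implemented along these lines. This sketch uses AES-256-GCM with an scrypt-derived key as one plausible approach; the fixed salt, function names, and encoding scheme are illustrative assumptions, not CueMate's actual design:

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Encrypt an API key with AES-256-GCM. Output: iv:authTag:ciphertext (hex).
// NOTE: the hard-coded salt is for illustration only; real code would use a
// per-installation random salt stored alongside the ciphertext.
function encryptKey(plaintext: string, passphrase: string): string {
  const key = scryptSync(passphrase, "illustrative-salt", 32);
  const iv = randomBytes(12); // GCM standard nonce length
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return [iv, cipher.getAuthTag(), ciphertext].map((b) => b.toString("hex")).join(":");
}

// Reverse the process; GCM's auth tag makes tampering detectable.
function decryptKey(blob: string, passphrase: string): string {
  const [ivHex, tagHex, dataHex] = blob.split(":");
  const key = scryptSync(passphrase, "illustrative-salt", 32);
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(ivHex, "hex"));
  decipher.setAuthTag(Buffer.from(tagHex, "hex"));
  return Buffer.concat([
    decipher.update(Buffer.from(dataHex, "hex")),
    decipher.final(),
  ]).toString("utf8");
}
```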
4.7 Observability
Monitoring Solutions:
- Unified log collection (categorized by level: info, warn, error)
- Service health checks (periodic service status probing)
- Error tracking and alerting (automatic exception logging)
- Usage statistics analysis (API call counts, token consumption)
5. Performance Optimization
5.1 Caching Strategy
Cached Content:
- LLM response cache (return directly for same questions, save costs)
- Vector retrieval result cache (accelerate repeated queries)
- Static file CDN acceleration (improve page load speed)
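A response cache like the one described above can be sketched as a map with TTL-based expiry. The class name, key scheme, and injectable clock are illustrative; a real cache would also bound its size and key on provider/model/prompt together:

```typescript
// TTL cache: entries expire ttlMs after insertion. The `now` parameters
// allow injecting a clock for testing; they default to wall time.
class ResponseCache {
  private store = new Map<string, { value: string; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  get(key: string, now = Date.now()): string | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now > entry.expiresAt) {
      this.store.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: string, now = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```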
5.2 Asynchronous Processing
Asynchronous Tasks:
- Document vectorization (avoid blocking user operations)
- Log writing (batch writing, reduce IO)
5.3 Resource Limits
Limit Measures:
- Docker container resource quotas (CPU, memory)
- API request rate limiting (prevent abuse)
- File upload size limits (prevent server storage exhaustion)
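API rate limiting is commonly implemented with a token bucket; a minimal sketch (capacity and refill rate are illustrative parameters, and the injectable clock is for testability):

```typescript
// Token bucket: holds up to `capacity` tokens, refilled continuously at
// `refillPerSec`. Each request consumes one token; requests without a
// token available are rejected.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    now = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  tryAcquire(now = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```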
6. Extensibility Design
6.1 Horizontal Scaling
Scalable Services:
- LLM Router - Deploy multiple instances, Nginx load balancing
- RAG Service - Deploy multiple instances, parallel processing
- Web API - Deploy multiple instances, distribute request load
6.2 Vertical Scaling
Scaling Methods:
- Increase Docker container resource quotas
- Use more powerful servers
- GPU acceleration (vectorization and model inference)
6.3 Plugin System
Extension Capabilities:
- Support integrating third-party LLM providers
- Support custom embedding models
- Support custom prompt templates
- Support custom speech recognition engines
7. Technical Acknowledgments
CueMate's creation would not have been possible without the support of numerous excellent open-source projects and technical communities. We express our sincere gratitude to all developers and teams who have contributed to these technologies!
7.1 Frontend Technologies
Web Application Frameworks and Build Tools:
- React - Modern frontend framework open-sourced by Meta, making UI development more efficient
- TypeScript - JavaScript superset developed by Microsoft, providing a powerful type system
- Vite - Next-generation frontend build tool created by Evan You, excellent development experience
- Ant Design - Enterprise-grade UI component library open-sourced by Ant Group, beautifully designed and fully featured
- Tailwind CSS - Atomic CSS framework, making style development more flexible and efficient
Desktop Application Framework:
- Electron - Cross-platform desktop application framework open-sourced by GitHub, making it possible to build native apps with web technologies
7.2 Backend Technologies
Runtime and Frameworks:
- Node.js - High-performance JavaScript runtime, making backend development simpler
- Fastify - Ultra-fast web framework, excellent performance and rich plugin ecosystem
- pnpm - Fast, disk-space-efficient package manager
Databases:
- SQLite - The world's most widely used embedded database engine
- better-sqlite3 - High-performance Node.js SQLite3 binding library
- ChromaDB - Open-source vector database providing semantic retrieval capabilities for AI applications
7.3 AI Services
Speech Recognition:
- FunASR - Speech recognition toolkit open-sourced by Alibaba DAMO Academy, supporting real-time streaming recognition
- Piper TTS - Fast local neural network text-to-speech system
Audio Processing:
- AudioTee - System audio capture tool
Large Language Model Providers: thanks to all 24 supported providers listed in Section 2.4.3 for making multi-model support possible.
7.4 Development and Deployment Tools
Containerization and Orchestration:
- Docker - Containerization platform, making application deployment simpler and more reliable
- Docker Compose - Multi-container application orchestration tool
- Nginx - High-performance web server and reverse proxy
Document Parsing:
- pdf2json - PDF file parsing library
- mammoth.js - Word document parsing library
7.5 Open Source Community
Special thanks to:
- GitHub - Providing code hosting and collaboration platform
- npm - JavaScript package management ecosystem
- Stack Overflow - Developer community, solving countless technical challenges
- All contributors who submit Issues and Pull Requests on GitHub
7.6 Acknowledgment Statement
CueMate is built on the shoulders of giants. Every maintainer of open-source projects, every developer contributing code, and every community member providing technical support are important forces enabling CueMate's creation.
Our commitments:
- Comply with all open-source project license requirements
- Actively participate in the open-source community, give back to the technology ecosystem
- Continuously improve CueMate to provide better service to users
Once again, our most sincere thanks to all open-source projects and technical communities!
Related Pages
- System Requirements - Learn about runtime environment requirements
- Installation Guide - Start installing CueMate
- Feature Introduction - Learn about core features
- FAQ - Solve usage problems
