Model Overview
CueMate can be configured with multiple mainstream large language model providers, giving users a flexible choice of AI capabilities.
1. Supported Model Providers
CueMate currently supports the following model providers:
Chinese Providers
- Alibaba Cloud Bailian - Alibaba Cloud's enterprise-grade large model service platform
- Tencent Hunyuan - Tencent's self-developed large language model
- Tencent Cloud - Large model service provided by Tencent Cloud
- Zhipu AI - GLM series models provided by Zhipu AI
- DeepSeek - High-performance large model launched by DeepSeek
- Kimi - Kimi intelligent assistant launched by Moonshot AI
- iFlytek Spark - Spark cognitive large model launched by iFlytek
- Volcengine - Doubao large model service under ByteDance
- SiliconFlow - Service platform focused on AI inference acceleration
- Baidu Qianfan - Large language model platform launched by Baidu
- MiniMax - Ultra-long text large model launched by MiniMax
- StepFun - Step series models focused on long context
- SenseNova - SenseNova series models launched by SenseTime
- Baichuan AI - Baichuan series models launched by Baichuan Intelligence
International Providers
- OpenAI - Provider of GPT series models
- Anthropic - Provider of Claude series models
- Google Gemini - Multimodal large model launched by Google
- Azure OpenAI - OpenAI service on Microsoft Azure platform
- Amazon Bedrock - Large model service platform provided by AWS
Local/Private Deployment
- Local LLM - Large model service supporting local deployment
- Ollama - Open source tool for running large models locally
- vLLM - High-performance large model inference engine
- Xorbits Inference - Inference framework supporting multiple models
- Regolo - Enterprise-grade private deployment solution
2. Model Configuration Guide
Configuration Steps
- Get API Key - Obtain API key from the corresponding provider
- Add Model Configuration - Add model configuration in CueMate system
- Test Connection - Verify that the model configuration is correct
- Apply Model - Use the configured model in the system
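The four steps above amount to "validate the configuration, then try a test call". A minimal sketch of the validation part (the `ModelConfig` class and its checks are illustrative, not CueMate's actual data model):

```python
from dataclasses import dataclass

# Illustrative only: CueMate's real configuration model may differ.
@dataclass
class ModelConfig:
    provider: str
    api_key: str
    endpoint: str
    model_name: str
    timeout: int = 30

    def validate(self):
        """Basic sanity checks to run before attempting a test connection."""
        problems = []
        if not self.api_key:
            problems.append("missing API key")
        if not self.endpoint.startswith(("http://", "https://")):
            problems.append("endpoint must be an http(s) URL")
        if self.timeout <= 0:
            problems.append("timeout must be positive")
        return problems

# Example values only; substitute your own provider details.
cfg = ModelConfig("deepseek", "sk-placeholder",
                  "https://api.deepseek.com", "deepseek-chat")
```

Running the checks up front gives clearer error messages than letting a test call fail with a generic network or authentication error.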
Configuration Parameters
Different model providers may require different configuration parameters. Common parameters include:
- API Key - Key for authentication
- API Endpoint - Access address for model service
- Model Name - Specific model version to use
- Temperature - Controls output randomness; the valid range is typically 0-1 or 0-2, depending on the provider
- Max Tokens - Limits maximum output length
- Timeout - API request timeout setting
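Taken together, these parameters typically map onto an OpenAI-compatible chat request. A minimal sketch (the endpoint, key, and model name below are placeholders, not CueMate defaults):

```python
# Sketch: assemble an OpenAI-compatible chat request from the common
# parameters above. All concrete values are illustrative placeholders.

def build_chat_request(api_key, endpoint, model, temperature=0.7,
                       max_tokens=1024, timeout=30):
    """Build the URL, headers, and JSON body for a chat completion call."""
    url = endpoint.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # API Key
        "Content-Type": "application/json",
    }
    body = {
        "model": model,                # Model Name (specific version)
        "temperature": temperature,    # output randomness
        "max_tokens": max_tokens,      # cap on output length
        "messages": [{"role": "user", "content": "ping"}],
    }
    return url, headers, body, timeout  # timeout is passed to the HTTP client

url, headers, body, timeout = build_chat_request(
    "sk-placeholder", "https://api.example.com/v1", "example-model")
```

Providers with non-OpenAI-style APIs will use different field names, so always check the provider's own API reference.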
3. Model Selection Recommendations
By Scenario
Technical Interview Scenarios (2025 Recommendations)
- Recommended: Claude Sonnet 4.5, DeepSeek Reasoner, OpenAI GPT-5, Baichuan AI Baichuan4
- Features: Top-tier reasoning ability, deep code understanding, accurate technical Q&A
- Alternatives: Claude Opus 4.1, GPT-4.1, Alibaba Cloud Qwen Max
Behavioral Interview Scenarios
- Recommended: DeepSeek Chat, Zhipu GLM-4 Flash, Gemini 2.0 Flash, Baidu Qianfan ERNIE-4.5, MiniMax abab6.5s
- Features: Strong comprehension, fast response, extremely low cost
- Alternatives: Claude 3.7 Sonnet, GPT-4o Mini, SenseNova SenseChat-Turbo
Multimodal/Multilingual Scenarios
- Recommended: Claude Sonnet 4.5, Gemini 2.0 Flash, Zhipu GLM-4V Plus, SenseNova SenseChat-5
- Features: Image understanding, multilingual support, real-time translation
- Alternatives: GPT-4o, Claude 3.7 Sonnet, Baidu Qianfan ERNIE-4.5-Turbo
Cost-Sensitive Scenarios
- Recommended: DeepSeek Chat (¥0.07/million tokens), Zhipu GLM-4 Air, Local Ollama, Baidu Qianfan (free), Alibaba Cloud Qwen Turbo
- Features: Extremely low cost, unbeatable value, meets daily needs
- Alternatives: Alibaba Cloud Qwen3 series, Tencent Hunyuan Lite, MiniMax abab6.5s
Long Text Processing Scenarios
- Recommended: MiniMax abab6.5-chat (245K), Baichuan AI Baichuan3-Turbo-128k (128K), StepFun Step-1-32k (32K)
- Features: Ultra-long context support, suitable for long document analysis, multi-turn deep conversations
- Alternatives: Kimi (200K), Claude Sonnet 4.5 (200K)
By Model Deployment Location
CueMate is a desktop application installed on your local computer (Mac/Windows) that can connect to large language models deployed in different locations:
Cloud API Services (Models on Provider's Cloud)
- Usage: CueMate calls the provider's cloud API via internet
- Pros:
- Ready to use, no self-deployment needed
- Strong model capabilities, continuously updated
- No need to purchase servers and GPUs
- Cons:
- Requires payment (usage-based billing)
- Requires network connection
- Data passes through provider's servers
- Suitable for: Users with high model capability requirements who want quick setup
Private Deployment (Models on Your Own Servers)
- Usage: Deploy models on your own servers (Ollama, vLLM, Xinference, etc.); CueMate then accesses them over the intranet or internet
- Pros:
- Data completely private, doesn't pass through third parties
- No API call fees (only server costs)
- Can customize models as needed
- Cons:
- Need to purchase or rent servers (32GB+ RAM recommended, GPU preferred)
- Requires technical ability for deployment and maintenance
- Need to download model files (several GB to tens of GB)
- Model capabilities usually not as good as top cloud models
- Suitable for: Users or enterprises valuing data privacy, with technical capabilities and server resources
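For reference, the local inference tools mentioned above usually expose an OpenAI-compatible base URL on a default port. The values below are the commonly documented defaults; ports are configurable, so verify them against each project's own documentation:

```python
# Commonly documented default endpoints for local inference services.
# Verify against each project's docs; all ports can be changed at deploy time.
LOCAL_ENDPOINTS = {
    "ollama":     "http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    "vllm":       "http://localhost:8000/v1",   # vLLM's OpenAI-compatible server
    "xinference": "http://localhost:9997/v1",   # Xorbits Inference
}

def endpoint_for(tool):
    """Look up the default base URL for a local inference tool."""
    return LOCAL_ENDPOINTS[tool.lower()]
```

When the model runs on a separate server rather than localhost, replace the hostname with that server's intranet address.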
By Budget
The following are reference prices from mainstream providers (1 million tokens ≈ 750,000 Chinese characters):
High-End Models (Powerful but Expensive)
- OpenAI GPT-5: ~¥280-560 (input/output)
- Anthropic Claude Sonnet 4.5: ~¥210-420 (input/output)
- Anthropic Claude Opus 4.1: ~¥140-420
- Google Gemini 1.5 Pro: ~¥70-210
- Suitable for: Scenarios requiring extreme accuracy
Mid-Range Models (High Value)
- Anthropic Claude Haiku 4.5: ~¥7-14
- OpenAI GPT-5 Mini: ~¥7-14
- OpenAI GPT-4o Mini: ~¥3.5-7
- Zhipu GLM-4: ~¥7-14
- DeepSeek Chat: ~¥0.7-1.4
- Alibaba Cloud Qwen Plus: ~¥2.8-14
- Tencent Hunyuan Pro: ~¥21-70
- Baichuan AI Baichuan3-Turbo: ~¥2-4
- Suitable for: Daily use, controlled costs
Economy Models (Low Cost)
- DeepSeek Chat: ~¥0.7-1.4
- Alibaba Cloud Qwen Turbo: ~¥1.4-4.2
- iFlytek Spark Lite: ~¥1.4-4.2
- Tencent Hunyuan Lite: ~¥2.8-7
- Baidu Qianfan (free for individuals): ¥0
- MiniMax abab6.5s: ~¥1-2
- StepFun Step-1-8k: ~¥1-2
- SenseNova SenseChat-Turbo: ~¥1.5-3
- Suitable for: High-frequency use, limited budget
Local Models (Zero Cost)
- Ollama + DeepSeek-R1, Llama, Qwen: ¥0
- Only requires one-time hardware investment (computer RAM/GPU)
- Suitable for: Long-term use, privacy-focused, with sufficient hardware
Price Notes:
- Above prices are approximate costs per 1 million tokens (RMB)
- Refer to each provider's official website for actual prices
- Most providers bill separately for "input tokens + output tokens"
- Local models have no API fees, but require electricity and hardware costs
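Given per-million-token prices, a monthly budget estimate is simple arithmetic. A sketch using DeepSeek-Chat-like figures from the table above (~¥0.7 input / ~¥1.4 output per million tokens; check the official price page before budgeting):

```python
def monthly_cost(input_tokens_per_day, output_tokens_per_day,
                 price_in_per_m, price_out_per_m, days=30):
    """Estimate monthly spend in RMB from daily token usage and
    per-million-token prices (input and output billed separately)."""
    daily = (input_tokens_per_day / 1_000_000 * price_in_per_m
             + output_tokens_per_day / 1_000_000 * price_out_per_m)
    return round(daily * days, 2)

# Example: 200K input + 50K output tokens per day at DeepSeek-like prices.
cost = monthly_cost(200_000, 50_000, 0.7, 1.4)  # about ¥6.3/month
```

The same function with high-end prices (say ¥280 input / ¥560 output) yields roughly ¥2,520 for identical usage, which is why the budget tiers above matter for high-frequency use.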
Model Experience Tiers (Personal Usage Impressions)
Based on actual usage experience, mainstream models are divided into 5 tiers:
| Tier | Model List | Experience Features | Reasoning |
|---|---|---|---|
| Elite | OpenAI, Anthropic, Google Gemini, Alibaba Cloud Bailian, DeepSeek, Baichuan AI | Top-tier reasoning, deep understanding, extremely high response quality, strong multimodal capabilities | Represents the highest level of current large models, clear reasoning paths, extremely high code generation accuracy, excellent multi-turn dialogue context understanding, first-class complex task handling |
| Top-Tier | Azure OpenAI, Amazon Bedrock, Tencent Hunyuan, Kimi, MiniMax, StepFun | Excellent performance, fast response, high value, suitable for most scenarios | Performance close to elite but lower cost, fast response, balanced comprehensive capabilities, smooth daily usage, relatively friendly pricing, suitable for high-frequency use |
| Professional | Zhipu AI, Volcengine, SiliconFlow, vLLM, Baidu Qianfan, SenseNova | Good performance, enterprise-grade stability, moderate cost, comprehensive features | Enterprise services with SLA guarantees, high feature completeness, good Chinese support, reliable stability, well-documented APIs, suitable for production environments |
| Basic | Tencent Cloud, Local LLM, Ollama | Basically usable, low cost, suitable for simple tasks, average response quality | Can complete basic tasks but occasionally has errors, unstable response quality, average reasoning ability, suitable for non-core business or test environments, wins on being cheap |
| Limited | iFlytek Spark, Xorbits Inference, Regolo | Poor experience, complex configuration or insufficient performance, only suitable for special scenarios or testing | Configuration is cumbersome and error-prone, slow response, weak understanding, incomplete documentation, local models severely limited by hardware, only suitable for specific scenarios or learning/testing |
Notes:
- This rating is based on personal actual usage experience, for reference only
- Model experience is affected by network environment, API configuration, prompt quality, and other factors
- Recommend testing and selecting based on your actual needs and scenarios
- Higher-tier models usually mean higher costs, need to weigh cost-effectiveness
4. Important Notes
- API Key Security - Please keep API keys safe, don't share with others
- Usage Limits - Be aware of each provider's API call limits and billing rules
- Data Compliance - Ensure model usage complies with relevant laws and regulations
- Performance Monitoring - Recommend regularly checking model response time and accuracy
- Cost Control - Set request parameters reasonably to avoid unnecessary fees
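On key security specifically: avoid hardcoding keys in source files or configs you might share. A common pattern is to read them from an environment variable (the variable name here is just an example):

```python
import os

# Read the key from the environment instead of hardcoding it in files
# that might be committed or shared. "EXAMPLE_API_KEY" is illustrative.
def load_api_key(var="EXAMPLE_API_KEY"):
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set the {var} environment variable first")
    return key
```

This keeps the secret out of version control and lets each machine or teammate supply their own key.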
Next Steps
Model Settings Feature Usage:
- Model Settings Guide - Detailed guide on how to add, configure, test, and manage models
Select the model provider you need to configure and view the detailed configuration guide:
