Model Overview

CueMate supports configuring multiple mainstream large language model (LLM) providers, giving users a flexible choice of AI capabilities.

1. Supported Model Providers

CueMate currently supports the following model providers:

Chinese Providers

  • Alibaba Cloud Bailian - Alibaba Cloud's enterprise-grade large model service platform
  • Tencent Hunyuan - Tencent's self-developed large language model
  • Tencent Cloud - Large model service provided by Tencent Cloud
  • Zhipu AI - GLM series models provided by Zhipu AI
  • DeepSeek - High-performance large model launched by DeepSeek
  • Kimi - Kimi intelligent assistant launched by Moonshot AI
  • iFlytek Spark - Spark cognitive large model launched by iFlytek
  • Volcengine - Doubao large model service under ByteDance
  • SiliconFlow - Service platform focused on AI inference acceleration
  • Baidu Qianfan - Large language model platform launched by Baidu
  • MiniMax - Ultra-long text large model launched by MiniMax
  • StepFun - Step series models focused on long context
  • SenseNova - SenseNova series models launched by SenseTime
  • Baichuan AI - Baichuan series models launched by Baichuan Intelligence

International Providers

  • OpenAI - Provider of GPT series models
  • Anthropic - Provider of Claude series models
  • Google Gemini - Multimodal large model launched by Google
  • Azure OpenAI - OpenAI service on Microsoft Azure platform
  • Amazon Bedrock - Large model service platform provided by AWS

Local/Private Deployment

  • Local LLM - Large model service supporting local deployment
  • Ollama - Open source tool for running large models locally
  • vLLM - High-performance large model inference engine
  • Xorbits Inference - Inference framework supporting multiple models
  • Regolo - Enterprise-grade private deployment solution

2. Model Configuration Guide

Configuration Steps

  1. Get API Key - Obtain API key from the corresponding provider
  2. Add Model Configuration - Add model configuration in CueMate system
  3. Test Connection - Verify that the model configuration is correct
  4. Apply Model - Use the configured model in the system

Configuration Parameters

Different model providers may require different configuration parameters. Common parameters include:

  • API Key - Key for authentication
  • API Endpoint - Access address for model service
  • Model Name - Specific model version to use
  • Temperature - Controls output randomness; typically 0-1, though some providers accept up to 2
  • Max Tokens - Limits maximum output length
  • Timeout - API request timeout setting
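
As an illustration, the common parameters above could be grouped into a single configuration record. The field names and defaults here are illustrative, not CueMate's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    api_key: str              # Key for authentication (keep it secret)
    api_endpoint: str         # Access address for the model service
    model_name: str           # Specific model version to use
    temperature: float = 0.7  # Output randomness, typically 0-1
    max_tokens: int = 1024    # Maximum output length
    timeout: float = 30.0     # API request timeout in seconds

# Example: one record per configured provider
cfg = ModelConfig(
    api_key="sk-...",  # placeholder, never commit real keys
    api_endpoint="https://api.example.com/v1",
    model_name="example-chat-model",
)
```

Keeping all per-provider settings in one record like this makes it easy to switch providers by swapping a single object.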

3. Model Selection Recommendations

By Scenario

Technical Interview Scenarios (2025 Recommendations)

  • Recommended: Claude Sonnet 4.5, DeepSeek Reasoner, OpenAI GPT-5, Baichuan AI Baichuan4
  • Features: Top-tier reasoning ability, deep code understanding, accurate technical Q&A
  • Alternatives: Claude Opus 4.1, GPT-4.1, Alibaba Cloud Qwen Max

Behavioral Interview Scenarios

  • Recommended: DeepSeek Chat, Zhipu GLM-4 Flash, Gemini 2.0 Flash, Baidu Qianfan ERNIE-4.5, MiniMax abab6.5s
  • Features: Strong comprehension, fast response, extremely low cost
  • Alternatives: Claude 3.7 Sonnet, GPT-4o Mini, SenseNova SenseChat-Turbo

Multimodal/Multilingual Scenarios

  • Recommended: Claude Sonnet 4.5, Gemini 2.0 Flash, Zhipu GLM-4V Plus, SenseNova SenseChat-5
  • Features: Image understanding, multilingual support, real-time translation
  • Alternatives: GPT-4o, Claude 3.7 Sonnet, Baidu Qianfan ERNIE-4.5-Turbo

Cost-Sensitive Scenarios

  • Recommended: DeepSeek Chat (from ~¥0.7/million tokens), Zhipu GLM-4 Air, Local Ollama, Baidu Qianfan (free), Alibaba Cloud Qwen Turbo
  • Features: Extremely low cost, unbeatable value, meets daily needs
  • Alternatives: Alibaba Cloud Qwen3 series, Tencent Hunyuan Lite, MiniMax abab6.5s

Long Text Processing Scenarios

  • Recommended: MiniMax abab6.5-chat (245K), Baichuan AI Baichuan3-Turbo-128k (128K), StepFun Step-1-32k (32K)
  • Features: Ultra-long context support, suitable for long document analysis, multi-turn deep conversations
  • Alternatives: Kimi (200K), Claude Sonnet 4.5 (200K)

By Model Deployment Location

CueMate is a desktop application installed on your local computer (Mac/Windows) that can connect to large language models deployed in different locations:

Cloud API Services (Models on Provider's Cloud)

  • Usage: CueMate calls the provider's cloud API via internet
  • Pros:
    • Ready to use, no self-deployment needed
    • Strong model capabilities, continuously updated
    • No need to purchase servers and GPUs
  • Cons:
    • Requires payment (usage-based billing)
    • Requires network connection
    • Data passes through provider's servers
  • Suitable for: Users with high model capability requirements who want quick setup

Private Deployment (Models on Your Own Servers)

  • Usage: Deploy models on servers (Ollama, vLLM, Xinference, etc.), CueMate accesses via intranet or internet
  • Pros:
    • Data completely private, doesn't pass through third parties
    • No API call fees (only server costs)
    • Can customize models as needed
  • Cons:
    • Need to purchase or rent servers (32GB+ RAM recommended, GPU preferred)
    • Requires technical ability for deployment and maintenance
    • Need to download model files (several GB to tens of GB)
    • Model capabilities usually not as good as top cloud models
  • Suitable for: Users or enterprises valuing data privacy, with technical capabilities and server resources
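
As a sketch of the private-deployment path, assuming an Ollama server is already running on its default port 11434 with a model pulled (e.g. `ollama pull qwen2.5`; the model name here is an example), a client only needs a local HTTP call and no provider API key:

```python
import json
import urllib.request

def build_ollama_request(prompt, model="qwen2.5",
                         host="http://localhost:11434"):
    """Build a request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        host + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def ask_local_model(prompt, **kwargs):
    """Send the prompt to the local server and return the generated text."""
    req = build_ollama_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Because the traffic never leaves the machine (or intranet), this is the pattern behind the "data completely private" advantage listed above.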

By Budget

The following are reference prices from mainstream providers (1 million tokens ≈ 750,000 Chinese characters):

High-End Models (Powerful but Expensive)

  • OpenAI GPT-5: ~¥280-560 (input/output)
  • Anthropic Claude Sonnet 4.5: ~¥210-420 (input/output)
  • Anthropic Claude Opus 4.1: ~¥140-420
  • Google Gemini 1.5 Pro: ~¥70-210
  • Suitable for: Scenarios requiring extreme accuracy

Mid-Range Models (High Value)

  • Anthropic Claude Haiku 4.5: ~¥7-14
  • OpenAI GPT-5 Mini: ~¥7-14
  • OpenAI GPT-4o Mini: ~¥3.5-7
  • Zhipu GLM-4: ~¥7-14
  • DeepSeek Chat: ~¥0.7-1.4
  • Alibaba Cloud Qwen Plus: ~¥2.8-14
  • Tencent Hunyuan Pro: ~¥21-70
  • Baichuan AI Baichuan3-Turbo: ~¥2-4
  • Suitable for: Daily use, controlled costs

Economy Models (Low Cost)

  • DeepSeek Chat: ~¥0.7-1.4
  • Alibaba Cloud Qwen Turbo: ~¥1.4-4.2
  • iFlytek Spark Lite: ~¥1.4-4.2
  • Tencent Hunyuan Lite: ~¥2.8-7
  • Baidu Qianfan (free for individuals): ¥0
  • MiniMax abab6.5s: ~¥1-2
  • StepFun Step-1-8k: ~¥1-2
  • SenseNova SenseChat-Turbo: ~¥1.5-3
  • Suitable for: High-frequency use, limited budget

Local Models (Zero Cost)

  • Ollama + DeepSeek-R1, Llama, Qwen: ¥0
  • Only requires one-time hardware investment (computer RAM/GPU)
  • Suitable for: Long-term use, privacy-focused, with sufficient hardware

Price Notes:

  • Above prices are approximate costs per 1 million tokens (RMB)
  • Actual prices subject to each provider's official website
  • Most providers bill separately for "input tokens + output tokens"
  • Local models have no API fees, but require electricity and hardware costs
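
The billing model in the notes above (separate per-million-token prices for input and output) can be turned into a quick cost estimator. The token counts and prices below are illustrative placeholders; substitute your provider's current rates:

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_m, output_price_per_m):
    """Estimated cost in RMB, given per-million-token prices."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Example: a month of usage at 2M input / 0.5M output tokens,
# with illustrative prices of ¥0.7 in / ¥1.4 out per million tokens.
monthly = estimate_cost(2_000_000, 500_000, 0.7, 1.4)
print(f"~¥{monthly:.2f}")  # ~¥2.10
```

Running the same token counts against a high-end model's prices makes the tier gap concrete: the difference is often two orders of magnitude per month.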

Model Experience Tiers (Personal Usage Impressions)

Based on actual usage experience, mainstream models are divided into 5 tiers:

  • Elite - OpenAI, Anthropic, Google Gemini, Alibaba Cloud Bailian, DeepSeek, Baichuan AI
    • Experience: Top-tier reasoning, deep understanding, extremely high response quality, strong multimodal capabilities
    • Reasoning: Represents the highest level of current large models: clear reasoning paths, extremely high code-generation accuracy, excellent multi-turn context understanding, first-class handling of complex tasks
  • Top-Tier - Azure OpenAI, Amazon Bedrock, Tencent Hunyuan, Kimi, MiniMax, StepFun
    • Experience: Excellent performance, fast response, high value, suitable for most scenarios
    • Reasoning: Performance close to the elite tier at lower cost; fast response, balanced overall capabilities, smooth daily usage, relatively friendly pricing, suitable for high-frequency use
  • Professional - Zhipu AI, Volcengine, SiliconFlow, vLLM, Baidu Qianfan, SenseNova
    • Experience: Good performance, enterprise-grade stability, moderate cost, comprehensive features
    • Reasoning: Enterprise services with SLA guarantees, high feature completeness, good Chinese support, reliable stability, well-documented APIs, suitable for production environments
  • Basic - Tencent Cloud, Local LLM, Ollama
    • Experience: Basically usable, low cost, suitable for simple tasks, average response quality
    • Reasoning: Can complete basic tasks but occasionally makes errors; unstable response quality and average reasoning ability; suitable for non-core business or test environments, wins on price
  • Limited - iFlytek Spark, Xorbits Inference, Regolo
    • Experience: Poor experience, complex configuration or insufficient performance, only suitable for special scenarios or testing
    • Reasoning: Cumbersome, error-prone configuration; slow responses and weak understanding; incomplete documentation; local models severely limited by hardware; only suitable for specific scenarios or learning/testing

Notes:

  • This rating is based on personal actual usage experience, for reference only
  • Model experience is affected by network environment, API configuration, prompt quality, and other factors
  • Recommend testing and selecting based on your actual needs and scenarios
  • Higher-tier models usually mean higher costs, need to weigh cost-effectiveness

4. Important Notes

  1. API Key Security - Please keep API keys safe, don't share with others
  2. Usage Limits - Be aware of each provider's API call limits and billing rules
  3. Data Compliance - Ensure model usage complies with relevant laws and regulations
  4. Performance Monitoring - Recommend regularly checking model response time and accuracy
  5. Cost Control - Set request parameters reasonably to avoid unnecessary fees

Next Steps

Model Settings Feature Usage:

Select the model provider you need to configure and view its detailed configuration guide.

Released under the GPL-3.0 License.