Model Overview

CueMate supports configuring multiple mainstream large language model (LLM) providers, giving users a flexible choice of AI capabilities.

1. Supported Model Providers

CueMate currently supports the following model providers:

Chinese Providers

  • Alibaba Cloud Bailian - Alibaba Cloud's enterprise-grade large model service platform
  • Tencent Hunyuan - Tencent's self-developed large language model
  • Tencent Cloud - Large model service provided by Tencent Cloud
  • Zhipu AI - GLM series models provided by Zhipu AI
  • DeepSeek - High-performance large model launched by DeepSeek
  • Kimi - Kimi intelligent assistant launched by Moonshot AI
  • iFlytek Spark - Spark cognitive large model launched by iFlytek
  • Volcengine - Doubao large model service under ByteDance
  • SiliconFlow - Service platform focused on AI inference acceleration
  • Baidu Qianfan - Large language model platform launched by Baidu
  • MiniMax - Ultra-long text large model launched by MiniMax
  • StepFun - Step series models focused on long context
  • SenseNova - SenseNova series models launched by SenseTime
  • Baichuan AI - Baichuan series models launched by Baichuan Intelligence

International Providers

  • OpenAI - Provider of GPT series models
  • Anthropic - Provider of Claude series models
  • Google Gemini - Multimodal large model launched by Google
  • Azure OpenAI - OpenAI service on Microsoft Azure platform
  • Amazon Bedrock - Large model service platform provided by AWS

Local/Private Deployment

  • Local LLM - Large model service supporting local deployment
  • Ollama - Open source tool for running large models locally
  • vLLM - High-performance large model inference engine
  • Xorbits Inference - Inference framework supporting multiple models
  • Regolo - Enterprise-grade private deployment solution

2. Model Configuration Guide

Configuration Steps

  1. Get API Key - Obtain API key from the corresponding provider
  2. Add Model Configuration - Add model configuration in CueMate system
  3. Test Connection - Verify that the model configuration is correct
  4. Apply Model - Use the configured model in the system

Configuration Parameters

Different model providers may require different configuration parameters. Common parameters include:

  • API Key - Key for authentication
  • API Endpoint - Access address for model service
  • Model Name - Specific model version to use
  • Temperature - Controls output randomness; typically 0-1, though some providers accept up to 2
  • Max Tokens - Limits maximum output length
  • Timeout - API request timeout setting
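
As an illustration, the common parameters above could be grouped into a single configuration record. The field names and defaults here are illustrative, not CueMate's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    api_key: str              # Key for authentication (keep it secret)
    api_endpoint: str         # Access address for the model service
    model_name: str           # Specific model version to use
    temperature: float = 0.7  # Output randomness, typically 0-1
    max_tokens: int = 1024    # Maximum output length
    timeout: float = 30.0     # API request timeout in seconds

# Example: one record per configured provider
cfg = ModelConfig(
    api_key="sk-...",  # placeholder, never commit real keys
    api_endpoint="https://api.example.com/v1",
    model_name="example-chat-model",
)
```

Keeping all per-provider settings in one record like this makes it easy to switch providers by swapping a single object.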

3. Model Selection Recommendations

By Scenario

Technical Interview Scenarios (2025 Recommendations)

  • Recommended: Claude Sonnet 4.5, DeepSeek Reasoner, OpenAI GPT-5, Baichuan AI Baichuan4
  • Features: Top-tier reasoning ability, deep code understanding, accurate technical Q&A
  • Alternatives: Claude Opus 4.1, GPT-4.1, Alibaba Cloud Qwen Max

Behavioral Interview Scenarios

  • Recommended: DeepSeek Chat, Zhipu GLM-4 Flash, Gemini 2.0 Flash, Baidu Qianfan ERNIE-4.5, MiniMax abab6.5s
  • Features: Strong comprehension, fast response, extremely low cost
  • Alternatives: Claude 3.7 Sonnet, GPT-4o Mini, SenseNova SenseChat-Turbo

Multimodal/Multilingual Scenarios

  • Recommended: Claude Sonnet 4.5, Gemini 2.0 Flash, Zhipu GLM-4V Plus, SenseNova SenseChat-5
  • Features: Image understanding, multilingual support, real-time translation
  • Alternatives: GPT-4o, Claude 3.7 Sonnet, Baidu Qianfan ERNIE-4.5-Turbo

Cost-Sensitive Scenarios

  • Recommended: DeepSeek Chat (from ~¥0.7/million tokens), Zhipu GLM-4 Air, Local Ollama, Baidu Qianfan (free), Alibaba Cloud Qwen Turbo
  • Features: Extremely low cost, unbeatable value, meets daily needs
  • Alternatives: Alibaba Cloud Qwen3 series, Tencent Hunyuan Lite, MiniMax abab6.5s

Long Text Processing Scenarios

  • Recommended: MiniMax abab6.5-chat (245K), Baichuan AI Baichuan3-Turbo-128k (128K), StepFun Step-1-32k (32K)
  • Features: Ultra-long context support, suitable for long document analysis, multi-turn deep conversations
  • Alternatives: Kimi (200K), Claude Sonnet 4.5 (200K)

By Model Deployment Location

CueMate is a desktop application installed on your local computer (Mac/Windows) that can connect to large language models deployed in different locations:

Cloud API Services (Models on Provider's Cloud)

  • Usage: CueMate calls the provider's cloud API via internet
  • Pros:
    • Ready to use, no self-deployment needed
    • Strong model capabilities, continuously updated
    • No need to purchase servers and GPUs
  • Cons:
    • Requires payment (usage-based billing)
    • Requires network connection
    • Data passes through provider's servers
  • Suitable for: Users with high model capability requirements who want quick setup

Private Deployment (Models on Your Own Servers)

  • Usage: Deploy models on servers (Ollama, vLLM, Xinference, etc.), CueMate accesses via intranet or internet
  • Pros:
    • Data completely private, doesn't pass through third parties
    • No API call fees (only server costs)
    • Can customize models as needed
  • Cons:
    • Need to purchase or rent servers (32GB+ RAM recommended, GPU preferred)
    • Requires technical ability for deployment and maintenance
    • Need to download model files (several GB to tens of GB)
    • Model capabilities usually not as good as top cloud models
  • Suitable for: Users or enterprises valuing data privacy, with technical capabilities and server resources
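
As a sketch of the private-deployment path, assuming an Ollama server is already running on its default port 11434 with a model pulled (e.g. `ollama pull qwen2.5`; the model name here is an example), a client only needs a local HTTP call and no provider API key:

```python
import json
import urllib.request

def build_ollama_request(prompt, model="qwen2.5",
                         host="http://localhost:11434"):
    """Build a request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        host + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def ask_local_model(prompt, **kwargs):
    """Send the prompt to the local server and return the generated text."""
    req = build_ollama_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Because the traffic never leaves the machine (or intranet), this is the pattern behind the "data completely private" advantage listed above.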

By Budget

The following are reference prices from mainstream providers (1 million tokens ≈ 750,000 Chinese characters):

High-End Models (Powerful but Expensive)

  • OpenAI GPT-5: ~¥280-560 (input/output)
  • Anthropic Claude Sonnet 4.5: ~¥210-420 (input/output)
  • Anthropic Claude Opus 4.1: ~¥140-420
  • Google Gemini 1.5 Pro: ~¥70-210
  • Suitable for: Scenarios requiring extreme accuracy

Mid-Range Models (High Value)

  • Anthropic Claude Haiku 4.5: ~¥7-14
  • OpenAI GPT-5 Mini: ~¥7-14
  • OpenAI GPT-4o Mini: ~¥3.5-7
  • Zhipu GLM-4: ~¥7-14
  • DeepSeek Chat: ~¥0.7-1.4
  • Alibaba Cloud Qwen Plus: ~¥2.8-14
  • Tencent Hunyuan Pro: ~¥21-70
  • Baichuan AI Baichuan3-Turbo: ~¥2-4
  • Suitable for: Daily use, controlled costs

Economy Models (Low Cost)

  • DeepSeek Chat: ~¥0.7-1.4
  • Alibaba Cloud Qwen Turbo: ~¥1.4-4.2
  • iFlytek Spark Lite: ~¥1.4-4.2
  • Tencent Hunyuan Lite: ~¥2.8-7
  • Baidu Qianfan (free for individuals): ¥0
  • MiniMax abab6.5s: ~¥1-2
  • StepFun Step-1-8k: ~¥1-2
  • SenseNova SenseChat-Turbo: ~¥1.5-3
  • Suitable for: High-frequency use, limited budget

Local Models (Zero Cost)

  • Ollama + DeepSeek-R1, Llama, Qwen: ¥0
  • Only requires one-time hardware investment (computer RAM/GPU)
  • Suitable for: Long-term use, privacy-focused, with sufficient hardware

Price Notes:

  • Above prices are approximate costs per 1 million tokens (RMB)
  • Actual prices subject to each provider's official website
  • Most providers bill separately for "input tokens + output tokens"
  • Local models have no API fees, but require electricity and hardware costs
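
The billing model in the notes above (separate per-million-token prices for input and output) can be turned into a quick cost estimator. The token counts and prices below are illustrative placeholders; substitute your provider's current rates:

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_m, output_price_per_m):
    """Estimated cost in RMB, given per-million-token prices."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Example: a month of usage at 2M input / 0.5M output tokens,
# with illustrative prices of ¥0.7 in / ¥1.4 out per million tokens.
monthly = estimate_cost(2_000_000, 500_000, 0.7, 1.4)
print(f"~¥{monthly:.2f}")  # ~¥2.10
```

Running the same token counts against a high-end model's prices makes the tier gap concrete: the difference is often two orders of magnitude per month.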

Model Experience Tiers (Personal Usage Impressions)

Based on actual usage experience, mainstream models are divided into 5 tiers:

  • Elite - OpenAI, Anthropic, Google Gemini, Alibaba Cloud Bailian, DeepSeek, Baichuan AI
    • Experience: Top-tier reasoning, deep understanding, extremely high response quality, strong multimodal capabilities
    • Reasoning: Represents the highest level of current large models: clear reasoning paths, extremely high code-generation accuracy, excellent multi-turn context understanding, first-class handling of complex tasks
  • Top-Tier - Azure OpenAI, Amazon Bedrock, Tencent Hunyuan, Kimi, MiniMax, StepFun
    • Experience: Excellent performance, fast response, high value, suitable for most scenarios
    • Reasoning: Performance close to the elite tier at lower cost; fast response, balanced overall capabilities, smooth daily usage, relatively friendly pricing, suitable for high-frequency use
  • Professional - Zhipu AI, Volcengine, SiliconFlow, vLLM, Baidu Qianfan, SenseNova
    • Experience: Good performance, enterprise-grade stability, moderate cost, comprehensive features
    • Reasoning: Enterprise services with SLA guarantees, high feature completeness, good Chinese support, reliable stability, well-documented APIs, suitable for production environments
  • Basic - Tencent Cloud, Local LLM, Ollama
    • Experience: Basically usable, low cost, suitable for simple tasks, average response quality
    • Reasoning: Can complete basic tasks but occasionally makes errors; unstable response quality and average reasoning ability; suitable for non-core business or test environments, wins on price
  • Limited - iFlytek Spark, Xorbits Inference, Regolo
    • Experience: Poor experience, complex configuration or insufficient performance, only suitable for special scenarios or testing
    • Reasoning: Cumbersome, error-prone configuration; slow responses and weak understanding; incomplete documentation; local models severely limited by hardware; only suitable for specific scenarios or learning/testing

Notes:

  • This rating is based on personal actual usage experience, for reference only
  • Model experience is affected by network environment, API configuration, prompt quality, and other factors
  • Recommend testing and selecting based on your actual needs and scenarios
  • Higher-tier models usually mean higher costs, need to weigh cost-effectiveness

4. Important Notes

  1. API Key Security - Please keep API keys safe, don't share with others
  2. Usage Limits - Be aware of each provider's API call limits and billing rules
  3. Data Compliance - Ensure model usage complies with relevant laws and regulations
  4. Performance Monitoring - Recommend regularly checking model response time and accuracy
  5. Cost Control - Set request parameters reasonably to avoid unnecessary fees

Next Steps

Model Settings Feature Usage:

Select the model provider you need to configure and view its detailed configuration guide.

Released under the GPL-3.0 License.