
Configure Ollama

Ollama is a lightweight runtime for running large language models locally, letting you quickly deploy and run open-source models on a personal computer. It provides simple command-line tools and API interfaces, making local AI deployment simple and efficient.

1. Install Ollama

1.1 Download Ollama

Visit the Ollama official website to download the installation package for your system: https://ollama.com/

Visit Ollama Official Website

1.2 Install Ollama

Download the installation package for your system and run the installer:

  • macOS: Download the .dmg file and drag it to the Applications folder
  • Windows: Download the .exe installer and double-click to run
  • Linux: Download the installation package for your distribution (.deb / .rpm) or install using a package manager

Install Ollama

1.3 Verify Installation

Open the terminal and run the following command to verify the installation:

bash
ollama --version
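
If the installation succeeded, this prints a version string such as ollama version is 0.5.7 (the exact number depends on your install).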

Verify Installation

1.4 Select Model

In CueMate's model selection interface, you can see two types of models:

  • Cloud Models (names include -cloud):

    • No download required, directly call through the network
    • Examples: deepseek-v3.1:671b-cloud, qwen3-coder:480b-cloud, glm-4.6:cloud
    • Advantages: No local storage space required, supports ultra-large parameter models (e.g., 671B)
  • Local Models (marked with ↓, no -cloud suffix):

    • Downloaded to your machine automatically the first time you select them
    • Examples: gpt-oss:120b, gemma3:27b, deepseek-r1:8b, qwen3:8b
    • Advantages: Fast running speed, no network connection required, high data privacy
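
You can also manage local models directly from the terminal before configuring CueMate. A minimal sketch (qwen3:8b is just an example model name):

bash
# Download a local model ahead of time (optional; selecting it in CueMate also triggers the download)
ollama pull qwen3:8b

# Chat with it interactively to confirm it works
ollama run qwen3:8b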

Model Selection Interface

1.5 Cloud Model Configuration (Required for Cloud Models)

If you choose to use cloud models (e.g., deepseek-v3.1:671b-cloud), you need to first create a model on the Ollama official website:

1.5.1 Visit Ollama Official Website

Visit https://ollama.com/ and log in or register an account.

1.5.2 Create Cloud Model

  1. After logging in, click the Models menu
  2. Click the Create a new model button
  3. Fill in the model name (e.g., CueMate)
  4. Select Private or Public
  5. Click Create model to complete creation

Create Model Step 1

After creation, you will enter the model details page:

Create Model Step 2

1.5.3 Push Model to Cloud

After creating the model, the page will display push commands. There are two methods:

Method 1: Create a custom model from an existing one and push it

bash
# 1. Pull base model
ollama pull llama3.2

# 2. Create a Modelfile (">" creates the file, ">>" appends to it)
echo "FROM llama3.2" > Modelfile
echo "SYSTEM You are a friendly assistant." >> Modelfile

# 3. Create custom model from the Modelfile
ollama create your-username/CueMate -f Modelfile

# 4. Push to cloud
ollama push your-username/CueMate

Method 2: Copy an existing model and push it

bash
# Copy existing model
ollama cp llama3.2 your-username/CueMate

# Push to cloud
ollama push your-username/CueMate

Model Push Instructions

1.5.4 View Cloud Model Address

After successful push, the page will display your cloud model access address:

You can find your model at:
https://ollama.com/your-username/CueMate

This address is your cloud model link, which can be shared with others.

1.5.5 Obtain API Key (For Use in CueMate)

When configuring cloud models in CueMate, you need an API Key:

  1. Visit Ollama official website settings page: https://ollama.com/settings/keys
  2. Click Create new key to create a new API Key
  3. Copy the generated API Key and keep it somewhere safe

API Key

When configuring in CueMate, fill in:

  • Model Name: your-username/CueMate (cloud models do not need to add :cloud suffix)
  • API URL: https://ollama.com
  • API Key: The API Key you just created
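
To sanity-check the key outside CueMate, you can call the cloud endpoint directly. A minimal sketch, assuming the standard Ollama cloud chat API (substitute your own username and key):

bash
# Placeholder value; use the key created above
export OLLAMA_API_KEY="your-api-key"

curl https://ollama.com/api/chat \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{
    "model": "your-username/CueMate",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'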

1.6 Local Model Configuration (Required for Local Models)

If you choose to use local models (e.g., gemma3:12b, deepseek-r1:8b), you need to start the local Ollama service:

  • Ollama will automatically start the service after installation, listening on http://localhost:11434 by default
  • Verify if the service is running:
    bash
    curl http://localhost:11434/api/version
  • Local models will be automatically downloaded on first use
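
Beyond the version check, you can send a test chat request to the local service using Ollama's native chat endpoint (qwen3:8b is just an example; use any model you have pulled):

bash
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [{"role": "user", "content": "Say hello"}],
  "stream": false
}'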

2. Configure Ollama Model in CueMate

2.1 Enter Model Settings Page

After logging into CueMate, click Model Settings in the dropdown menu in the upper right corner.

Enter Model Settings

2.2 Add New Model

Click the Add Model button in the upper right corner.

Click Add Model

2.3 Select Ollama Provider

In the popup dialog:

  1. Provider Type: Select Ollama
  2. After clicking, it will automatically proceed to the next step

Select Ollama

2.4 Fill in Configuration Information

Fill in the following information on the configuration page:

Basic Configuration

  1. Model Name: Give this model configuration a name (e.g., Local DeepSeek R1)
  2. API URL: Keep the default http://localhost:11434 (if Ollama is running on another address, modify it)
  3. Model Version: Enter the model name exactly as Ollama lists it (e.g., deepseek-r1:8b)

2025 Recommended Models:

  • Cloud Models: deepseek-v3.1:671b-cloud, qwen3-coder:480b-cloud, qwen3-vl:235b-cloud, glm-4.6:cloud, minimax-m2:cloud
  • Local Models: gpt-oss:120b, gemma3:27b, gemma3:12b, deepseek-r1:8b, qwen3-coder:30b, qwen3-vl:30b, qwen3:30b, qwen3:8b

Note: Local models are downloaded automatically on first use; cloud models require no download.

Fill in Basic Configuration

Advanced Configuration (Optional)

Expand the Advanced Configuration panel to adjust the following parameters:

CueMate Interface Adjustable Parameters:

  1. Temperature: Controls output randomness

    • Range: 0-2
    • Recommended Value: 0.7
    • Function: Higher values produce more random and creative output, lower values produce more stable and conservative output
    • Usage Suggestions:
      • Creative writing/brainstorming: 1.0-1.5
      • Regular conversation/Q&A: 0.7-0.9
      • Code generation/precise tasks: 0.3-0.5
  2. Max Tokens: Limits single output length

    • Range: 256 - 32768 (depending on model)
    • Recommended Value: 8192
    • Function: Caps the number of tokens in a single model response
    • Usage Suggestions:
      • Short Q&A: 1024-2048
      • Regular conversation: 4096-8192
      • Long text generation: 16384-32768

Advanced Configuration

Other Advanced Parameters Supported by Ollama API:

CueMate's interface exposes only temperature and max_tokens, but if you call Ollama directly through its API you can also use the following advanced parameters (Ollama also offers an OpenAI-compatible API format); a full request example follows this list:

  1. top_p (nucleus sampling)

    • Range: 0-1
    • Default Value: 1
    • Function: Samples from the smallest candidate set with cumulative probability reaching p
    • Relationship with temperature: Usually only adjust one of them
    • Usage Suggestions:
      • Maintain diversity but avoid absurdity: 0.9-0.95
      • More conservative output: 0.7-0.8
  2. top_k

    • Range: 0-100
    • Default Value: 40
    • Function: Samples from the k candidates with highest probability
    • Usage Suggestions:
      • More diverse: 50-100
      • More conservative: 10-30
  3. frequency_penalty

    • Range: -2.0 to 2.0
    • Default Value: 0
    • Function: Reduces the probability of repeating tokens, scaled by how often they have already appeared (frequency-based)
    • Usage Suggestions:
      • Reduce repetition: 0.3-0.8
      • Allow repetition: 0 (default)
  4. presence_penalty

    • Range: -2.0 to 2.0
    • Default Value: 0
    • Function: Reduces the probability of tokens that have already appeared, regardless of how often (presence-based)
    • Usage Suggestions:
      • Encourage new topics: 0.3-0.8
      • Allow repeated topics: 0 (default)
  5. stop (stop sequence)

    • Type: String or array
    • Default Value: null
    • Function: Generation stops as soon as the output contains one of the specified strings
    • Example: ["###", "User:", "\n\n"]
    • Usage Scenarios:
      • Structured output: Use delimiters to control format
      • Dialogue system: Prevent model from speaking on behalf of users
  6. stream

    • Type: Boolean
    • Default Value: false
    • Function: Enables SSE streaming return, returning as it generates
    • In CueMate: Handled automatically, no manual setting required
  7. seed (random seed)

    • Type: Integer
    • Default Value: null
    • Function: Fixes random seed, same input produces same output
    • Usage Scenarios:
      • Reproducible testing
      • Comparative experiments
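
As a concrete reference, here is how these parameters look in a direct call to Ollama's native chat endpoint, where they are passed in the options object; a sketch assuming your Ollama version accepts these option names (values are illustrative, not recommendations, and the max_tokens limit corresponds to the native num_predict option):

bash
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [{"role": "user", "content": "Write a short poem"}],
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "frequency_penalty": 0.3,
    "presence_penalty": 0.3,
    "seed": 42,
    "stop": ["###"],
    "num_predict": 1024
  }
}'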
Recommended parameter combinations by scenario:

| No. | Scenario | temperature | max_tokens | top_p | top_k | frequency_penalty | presence_penalty |
|-----|----------|-------------|------------|-------|-------|-------------------|------------------|
| 1 | Creative writing | 1.0-1.2 | 4096-8192 | 0.95 | 50 | 0.5 | 0.5 |
| 2 | Code generation | 0.2-0.5 | 2048-4096 | 0.9 | 40 | 0.0 | 0.0 |
| 3 | Q&A system | 0.7 | 1024-2048 | 0.9 | 40 | 0.0 | 0.0 |
| 4 | Summarization | 0.3-0.5 | 512-1024 | 0.9 | 30 | 0.0 | 0.0 |
| 5 | Brainstorming | 1.2-1.5 | 2048-4096 | 0.95 | 60 | 0.8 | 0.8 |

2.5 Test Connection

After filling in the configuration, click the Test Connection button to verify if the configuration is correct.

Test Connection

If the configuration is correct, it will display a success message and return a model response example.

Test Success

If the configuration is incorrect, it will display test error logs, and you can view specific error information through log management.

2.6 Save Configuration

After a successful test, click the Save button to complete the model configuration.

Save Configuration

3. Use Model

Through the dropdown menu in the upper right corner, enter the system settings interface and select the model configuration you want to use in the Large Model Provider section.

After configuration, you can use this model in features such as interview training and question generation, or pick a specific model configuration for an individual interview in the interview options.

Select Model

4. Supported Model List

4.1 Cloud Models

| No. | Model Name | Model ID | Parameters | Features |
|-----|------------|----------|------------|----------|
| 1 | GPT-OSS 120B Cloud | gpt-oss:120b-cloud | 120B | Open-source GPT cloud version |
| 2 | GPT-OSS 20B Cloud | gpt-oss:20b-cloud | 20B | Open-source GPT cloud version |
| 3 | DeepSeek V3.1 | deepseek-v3.1:671b-cloud | 671B | Ultra-large scale reasoning model |
| 4 | Qwen3 Coder | qwen3-coder:480b-cloud | 480B | Code generation specialized |
| 5 | Qwen3 VL | qwen3-vl:235b-cloud | 235B | Vision-language model |
| 6 | MiniMax M2 | minimax-m2:cloud | - | MiniMax cloud model |
| 7 | GLM-4.6 | glm-4.6:cloud | - | Zhipu GLM latest version |

4.2 Local Models

GPT-OSS Series

| No. | Model Name | Model ID | Parameters | Use Case |
|-----|------------|----------|------------|----------|
| 1 | GPT-OSS 120B | gpt-oss:120b | 120B | Open-source GPT ultra-large model |
| 2 | GPT-OSS 20B | gpt-oss:20b | 20B | Open-source GPT medium model |

Gemma 3 Series (Google)

| No. | Model Name | Model ID | Parameters | Use Case |
|-----|------------|----------|------------|----------|
| 1 | Gemma3 27B | gemma3:27b | 27B | Google latest flagship model |
| 2 | Gemma3 12B | gemma3:12b | 12B | Medium-scale tasks |
| 3 | Gemma3 4B | gemma3:4b | 4B | Lightweight tasks |
| 4 | Gemma3 1B | gemma3:1b | 1B | Ultra-lightweight |

DeepSeek R1 Series

| No. | Model Name | Model ID | Parameters | Use Case |
|-----|------------|----------|------------|----------|
| 1 | DeepSeek R1 8B | deepseek-r1:8b | 8B | Reasoning enhancement |

Qwen 3 Series

| No. | Model Name | Model ID | Parameters | Use Case |
|-----|------------|----------|------------|----------|
| 1 | Qwen3 Coder 30B | qwen3-coder:30b | 30B | Code generation |
| 2 | Qwen3 VL 30B | qwen3-vl:30b | 30B | Vision-language |
| 3 | Qwen3 VL 8B | qwen3-vl:8b | 8B | Vision-language |
| 4 | Qwen3 VL 4B | qwen3-vl:4b | 4B | Vision-language |
| 5 | Qwen3 30B | qwen3:30b | 30B | General conversation |
| 6 | Qwen3 8B | qwen3:8b | 8B | General conversation |
| 7 | Qwen3 4B | qwen3:4b | 4B | Lightweight tasks |

5. Common Issues

5.1 Ollama Service Not Started

Symptom: Test Connection reports a connection failure

Solution:

  1. Confirm the Ollama service is running: ollama list
  2. Restart the Ollama service (see the sketch after this list)
  3. Check whether port 11434 is already in use: lsof -i :11434
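
How to restart depends on the platform. A sketch for Linux installs that use the official install script's systemd service (on macOS and Windows, quit and reopen the Ollama app instead):

bash
# Linux: restart the systemd service created by the install script
sudo systemctl restart ollama

# Alternatively, run the server directly in a terminal
ollama serve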

5.2 Model Not Downloaded

Symptom: A "model does not exist" error is shown

Solution:

  1. Run ollama list to view downloaded models
  2. Run ollama pull <model-name> to download the missing model
  3. Confirm the model name is spelled correctly

5.3 Performance Issues

Symptom: The model responds slowly

Solution:

  1. Choose a model with fewer parameters (e.g., 7B instead of 70B)
  2. Ensure sufficient GPU memory or system memory
  3. Check system resource usage (see the command after this list)
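
To check which models are currently loaded and whether they are running on GPU or CPU:

bash
# Lists running models with their memory footprint and GPU/CPU placement
ollama ps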

5.4 API URL Error

Symptom: Cannot connect to Ollama service

Solution:

  1. Confirm the API URL is configured correctly (default http://localhost:11434)
  2. If Ollama runs in Docker, make sure port 11434 is published and use an address CueMate can reach (see the example after this list)
  3. Check firewall settings
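
If you run Ollama in Docker, a minimal sketch using the official image (the named volume keeps downloaded models across container restarts; publishing port 11434 makes the service reachable at http://localhost:11434 on the host):

bash
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama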

5.5 Model Selection

  1. Development and Testing: Use 7B-14B parameter models, fast response, low resource consumption
  2. Production Environment: Select 14B-32B parameter models based on performance requirements
  3. Resource Constrained: Use 0.5B-3B parameter lightweight models

5.6 Hardware Requirements

| Model Parameters | Minimum Memory | Recommended Memory | GPU |
|------------------|----------------|--------------------|-----|
| 0.5B-3B | 4GB | 8GB | Optional |
| 7B-14B | 8GB | 16GB | Recommended |
| 32B-70B | 32GB | 64GB | Required |
