Skip to content
Regolo

Configure Regolo

Regolo is a local model service that integrates API interfaces from major global large language model providers. It provides unified access, flexible billing models, and high availability guarantees, simplifying multi-model setup and switching.

1. Get Regolo API Key

1.1 Access Regolo Platform

Visit the Regolo AI platform and register/login: https://api.regolo.ai/

Access Regolo Platform

1.2 Enter Virtual Keys Management Page

After logging in, click Virtual Keys in the left menu to enter the API key management page.

Enter Virtual Keys

1.3 Create New API Key

Click the Create Key button in the upper right corner to open the creation dialog.

Click Create Button

1.4 Configure API Key Information

Configure the following information in the Create Key dialog:

1.4.1 Set Key Alias

Enter an easily identifiable name, such as CueMate.

Naming Recommendations:

  • Use project name or purpose as prefix
  • Distinguish dev/test/production environments (e.g., CueMate-Dev, CueMate-Prod)
  • Avoid including sensitive information

1.4.2 Select Authorized Models

Click the Models dropdown to select which models this API Key can access:

Available Modes:

  1. All models: Authorize access to all models (recommended for production)
  2. Specific models: Only authorize access to specific models (recommended for development/testing)

Model Selection Recommendations:

  • Production environment: Select "All models" for flexible switching
  • Development environment: Only select models needed for testing to reduce misuse risk
  • On-demand authorization: Select corresponding models based on actual business scenarios

Currently Available LLM Models:

  • deepseek-r1-70b: DeepSeek R1 reasoning model (max 64K tokens)
  • llama-guard3-8b: Llama Guard 3 safety audit model
  • qwen3-30b: Qwen3 30B general model (max 32K tokens)
  • qwen3-coder-30b: Qwen3 code-specific model (max 256K tokens)
  • mistral-small3.2: Mistral Small 3.2 lightweight model (max 32K tokens)
  • gpt-oss-120b: Open-source GPT 120B large model
  • Llama-3.3-70B-Instruct: Llama 3.3 latest version
  • Llama-3.1-8B-Instruct: Llama 3.1 8B cost-effective
  • maestrale-chat-v0.4-beta: Regolo original conversational model
  • Qwen3-8B: Qwen3 8B lightweight model (max 32K tokens)
  • gemma-3-27b-it: Google Gemma 3 27B (max 128K tokens)

1.4.3 Set Rate Limits

Click the Edit Limits button to configure rate limits for this API Key:

  • RPM (Requests Per Minute): Requests per minute
  • TPM (Tokens Per Minute): Tokens per minute
  • RPD (Requests Per Day): Requests per day

Rate Limit Recommendations:

  • Development/testing: RPM=60, TPM=100000, RPD=10000
  • Production environment: Set according to actual business volume to avoid unexpected overages

1.4.4 Complete Creation

After configuration, click the Save button.

Set API Key Information

1.5 Save API Key

After successful creation, the system will display the API Key.

Important Reminder:

  • The API Key is displayed only once and cannot be viewed again after closing the dialog
  • Please copy immediately and save to a secure location (such as a password manager)
  • If lost, you need to delete the old Key and create a new one

Copy API Key

Recommended Save Methods:

  1. Click the copy button, API Key is copied to clipboard
  2. Paste into password manager (such as 1Password, Bitwarden)
  3. Or save to a secure text file and keep it safe (do not share with others)

1.6 Verify API Key

In the Virtual Keys list, you can see the newly created Key:

  • Status: Shows whether the Key is enabled
  • Authorized Models: Shows the list of accessible models
  • Creation Time: Records the creation date
  • Actions: Can edit limits or delete the Key

2. Configure Regolo Model in CueMate

2.1 Enter Model Settings Page

After logging into CueMate, click Model Settings in the dropdown menu in the upper right corner.

Enter Model Settings

2.2 Add New Model

Click the Add Model button in the upper right corner.

Click Add Model

2.3 Select Regolo Provider

In the pop-up dialog:

  1. Provider Type: Select Regolo
  2. Click to automatically proceed to the next step

Select Regolo

2.4 Fill in Configuration Information

Fill in the following information on the configuration page:

Basic Configuration

  1. Model Name: Give this model configuration a name (e.g., Regolo Phi-4)
  2. API URL: Keep the default https://api.regolo.ai/v1
  3. API Key: Paste the Regolo API Key
  4. Model Version: Select or enter the model to use
    • Microsoft Series:
      • Phi-4: Microsoft Phi-4, lightweight and efficient
    • DeepSeek R1 Series:
      • DeepSeek-R1-Distill-Qwen-32B: DeepSeek R1 Distilled 32B
      • DeepSeek-R1-Distill-Qwen-14B: DeepSeek R1 Distilled 14B
      • DeepSeek-R1-Distill-Qwen-7B: DeepSeek R1 Distilled 7B
      • DeepSeek-R1-Distill-Llama-8B: DeepSeek R1 Llama 8B
    • Regolo Original:
      • maestrale-chat-v0.4-beta: Maestrale conversational model
    • Llama Series:
      • Llama-3.3-70B-Instruct: Llama 3.3 70B Instruct
      • Llama-3.1-70B-Instruct: Llama 3.1 70B Instruct
      • Llama-3.1-8B-Instruct: Llama 3.1 8B Instruct
    • DeepSeek Coder:
      • DeepSeek-Coder-6.7B-Instruct: DeepSeek Coder 6.7B
    • Qwen Series:
      • Qwen2.5-72B-Instruct: Qwen 2.5 72B Instruct

Fill in Basic Configuration

Advanced Configuration (Optional)

Expand the Advanced Configuration panel to adjust the following parameters:

CueMate Interface Adjustable Parameters:

  1. Temperature: Controls output randomness

    • Range: 0-2 (different models have different upper limits)
    • Recommended Value: 0.7
    • Effect: Higher values produce more random and creative output, lower values produce more stable and conservative output
    • Usage Recommendations:
      • Creative writing/brainstorming: 1.0-1.5
      • General conversation/Q&A: 0.7-0.9
      • Code generation/precise tasks: 0.3-0.5
  2. Max Tokens: Limits the maximum output length

    • Range: 256 - 262144 (depending on the model)
    • Recommended Value: 8192
    • Effect: Controls the maximum number of tokens in a single model response
    • Model Limits:
      • deepseek-r1-70b: max 64K tokens
      • gemma-3-27b-it: max 128K tokens
      • qwen3-coder-30b: max 256K tokens
      • Other models: 8K-32K tokens
    • Usage Recommendations:
      • Short Q&A: 1024-2048
      • General conversation: 4096-8192
      • Long text generation: 16384-32768
      • Ultra-long documents: 65536+ (supported models only)

Advanced Configuration

Other Advanced Parameters Supported by Regolo API:

While the CueMate interface only provides temperature and max_tokens adjustments, if you call Regolo directly via API, you can also use the following advanced parameters (Regolo uses OpenAI-compatible API format):

  1. top_p (nucleus sampling)

    • Range: 0-1
    • Default Value: 0.9
    • Effect: Samples from the smallest candidate set with cumulative probability of p
    • Relationship with temperature: Usually only adjust one of them
    • Usage Recommendations:
      • Maintain diversity while avoiding nonsense: 0.9-0.95
      • More conservative output: 0.7-0.8
  2. top_k

    • Range: 1-100
    • Default Value: 50
    • Effect: Samples from the top k candidates with highest probability
    • Usage Recommendations:
      • More diversity: 50-100
      • More conservative: 10-30
  3. frequency_penalty

    • Range: -2.0 to 2.0
    • Default Value: 0
    • Effect: Reduces the probability of repeating the same words (based on frequency)
    • Usage Recommendations:
      • Reduce repetition: 0.3-0.8
      • Allow repetition: 0 (default)
      • Force diversity: 1.0-2.0
  4. presence_penalty

    • Range: -2.0 to 2.0
    • Default Value: 0
    • Effect: Reduces the probability of words that have already appeared appearing again (based on presence)
    • Usage Recommendations:
      • Encourage new topics: 0.3-0.8
      • Allow topic repetition: 0 (default)
  5. stop

    • Type: String array
    • Default Value: null
    • Effect: Stops generation when the specified string appears in the content
    • Example: ["###", "User:", "\n\n"]
    • Use Cases:
      • Structured output: Use delimiters to control format
      • Dialogue systems: Prevent the model from speaking for the user
  6. stream

    • Type: Boolean
    • Default Value: false
    • Effect: Enable SSE streaming return, generating and returning incrementally
    • In CueMate: Automatically handled, no manual setting required
  7. seed

    • Type: Integer
    • Default Value: null
    • Effect: Fix random seed, same input produces same output
    • Use Cases:
      • Reproducible testing
      • Comparative experiments
    • Note: Not all models support this
No.Scenariotemperaturemax_tokenstop_pfrequency_penaltypresence_penalty
1Creative Writing1.0-1.24096-81920.950.50.5
2Code Generation0.2-0.52048-40960.90.00.0
3Q&A System0.71024-20480.90.00.0
4Summarization0.3-0.5512-10240.90.00.0
5Brainstorming1.2-1.52048-40960.950.80.8

2.5 Test Connection

After filling in the configuration, click the Test Connection button to verify if the configuration is correct.

Test Connection

If the configuration is correct, a success message will be displayed with a sample model response.

Test Success

If the configuration is incorrect, an error log will be displayed, and you can view detailed error information through log management.

2.6 Save Configuration

After successful testing, click the Save button to complete the model configuration.

Save Configuration

3. Use the Model

Through the dropdown menu in the upper right corner, enter the system settings interface and select the model configuration you want to use in the large model provider section.

After configuration, you can select to use this model in interview training, question generation, and other functions, or of course, you can individually select the model configuration for each interview in the interview options.

Select Model

4. Supported Model List

4.1 DeepSeek Series

No.Model NameModel IDParametersMax OutputUse Cases
1DeepSeek R1 70Bdeepseek-r1-70b70B64K tokensEnhanced reasoning, complex tasks, ultra-long context

4.2 Llama Series

No.Model NameModel IDParametersMax OutputUse Cases
1Llama Guard 3 8Bllama-guard3-8b8B8K tokensContent safety audit, risk detection
2Llama 3.3 70BLlama-3.3-70B-Instruct70B8K tokensLatest version, high-performance general tasks
3Llama 3.1 8BLlama-3.1-8B-Instruct8B8K tokensStandard tasks, cost-effective

4.3 Qwen Series

No.Model NameModel IDParametersMax OutputUse Cases
1Qwen3 30Bqwen3-30b30B32K tokensGeneral conversation, long text processing
2Qwen3 8BQwen3-8B8B32K tokensLightweight and efficient, fast response
3Qwen3 Coder 30Bqwen3-coder-30b30B256K tokensCode generation, ultra-long code context

4.4 Mistral Series

No.Model NameModel IDParametersMax OutputUse Cases
1Mistral Small 3.2mistral-small3.2-32K tokensLightweight model, multilingual support

4.5 Google Gemma Series

No.Model NameModel IDParametersMax OutputUse Cases
1Gemma 3 27Bgemma-3-27b-it27B128K tokensUltra-long context, document analysis

4.6 Open Source Community Models

No.Model NameModel IDParametersMax OutputUse Cases
1GPT OSS 120Bgpt-oss-120b120B8K tokensOpen-source super-large model, experimental tasks

4.7 Regolo Original

No.Model NameModel IDParametersMax OutputUse Cases
1Maestrale Chat v0.4maestrale-chat-v0.4-beta-8K tokensConversation optimization, multilingual (Italian enhanced)

5. Common Issues

5.1 Invalid API Key

Symptom: API Key error message when testing connection

Solution:

  1. Check if the API Key is completely copied
  2. Confirm the API Key has not expired or been disabled
  3. Verify the API Key permissions are set correctly

5.2 Model Not Available

Symptom: Error message indicating model does not exist or is not authorized

Solution:

  1. Confirm the model ID spelling is correct
  2. Check if the account has access permissions for this model
  3. Verify the account balance is sufficient

5.3 Request Timeout

Symptom: No response for a long time when testing connection or using

Solution:

  1. Check if the network connection is normal
  2. Confirm the API URL is configured correctly
  3. Check firewall settings

5.4 Quota Limit

Symptom: Request quota exceeded error

Solution:

  1. Log in to the Regolo platform to check quota usage
  2. Recharge or apply for more quota
  3. Optimize usage frequency

5.5 Enterprise Services

  • High availability guarantee
  • Professional technical support
  • Flexible pricing plans

5.6 Rich Models

  • Support for multiple mainstream open-source models
  • Regolo original optimized models
  • Continuously updated with latest models

5.7 Performance Optimization

  • Distributed inference cluster
  • Low latency response
  • High concurrency support

5.8 Data Security

  • Encrypted data transmission
  • Privacy protection mechanism
  • Compliance certification

Regolo uses a pay-as-you-go billing model:

Model LevelInput PriceOutput PriceUnit
Lightweight (<10B)¥0.001¥0.003/1K tokens
Standard (10B-30B)¥0.003¥0.009/1K tokens
High-performance (>30B)¥0.006¥0.018/1K tokens

Note: Specific prices are subject to the Regolo official website.

5.9 Model Selection

  1. Development/Testing: Use 7B-14B parameter models, low cost
  2. Production Environment: Choose 32B-70B models based on performance requirements
  3. Code Generation: Prefer DeepSeek Coder series
  4. General Conversation: Recommended Llama 3.3 or Qwen 2.5 series

5.10 Cost Optimization

  1. Set max_tokens parameter reasonably
  2. Use caching to reduce duplicate requests
  3. Choose models with appropriate parameter sizes
  4. Monitor API usage

5.1 Enterprise Applications

  • Internal knowledge base Q&A
  • Customer service automation
  • Document generation and processing

5.2 Developers

  • Application prototype development
  • AI feature integration
  • Algorithm validation testing

5.3 Private Deployment Needs

  • Support for private deployment solutions
  • Customized model training
  • Dedicated technical support

Released under the GPL-3.0 License.