
Generation: Parameters & Output format

Parameter configuration is an important aspect of tuning AI agents. Each parameter influences how the model generates text, affecting creativity, coherence, determinism, and overall output quality. Teammately's system provides optimized parameter management for each specific use case.

Core Parameters

Temperature

Temperature controls the randomness of the model's outputs:

  • Low temperature [0.0-0.3]: More deterministic, focused responses. Ideal for factual Q&A, structured data extraction, and use cases requiring consistency
  • Medium temperature [0.4-0.7]: Balanced creativity and determinism. Suitable for most general-purpose applications and conversational AI
  • High temperature [0.8-1.0]: Increased randomness and creativity. Useful for brainstorming, creative writing, and generating diverse alternatives
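Conceptually, temperature divides the model's logits before the softmax step: low values sharpen the distribution toward the top token, high values flatten it. A minimal sketch in plain Python with illustrative logits (not Teammately's internal implementation):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by T before softmax; T must be > 0
    # (T -> 0 approaches greedy/argmax decoding).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cool = softmax_with_temperature(logits, 0.2)  # sharp: top token dominates
warm = softmax_with_temperature(logits, 1.0)  # baseline distribution
```

At temperature 0.2 the top token's probability climbs above 0.99, while at 1.0 it stays near 0.63, which is why low temperatures feel deterministic.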

Top_p (Nucleus Sampling)

Top_p sampling selects from the smallest possible set of tokens whose cumulative probability exceeds the threshold p:

  • Low values [0.1-0.5]: More focused, deterministic outputs with limited variety
  • Medium values [0.5-0.8]: Balanced approach suitable for most applications
  • High values [0.9-1.0]: Considers a wider range of tokens, increasing diversity

Top_p can be used alongside or as an alternative to temperature. While temperature reshapes the entire probability distribution, top_p truncates it to the most likely tokens.
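The truncation step can be sketched as follows: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches `p`, and renormalize. This is an illustrative implementation, not Teammately's internal code:

```python
def nucleus_filter(probs, top_p):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches top_p, then renormalize over that set.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = [0.5, 0.3, 0.15, 0.05]
kept_probs = nucleus_filter(probs, 0.8)  # keeps only the two most likely tokens
```

With `top_p=0.8`, tokens 0 and 1 (cumulative mass 0.8) survive and are renormalized to 0.625 and 0.375; the long tail is cut off entirely.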

Max Tokens

This parameter sets the maximum length of the generated response. Setting it too low may truncate important information, while setting it too high may lead to unnecessary verbosity or hallucinations.

When you use a Teammately Agent, it dynamically adjusts max tokens based on the input context length, the agent's complexity, and the response type.
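In a raw API call you would set this cap yourself. The payload below follows the common OpenAI-style convention as an illustration; the field names (`model`, `messages`, `max_tokens`) and the model name are assumptions, and Teammately's own request shape may differ:

```python
# Hypothetical OpenAI-style request payload (field names are illustrative).
request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this report."}],
    # Cap the response length: raise for long summaries,
    # lower for short, predictable answers.
    "max_tokens": 512,
}
```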

Frequency Penalty

Frequency penalty reduces the likelihood of repetition by penalizing tokens that have already appeared in the generated text:

  • 0.0: No penalty applied
  • [0.1-0.5]: Mild discouragement of repetition while preserving natural language patterns
  • [0.6-1.0]: Strong prevention of repetition, useful for reducing "stuck in a loop" behaviors

This parameter is particularly useful for longer generations where repetition can become problematic.
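A common way to implement this (the approach described in OpenAI's documentation) is to subtract `penalty × count(token)` from each token's logit, so tokens that have appeared more often are penalized more. A minimal sketch with illustrative logits:

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, penalty):
    # Subtract penalty * count(token) from each token's logit:
    # the more often a token has appeared, the larger its penalty.
    counts = Counter(generated_tokens)
    return {tok: logit - penalty * counts.get(tok, 0)
            for tok, logit in logits.items()}

logits = {"the": 2.0, "cat": 1.5, "sat": 1.0}
adjusted = apply_frequency_penalty(logits, ["the", "the", "cat"], penalty=0.4)
# "the" drops by 0.8 (two occurrences), "cat" by 0.4, "sat" is unchanged
```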

Presence Penalty

Presence penalty reduces the likelihood of the model repeating the same topics:

  • 0.0: No penalty applied
  • [0.1-0.5]: Gentle encouragement toward topic diversity
  • [0.6-1.0]: Strong push toward covering new topics

Unlike the frequency penalty (which targets specific repeated tokens), the presence penalty discourages thematic repetition more broadly.
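The distinction shows up directly in the math: presence penalty is a flat, one-time subtraction for any token that has appeared at least once, regardless of how many times. An illustrative sketch, mirroring the frequency-penalty example above:

```python
def apply_presence_penalty(logits, generated_tokens, penalty):
    # Subtract a flat penalty from every token that has appeared
    # at least once -- occurrence count does not matter.
    seen = set(generated_tokens)
    return {tok: logit - (penalty if tok in seen else 0.0)
            for tok, logit in logits.items()}

logits = {"the": 2.0, "cat": 1.5, "sat": 1.0}
adjusted = apply_presence_penalty(logits, ["the", "the", "cat"], penalty=0.6)
# "the" and "cat" each drop by 0.6 exactly once; "sat" is unchanged,
# so "cat" (0.9) now ranks below the never-generated "sat" (1.0)
```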

Output Formatting

Teammately supports multiple output formats to match your integration needs:

  • Text Output - The default format providing natural language responses. Usually best for user-facing conversations, narrative outputs, and creative content
  • JSON Schema - Structured output conforming to a predefined schema. Use it to ensure consistent, parseable responses from agents and to simplify integration with existing services
  • JSON Object - Free-form structured data without a predefined schema. It is more flexible than schema-based outputs and supports dynamic response structures
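To illustrate the JSON Schema option, here is a hypothetical structured-output specification. It follows the widely used `response_format` / JSON Schema convention as an example only; the `support_ticket` schema and all field names are assumptions, and Teammately's exact configuration shape may differ:

```python
# Hypothetical structured-output spec (illustrative field names).
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "support_ticket",
        "schema": {
            "type": "object",
            "properties": {
                "category": {"type": "string"},
                "priority": {"type": "string",
                             "enum": ["low", "medium", "high"]},
            },
            # Required keys guarantee a consistent, parseable shape
            # for downstream services.
            "required": ["category", "priority"],
        },
    },
}
```

Constraining output to a schema like this is what makes agent responses safe to feed directly into existing services without brittle text parsing.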

Parameter Selection Strategy

Teammately's Agent parameter management system:

  1. Analyzes your use case requirements
  2. Considers the target model's characteristics
  3. Dynamically adjusts parameters based on input context
  4. Applies best practices from extensive testing
  5. Continuously optimizes based on performance metrics

Our approach eliminates the need for manual parameter tuning while ensuring optimal results across different models and use cases.