Chat API token counting
max_tokens parameter
The max_tokens parameter in the ConversationTokenBufferMemory class limits the number of tokens kept in the conversation buffer.
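As a rough illustration of how a token-limited buffer can behave, here is a minimal sketch. It is not the ConversationTokenBufferMemory implementation: the TokenBuffer class and the whitespace-based count_tokens_by_whitespace helper are hypothetical names made up for this example, and a real tokenizer would give different counts.

```python
from collections import deque
from typing import Callable, Deque, List


def count_tokens_by_whitespace(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


class TokenBuffer:
    """Illustrative buffer that keeps the newest messages within max_tokens."""

    def __init__(
        self,
        max_tokens: int,
        count_tokens: Callable[[str], int] = count_tokens_by_whitespace,
    ) -> None:
        self.max_tokens = max_tokens
        self.count_tokens = count_tokens
        self.messages: Deque[str] = deque()

    def total_tokens(self) -> int:
        """Sum of token counts over all buffered messages."""
        return sum(self.count_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        """Append a message, then drop the oldest ones until the limit is met."""
        self.messages.append(message)
        while self.messages and self.total_tokens() > self.max_tokens:
            self.messages.popleft()

    def history(self) -> List[str]:
        """Messages currently retained, oldest first."""
        return list(self.messages)
```

With max_tokens=5, adding "one two three" and then "four five six" leaves only the second message, because the two together would need six tokens under this toy tokenizer.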
Output token limit
The models documentation lists a limit of 4096 output tokens for many models.
Counting tokens
Counting tokens for chat API calls: to see how many tokens are in a text string without making an API call, use ...
Rate limit
Your rate limit is calculated as the maximum of max_tokens and the estimated number of tokens based on ...
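The rule above can be written as a one-line calculation. This is only a sketch: the function name is hypothetical, and because the excerpt does not say how the estimate is derived, estimated_tokens is simply taken as an input here.

```python
def tokens_counted_toward_rate_limit(max_tokens: int, estimated_tokens: int) -> int:
    """Return the larger of max_tokens and the estimated token count.

    Assumption: estimated_tokens is supplied by the caller, since the
    estimation method is not described in the text.
    """
    return max(max_tokens, estimated_tokens)
```

For example, a request with max_tokens=500 and an estimate of 120 tokens would count as 500 under this rule, which is one reason to avoid setting max_tokens far higher than the response needs.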
Context length
The token count of your prompt plus max_tokens cannot exceed the model's context length.
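A minimal sketch of that constraint, prompt tokens + max_tokens <= context length, might look like the following. The function names are hypothetical, and the numbers in the usage note are illustrative values rather than the limits of any particular model.

```python
def fits_context(prompt_tokens: int, max_tokens: int, context_length: int) -> bool:
    """True when the prompt plus the requested output fits the context length."""
    return prompt_tokens + max_tokens <= context_length


def largest_allowed_max_tokens(prompt_tokens: int, context_length: int) -> int:
    """Largest max_tokens that still fits; 0 if the prompt alone fills the context."""
    return max(context_length - prompt_tokens, 0)
```

With an illustrative context length of 8192 and a 7000-token prompt, fits_context(7000, 1500, 8192) is False, and largest_allowed_max_tokens(7000, 8192) gives 1192.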
Questions about max_tokens
The max_tokens parameter in the chat completion endpoint raises questions about its ...