class taco_llm.SamplingParams(
    n: int = 1,
    best_of: Optional[int] = None,
    presence_penalty: float = 0.0,
    frequency_penalty: float = 0.0,
    repetition_penalty: float = 1.0,
    temperature: float = 1.0,
    top_p: float = 1.0,
    top_k: int = -1,
    min_p: float = 0.0,
    seed: Optional[int] = None,
    use_beam_search: bool = False,
    length_penalty: float = 1.0,
    early_stopping: Union[bool, str] = False,
    stop: Optional[Union[str, List[str]]] = None,
    stop_token_ids: Optional[List[int]] = None,
    ignore_eos: bool = False,
    max_tokens: Optional[int] = 16,
    min_tokens: int = 0,
    logprobs: Optional[int] = None,
    prompt_logprobs: Optional[int] = None,
    detokenize: bool = True,
    skip_special_tokens: bool = True,
    spaces_between_special_tokens: bool = True,
    logits_processors: Optional[Any] = None,
    include_stop_str_in_output: bool = False,
    truncate_prompt_tokens: Optional[Annotated[int, msgspec.Meta(ge=1)]] = None,
    no_repeat_ngram_size: int = 0,
)

Sampling parameters for text generation.

Overall, we follow the sampling parameters from the OpenAI text completion API (https://platform.openai.com/docs/api-reference/completions/create). In addition, we support beam search, which is not supported by OpenAI.

Args:
    n: Number of output sequences to return for the given prompt.
    best_of: Number of output sequences that are generated from the prompt. From these `best_of` sequences, the top `n` sequences are returned. `best_of` must be greater than or equal to `n`. This is treated as the beam width when `use_beam_search` is True. By default, `best_of` is set to `n`.
    presence_penalty: Float that penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
    frequency_penalty: Float that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
    repetition_penalty: Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens.
    temperature: Float that controls the randomness of the sampling. Lower values make the model more deterministic, while higher values make the model more random. Zero means greedy sampling.
    top_p: Float that controls the cumulative probability of the top tokens to consider. Must be in (0, 1]. Set to 1 to consider all tokens.
    top_k: Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens.
    min_p: Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
    seed: Random seed to use for the generation.
    use_beam_search: Whether to use beam search instead of sampling.
    length_penalty: Float that penalizes sequences based on their length. Used in beam search.
    early_stopping: Controls the stopping condition for beam search. It accepts the following values: `True`, where the generation stops as soon as there are `best_of` complete candidates; `False`, where a heuristic is applied and the generation stops when it is very unlikely to find better candidates; `"never"`, where the beam search procedure only stops when there cannot be better candidates (canonical beam search algorithm).
    stop: List of strings that stop the generation when they are generated. The returned output will not contain the stop strings.
    stop_token_ids: List of tokens that stop the generation when they are generated. The returned output will contain the stop tokens unless the stop tokens are special tokens.
    include_stop_str_in_output: Whether to include the stop strings in the output text. Defaults to False.
    ignore_eos: Whether to ignore the EOS token and continue generating tokens after the EOS token is generated.
    max_tokens: Maximum number of tokens to generate per output sequence.
    min_tokens: Minimum number of tokens to generate per output sequence before EOS or stop_token_ids can be generated.
    logprobs: Number of log probabilities to return per output token. When set to None, no probability is returned. If set to a non-None value, the result includes the log probabilities of the specified number of most likely tokens, as well as the chosen tokens. Note that the implementation follows the OpenAI API: the API will always return the log probability of the sampled token, so there may be up to `logprobs+1` elements in the response.
    prompt_logprobs: Number of log probabilities to return per prompt token.
    detokenize: Whether to detokenize the output. Defaults to True.
    skip_special_tokens: Whether to skip special tokens in the output.
    spaces_between_special_tokens: Whether to add spaces between special tokens in the output. Defaults to True.
    logits_processors: List of functions that modify logits based on previously generated tokens, and optionally prompt tokens as a first argument.
    truncate_prompt_tokens: If set to an integer k, will use only the last k tokens from the prompt (i.e., left truncation). Defaults to None (i.e., no truncation).
    no_repeat_ngram_size: If set to an int > 0, all ngrams of that size can only occur once.
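For reference, the sketch below shows how these parameters are typically constructed and passed to the engine. It assumes taco_llm exposes the same LLM.generate() entry point as vLLM (which it is documented as being compatible with); the LLM import, model name, and prompt are illustrative placeholders, not part of this reference.

from taco_llm import LLM, SamplingParams  # assumes vLLM-compatible entry points

# Low-temperature sampling with a length cap, a stop string, and a fixed seed.
sampling_params = SamplingParams(
    n=1,
    temperature=0.2,
    top_p=0.95,
    max_tokens=128,
    stop=["\n\n"],
    seed=42,
)

llm = LLM(model="facebook/opt-125m")  # placeholder model name
outputs = llm.generate(["The capital of France is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)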
In addition to being compatible with all of vLLM's sampling parameters, TACO-LLM adds the following sampling parameter:

no_repeat_ngram_size: If set to an int > 0, all ngrams of that size can only occur once.
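As an illustration, the extra parameter is passed through the same constructor as the vLLM-compatible parameters above. This is a hedged sketch under the same assumptions as the previous example (model name and prompt are placeholders):

from taco_llm import LLM, SamplingParams

# Forbid any 3-gram from appearing more than once in the generated text.
sampling_params = SamplingParams(
    temperature=0.8,
    max_tokens=256,
    no_repeat_ngram_size=3,
)

llm = LLM(model="facebook/opt-125m")  # placeholder model name
outputs = llm.generate(["List the steps to make tea:"], sampling_params)
print(outputs[0].outputs[0].text)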