
vLLM's SamplingParams parameters

致Great | Published 2024-02-03

vLLM deployment example

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="facebook/opt-125m")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}"

Parameter list

n: Number of output sequences to return for the given prompt.
best_of: Number of output sequences that are generated from the prompt.
    From these `best_of` sequences, the top `n` sequences are returned.
    `best_of` must be greater than or equal to `n`. This is treated as
    the beam width when `use_beam_search` is True. By default, `best_of`
    is set to `n`.
presence_penalty: Float that penalizes new tokens based on whether they
    appear in the generated text so far. Values > 0 encourage the model
    to use new tokens, while values < 0 encourage the model to repeat
    tokens.
frequency_penalty: Float that penalizes new tokens based on their
    frequency in the generated text so far. Values > 0 encourage the
    model to use new tokens, while values < 0 encourage the model to
    repeat tokens.
repetition_penalty: Float that penalizes new tokens based on whether
    they appear in the prompt and the generated text so far. Values > 1
    encourage the model to use new tokens, while values < 1 encourage
    the model to repeat tokens.
temperature: Float that controls the randomness of the sampling. Lower
    values make the model more deterministic, while higher values make
    the model more random. Zero means greedy sampling.
top_p: Float that controls the cumulative probability of the top tokens
    to consider. Must be in (0, 1]. Set to 1 to consider all tokens.
top_k: Integer that controls the number of top tokens to consider. Set
    to -1 to consider all tokens.
min_p: Float that represents the minimum probability for a token to be
    considered, relative to the probability of the most likely token.
    Must be in [0, 1]. Set to 0 to disable this.
use_beam_search: Whether to use beam search instead of sampling.
length_penalty: Float that penalizes sequences based on their length.
    Used in beam search.
early_stopping: Controls the stopping condition for beam search. It
    accepts the following values: `True`, where the generation stops as
    soon as there are `best_of` complete candidates; `False`, where a
    heuristic is applied and the generation stops when it is very
    unlikely to find better candidates; `"never"`, where the beam search
    procedure only stops when there cannot be better candidates
    (canonical beam search algorithm).
stop: List of strings that stop the generation when they are generated.
    The returned output will not contain the stop strings.
stop_token_ids: List of tokens that stop the generation when they are
    generated. The returned output will contain the stop tokens unless
    the stop tokens are special tokens.
include_stop_str_in_output: Whether to include the stop strings in output
    text. Defaults to False.
ignore_eos: Whether to ignore the EOS token and continue generating
    tokens after the EOS token is generated.
max_tokens: Maximum number of tokens to generate per output sequence.
logprobs: Number of log probabilities to return per output token.
    Note that the implementation follows the OpenAI API: the returned
    result includes the log probabilities on the `logprobs` most likely
    tokens, as well as the chosen tokens. The API will always return the
    log probability of the sampled token, so there may be up to
    `logprobs+1` elements in the response.
prompt_logprobs: Number of log probabilities to return per prompt token.
skip_special_tokens: Whether to skip special tokens in the output.
spaces_between_special_tokens: Whether to add spaces between special
    tokens in the output.  Defaults to True.
logits_processors: List of functions that modify logits based on
    previously generated tokens.
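
As a quick illustration of several of the parameters above, here is a minimal sketch combining greedy decoding, a repetition penalty, stop strings, and per-token log probabilities. The concrete values are arbitrary picks for demonstration, and the exact type stored in logprobs varies across vLLM versions:

from vllm import LLM, SamplingParams

params = SamplingParams(
    temperature=0.0,         # 0 means greedy sampling
    top_k=-1,                # -1 considers all tokens
    repetition_penalty=1.2,  # > 1 discourages repeating tokens
    stop=["\n"],             # stop as soon as a newline is generated
    max_tokens=32,           # cap on tokens generated per sequence
    logprobs=5,              # log probabilities for the 5 most likely tokens
)

llm = LLM(model="facebook/opt-125m")
completion = llm.generate(["The capital of France is"], params)[0].outputs[0]

print(f"Text: {completion.text!r}")
# completion.logprobs holds one mapping of token id -> log probability per step.
print(f"First-step logprobs: {completion.logprobs[0]}")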