TACO-LLM supports a wide range of generative Transformer models in the Hugging Face model format. The tables below list the model architectures currently supported by TACO-LLM, together with popular models that use each architecture.
Decoder-only Language Models
| Architecture | Models | Example HuggingFace Models | LoRA |
| --- | --- | --- | --- |
| BaiChuanForCausalLM | Baichuan & Baichuan2 | baichuan-inc/Baichuan2-13B-Chat, baichuan-inc/Baichuan-7B, etc. | ✓ |
| BloomForCausalLM | BLOOM, BLOOMZ, BLOOMChat | bigscience/bloom, bigscience/bloomz, etc. | - |
| ChatGLMModel | ChatGLM | THUDM/chatglm2-6b, THUDM/chatglm3-6b, etc. | ✓ |
| FalconForCausalLM | Falcon | tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-rw-7b, etc. | - |
| GemmaForCausalLM | Gemma | google/gemma-2b, google/gemma-7b, etc. | ✓ |
| Gemma2ForCausalLM | Gemma2 | google/gemma-2-9b, google/gemma-2-27b, etc. | ✓ |
| GPT2LMHeadModel | GPT-2 | gpt2, gpt2-xl, etc. | - |
| GPTBigCodeForCausalLM | StarCoder, SantaCoder, WizardCoder | bigcode/starcoder, bigcode/gpt_bigcode-santacoder, WizardLM/WizardCoder-15B-V1.0, etc. | ✓ |
| GPTJForCausalLM | GPT-J | EleutherAI/gpt-j-6b, nomic-ai/gpt4all-j, etc. | - |
| GPTNeoXForCausalLM | GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM | EleutherAI/gpt-neox-20b, EleutherAI/pythia-12b, OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5, databricks/dolly-v2-12b, stabilityai/stablelm-tuned-alpha-7b, etc. | - |
| InternLMForCausalLM | InternLM | internlm/internlm-7b, internlm/internlm-chat-7b, etc. | ✓ |
| InternLM2ForCausalLM | InternLM2 | internlm/internlm2-7b, internlm/internlm2-chat-7b, etc. | - |
| LlamaForCausalLM | Llama 3.1, Llama 3, Llama 2, LLaMA, Yi | meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Llama-2-70b-hf, 01-ai/Yi-34B, etc. | ✓ |
| MistralForCausalLM | Mistral, Mistral-Instruct | mistralai/Mistral-7B-v0.1, mistralai/Mistral-7B-Instruct-v0.1, etc. | ✓ |
| MixtralForCausalLM | Mixtral-8x7B, Mixtral-8x7B-Instruct | mistralai/Mixtral-8x7B-v0.1, mistralai/Mixtral-8x7B-Instruct-v0.1, mistral-community/Mixtral-8x22B-v0.1, etc. | ✓ |
| NemotronForCausalLM | Nemotron-3, Nemotron-4, Minitron | nvidia/Minitron-8B-Base, mgoin/Nemotron-4-340B-Base-hf-FP8, etc. | ✓ |
| OPTForCausalLM | OPT, OPT-IML | facebook/opt-66b, facebook/opt-iml-max-30b, etc. | - |
| PhiForCausalLM | Phi | microsoft/phi-1_5, microsoft/phi-2, etc. | ✓ |
| Phi3ForCausalLM | Phi-3 | microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct, microsoft/Phi-3-medium-128k-instruct, etc. | - |
| Phi3SmallForCausalLM | Phi-3-Small | microsoft/Phi-3-small-8k-instruct, microsoft/Phi-3-small-128k-instruct, etc. | - |
| PhiMoEForCausalLM | Phi-3.5-MoE | microsoft/Phi-3.5-MoE-instruct, etc. | - |
| QWenLMHeadModel | Qwen | Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc. | - |
| Qwen2ForCausalLM | Qwen2 | Qwen/Qwen2-beta-7B, Qwen/Qwen2-beta-7B-Chat, etc. | ✓ |
| Qwen2MoeForCausalLM | Qwen2MoE | Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat, etc. | - |
| StableLmForCausalLM | StableLM | stabilityai/stablelm-3b-4e1t, stabilityai/stablelm-base-alpha-7b-v2, etc. | - |
| Starcoder2ForCausalLM | Starcoder2 | bigcode/starcoder2-3b, bigcode/starcoder2-7b, bigcode/starcoder2-15b, etc. | - |
| XverseForCausalLM | Xverse | xverse/XVERSE-7B-Chat, xverse/XVERSE-13B-Chat, xverse/XVERSE-65B-Chat, etc. | - |
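The "Architecture" column corresponds to the `architectures` field in a model's Hugging Face `config.json`, which is how a model maps onto a supported implementation. As a minimal sketch of that lookup (the helper function and the in-code architecture set are illustrative, not part of the TACO-LLM API):

```python
import json

# Decoder-only architectures from the table above.
SUPPORTED_ARCHITECTURES = {
    "BaiChuanForCausalLM", "BloomForCausalLM", "ChatGLMModel",
    "FalconForCausalLM", "GemmaForCausalLM", "Gemma2ForCausalLM",
    "GPT2LMHeadModel", "GPTBigCodeForCausalLM", "GPTJForCausalLM",
    "GPTNeoXForCausalLM", "InternLMForCausalLM", "InternLM2ForCausalLM",
    "LlamaForCausalLM", "MistralForCausalLM", "MixtralForCausalLM",
    "NemotronForCausalLM", "OPTForCausalLM", "PhiForCausalLM",
    "Phi3ForCausalLM", "Phi3SmallForCausalLM", "PhiMoEForCausalLM",
    "QWenLMHeadModel", "Qwen2ForCausalLM", "Qwen2MoeForCausalLM",
    "StableLmForCausalLM", "Starcoder2ForCausalLM", "XverseForCausalLM",
}

def is_supported(config_json: str) -> bool:
    """Hypothetical helper: check whether any architecture declared in a
    model's config.json appears in the decoder-only support table."""
    config = json.loads(config_json)
    return any(a in SUPPORTED_ARCHITECTURES for a in config.get("architectures", []))

# Example: meta-llama/Llama-2-70b-hf declares LlamaForCausalLM in its config.json.
print(is_supported('{"architectures": ["LlamaForCausalLM"]}'))  # True
```

In practice you can open a model's `config.json` on the Hugging Face Hub and compare its `architectures` entry against the table directly.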
Multimodal Language Models
| Architecture | Models | Modalities | Example HuggingFace Models | LoRA |
| --- | --- | --- | --- | --- |
| InternVLChatModel | InternVL2 | Image(E+) | OpenGVLab/InternVL2-4B, OpenGVLab/InternVL2-8B, etc. | - |
| LlavaForConditionalGeneration | LLaVA-1.5 | Image(E+) | llava-hf/llava-1.5-7b-hf, llava-hf/llava-1.5-13b-hf, etc. | - |
| LlavaNextForConditionalGeneration | LLaVA-NeXT | Image(E+) | llava-hf/llava-v1.6-mistral-7b-hf, llava-hf/llava-v1.6-vicuna-7b-hf, etc. | - |
| LlavaNextVideoForConditionalGeneration | LLaVA-NeXT-Video | Video | llava-hf/LLaVA-NeXT-Video-7B-hf, etc. (see note) | - |
| PaliGemmaForConditionalGeneration | PaliGemma | Image(E) | google/paligemma-3b-pt-224, google/paligemma-3b-mix-224, etc. | - |
| Phi3VForCausalLM | Phi-3-Vision, Phi-3.5-Vision | Image(E+) | microsoft/Phi-3-vision-128k-instruct, microsoft/Phi-3.5-vision-instruct, etc. | - |
| PixtralForConditionalGeneration | Pixtral | Image(+) | mistralai/Pixtral-12B-2409 | - |
| QWenLMHeadModel | Qwen-VL | Image(E+) | Qwen/Qwen-VL, Qwen/Qwen-VL-Chat, etc. | - |
| Qwen2VLForConditionalGeneration | Qwen2-VL (see note) | Image(+) / Video(+) | Qwen/Qwen2-VL-2B-Instruct, Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2-VL-72B-Instruct, etc. | - |
Notes:
E: pre-computed embeddings can be passed as the multimodal input.
+: multiple multimodal inputs can be inserted into a single prompt.
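To make the "+" notation concrete, the sketch below builds one chat request that carries two images alongside a single text prompt, using the OpenAI-style multimodal message layout that many serving frontends accept. Whether TACO-LLM exposes this exact schema is an assumption; the field names and URLs are illustrative only.

```python
# Illustrative only: an OpenAI-style multimodal chat message with two image
# inputs in one prompt (the "+" capability in the table above). The schema
# is an assumption, not a confirmed TACO-LLM API.
def build_multi_image_message(prompt: str, image_urls: list) -> dict:
    """Assemble a user message whose content mixes several image parts
    with one text part."""
    content = [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    content.append({"type": "text", "text": prompt})
    return {"role": "user", "content": content}

msg = build_multi_image_message(
    "Compare these two photos.",
    ["https://example.com/a.jpg", "https://example.com/b.jpg"],
)
print(len(msg["content"]))  # 3: two image parts plus one text part
```

Models marked Image(E) rather than Image(E+), such as PaliGemma, accept only one such image part per prompt.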