I am trying to convert a script that uses transformers into an exe file. It is a small script that performs token classification:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
# download once to save locally
# tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER-uncased")
# model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER-uncased")
# save model locally
# tokenizer.save_pretrained("./model")
# model.save_pretrained("./model")
# now just load from local file
tokenizer = AutoTokenizer.from_pretrained('./model')
model = AutoModelForTokenClassification.from_pretrained('./model')
nlp = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
example = """00:00:02 Speaker 1: hi john, it's nice to see you again. how was your weekend? do anything special? 00:00:06 Speaker 2: yep, all good thanks. i was with my sister in derby. We saw, you know, that james bond film. what's it called? then got a couple of drinks at the pitcher and piano, back in nottingham. """
ner_results = nlp(example)
print(ner_results)
for i in range(0, len(ner_results)):
    start = ner_results[i]['start']
    end = ner_results[i]['end']
    example = example.replace(ner_results[i]['word'], ner_results[i]['entity_group'])
print(example)
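As an aside, replacing by the matched word can substitute unrelated occurrences elsewhere in the text, and the start/end offsets computed in the loop go unused. A safer sketch (assuming the entries follow the pipeline's dict format with start, end, and entity_group keys; redact_by_offsets is a hypothetical helper name) replaces each span by its offsets, working right to left so earlier offsets stay valid:

```python
def redact_by_offsets(text, entities):
    """Replace each detected entity span with its entity_group label."""
    # Process spans right-to-left so earlier offsets remain valid
    # after each replacement changes the string length.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + ent["entity_group"] + text[ent["end"]:]
    return text

# e.g. example = redact_by_offsets(example, ner_results)
```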
The online model is downloaded only once and then saved locally so that it can be packaged with pyinstaller. I am using the line below to build the exe file (after reading other similar questions, I added all the required libraries that pyinstaller misses):
pyinstaller --windowed --add-data ./model/config.json;./model/ --add-data ./model/pytorch_model.bin;./model/ --add-data ./model/special_tokens_map.json;./model/ --add-data ./model/tokenizer.json;./model/ --add-data ./model/tokenizer_config.json;./model/ --add-data ./model/vocab.txt;./model/ --collect-data tensorflow --collect-data torch --copy-metadata torch --copy-metadata tqdm --copy-metadata regex --copy-metadata sacremoses --copy-metadata requests --copy-metadata packaging --copy-metadata filelock --copy-metadata numpy --copy-metadata tokenizers --copy-metadata importlib_metadata --hidden-import="sklearn.utils._cython_blas" --hidden-import="sklearn.neighbors.typedefs" --hidden-import="sklearn.neighbors.quad_tree" --hidden-import="sklearn.tree" --hidden-import="sklearn.tree._utils" deidentify.py
This generates the following .spec file:
# -*- mode: python ; coding: utf-8 -*-
from PyInstaller.utils.hooks import collect_data_files
from PyInstaller.utils.hooks import copy_metadata
datas = [('./model/config.json', './model/'), ('./model/pytorch_model.bin', './model/'), ('./model/special_tokens_map.json', './model/'), ('./model/tokenizer.json', './model/'), ('./model/tokenizer_config.json', './model/'), ('./model/vocab.txt', './model/')]
datas += collect_data_files('tensorflow')
datas += collect_data_files('torch')
datas += copy_metadata('torch')
datas += copy_metadata('tqdm')
datas += copy_metadata('regex')
datas += copy_metadata('sacremoses')
datas += copy_metadata('requests')
datas += copy_metadata('packaging')
datas += copy_metadata('filelock')
datas += copy_metadata('numpy')
datas += copy_metadata('tokenizers')
datas += copy_metadata('importlib_metadata')
block_cipher = None
a = Analysis(['deidentify.py'],
pathex=[],
binaries=[],
datas=datas,
hiddenimports=['sklearn.utils._cython_blas', 'sklearn.neighbors.typedefs', 'sklearn.neighbors.quad_tree', 'sklearn.tree', 'sklearn.tree._utils'],
hookspath=[],
hooksconfig={},
runtime_hooks=[],
excludes=[],
win_no_prefer_redirects=False,
win_private_assemblies=False,
cipher=block_cipher,
noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
cipher=block_cipher)
exe = EXE(pyz,
a.scripts,
[],
exclude_binaries=True,
name='deidentify',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=True,
console=False,
disable_windowed_traceback=False,
target_arch=None,
codesign_identity=None,
entitlements_file=None )
coll = COLLECT(exe,
a.binaries,
a.zipfiles,
a.datas,
strip=False,
upx=True,
upx_exclude=[],
name='deidentify')
As you can see, all the model files and libraries are included.
Below is the console output when generating the exe file:
console output removed due to maximum character limit reached
I don't know why so many modules are listed above, since I have them installed both on my system and in my local environment. They should be picked up. I even explicitly asked for them to be included in the .spec file.
After the process completes, the error I receive when running the exe file is:
Traceback (most recent call last):
File "transformers\utils\versions.py", line 105, in require_version
File "importlib_metadata\__init__.py", line 631, in version
File "importlib_metadata\__init__.py", line 604, in distribution
File "importlib_metadata\__init__.py", line 229, in from_name
importlib_metadata.PackageNotFoundError: No package metadata was found for dataclasses
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "deidentify.py", line 1, in <module>
File "PyInstaller\loader\pyimod03_importers.py", line 476, in exec_module
File "transformers\__init__.py", line 43, in <module>
File "PyInstaller\loader\pyimod03_importers.py", line 476, in exec_module
File "transformers\dependency_versions_check.py", line 41, in <module>
File "transformers\utils\versions.py", line 120, in require_version_core
File "transformers\utils\versions.py", line 108, in require_version
importlib_metadata.PackageNotFoundError: No package metadata was found for The 'dataclasses' distribution was not found and is required by this application.
Try: pip install transformers -U or pip install -e '.[dev]' if you're working with git master
importlib_metadata is installed via pip, so it should not be missing.
Update
After @0x26res's comment and after updating to Python 3.8, I get a new error:
Traceback (most recent call last):
File "torch\_sources.py", line 21, in get_source_lines_and_file
sourcelines, file_lineno = inspect.getsourcelines(obj)
File "inspect.py", line 979, in getsourcelines
File "inspect.py", line 798, in findsource
OSError: could not get source code
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "transformers\file_utils.py", line 2704, in _get_module
File "importlib\__init__.py", line 127, in import_module
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "PyInstaller\loader\pyimod03_importers.py", line 495, in exec_module
File "transformers\models\deberta\modeling_deberta.py", line 505, in <module>
File "torch\jit\_script.py", line 1307, in script
ast = get_jit_def(obj, obj.__name__)
File "torch\jit\frontend.py", line 233, in get_jit_def
parsed_def = parse_def(fn)
File "torch\_sources.py", line 95, in parse_def
sourcelines, file_lineno, filename = get_source_lines_and_file(fn, ErrorReport.call_stack())
File "torch\_sources.py", line 28, in get_source_lines_and_file
raise OSError(msg) from e
OSError: Can't get source for <function c2p_dynamic_expand at 0x000002608019EDC0>. TorchScript requires source access in order to carry out compilation, make sure original .py files are available.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "deidentify.py", line 16, in <module>
File "transformers\pipelines\__init__.py", line 651, in pipeline
File "transformers\pipelines\token_classification.py", line 103, in __init__
File "transformers\pipelines\base.py", line 853, in check_model_type
File "transformers\models\auto\auto_factory.py", line 601, in items
File "transformers\models\auto\auto_factory.py", line 604, in <listcomp>
File "transformers\models\auto\auto_factory.py", line 573, in _load_attr_from_module
File "transformers\models\auto\auto_factory.py", line 535, in getattribute_from_module
File "transformers\file_utils.py", line 2694, in __getattr__
File "transformers\file_utils.py", line 2706, in _get_module
RuntimeError: Failed to import transformers.models.deberta.modeling_deberta because of the following error (look up to see its traceback):
Can't get source for <function c2p_dynamic_expand at 0x000002608019EDC0>. TorchScript requires source access in order to carry out compilation, make sure original .py files are available.
After updating to Python 3.8, I used the following command:
pyinstaller --windowed --add-data ./model/;./model/ --collect-data torch --copy-metadata torch --copy-metadata tqdm --copy-metadata regex --copy-metadata sacremoses --copy-metadata requests --copy-metadata packaging --copy-metadata filelock --copy-metadata numpy --copy-metadata tokenizers deidentify.py
Posted on 2022-02-11 17:11:15
First, you don't need to include every file under ./model/ individually; just include the whole model directory and everything inside it will be picked up as well:
datas=[('model/','model'),...
I don't know why dataclasses is not being included, but you can simply add it manually:
datas=[('[path-to-your-dataclasses.py]', '.'),...
This places dataclasses.py in the root directory, where the exe should be able to find it.
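One quick way to find that path (a sketch; on Python 3.7+ this resolves to the standard-library copy of the module, on 3.6 to the installed backport package):

```python
# Print where the dataclasses module lives, so the path can be
# pasted into the datas list of the .spec file.
import dataclasses

print(dataclasses.__file__)
```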
Posted on 2022-02-14 14:36:05
Apparently there are many different ways to convert a script, with its design and content, into an executable .exe file. One of the best options I have used recently is auto-py-to-exe! It is very easy to use. Here are the steps I followed for the conversion:
1. Activate your conda environment: conda activate <NAME_OF_ENV>
2. Install the package: pip install auto-py-to-exe
3. Run the application: auto-py-to-exe
。https://stackoverflow.com/questions/70607241