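I want to run SageMaker, and I adapted my code from the following examples:
Step 3: https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-preprocess-data.html
and
Step 4: https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model.html
But I cannot complete the training. SageMaker returns the following error: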
ErrorMessage "FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/input/data/training/train/data.csv
In my code, I set the paths as follows:
Uploading the data to S3:
bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker_forecasting_ml'
train_1d.to_csv('train_1d.csv', sep=',', index=False, header=False)
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'data/train_1d.csv')).upload_file('train_1d.csv')
training_1d_s3_path = TrainingInput(
"s3://{}/{}/{}".format(bucket, prefix, "data/train_1d.csv"), content_type="csv"
)
Why can't SageMaker find the path when the data has been uploaded there?
This is my full code for running the algorithm:
import sagemaker
from sagemaker import hyperparameters, image_uris, model_uris, script_uris
from sagemaker.session import TrainingInput
from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base
train_model_id, train_model_version, train_scope = "lightgbm-regression-model", "*", "training"
training_instance_type = "ml.g4dn.xlarge"
region = sagemaker.Session().boto_region_name
# Retrieve the docker image
train_image_uri = image_uris.retrieve(
    region=region,
    framework=None,
    model_id=train_model_id,
    model_version=train_model_version,
    image_scope=train_scope,
    instance_type=training_instance_type
)
# Retrieve the training script
train_source_uri = script_uris.retrieve(
    model_id=train_model_id, model_version=train_model_version, script_scope=train_scope
)
train_model_uri = model_uris.retrieve(
    model_id=train_model_id, model_version=train_model_version, model_scope=train_scope
)
training_1d_s3_path = TrainingInput(
    "s3://{}/{}/{}".format(bucket, prefix, "data/train.csv"), content_type="csv"
)
s3_output_loc_1d = f"s3://{bucket}/{prefix}/output"
# Retrieve the default hyper-parameters for training the model
hyperparameters = hyperparameters.retrieve_default(
    model_id=train_model_id, model_version=train_model_version
)
# Override default hyperparameters with custom values
hyperparameters["num_boost_round"] = "500"
hyperparameters["n_estimators"] = "600"
hyperparameters["boosting_type"] = "gbdt"
hyperparameters["learning_rate"] = "0.01"
hyperparameters["objective"] = "tweedie"
hyperparameters["tweedie_variance_power"] = "1"
aws_role = sagemaker.get_execution_role()
training_job_name = name_from_base(f"built-in-algo-{train_model_id}-training")
# Create SageMaker Estimator instance
tabular_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_loc_1d
)
# Launch a SageMaker Training job by passing the S3 path of the training data
tabular_estimator.fit(
    {"training": training_1d_s3_path}, logs=True, job_name=training_job_name
)
Posted on 2022-07-28 23:44:44
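Check whether your transfer_learning.py file constructs the file path correctly. Because the channel in your fit() call is named "training", the path in your case should be something like /opt/ml/input/data/training/train_1d.csv rather than
/opt/ml/input/data/training/train/data.csv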
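In your code, it should look something like this:
import argparse
import os
parser = argparse.ArgumentParser()
# SM_CHANNEL_TRAINING holds the local directory of the "training" channel
parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAINING'))
args = parser.parse_args()
input_path = args.train + "/train_1d.csv"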
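If it is unclear which files actually landed in the channel directory, a small debugging helper can print them at the start of the training script. The following is only a sketch: list_channel_files is a hypothetical name, not part of the SageMaker SDK, and the fallback path assumes a channel named "training".

```python
import os


def list_channel_files(channel_dir):
    """Return the sorted relative paths of all files under channel_dir.

    Hypothetical debugging helper, not part of the SageMaker SDK.
    Returns an empty list if the directory does not exist.
    """
    found = []
    for root, _dirs, files in os.walk(channel_dir):
        for name in files:
            full_path = os.path.join(root, name)
            found.append(os.path.relpath(full_path, channel_dir))
    return sorted(found)


if __name__ == "__main__":
    # Inside the training container, SM_CHANNEL_TRAINING points at the
    # local directory of the channel named "training". The fallback
    # value is the conventional location and is an assumption here.
    channel_dir = os.environ.get(
        "SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"
    )
    print("Channel directory:", channel_dir)
    print("Files found:", list_channel_files(channel_dir))
```

Comparing the printed list with the path in the FileNotFoundError shows whether the script is looking for a file name or subdirectory that was never uploaded.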
Also refer to the documentation below to understand how inputs are passed to a training job.
https://docs.aws.amazon.com/sagemaker/latest/dg/model-train-storage.html
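As a rough illustration of how a channel name maps to a directory and an environment variable inside the container, here is a minimal sketch. The helper names are hypothetical, and expected_single_file_path assumes File input mode with an S3 URI that points at exactly one object, which is then placed in the channel directory under its base name.

```python
import posixpath

# Root directory under which each input channel gets its own subdirectory.
INPUT_DATA_ROOT = "/opt/ml/input/data"


def channel_dir(channel_name):
    """Local directory for a channel, e.g. the "training" key passed to fit()."""
    return posixpath.join(INPUT_DATA_ROOT, channel_name)


def channel_env_var(channel_name):
    """Name of the environment variable that holds the channel directory."""
    return "SM_CHANNEL_" + channel_name.upper()


def expected_single_file_path(channel_name, s3_uri):
    """Expected local path of a single uploaded object.

    Assumption: s3_uri refers to one object (not a prefix with nested
    keys), so the file appears directly in the channel directory.
    """
    return posixpath.join(channel_dir(channel_name), posixpath.basename(s3_uri))
```

With the channel name "training" and an S3 key ending in data/train_1d.csv, this sketch yields /opt/ml/input/data/training/train_1d.csv, which differs from the train/data.csv location reported in the error message.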
https://stackoverflow.com/questions/73152043