I'm trying to train a DNNClassifier:
labels = ['BENIGN', 'Syn', 'UDPLag', 'UDP', 'LDAP', 'MSSQL', 'NetBIOS', 'WebDDoS']
# Build a DNN
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[30, 10],
    n_classes=len(labels),
    label_vocabulary=labels)

def input_fn(features, labels, training=True, batch_size=32):
    '''An input function for training or evaluating.'''
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Shuffle and repeat if you are in training mode.
    if training:
        dataset = dataset.shuffle(1000).repeat()
    return dataset.batch(batch_size)

# Train the model
classifier.train(
    input_fn=lambda: input_fn(train_features, train_label, training=True),
    steps=5000)
Training worked fine until I switched to a larger dataset:
>>> train_features.shape
(15891114, 20)
>>> train_label.shape
(15891114,)
I'm using Google Colaboratory, and as soon as training starts my session crashes for exceeding the RAM limit (12 GB of RAM):
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1666: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Layer dnn is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because it's dtype defaults to floatx.
If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.
To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/adagrad.py:106: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
Before training starts, only about 1.1 GB of RAM is in use, but as soon as training begins the RAM saturates very quickly.
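For scale: the dtype warning above shows the inputs are float64, so the raw feature matrix alone is roughly 2.5 GB, and a couple of in-memory copies is already enough to approach the 12 GB limit:

# 15,891,114 rows * 20 columns * 8 bytes (float64) ≈ 2.5 GB
print(15891114 * 20 * 8 / 1e9)  # -> 2.54 (GB)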
I got it to work by feeding the model chunks of the DataFrame for training/evaluation. Still, it is not clear to me why memory saturates when I feed the Estimator the entire DataFrame for training or evaluation.
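One plausible explanation (not confirmed in this thread) is that tf.data.Dataset.from_tensor_slices embeds the NumPy arrays into the Estimator's graph as constant ops, so the data can end up copied more than once in memory; the TensorFlow tf.data documentation warns about exactly this for large arrays. Below is a minimal sketch of the chunked workaround described above, reusing input_fn from the question; the file name, chunk size, and 'Label' column name are all assumptions, not details from the question:

import pandas as pd

# Hypothetical chunked training loop: read the data in pieces and call
# classifier.train() once per chunk. Path, chunk size, and label column
# are placeholders.
CHUNK_SIZE = 500_000

for chunk in pd.read_csv('train.csv', chunksize=CHUNK_SIZE):
    chunk_labels = chunk.pop('Label')  # assumed target column
    classifier.train(
        input_fn=lambda: input_fn(chunk, chunk_labels, training=True),
        steps=500)  # input_fn repeats each chunk, so cap the steps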
Answered on 2020-06-07 22:04:35
I duplicated your Google Colab, copied the data file into my Drive, and trained the estimator, and your code just worked :s. I could train the DNN without problems, and I checked that I was using the large dataset.
When I re-ran some Jupyter notebook cells I did get an out of RAM message, but after I restarted the kernel and then did Run all cells, I never got that message again. Maybe the problem lies with Jupyter? Try writing your code as a .py file (put it in your Drive) and running it from the Colab notebook with subprocess; maybe that solves your problem.
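A minimal sketch of that suggestion (the script path below is a placeholder, not from the answer):

import subprocess

# Run the training script in a separate Python process so its memory is
# fully released when it exits. Replace the path with wherever you saved
# the .py file in your mounted Drive.
result = subprocess.run(
    ['python3', '/content/drive/My Drive/train_dnn.py'],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(result.stdout.decode())
print(result.stderr.decode())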
https://stackoverflow.com/questions/62190677