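I trained several object detection models on custom data for 4K steps using the TensorFlow Object Detection API, and evaluated them during training. All checkpoints were evaluated, and I saw the results on the console.
However, I cannot see the evaluation results for the last two checkpoints in TensorBoard. It shows evaluation results up to 3K steps and nothing after that. I can see from the console and from the folder that the evaluation completed.
There are no error messages on the console when I start TensorBoard. The training results are loaded into TensorBoard completely; the only thing missing is the last evaluation results.
I tried evaluating the latest checkpoint again, but nothing changed. At the end of the evaluation I get a message saying that the metrics were written to the summary...
Training checkpoints are saved every 10 minutes, and evaluation takes 12 minutes. Even so, I would expect to get the evaluation result for the latest checkpoint.
When I download the CSV file from TensorBoard, the evaluations for the last two checkpoints are missing there as well.
What could be the reason?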
I0311 16:57:21.281645 MainThread program.py:165] Not bringing up TensorBoard, but inspecting event files.
I0311 16:57:21.281645 140028330256128 program.py:165] Not bringing up TensorBoard, but inspecting event files.
======================================================================
Processing event files... (this can take a few minutes)
======================================================================
Found event files in:
./CN_flow1_95/eval
./CN_flow1_95/train
These tags are in ./CN_flow1_95/eval:
audio -
histograms -
images
image-0
image-1
image-2
image-3
image-4
image-5
image-6
image-7
image-8
image-9
scalars
Losses/Loss/BoxClassifierLoss/classification_loss
Losses/Loss/BoxClassifierLoss/localization_loss
Losses/Loss/RPNLoss/localization_loss
Losses/Loss/RPNLoss/objectness_loss
PascalBoxes_PerformanceByCategory/AP@0.5IOU/b'cyclist'
PascalBoxes_PerformanceByCategory/AP@0.5IOU/b'motorcyclist'
PascalBoxes_PerformanceByCategory/AP@0.5IOU/b'pedestrian'
PascalBoxes_Precision/mAP@0.5IOU
tensor -
======================================================================
Event statistics for ./CN_flow1_95/eval:
audio -
graph
first_step 0
last_step 0
max_step 0
min_step 0
num_steps 1
outoforder_steps []
histograms -
images
first_step 0
last_step 4112
max_step 4112
min_step 0
num_steps 7
outoforder_steps []
scalars
first_step 0
last_step 4112
max_step 4112
min_step 0
num_steps 7
outoforder_steps []
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor -
======================================================================
These tags are in ./CN_flow1_95/train:
audio -
histograms
ModelVars/...
images -
scalars
Losses/TotalLoss
Losses/clone_0/Loss/BoxClassifierLoss/classification_loss
Losses/clone_0/Loss/BoxClassifierLoss/localization_loss
Losses/clone_0/Loss/RPNLoss/localization_loss
Losses/clone_0/Loss/RPNLoss/objectness_loss
Losses/clone_1/Loss/BoxClassifierLoss/classification_loss
Losses/clone_1/Loss/BoxClassifierLoss/localization_loss
Losses/clone_1/Loss/RPNLoss/localization_loss
Losses/clone_1/Loss/RPNLoss/objectness_loss
Losses/clone_2/Loss/BoxClassifierLoss/classification_loss
Losses/clone_2/Loss/BoxClassifierLoss/localization_loss
Losses/clone_2/Loss/RPNLoss/localization_loss
Losses/clone_2/Loss/RPNLoss/objectness_loss
batch/fraction_of_150_full
clone_0/Losses/clone_0//clone_loss
global_step/sec
queue/prefetch_queue/fraction_of_5_full
tensor -
======================================================================
Event statistics for ./CN_flow1_95/train:
audio -
graph
first_step 0
last_step 0
max_step 0
min_step 0
num_steps 1
outoforder_steps []
histograms
first_step 0
last_step 4110
max_step 4110
min_step 0
num_steps 28
outoforder_steps []
images -
scalars
first_step 0
last_step 4110
max_step 4110
min_step 0
num_steps 54
outoforder_steps []
sessionlog:checkpoint
first_step 1
last_step 4111
max_step 4111
min_step 1
num_steps 7
outoforder_steps []
sessionlog:start
outoforder_steps []
steps [0, 4110]
sessionlog:stop
outoforder_steps []
steps [0, 0]
tensor -
======================================================================
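The inspection output above can also be cross-checked from Python by listing the steps actually stored for one scalar tag. This is only a minimal sketch, assuming the `tensorboard` package is installed and reading the eval directory with its `EventAccumulator` class. The helper names `scalar_steps` and `missing_steps` are hypothetical and not part of any library.

```python
def scalar_steps(logdir, tag):
    """Return the sorted steps recorded for one scalar tag under logdir."""
    # Imported lazily so the pure helper below works without TensorBoard.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    # A size guidance of 0 asks the accumulator to keep every scalar event
    # instead of downsampling them.
    accumulator = EventAccumulator(logdir, size_guidance={"scalars": 0})
    accumulator.Reload()
    return sorted(event.step for event in accumulator.Scalars(tag))


def missing_steps(expected, recorded):
    """Return the expected steps (hypothetical helper) that were not recorded."""
    recorded = set(recorded)
    return [step for step in expected if step not in recorded]
```

For example, `scalar_steps("./CN_flow1_95/eval", "PascalBoxes_Precision/mAP@0.5IOU")` should list the steps stored for the mAP tag. If step 4112 is in that list but not in the TensorBoard plot, the data was written and the problem is on the display side.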
Posted on 2019-03-15 04:26:52
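I also asked this in the TensorBoard repo. They said there is no reason the event files would not load correctly, and pointed me here...
Sometimes the correct results do show up (when there are 10-15 event files because of extensive testing), but most of the time they do not. I changed how often checkpoints are saved so that no checkpoint would be missed during evaluation (that should not matter, but I tried it anyway).
I saved a checkpoint every 12 minutes, since evaluation also takes 12 minutes. That did not work either.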
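To illustrate why the save interval was matched to the evaluation time, here is a toy simulation. It is only a sketch under a loud assumption: the evaluator, whenever it is idle, picks the newest checkpoint available. That is a simplified model, not the Object Detection API's actual scheduling, and the function name `evaluated_checkpoints` is hypothetical. Times are in minutes.

```python
def evaluated_checkpoints(save_interval, eval_duration, total_time):
    """Simulate which checkpoints (by save time) an idle-polling evaluator sees.

    Assumption: whenever the evaluator is free it evaluates the newest
    checkpoint, and it waits if no new checkpoint has appeared yet.
    """
    evaluated = []
    now = save_interval  # the first checkpoint appears after one interval
    while now <= total_time:
        # Save time of the newest checkpoint available at this moment.
        latest = (now // save_interval) * save_interval
        if evaluated and latest == evaluated[-1]:
            # Nothing new yet: wait until the next checkpoint is written.
            now = latest + save_interval
            continue
        evaluated.append(latest)
        now += eval_duration  # the evaluator is busy for this long
    return evaluated
```

Under this model, a 10-minute save interval with a 12-minute evaluation eventually skips a checkpoint (the one saved at minute 60 when simulating 80 minutes), whereas a 12-minute interval skips none. Neither case explains results that were evaluated and written but still not shown.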
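All tensorboard --inspect results look fine.
I tried different models on different computers and also cleared the browser cache. Nothing really helped.
I believe there is a bug in TensorBoard.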
https://stackoverflow.com/questions/55101924