I have downloaded a script for setting up the Hadoop configuration. It contains a /scripts/directories.sh file with the following block:
# Space separated list of directories where NameNode will store file system image. For example, /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn
DFS_NAME_DIR="TODO-LIST-OF-NAMENODE-DIRS";
# Space separated list of directories where DataN
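A minimal sketch of how the TODO placeholder might be filled in. It assumes you simply replace it with real local paths, here the two example paths from the script's own comment. In hdfs-site.xml, the dfs.namenode.name.dir property takes a comma-delimited list, and the fsimage is replicated into every directory listed. So a space-separated list like this one needs converting at some point. The `to_comma_list` helper below is hypothetical and not part of the downloaded script, which may already do this conversion itself.

```shell
#!/usr/bin/env bash
# Example values only (taken from the comment in directories.sh);
# use directories that actually exist on your NameNode host.
DFS_NAME_DIR="/grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn";

# to_comma_list (hypothetical helper): turn a space-separated list into the
# comma-separated form used by hdfs-site.xml properties such as
# dfs.namenode.name.dir.
to_comma_list() {
  local -a items
  # Split on whitespace (default IFS); repeated spaces collapse.
  read -r -a items <<<"$1"
  # Join the array elements with the first character of IFS, i.e. ",".
  local IFS=,
  printf '%s\n' "${items[*]}"
}
```

For example, `to_comma_list "$DFS_NAME_DIR"` prints `/grid/hadoop/hdfs/nn,/grid1/hadoop/hdfs/nn`. Putting each directory on a separate physical disk is what makes the redundant fsimage copies useful.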
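From the logs I can see there are 182K rows (70 MB). On Dataproc it takes 1.5 hours to load this data and 9 hours to train (from 15/11/14 01:58:28 to 15/11/14 09:19:09). Loading the same data and running the same algorithm on a local machine takes 3 minutes.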
DataProc log:
15/11/13 23:27:09 INFO com.google.cloud.hadoop.io.bigquery.ShardedExportToCloudStorage: Table 'mydata-data:website_wtw_feed.video_click20151111' to be
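The timestamps in the question and in the log line appear to use a two-digit `yy/MM/dd HH:mm:ss` layout (e.g. `15/11/14 01:58:28`). The sketch below turns two such timestamps into an elapsed time. It assumes GNU `date` (for its `-d` option) and that `15` means 2015. The helper names `log_ts_to_epoch` and `elapsed` are hypothetical.

```shell
#!/usr/bin/env bash
# Assumes GNU date (-d) and log dates in yy/MM/dd form, with yy meaning 20yy.

# log_ts_to_epoch DATE TIME -> seconds since the epoch
#   e.g. log_ts_to_epoch 15/11/14 01:58:28
log_ts_to_epoch() {
  local d=$1 t=$2
  local yy=${d%%/*}     # "15"
  local rest=${d#*/}    # "11/14"
  local mm=${rest%%/*}  # "11"
  local dd=${rest#*/}   # "14"
  # -u: interpret both timestamps in the same fixed zone (UTC) so the
  # result does not depend on the local TZ setting.
  date -u -d "20${yy}-${mm}-${dd} ${t}" +%s
}

# elapsed START_DATE START_TIME END_DATE END_TIME -> "HH:MM:SS"
elapsed() {
  local start end s
  start=$(log_ts_to_epoch "$1" "$2")
  end=$(log_ts_to_epoch "$3" "$4")
  s=$((end - start))
  printf '%02d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}
```

For the training window quoted in the question, `elapsed 15/11/14 01:58:28 15/11/14 09:19:09` prints `07:20:41`. The same function handles spans that cross midnight, such as one starting at the `15/11/13 23:27:09` log line.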
I have a distributed video analysis system that consists of the following parts:
1. Feature extraction: generates many features (20+) from each frame of the video.
2. Multiple detectors (on different machines):
* Each detector receives a subset of the features.
* Each detector needs the features from multiple frames.
* E.g., Detector 1 needs features 1-5 from
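The routing described above can be illustrated with a small sketch. Each detector keeps only its own feature columns and consumes a sliding window of consecutive frames. Everything here is hypothetical: the line-per-frame input format, the `detector_windows` function and its parameters. In the real system these windows would be sent to each detector machine; here they are simply printed.

```shell
#!/usr/bin/env bash
# Illustration only. Hypothetical input: one line per frame on stdin,
#   "<frame_id> <f1> <f2> ... <fN>"   (single-space separated)

# detector_windows FIRST LAST WINDOW
#   FIRST..LAST : 1-based indices of the features this detector needs
#   WINDOW      : number of consecutive frames the detector needs at once
# Prints one line per full window:
#   "<oldest_id>-<newest_id> <features of oldest> | ... | <features of newest>"
detector_windows() {
  local first=$1 last=$2 window=$3
  local -a ids=() feats=()
  local id rest joined i

  while read -r id rest; do
    # Feature k is field k of "$rest", so keep fields FIRST..LAST.
    ids+=("$id")
    feats+=("$(cut -d' ' -f"${first}-${last}" <<<"$rest")")

    # Slide the window: drop the oldest frame once there are too many.
    if (( ${#ids[@]} > window )); then
      ids=("${ids[@]:1}")
      feats=("${feats[@]:1}")
    fi

    # Emit only once a full window of frames is available.
    if (( ${#ids[@]} == window )); then
      joined=""
      for (( i = 0; i < window; i++ )); do
        if (( i > 0 )); then
          joined+=" | "
        fi
        joined+="${feats[i]}"
      done
      printf '%s-%s %s\n' "${ids[0]}" "${ids[window-1]}" "$joined"
    fi
  done
}
```

For instance, a detector that needs features 1-5 over 3 consecutive frames would run `detector_windows 1 5 3`. Selecting the columns before the data leaves the extractor keeps each detector's input small. The window size sets how much per-frame state each detector has to buffer.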