Good afternoon, I'm looking at the storm-starter WordCountTopology here. For reference, here are some Java files.
This is the main file:
public class WordCountTopology {
    public static class SplitSentence extends ShellBolt implements IRichBolt {
        public SplitSentence() {
            super("python", "splitsentence.py");
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }

        @Override
        public Map<String, Object> getComponentConfiguration() {
            return null;
        }
    }

    public static class WordCount extends BaseBasicBolt {
        Map<String, Integer> counts = new HashMap<String, Integer>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getString(0);
            Integer count = counts.get(word);
            if (count == null)
                count = 0;
            count++;
            counts.put(word, count);
            collector.emit(new Values(word, count));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new TextFileSpout(), 5);
        builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
        builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setDebug(true);

        if (args != null && args.length > 0) {
            conf.setNumWorkers(3);
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        }
        else {
            conf.setMaxTaskParallelism(3);
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("word-count", conf, builder.createTopology());
            Thread.sleep(10000);
            cluster.shutdown();
        }
    }
}
Instead of reading from a random String[], I just want to read from a sentence once:
public class TextFileSpout extends BaseRichSpout {
    SpoutOutputCollector _collector;
    String sentence = "";
    String line = "";
    String splitBy = ",";
    BufferedReader br = null;

    @Override
    public void open(Map conf, TopologyContext context,
                     SpoutOutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void nextTuple() {
        Utils.sleep(100);
        sentence = "wordOne wordTwo";
        _collector.emit(new Values(sentence));
        System.out.println(sentence);
    }

    @Override
    public void ack(Object id) {
    }

    @Override
    public void fail(Object id) {
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
When this code runs, the output is a flood of threads/emits. The problem is that the program executes 85 times instead of reading just once. My guess is that this happens because the original code generates a new random sentence over and over.
What is causing nextTuple to be called so many times?
Posted on 2015-06-11 02:30:13
You should move the file-initialization code into the open method; otherwise your file handler will be re-initialized every time nextTuple is called.
Edit:
In the open method, do something like this:
br = new BufferedReader(new FileReader(csvFileToRead));
Then the logic that reads the file should live inside the nextTuple method:
while ((line = br.readLine()) != null) {
    // your logic
}
https://stackoverflow.com/questions/30738832
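The pattern the answer describes can be sketched without any Storm dependencies. This is a plain-Java analogue (the class and method names below are illustrative, not the Storm API): the framework calls nextTuple() in a tight loop, so the reader must be set up once in open() and only the per-line read belongs in nextTuple(). With that split, each line is emitted exactly once no matter how often nextTuple() is polled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for a spout: Storm's executor calls
// nextTuple() in an endless loop, so reader state must survive
// across calls instead of being rebuilt on every call.
class LineSpout {
    private BufferedReader br;                       // initialized once, as in open()
    private final List<String> emitted = new ArrayList<>();

    // Analogue of open(): set up the reader exactly once.
    void open(String contents) {
        br = new BufferedReader(new StringReader(contents));
    }

    // Analogue of nextTuple(): emit at most one line per call;
    // once the input is exhausted, readLine() returns null and
    // the method simply emits nothing.
    void nextTuple() throws IOException {
        String line = br.readLine();
        if (line != null) {
            emitted.add(line);                       // stands in for _collector.emit(...)
        }
    }

    List<String> emitted() {
        return emitted;
    }
}

public class Main {
    public static void main(String[] args) throws IOException {
        LineSpout spout = new LineSpout();
        spout.open("wordOne wordTwo\nwordThree wordFour");
        // Simulate Storm polling nextTuple() far more often than there are lines.
        for (int i = 0; i < 85; i++) {
            spout.nextTuple();
        }
        System.out.println(spout.emitted().size());  // prints 2: each line emitted once
    }
}
```

Had the BufferedReader been created inside nextTuple() instead, every poll would reopen the input at the start and re-emit the first line 85 times, which is exactly the symptom described in the question.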