Read the contents of a file, then analyze it

Stack Overflow user

Asked 2016-12-29 18:17:54

Answers: 4, Views: 79, Followers: 0, Votes: 0

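The project I'm currently working on has me read a file and then analyze the data in it. Using FileReader, I have read each line of the file into an array. The file looks like this: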

01 02 03 04 05 06
02 03 04 05 06 07
03 04 05 06 07 08
04 05 06 07 08 09

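These are not the exact numbers, but they are a good example. I now want to find out how many times the number "04" appears in my data. I was thinking of putting all the data in a 2D array by splitting up each line, but I'm not sure how to go about it. Do I need a parser, or can I use some kind of string function (like split) to break up the data and then store it in an array?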

java

4 Answers

Stack Overflow user

Accepted answer

Answered 2016-12-29 18:44:44

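You are getting into design thinking "too early", for example by reaching for a 2D array here.

Before you start weighing design/implementation choices, you really want to understand your requirements better.

Example: if all you care about is how often some number occurs overall, a 2D array won't give you any benefit. Instead, you could put all the numbers into one long List<Integer> and then use some fancy Java 8 stream operations on that.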
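As a minimal sketch of that idea (the class and method names here are made up for illustration, and the values are the sample numbers from the question), the flat list plus a stream filter could look like this. Note that parsing "04" into an Integer drops the leading zero, so the lookup is for 4:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
final class FlatListCount {

    // Counts how often `target` occurs in the list using a Java 8 stream.
    static long countOccurrences(List<Integer> numbers, int target) {
        return numbers.stream()
                .filter(n -> n == target) // Integer is unboxed, so this compares values
                .count();
    }

    public static void main(String[] args) {
        // Sample values from the question, flattened into one list.
        List<Integer> numbers = Arrays.asList(
                1, 2, 3, 4, 5, 6,
                2, 3, 4, 5, 6, 7,
                3, 4, 5, 6, 7, 8,
                4, 5, 6, 7, 8, 9);
        System.out.println("Count of 4 = " + countOccurrences(numbers, 4)); // prints Count of 4 = 4
    }
}
```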

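But if that is just one of many use cases, other ways of managing your data in memory might be much more efficient.

Beyond that: if you find that what you will be doing with this data goes beyond simple computations, Java might not be the best choice here. Languages like R are designed for exactly that: crunching incredible amounts of data, and giving you "instant" access to a wide range of statistical operations.

To answer your idea about counting the occurrences of all the different numbers, that is pretty simple: use a Map<Integer, Integer> here, like: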

Map<Integer, Integer> numbersWithCount = new HashMap<>();

Now you loop over your data, and for each data point:

int currentNumber = ...; // next number from your input data

int counterForNum;
if (numbersWithCount.containsKey(currentNumber)) {
  // seen before: increase the stored counter
  counterForNum = numbersWithCount.get(currentNumber) + 1;
} else {
  // currentNumber found for the first time
  counterForNum = 1;
}
numbersWithCount.put(currentNumber, counterForNum);

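In other words: you simply iterate over all incoming numbers, and for each of them you either create a new counter or increase the one that is already stored.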
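For reference, here is a self-contained sketch of that loop. The class name, the method name, and the use of an int[] as the input are assumptions made so the example compiles on its own; the counting logic follows the description above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper class, introduced only to make the loop runnable.
final class NumberCounter {

    // Builds a number -> occurrence-count map with the containsKey/put pattern.
    static Map<Integer, Integer> countAll(int[] data) {
        Map<Integer, Integer> numbersWithCount = new HashMap<>();
        for (int currentNumber : data) {
            int counterForNum;
            if (numbersWithCount.containsKey(currentNumber)) {
                counterForNum = numbersWithCount.get(currentNumber) + 1;
            } else {
                counterForNum = 1; // first time this number is seen
            }
            numbersWithCount.put(currentNumber, counterForNum);
        }
        return numbersWithCount;
    }

    public static void main(String[] args) {
        // Sample values from the question.
        int[] data = {1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 7,
                      3, 4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 9};
        System.out.println("Count of 4 = " + countAll(data).get(4)); // prints Count of 4 = 4
    }
}
```

On Java 8 and later, the whole if/else can also be collapsed into numbersWithCount.merge(currentNumber, 1, Integer::sum).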

If you use a TreeMap instead of a HashMap, you even get your keys sorted. There are many possibilities...
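A small sketch of the TreeMap variant, with made-up class and method names and arbitrary input values. It uses Map.merge for the increment, and printing the map shows the keys in ascending order:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class; only the TreeMap choice comes from the answer.
final class SortedNumberCounter {

    // Counts occurrences; TreeMap keeps the keys in natural (ascending) order.
    static Map<Integer, Integer> countSorted(int[] data) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int n : data) {
            counts.merge(n, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] data = {7, 4, 9, 4, 1};
        System.out.println(countSorted(data)); // prints {1=1, 4=2, 7=1, 9=1}
    }
}
```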

Votes: 0

Stack Overflow user

Answered 2016-12-29 18:21:56

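If you only need to count the 04s, you don't really need to store the whole file. You could, for example, read each line and check it for 04s (and add to a counter or something). You could even read the file character by character, but that might be a bit tedious for whatever efficiency gain it brings (if any).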
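A rough sketch of that line-by-line approach, assuming whitespace-separated tokens as in the question's sample. The class and method names are invented for the example, and it takes a Reader so it can be tried on an in-memory string; for a real file you would pass something like a FileReader instead:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of the original answer.
final class StreamingCount {

    // Reads line by line and counts tokens equal to `target`.
    // Only the current line and the running counter are held in memory.
    static long countToken(Reader source, String target) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String token : line.trim().split("\\s+")) {
                    if (token.equals(target)) {
                        count++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "01 02 03 04 05 06\n02 03 04 05 06 07\n";
        System.out.println(countToken(new StringReader(sample), "04")); // prints 2
    }
}
```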

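If you need to do more complicated things with the file, this approach might not cut it. But unless you specify what those things are, I can't say what would.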

Votes: 1

Stack Overflow user

Answered 2016-12-29 18:29:24

You should use a map to hold the count of occurrences, like this:

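// needs: java.io.IOException, java.nio.file.*, java.util.Map,
//        java.util.function.Function, java.util.regex.Pattern, java.util.stream.*
public static void main(String[] args) throws IOException {
    // split each line on runs of whitespace
    Pattern splitter = Pattern.compile("\\s+");
    try (Stream<String> stream = Files.lines(Paths.get("input.txt"))) {
        // flatten all lines into one stream of tokens, then count each distinct token
        Map<String,Long> result = stream.flatMap(splitter::splitAsStream)
                .collect(Collectors.groupingBy(Function.identity(),
                        Collectors.counting()));
        System.out.println(result);
    }
}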

Or load the data and parse it in multiple stages:

public static void main(String[] args) throws IOException {
    // 1. load the data array
    String[][] data;
    try(Stream<String> stream = Files.lines(Paths.get("numbers.txt"))) {
        data = stream.map(line -> line.split("\\s+")).toArray(String[][]::new);
    }
    System.out.format("Total lines = %d%n", data.length);

    // 2. count the occurrences of each word
    Map<String,Long> countDistinct = Arrays.stream(data).flatMap(Arrays::stream)
            .collect(Collectors.groupingBy(Function.identity(),
                    Collectors.counting()));
    System.out.println("Count of 04 = " + countDistinct.getOrDefault("04", 0L));

    // 3. calculate correlations 
    Map<String,Map<String,Long>> correlations;
    correlations = Arrays.stream(data).flatMap((String[] row) -> {
        Set<String> words = new HashSet<>(Arrays.asList(row));
        return words.stream().map(word -> new AbstractMap.SimpleEntry<>(word, words));
    }).collect(Collectors.toMap(kv -> kv.getKey(),
            kv -> kv.getValue().stream()
                    .collect(Collectors.toMap(Function.identity(), v -> 1L)),
            (map1, map2) -> {
                map2.entrySet().forEach(kv -> map1.merge(kv.getKey(), kv.getValue(), Long::sum));
                return map1;
            }));
    // Collections.emptyMap() is the type-safe replacement for the raw EMPTY_MAP constant
    System.out.format("Lines with 04 = %d%n",
        correlations.getOrDefault("04", Collections.emptyMap()).getOrDefault("04", 0L));
    System.out.format("Lines with both 04 and 07 = %d%n",
        correlations.getOrDefault("04", Collections.emptyMap()).getOrDefault("07", 0L));
}

Edit:

Here is a (perhaps) easier-to-read version that doesn't use the streams/functional approach:

public static void main(String[] args) throws IOException {
    long lineCount = 0;
    Map<String,Long> wordCount = new HashMap<>();
    Map<String,Map<String,Long>> correlations = new HashMap<>();
    try(Stream<String> stream = Files.lines(Paths.get("numbers.txt"))) {
        Iterable<String> lines = stream::iterator;
        for(String line : lines) {
            // fresh set per line, so words from earlier lines don't leak into this line's pairs
            Set<String> lineWords = new HashSet<>();
            lineCount++;
            for(String word : line.split("\\s+")) {
                lineWords.add(word);
                wordCount.merge(word, 1L, Long::sum);
            }
            for(String wordA : lineWords) {
                Map<String,Long> relate = correlations.computeIfAbsent(wordA,
                        key -> new HashMap<>());
                for(String wordB : lineWords) {
                    relate.merge(wordB, 1L, Long::sum);
                }
            }
        }
    }
    System.out.format("Total lines = %d%n", lineCount);
    System.out.println("Count of 04 = " + wordCount.getOrDefault("04", 0L));
    // Collections.emptyMap() is the type-safe replacement for the raw EMPTY_MAP constant
    System.out.format("Lines with 04 = %d%n",
        correlations.getOrDefault("04", Collections.emptyMap()).getOrDefault("04", 0L));
    System.out.format("Lines with both 04 and 07 = %d%n",
        correlations.getOrDefault("04", Collections.emptyMap()).getOrDefault("07", 0L));
}

Output:

Total lines = 4
Count of 04 = 4
Lines with 04 = 4
Lines with both 04 and 07 = 3

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/41385136
