文章/答案/技术大牛

发布

社区首页 >问答首页 >C++如何从类似JSON的文本文件中获取特定的值？

问C++如何从类似JSON的文本文件中获取特定的值？
EN

Stack Overflow用户

提问于 2022-01-22 17:30:04

回答 1查看 162关注 0票数 0

我有一个.txt文件，其中包含数百个图书信息。(所显示的图像只是其中之一，它有多行相同的信息，在同一个.txt文件中有不同的数据)，我只想检查"unigramCount“值，并找到前3位最常用的单词。因此，程序应该查找unigramCount行中的每个单词，并在单词前面增加数字。将它们存储在数组或链接列表中，最后告诉最频繁的3个单词。

c++

回答 1

Stack Overflow用户

回答已采纳

发布于 2022-01-22 18:10:24

您的文件是一个所谓的结构化文本文件，如xml或json。

语法可以定义文件的语言(结构)。如果我们看一下乔姆斯基等级分类，那么我们就有了一种所谓的“上下文无关语言”，它可以并且应该用一个典型的推倒自动机或递归下降分析器来分析。

但你说过你“只想查一查”。这需要一个乔姆斯基等级体系3 规则语法，它可以用正则表达式实现。

请注意:通常不能使用正则表达式，因为它们不能识别嵌套结构。例如。如果有嵌套的开始和结束大括号，则正则表达式无法工作。简单地说。正则表达式不能计数。

无论如何，由于给定文件的简单性，我们可以使用正则表达式进行模式匹配。

好的是C++用它的regex库支持正则表达式。

但即使如此，我们也无法找到一个单一正则表达式的解决方案。我们将采用三个步骤

匹配包含unigram数据的所有行。
在上面的匹配行上迭代并提取单词/计数对
将单词计数对拆分为单词和计数

(正则表达式可以在任何在线regex测试器(如这里)中进行测试)。

如果我们有这个，我们将使用一个关联的容器，就像一个std：：地图，如果排序重要，或者地图 (如果速度重要)来计数。命令是为了字数而不是字数。

使用地图或多或少是计数项目的标准方法。“关键”是我们想要计数的词。它是独一无二的。相关的淡水河谷是伯爵。

这两种地图都有一个方便的索引算子。您可以将键(在本例中是一个unigram单词的实例)放在方括号[]中，如果该单词已经存在，则索引运算符将返回对相关值的引用。

如果索引中的单词尚未在映射中出现，它将被创建，并且在我们的示例中将值初始化为0。将再次返回对此值的引用。因此，在任何情况下，结果都是对计数器值的引用，然后我们可以增加这个值。

凉爽的。

但是，由于关联容器的性质，您无法对它们进行排序。因此，我们需要在第二个容器中复制产生的单词/计数对。

并且在CPP算法库中有很好的拟合函数。复制。使用它，我们可以找到前3，并将它们复制到结果的std::vector中。

因此，我们将大量使用现有的和高级的C++功能，用几行代码实现解决方案。

有许多不同的解决方案。一个例子是：

#include <iostream>
#include <fstream>
#include <string>
#include <regex>
#include <unordered_map>
#include <utility>
#include <map>

// ------------------------------------------------------------
// Create aliases. Save typing work and make code more readable
using Pair = std::pair<std::string, unsigned int>;
// Standard approach for counter
//using Counter = std::unordered_map<Pair::first_type, Pair::second_type>;
using Counter = std::map<Pair::first_type, Pair::second_type>;
// Sorted values will be stored in a vector
using Top = std::vector<Pair>;
// ------------------------------------------------------------

// Regexes
const std::regex unigramLineIdentifierRegex{R"(\"unigramCount\":\{(.*)\})"};
const std::regex unigramCountRegex{ R"((\"[a-zA-Z ,]+\"\:\d+)+)" };
const std::regex wordAndCountRegex(R"(\"([a-zA-Z ,]+)\":(\d+))");

int main() {
    // Open the file and check, if it could be opened
    if (std::ifstream ifs{"text.txt"}; ifs) {

        // Read the complete textfile into a string
        std::string text(std::istreambuf_iterator<char>(ifs), {});
        
        // Here we will count the words
        Counter counter{};

        // Now extract all lines containing the unigram line pattern into a std::vector
        std::vector unigramLine(std::sregex_token_iterator(text.begin(), text.end(), unigramLineIdentifierRegex), {});

        // For each of the found lines containg a unigram record
        for (const std::string& wordAndCount : unigramLine) {

            // Extract all word count pairs
            std::vector unigrams(std::sregex_token_iterator(wordAndCount.begin(), wordAndCount.end(), unigramCountRegex), {});

            // Go over all word/count strings and split them up
            for (const std::string unigram : unigrams) {

                // Find the word part and the count part
                std::smatch smWordAndCount{};
                if (std::regex_match(unigram, smWordAndCount, wordAndCountRegex)) {

                    // Increment the global counter
                    counter[smWordAndCount[1]] += std::stoi(smWordAndCount[2]);
                }
            }
        }
        // Here we will store the top 3
        Top top(3);

        // Copy and sort. Get only the biggest 3
        std::partial_sort_copy(counter.begin(), counter.end(), top.begin(), top.end(), [](const Pair& p1, const Pair& p2) { return p1.second > p2.second; });

        // Debug output. Show the result´, the top3 on the screen
        for (const auto& [word, count] : top)
            std::cout << word << " \t--> " << count << '\n';
    }
    else std::cerr << "\n*** Error: Could not open source file\n\n";
}

请记住。regex解决方案不是惯用的正确方法。

如果文件真的是JSON，那么查看这个图书馆

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/70815367

复制

相似问题

问C++如何从类似JSON的文本文件中获取特定的值？
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问C++如何从类似JSON的文本文件中获取特定的值？EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问C++如何从类似JSON的文本文件中获取特定的值？
EN