
Q: Reading a large JSON file in Deno

Stack Overflow user

Asked on 2019-09-23 21:21:10

Answers: 4, Views: 2.6K, Followers: 0, Votes: 8

I often find myself reading a large JSON file (usually an array of objects), manipulating each object, and then writing the results out to a new file.

To do this in Node (at least the part that reads the data), I usually do something like the following with streams and the stream-json package:

const fs = require('fs');
const StreamArray = require('stream-json/streamers/StreamArray');

const pipeline = fs.createReadStream('sample.json')
  .pipe(StreamArray.withParser());

pipeline.on('data', data => {
    // StreamArray emits { key, value } per array element: key is the index, value the parsed object
});

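I recently discovered Deno and would like to be able to do this workflow with it.

It looks like the readJSON method in the standard library reads the entire contents of the file into memory, so I am not sure it is a good fit for processing large files.

Is there a way to do this by streaming the data from the file, using some of the lower-level methods built into Deno?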

deno

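4 answers

Stack Overflow user

Accepted answer

Posted on 2020-05-21 04:30:32

Now that Deno 1.0 has been released, I'm circling back to this in case anyone else is interested in doing it. I was able to piece together a small class that works for my use case. It counts braces in the raw byte stream and parses each top-level object as soon as its braces balance. It is not nearly as robust as the stream-json package, but it handles large JSON arrays just fine.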

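import { EventEmitter } from "https://deno.land/std/node/events.ts";

export class JSONStream extends EventEmitter {

    private openBraceCount = 0;
    private inString = false;   // currently inside a JSON string literal?
    private escaped = false;    // previous byte inside the string was a backslash?
    private tempUint8Array: number[] = [];
    private decoder = new TextDecoder();

    constructor (private filepath: string) {
        super();
        this.stream();
    }

    async stream() {
        console.time("Run Time");
        const file = await Deno.open(this.filepath);
        // creates an async iterator from the reader; default buffer size is 32kb
        for await (const buffer of Deno.iter(file)) {

            for (let i = 0, len = buffer.length; i < len; i++) {
                const uint8 = buffer[i];

                // inside a string, braces and whitespace are ordinary characters
                if (this.inString) {
                    this.tempUint8Array.push(uint8);
                    if (this.escaped) this.escaped = false;
                    else if (uint8 === 92) this.escaped = true;    // backslash
                    else if (uint8 === 34) this.inString = false;  // closing quote
                    continue;
                }

                // skip whitespace between tokens (LF, CR, space)
                if (uint8 === 10 || uint8 === 13 || uint8 === 32) continue;

                // open brace: a new top-level object starts at depth 0
                if (uint8 === 123) {
                    if (this.openBraceCount === 0) this.tempUint8Array = [];
                    this.openBraceCount++;
                }

                this.tempUint8Array.push(uint8);

                // opening quote of a string
                if (uint8 === 34) this.inString = true;

                // close brace: emit the object once the top-level braces balance
                if (uint8 === 125) {
                    this.openBraceCount--;
                    if (this.openBraceCount === 0) {
                        const uint8Ary = new Uint8Array(this.tempUint8Array);
                        const jsonString = this.decoder.decode(uint8Ary);
                        const object = JSON.parse(jsonString);
                        this.emit('object', object);
                    }
                }
            }
        }
        file.close();
        console.timeEnd("Run Time");
    }
}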

Example usage:

const stream = new JSONStream('test.json');

stream.on('object', (object: any) => {
    // do something with each object
});

Test input: a ~4.8 MB JSON file containing about 20,000 small objects:

[
    {
      "id": 1,
      "title": "in voluptate sit officia non nesciunt quis",
      "urls": {
         "main": "https://www.placeholder.com/600/1b9d08",
         "thumbnail": "https://www.placeholder.com/150/1b9d08"
      }
    },
    {
      "id": 2,
      "title": "error quasi sunt cupiditate voluptate ea odit beatae",
      "urls": {
          "main": "https://www.placeholder.com/600/1b9d08",
          "thumbnail": "https://www.placeholder.com/150/1b9d08"
      }
    }
    ...
]

Processing it took 127 ms:

❯ deno run -A parser.ts
Run Time: 127ms
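The class above can also be expressed as an async generator over any `AsyncIterable<Uint8Array>`, which drops the EventEmitter dependency and lets callers consume objects with `for await`. This is a minimal sketch, not part of the original answer: `splitTopLevelObjects` is a hypothetical name, the input is assumed to be a top-level array of objects, and string literals are tracked so that braces, quotes, or spaces inside strings do not disturb the brace count. Scanning raw bytes is safe because every byte of a multi-byte UTF-8 character is 0x80 or above, so it can never be mistaken for `{`, `}`, `"` or a backslash.

```typescript
// Hypothetical helper (not from the answer above): yields each top-level {...}
// object of a JSON array that arrives as a stream of byte chunks.
// Assumes the array contains only objects.
export async function* splitTopLevelObjects(
  chunks: AsyncIterable<Uint8Array>,
): AsyncGenerator<unknown> {
  const decoder = new TextDecoder();
  let depth = 0; // current brace nesting depth
  let inString = false; // inside a JSON string literal?
  let escaped = false; // previous byte in the string was a backslash?
  let bytes: number[] = []; // bytes of the object currently being collected

  for await (const chunk of chunks) {
    for (const b of chunk) {
      if (inString) {
        bytes.push(b);
        if (escaped) escaped = false;
        else if (b === 0x5c) escaped = true; // backslash escapes the next byte
        else if (b === 0x22) inString = false; // closing quote
        continue;
      }
      if (b === 0x7b) { // "{"
        if (depth === 0) bytes = [];
        depth++;
      }
      if (depth === 0) continue; // skip "[", "]", commas and whitespace between objects
      bytes.push(b);
      if (b === 0x22) {
        inString = true; // opening quote
      } else if (b === 0x7d) { // "}"
        depth--;
        if (depth === 0) yield JSON.parse(decoder.decode(new Uint8Array(bytes)));
      }
    }
  }
  if (depth !== 0 || inString) throw new Error("input ended inside an object");
}
```

Because the generator only needs an async iterable of byte chunks, in recent Deno versions an opened file's `readable` stream could be passed in directly, e.g. `for await (const obj of splitTopLevelObjects((await Deno.open("test.json")).readable)) { ... }`. Memory use stays proportional to the largest single object, not the whole file.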

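Votes: 3

Stack Overflow user

Posted on 2019-10-22 07:58:07

I think packages like stream-json would be just as useful on Deno as they are on Node.js, so one way to go would certainly be to take that package's source code and make it work on Deno. (This answer will probably be outdated soon, because plenty of people are doing things like this, and it won't be long before someone, maybe you, publishes their results and makes them importable into any Deno script.)

Alternatively, although this doesn't answer your question directly, a common pattern for handling large JSON datasets is to use files containing JSON objects separated by newlines (one JSON object per line). Hadoop, Spark, AWS S3 Select, and many other tools use this format. If you can get your input data in this format, it may let you use many more tools. You could then also stream the data with the readString('\n') method of BufReader in Deno's standard library (std/blob/master/io/bufio.ts).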
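Producing that format is straightforward: serialize each object on its own and terminate it with a newline. A minimal sketch, assuming the objects arrive as an async iterable and are all values that `JSON.stringify` can serialize (not `undefined` or functions); `toNdjson` is a hypothetical name. Without an indent argument, `JSON.stringify` emits no raw newlines (a newline inside a string value is written as the escape sequence `\n`), so each object is guaranteed to stay on one line.

```typescript
// Hypothetical helper: turns a stream of objects into newline-delimited JSON,
// one compact JSON.stringify() result per line.
export async function* toNdjson(
  objects: AsyncIterable<unknown>,
): AsyncGenerator<string> {
  for await (const obj of objects) {
    // No indent argument, so the output contains no raw newline characters;
    // the trailing "\n" is the only line terminator for this record.
    yield JSON.stringify(obj) + "\n";
  }
}
```

Each yielded line can be encoded with `TextEncoder` and appended to the output file as it is produced, so memory use stays proportional to one object rather than the whole dataset.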

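Reading such a file line by line with the standard library's BufReader also has the added advantage of reducing your reliance on third-party packages. Example code: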

    import { BufReader } from "https://deno.land/std/io/bufio.ts";

    async function stream_file(filename: string) {
        const file = await Deno.open(filename);
        const bufReader = new BufReader(file);
        console.log('Reading data...');
        let line: string | null;
        let lineCount = 0;
        // readString returns null at EOF (Deno 1.0+; earlier versions returned Deno.EOF).
        // Each returned string includes the trailing '\n', except possibly the last one.
        while ((line = await bufReader.readString('\n')) !== null) {
            lineCount++;
            // do something with `line`.
        }
        file.close();
        console.log(`${lineCount} lines read.`);
    }
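Once the data is newline-delimited, each line is an independent JSON document, so parsing reduces to splitting on "\n" and calling `JSON.parse` per line. The sketch below does the splitting itself over any `AsyncIterable<Uint8Array>` (for example, a Deno file's `readable` stream in recent versions) instead of going through BufReader, so it does not depend on a particular standard-library version. It is a minimal sketch: `parseNdjson` is a hypothetical name, and blank lines are assumed to be ignorable.

```typescript
// Hypothetical helper: parses newline-delimited JSON (one object per line)
// from a stream of byte chunks, without reading the whole file into memory.
export async function* parseNdjson(
  chunks: AsyncIterable<Uint8Array>,
): AsyncGenerator<unknown> {
  const decoder = new TextDecoder();
  let pending = ""; // text after the last newline seen so far
  for await (const chunk of chunks) {
    // { stream: true } holds back an incomplete multi-byte character for the next chunk.
    pending += decoder.decode(chunk, { stream: true });
    let newline: number;
    while ((newline = pending.indexOf("\n")) !== -1) {
      const line = pending.slice(0, newline).trim(); // trim also drops a trailing "\r"
      pending = pending.slice(newline + 1);
      if (line !== "") yield JSON.parse(line);
    }
  }
  pending += decoder.decode(); // flush any bytes still buffered in the decoder
  const last = pending.trim();
  if (last !== "") yield JSON.parse(last); // final line without a trailing newline
}
```

Decoding with `{ stream: true }` matters here: a chunk boundary can fall in the middle of a multi-byte UTF-8 character, and the decoder keeps the incomplete bytes until the next chunk arrives instead of emitting a replacement character.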

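Votes: 2

Stack Overflow user

Posted on 2020-02-15 20:39:19

This is the code I used for a file with 13,147,089 lines of text. Note that it is the same as Robert's code, but uses readLine() instead of readString('\n'). readLine() is a low-level line-reading primitive; most callers should use readString('\n') instead, or use a Scanner.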

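import { BufReader } from "https://deno.land/std/io/bufio.ts";

export async function stream_file(filename: string) {
  const file = await Deno.open(filename);
  const bufReader = new BufReader(file);
  const decoder = new TextDecoder();
  console.log("Reading data...");
  // readLine() resolves to { line: Uint8Array, more: boolean }, or null at EOF
  // (Deno 1.0+; earlier versions returned Deno.EOF).
  let result: { line: Uint8Array; more: boolean } | null;
  let lineCount = 0;
  while ((result = await bufReader.readLine()) !== null) {
    // readLine() strips the "\n" / "\r\n"; an overlong line arrives in several
    // fragments with `more: true`, so only the last fragment completes a line.
    if (!result.more) lineCount++;
    const line = decoder.decode(result.line);
    // do something with `line` (a whole line, or a fragment when result.more is true).
  }
  file.close();
  console.log(`${lineCount} lines read.`);
}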
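Because readLine() is a low-level primitive, a line longer than the reader's internal buffer comes back in several pieces, each flagged `more: true` except the last, and the returned bytes exclude the line ending. A caller that needs whole lines has to stitch the pieces back together. The sketch below assumes only a reader whose readLine() has the shape used above (resolving to `{ line: Uint8Array; more: boolean } | null`); the `LineReader` interface and the `wholeLines` function are hypothetical names. It copies each fragment on the assumption that a buffered reader may reuse its buffer on the next call.

```typescript
// Hypothetical minimal interface: anything whose readLine() behaves like the
// BufReader.readLine() used above.
export interface LineReader {
  readLine(): Promise<{ line: Uint8Array; more: boolean } | null>;
}

// Concatenates byte fragments into one Uint8Array.
function concat(parts: Uint8Array[]): Uint8Array {
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// Yields complete lines as strings, rejoining the fragments that readLine()
// returns with `more: true` when a line is longer than the reader's buffer.
export async function* wholeLines(reader: LineReader): AsyncGenerator<string> {
  const decoder = new TextDecoder();
  let parts: Uint8Array[] = [];
  let result: { line: Uint8Array; more: boolean } | null;
  while ((result = await reader.readLine()) !== null) {
    // Copy the bytes, since the reader may overwrite its buffer on the next call.
    parts.push(result.line.slice());
    if (!result.more) {
      // Decode only after joining, so a multi-byte character split across
      // fragments is decoded correctly.
      yield decoder.decode(concat(parts));
      parts = [];
    }
  }
  // Defensive: input ended while a line was still marked as unfinished.
  if (parts.length > 0) yield decoder.decode(concat(parts));
}
```

Since BufReader's readLine() returns results of that shape, it should satisfy the interface structurally, so the loop body above could iterate `wholeLines(bufReader)` instead of handling fragments by hand.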

Votes: 2

The original page content is provided by Stack Overflow; translation support is provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/58070346
