首页
学习
活动
专区
工具
TVP
发布
社区首页 >问答首页 >hadoop中读取json的自定义输入格式

hadoop中读取json的自定义输入格式
EN

Stack Overflow用户
提问于 2013-09-03 21:28:09
回答 1查看 6.1K关注 0票数 2

我是hadoop的初学者,我被告知要创建一个自定义的inputformat类来读取json数据,我已经在谷歌上搜索并学习了如何创建一个自定义的inputformat类来读取file.but中的数据,我被困在解析json数据上。我的json数据如下所示

代码语言:javascript
复制
[
    {
        "_count": 30,
        "_start": 0,
        "_total": 180,
        "values": [
            {
                "attachment": {
                    "contentDomain": "techcarnival2013.eventbrite.com",
                    "contentUrl": "http://techcarnival2013.eventbrite.com/",
                    "imageUrl": "http://ebmedia.eventbrite.com/s3-s3/static/images/django/logos/eb_home_tm-trans-fb.png",
                    "summary": "Get to know a few thousand of Silicon Valley's best and brightest while enjoying unparalleled access to Candlestick Park,\u00a0games, food, music and more. We'll have carnival games you haven't played since you were ten, giant inflatable obstacle...",
                    "title": "Tech Carnival @ Candlestick Park"
                },
                "comments": {
                    "_total": 0
                },
                "creationTimestamp": 1373908436000,
                "creator": {
                    "firstName": "Clayton",
                    "headline": "Director of Operations",
             "secondname":{
                "name":"myname"
                },
                    "lastName": "K.",
                    "pictureUrl": "http://m.c.lnkd.licdn.com/mpr/mprx/0_R7Vm6_RqBDHaHCDzJHRA6hsNcwOfECjzMeaA6heqHeo0v6ovBWoCe8pVJiYrd5pJVu4KdbnQQ3Lj"
                },
                "likes": {
                    "_total": 0
                },
                "relationToViewer": {
                    "availableActions": {
                        "_total": 7,
                        "values": [
                            {
                                "code": "add-comment"
                            },
                            {
                                "code": "categorize-as-job"
                            },
                            {
                                "code": "categorize-as-promotion"
                            },
                            {
                                "code": "flag-as-inappropriate"
                            },
                            {
                                "code": "follow"
                            },
                            {
                                "code": "like"
                            },
                            {
                                "code": "reply-privately"
                            }
                        ]
                    },
                    "isFollowing": false,
                    "isLiked": false
                },
                "summary": "Network with 4,000+ from the tech community, including folks from DFJ, Google, LinkedIn, Square, Uber, Y Combinator, 500 Startups, etc. $10 ticket gets you all-you-can-ride access to the pop-up Tech Carnival, will be the biggest Wednesday night of the tech summer.",
                "title": "Tech Event @ Candlestick Park on Wednesday, July 17th! Come play carnival games with ~4,000 of the Bay area's best and brightest!"
            },
            {
                "attachment": {
                    "contentDomain": "lifebeyondnumbers.com",
                    "contentUrl": "http://bit.ly/10VTqMu",
                    "imageUrl": "http://lifebeyondnumbers.com/wp-content/uploads/2013/07/lurnq_Online_Courses.jpg",
                    "summary": "LurnQ offers a platform for learning and teaching that is free for everyone. It caters to a diverse online audience and is relevant to everyone in general. The key segment that we address now is of life long learners.",
                    "title": "LurnQ - making lifelong learning clutter free, fun and a social..."
                },
                "comments": {
                    "_total": 0
                },
                "creationTimestamp": 1373883177000,
                "creator": {
                    "firstName": "Syed",
                    "headline": "Founder and CEO at QubiqSquare",
                    "lastName": "Muksit",
                    "pictureUrl": "http://m.c.lnkd.licdn.com/mpr/mprx/0_Y5gdzlRCbQBTqIa-pXYnz-01b6KinDO-pFWnz-ZCZLk1WWdt-_SLUt2uWmrpzo0OxQxcVv6pRjbE"
                },
                "likes": {
                    "_total": 0
                },
                "relationToViewer": {
                    "availableActions": {
                        "_total": 7,
                        "values": [
                            {
                                "code": "add-comment"
                            },
                            {
                                "code": "categorize-as-job"
                            },
                            {
                                "code": "categorize-as-promotion"
                            },
                            {
                                "code": "flag-as-inappropriate"
                            },
                            {
                                "code": "follow"
                            },
                            {
                                "code": "like"
                            },
                            {
                                "code": "reply-privately"
                            }
                        ]
                    },
                    "isFollowing": false,
                    "isLiked": false
                },
                "summary": "LurnQ offers a platform for learning and teaching that is free for everyone. It caters to a diverse online audience and is relevant to everyone in general. The key segment that we address now is of life long learners.",
                "title": "There is so much to learn and most of the times, we don\u2019t even know that this-and-that good stuff exists.  http://bit.ly/10VTqMu"
            },
            {
                "attachment": {
                    "contentDomain": "techcarnival2013.eventbrite.com",
                    "contentUrl": "http://techcarnival2013.eventbrite.com/",
                    "imageUrl": "http://ebmedia.eventbrite.com/s3-s3/static/images/django/logos/eb_home_tm-trans-fb.png",
                    "summary": "Get to know a few thousand of Silicon Valley's best and brightest while enjoying unparalleled access to Candlestick Park,\u00a0games, food, music and more. We'll have carnival games you haven't played since you were ten, giant inflatable obstacle...",
                    "title": "Tech Carnival @ Candlestick Park"
                },
                "comments": {
                    "_total": 0
                },
                "creationTimestamp": 1373654758000,
                "creator": {
                    "firstName": "Clayton",
                    "headline": "Director of Operations",
                    "lastName": "K.",
                    "pictureUrl": "http://m.c.lnkd.licdn.com/mpr/mprx/0_R7Vm6_RqBDHaHCDzJHRA6hsNcwOfECjzMeaA6heqHeo0v6ovBWoCe8pVJiYrd5pJVu4KdbnQQ3Lj"
                },
                "likes": {
                    "_total": 0
                },
                "relationToViewer": {
                    "availableActions": {
                        "_total": 7,
                        "values": [
                            {
                                "code": "add-comment"
                            },
                            {
                                "code": "categorize-as-job"
                            },
                            {
                                "code": "categorize-as-promotion"
                            },
                            {
                                "code": "flag-as-inappropriate"
                            },
                            {
                                "code": "follow"
                            },
                            {
                                "code": "like"
                            },
                            {
                                "code": "reply-privately"
                            }
                        ]
                    },
                    "isFollowing": false,
                    "isLiked": false
                },
                "summary": "Network with 4,000+ from the tech community, including folks from DFJ, Google, LinkedIn, Square, Uber, Y Combinator, 500 Startups, etc. $10 ticket gets you all-you-can-ride access to the pop-up Tech Carnival, will be the biggest Wednesday night of the tech summer.",
                "title": "Tech Event @ Candlestick Park on Wednesday, July 17th! Come play carnival games with ~4,000 of the Bay area's best and brightest!"
            }
..........
........ so on

]

我想要读取json数组中的单个json对象,我的意思是读取正确的json字符串,然后将字符串提供给映射,我将在映射中使用json解析器来构造我自己的键值pair.any帮助

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2013-09-04 03:06:36

如果你的问题与Magham Ravi评论的一致,答案是好的。

但是,如果您有一个包含所有JSON数据的文件(如上所述),那么您可能希望读取整个文件,并将其作为字符串从映射函数中的值部分( JSON值)中检索出来,然后将其提供给同一个BytesWritable()函数中提供的JSON解析器。

请看一下WholeFileInputFormat

对于JSON,必须有唯一的开始和结束标记,以准确地标记所需的单个JSON数据对象的开始和结束。仅仅使用start-tag = "{“和end-tag = "}”可能没有帮助,因为您已经有许多会混淆InputFormat的嵌套。

如果您在任何情况下都无法实现上述目标,请尝试构建覆盖TextInputFormat中定义的LineReader的customTextInputFormat。

代码语言:javascript
复制
private static final byte CR = '\r';
private static final byte LF = '\n';

您可以放弃CR并将LF更改为“JSON ]\n[”,因为每个独立的JSON数据都将采用所示的形式,或者您如何更好地了解它?

[

...JSON 1

]

[

...JSON 2

]

[

...JSON N

]

(注意:在不同JSON对象的数据之间有一个\n ]和[标记为边界。

希望这是有意义的。

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/18593595

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档