
Counting items in Python from a JSON file

Stack Overflow user
Asked 2016-03-10 03:21:22
Answers: 1, Views: 3.3K, Followers: 0, Votes: 0

I am trying to search a data file, e.g. Yelp.json. It contains businesses in LA, Boston, and DC.

I wrote this:

# Python 2
import json

# read json
with open('updated_data.json') as facts_data:
    data = json.load(facts_data)

# return every unique locality along with how often it occurs
locality = []
unique_locality = []
# Load items into lists
for item in data:
    locality.append(data["payload"]["locality"])
    if data["payload"]["locality"] not in unique_locality:
        print unique_locality.append(data["payload"]["locality"])
# Loops over unique_locality and count from locality
print "Unique Locality Count:", unique_locality, locality.count(data["payload"]["locality"])

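But the output I get is "Portsmouth 1", which means it is not returning all the cities, and possibly not all the counts either. The goal of this part is to search the JSON file and have it display "DC: 10 businesses, LA: 20 businesses, Boston: 2 businesses". Each payload is a set of information about a single business, and "locality" is the city. So I want it to find how many unique cities there are, and then how many businesses are in each city. For example, one payload might be a Starbucks in LA, another a Starbucks in DC, and another a Chipotle in LA.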
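Apart from the file format, the loop above has two bugs of its own: inside `for item in data` it reads `data["payload"]` instead of `item[...]`, so every iteration looks at the same object, and `print unique_locality.append(...)` prints `None` because `list.append` returns `None`. A minimal Python 3 sketch of the intended counting follows. It assumes the file has been restructured into a JSON array of `{"payload": {...}}` objects (the file as posted is not in this form), and `count_localities` is a hypothetical helper name; the sample records mirror the Starbucks/Chipotle example above.

```python
import json

def count_localities(data):
    """Count businesses per locality.

    Assumes `data` is a list of objects shaped like {"payload": {"locality": ...}}.
    """
    counts = {}
    for item in data:
        # Read from the current item, not from the whole `data` object
        locality = item["payload"]["locality"]
        counts[locality] = counts.get(locality, 0) + 1
    return counts

sample = json.loads("""
[
  {"payload": {"name": "Starbucks", "locality": "LA"}},
  {"payload": {"name": "Starbucks", "locality": "DC"}},
  {"payload": {"name": "Chipotle", "locality": "LA"}}
]
""")

for locality, n in count_localities(sample).items():
    print(f"{locality}: {n} businesses")
```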

Sample of the JSON file (JSONlite.com says it is valid):

"payload": {
        "existence_full": 1,
        "geo_virtual": "[\"56.9459720|-2.1971226|20|within_50m|4\"]",
        "latitude": "56.945972",
        "locality": "Stonehaven",
        "_records_touched": "{\"crawl\":8,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
        "address": "The Lodge, Dunottar",
        "email": "dunnottarcastle@btconnect.com",
        "existence_ml": 0.5694238217658721,
        "domain_aggregate": "",
        "name": "Dunnottar Castle",
        "search_tags": ["Dunnottar Castle Aberdeenshire", "Dunotter Castle"],
        "admin_region": "Scotland",
        "existence": 1,
        "category_labels": [
            ["Landmarks", "Buildings and Structures"]
        ],
        "post_town": "Stonehaven",
        "region": "Kincardineshire",
        "review_count": "719",
        "geocode_level": "within_50m",
        "tel": "01569 762173",
        "placerank": 65,
        "longitude": "-2.197123",
        "placerank_ml": 37.27916073464469,
        "fax": "01330 860325",
        "category_ids_text_search": "",
        "website": "http://www.dunnottarcastle.co.uk",
        "status": "1",
        "geocode_confidence": "20",
        "postcode": "AB39 2TL",
        "category_ids": [108],
        "country": "gb",
        "_geocode_quality": "4",
        "uuid": "3867aaf3-12ab-434f-b12b-5d627b3359c3"
    },
    "payload": {
        "existence_full": 1,
        "geo_virtual": "[\"56.237480|-5.073578|20|within_50m|4\"]",
        "latitude": "56.237480",
        "locality": "Inveraray",
        "_records_touched": "{\"crawl\":11,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
        "address": "Cherry Park",
        "email": "enquiries@inveraray-castle.com",
        "longitude": "-5.073578",
        "domain_aggregate": "",
        "name": "Inveraray Castle",
        "admin_region": "Scotland",
        "search_tags": ["Inveraray Castle Tea Room", "Inverary Castle"],
        "existence": 1,
        "category_labels": [
            ["Social", "Food and Dining", "Restaurants"]
        ],
        "region": "Argyll",
        "review_count": "532",
        "geocode_level": "within_50m",
        "tel": "01499 302203",
        "placerank": 67,
        "post_town": "Inveraray",
        "placerank_ml": 41.19978087352266,
        "fax": "01499 302421",
        "category_ids_text_search": "",
        "website": "http://www.inveraray-castle.com",
        "status": "1",
        "geocode_confidence": "20",
        "postcode": "PA32 8XE",
        "category_ids": [347],
        "country": "gb",
        "_geocode_quality": "4",
        "existence_ml": 0.7914881102847783,
        "uuid": "8278ab80-2cd1-4dbd-9685-0d0036b681eb"
    },

1 Answer

Stack Overflow user

Posted 2016-03-10 03:57:48

If your "json" is semantically like

{"payload":{ CONTENT_A }, "payload":{ CONTENT_B }, ..., "payload":{ CONTENT_LAST }}

it is a valid JSON string, but after you call json.loads on it, it evaluates to

{"payload":{ CONTENT_LAST }}

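because each repeated "payload" key overwrites the previous one. That is why you end up with only one city and one business.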
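The overwrite can be reproduced with Python's standard `json` module, which keeps the last value when a key repeats. Passing an `object_pairs_hook` exposes the raw (key, value) pairs the parser saw before building each dict, which makes the duplicates visible. The two-entry string is a trimmed-down stand-in for the question's file, and `report_duplicates` is a hypothetical helper name (Python 3).

```python
import json

# Two objects under the same key, trimmed from the question's sample
text = '{"payload": {"locality": "Stonehaven"}, "payload": {"locality": "Inveraray"}}'

# Default parsing: the second "payload" silently replaces the first
data = json.loads(text)
print(data)

seen_duplicates = []

def report_duplicates(pairs):
    """object_pairs_hook: receives every (key, value) pair of one JSON object."""
    keys = [key for key, _ in pairs]
    seen_duplicates.extend(sorted({k for k in keys if keys.count(k) > 1}))
    return dict(pairs)  # builds the same dict as default parsing would

json.loads(text, object_pairs_hook=report_duplicates)
print(seen_duplicates)
```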

You can verify this behavior by inspecting the parsed fields in this online JSON parser: http://json.parser.online.fr/.

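In this case, one way to preprocess the JSON string is to get rid of the redundant "payload" keys and wrap the content dictionaries directly in a list. The resulting JSON string has the following format: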

[{CONTENT_A}, {CONTENT_B}, ..., {CONTENT_LAST}]
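Instead of editing the string by hand, the same restructuring can be done during parsing. This is a different technique from the string preprocessing described above, shown only as a sketch: an `object_pairs_hook` turns any object whose keys are all `"payload"` into a list of its values, and builds every other object normally. It assumes the file is wrapped in one pair of braces with no trailing comma (the excerpt in the question shows neither), and `collect_payloads` is a hypothetical name (Python 3).

```python
import json

def collect_payloads(pairs):
    """Hypothetical object_pairs_hook: {"payload": A, "payload": B, ...} -> [A, B, ...].

    Objects with any other keys (e.g. the payload dicts themselves) become normal dicts.
    """
    if pairs and all(key == "payload" for key, _ in pairs):
        return [value for _, value in pairs]
    return dict(pairs)

text = """{
  "payload": {"name": "Dunnottar Castle", "locality": "Stonehaven"},
  "payload": {"name": "Inveraray Castle", "locality": "Inveraray"}
}"""

data = json.loads(text, object_pairs_hook=collect_payloads)
print(data)
```

For a file, `json.load(f, object_pairs_hook=collect_payloads)` accepts the same hook. One caveat: an inner object whose only key happens to be "payload" would also be turned into a list.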

Assume your JSON string is now a list of payload dictionaries and that you have loaded it into data with json.loads(json_str).

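While iterating over the payloads, build a lookup table as you go. This handles duplicate cities automatically, because businesses in the same city hash to the same dictionary key and are appended to the same list.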

city_business_map = {}
for payload in data:
    city = payload['locality']
    business = payload['name']
    if city not in city_business_map:
        city_business_map[city] = []
    city_business_map[city].append(business)

Then you can easily print the result like this:

for city, business_list in city_business_map.items():
    print city, len(business_list)

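If you want to count only unique businesses in each city, initialize the values as sets instead of lists (and use add instead of append).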
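A sketch of the set-based variant in Python 3, assuming `data` is already the flat list of payload dicts described above. The records are made up for illustration and include a repeated Starbucks entry in LA; `city_business_set` is a hypothetical name.

```python
# Illustrative records only; the LA Starbucks appears twice on purpose
data = [
    {"name": "Starbucks", "locality": "LA"},
    {"name": "Starbucks", "locality": "LA"},
    {"name": "Chipotle", "locality": "LA"},
    {"name": "Starbucks", "locality": "DC"},
]

city_business_set = {}
for payload in data:
    city = payload["locality"]
    business = payload["name"]
    # A set ignores repeated names, so each business is counted once per city
    city_business_set.setdefault(city, set()).add(business)

for city, businesses in city_business_set.items():
    print(city, len(businesses))
```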

If even that is overkill, skip the list or set entirely and just associate a counter with each key.
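For the counter approach, the standard library's `collections.Counter` does the bookkeeping. This Python 3 sketch assumes the same flat list of payload dicts (illustrative records) and formats the result the way the question asks for.

```python
from collections import Counter

# Illustrative records only
data = [
    {"name": "Starbucks", "locality": "LA"},
    {"name": "Starbucks", "locality": "DC"},
    {"name": "Chipotle", "locality": "LA"},
]

# One count per payload, keyed by city
city_counts = Counter(payload["locality"] for payload in data)

print(", ".join(f"{city}: {n} businesses" for city, n in city_counts.items()))
```

`Counter` keeps only a number per city, so it cannot tell you which businesses were counted; use the list or set version when you need the names.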

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/35906966
