I am trying to search a data file, e.g. Yelp.json. It has businesses in Los Angeles, Boston, and Washington DC.
I wrote this:
# Python 2
import json
# read json
with open('updated_data.json') as facts_data:
    data = json.load(facts_data)
# return every unique locality along with how often it occurs
locality = []
unique_locality = []
# Load items into lists
for item in data:
    locality.append(data["payload"]["locality"])
    if data["payload"]["locality"] not in unique_locality:
        print unique_locality.append(data["payload"]["locality"])
# Loops over unique_locality and count from locality
print "Unique Locality Count:", unique_locality, locality.count(data["payload"]["locality"])

But the answer I get is "Portsmouth 1", which means it is not giving me all the cities, and possibly not all the counts either. The goal of this part is to search the JSON file and have it display "DC: 10 businesses, LA: 20 businesses, Boston: 2 businesses". Each payload is a set of information about a single business, and "locality" is the city. So I want it to find how many unique cities there are, and then how many businesses are in each city. For example, one payload might be a Starbucks in LA, another a Starbucks in DC, and another a Chipotle in LA.
Sample of the JSON file (JSONlite.com says it is valid):
"payload": {
"existence_full": 1,
"geo_virtual": "[\"56.9459720|-2.1971226|20|within_50m|4\"]",
"latitude": "56.945972",
"locality": "Stonehaven",
"_records_touched": "{\"crawl\":8,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
"address": "The Lodge, Dunottar",
"email": "dunnottarcastle@btconnect.com",
"existence_ml": 0.5694238217658721,
"domain_aggregate": "",
"name": "Dunnottar Castle",
"search_tags": ["Dunnottar Castle Aberdeenshire", "Dunotter Castle"],
"admin_region": "Scotland",
"existence": 1,
"category_labels": [
["Landmarks", "Buildings and Structures"]
],
"post_town": "Stonehaven",
"region": "Kincardineshire",
"review_count": "719",
"geocode_level": "within_50m",
"tel": "01569 762173",
"placerank": 65,
"longitude": "-2.197123",
"placerank_ml": 37.27916073464469,
"fax": "01330 860325",
"category_ids_text_search": "",
"website": "http://www.dunnottarcastle.co.uk",
"status": "1",
"geocode_confidence": "20",
"postcode": "AB39 2TL",
"category_ids": [108],
"country": "gb",
"_geocode_quality": "4",
"uuid": "3867aaf3-12ab-434f-b12b-5d627b3359c3"
},
"payload": {
"existence_full": 1,
"geo_virtual": "[\"56.237480|-5.073578|20|within_50m|4\"]",
"latitude": "56.237480",
"locality": "Inveraray",
"_records_touched": "{\"crawl\":11,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
"address": "Cherry Park",
"email": "enquiries@inveraray-castle.com",
"longitude": "-5.073578",
"domain_aggregate": "",
"name": "Inveraray Castle",
"admin_region": "Scotland",
"search_tags": ["Inveraray Castle Tea Room", "Inverary Castle"],
"existence": 1,
"category_labels": [
["Social", "Food and Dining", "Restaurants"]
],
"region": "Argyll",
"review_count": "532",
"geocode_level": "within_50m",
"tel": "01499 302203",
"placerank": 67,
"post_town": "Inveraray",
"placerank_ml": 41.19978087352266,
"fax": "01499 302421",
"category_ids_text_search": "",
"website": "http://www.inveraray-castle.com",
"status": "1",
"geocode_confidence": "20",
"postcode": "PA32 8XE",
"category_ids": [347],
"country": "gb",
"_geocode_quality": "4",
"existence_ml": 0.7914881102847783,
"uuid": "8278ab80-2cd1-4dbd-9685-0d0036b681eb"
},

Posted on 2016-03-10 03:57:48
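If your "JSON" is structured like
{"payload":{ CONTENT_A }, "payload":{ CONTENT_B }, ..., "payload":{ CONTENT_LAST }}
it is accepted as a valid JSON string, but after you run json.loads on it, it evaluates to
{"payload":{ CONTENT_LAST }}
Every "payload" key overwrites the previous one, which is why you end up with only one city and one business.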
You can verify this behavior with the online JSON parser at http://json.parser.online.fr/ by inspecting the parsed fields.
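The same collapse can be reproduced directly in Python. The snippet below uses a made-up two-business string; the Python `json` documentation states that, by default, repeated names are not an error and all but the last name-value pair are ignored.

```python
import json

# Hypothetical two-business document that reuses the key "payload".
raw = '{"payload": {"locality": "LA"}, "payload": {"locality": "DC"}}'

data = json.loads(raw)

# Only one "payload" member survives, and it is the last one in the text.
assert len(data) == 1
assert data["payload"]["locality"] == "DC"
```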
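In this case, one way to preprocess the JSON string is to get rid of the dummy "payload" keys and wrap the content dictionaries directly in a list, so that the JSON string has the following format:
[{CONTENT_A}, {CONTENT_B}, ..., {CONTENT_LAST}]
Now assume your JSON string is a list of payload dictionaries and that you have loaded it into data with data = json.loads(json_str).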
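One possible way to do that preprocessing without hand-editing the file is sketched below. It assumes the whole file is a single JSON object whose top-level members are repeated "payload" keys (the sample in the question is a fragment of such an object). It relies on the standard `object_pairs_hook` argument of `json.loads`, which receives each object's key/value pairs in document order before any duplicates are collapsed. The helper names `collect_payloads` and `load_payload_list` and the sample records are hypothetical.

```python
from __future__ import print_function
import json


def collect_payloads(pairs):
    """object_pairs_hook (hypothetical helper): gather every "payload"
    member into a list instead of letting later ones overwrite earlier ones."""
    obj = {}
    for key, value in pairs:
        if key == "payload":
            obj.setdefault("payload", []).append(value)
        else:
            obj[key] = value
    return obj


def load_payload_list(json_str):
    """Return all payload dictionaries as a plain list, in file order.
    Assumes at least one top-level "payload" key exists."""
    return json.loads(json_str, object_pairs_hook=collect_payloads)["payload"]


raw = ('{"payload": {"name": "Starbucks", "locality": "LA"},'
       ' "payload": {"name": "Starbucks", "locality": "DC"},'
       ' "payload": {"name": "Chipotle", "locality": "LA"}}')

data = load_payload_list(raw)
print(len(data))  # 3
# json.dumps(data) would give the preprocessed list-shaped JSON string.
```

Because the hook runs for every object, a "payload" key nested inside a business record would also be turned into a list; that is not expected for this data, but it is worth checking.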
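While iterating over the JSON payloads, build a lookup table as you go. This handles repeated cities automatically, because businesses in the same city hash to the same dictionary key and are appended to the same list.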
city_business_map = {}
for payload in data:
    city = payload['locality']
    business = payload['name']
    if city not in city_business_map:
        city_business_map[city] = []
    city_business_map[city].append(business)

Then you can easily present the result like this:
for city, business_list in city_business_map.items():
    print city, len(business_list)

If you want to count unique businesses in each city, initialize the values to sets instead of lists.
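A minimal sketch of the set variant, assuming the same list of payload dictionaries. The function name `unique_businesses_by_city` and the sample records are made up for illustration. `dict.setdefault` replaces the explicit `if city not in ...` check, and a repeated listing of the same business in the same city is counted only once.

```python
from __future__ import print_function


def unique_businesses_by_city(payloads):
    """Map each locality to the set of distinct business names in it."""
    city_business_map = {}
    for payload in payloads:
        city = payload['locality']
        # setdefault creates the empty set the first time a city is seen.
        city_business_map.setdefault(city, set()).add(payload['name'])
    return city_business_map


data = [
    {'name': 'Starbucks', 'locality': 'LA'},
    {'name': 'Starbucks', 'locality': 'DC'},
    {'name': 'Chipotle', 'locality': 'LA'},
    {'name': 'Chipotle', 'locality': 'LA'},  # duplicate listing
]

for city, businesses in sorted(unique_businesses_by_city(data).items()):
    print(city, len(businesses))
```

With this sample data the loop prints `DC 1` and then `LA 2`, because the second Chipotle in LA is absorbed by the set.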
If that is overkill, instead of initializing each value to a list or set, just associate a counter with each key.
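If only the numbers are needed, `collections.Counter` is one way to associate a counter with each key. The sketch below assumes the same list-of-dictionaries `data`. The helper name `business_counts_by_city` and the sample records are hypothetical. It prints the number of distinct cities and then one line per city, roughly in the "DC: 10 businesses" shape the question asks for.

```python
from __future__ import print_function
from collections import Counter


def business_counts_by_city(payloads):
    """Count how many payloads (businesses) fall in each locality."""
    return Counter(payload['locality'] for payload in payloads)


data = [
    {'name': 'Starbucks', 'locality': 'LA'},
    {'name': 'Starbucks', 'locality': 'DC'},
    {'name': 'Chipotle', 'locality': 'LA'},
]

counts = business_counts_by_city(data)
print(len(counts), 'unique cities')
for city, n in sorted(counts.items()):
    noun = 'business' if n == 1 else 'businesses'
    print('{0}: {1} {2}'.format(city, n, noun))
```

Here the output is `2 unique cities`, then `DC: 1 business`, then `LA: 2 businesses`. Unlike the list and set versions, this keeps no business names, only the tallies.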
https://stackoverflow.com/questions/35906966