我需要帮助!
我有一个Excel文件,其中包含我试图放入dataframe中的数据,但这些数据是以表格形式存在的,不容易处理。示例:
我希望最终将其放入以下形式的数据框架中:
Meal Food Calories
Breakfast English Muffins 120
Breakfast Peanut Butter Spread 190
Morning Snack Banana 90
Morning Snack Nectarine 59
... ... ...
并为此表单中的每日合计提供一个单独的数据帧(暂时忽略'Date‘列):
Date Calories Fat Fiber Carbs Sodium Protein Water
2017-07-01 699 26 16 93 612 32 0
2017-07-02 ... ... ... ... ... ... ...
我正在努力将其放入数据帧中。看一下数据集的屏幕截图,首先将数据存储到字典中是有意义的,但由于所有的空单元格,这给我留下了一堆NaN值。
我对获取“Meal”列的方式的想法是做一个向前填充,但这意味着我将不得不使用Series或Dataframe,而我还没有做到这一点。
这是我目前所拥有的:
df = pd.read_excel('filename.xls', 'Foods')
# create a list to store the dictionaries
food_logs = []
# this is code to reformat the string values in a certain column
# to get the name of the sheets I need to use in the Excel. This can be ignored
for day in df.values:
if day[1] != '0':
foodLogSheetName = 'Food Log ' + day[0].replace('-', '')
food_logs.append(foodLogSheetName)
# 'foods' is now a list of nested dictionaries (think of everything in the
# first screenshot as the outer dictionary, and each of the column as the
# inner dictionary)
foods = [xls.parse(food_log).to_dict() for food_log in food_logs]
这就是现在的“食品”,如果我把它打印出来,在每个外部字典之间加一行:
我可以选择使用CSV文件,但如果这有意义的话,我会让多个‘表’垂直堆叠,而不是多张表
我将非常感谢任何人可以提供的任何提示,请!
发布于 2018-08-10 05:09:19
我认为,有了令人满意的数据,您就走上了正确的道路。听起来你可能只是在处理丢失的数据时遇到了麻烦。从您发布的示例中,看起来您可以将整个内容读取到一个数据帧中,删除所有空行,在meal列上进行填充,然后删除部分为空(或在子集上)的任何行。
import pandas as pd
df = pd.read_excel(file_path_or_buffer, sheet_name=my_sheet_name, **other_kwargs)
# You should have a dataframe that looks like
# Meal Food Calories
# Breakfast
# English Muffins 120
# Peanut Butter Spread 190
# ...
# Next drop totally NaN/empty rows
df.dropna(how='all', inplace=True)
df['Meal'] = df['Meal'].fillna(method='ffill')
# Now you should have something that looks like
# Meal Food Calories
# Breakfast
# Breakfast English Muffins 120
# Breakfast Peanut Butter Spread 190
# ...
# Drop empty rows, if you need to allow for some sparse data, use the subset argument
df.dropna(how='any', inplace=True)
https://stackoverflow.com/questions/51775708
复制相似问题