问将CSV/Excel中的多个表转换为字典或Python中的Dataframe
EN

Stack Overflow用户

提问于 2018-08-10 05:01:42

回答 1查看 473关注 0票数 0

我需要帮助！

我有一个Excel文件，其中包含我试图放入dataframe中的数据，但这些数据是以表格形式存在的，不容易处理。示例：

我希望最终将其放入以下形式的数据框架中：

Meal               Food                              Calories
Breakfast          English Muffins                   120
Breakfast          Peanut Butter Spread              190
Morning Snack      Banana                            90
Morning Snack      Nectarine                         59
...                ...                               ...

并为此表单中的每日合计提供一个单独的数据帧(暂时忽略'Date‘列)：

Date          Calories     Fat     Fiber     Carbs     Sodium     Protein     Water
2017-07-01    699          26      16        93        612        32          0
2017-07-02    ...          ...     ...       ...       ...        ...         ...

我正在努力将其放入数据帧中。看一下数据集的屏幕截图，首先将数据存储到字典中是有意义的，但由于所有的空单元格，这给我留下了一堆NaN值。

我对获取“Meal”列的方式的想法是做一个向前填充，但这意味着我将不得不使用Series或Dataframe，而我还没有做到这一点。

这是我目前所拥有的：

df = pd.read_excel('filename.xls', 'Foods')

# create a list to store the dictionaries
food_logs = []

# this is code to reformat the string values in a certain column 
# to get the name of the sheets I need to use in the Excel. This can be ignored
for day in df.values:
    if day[1] != '0':
        foodLogSheetName = 'Food Log ' + day[0].replace('-', '')
        food_logs.append(foodLogSheetName)

# 'foods' is now a list of nested dictionaries (think of everything in the 
# first screenshot as the outer dictionary, and each of the column as the 
# inner dictionary)
foods = [xls.parse(food_log).to_dict() for food_log in food_logs]

这就是现在的“食品”，如果我把它打印出来，在每个外部字典之间加一行：

我可以选择使用CSV文件，但如果这有意义的话，我会让多个‘表’垂直堆叠，而不是多张表

我将非常感谢任何人可以提供的任何提示，请！

python

excel

csv

dictionary

dataframe

回答 1

Stack Overflow用户

回答已采纳

发布于 2018-08-10 05:09:19

我认为，有了令人满意的数据，您就走上了正确的道路。听起来你可能只是在处理丢失的数据时遇到了麻烦。从您发布的示例中，看起来您可以将整个内容读取到一个数据帧中，删除所有空行，在meal列上进行填充，然后删除部分为空(或在子集上)的任何行。

import pandas as pd

df = pd.read_excel(file_path_or_buffer, sheet_name=my_sheet_name, **other_kwargs)
# You should have a dataframe that looks like
# Meal               Food                              Calories
# Breakfast          
#                    English Muffins                   120
#                    Peanut Butter Spread              190
# ...
# Next drop totally NaN/empty rows
df.dropna(how='all', inplace=True)
df['Meal'] = df['Meal'].fillna(method='ffill')
# Now you should have something that looks like
# Meal               Food                              Calories
# Breakfast          
# Breakfast          English Muffins                   120
# Breakfast          Peanut Butter Spread              190
# ...
# Drop empty rows, if you need to allow for some sparse data, use the subset argument
df.dropna(how='any', inplace=True)

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/51775708

复制

相似问题

问将CSV/Excel中的多个表转换为字典或Python中的Dataframe
EN

回答 1

Stack Overflow用户

社区

活动

资源

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问将CSV/Excel中的多个表转换为字典或Python中的DataframeEN

回答 1

Stack Overflow用户

社区

活动

资源

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问将CSV/Excel中的多个表转换为字典或Python中的Dataframe
EN