How do I compare the rows of a column in Python?

Content sourced from Stack Overflow, translated and used under the CC BY-SA 3.0 license.

I have the following DataFrame:


df=
 city    code     qty    year
 hyd     1        10    2016
 hyd     2        12    2016
 pune    2        15    2016
 pune    4        25    2016
 hyd     1        10    2017
 hyd     3        12    2017
 pune    1        15    2017
 pune    2        25    2017
 hyd     2        10    2018
 hyd     4        10    2018
 hyd     6        12    2018
 pune    1        15    2018
 pune    4        25    2018
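
To experiment with this data, the table can be loaded with a small sketch like the one below. It assumes the rows are pasted verbatim into a string (the name `raw` is only illustrative) and parsed as whitespace-separated text.

```python
import io

import pandas as pd

# The table from the question, pasted as whitespace-separated text.
raw = """\
city code qty year
hyd 1 10 2016
hyd 2 12 2016
pune 2 15 2016
pune 4 25 2016
hyd 1 10 2017
hyd 3 12 2017
pune 1 15 2017
pune 2 25 2017
hyd 2 10 2018
hyd 4 10 2018
hyd 6 12 2018
pune 1 15 2018
pune 4 25 2018
"""

# sep=r"\s+" splits on any run of whitespace, so the column alignment does not matter.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.dtypes)
```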

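I want to add every unique year as a column (2016, 2017, 2018) and check whether the code for the same city in a given year also appears in the other years (that is, 2018 against 2017, 2016, 2015; 2017 against 2016, 2015; and so on). If the same city and code exist in the other year, mark the cell Y; if not, mark it N. The column for the year being compared must be left blank.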

The resulting DataFrame must look like this:


city    code     qty    year    year_2016     year_2017    year_2018 
hyd     1         10    2016                                 
hyd     2         12    2016                                         
pune    2         15    2016                                  
pune    4         25    2016                                
hyd     1         10    2017        Y                                          
hyd     3         12    2017        N                         
pune    1         15    2017        N                           
pune    2         25    2017        Y                         
hyd     2         10    2018        Y            N        
hyd     4         10    2018        N            N
hyd     6         12    2018        N            N
pune    1         15    2018        N            Y
pune    4         25    2018        Y            N     

Answer:

import pandas as pd

# List every year so we know which year_... columns to create and which to mark "N".
all_years = sorted(df.year.unique())

def my_func(x):
    # Build the year_... rows for one (city, code) group; x holds that group's years.
    city, code = x.name

    rows = []
    # Series.items() replaces iteritems(), which was removed in pandas 2.0.
    for key, year in x.items():
        row = {"city": city, "code": code, "year": year}

        # Compare this year against the group's other years; a pair seen in
        # only one year is compared against itself, so its own column gets "Y".
        iterate = [year]
        if len(x.values) > 1:
            iterate = x.drop(key).values

        for other_year in iterate:
            row["year_" + str(other_year)] = "Y"

        # Any year in which this (city, code) never appears is marked "N".
        for missing_year in set(all_years) - set(x.values):
            row["year_" + str(missing_year)] = "N"

        rows.append(row)

    # DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows.
    return pd.DataFrame(rows)

# Fix the column order explicitly, then blank out the cells that were never set.
columns = ["city", "code", "year"] + ["year_" + str(y) for y in all_years]
(df.groupby(['city', 'code'])['year'].apply(my_func)
   .reset_index(drop=True)
   .reindex(columns=columns)
   .fillna(""))


Out[]:
    city  code    year year_2016 year_2017 year_2018
0    hyd   1.0  2016.0                   Y         N
1    hyd   1.0  2017.0         Y                   N
2    hyd   2.0  2016.0                   N         Y
3    hyd   2.0  2018.0         Y         N          
4    hyd   3.0  2017.0         N         Y         N
5    hyd   4.0  2018.0         N         N         Y
6    hyd   6.0  2018.0         N         N         Y
7   pune   1.0  2017.0         N                   Y
8   pune   1.0  2018.0         N         Y          
9   pune   2.0  2016.0                   Y         N
10  pune   2.0  2017.0         Y                   N
11  pune   4.0  2016.0                   N         Y
12  pune   4.0  2018.0         Y         N          
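
The output above drops `qty` and also marks later years, so it does not match the layout requested in the question exactly. The following is a minimal sketch of a different approach, not the answerer's method. It assumes that only columns for years earlier than the row's own year should be filled, as the expected table suggests. The names `seen` and `result` are illustrative.

```python
import pandas as pd

# Data copied from the question's input table.
df = pd.DataFrame({
    "city": ["hyd", "hyd", "pune", "pune", "hyd", "hyd", "pune", "pune",
             "hyd", "hyd", "hyd", "pune", "pune"],
    "code": [1, 2, 2, 4, 1, 3, 1, 2, 2, 4, 6, 1, 4],
    "qty":  [10, 12, 15, 25, 10, 12, 15, 25, 10, 10, 12, 15, 25],
    "year": [2016] * 4 + [2017] * 4 + [2018] * 5,
})

# Every (city, code, year) combination that exists, for fast membership tests.
seen = set(zip(df["city"], df["code"], df["year"]))

result = df.copy()
for y in sorted(int(v) for v in df["year"].unique()):
    # Assumption: fill a cell only when column year y is earlier than the row's year.
    # "Y" if the same city and code existed in year y, "N" if not, blank otherwise.
    result[f"year_{y}"] = [
        ("Y" if (city, code, y) in seen else "N") if y < row_year else ""
        for city, code, row_year in zip(df["city"], df["code"], df["year"])
    ]

print(result)
```

This keeps the original row order and the `qty` column because it adds columns to a copy of `df` rather than regrouping it.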
