
Comparing rows of a column in Python

Stack Overflow user
Asked on 2018-06-18 22:03:00

I have the following data frame:

df=
 city    code     qty    year
 hyd     1        10    2016
 hyd     2        12    2016
 pune    2        15    2016
 pune    4        25    2016
 hyd     1        10    2017
 hyd     3        12    2017
 pune    1        15    2017
 pune    2        25    2017
 hyd     2        10    2018
 hyd     4        10    2018
 hyd     6        12    2018
 pune    1        15    2018
 pune    4        25    2018
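
To make the example reproducible, the table above can be rebuilt as a pandas DataFrame. This is a minimal sketch that assumes `code`, `qty` and `year` are integer columns, which the question does not state explicitly.

```python
import pandas as pd

# Rebuild the question's sample data; integer dtypes are an assumption.
df = pd.DataFrame({
    "city": ["hyd", "hyd", "pune", "pune",
             "hyd", "hyd", "pune", "pune",
             "hyd", "hyd", "hyd", "pune", "pune"],
    "code": [1, 2, 2, 4, 1, 3, 1, 2, 2, 4, 6, 1, 4],
    "qty":  [10, 12, 15, 25, 10, 12, 15, 25, 10, 10, 12, 15, 25],
    "year": [2016] * 4 + [2017] * 4 + [2018] * 5,
})

print(df)
```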

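I want to add one column per unique year (2016, 2017, 2018) and, for each row, check whether the same city and code also appear in earlier years (that is, 2018 is compared with 2017, 2016, 2015; 2017 with 2016, 2015; and so on). If the same city and code exist in the earlier year, mark it Y; if not, mark it N. The column for the year being compared, and for any later year, must be left blank.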

The resulting data frame must look like this:

city    code     qty    year    year_2016     year_2017    year_2018 
hyd     1         10    2016                                 
hyd     2         12    2016                                         
pune    2         15    2016                                  
pune    4         25    2016                                
hyd     1         10    2017        Y                                          
hyd     3         12    2017        N                         
pune    1         15    2017        N                           
pune    2         25    2017        Y                         
hyd     2         10    2018        Y            N        
hyd     4         12    2018        N            N
hyd     6         12    2018        N            N
pune    1         15    2018        N            Y
pune    4         25    2018        Y            N     

Thanks in advance.


1 Answer

Stack Overflow user

Answered on 2018-06-18 22:56:02

import pandas as pd

# df is assumed to be the data frame from the question.
# Get a list of all years, so we know how many columns to make and which columns to mark as N
all_years = df.year.unique()

def my_func(x):
    # Build the year_... columns for one (city, code) group.
    # x is the 'year' Series of the group; x.name holds the group key.
    city, code = x.name

    # One dict per row; collected here and turned into a DataFrame at the end
    # (DataFrame.append was removed in pandas 2.0)
    rows = []

    # Loop through each year (Series.iteritems was removed in pandas 2.0, use items)
    for key, year in x.items():
        row = {"city": city, "code": code, "year": year}

        # If this (city, code) has multiple years, compare each year against the other years;
        # a group with a single year is compared against itself
        iterate = [year]
        if len(x.values) > 1:
            iterate = x.drop(key).values

        # Mark every other year in which this (city, code) appears as Y
        for other_year in iterate:
            row["year_" + str(other_year)] = "Y"

        # Any year in which this (city, code) never shows up is marked as N
        for missing_year in (set(all_years) - set(x.values)):
            row["year_" + str(missing_year)] = "N"

        rows.append(row)
    return pd.DataFrame(rows)

# sort_index(axis=1) keeps the year_... columns in order; number formatting may differ by pandas version
df.groupby(['city', 'code'])['year'].apply(my_func).reset_index(drop=True).fillna("").sort_index(axis=1)


Out[]:
    city  code    year year_2016 year_2017 year_2018
0    hyd   1.0  2016.0                   Y         N
1    hyd   1.0  2017.0         Y                   N
2    hyd   2.0  2016.0                   N         Y
3    hyd   2.0  2018.0         Y         N          
4    hyd   3.0  2017.0         N         Y         N
5    hyd   4.0  2018.0         N         N         Y
6    hyd   6.0  2018.0         N         N         Y
7   pune   1.0  2017.0         N                   Y
8   pune   1.0  2018.0         N         Y          
9   pune   2.0  2016.0                   Y         N
10  pune   2.0  2017.0         Y                   N
11  pune   4.0  2016.0                   N         Y
12  pune   4.0  2018.0         Y         N          
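
The question asks for comparisons against earlier years only, with the row's own year and later years left blank. The following is a minimal sketch of that variant. It is not the answerer's method: the function name `mark_earlier_years` is hypothetical, and the sample is a six-row subset of the question's data chosen for illustration.

```python
import pandas as pd

def mark_earlier_years(df):
    """Add one year_<y> column per year: 'Y'/'N' for earlier years, '' otherwise."""
    out = df.copy()
    # Every (city, code, year) combination that exists in the data
    seen = set(zip(df["city"], df["code"], df["year"]))
    for y in sorted(int(v) for v in df["year"].unique()):
        out[f"year_{y}"] = [
            # Blank for the row's own year and later years; otherwise look up
            # whether the same (city, code) exists in year y
            "" if row_year <= y else ("Y" if (city, code, y) in seen else "N")
            for city, code, row_year in zip(df["city"], df["code"], df["year"])
        ]
    return out

# Subset of the question's rows (assumption: integer columns)
sample = pd.DataFrame({
    "city": ["hyd", "hyd", "hyd", "pune", "hyd", "pune"],
    "code": [1, 2, 1, 1, 2, 1],
    "qty":  [10, 12, 10, 15, 10, 15],
    "year": [2016, 2016, 2017, 2017, 2018, 2018],
})

result = mark_earlier_years(sample)
print(result)
```

Unlike the groupby approach above, this keeps the original row order and the `qty` column, at the cost of one pass over the rows per distinct year.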
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/50911268
