Looping with Pandas in Python

Stack Overflow user

Asked 2018-05-25 02:45:26

Answers: 1 | Views: 55 | Followers: 0 | Votes: -1

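Below is a dataset.

The goal is to select the song IDs that have at least one composer and at least one publisher on at least one row. For example, song ID 4 has 2 rows with 2 different composers but no publisher, while song ID 1 has no composer. I want to use Python (pandas) to reject Excel sheets like these. Any suggestions?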
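The requirement "at least one composer and one publisher on at least one row" can also be read more strictly: some single row of the song must carry both. A minimal sketch of that row-level check, assuming the sheet is already loaded into a DataFrame with the `CUE`, `COMPOSER` and `PUBLISHER` columns used in the question's code; the helper name `cues_with_complete_row` and the sample rows are hypothetical (copied from the example table in the answer).

```python
import numpy as np
import pandas as pd


def cues_with_complete_row(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: return a boolean Series indexed by CUE that is
    True when at least one row of that cue has both a composer and a publisher."""
    # Treat empty or whitespace-only strings as missing values
    cleaned = df.replace(r'^\s*$', np.nan, regex=True)
    # Row-level flag: this particular row has both fields filled in
    complete = cleaned['COMPOSER'].notna() & cleaned['PUBLISHER'].notna()
    # A cue passes if any of its rows is complete
    return complete.groupby(cleaned['CUE']).any()


# Hypothetical sample mirroring the example: cue 1 has no composer,
# cue 4 has no publisher
sample = pd.DataFrame({
    'CUE': [1, 2, 2, 2, 3, 3, 4, 4],
    'COMPOSER': ['', 'Andrew', 'Paul', 'Molly', 'Brian', 'Daniel', 'Sharron', 'Kyle Towns'],
    'PUBLISHER': ['audio', 'Nova', '', '', 'Client', '', '', ''],
})

ok = cues_with_complete_row(sample)
print(ok[ok].index.tolist())   # cues that pass: [2, 3]
print(ok[~ok].index.tolist())  # cues that should get the sheet rejected: [1, 4]
```

For this sample the strict reading and the per-column count used in the answer agree; they differ only when a song's composer and publisher appear on different rows.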

import smtplib
from email.mime.multipart import MIMEMultipart

import numpy as np
import pandas as pd

fileToSend = 'New York Yankees Twins at Yankees-FNG-042318.csv'

# Read the preparer's email address from the header block (row 0, column 7 after skipping the first line)
df_header = pd.read_csv(fileToSend, header=None, skiprows=1)
cuesheetprepareremail = df_header.iloc[0, 7]
print(cuesheetprepareremail)

# The cue table itself starts after the first 7 lines
df = pd.read_csv(fileToSend,
                 names=['CUE', 'SONG TITLE', 'USAGE', 'RUNNING TIME', 'COMPOSER',
                        'COMPOSER PRO', 'COMPOSER % SHARE', 'PUBLISHER', 'PUBLISHER PRO',
                        'PUBLISHER % SHARE', 'TRACK ID', 'LIBRARY', 'ARTIST', 'START TIME'],
                 skiprows=7)

# Keep only the columns needed for the check; rows with the same CUE belong to one song
df1 = df[['CUE', 'COMPOSER', 'PUBLISHER']]

# Treat empty or whitespace-only cells as missing so count() skips them
df1 = df1.replace(r'^\s*$', np.nan, regex=True)
gp = df1.groupby('CUE').count()

emailfrom = ''
emailto = 'xyz@abc.com'
username = ''
password = ''

msg = MIMEMultipart()
msg['Subject'] = 'Enco error testing'
msg['From'] = emailfrom
msg['To'] = emailto
msg.preamble = 'Enco error testing'

# Cues with no composer or no publisher. A DataFrame cannot be used directly
# as an if-condition (its truth value is ambiguous), so test whether it is empty.
bad_cues = gp[(gp['COMPOSER'] == 0) | (gp['PUBLISHER'] == 0)]
if not bad_cues.empty:
    # Send the email via our own SMTP server.
    server = smtplib.SMTP('localhost')
    server.starttls()
    server.login(username, password)
    server.sendmail(emailfrom, emailto, msg.as_string())
    server.quit()
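A pandas DataFrame cannot be used directly as an `if` condition: evaluating it as a boolean raises `ValueError` ("The truth value of a DataFrame is ambiguous"), whatever it contains. To ask "did the filter match any rows?", test `.empty` (or `len(...) > 0`) instead. A small sketch, using a hypothetical per-cue count table shaped like the result of `groupby('CUE').count()`:

```python
import pandas as pd

# Hypothetical per-cue counts, shaped like groupby('CUE').count() output
gp = pd.DataFrame({'COMPOSER': [0, 3, 2, 2], 'PUBLISHER': [1, 1, 1, 0]},
                  index=pd.Index([1, 2, 3, 4], name='CUE'))

bad_cues = gp[(gp['COMPOSER'] == 0) | (gp['PUBLISHER'] == 0)]

# Using the DataFrame itself as a condition raises ValueError
try:
    if bad_cues:
        pass
    raised = False
except ValueError:
    raised = True
print(raised)                    # True

# .empty reports whether the filter matched any rows
print(not bad_cues.empty)        # True
print(bad_cues.index.tolist())   # [1, 4]
```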

Tags: python-3.x, pandas

1 Answer

Stack Overflow user

Answered 2018-05-25 03:04:45

Given your DataFrame df:

   Song_Id        SONG TITLE *USAGE RUNNING COMPOSER(s)  COMPOSE PUBLISHER(s)
0        1    Testing Moment    BGI                        ASCAP        audio
1        2  Rented Dreams-JP    BGI              Andrew  ABRAMUS         Nova
2        2                                         Paul      UBC             
3        2                                        Molly      UBC             
4        3     Gridiron Rock    BGI               Brian    ASCAP       Client
5        3                                       Daniel    ASCAP             
6        4          Rock Run    BGI             Sharron    ASCAP             
7        4                                   Kyle Towns    ASCAP

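Replace the empty strings with np.nan so that count() skips them, then use groupby + count to get per-song counts and apply your condition to the grouped result.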

import numpy as np

df = df.replace('', np.nan)
gp = df.groupby('Song_Id').count()

gp[(gp['COMPOSER(s)'] > 0) & (gp['PUBLISHER(s)'] > 0)]
#         *USAGE  COMPOSE  COMPOSER(s)  PUBLISHER(s)  RUNNING  SONG TITLE
#Song_Id                                                                 
#2             1        3            3             1        0           1
#3             1        2            2             1        0           1
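The selection above keeps the songs that pass. For the question's goal of rejecting a sheet, the complement is what matters: the songs where either count is zero. A sketch that rebuilds only the relevant columns of the example table (values copied from it; `rejected_ids` and `reject_sheet` are hypothetical names):

```python
import numpy as np
import pandas as pd

# Relevant columns of the example table; blank cells are empty strings
df = pd.DataFrame({
    'Song_Id': [1, 2, 2, 2, 3, 3, 4, 4],
    'COMPOSER(s)': ['', 'Andrew', 'Paul', 'Molly', 'Brian', 'Daniel', 'Sharron', 'Kyle Towns'],
    'PUBLISHER(s)': ['audio', 'Nova', '', '', 'Client', '', '', ''],
})

# Empty strings -> NaN, so count() only counts filled-in cells
gp = df.replace('', np.nan).groupby('Song_Id').count()

# Songs with no composer or no publisher on any row
rejected_ids = gp[(gp['COMPOSER(s)'] == 0) | (gp['PUBLISHER(s)'] == 0)].index.tolist()
print(rejected_ids)   # [1, 4]

# The whole sheet is rejected if any song fails the check
reject_sheet = len(rejected_ids) > 0
print(reject_sheet)   # True
```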

Votes: 0

Page content provided by Stack Overflow; translation support by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/50516182
