Q: Email classifier that classifies emails by the time they were received

Stack Overflow user

Asked 2022-05-18 18:40:29

Answers: 1 | Views: 43 | Followers: 0 | Votes: 0

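I have to design a program that classifies emails as spam or not spam using Python and Pandas.

I have already classified emails as spam or not spam based on the email subject. For my second task, I have to classify emails as spam or not spam based on time. If an email was received on a Friday or Saturday, it should be classified as spam; otherwise it is not spam. I really don't know how to do this. I tried searching but found nothing.

Here is a screenshot of the Excel file.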

import pandas as pd

# Load the spreadsheet of emails
ExcelFile = pd.read_excel(r'C:\Users\Documents\Email Table.xlsx')
Subject = pd.DataFrame(ExcelFile, columns=['Subject'])

def spam(Subject):
    # Emails with a missing subject are counted as spam
    A = len(Subject[Subject['Subject'].isnull()])
    print("Number of spam emails ", A)
    # Show the full rows of the emails that have no subject
    print(ExcelFile[ExcelFile['Subject'].isnull()])

spam(Subject)

python

pandas

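1 Answer

Stack Overflow user

Accepted answer

Posted 2022-05-18 19:11:05

There are countless ways to do this, but here is how I would do it. For clarity, I have included comments and descriptive names, which should let you take the code and modify it to fit your specific needs.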

#All necessary imports
import pandas as pd
import numpy as np
#Create some sample data (just made this up, nothing specific)
data = {
    'From' : ['test@gmail.com', 'test1@gmail.com', 'test2@gmail.com', 'test3@gmail.com', 'test4@gmail.com'],
    'Subject' : ['Free Stuff', 'Buy Stuff', np.nan,'More Free Stuff', 'More Buy Stuff'],
    'Dates' : ['2022-05-18 01:00:00', '2022-05-18 03:00:00', '2022-05-19 08:00:00', '2022-05-20 01:00:00', '2022-05-21 10:00:00']
}

#Create a Dataframe with the data
df = pd.DataFrame(data)

#Set all nulls/nones/NaN to a blank string
df.fillna('', inplace = True)

#Set the Dates column to a date column with YYYY-MM-DD HH:MM:SS format
df['Dates'] = pd.to_datetime(df['Dates'], format = '%Y-%m-%d %H:%M:%S')

#Create a column that identifies which day of the week each value in the Dates column falls on
df['Day'] = df['Dates'].dt.day_name()

#Use np.select() to flag a row when the Subject column is blank or the Day column is Friday or Saturday

#This is where you specify which days are spam days
list_of_spam_days = ['Friday', 'Saturday']

#List of conditions to test as true or false (nulls were replaced with '' above, so a blank Subject means it was missing)
condition_list = [df['Subject'] == '', df['Day'].isin(list_of_spam_days)]

#Mirrors condition_list: the value to assign when the matching condition is true
true_list = ['Spam', 'Spam']

#Make a new column that holds the results of our condition and true lists
#The final 'Not Spam' is the default when no condition is satisfied
df['Spam or Not Spam'] = np.select(condition_list, true_list, 'Not Spam')
df
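
If you want to reuse the same rule on other tables, the steps above could be wrapped in a small function. This is only a sketch under a few assumptions: the function name `flag_spam_by_day` and its parameters are made up for illustration, the column names match the sample data, and `errors="coerce"` is my own choice so that unparseable dates become missing values instead of raising an error. Because both conditions lead to the same label, it combines them with a plain boolean OR rather than np.select().

```python
import pandas as pd

def flag_spam_by_day(df, date_col="Dates", subject_col="Subject",
                     spam_days=("Friday", "Saturday")):
    """Hypothetical helper: return a copy of df with 'Day' and 'Spam or Not Spam' columns."""
    out = df.copy()
    # errors="coerce" turns unparseable dates into NaT instead of raising
    dates = pd.to_datetime(out[date_col], errors="coerce")
    out["Day"] = dates.dt.day_name()
    # A missing or blank subject counts as spam, as in the code above
    no_subject = out[subject_col].isna() | (out[subject_col].astype(str).str.strip() == "")
    # Rows whose weekday name is in the list of spam days
    on_spam_day = out["Day"].isin(list(spam_days))
    out["Spam or Not Spam"] = (no_subject | on_spam_day).map({True: "Spam", False: "Not Spam"})
    return out

# Made-up sample rows, similar to the data above
sample = pd.DataFrame({
    "Subject": ["Free Stuff", None, "More Free Stuff", "More Buy Stuff"],
    "Dates": ["2022-05-18 01:00:00", "2022-05-19 08:00:00",
              "2022-05-20 01:00:00", "2022-05-21 10:00:00"],
})
result = flag_spam_by_day(sample)
print(result[["Dates", "Day", "Spam or Not Spam"]])
```

With your own file you would pass the DataFrame returned by pd.read_excel() and set date_col to whatever your date column is actually called.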

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/72294420
