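In this article we introduce some of the top automated EDA tools and show what each one produces through worked examples. Code link: https://www.kaggle.com/andreshg/automatic-eda-libraries-comparisson/notebook
AutoViz stands out among the free, Pythonic rapid EDA automation tools: it runs very quickly, faster than its closest free competitors, SweetViz and Pandas Profiling.
Installation: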
!pip install git+https://github.com/AutoViML/AutoViz.git
!pip install xlrd
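from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()
# df is assumed to be an already-loaded pandas DataFrame with a 'target' column
dftc = AV.AutoViz(
    filename='',            # left empty because the data is passed via dfte
    sep='',
    depVar='target',        # dependent (target) variable
    dfte=df,                # DataFrame to analyze
    header=0,
    verbose=1,
    lowess=False,
    chart_format='png',
    max_rows_analyzed=300000,
    max_cols_analyzed=30
)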
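import datetime as dt
import pandas as pd
from pandas_profiling import ProfileReport

# Load the Titanic training set and build the profiling report
df = pd.read_csv('/kaggle/input/titanic/train.csv')
report = ProfileReport(df)
# Start of Pandas Profiling process
start_time = dt.datetime.now()
print("Started at ", start_time)
# In a notebook, a bare `report` as the last expression renders the report inline
report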
!pip install sweetviz
import sweetviz as sv
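# Keep only the first 2000 rows of the data
df = pd.read_csv('/kaggle/input/credit-card-customers/BankChurners.csv').head(2000)
# Analyze the DataFrame; 'Data' is the name shown in the report
advert_report = sv.analyze([df, 'Data'])
# Write the report to an HTML file and open it
advert_report.show_html()
print('SweetViz finished!!')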
finish_time = dt.datetime.now()
print("Finished at ", finish_time)
elapsed = finish_time - start_time
print("Elapsed time: ", elapsed)
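The start/finish timing pattern used above can be wrapped in a small reusable helper, so that every tool is measured the same way. The following is only a minimal sketch: the `timed` helper, the `timings` dictionary, and the dummy workload are hypothetical names introduced here for illustration and are not part of the original notebook.

```python
import datetime as dt
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock time of the enclosed block under `label`.

    `timed` and `results` are hypothetical names for this sketch.
    """
    start_time = dt.datetime.now()
    print(f"{label} started at {start_time}")
    try:
        yield
    finally:
        finish_time = dt.datetime.now()
        results[label] = finish_time - start_time
        print(f"{label} elapsed time: {results[label]}")


timings = {}
with timed("dummy workload", timings):
    # Stand-in for a real call such as sv.analyze([df, 'Data'])
    total = sum(range(1_000_000))
```

In a comparison like this one, each EDA call would go inside its own `with timed(...)` block, and `timings` could then be inspected at the end to compare the tools side by side.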
Installation:
!pip install dtale
import dtale
dtale.show(df)
Official link: https://github.com/man-group/dtale
!pip install -U dataprep
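Example:
from dataprep.eda import plot, plot_correlation
# Overview of every column in the DataFrame
plot(df)
# Correlation analysis across the numerical columns
plot_correlation(df)
# Distribution of a single column
plot(df, "Customer_Age")
# Relationship between two columns
plot(df, "Customer_Age", "Gender")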