[Data Science from Zero to One] · Titanic Survival Prediction (Data Loading, Processing, and Modeling)

Author: 小宋是呢
Published: 2019-06-27 13:01:02
Column: 深度应用

Titanic Survival Prediction (Data Loading, Processing, and Modeling)

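  • Introduction:

This article walks through predicting passenger survival on the Titanic, a classic Kaggle competition project.

Dataset:

1. Download the data from the Kaggle Titanic project page: https://www.kaggle.com/c/titanic

2. Baidu Netdisk mirror: https://pan.baidu.com/s/1BfRZdCz6Z1XR6aDXxiHmHA (extraction code: jzb3)

  • Code

Reading the data:
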
#%%
import tensorflow as tf
import keras
import pandas as pd
import numpy as np

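# Load the Kaggle training set and take a first look at it
data = pd.read_csv("titanic/train.csv")
print(data.head())       # first five rows
print(data.describe())   # summary statistics of the numeric columns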

Data processing:

#%%
strs = "Survived Pclass Sex Age SibSp Parch Fare Embarked"
clos = strs.split(" ")
print(clos)
#%%
# Copy the selected columns so later assignments do not touch a view of `data`
x_datas = data[clos].copy()
print(x_datas.head())
#%%
print(x_datas.isnull().sum())

#%%
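# Fill missing ages with the mean age and missing ports with the most common port
x_datas["Age"] = x_datas["Age"].fillna(x_datas["Age"].mean())
x_datas["Embarked"] = x_datas["Embarked"].fillna(x_datas["Embarked"].mode()[0])

# One-hot encode the categorical columns; dtype=float keeps every column numeric for Keras
x_datas = pd.get_dummies(x_datas,columns=["Pclass","Sex","Embarked"],dtype=float)
# Roughly rescale Age and Fare by dividing by 100
x_datas["Age"]/=100
x_datas["Fare"]/=100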

print(x_datas.isnull().sum())
print(x_datas.head())

#%%
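# Positional split: the first 75% of rows for training, the rest for testing
seq = int(0.75*(len(x_datas)))

# Column 0 is the label (Survived); the remaining 12 columns are the features
X ,Y = x_datas.iloc[:,1:],x_datas.iloc[:,0]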
X_train,Y_train,X_test,Y_test = X[:seq],Y[:seq],X[seq:],Y[seq:]
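
To make the imputation and one-hot steps above more concrete, here is a minimal sketch on a made-up four-row frame. The values are illustrative and not taken from train.csv, and the names `toy` and `encoded` are hypothetical. It assumes pandas is installed.

```python
import pandas as pd

# Toy rows, invented for illustration only (not from train.csv).
toy = pd.DataFrame({
    "Age": [20.0, None, 40.0, 30.0],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", None, "C", "S"],
})

# Numeric gap -> column mean; categorical gap -> most frequent value.
toy["Age"] = toy["Age"].fillna(toy["Age"].mean())
toy["Embarked"] = toy["Embarked"].fillna(toy["Embarked"].mode()[0])

# One 0/1 column per category; dtype=float keeps the frame purely numeric.
encoded = pd.get_dummies(toy, columns=["Sex", "Embarked"], dtype=float)

print(encoded.columns.tolist())
print(encoded)
```

Each categorical column is replaced by one indicator column per distinct value, which is why the three encoded columns of the real data expand the feature count to 12.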

Model construction:

#%%
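model = keras.models.Sequential()

# 12 input features -> 64 -> 16 hidden units, with dropout after the first layer
model.add(keras.layers.Dense(64,input_dim = 12,activation="relu"))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.Dense(16,activation="relu"))
# Two softmax outputs: class 0 (did not survive) and class 1 (survived)
model.add(keras.layers.Dense(2,activation="softmax"))

# Sparse categorical cross-entropy accepts the integer 0/1 labels directly
model.compile(loss="sparse_categorical_crossentropy",optimizer="adam",metrics=["accuracy"])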

# summary() prints the table itself and returns None, so no print() is needed
model.summary()
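
As a sanity check on the architecture above, a fully connected layer has inputs × units weights plus one bias per unit. The sketch below applies that rule to the three Dense layers; `dense_params` is a hypothetical helper written for this example, not a Keras function.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer widths from the model above: 12 inputs -> 64 -> 16 -> 2.
# The Dropout layer has no trainable parameters, so it is skipped.
widths = [12, 64, 16, 2]
counts = [dense_params(a, b) for a, b in zip(widths, widths[1:])]

print(counts)       # [832, 1040, 34]
print(sum(counts))  # 1906
```

These figures agree with the per-layer and total parameter counts in the model summary shown in the output section.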

Model training and evaluation:

#%%
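# Train for 100 epochs, holding out the last 20% of the training rows for validation
model.fit(X_train,Y_train,validation_split=0.2,epochs=100,batch_size=50)

#%%
# evaluate() returns [loss, accuracy] on the held-out test rows
y = model.evaluate(X_test,Y_test)
print("test loss is %f, acc %f"%(y[0],y[1]))
# Save the trained model in HDF5 format
model.save("model_100_1.h5")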
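
The two numbers returned by `model.evaluate` are the mean loss and the accuracy. The pure-Python sketch below shows roughly how such values come out of a two-unit softmax output and integer labels. The logits and labels are made up, and the helper functions are hypothetical stand-ins written for this example rather than Keras APIs.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, label):
    """Negative log-probability assigned to the true integer class."""
    return -math.log(probs[label])

# Made-up raw scores for three passengers and their 0/1 labels.
batch_logits = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [0, 1, 0]

probs = [softmax(z) for z in batch_logits]
losses = [sparse_categorical_crossentropy(p, y) for p, y in zip(probs, labels)]
preds = [p.index(max(p)) for p in probs]  # predicted class = most probable one

mean_loss = sum(losses) / len(losses)
accuracy = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
print(mean_loss, accuracy)
```

Because the labels are plain integers rather than one-hot vectors, the "sparse" variant of the loss simply indexes the predicted probability of the true class.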
  • Output:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 64)                832
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0
_________________________________________________________________
dense_2 (Dense)              (None, 16)                1040
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 34
=================================================================
Total params: 1,906
Trainable params: 1,906
Non-trainable params: 0
_________________________________________________________________
...
Epoch 96/100
534/534 [==============================] - 0s 80us/step - loss: 0.3870 - acc: 0.8277 - val_loss: 0.5083 - val_acc: 0.7612
Epoch 97/100
534/534 [==============================] - 0s 80us/step - loss: 0.3921 - acc: 0.8352 - val_loss: 0.5070 - val_acc: 0.7687
Epoch 98/100
534/534 [==============================] - 0s 82us/step - loss: 0.3940 - acc: 0.8371 - val_loss: 0.5102 - val_acc: 0.7687
Epoch 99/100
534/534 [==============================] - 0s 78us/step - loss: 0.3996 - acc: 0.8277 - val_loss: 0.5106 - val_acc: 0.7687
Epoch 100/100
534/534 [==============================] - 0s 80us/step - loss: 0.3892 - acc: 0.8352 - val_loss: 0.5082 - val_acc: 0.7612
223/223 [==============================] - 0s 63us/step
test loss is 0.389338, acc 0.829596
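
The sample counts in the log above (534 samples per training epoch, 223 test samples) follow from the split code. The check below assumes train.csv has 891 rows, which is the size of the Kaggle training file but is not stated in the article. It also assumes Keras takes the validation rows from the end of the training data, as `validation_split` is documented to do.

```python
n_rows = 891              # assumed size of Kaggle's train.csv
seq = int(0.75 * n_rows)  # first 75% of rows for training, as in the article
n_test = n_rows - seq     # remaining rows form the test set

validation_split = 0.2
n_fit = int(seq * (1 - validation_split))  # rows actually used for fitting
n_val = seq - n_fit                        # rows held out for validation

print(seq, n_test, n_fit, n_val)  # 668 223 534 134
```

Note that the split is purely positional, so no shuffling happens before the rows are divided into training and test sets.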
  • Complete code:
#%%
import tensorflow as tf
import keras
import pandas as pd
import numpy as np

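# Load the Kaggle training set and take a first look at it
data = pd.read_csv("titanic/train.csv")
print(data.head())       # first five rows
print(data.describe())   # summary statistics of the numeric columns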
#%%
strs = "Survived Pclass Sex Age SibSp Parch Fare Embarked"
clos = strs.split(" ")
print(clos)
#%%
# Copy the selected columns so later assignments do not touch a view of `data`
x_datas = data[clos].copy()
print(x_datas.head())
#%%
print(x_datas.isnull().sum())

#%%
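# Fill missing ages with the mean age and missing ports with the most common port
x_datas["Age"] = x_datas["Age"].fillna(x_datas["Age"].mean())
x_datas["Embarked"] = x_datas["Embarked"].fillna(x_datas["Embarked"].mode()[0])

# One-hot encode the categorical columns; dtype=float keeps every column numeric for Keras
x_datas = pd.get_dummies(x_datas,columns=["Pclass","Sex","Embarked"],dtype=float)
# Roughly rescale Age and Fare by dividing by 100
x_datas["Age"]/=100
x_datas["Fare"]/=100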

print(x_datas.isnull().sum())
print(x_datas.head())

#%%
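# Positional split: the first 75% of rows for training, the rest for testing
seq = int(0.75*(len(x_datas)))

# Column 0 is the label (Survived); the remaining 12 columns are the features
X ,Y = x_datas.iloc[:,1:],x_datas.iloc[:,0]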
X_train,Y_train,X_test,Y_test = X[:seq],Y[:seq],X[seq:],Y[seq:]


#%%
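model = keras.models.Sequential()

# 12 input features -> 64 -> 16 hidden units, with dropout after the first layer
model.add(keras.layers.Dense(64,input_dim = 12,activation="relu"))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.Dense(16,activation="relu"))
# Two softmax outputs: class 0 (did not survive) and class 1 (survived)
model.add(keras.layers.Dense(2,activation="softmax"))

# Sparse categorical cross-entropy accepts the integer 0/1 labels directly
model.compile(loss="sparse_categorical_crossentropy",optimizer="adam",metrics=["accuracy"])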

# summary() prints the table itself and returns None, so no print() is needed
model.summary()

#%%
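# Train for 100 epochs, holding out the last 20% of the training rows for validation
model.fit(X_train,Y_train,validation_split=0.2,epochs=100,batch_size=50)

#%%
# evaluate() returns [loss, accuracy] on the held-out test rows
y = model.evaluate(X_test,Y_test)
print("test loss is %f, acc %f"%(y[0],y[1]))
# Save the trained model in HDF5 format
model.save("model_100_1.h5")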
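This article is part of the Tencent Cloud self-media sharing program and is shared from the author's personal site/blog.
Originally published: 2019-02-22. In case of infringement, contact cloudcommunity@tencent.com for removal.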
