# Hands-On Machine Learning with Scikit-Learn and TensorFlow, Study Notes 4: Data Exploration and Visualization

`housing = strat_train_set.copy()`
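Working on a copy means exploratory changes, such as the derived columns added in section 3, never touch the training set itself. A minimal sketch of that behavior, assuming a tiny made-up frame as a stand-in for `strat_train_set` (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for strat_train_set; the numbers are made up.
strat_train_set = pd.DataFrame({
    "total_rooms": [880.0, 7099.0],
    "households": [126.0, 1138.0],
})

# DataFrame.copy() makes a deep copy by default, so the two frames are independent.
housing = strat_train_set.copy()
housing["rooms_per_household"] = housing["total_rooms"] / housing["households"]

print(list(strat_train_set.columns))  # original columns only
print(list(housing.columns))          # includes the derived column
```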

### 1. Visualizing Geographical Data

`housing.plot(kind="scatter", x="longitude", y="latitude")`

`housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)`

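```python
import matplotlib.pyplot as plt

# Marker size encodes district population; color encodes median house value.
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
             s=housing["population"]/100, label="population", figsize=(10,7),
             c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
             sharex=False)
plt.legend()
# save_fig is a helper assumed to be defined elsewhere in the notebook.
save_fig("housing_prices_scatterplot")
```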

### 2. Looking for Correlations

`corr_matrix = housing.corr(numeric_only=True)`

```
>>> corr_matrix["median_house_value"].sort_values(ascending=False)
median_house_value    1.000000
median_income         0.687170
total_rooms           0.135231
housing_median_age    0.114220
households            0.064702
total_bedrooms        0.047865
population           -0.026699
longitude            -0.047279
latitude             -0.142826
Name: median_house_value, dtype: float64
```
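`DataFrame.corr()` computes the standard (Pearson) correlation coefficient by default, which only measures linear relationships. The sketch below is an illustration rather than part of the original notes: `pearson_r` is a hypothetical helper written in plain Python, and the two small data sets are made up to contrast a perfectly linear pair with a dependent but nonlinear pair.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]   # y is an exact linear function of x
parabola = [x ** 2 for x in xs]    # y depends on x, but not linearly

r_linear = pearson_r(xs, linear)
r_parabola = pearson_r(xs, parabola)
print(r_linear, r_parabola)
```

The linear pair scores at the top of the scale, while the parabola scores essentially zero even though `y` is fully determined by `x`. A coefficient near zero therefore rules out a linear relationship only, not every relationship.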

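```python
from pandas.plotting import scatter_matrix  # import path for pandas >= 0.20

# Plot every pairing of the attributes that look most related to house value.
attributes = ["median_house_value", "median_income", "total_rooms",
              "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
```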

```python
housing.plot(kind="scatter", x="median_income", y="median_house_value",
             alpha=0.1)
```

### 3. Experimenting with Attribute Combinations

```python
housing["rooms_per_household"] = housing["total_rooms"] / housing["households"]
housing["bedrooms_per_room"] = housing["total_bedrooms"] / housing["total_rooms"]
housing["population_per_household"] = housing["population"] / housing["households"]
```

```
>>> corr_matrix = housing.corr(numeric_only=True)
>>> corr_matrix["median_house_value"].sort_values(ascending=False)
median_house_value          1.000000
median_income               0.687170
rooms_per_household         0.199343
total_rooms                 0.135231
housing_median_age          0.114220
households                  0.064702
total_bedrooms              0.047865
population_per_household   -0.021984
population                 -0.026699
longitude                  -0.047279
latitude                   -0.142826
bedrooms_per_room          -0.260070
Name: median_house_value, dtype: float64
```
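The derive-then-recheck loop above can be wrapped in a small helper so it is easy to repeat with other attribute combinations. This is only a sketch: `add_ratio_attributes` is a hypothetical name, and the `toy` rows are made up so the block runs on its own. The correlations it prints say nothing about the real housing data.

```python
import pandas as pd

def add_ratio_attributes(df):
    """Return a new frame with the three ratio attributes added; df is left unchanged."""
    return df.assign(
        rooms_per_household=df["total_rooms"] / df["households"],
        bedrooms_per_room=df["total_bedrooms"] / df["total_rooms"],
        population_per_household=df["population"] / df["households"],
    )

# Made-up rows for illustration only (not taken from the housing data set).
toy = pd.DataFrame({
    "total_rooms": [800.0, 1200.0, 1500.0, 2000.0, 900.0],
    "total_bedrooms": [200.0, 240.0, 250.0, 300.0, 270.0],
    "population": [600.0, 700.0, 650.0, 800.0, 900.0],
    "households": [200.0, 240.0, 250.0, 320.0, 300.0],
    "median_house_value": [150000.0, 220000.0, 300000.0, 340000.0, 120000.0],
})

toy_extended = add_ratio_attributes(toy)

# Every column here is numeric, so corr() needs no extra arguments.
toy_corr = toy_extended.corr()["median_house_value"].sort_values(ascending=False)
print(toy_corr)
```

Using `assign` returns a new frame instead of mutating the input, which keeps the helper safe to call on the training set directly.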

https://github.com/RedstoneWill/Hands-On-Machine-Learning-with-Sklearn-TensorFlow
