## What are some practical applications of TensorFlow?

• Answers (6)
• Following (1)
• Views (1242)

TensorFlow is a powerful dataflow-oriented machine learning library, created by Google's Brain Team and open-sourced in 2015. It is designed to be easy to use and is widely applied to numerical and neural-network problems, among other domains. So what applications and practices does it have today?

### Data loading

```
import pandas as pd

# COLUMNS and LABEL_COLUMN are defined elsewhere in the script.
TRAIN_FILE = '../data/census/census-income.data'
TEST_FILE = '../data/census/census-income.test'

df_train = pd.read_csv(TRAIN_FILE, names=COLUMNS, skipinitialspace=True)
df_test = pd.read_csv(TEST_FILE, names=COLUMNS, skipinitialspace=True)

# Drop rows containing any missing value.
df_train = df_train.dropna(how='any', axis=0)
df_test = df_test.dropna(how='any', axis=0)

# These numeric-looking codes are really categorical, so cast them to str.
str_cols = ['detailed_industry_recode', 'detailed_occupation_recode', 'year']
df_train[str_cols] = df_train[str_cols].astype(str)
df_test[str_cols] = df_test[str_cols].astype(str)

# Binarize the income label: 1 if the label string contains '+', else 0.
df_train[LABEL_COLUMN] = (
    df_train[LABEL_COLUMN].apply(lambda x: '+' in x)).astype(int)
df_test[LABEL_COLUMN] = (
    df_test[LABEL_COLUMN].apply(lambda x: '+' in x)).astype(int)
dtypes = df_train.dtypes
```
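The label transformation above turns the income field into a binary target by checking for a '+' in the label string; a minimal sketch of the same logic in plain Python (the sample label strings below are illustrative):

```python
# Binarize income labels: 1 if the string contains '+', else 0.
labels = ['- 50000.', '50000+.', '- 50000.', '50000+.']
binary = [int('+' in x) for x in labels]
print(binary)  # [0, 1, 0, 1]
```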

```
import tensorflow as tf

# Sparse (categorical) base columns: hash each string value into one of
# hash_bucket_size integer buckets; sparse_column_with_keys is used instead
# when the full value set is small and known (e.g. sex, year).
class_of_worker = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='class_of_worker', hash_bucket_size=1000)
detailed_industry_recode = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='detailed_industry_recode', hash_bucket_size=1000)
detailed_occupation_recode = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='detailed_occupation_recode', hash_bucket_size=1000)
education = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='education', hash_bucket_size=1000)
enroll_in_edu_inst_last_wk = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='enroll_in_edu_inst_last_wk', hash_bucket_size=1000)
marital_stat = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='marital_stat', hash_bucket_size=1000)
major_industry_code = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='major_industry_code', hash_bucket_size=1000)
major_occupation_code = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='major_occupation_code', hash_bucket_size=1000)
race = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='race', hash_bucket_size=1000)
hispanic_origin = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='hispanic_origin', hash_bucket_size=1000)
sex = tf.contrib.layers.sparse_column_with_keys(
    column_name='sex', keys=['Female', 'Male'])
member_of_labor_union = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='member_of_labor_union', hash_bucket_size=1000)
reason_for_unemployment = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='reason_for_unemployment', hash_bucket_size=1000)
full_or_part_time_employment_stat = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='full_or_part_time_employment_stat', hash_bucket_size=1000)
tax_filer_stat = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='tax_filer_stat', hash_bucket_size=1000)
region_of_previous_residence = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='region_of_previous_residence', hash_bucket_size=1000)
state_of_previous_residence = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='state_of_previous_residence', hash_bucket_size=1000)
detailed_household_and_family_stat = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='detailed_household_and_family_stat', hash_bucket_size=1000)
detailed_household_summary_in_household = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='detailed_household_summary_in_household',
    hash_bucket_size=1000)
migration_code_change_in_msa = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='migration_code_change_in_msa', hash_bucket_size=1000)
migration_code_change_in_reg = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='migration_code_change_in_reg', hash_bucket_size=1000)
migration_code_move_within_reg = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='migration_code_move_within_reg', hash_bucket_size=1000)
live_in_this_house_1year_ago = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='live_in_this_house_1year_ago', hash_bucket_size=1000)
migration_prev_res_in_sunbelt = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='migration_prev_res_in_sunbelt', hash_bucket_size=1000)
family_members_under18 = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='family_members_under18', hash_bucket_size=1000)
country_of_birth_father = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='country_of_birth_father', hash_bucket_size=1000)
country_of_birth_mother = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='country_of_birth_mother', hash_bucket_size=1000)
country_of_birth_self = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='country_of_birth_self', hash_bucket_size=1000)
citizenship = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='citizenship', hash_bucket_size=1000)
veterans_benefits = tf.contrib.layers.sparse_column_with_hash_bucket(
    column_name='veterans_benefits', hash_bucket_size=1000)
year = tf.contrib.layers.sparse_column_with_keys(
    column_name='year', keys=['94', '95'])

# Continuous base columns
age = tf.contrib.layers.real_valued_column('age')
age_buckets = tf.contrib.layers.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
wage_per_hour = tf.contrib.layers.real_valued_column('wage_per_hour')
capital_gains = tf.contrib.layers.real_valued_column('capital_gains')
capital_losses = tf.contrib.layers.real_valued_column('capital_losses')
dividends_from_stocks = tf.contrib.layers.real_valued_column(
    'dividends_from_stocks')
instance_weight = tf.contrib.layers.real_valued_column('instance_weight')
weeks_worked_in_year = tf.contrib.layers.real_valued_column(
    'weeks_worked_in_year')
num_persons_worked_for_employer = tf.contrib.layers.real_valued_column(
    'num_persons_worked_for_employer')
```

real_valued_column is mainly for continuous features. For categorical variables there are two options: sparse_column_with_keys, when the set of possible values is known and enumerable, and sparse_column_with_hash_bucket otherwise; both convert the categorical variable into an integer index.
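Conceptually, sparse_column_with_hash_bucket assigns each string an index by hashing it into a fixed number of buckets, accepting occasional collisions. A pure-Python sketch of the idea (TensorFlow uses its own fingerprint hash; md5 here is only a stable stand-in):

```python
import hashlib

def hash_bucket(value, hash_bucket_size=1000):
    # Map an arbitrary category string to a bounded integer index space.
    # md5 is used because Python's built-in hash() is salted per process.
    digest = hashlib.md5(value.encode('utf-8')).hexdigest()
    return int(digest, 16) % hash_bucket_size

index = hash_bucket('Private')
print(index)  # a fixed integer in [0, 1000); same input, same bucket
```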

```
def input_fn(df):
    # Creates a dictionary mapping from each continuous feature column name
    # (k) to the values of that column stored in a constant Tensor.
    continuous_cols = {
        k: tf.constant(df[k].values)
        for k in CONTINUOUS_COLUMNS
    }
    # Creates a dictionary mapping from each categorical feature column name
    # (k) to the values of that column stored in a tf.SparseTensor.
    categorical_cols = {
        k: tf.SparseTensor(
            indices=[[i, 0] for i in range(df[k].size)],
            values=df[k].values,
            dense_shape=[df[k].size, 1])
        for k in CATEGORICAL_COLUMNS
    }
    # Merges the two dictionaries into one. (dict.items() cannot be added
    # with + in Python 3, so update() is used instead.)
    feature_cols = dict(continuous_cols)
    feature_cols.update(categorical_cols)
    # Converts the label column into a constant Tensor.
    label = tf.constant(df[LABEL_COLUMN].values)
    # Returns the feature columns and the label.
    return feature_cols, label
```
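The indices/values/dense_shape triple that input_fn passes to tf.SparseTensor stores one value per row at column 0; it can be sketched without TensorFlow, using a toy 3-row categorical column (the values are illustrative):

```python
# Build the COO-style triple for a toy categorical column of 3 rows.
values = ['Private', 'Federal government', 'Private']
indices = [[i, 0] for i in range(len(values))]  # every value sits in column 0
dense_shape = [len(values), 1]                  # an (n, 1) "column vector"
print(indices)      # [[0, 0], [1, 0], [2, 0]]
print(dense_shape)  # [3, 1]
```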

### Model training

```
def train_input_fn():
    return input_fn(df_train)

def eval_input_fn():
    return input_fn(df_test)

model_dir = '../model_dir'

model = tf.contrib.learn.LinearClassifier(
    feature_columns=FEATURE_COLUMNS, model_dir=model_dir)
model.fit(input_fn=train_input_fn, steps=200)
results = model.evaluate(input_fn=eval_input_fn, steps=1)
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
```

### Support Vector Machine

1. SVM requires an example_id column to be specified, so we need to add one in input_fn.
2. The SVM call hits a reshape bug in the underlying implementation, which I discovered while experimenting with it; it is described at check-failed-ndims-dims-2-vs-1-when-i-build-a-svm-model. The rough cause: for a continuous feature with, say, 200 values, the values arrive with shape (200,) rather than (200, 1). I filed an issue, "Check failed: NDIMS == dims() (2 vs. 1) when I build a svm model" (RandomForest later showed a similar problem), and until it is fixed the temporary workaround is to change the original continuous_cols to: continuous_cols = {k: tf.constant(df[k].values, shape=[df[k].size, 1]) for k in CONTINUOUS_COLUMNS};
3. Swap the model for SVM:
```
# svm lives in tf.contrib.learn's estimators.
from tensorflow.contrib.learn.python.learn.estimators import svm

model_dir = '../svm_model_dir'
model = svm.SVM(example_id_column='example_id',
                feature_columns=FEATURE_COLUMNS,
                model_dir=model_dir)
model.fit(input_fn=train_input_fn, steps=10)
results = model.evaluate(input_fn=eval_input_fn, steps=1)
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
```

The SVM code is at tensorflow-101/machinelearning_toolkit/scripts/tf-svm.py.
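The shape mismatch described in point 2 (a continuous feature arriving as (200,) where (200, 1) is expected) is easy to see with NumPy:

```python
import numpy as np

# A pandas Series' .values is a flat 1-D array: shape (n,)
flat = np.array([39.0, 50.0, 38.0])
print(flat.shape)   # (3,)

# Reshaping to a column vector gives the (n, 1) shape the estimator expects.
column = flat.reshape(-1, 1)
print(column.shape)  # (3, 1)
```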

### RandomForest

```
# TensorForestEstimator lives in contrib.
from tensorflow.contrib.tensor_forest.client import random_forest

validation_metrics = {
    "accuracy":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_accuracy,
            prediction_key='probabilities'),
    "precision":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_precision,
            prediction_key='probabilities'),
    "recall":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_recall,
            prediction_key='probabilities')
}

hparams = tf.contrib.tensor_forest.python.tensor_forest.ForestHParams(
    num_trees=10,
    max_nodes=1000,
    num_classes=2,
    num_features=len(CONTINUOUS_COLUMNS) + len(CATEGORICAL_COLUMNS))
classifier = random_forest.TensorForestEstimator(
    hparams, model_dir=model_dir,
    config=tf.contrib.learn.RunConfig(save_checkpoints_secs=60))

classifier.fit(input_fn=train_input_fn, steps=200)
results = classifier.evaluate(
    input_fn=eval_input_fn, steps=1, metrics=validation_metrics)
print(results)
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
```

### Wide and Deep

Wide and deep models are easy to define and use in TF.Learn; the more involved part is feature processing. For wide columns one typically buckets real-valued features, e.g. age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65]). The boundaries must be given explicitly to discretize the continuous values; I don't know whether there is an API that picks boundaries automatically or by quantile, so I'll look into that later. Once discretized, a column can be used directly as a wide column, but usually one also builds cross columns: tf.contrib.layers.crossed_column(columns=[age_buckets, class_of_worker], hash_bucket_size=1000). To keep the code simple I only crossed two dimensions here; in my experience, crossing columns in feature space usually gives a clear lift, especially for linear models such as LinearClassifier.
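The bucketized_column call above can be mimicked with the standard bisect module to see which bucket a given age lands in (a conceptual sketch, not TensorFlow's implementation):

```python
import bisect

# Boundaries from the text: bucket 0 is everything below 18,
# bucket 1 covers [18, 25), and so on up to bucket 10 for [65, inf).
boundaries = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]

def bucketize(value):
    # bisect_right puts a value equal to a boundary into the upper bucket.
    return bisect.bisect_right(boundaries, value)

print(bucketize(17))  # 0
print(bucketize(18))  # 1
print(bucketize(42))  # 5
```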

Deep columns usually need little processing for continuous features; the main work is vectorizing categorical variables after discretization, typically with one_hot_column or embedding_column. one_hot_column is used for columns whose values are easily enumerable and few, such as sex and year, while embedding_column re-embeds a categorical variable as a dense vector. The official source documents this part in tensorflow/contrib/layers/python/layers/feature_column.py; I'm not yet clear on the exact algorithm inside, so I'll study it in detail later.
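The difference between the two encodings can be sketched with NumPy: one_hot_column yields a sparse 0/1 vector as wide as the vocabulary, while embedding_column looks up a dense vector of much smaller dimension (the embedding matrix below is random, standing in for learned weights):

```python
import numpy as np

# Small, easily enumerable column -> one-hot over the vocabulary.
vocab = ['Female', 'Male']
idx = vocab.index('Male')
one_hot = np.eye(len(vocab))[idx]
print(one_hot)           # [0. 1.]

# High-cardinality column -> embedding lookup; a random matrix stands in
# for the learned weights here.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(1000, 8))  # 1000 hash buckets, dim 8
embedded = embedding_matrix[idx]
print(embedded.shape)    # (8,)
```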

```
validation_metrics = {
    "accuracy":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_accuracy,
            prediction_key="classes"),
    "precision":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_precision,
            prediction_key="classes"),
    "recall":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_recall,
            prediction_key="classes")
}
validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(
    input_fn=eval_input_fn, every_n_steps=10,
    metrics=validation_metrics, eval_steps=1)
if FLAGS.classifier_mode == 'wide':
    model = tf.contrib.learn.LinearClassifier(
        model_dir=model_dir, feature_columns=wide_columns,
        config=tf.contrib.learn.RunConfig(save_checkpoints_secs=60))
elif FLAGS.classifier_mode == 'deep':
    model = tf.contrib.learn.DNNClassifier(
        model_dir=model_dir, feature_columns=deep_columns,
        hidden_units=[128, 64],
        config=tf.contrib.learn.RunConfig(save_checkpoints_secs=60))
else:
    model = tf.contrib.learn.DNNLinearCombinedClassifier(
        model_dir=model_dir,
        linear_feature_columns=wide_columns,
        dnn_feature_columns=deep_columns,
        dnn_hidden_units=[128, 64],
        fix_global_step_increment_bug=True,
        config=tf.contrib.learn.RunConfig(save_checkpoints_secs=60))

model.fit(input_fn=train_input_fn, steps=train_step,
          monitors=[validation_monitor])
results = model.evaluate(input_fn=eval_input_fn, steps=1)
for key in results:
    print("%s: %s" % (key, results[key]))
```

```
mkdir 1.helloworld
cd 1.helloworld
vim helloworld.py
```

```
# -*- coding: UTF-8 -*-

# Import the TensorFlow library
import tensorflow as tf

# GPU acceleration prints a lot of log noise; raise the log level to mute it
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# Create a constant Operation
hw = tf.constant("Hello World! Mtianyan love TensorFlow!")

# Start a TensorFlow Session
sess = tf.Session()

# Run the Graph
print(sess.run(hw))

# Close the Session
sess.close()
```

### TensorFlow's programming model

```
# -*- coding: UTF-8 -*-

# Import the TensorFlow library
import tensorflow as tf

# GPU acceleration prints a lot of log noise; raise the log level to mute it
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# Build the graph: no computation happens yet
a = tf.constant(2)
b = tf.constant(3)
c = tf.multiply(a, b)
d = tf.add(c, 1)

# Execute the graph in a Session
with tf.Session() as sess:
    print(sess.run(d))  # 7
```

TensorFlow's computation graph follows the symbolic programming paradigm: the graph has nodes and edges, and computed results flow along the edges between nodes.
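The define-then-run split can be imitated in plain Python: first describe a graph of nodes, then evaluate it on demand. This toy stand-in (names and structure are my own, not TensorFlow's) computes the same a * b + 1 as above:

```python
# Toy symbolic graph: nodes are (op, args) pairs; constants hold a value.
# Building this dict performs no arithmetic; evaluate() walks edges on demand.
graph = {
    'a': ('const', 2),
    'b': ('const', 3),
    'c': ('mul', ['a', 'b']),
    'd': ('add_one', ['c']),
}

def evaluate(node):
    op, args = graph[node]
    if op == 'const':
        return args
    if op == 'mul':
        x, y = (evaluate(n) for n in args)
        return x * y
    if op == 'add_one':
        return evaluate(args[0]) + 1

print(evaluate('d'))  # 7, the same result as sess.run(d) above
```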

### TensorFlow's basic structure

Tensors flow through the computation graph:

• Tensor: the data that flows along the edges
• Operation: the computation performed at the nodes

A Tensor serves as an Operation's input, and an Operation's output is again a Tensor.

### TensorFlow's basic model

TensorFlow uses the classic client/server architecture.

### The role of the Session

The flow of a TensorFlow program:

1. Define the structure of the algorithm's computation graph (Graph): static
2. Use a Session to execute the computation

### The Python library NumPy

TensorFlow and NumPy are related; they share many similar concepts and APIs.

NumPy (see its official site) is for scientific computing, built around an n-dimensional array object.

NumPy is very fast, much faster than native Python.

SciPy is an open-source library; related tools include Matplotlib, pandas, and Jupyter Notebook.

NumPy operates on multi-dimensional arrays, similar to Tensors.

ndarray attributes: ndim, shape, size, dtype (all elements share one type).

```
import numpy as np

vector = np.array([1, 2, 3])
vector.shape   # (3,)
vector.size    # 3
vector.ndim    # 1
type(vector)   # <class 'numpy.ndarray'>

# Create a 2-D array (matrix)
matrix = np.array([[1, 2], [3, 4]])
matrix.shape   # (2, 2)
matrix.size    # 4
matrix.ndim    # 2
type(matrix)

one = np.arange(12)        # 0 - 11
two = one.reshape((3, 4))
two.shape   # (3, 4)
two.size    # 12
two.ndim    # 2
```

Since all data in TensorFlow is a Tensor, TensorFlow can be described as a flow graph of tensors.

### Vectors and matrices

The basic element in NumPy is the array, a representation much like a Tensor.

```
import numpy as np

zeros = np.zeros((3, 4))   # 3x4 matrix of zeros
ones = np.ones((5, 6))     # 5x6 matrix of ones

# Identity matrix: must be square; ones on the diagonal, zeros elsewhere
ident = np.eye(4)
```

### Tensor attributes

TensorFlow has many data types; the full DType list is at:

https://www.tensorflow.org/api_docs/python/tf/DType

TensorBoard is a suite of visualization tools and a simple TensorFlow-provided solution: it lets you visualize the graph and plot quantitative metrics about it, along with additional data such as images.

Answered by 1098480274学生

Answered by ywojb10T一声不吭 慢慢窒息

In TensorFlow, an indicator column is created by calling tf.feature_column.indicator_column:

```
categorical_column = ...  # any categorical column created earlier
indicator_column = tf.feature_column.indicator_column(categorical_column)
```
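An indicator column is essentially a multi-hot 0/1 encoding of the categorical column over its vocabulary; sketched with NumPy under an assumed 4-entry vocabulary:

```python
import numpy as np

# indicator_column conceptually: turn category indices into a 0/1 vector
# over the vocabulary (multi-hot if a row carries several values).
vocab_size = 4
row_indices = [0, 2]            # this example row contains categories 0 and 2
indicator = np.zeros(vocab_size)
indicator[row_indices] = 1.0
print(indicator)                # [1. 0. 1. 0.]
```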