
A Textbook Transformer Gold-Medal Solution from the Kaggle Google Brain Competition (with Code)

炼丹笔记 | Published 2021-11-10

Authors: RSJ & 杰

Google Brain - Ventilator Pressure Prediction: A Gold-Medal Solution

Introduction

In the recently concluded Google Brain Ventilator Pressure Prediction competition, top competitor CHRIS DEOTTE open-sourced a textbook example of modeling a sequence problem with a Transformer.

There are two aspects of this solution that are especially worth learning from:

  1. The design of the objective function;
  2. The use of the Transformer architecture.

The objective-function design here generalizes to similar problems. In our own earlier sequence-modeling work we compared predicting a single target against training on several targets jointly, and found the latter is usually more stable, so it is well worth borrowing. The Transformer setup, although simple, makes an excellent baseline and is equally worth studying.

Below we go through the code together. For the complete code, see the author's Notebook linked at the end; here we only cover the three core modules.

Code Walkthrough

We study the code directly, split into the following core parts:

  1. Feature engineering;
  2. Loss design;
  3. Transformer block usage.

01 Feature Engineering

The feature engineering breaks down into:

  1. Cross features, mainly products and cumulative sums;
  2. Lag features (lags 1-4, forward and backward);
  3. Differences from per-breath (local) statistics;
  4. Diff features built on the lag features;
  5. Timestamp and rolling-window statistics;
  6. Dummy encoding of the categorical variables.
import numpy as np
import pandas as pd

def add_features(df):
    # Cross features and cumulative sums
    df['cross']  = df['u_in'] * df['u_out']
    df['cross2'] = df['time_step'] * df['u_out']
    df['area']   = df['time_step'] * df['u_in']
    df['area']   = df.groupby('breath_id')['area'].cumsum()
    
    df['time_step_cumsum'] = df.groupby(['breath_id'])['time_step'].cumsum()
    df['u_in_cumsum']       = (df['u_in']).groupby(df['breath_id']).cumsum()
    print("Step-1...Completed")
    
    # Lag features: shifts 1 to 4, forward and backward, within each breath
    df['u_in_lag1']       = df.groupby('breath_id')['u_in'].shift(1)
    df['u_out_lag1']      = df.groupby('breath_id')['u_out'].shift(1)
    df['u_in_lag_back1']  = df.groupby('breath_id')['u_in'].shift(-1)
    df['u_out_lag_back1'] = df.groupby('breath_id')['u_out'].shift(-1)
    df['u_in_lag2']       = df.groupby('breath_id')['u_in'].shift(2)
    df['u_out_lag2']      = df.groupby('breath_id')['u_out'].shift(2)
    df['u_in_lag_back2']  = df.groupby('breath_id')['u_in'].shift(-2)
    df['u_out_lag_back2'] = df.groupby('breath_id')['u_out'].shift(-2)
    df['u_in_lag3']       = df.groupby('breath_id')['u_in'].shift(3)
    df['u_out_lag3']      = df.groupby('breath_id')['u_out'].shift(3)
    df['u_in_lag_back3']  = df.groupby('breath_id')['u_in'].shift(-3)
    df['u_out_lag_back3'] = df.groupby('breath_id')['u_out'].shift(-3)
    df['u_in_lag4']       = df.groupby('breath_id')['u_in'].shift(4)
    df['u_out_lag4']      = df.groupby('breath_id')['u_out'].shift(4)
    df['u_in_lag_back4']  = df.groupby('breath_id')['u_in'].shift(-4)
    df['u_out_lag_back4'] = df.groupby('breath_id')['u_out'].shift(-4)
    df = df.fillna(0)
    print("Step-2...Completed")
    
    # Per-breath statistics of u_in and differences from them
    df['breath_id__u_in__max']      = df.groupby(['breath_id'])['u_in'].transform('max')
    df['breath_id__u_in__mean']     = df.groupby(['breath_id'])['u_in'].transform('mean')
    df['breath_id__u_in__diffmax']  = df.groupby(['breath_id'])['u_in'].transform('max')  - df['u_in']
    df['breath_id__u_in__diffmean'] = df.groupby(['breath_id'])['u_in'].transform('mean') - df['u_in']

    print("Step-3...Completed")
    
    df['u_in_diff1']  = df['u_in'] - df['u_in_lag1']
    df['u_out_diff1'] = df['u_out'] - df['u_out_lag1']
    df['u_in_diff2']  = df['u_in'] - df['u_in_lag2']
    df['u_out_diff2'] = df['u_out'] - df['u_out_lag2']
    df['u_in_diff3']  = df['u_in'] - df['u_in_lag3']
    df['u_out_diff3'] = df['u_out'] - df['u_out_lag3']
    df['u_in_diff4']  = df['u_in'] - df['u_in_lag4']
    df['u_out_diff4'] = df['u_out'] - df['u_out_lag4']
    print("Step-4...Completed")
    
    df['one'] = 1
    df['count'] = (df['one']).groupby(df['breath_id']).cumsum()
    df['u_in_cummean'] =df['u_in_cumsum'] /df['count']
    
    # Global shifts of u_in, masked to zero across breath boundaries
    df['breath_id_lag']=df['breath_id'].shift(1).fillna(0)
    df['breath_id_lag2']=df['breath_id'].shift(2).fillna(0)
    df['breath_id_lagsame']=np.select([df['breath_id_lag']==df['breath_id']],[1],0)
    df['breath_id_lag2same']=np.select([df['breath_id_lag2']==df['breath_id']],[1],0)
    df['breath_id__u_in_lag'] = df['u_in'].shift(1).fillna(0)
    df['breath_id__u_in_lag'] = df['breath_id__u_in_lag'] * df['breath_id_lagsame']
    df['breath_id__u_in_lag2'] = df['u_in'].shift(2).fillna(0)
    df['breath_id__u_in_lag2'] = df['breath_id__u_in_lag2'] * df['breath_id_lag2same']
    print("Step-5...Completed")
    
    # Time-step deltas, exponentially weighted mean, and rolling-window statistics
    df['time_step_diff'] = df.groupby('breath_id')['time_step'].diff().fillna(0)
    df['ewm_u_in_mean'] = (df\
                           .groupby('breath_id')['u_in']\
                           .ewm(halflife=9)\
                           .mean()\
                           .reset_index(level=0,drop=True))
    df[["15_in_sum","15_in_min","15_in_max","15_in_mean"]] = (df\
                                                              .groupby('breath_id')['u_in']\
                                                              .rolling(window=15,min_periods=1)\
                                                              .agg({"15_in_sum":"sum",
                                                                    "15_in_min":"min",
                                                                    "15_in_max":"max",
                                                                    "15_in_mean":"mean"
                                                                    #"15_in_std":"std"
                                                               })\
                                                               .reset_index(level=0,drop=True))
    print("Step-6...Completed")
        
    df['u_in_lagback_diff1']  = df['u_in'] - df['u_in_lag_back1']
    df['u_out_lagback_diff1'] = df['u_out'] - df['u_out_lag_back1']
    df['u_in_lagback_diff2']  = df['u_in'] - df['u_in_lag_back2']
    df['u_out_lagback_diff2'] = df['u_out'] - df['u_out_lag_back2']
    print("Step-7...Completed")
    
    # Categorical variables R, C and their combination, dummy-encoded
    df['R'] = df['R'].astype(str)
    df['C'] = df['C'].astype(str)
    df['R__C'] = df["R"].astype(str) + '__' + df["C"].astype(str)
    df = pd.get_dummies(df)
    print("Step-8...Completed")
    
    return df

train = add_features(train)
test = add_features(test)

02 Auxiliary Loss Design

The model is trained on three targets jointly, pressure plus two auxiliary targets:

  1. The difference between adjacent pressure values;
  2. The cumulative sum of pressure (divided by 200).
train['pressure_diff']     = train.groupby('breath_id').pressure.diff().fillna(0)
train['pressure_integral'] = train.groupby('breath_id').pressure.cumsum()/200
# Three targets per timestep: pressure, its first difference, and its scaled cumulative sum
targets = train[['pressure','pressure_diff','pressure_integral']].to_numpy().reshape(-1, 80, 3)
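The article jumps from DataFrame features here to array-indexed snippets below (train[:, :, U_OUT_IDX], train.shape[-2:]), so a scaling-and-reshaping step is implied in between. The following is a minimal sketch of what that step could look like, assuming RobustScaler (mentioned in the author's comment below) and 80 timesteps per breath; the list of dropped columns is our assumption, not taken from the original notebook:

from sklearn.preprocessing import RobustScaler

# Assumed reconstruction of the omitted step: drop id/target columns, scale
# with RobustScaler, and reshape so every breath becomes a sequence of 80
# timesteps. Column names are assumptions based on the snippets above.
drop_cols = ['id', 'breath_id', 'pressure', 'pressure_diff', 'pressure_integral']
feature_cols = [c for c in train.columns if c not in drop_cols]

scaler = RobustScaler()
train = scaler.fit_transform(train[feature_cols]).reshape(-1, 80, len(feature_cols))
test  = scaler.transform(test[feature_cols]).reshape(-1, 80, len(feature_cols))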
With the targets built and the features reshaped, the sample weights below zero out the expiratory phase (original u_out == 1), which the competition does not score:
U_OUT_IDX = 2  # column index of u_out in the reshaped feature array
y_weight = np.ones_like( targets )
u_out_values = train[:,:,U_OUT_IDX]
y_weight[ u_out_values==0 ] = 0 # because robust scaler changes 1 to 0, so this masks the unscored expiratory timesteps
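The article does not show the loss that consumes these weights. One possibility is a masked multi-output MAE along the lines of the sketch below (our own naming, not the author's code); a way to feed the same mask to model.fit is shown after the model definition in section 03.

import tensorflow as tf

# Minimal sketch (assumed): mean absolute error over the three targets, with
# the expiratory timesteps removed via the weights built above.
def masked_mae(y_true, y_pred, weights):
    err = tf.abs(y_true - y_pred) * weights        # shape (n_breaths, 80, 3)
    return tf.reduce_sum(err) / tf.reduce_sum(weights)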

03 Transformer

This part uses standard multi-head attention (MHA), so we will not go into the details here; see https://keras.io/examples/nlp/text_classification_with_transformer/

from tensorflow import keras
from tensorflow.keras import layers

class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, feat_dim, num_heads, ff_dim, rate=0.1):
        super(TransformerBlock, self).__init__()
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = keras.Sequential(
            [layers.Dense(ff_dim, activation="gelu"), layers.Dense(feat_dim),]
        )
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs, training):
        attn_output = self.att(inputs, inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)
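One detail worth noting: both residual additions inside the block require the input's last dimension to equal feat_dim (MultiHeadAttention keeps the query's feature dimension by default, and the feed-forward branch projects back to feat_dim), which is why build_model below first maps the raw features to feat_dim with a Dense layer. A quick, purely illustrative shape check:

import tensorflow as tf

# Illustrative only: the block preserves the (batch, timesteps, feat_dim) shape.
block = TransformerBlock(embed_dim=64, feat_dim=96, num_heads=8, ff_dim=128)
x = tf.random.normal((4, 80, 96))         # last dim must equal feat_dim
print(block(x, training=False).shape)     # (4, 80, 96)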
Hyperparameters and the full model: the raw features are first projected to feat_dim, then passed through a stack of Transformer blocks joined by weighted skip connections, and finally through a small regression head that predicts the three targets:
feat_dim = train.shape[-1] + 32  # model width: number of input features plus 32
embed_dim = 64  # Embedding size for attention
num_heads = 8  # Number of attention heads
ff_dim = 128  # Hidden layer size in feed forward network inside transformer
dropout_rate = 0.0
num_blocks = 12

def build_model():
    inputs = layers.Input(shape=train.shape[-2:])
        
    # "EMBEDDING LAYER"
    x = layers.Dense(feat_dim)(inputs)
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    
    # TRANSFORMER BLOCKS
    for k in range(num_blocks):
        x_old = x
        transformer_block = TransformerBlock(embed_dim, feat_dim, num_heads, ff_dim, dropout_rate)
        x = transformer_block(x)
        x = 0.7*x + 0.3*x_old # SKIP CONNECTION
    
    # REGRESSION HEAD
    x = layers.Dense(128, activation="selu")(x)
    x = layers.Dropout(dropout_rate)(x)
    outputs = layers.Dense(3, activation="linear")(x)  # one output per target: pressure, diff, integral
    model = keras.Model(inputs=inputs, outputs=outputs)
        
    return model
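The article stops at the model definition. For completeness, one way to compile and train it with the masked targets from section 02 is sketched below; since the mask is identical across the three targets, it can be collapsed to a per-timestep sample weight. The optimizer, batch size and epoch count here are illustrative assumptions, not the author's settings:

model = build_model()
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="mae")

# Collapse the (n_breaths, 80, 3) weights to a per-timestep mask; Keras applies
# it after averaging the MAE over the three targets at each timestep.
w_time = y_weight[:, :, 0]

model.fit(train, targets,
          sample_weight=w_time,   # zero weight on the expiratory phase
          batch_size=64,
          epochs=100)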

Summary

The Transformer-based solution presented here is clean and concise and well worth borrowing from, both for the way the Transformer is applied to sequence modeling and for the loss design used on this kind of problem.

References

  1. https://www.kaggle.com/cdeotte/tensorflow-transformer-0-112/comments?scriptVersionId=78792844