Welcome to the second assignment of Week 2. You are going to use word vector representations to build an Emojifier.
Have you ever wanted to make your text messages more expressive? Your emojifier app will help you do that. So rather than writing:
"Congratulations on the promotion! Let's get coffee and talk. Love you!"
The emojifier can automatically turn this into:
"Congratulations on the promotion! ? Let's get coffee and talk. ☕️ Love you! ❤️"
In this exercise, you'll start with a baseline model (Emojifier-V1) using word embeddings, then build a more sophisticated model (Emojifier-V2) that further incorporates an LSTM.
Let's get started! Run the following cell to load the packages you are going to use.
import numpy as np
from emo_utils import *
import emoji
import matplotlib.pyplot as plt
%matplotlib inline
Let's start by building a simple baseline classifier.
You have a tiny dataset (X, Y) where:
- X contains sentences (strings)
- Y contains an integer label between 0 and 4 corresponding to an emoji for each sentence
Figure 1: EMOJISET - a classification problem with 5 classes. A few examples of sentences are given here.
Let's load the dataset using the code below. We split the dataset between training (132 examples) and testing (56 examples).
X_train, Y_train = read_csv('data/train_emoji.csv')
X_test, Y_test = read_csv('data/tesss.csv')
maxLen = len(max(X_train, key=len).split())   # number of words in the longest training sentence
Run the following cell to print sentences from X_train and corresponding labels from Y_train. Change idx to see different examples.
for idx in range(10):
    print(X_train[idx], label_to_emoji(Y_train[idx]))
never talk to me again 😞
I am proud of your achievements 😄
It is the worst day in my life 😞
Miss you so much ❤️
food is life 🍴
I love you mum ❤️
Stop saying bullshit 😞
congratulations on your acceptance 😄
The assignment is too long 😞
I want to go play ⚾
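The helper label_to_emoji used above comes from emo_utils. For reference, here is a minimal sketch of what such a helper might look like; the five-alias mapping and the helper name are assumptions for illustration, and use_aliases matches the older emoji package API of this notebook's era:

import emoji

# Hypothetical label-to-alias mapping (assumption; the real table lives in emo_utils).
emoji_dictionary = {0: ":heart:",
                    1: ":baseball:",
                    2: ":smile:",
                    3: ":disappointed:",
                    4: ":fork_and_knife:"}

def label_to_emoji_sketch(label):
    """Converts an integer label (0-4) into the corresponding emoji character."""
    return emoji.emojize(emoji_dictionary[int(label)], use_aliases=True)

print(label_to_emoji_sketch(0))   # ❤️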
In this part, you are going to implement a baseline model called "Emojifier-v1".
style="width:900px;height:300px;">
Figure 2: Baseline model (Emojifier-V1).
To get your labels into a format suitable for training a softmax classifier, let's convert Y from its current shape (m, 1) into a "one-hot representation" of shape (m, 5). Y_oh stands for "Y-one-hot" in the variable names Y_oh_train and Y_oh_test:
Y_oh_train = convert_to_one_hot(Y_train, C = 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)
Let's see what convert_to_one_hot() did. Feel free to change idx to print out different values.
idx = 50
print(f"Sentence '{X_train[idx]}' has label index {Y_train[idx]}, which is emoji {label_to_emoji(Y_train[idx])}")
print(f"Label index {Y_train[idx]} in one-hot encoding format is {Y_oh_train[idx]}")
Sentence 'I missed you' has label index 0, which is emoji ❤️
Label index 0 in one-hot encoding format is [ 1. 0. 0. 0. 0.]
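convert_to_one_hot() is provided by emo_utils. A one-line NumPy equivalent (a sketch, not necessarily the exact implementation) indexes rows of the identity matrix, which is the same trick used in a sanity-check cell later in this notebook:

def convert_to_one_hot_sketch(Y, C):
    """Map an array of m integer labels to a one-hot matrix of shape (m, C)."""
    return np.eye(C)[Y.reshape(-1)]

print(convert_to_one_hot_sketch(np.array([0, 3]), C=5))
# [[ 1.  0.  0.  0.  0.]
#  [ 0.  0.  0.  1.  0.]]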
All the data is now ready to be fed into the Emojifier-V1 model. Let's implement the model!
As shown in Figure 2 (above), the first step is to convert each input sentence into its word-vector representation and average the word vectors together. As in the previous exercise, we will use pre-trained 50-dimensional GloVe embeddings.
Run the following cell to load the word_to_vec_map, which contains all the vector representations.
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('../../readonly/glove.6B.50d.txt')
You've loaded:
- word_to_index: dictionary mapping from words to their indices in the vocabulary
- index_to_word: dictionary mapping from indices to their corresponding words in the vocabulary
- word_to_vec_map: dictionary mapping words to their GloVe vector representation
Run the following cell to check if it works.
word = "cucumber"
idx = 289846
print("the index of", word, "in the vocabulary is", word_to_index[word])
print("the", str(idx) + "th word in the vocabulary is", index_to_word[idx])
the index of cucumber in the vocabulary is 113317
the 289846th word in the vocabulary is potatos
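read_glove_vecs() is also provided by emo_utils. Conceptually, it parses the GloVe text file, in which each line holds a word followed by its 50 coordinates. A minimal sketch, assuming indices start at 1 over the sorted vocabulary (the real helper may differ in such details):

def read_glove_vecs_sketch(glove_file):
    """Parse a GloVe text file into the three dictionaries used above."""
    word_to_vec_map = {}
    with open(glove_file, 'r', encoding='utf-8') as f:
        for line in f:
            parts = line.strip().split()
            word_to_vec_map[parts[0]] = np.array(parts[1:], dtype=np.float64)
    # Assumption: indices start at 1, leaving index 0 free for zero-padding.
    word_to_index = {w: i for i, w in enumerate(sorted(word_to_vec_map), start=1)}
    index_to_word = {i: w for w, i in word_to_index.items()}
    return word_to_index, index_to_word, word_to_vec_map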
Exercise: Implement sentence_to_avg(). You will need to carry out two steps:
1. Convert every sentence to lower-case, then split the sentence into a list of words. X.lower() and X.split() might be useful.
2. For each word in the sentence, access its GloVe representation, then average all of these vectors.
Hint: When creating the avg array of zeros (numpy.zeros() might be useful), you'll want it to be a vector of the same shape as the other word vectors in the word_to_vec_map. You could pick any word in the word_to_vec_map and access its .shape field, but don't hard-code a specific word: there is no guarantee that a word you see in word_to_vec_map within this notebook will be in the word_to_vec_map when the function is being called by the automatic grader. It is safer to use a word from the input sentence to find the shape of a word vector.
# GRADED FUNCTION: sentence_to_avg
def sentence_to_avg(sentence, word_to_vec_map):
    """
    Converts a sentence (string) into a list of words (strings). Extracts the GloVe representation of each word
    and averages its value into a single vector encoding the meaning of the sentence.

    Arguments:
    sentence -- string, one training example from X
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation

    Returns:
    avg -- average vector encoding information about the sentence, numpy-array of shape (50,)
    """

    ### START CODE HERE ###
    # Step 1: Split sentence into list of lower case words (≈ 1 line)
    words = sentence.lower().split()

    # Initialize the average word vector; it should have the same shape as your word vectors.
    # Use the shape of an actual word vector rather than hard-coding 50 (see the hint above).
    avg = np.zeros(word_to_vec_map[words[0]].shape)

    # Step 2: average the word vectors. You can loop over the words in the list "words".
    total = 0
    for w in words:
        total += word_to_vec_map[w]
    avg = total / len(words)
    ### END CODE HERE ###

    return avg
avg = sentence_to_avg("Morrocan couscous is my favorite dish", word_to_vec_map)
print("avg = \n", avg)
avg =
[-0.008005 0.56370833 -0.50427333 0.258865 0.55131103 0.03104983
-0.21013718 0.16893933 -0.09590267 0.141784 -0.15708967 0.18525867
0.6495785 0.38371117 0.21102167 0.11301667 0.02613967 0.26037767
0.05820667 -0.01578167 -0.12078833 -0.02471267 0.4128455 0.5152061
0.38756167 -0.898661 -0.535145 0.33501167 0.68806933 -0.2156265
1.797155 0.10476933 -0.36775333 0.750785 0.10282583 0.348925
-0.27262833 0.66768 -0.10706167 -0.283635 0.59580117 0.28747333
-0.3366635 0.23393817 0.34349183 0.178405 0.1166155 -0.076433
0.1445417 0.09808667]
Expected Output:
avg =
[-0.008005 0.56370833 -0.50427333 0.258865 0.55131103 0.03104983
-0.21013718 0.16893933 -0.09590267 0.141784 -0.15708967 0.18525867
0.6495785 0.38371117 0.21102167 0.11301667 0.02613967 0.26037767
0.05820667 -0.01578167 -0.12078833 -0.02471267 0.4128455 0.5152061
0.38756167 -0.898661 -0.535145 0.33501167 0.68806933 -0.2156265
1.797155 0.10476933 -0.36775333 0.750785 0.10282583 0.348925
-0.27262833 0.66768 -0.10706167 -0.283635 0.59580117 0.28747333
-0.3366635 0.23393817 0.34349183 0.178405 0.1166155 -0.076433
0.1445417 0.09808667]
You now have all the pieces to finish implementing the model() function. After using sentence_to_avg() you need to pass the average through forward propagation, compute the cost, and then backpropagate to update the softmax parameters.
Exercise: Implement the model() function described in Figure 2. Assuming that $Y_{oh}$ ("Y one hot") is the one-hot encoding of the output labels, the equations you need to implement in the forward pass and to compute the cross-entropy cost are:
\[ z^{(i)} = W \cdot avg^{(i)} + b \]
\[ a^{(i)} = softmax(z^{(i)}) \]
\[ \mathcal{L}^{(i)} = - \sum_{k = 0}^{n_y - 1} Y_{oh,k}^{(i)} \log(a^{(i)}_k) \]
Note: It is possible to come up with a more efficient vectorized implementation. For now, let's use nested for loops to better understand the algorithm, and for easier debugging.
We provided the function softmax(), which was imported earlier.
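To make these equations concrete, here is a tiny numeric sketch of one forward/backward step for a single example, with softmax written out explicitly (the imported helper computes the same standard definition). All values are illustrative:

# Toy single-example check of the equations above (illustrative values only).
n_y, n_h = 5, 50
np.random.seed(0)
W_toy = np.random.randn(n_y, n_h) / np.sqrt(n_h)
b_toy = np.zeros((n_y,))
avg_toy = np.random.randn(n_h)
y_oh_toy = np.eye(n_y)[2]                 # one-hot label for class 2

z = np.dot(W_toy, avg_toy) + b_toy        # z = W . avg + b
a = np.exp(z) / np.sum(np.exp(z))         # a = softmax(z)
cost = -np.sum(y_oh_toy * np.log(a))      # cross-entropy loss
dz = a - y_oh_toy                         # gradient of the loss with respect to z

The last line is the gradient used in the starter code below: for softmax with cross-entropy, dL/dz simplifies to a - y.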
# GRADED FUNCTION: model

def model(X, Y, word_to_vec_map, learning_rate = 0.01, num_iterations = 400):
    """
    Model to train word vector representations in numpy.

    Arguments:
    X -- input data, numpy array of sentences as strings, of shape (m, 1)
    Y -- labels, numpy array of integers between 0 and 4, numpy-array of shape (m, 1)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    learning_rate -- learning rate for the stochastic gradient descent algorithm
    num_iterations -- number of iterations

    Returns:
    pred -- vector of predictions, numpy-array of shape (m, 1)
    W -- weight matrix of the softmax layer, of shape (n_y, n_h)
    b -- bias of the softmax layer, of shape (n_y,)
    """

    np.random.seed(1)

    # Define number of training examples
    m = Y.shape[0]   # number of training examples
    n_y = 5          # number of classes
    n_h = 50         # dimensions of the GloVe vectors

    # Initialize parameters using Xavier initialization
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))

    # Convert Y to Y_onehot with n_y classes
    Y_oh = convert_to_one_hot(Y, C = n_y)

    # Optimization loop
    for t in range(num_iterations):   # Loop over the number of iterations
        for i in range(m):            # Loop over the training examples

            ### START CODE HERE ### (≈ 4 lines of code)
            # Average the word vectors of the words from the i'th training example
            avg = sentence_to_avg(X[i], word_to_vec_map)

            # Forward propagate the avg through the softmax layer
            z = np.dot(W, avg) + b
            a = softmax(z)

            # Compute cost using the i'th training label's one-hot representation and "a" (the output of the softmax)
            cost = -np.sum(Y_oh[i] * np.log(a))
            ### END CODE HERE ###

            # Compute gradients
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y, 1), avg.reshape(1, n_h))
            db = dz

            # Update parameters with Stochastic Gradient Descent
            W = W - learning_rate * dW
            b = b - learning_rate * db

        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost))
            pred = predict(X, Y, W, b, word_to_vec_map)   # predict is defined in emo_utils.py

    return pred, W, b
print(X_train.shape)
print(Y_train.shape)
print(np.eye(5)[Y_train.reshape(-1)].shape)
print(X_train[0])
print(type(X_train))
Y = np.asarray([5,0,0,5, 4, 4, 4, 6, 6, 4, 1, 1, 5, 6, 6, 3, 6, 3, 4, 4])
print(Y.shape)
X = np.asarray(['I am going to the bar tonight', 'I love you', 'miss you my dear',
'Lets go party and drinks','Congrats on the new job','Congratulations',
'I am so happy for you', 'Why are you feeling bad', 'What is wrong with you',
'You totally deserve this prize', 'Let us go play football',
'Are you down for football this afternoon', 'Work hard play harder',
'It is suprising how people can be dumb sometimes',
'I am very disappointed','It is the best day in my life',
'I think I will end up alone','My life is so boring','Good job',
'Great so awesome'])
print(X.shape)
print(np.eye(5)[Y_train.reshape(-1)].shape)
print(type(X_train))
(132,)
(132,)
(132, 5)
never talk to me again
<class 'numpy.ndarray'>
(20,)
(20,)
(132, 5)
<class 'numpy.ndarray'>
Run the next cell to train your model and learn the softmax parameters (W,b).
pred, W, b = model(X_train, Y_train, word_to_vec_map)
print(pred)
Epoch: 0 --- cost = 1.95204988128
Accuracy: 0.348484848485
Epoch: 100 --- cost = 0.0797181872601
Accuracy: 0.931818181818
Epoch: 200 --- cost = 0.0445636924368
Accuracy: 0.954545454545
Epoch: 300 --- cost = 0.0343226737879
Accuracy: 0.969696969697
[[ 3.]
[ 2.]
[ 3.]
[ 0.]
[ 4.]
[ 0.]
[ 3.]
[ 2.]
[ 3.]
[ 1.]
[ 3.]
[ 3.]
[ 1.]
[ 3.]
[ 2.]
[ 3.]
[ 2.]
[ 3.]
[ 1.]
[ 2.]
[ 3.]
[ 0.]
[ 2.]
[ 2.]
[ 2.]
[ 1.]
[ 4.]
[ 3.]
[ 3.]
[ 4.]
[ 0.]
[ 3.]
[ 4.]
[ 2.]
[ 0.]
[ 3.]
[ 2.]
[ 2.]
[ 3.]
[ 4.]
[ 2.]
[ 2.]
[ 0.]
[ 2.]
[ 3.]
[ 0.]
[ 3.]
[ 2.]
[ 4.]
[ 3.]
[ 0.]
[ 3.]
[ 3.]
[ 3.]
[ 4.]
[ 2.]
[ 1.]
[ 1.]
[ 1.]
[ 2.]
[ 3.]
[ 1.]
[ 0.]
[ 0.]
[ 0.]
[ 3.]
[ 4.]
[ 4.]
[ 2.]
[ 2.]
[ 1.]
[ 2.]
[ 0.]
[ 3.]
[ 2.]
[ 2.]
[ 0.]
[ 3.]
[ 3.]
[ 1.]
[ 2.]
[ 1.]
[ 2.]
[ 2.]
[ 4.]
[ 3.]
[ 3.]
[ 2.]
[ 4.]
[ 0.]
[ 0.]
[ 3.]
[ 3.]
[ 3.]
[ 3.]
[ 2.]
[ 0.]
[ 1.]
[ 2.]
[ 3.]
[ 0.]
[ 2.]
[ 2.]
[ 2.]
[ 3.]
[ 2.]
[ 2.]
[ 2.]
[ 4.]
[ 1.]
[ 1.]
[ 3.]
[ 3.]
[ 4.]
[ 1.]
[ 2.]
[ 1.]
[ 1.]
[ 3.]
[ 1.]
[ 0.]
[ 4.]
[ 0.]
[ 3.]
[ 3.]
[ 4.]
[ 4.]
[ 1.]
[ 4.]
[ 3.]
[ 0.]
[ 2.]]
Expected Output (on a subset of iterations):
**Epoch: 0** | cost = 1.95204988128 | Accuracy: 0.348484848485
---|---|---
**Epoch: 100** | cost = 0.0797181872601 | Accuracy: 0.931818181818
**Epoch: 200** | cost = 0.0445636924368 | Accuracy: 0.954545454545
**Epoch: 300** | cost = 0.0343226737879 | Accuracy: 0.969696969697
Great! Your model has pretty high accuracy on the training set. Let's now see how it does on the test set. Note: the predict function used here is defined in emo_utils.py.
print("Training set:")
pred_train = predict(X_train, Y_train, W, b, word_to_vec_map)
print('Test set:')
pred_test = predict(X_test, Y_test, W, b, word_to_vec_map)
Training set:
Accuracy: 0.977272727273
Test set:
Accuracy: 0.857142857143
Expected Output:
**Train set accuracy** | 97.7 |
---|---|
**Test set accuracy** | 85.7 |
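For reference, predict() essentially repeats Emojifier-V1's forward pass for every sentence and reports the accuracy. A minimal sketch under that assumption (the actual implementation lives in emo_utils.py):

def predict_sketch(X, Y, W, b, word_to_vec_map):
    """Forward-propagate each sentence, print accuracy, and return argmax predictions."""
    m = X.shape[0]
    pred = np.zeros((m, 1))
    for i in range(m):
        avg = sentence_to_avg(X[i], word_to_vec_map)
        z = np.dot(W, avg) + b
        a = np.exp(z) / np.sum(np.exp(z))   # softmax
        pred[i] = np.argmax(a)
    print("Accuracy: " + str(np.mean(pred[:, 0] == Y.reshape(-1))))
    return pred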
In the training set, the algorithm saw the sentence "I love you" with the label ❤️. You can check that the word "adore" does not appear in the training set; nonetheless, let's see what happens if you write "I adore you."
X_my_sentences = np.array(["i adore you", "i love you", "funny lol", "lets play with a ball", "food is ready", "not feeling happy"])
Y_my_labels = np.array([[0], [0], [2], [1], [4],[3]])
pred = predict(X_my_sentences, Y_my_labels , W, b, word_to_vec_map)
print_predictions(X_my_sentences, pred)
Accuracy: 0.833333333333
i adore you ❤️
i love you ❤️
funny lol 😄
lets play with a ball ⚾
food is ready 🍴
not feeling happy 😄
Amazing! Because adore has an embedding similar to love, the algorithm generalizes correctly even to a word it has never seen before. Note, though, that it doesn't get "not feeling happy" right: this algorithm ignores word ordering, so it is not good at understanding negation in sentences like "not feeling happy."
Printing the confusion matrix can also help understand which classes are more difficult for your model.
print(Y_test.shape)
print(' '+ label_to_emoji(0)+ ' ' + label_to_emoji(1) + ' ' + label_to_emoji(2)+ ' ' + label_to_emoji(3)+' ' + label_to_emoji(4))
print(pd.crosstab(Y_test, pred_test.reshape(56,), rownames=['Actual'], colnames=['Predicted'], margins=True))
plot_confusion_matrix(Y_test, pred_test)
(56,)
❤️ ⚾ 😄 😞 🍴
Predicted 0.0 1.0 2.0 3.0 4.0 All
Actual
0 6 0 0 1 0 7
1 0 8 0 0 0 8
2 2 0 16 0 0 18
3 1 1 2 12 0 16
4 0 0 1 0 6 7
All 9 9 19 13 6 56
You will build a better algorithm in the next section!
Let's build an LSTM model that takes word sequences as input!
Run the following cell to load the Keras packages.
import numpy as np
np.random.seed(0)
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, Activation
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.initializers import glorot_uniform
np.random.seed(1)
Using TensorFlow backend.
Here is the Emojifier-V2 you will implement:
Figure 3: Emojifier-V2. A 2-layer LSTM sequence classifier.
The Embedding() layer's input is an integer matrix of size (batch size, max input length). This corresponds to sentences converted into lists of word indices (integers). The layer outputs an array of shape (batch size, max input length, dimension of word vectors).
Figure 4 shows two example sentences propagated through the embedding layer. Both have been zero-padded to a length of max_len=5, so the output is of shape (2,max_len,50), because the word embeddings we use are 50-dimensional.
Figure 4: Embedding layer
Exercise: Implement sentences_to_indices, which processes an array of sentences (X) and returns inputs to the embedding layer:
- Convert each training sentence into a list of indices (the indices correspond to each word in the sentence).
- Zero-pad all these lists so that their length is the length of the longest sentence.
Note: you may have considered using the enumerate() function in the for loop, but for the purposes of passing the autograder, please follow the starter code by initializing and incrementing j explicitly.
for idx, val in enumerate(["I", "like", "learning"]):
    print(idx, val)
0 I
1 like
2 learning
# GRADED FUNCTION: sentences_to_indices

def sentences_to_indices(X, word_to_index, max_len):
    """
    Converts an array of sentences (strings) into an array of indices corresponding to words in the sentences.
    The output shape should be such that it can be given to `Embedding()` (described in Figure 4).

    Arguments:
    X -- array of sentences (strings), of shape (m, 1)
    word_to_index -- a dictionary mapping each word to its index
    max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this.

    Returns:
    X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
    """

    m = X.shape[0]   # number of training examples

    ### START CODE HERE ###
    # Initialize X_indices as a numpy matrix of zeros and the correct shape (≈ 1 line)
    X_indices = np.zeros((m, max_len))

    for i in range(m):   # loop over training examples
        # Convert the ith training sentence to lower case and split it into words. You should get a list of words.
        sentence_words = X[i].lower().split()

        # Initialize j to 0
        j = 0

        # Loop over the words of sentence_words
        for w in sentence_words:
            # Set the (i,j)th entry of X_indices to the index of the correct word.
            X_indices[i, j] = word_to_index[w]
            # Increment j to j + 1
            j = j + 1
    ### END CODE HERE ###

    return X_indices
Run the following cell to check what sentences_to_indices()
does, and check your results.
X1 = np.array(["funny lol", "lets play baseball", "food is ready for you"])
X1_indices = sentences_to_indices(X1,word_to_index, max_len = 5)
print("X1 =", X1)
print("X1_indices =\n", X1_indices)
X1 = ['funny lol' 'lets play baseball' 'food is ready for you']
X1_indices =
[[ 155345. 225122. 0. 0. 0.]
[ 220930. 286375. 69714. 0. 0.]
[ 151204. 192973. 302254. 151349. 394475.]]
Expected Output:
X1 = ['funny lol' 'lets play baseball' 'food is ready for you']
X1_indices =
[[ 155345. 225122. 0. 0. 0.]
[ 220930. 286375. 69714. 0. 0.]
[ 151204. 192973. 302254. 151349. 394475.]]
Let's build the Embedding() layer in Keras, using pre-trained word vectors. The embedding layer takes as input a list of word indices; sentences_to_indices() creates these word indices.
Exercise: Implement pretrained_embedding_layer() with these steps:
1. Initialize the embedding matrix as a numpy array of zeros, of shape (vocab_len, emb_dim), where emb_dim represents the length of a word embedding.
2. Fill in each row of the embedding matrix with the vector representation of the corresponding word. Each key in word_to_index is a string.
3. Define the Keras embedding layer and make it non-trainable: if you were to set trainable = True, then the optimization algorithm would be allowed to modify the values of the word embeddings.
4. Set the embedding weights to be equal to the embedding matrix (this step is provided for you).
# GRADED FUNCTION: pretrained_embedding_layer
def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.

    Arguments:
    word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    embedding_layer -- pretrained layer Keras instance
    """

    vocab_len = len(word_to_index) + 1                # adding 1 to fit Keras embedding (requirement)
    emb_dim = word_to_vec_map["cucumber"].shape[0]    # define dimensionality of your GloVe word vectors (= 50)

    ### START CODE HERE ###
    # Step 1
    # Initialize the embedding matrix as a numpy array of zeros.
    # See instructions above to choose the correct shape.
    emb_matrix = np.zeros((vocab_len, emb_dim))

    # Step 2
    # Set each row "idx" of the embedding matrix to be
    # the word vector representation of the idx'th word of the vocabulary
    for word, idx in word_to_index.items():
        emb_matrix[idx, :] = word_to_vec_map[word]

    # Step 3
    # Define Keras embedding layer with the correct input and output sizes
    # Make it non-trainable.
    embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)
    ### END CODE HERE ###

    # Step 4 (already done for you; please do not modify)
    # Build the embedding layer; this is required before setting the weights of the embedding layer.
    embedding_layer.build((None,))   # Do not modify the "None". This line of code is complete as-is.

    # Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])

    return embedding_layer
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
print("weights[0][1][3] =", embedding_layer.get_weights()[0][1][3])
weights[0][1][3] = -0.3403
Expected Output:
weights[0][1][3] = -0.3403
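You can also sanity-check that an entire row of the layer's weight matrix matches the corresponding GloVe vector, using the names already defined above; for example for "cucumber":

# Verify that row word_to_index["cucumber"] of the embedding weights equals its GloVe vector.
idx = word_to_index["cucumber"]
assert np.allclose(embedding_layer.get_weights()[0][idx], word_to_vec_map["cucumber"])
print("Row", idx, "of the embedding matrix matches the GloVe vector for 'cucumber'.")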
Let's now build the Emojifier-V2 model.
Figure 3: Emojifier-V2. A 2-layer LSTM sequence classifier.
Exercise: Implement Emojify_V2(), which builds a Keras graph of the architecture shown in Figure 3. The model takes as input an array of sentences of shape (m, max_len) defined by input_shape, and outputs a softmax probability vector of shape (m, C = 5). You may need the following Keras layers:
- Input(): set the shape and dtype parameters.
- LSTM(): set the units and return_sequences parameters.
- Dropout(): set the rate parameter.
- Dense(): set the units. Dense() also has an activation parameter, but for the purposes of passing the autograder, please do not set the activation within Dense(); use the separate Activation layer to do so.
- Model(): set inputs and outputs.
Here is how Keras layers are used:
# How to use Keras layers in two lines of code
dense_object = Dense(units = ...)
X = dense_object(inputs)
# How to use Keras layers in one line of code
X = Dense(units = ...)(inputs)
The embedding_layer that is returned by pretrained_embedding_layer is likewise a layer object that can be called as a function, passing in a single argument (the sentence indices).
# GRADED FUNCTION: Emojify_V2
def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.

    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """

    ### START CODE HERE ###
    # Define sentence_indices as the input of the graph.
    # It should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(shape=input_shape, dtype='int32')

    # Create the embedding layer pretrained with GloVe vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)

    # Propagate sentence_indices through your embedding layer; you get back the embeddings
    embeddings = embedding_layer(sentence_indices)

    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state.
    # Be careful: the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with 128-dimensional hidden state.
    # The returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer with 5 units
    X = Dense(5)(X)
    # Add a softmax activation
    X = Activation('softmax')(X)

    # Create a Model instance which converts sentence_indices into X.
    model = Model(inputs=sentence_indices, outputs=X)
    ### END CODE HERE ###

    return model
Run the following cell to create your model and check its summary. Because all sentences in the dataset are shorter than 10 words, we chose max_len = 10. You should see that the architecture uses 20,223,927 parameters, of which 20,000,050 (the word embeddings) are non-trainable and the remaining 223,877 are trainable. Because our vocabulary has 400,001 words (with valid indices from 0 to 400,000), there are 400,001 * 50 = 20,000,050 non-trainable parameters.
model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_7 (InputLayer) (None, 10) 0
_________________________________________________________________
embedding_9 (Embedding) (None, 10, 50) 20000050
_________________________________________________________________
lstm_11 (LSTM) (None, 10, 128) 91648
_________________________________________________________________
dropout_11 (Dropout) (None, 10, 128) 0
_________________________________________________________________
lstm_12 (LSTM) (None, 128) 131584
_________________________________________________________________
dropout_12 (Dropout) (None, 128) 0
_________________________________________________________________
dense_6 (Dense) (None, 5) 645
_________________________________________________________________
activation_6 (Activation) (None, 5) 0
=================================================================
Total params: 20,223,927
Trainable params: 223,877
Non-trainable params: 20,000,050
_________________________________________________________________
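As a sanity check on these counts, recall that an LSTM layer with input dimension d_in and d_h hidden units has 4 * d_h * (d_in + d_h + 1) parameters: four gates, each with an input kernel, a recurrent kernel and a bias. A quick verification of the summary above:

def lstm_params(d_in, d_h):
    # Four gates, each with a (d_in x d_h) input kernel, a (d_h x d_h) recurrent kernel, and d_h biases.
    return 4 * d_h * (d_in + d_h + 1)

print(lstm_params(50, 128))    # 91648  (first LSTM)
print(lstm_params(128, 128))   # 131584 (second LSTM)
print(128 * 5 + 5)             # 645    (Dense layer)
print(400001 * 50)             # 20000050 (non-trainable embedding parameters)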
As usual, after creating your model in Keras, you need to compile it and define what loss, optimizer and metrics you want to use. Compile your model using the categorical_crossentropy loss, the adam optimizer and ['accuracy'] metrics:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
It's time to train your model. Your Emojifier-V2 model takes as input an array of shape (m, max_len) and outputs probability vectors of shape (m, number of classes). We thus have to convert X_train (array of sentences as strings) to X_train_indices (array of sentences as lists of word indices), and Y_train (labels as indices) to Y_train_oh (labels as one-hot vectors).
X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
Y_train_oh = convert_to_one_hot(Y_train, C = 5)
Fit the Keras model on X_train_indices and Y_train_oh. We will use epochs = 50 and batch_size = 32.
model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)
Epoch 1/50
132/132 [==============================] - 0s - loss: 1.6061 - acc: 0.2348
Epoch 2/50
132/132 [==============================] - 0s - loss: 1.5358 - acc: 0.3409
Epoch 3/50
132/132 [==============================] - 0s - loss: 1.4712 - acc: 0.3788
Epoch 4/50
132/132 [==============================] - 0s - loss: 1.4608 - acc: 0.3333
Epoch 5/50
132/132 [==============================] - 0s - loss: 1.3530 - acc: 0.4242
Epoch 6/50
132/132 [==============================] - 0s - loss: 1.2034 - acc: 0.6136
Epoch 7/50
132/132 [==============================] - 0s - loss: 1.0711 - acc: 0.6591
Epoch 8/50
132/132 [==============================] - 0s - loss: 0.9384 - acc: 0.7121
Epoch 9/50
132/132 [==============================] - 0s - loss: 0.9525 - acc: 0.6515
Epoch 10/50
132/132 [==============================] - 0s - loss: 0.9063 - acc: 0.6439
Epoch 11/50
132/132 [==============================] - 0s - loss: 0.7520 - acc: 0.6894
Epoch 12/50
132/132 [==============================] - 0s - loss: 0.6749 - acc: 0.7652
Epoch 13/50
132/132 [==============================] - 0s - loss: 0.5399 - acc: 0.8333
Epoch 14/50
132/132 [==============================] - 0s - loss: 0.5433 - acc: 0.8106
Epoch 15/50
132/132 [==============================] - 0s - loss: 0.5186 - acc: 0.8106
Epoch 16/50
132/132 [==============================] - 0s - loss: 0.4195 - acc: 0.8561
Epoch 17/50
132/132 [==============================] - 0s - loss: 0.4580 - acc: 0.8561
Epoch 18/50
132/132 [==============================] - 0s - loss: 0.5663 - acc: 0.7879
Epoch 19/50
132/132 [==============================] - 0s - loss: 0.3861 - acc: 0.8712
Epoch 20/50
132/132 [==============================] - 0s - loss: 0.4239 - acc: 0.8561
Epoch 21/50
132/132 [==============================] - 0s - loss: 0.2762 - acc: 0.9015
Epoch 22/50
132/132 [==============================] - 0s - loss: 0.4108 - acc: 0.8636
Epoch 23/50
132/132 [==============================] - 0s - loss: 0.3163 - acc: 0.9015
Epoch 24/50
132/132 [==============================] - 0s - loss: 0.2300 - acc: 0.9091
Epoch 25/50
132/132 [==============================] - 0s - loss: 0.4250 - acc: 0.8561
Epoch 26/50
132/132 [==============================] - 0s - loss: 0.3641 - acc: 0.8561
Epoch 27/50
132/132 [==============================] - 0s - loss: 0.3042 - acc: 0.8939
Epoch 28/50
132/132 [==============================] - 0s - loss: 0.2231 - acc: 0.9394
Epoch 29/50
132/132 [==============================] - 0s - loss: 0.2178 - acc: 0.9242
Epoch 30/50
132/132 [==============================] - 0s - loss: 0.1847 - acc: 0.9394
Epoch 31/50
132/132 [==============================] - 0s - loss: 0.1524 - acc: 0.9545
Epoch 32/50
132/132 [==============================] - 0s - loss: 0.2933 - acc: 0.9091
Epoch 33/50
132/132 [==============================] - 0s - loss: 0.2138 - acc: 0.9394
Epoch 34/50
132/132 [==============================] - 0s - loss: 0.2597 - acc: 0.9015
Epoch 35/50
132/132 [==============================] - 0s - loss: 0.2045 - acc: 0.9242
Epoch 36/50
132/132 [==============================] - 0s - loss: 0.3323 - acc: 0.8485
Epoch 37/50
132/132 [==============================] - 0s - loss: 0.1962 - acc: 0.9394
Epoch 38/50
132/132 [==============================] - 0s - loss: 0.2409 - acc: 0.9091
Epoch 39/50
132/132 [==============================] - 0s - loss: 0.1341 - acc: 0.9697
Epoch 40/50
132/132 [==============================] - 0s - loss: 0.1353 - acc: 0.9621
Epoch 41/50
132/132 [==============================] - 0s - loss: 0.1379 - acc: 0.9470
Epoch 42/50
132/132 [==============================] - 0s - loss: 0.1761 - acc: 0.9318
Epoch 43/50
132/132 [==============================] - 0s - loss: 0.0868 - acc: 0.9773
Epoch 44/50
132/132 [==============================] - 0s - loss: 0.1281 - acc: 0.9621
Epoch 45/50
132/132 [==============================] - 0s - loss: 0.0704 - acc: 0.9924
Epoch 46/50
132/132 [==============================] - 0s - loss: 0.0636 - acc: 0.9924
Epoch 47/50
132/132 [==============================] - 0s - loss: 0.0622 - acc: 0.9848
Epoch 48/50
132/132 [==============================] - 0s - loss: 0.0400 - acc: 0.9924
Epoch 49/50
132/132 [==============================] - 0s - loss: 0.0503 - acc: 0.9924
Epoch 50/50
132/132 [==============================] - 0s - loss: 0.0384 - acc: 0.9924
<keras.callbacks.History at 0x7efe74839668>
Your model should achieve around 90% to 100% accuracy on the training set; the exact accuracy you get may be a little different. Run the following cell to evaluate your model on the test set.
X_test_indices = sentences_to_indices(X_test, word_to_index, max_len = maxLen)
Y_test_oh = convert_to_one_hot(Y_test, C = 5)
loss, acc = model.evaluate(X_test_indices, Y_test_oh)
print()
print("Test accuracy = ", acc)
32/56 [================>.............] - ETA: 0s
Test accuracy = 0.821428571429
You should get a test accuracy between 80% and 95%. Run the cell below to see the mislabelled examples.
# This code allows you to see the mislabelled examples
C = 5
y_test_oh = np.eye(C)[Y_test.reshape(-1)]
X_test_indices = sentences_to_indices(X_test, word_to_index, maxLen)
pred = model.predict(X_test_indices)
for i in range(len(X_test)):
    num = np.argmax(pred[i])
    if num != Y_test[i]:
        print('Expected emoji:' + label_to_emoji(Y_test[i]) + ' prediction: ' + X_test[i] + label_to_emoji(num).strip())
Expected emoji:😄 prediction: she got me a nice present ❤️
Expected emoji:😞 prediction: work is hard 😄
Expected emoji:😞 prediction: This girl is messing with me ❤️
Expected emoji:😞 prediction: work is horrible 😄
Expected emoji:🍴 prediction: any suggestions for dinner 😄
Expected emoji:😄 prediction: you brighten my day ❤️
Expected emoji:😞 prediction: she is a bully ❤️
Expected emoji:😞 prediction: My life is so boring ❤️
Expected emoji:😞 prediction: go away ⚾
Expected emoji:😞 prediction: yesterday we lost again ⚾
Now you can try it on your own example. Write your own sentence below.
# Change the sentence below to see your prediction. Make sure all the words are in the GloVe embeddings.
x_test = np.array(['I am feeling happy'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(X_test_indices)
print(x_test[0] +' '+ label_to_emoji(np.argmax(model.predict(X_test_indices))))
[[ 185457. 52943. 146352. 173081. 0. 0. 0. 0.
0. 0.]]
I am feeling happy 😄
You have completed this notebook! ❤️❤️❤️
What you should remember:
- An Embedding() layer can be initialized with pretrained values.
- LSTM() has a flag called return_sequences to decide if you would like to return every hidden state or only the last one.
- You can use Dropout() right after LSTM() to regularize your network.
Congratulations on finishing this assignment and building an Emojifier. We hope you're happy with what you've accomplished in this notebook!
Thanks to Alison Darcy and the Woebot team for their advice on the creation of this assignment.