Inconsistent results when predicting a single sentence versus predicting labels for dev set #30
This is strange -- predicting with different batch sizes shouldn't change
the results. This happens when you run inference using the original code w/
batch size 1?
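For reference, a minimal way to sanity-check that claim, assuming a hypothetical `predict(sentences)` wrapper around the model's inference step (not a function from this repo):

```python
import numpy as np

def check_batch_invariance(predict, sentences):
    # `predict` is a hypothetical wrapper: list of sentences in,
    # one label sequence per sentence out.
    batched = predict(sentences)                    # all sentences in one batch
    singles = [predict([s])[0] for s in sentences]  # batch size 1
    for b, s in zip(batched, singles):
        assert np.array_equal(np.asarray(b), np.asarray(s)), \
            "inference output changed with batch size"
```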
On Tue, Jun 25, 2019 at 4:47 PM Aneesh Kotnana wrote:
I'm currently trying to modify the code to predict the labels for an arbitrary sentence (not part of the train, dev, or test set). I was getting weird behavior when looking at the labels: the padding tokens had their own labels, and some of the tokens were not being labelled correctly.
This discrepancy even existed when I tried to predict the labels for one of the sentences in my dev set. I was able to get the predicted labels for the dev set using the code from #16. When I compared the predicted labels from the dev set to the same single-sentence prediction, some of the labels were off.
I decided to look at the feature vectors that were generated in the preprocessing stage for both sentences, and surprisingly, the vectors were almost exactly the same (except for some padding).
token vector:
sentence from dev set
[ 0 18152 18152 18152 18152 1279 12659 3128 4 3
338 7 18152 18152 18152 18152 243801 6 4809 7282
50959 11 31481 1020 89238 6 33746 7 10587 2578
813 243021 40 1020 11 60 7 92 4 40
10 4704 9889 4 106 21 12404 334 7 7746
7 1296 3995 4 2486 4816 7 10587 4 11
3 220 7 10 59651 5 0 0 0 0
0]
single sentence:
[ 0 18152 18152 18152 18152 1279 12659 3128 4 3
338 7 18152 18152 18152 18152 243801 6 4809 7282
50959 11 31481 1020 89238 6 33746 7 10587 2578
813 243021 40 1020 11 60 7 92 4 40
10 4704 9889 4 106 21 12404 334 7 7746
7 1296 3995 4 2486 4816 7 10587 4 11
3 220 7 10 59651 5 0]
char vector:
sentence from dev set
[ 0 29 29 29 31 29 27 28 30 24 24 26 29 27 31 31 28 45 6 7 15 5 7 15
5 3 4 12 18 15 20 7 4 16 20 15 5 36 4 6 7 3 20 7 3 18 34 29
31 30 31 23 23 23 24 30 33 26 25 23 25 25 26 12 15 5 7 15 5 12 35 12
17 12 4 38 4 18 8 3 12 15 5 4 7 3 9 12 17 38 9 12 14 12 15 12
5 6 7 5 3 15 9 9 12 5 3 8 8 7 3 20 5 40 5 7 15 5 12 35
12 17 12 4 38 4 18 7 37 4 20 7 14 7 5 18 34 4 7 14 8 7 20 3
4 16 20 7 3 8 8 7 3 20 5 5 18 18 15 31 24 27 21 27 31 23 21 23
30 21 31 33 33 3 34 4 7 20 40 3 15 9 17 3 5 4 18 34 3 17 17 36
3 34 4 7 20 3 32 18 15 5 12 9 7 20 3 35 17 7 12 15 4 7 20 44
3 17 36 4 6 7 20 7 12 5 5 12 14 16 17 4 3 15 7 18 16 5 20 7
4 16 20 15 18 34 3 8 8 20 7 32 12 3 4 12 18 15 18 34 17 12 39 6
4 4 18 16 32 6 36 14 18 9 7 20 3 4 7 9 7 39 20 7 7 5 18 34
4 7 14 8 7 20 3 4 16 20 7 36 3 15 9 4 6 7 8 18 12 15 4 5
18 34 3 32 18 14 8 3 5 5 21 0]
single sentence:
[ 0 29 29 29 31 29 27 28 30 24 24 26 29 27 31 31 28 45 6 7 15 5 7 15
5 3 4 12 18 15 20 7 4 16 20 15 5 36 4 6 7 3 20 7 3 18 34 29
31 30 31 23 23 23 24 30 33 26 25 23 25 25 26 12 15 5 7 15 5 12 35 12
17 12 4 38 4 18 8 3 12 15 5 4 7 3 9 12 17 38 9 12 14 12 15 12
5 6 7 5 3 15 9 9 12 5 3 8 8 7 3 20 5 40 5 7 15 5 12 35
12 17 12 4 38 4 18 7 37 4 20 7 14 7 5 18 34 4 7 14 8 7 20 3
4 16 20 7 3 8 8 7 3 20 5 5 18 18 15 31 24 27 21 27 31 23 21 23
30 21 31 33 33 3 34 4 7 20 40 3 15 9 17 3 5 4 18 34 3 17 17 36
3 34 4 7 20 3 32 18 15 5 12 9 7 20 3 35 17 7 12 15 4 7 20 44
3 17 36 4 6 7 20 7 12 5 5 12 14 16 17 4 3 15 7 18 16 5 20 7
4 16 20 15 18 34 3 8 8 20 7 32 12 3 4 12 18 15 18 34 17 12 39 6
4 4 18 16 32 6 36 14 18 9 7 20 3 4 7 9 7 39 20 7 7 5 18 34
4 7 14 8 7 20 3 4 16 20 7 36 3 15 9 4 6 7 8 18 12 15 4 5
18 34 3 32 18 14 8 3 5 5 21 0]
seq len vector:
sentence from dev set
[65]
single sentence:
[65]
tok len vector:
sentence from dev set
[ 1 4 4 4 4 4 9 7 1 3 4 2 4 4 4 4 13 2 4 8 10 3 10 1
11 2 8 2 11 7 4 14 5 1 3 4 2 3 1 5 1 12 8 1 5 2 12 6
2 12 2 5 5 1 8 7 2 11 1 3 3 6 2 1 7 1 1 0 0 0 0]
single sentence:
[ 1 4 4 4 4 4 9 7 1 3 4 2 4 4 4 4 13 2 4 8 10 3 10 1
11 2 8 2 11 7 4 14 5 1 3 4 2 3 1 5 1 12 8 1 5 2 12 6
2 12 2 5 5 1 8 7 2 11 1 3 3 6 2 1 7 1 1]
Predictions:
sentence from dev set
[3, 4, 4, 5, 0, 0, 0, 0, 0, 0, 0, 3, 4, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
single sentence:
[3, 4, 4, 5, 8, 16, 6, 8, 6, 0, 0, 3, 4, 4, 4, 5, 8, 14, 8, 0, 6, 0, 8, 0, 8, 0, 6, 0, 6, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 8, 8, 8, 6, 6, 6, 6, 8, 0, 8, 0, 6, 8, 6, 6, 6, 6, 0, 0]
You can see that some of the labels are similar (i.e., 3, 4, 5 are the B, I, L tags for one class, and 10 is the U tag for another class). However, the single-sentence prediction has a bunch of additional predicted classes for the exact same sentence with virtually the same feature vectors.
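To make the label encoding concrete, here is a hypothetical id-to-tag mapping illustrating the BILOU pattern; the real tag strings come from the trained model's `labels_id_str_map`, and the class names below are made up:

```python
# Hypothetical mapping illustrating the BILOU pattern described above.
labels_id_str_map = {
    0: "O",                                          # outside any mention
    3: "B-Disease", 4: "I-Disease", 5: "L-Disease",  # begin/inside/last of one class
    10: "U-Chemical",                                # single-token mention of another class
}
print([labels_id_str_map.get(i, "?") for i in [3, 4, 4, 5, 0, 10]])
# ['B-Disease', 'I-Disease', 'I-Disease', 'L-Disease', 'O', 'U-Chemical']
```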
I'm wondering if there is some internal state maintained in the model (context) that would explain why the results when running on the dev set are different from a single-sentence prediction. I didn't enable the documents flag, so I thought each sentence would be treated separately. If this is not the case, how can I train the model to treat each sentence separately?
If there is no internal state maintained in the model, could padding (the only difference between the feature vectors) play a large role in the prediction of classes? In that case, how should I be padding the feature vectors for single-sentence prediction? What I'm doing right now is just padding each feature vector to the maximum length in the batch, except that for a single sentence the batch size is 1.
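For context, a minimal sketch of the padding-plus-mask scheme I mean (names are made up):

```python
import numpy as np

def pad_batch(sequences):
    # Right-pad each id sequence with zeros to the longest one in the batch,
    # and build a 0/1 mask marking real token positions.
    max_len = max(len(s) for s in sequences)  # with one sentence, its own length
    padded = np.zeros((len(sequences), max_len), dtype=np.int64)
    mask = np.zeros((len(sequences), max_len), dtype=np.int64)
    for i, s in enumerate(sequences):
        padded[i, :len(s)] = s
        mask[i, :len(s)] = 1
    return padded, mask
```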
Thanks in advance.
No, it happens when I try to dynamically generate the feature vectors for a single sentence. To clarify, running the model on an example from the dev set with a batch size of 1 still gives me accurate labels. From this, I'm guessing there's no internal state maintained in the model, so the results should be deterministic. However, when I dynamically generate the vectors for a single sentence, I get inaccurate labels even though the input vectors for both are virtually the same. I'm going to run a couple of tests to see whether using the same padding changes the result at all. I'll also post my code below if you want to take a look. Most of it is pieced together from your existing methods, just modified to work on a single standalone sentence.

Preprocessing:

```python
import re
import numpy as np
from nltk.tokenize import word_tokenize  # assuming NLTK; swap in the repo's tokenizer

# `shape`, PAD_STR, and OOV_STR come from the existing preprocessing code.

def single_sentence_preprocess(sentence, token_map, shape_map, char_map,
                               token_int_str_map, shape_int_str_map, char_int_str_map):
    update_vocab = True
    update_chars = True
    pad_width = 1
    lines = word_tokenize(sentence)
    sent_len = len(lines)
    if sent_len == 0:
        # nothing to featurize: return empty vectors matching the normal 5-tuple
        empty = np.zeros(0, dtype=np.int64)
        return empty, empty, empty, empty, empty
    max_word_len = max(len(word) for word in lines)
    max_len_with_pad = 2 + sent_len
    oov_count = 0
    tokens = np.zeros(max_len_with_pad, dtype=np.int64)
    shapes = np.zeros(max_len_with_pad, dtype=np.int64)
    chars = np.zeros(max_len_with_pad * max_word_len, dtype=np.int64)
    sent_lens = []
    tok_lens = []
    # leading pad token
    tokens[:pad_width] = token_map[PAD_STR]
    shapes[:pad_width] = shape_map[PAD_STR]
    chars[:pad_width] = char_map[PAD_STR]
    tok_lens.extend([1] * pad_width)
    current_sent_len = 0
    char_start = pad_width
    idx = pad_width
    for token_str in lines:
        current_sent_len += 1
        token_shape = shape(token_str)
        token_str_normalized = re.sub(r"\d", "0", token_str)
        if token_shape not in shape_map:  # and update_vocab:
            shape_map[token_shape] = len(shape_map)
            print("LOGGING: shape map updated")
        for char in token_str:
            if char not in char_map and update_chars:
                char_map[char] = len(char_map)
                char_int_str_map[char_map[char]] = char
                print("LOGGING: char map updated")
        tok_lens.append(len(token_str))
        if token_str_normalized not in token_map:
            oov_count += 1
            if update_vocab:
                print("LOGGING: token map updated with " + token_str_normalized)
                token_map[token_str_normalized] = len(token_map)
                token_int_str_map[token_map[token_str_normalized]] = token_str_normalized
        tokens[idx] = token_map.get(token_str_normalized, token_map[OOV_STR])
        shapes[idx] = shape_map[token_shape]
        chars[char_start:char_start + tok_lens[-1]] = [char_map.get(c, char_map[OOV_STR]) for c in token_str]
        char_start += tok_lens[-1]
        idx += 1
    sent_lens.append(current_sent_len)
    # trailing pad token
    tokens[idx:idx + pad_width] = token_map[PAD_STR]
    shapes[idx:idx + pad_width] = shape_map[PAD_STR]
    chars[char_start:char_start + pad_width] = char_map[PAD_STR]
    char_start += pad_width
    tok_lens.extend([1] * pad_width)
    idx += pad_width
    # trim to actual length: one pad slot around each sentence plus the tokens
    padded_len = (len(sent_lens) + 1) * pad_width + sum(sent_lens)
    tokens = tokens[:padded_len]
    shapes = shapes[:padded_len]
    chars = chars[:sum(tok_lens)]
    return tokens, shapes, chars, np.asarray(sent_lens), np.asarray(tok_lens)


def pad_to_length(x, m):
    # right-pad a 1-D array with zeros up to length m
    return np.pad(x, (0, m - x.shape[0]), mode='constant')
```
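A quick check of `pad_to_length` behavior:

```python
>>> pad_to_length(np.array([5, 7, 9]), 6)
array([5, 7, 9, 0, 0, 0])
```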
```python
def batch_sentence_preprocess(sentences, token_map, shape_map, char_map,
                              token_int_str_map, shape_int_str_map, char_int_str_map):
    # normalize the token, shape, char, and tok_len vectors to a common length
    pad_width = 1  # must match the pad width used in single_sentence_preprocess
    batch_members = []
    for sentence in sentences:
        eval_token, eval_shape, eval_char, eval_seq_len, eval_tok_len = single_sentence_preprocess(
            sentence, token_map, shape_map, char_map,
            token_int_str_map, shape_int_str_map, char_int_str_map)
        batch_members.append((eval_token, eval_shape, eval_char, eval_seq_len, eval_tok_len))
    max_token_len = max(len(m[0]) for m in batch_members)
    max_shape_len = max(len(m[1]) for m in batch_members)
    max_char_len = max(len(m[2]) for m in batch_members)
    max_token_tok_len = max(len(m[4]) for m in batch_members)
    eval_token_batch = np.asarray([pad_to_length(x[0], max_token_len) for x in batch_members])
    eval_shape_batch = np.asarray([pad_to_length(x[1], max_shape_len) for x in batch_members])
    eval_char_batch = np.asarray([pad_to_length(x[2], max_char_len) for x in batch_members])
    eval_tok_len_batch = np.asarray([pad_to_length(x[4], max_token_tok_len) for x in batch_members])
    eval_seq_len_batch = np.asarray([x[3] for x in batch_members])
    mask_batch = np.zeros(eval_token_batch.shape)
    # real length = token count plus one pad slot around each sentence
    actual_seq_lens = np.add(np.sum(eval_seq_len_batch, axis=1),
                             pad_width * ((eval_seq_len_batch != 0).sum(axis=1) + 1))
    for i, seq_len in enumerate(actual_seq_lens):
        mask_batch[i, :seq_len] = 1
    return eval_token_batch, eval_shape_batch, eval_char_batch, eval_tok_len_batch, eval_seq_len_batch, mask_batch
```

Predictions:

```python
# `model`, `char_embedding_model`, `sess`, `FLAGS`, and `tf` are the globals
# already set up by the existing training/eval script.

def run_prediction(eval_batches, extra_text=""):
    predictions = []
    eval_token_batch, eval_shape_batch, eval_char_batch, \
        eval_seq_len_batch, eval_tok_len_batch, eval_mask_batch = eval_batches
    batch_size, batch_seq_len = eval_token_batch.shape
    char_lens = np.sum(eval_tok_len_batch, axis=1)
    max_char_len = np.max(eval_tok_len_batch)
    # re-pad the flat char ids so each token starts at a multiple of max_char_len
    eval_padded_char_batch = np.zeros((batch_size, max_char_len * batch_seq_len))
    for b in range(batch_size):
        char_indices = [item for sublist in
                        [range(i * max_char_len, i * max_char_len + d)
                         for i, d in enumerate(eval_tok_len_batch[b])]
                        for item in sublist]
        eval_padded_char_batch[b, char_indices] = eval_char_batch[b][:char_lens[b]]
    char_embedding_feeds = {
        char_embedding_model.input_chars: eval_padded_char_batch,
        char_embedding_model.batch_size: batch_size,
        char_embedding_model.max_seq_len: batch_seq_len,
        char_embedding_model.token_lengths: eval_tok_len_batch,
        char_embedding_model.max_tok_len: max_char_len
    }
    basic_feeds = {
        model.input_x1: eval_token_batch,
        model.input_x2: eval_shape_batch,
        model.input_y: np.zeros(eval_token_batch.shape),  # dummy labels for inference
        model.input_mask: eval_mask_batch,
        model.max_seq_len: batch_seq_len,
        model.batch_size: batch_size,
        model.sequence_lengths: eval_seq_len_batch
    }
    basic_feeds.update(char_embedding_feeds)
    total_feeds = basic_feeds.copy()
    if FLAGS.viterbi:
        preds, transition_params = sess.run([model.predictions, model.transition_params],
                                            feed_dict=total_feeds)
        viterbi_repad = np.empty((batch_size, batch_seq_len))
        for batch_idx, (unary_scores, sequence_lens) in enumerate(zip(preds, eval_seq_len_batch)):
            viterbi_sequence, _ = tf.contrib.crf.viterbi_decode(unary_scores, transition_params)
            viterbi_repad[batch_idx] = viterbi_sequence
        predictions.append(viterbi_repad)
    else:
        preds, scores = sess.run([model.predictions, model.unflat_scores], feed_dict=total_feeds)
        predictions.append(preds)
    return predictions
```
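To illustrate the char re-padding step above with toy numbers: token lengths [1, 3] and max_char_len 3 scatter the flat char ids so each token starts at a multiple of max_char_len:

```python
import numpy as np

tok_lens = [1, 3]
max_char_len = 3
flat_chars = np.array([7, 8, 9, 10])  # arbitrary char ids
padded = np.zeros(max_char_len * len(tok_lens), dtype=np.int64)
idx = [j for i, d in enumerate(tok_lens)
       for j in range(i * max_char_len, i * max_char_len + d)]
padded[idx] = flat_chars
print(padded)  # [ 7  0  0  8  9 10]
```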
```python
if FLAGS.predict_only:
    with open(FLAGS.sample_text_file_name) as f:
        sentences = f.read().strip().split("\n")
    eval_token_batch, eval_shape_batch, eval_char_batch, \
        eval_seq_len_batch, eval_tok_len_batch, eval_mask_batch = batch_sentence_preprocess(
            sentences, vocab_str_id_map, shape_str_id_map, char_str_id_map,
            vocab_id_str_map, shape_id_str_map, char_id_str_map)
    eval_batches = (eval_token_batch, eval_shape_batch, eval_char_batch,
                    eval_seq_len_batch, eval_tok_len_batch, eval_mask_batch)
    predictions = run_prediction(eval_batches)[0]
    for batch in range(len(eval_token_batch)):
        eval_token = eval_token_batch[batch].tolist()
        for x in zip([vocab_id_str_map[each] for each in eval_token],
                     [labels_id_str_map[each] for each in predictions[batch].tolist()]):
            print(x)
        print("*******")
```
Never mind, it looks like I mixed up the order of `eval_seq_len_batch` and `eval_tok_len_batch` in this line:

```python
return eval_token_batch, eval_shape_batch, eval_char_batch, eval_tok_len_batch, eval_seq_len_batch, mask_batch
```

Changing it to

```python
return eval_token_batch, eval_shape_batch, eval_char_batch, eval_seq_len_batch, eval_tok_len_batch, mask_batch
```

fixed the issue I was having :) I also got the code to work for predicting labels on a real-life example. I can put my stuff into a PR if you'd like (I'll try to clean it up a little bit as well).
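One way to make this kind of positional mix-up impossible would be to return a named structure instead of a bare tuple; a minimal sketch, reusing the variable names above:

```python
from collections import namedtuple

# Field names make the order explicit, so a swapped unpack can't happen silently.
EvalBatch = namedtuple("EvalBatch",
                       ["tokens", "shapes", "chars", "seq_lens", "tok_lens", "mask"])

# at the end of batch_sentence_preprocess:
# return EvalBatch(eval_token_batch, eval_shape_batch, eval_char_batch,
#                  eval_seq_len_batch, eval_tok_len_batch, mask_batch)
```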
that's great, happy to hear you fixed it! please do submit a PR :)
Closed #30.