Contextual string embeddings for sequence labeling, Akbik et al., 2018

Paper, Tags: #nlp, #embeddings

In this paper we propose to leverage the internal states of a trained character language model to produce a novel type of word embedding which we refer to as contextual string embedding.

  • They're trained without any explicit notion of words, modelling words as sequences of characters
  • They are contextualized by their surrounding text: the same word can have different embeddings depending on its contextual use.

Previous work

NER and POS tagging are formulated as sequence labeling problems: text is treated as a sequence of words, and current models use BiLSTM-CRF architectures. A crucial component is the word embeddings; current methods concatenate 3 types:

  1. Classical word embeddings: pre-trained over very large corpora and shown to capture latent syntactic and semantic similarities
  2. Character-level features: not pre-trained, trained on task data to capture task-specific subword features
  3. Contextualized word embeddings, Peters et al., 2017 and Peters et al., 2018, capture word semantics in context to address the polysemous and context-dependent nature of words.

Our contribution

Our contextualized character-level embeddings combine the best attributes of the above-mentioned embeddings:

  1. Ability to pre-train on large unlabeled corpora
  2. Capture word meaning in context and produce different embeddings for polysemous words
  3. Model words and context as sequences of characters, to better handle rare and misspelled words

Contextual string embeddings

Our approach passes sentences as sequences of characters into a character-level LM to form word-level embeddings.

LSTMs

Currently, LSTMs far outperform n-gram based models due to their ability to flexibly encode long-term dependencies in their hidden state. The model possesses a hidden state for each character in the sequence. We use the hidden states of a forward and a backward recurrent character LM to create the embeddings.
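
A minimal sketch of such a character-level LSTM language model in PyTorch; the class name `CharLM` and all hyperparameters are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class CharLM(nn.Module):
    """Character-level LM: reads a sentence one character at a time and
    emits one hidden state per character, used later as features."""
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, char_ids):
        # char_ids: (batch, n_chars) character indices
        x = self.embed(char_ids)
        hidden_states, _ = self.lstm(x)       # one hidden state per character
        logits = self.decoder(hidden_states)  # next-character prediction (LM objective)
        return logits, hidden_states
```

The backward LM is the same model trained on the reversed character sequence.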

We concatenate the following hidden character states for each word: from the fLM, we extract the output hidden state after the word's last character; from the bLM, we extract the output hidden state before the word's first character, which captures semantic-syntactic information from the end of the sentence up to this character. Both hidden states are concatenated to form the final embedding. As a result, our approach produces different embeddings for the same lexical word string in different contexts.
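
A sketch of this extraction step, assuming `forward_states` and `backward_states` are the per-character hidden states of the forward and backward LMs over the same sentence (both indexed left to right) and `(start, end)` are the word's character offsets (end exclusive); the exact boundary offsets are an assumption of this sketch:

```python
import torch

def contextual_string_embedding(forward_states, backward_states, start, end):
    # fLM: state emitted once the forward LM has consumed the word's last character
    fwd = forward_states[end - 1]
    # bLM: state emitted once the backward LM (reading right to left) has consumed
    # everything from the sentence end down to the word's first character
    bwd = backward_states[start]
    # Concatenation of both directions forms the contextual string embedding
    return torch.cat([fwd, bwd], dim=-1)
```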

Sequence labelling architecture

The final word embeddings are passed into a BiLSTM-CRF module (Huang et al., 2015). We also concatenate classic word embeddings, which add potentially greater latent word-level semantics to our proposed embeddings.
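
A sketch of this stacked tagger, assuming `string_embs` and `word_embs` are precomputed per-token embeddings for a batch of sentences; the class name is hypothetical and the CRF transition layer is only indicated, not implemented:

```python
import torch
import torch.nn as nn

class StackedBiLSTMTagger(nn.Module):
    def __init__(self, string_dim, word_dim, hidden_dim, num_tags):
        super().__init__()
        self.bilstm = nn.LSTM(string_dim + word_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        self.emissions = nn.Linear(2 * hidden_dim, num_tags)
        # A CRF over the emission scores would model tag transitions
        # (Huang et al., 2015); omitted here for brevity.

    def forward(self, string_embs, word_embs):
        # string_embs: (batch, n_tokens, string_dim); word_embs: (batch, n_tokens, word_dim)
        x = torch.cat([string_embs, word_embs], dim=-1)  # stack the embedding types
        h, _ = self.bilstm(x)
        return self.emissions(h)  # per-token tag scores, fed to the CRF
```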

Experiments

We evaluate our embeddings in stacked combinations with the 3 types of embeddings discussed previously:

  1. Pre-trained, classic word embeddings
  2. Task-trained (not pre-trained) static character features
  3. Pre-trained contextual word embeddings (Peters et al., 2018)

Results

  • New state-of-the-art for NER
  • Good performance on syntactic tasks
  • Traditional word embeddings are helpful
    • since they capture word-level semantics that complement the strong character-level features of our proposed embeddings
  • Task-specific features are unnecessary