Short-Term Memory In Natural Language Processing
Read to learn more about short-term memory (STM) with regard to natural language processing.
In this article, I’ll talk about short-term memory (STM) and how it is used to maintain conversational context in NLP/NLU pipelines.
It is surprising how non-trivial it is to train a machine to understand and maintain the context of a conversation. For us humans, this comes absolutely naturally; we do it instinctively, without thinking about rules or algorithms.
But what is conversational context, and why do we even need it?
Conversational context makes our conversations more efficient by letting semantic entities in our dialogs stay implicit. In other words, when a conversation has context, we don’t need to repeat every piece of information in each subsequent sentence once it has already been mentioned.
Let’s look at a series of simple questions to illustrate this point:
1. “What’s the weather in Tokyo right now?” This question contains all the information needed to understand it: the subject (“weather”), the place (“Tokyo”), and the time (“right now”).
2. “How about now in Moscow?” This question provides the place (“Moscow”) and the time (“now”) but is clearly missing the subject. It cannot be understood without an existing context.
3. “What’s the weather again?” This question provides only the subject (“weather”) and is missing both the time and the place. It can be processed using either default values (local time and user location) or the existing context, if available.
In general, questions (2) and (3) above can only be answered or understood if the conversation already has a context. If the questions are asked in that order, that context exists. Questions (2) and (3) then don’t have to be exhaustive: they can be very short, with no needless repetition of semantic entities that can be borrowed automatically from the conversation’s context. This is precisely how humans converse, and it is universal across human languages.
Sometimes we can get away with using default values for things like time and place (e.g., the current time and the current user location) if those are available and applicable. It is also important to note that a question asked for the first time must contain all the necessary information (apart from default value substitutions). Otherwise, the system won’t be able to answer it, since there is no context to borrow from.
NOTE: when I say “first time,” it could mean the very first question in the dialog or the first question after a change of subject in an existing conversation.
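The fallback order described above can be sketched in Python. This is a minimal illustration, not taken from any particular product. The slot names (subject, place, time) and the helper resolve_slots are hypothetical. The sketch also assumes the existing context takes priority over default values.

```python
# Hypothetical slot names for the weather questions; a real system would
# derive its required slots from the semantic model.
REQUIRED_SLOTS = ("subject", "place", "time")


def resolve_slots(
    parsed: dict[str, str],
    context: dict[str, str],
    defaults: dict[str, str],
) -> tuple[dict[str, str], list[str]]:
    """Fill each required slot from the sentence, then the context, then defaults.

    Returns the resolved slots and the names of any slots left unfilled.
    """
    resolved: dict[str, str] = {}
    missing: list[str] = []
    for slot in REQUIRED_SLOTS:
        # Assumed priority: what the user just said, then context, then defaults.
        for source in (parsed, context, defaults):
            if slot in source:
                resolved[slot] = source[slot]
                break
        else:  # no source had this slot
            missing.append(slot)
    return resolved, missing
```

With the context left by question (1), question (2) resolves to the weather in Moscow now. Asked with an empty context, the same question leaves the subject unresolved, which is exactly the case where the system cannot answer. Question (3) with no context falls back to the defaults for place and time.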
Short-Term Memory (STM) is a common name for the class of algorithms that help maintain conversation context. It basically mimics the STM function in the human brain. Specific implementation details of STM differ from product to product, but the general approach remains the same.
Most STM implementations are based on semantic modeling (more on semantic modeling here), which essentially splits an input sentence into a sequence of semantic entities. Once STM is given these entities, its job is to maintain a list of the most recent ones as the conversation proceeds. It must also properly reset that list when the conversation ends, times out, or changes subject.
Now, maintaining the list of the most recent semantic entities is a lot trickier than it sounds.
Let’s consider a single semantic entity from a sentence. STM can take one of the following three actions when considering such an entity:
- It can be added to (stored in) the STM
- It can replace (override) some existing entity already stored in STM
- It can trigger a context switch and reset (nullify) the entire conversation in STM
Which action STM takes depends on two properties of the semantic entity:
- Entity group. This is sometimes referred to as the entity class or type.
- Whether or not the entity has values, either direct or indirect. For example, DATE entities have implicit values (actual dates), so we call them concrete entities. Entities like COMPANY don’t have any specific values in the semantic model. They exist only to denote a type of entity and are therefore called abstract entities.
Given these properties, we can come up with a simple set of rules that STM can follow to maintain the conversation. For each semantic entity in the input sentence:
- For a concrete entity:
  - Store it if STM doesn’t have any concrete entity of the same group, or
  - Override the existing concrete entity if STM has one of the same group
- For an abstract entity:
  - Clear the conversation if STM already has an abstract entity of another group, or
  - Override the existing abstract entity if STM has one of the same group, or
  - Store it in STM otherwise
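The rules above translate into a small update loop. The following Python sketch is one possible reading of them, not a reference implementation. Its assumptions are:
- The names Entity and ShortTermMemory are hypothetical, as are the lowercase group labels.
- Memory holds one entity per group.
- After a context switch, the sketch stores the triggering abstract entity as the start of the new conversation.
- A sentence's abstract entities are processed before its concrete ones, so a subject change doesn't wipe out concrete entities from the same sentence. This ordering is a design choice of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


# Sketch only: Entity, ShortTermMemory and the group labels are hypothetical.
@dataclass(frozen=True)
class Entity:
    name: str                    # e.g. "GEO", "WEATHER"
    group: str                   # entities of the same group compete for one slot
    value: Optional[str] = None  # concrete entities carry a value, abstract ones don't

    @property
    def is_concrete(self) -> bool:
        return self.value is not None


class ShortTermMemory:
    """Holds at most one entity per group and applies the store/override/clear rules."""

    def __init__(self) -> None:
        self._by_group: dict[str, Entity] = {}

    def update(self, entity: Entity) -> str:
        """Apply the rules to a single entity and return the action taken."""
        if entity.is_concrete:
            action = "override" if entity.group in self._by_group else "store"
        else:
            abstract_groups = {g for g, e in self._by_group.items() if not e.is_concrete}
            if abstract_groups - {entity.group}:
                # An abstract entity of another group signals a change of subject.
                self._by_group.clear()
                action = "clear"
            elif entity.group in abstract_groups:
                action = "override"
            else:
                action = "store"
        # Assumption: after a clear, the new entity starts the new conversation.
        self._by_group[entity.group] = entity
        return action

    def process_sentence(self, entities: list[Entity]) -> list[str]:
        """Apply the rules to every entity of a sentence, abstract entities first."""
        ordered = sorted(entities, key=lambda e: e.is_concrete)  # False sorts before True
        return [self.update(e) for e in ordered]

    def entities(self) -> list[Entity]:
        return list(self._by_group.values())
```

Feeding it the three questions from the example that follows leaves FORECAST, GEO(Berlin) and DATE(next week) in memory. A subsequent COMPANY entity from a different (hypothetical) group clears everything and starts over.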
Let’s play out these rules on a simple example. Consider the following sequence of questions in a brand-new conversation. In this example, WEATHER and FORECAST are abstract entities from the same group, while GEO and DATE are concrete entities:
- What’s the weather in London today?
STM is empty at this point. All relevant semantic entities are stored in STM per our rules. This question has all the information necessary to answer it.
[WEATHER, GEO(London), DATE(today)]
- And what about Berlin?
The only semantic entity recognized in this sentence is GEO(Berlin). It replaces GEO(London) in STM. The system then grabs WEATHER and DATE(today) from STM to provide an answer, i.e., today’s weather in Berlin.
[WEATHER, GEO(Berlin), DATE(today)]
- Next week forecast?
Again, the only recognized entities here are FORECAST and DATE(next week). They both replace their counterparts in STM, while the system takes GEO(Berlin) from STM to provide the desired answer.
[FORECAST, GEO(Berlin), DATE(next week)]
Context switching is not triggered only by detecting an abstract entity of another group. Just like in human conversation, the context can be cleared in two additional cases:
- Conversation times out. In human dialog, if you take too long to reply, your counterpart will likely forget what you were talking about. In STM, this is typically a fixed timeout, such as 5 minutes. After it expires, STM “forgets” the previous context and treats the next sentence as the beginning of a new conversation.
- Explicit context switch. In human conversation you can say “Hey, let’s talk about X.” Likewise, an STM implementation should support an explicit context reset, typically done through an API call or UI action.
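The two extra reset triggers can be layered on top of whatever merge logic an STM uses. This Python sketch is hypothetical. ConversationContext, on_sentence and reset are illustrative names. The merge is simplified so that a newer value replaces an older one in the same group. The 300-second timeout mirrors the 5-minute example above. The clock is injectable so the timeout can be tested without waiting.

```python
import time
from typing import Callable, Optional


class ConversationContext:
    """Hypothetical STM wrapper adding an inactivity timeout and an explicit reset."""

    def __init__(self, timeout_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s          # 300 s = the 5-minute example from the text
        self._clock = clock                 # injectable so tests need not wait
        self._entities: dict[str, str] = {}
        self._last_seen: Optional[float] = None

    def on_sentence(self, entities: dict[str, str]) -> dict[str, str]:
        """Process one user sentence and return the context after it."""
        now = self._clock()
        if self._last_seen is not None and now - self._last_seen > self.timeout_s:
            self._entities.clear()          # timed out: treat this as a new conversation
        self._entities.update(entities)     # simplified merge: newer value wins per group
        self._last_seen = now
        return dict(self._entities)

    def reset(self) -> None:
        """Explicit context switch, e.g. triggered by an API call or a UI action."""
        self._entities.clear()
        self._last_seen = None
```

With a fake clock, a sentence arriving 60 seconds after the previous one still sees the earlier entities. One arriving 301 seconds later starts from an empty context.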
Real-life systems often handle conversation context with more sophistication, but they won’t be far from the simple algorithm I’ve described above. If you’d like to play with STM and see how it works in real life, feel free to look at DataLingvo and tinker with its APIs and examples.
Published at DZone with permission of Aaron Radzinski, DZone MVB. See the original article here.