We all use emojis to express our sentiments in an easier, more effective way. With emojis, we can express, enhance or modify the sentiment of a sentence. But emojis are also used to replace some words, or just for fun. The use of emojis opens a new window to sentiment analysis in text, as some language models do not understand them. Today we will explain how we have enhanced Xatkit with emoji support. Thanks to this new feature, our chatbots will be able to process emojis in a user utterance and extract as much useful knowledge as possible from them.
Why is this important?
If you are a Twitter, Facebook or WhatsApp user, you have probably used emojis many times. There is a website that tracks the most used emojis on Twitter. As of April 23, 2021, the most used emoji is “😂”, a.k.a. “face with tears of joy”, with a staggering 3,243,467,795 uses. New generations are growing up with this kind of language, and for most people born from the 80s or 90s onwards (the so-called millennials), emojis are part of their everyday lives.
Since emojis are so widely used, we need tools to understand their inherent meaning. In natural language processing, emojis are a challenge: it is difficult to determine an emoji’s semantic meaning because it may carry subjective connotations, such as irony or sarcasm. Moreover, the same emoji in a sentence may mean different things to two people. There is no perfect solution. In Xatkit we therefore provide two options to chatbot designers: you can either process the user utterance to replace the emojis with equivalent words representing their most common meaning, or keep them and annotate the processed text with useful information about the embedded emojis so that you can decide yourself how to interpret and react to them.
Both solutions are important because emoji support is still limited in state-of-the-art NLP engines, so you should not count on them to do this emoji interpretation work for you. For instance, Dialogflow provides at least a partial solution where you can explicitly define intents involving emojis (see this article), but you need to manually list the emojis you want to accept; there is no predefined general solution that deals with them automatically. Other tools such as PerspectiveAPI, which you can use in your Xatkit chatbots to detect toxicity in intents, can detect toxicity in emojis. For instance, the PerspectiveAPI NLP engine considers some emojis more toxic than others. But in general you will run into trouble when trying to process emojis with NLP tools, sometimes even getting errors or failed matches due to “unexpected characters”. Let’s see how we provide emoji support in Xatkit bots.
Emoji to text pre-processor
Thanks to this language pre-processor, a chatbot can detect emojis in the user utterance and process the input before intent recognition. This pre-processor can handle emojis in two different ways:
- Removing them, that is, replacing them with an empty string, in case we just want to skip any kind of emoji processing and act as if they were never there.
- Replacing them with equivalent wording (adding the necessary spaces before and after the emoji, to avoid accidentally joining words). The replacement text is provided by the emoji-java library, which contains a set of aliases for each emoji. The first alias of an emoji is chosen to replace the actual emoji. This method can sometimes produce odd results. For instance, “I am happy 😍” would be translated as “I am happy heart eyes”, which is not what the emoji actually means in the context of this comment. A valid translation would be something like “I am happy and in love” or “I am very happy”. To produce that, we would need to better understand the context in which the emoji occurs, and even then it would be a very subjective task. This is currently quite difficult to do, but as a first approach we can translate emojis with a simpler semantic meaning to text, e.g. from “I am from 🇪🇸” to “I am from Spain”.
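To make the two modes concrete, here is a minimal, self-contained sketch. It is not Xatkit's actual pre-processor: the class name, the tiny alias table and the whitespace clean-up are assumptions made for illustration, and the real implementation takes its aliases from emoji-java rather than from a hard-coded map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EmojiTextSketch {

    // Hypothetical alias table for illustration only; the real pre-processor
    // obtains the replacement text from the emoji-java library.
    private static final Map<String, String> ALIASES = new LinkedHashMap<>();
    static {
        ALIASES.put("\uD83D\uDE0D", "heart eyes");         // smiling face with heart-eyes
        ALIASES.put("\uD83C\uDDEA\uD83C\uDDF8", "Spain");  // flag of Spain
    }

    /** Mode 1: drop every known emoji, as if it had never been typed. */
    public static String removeEmojis(String text) {
        String result = text;
        for (String emoji : ALIASES.keySet()) {
            result = result.replace(emoji, "");
        }
        return tidy(result);
    }

    /** Mode 2: replace every known emoji with its alias, padded with spaces. */
    public static String replaceEmojis(String text) {
        String result = text;
        for (Map.Entry<String, String> entry : ALIASES.entrySet()) {
            // The surrounding spaces prevent the alias from fusing with a neighbouring word.
            result = result.replace(entry.getKey(), " " + entry.getValue() + " ");
        }
        return tidy(result);
    }

    // Assumption of this sketch: collapse repeated whitespace left behind by the padding.
    private static String tidy(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(replaceEmojis("I am from\uD83C\uDDEA\uD83C\uDDF8"));
        System.out.println(removeEmojis("I am happy \uD83D\uDE0D"));
    }
}
```

With this table, `replaceEmojis` turns an utterance ending in the heart-eyes emoji into “I am happy heart eyes” even when the emoji is glued to the previous word, which shows both the benefit of the padding and the literal, context-free nature of the translation.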
The following code excerpt was extracted from the EmojiToTextBot example bot. This chatbot simply asks the user for two things. First, a flag, which is processed as a countryCode entity, and then another emoji, which is expected to be an animal emoji but can actually be any emoji, as the chatbot maps it to an any entity. The code below corresponds to the “animal recognition” part. Note that since the pre-processor takes care of the emojis, the chatbot accesses and manipulates the matched intents and their parameters as usual, with no need for any ad-hoc code.
Emoji post-processor
This post-processor gives the chatbot a new data structure in recognizedIntent.getNlpData(). It is a Java set that contains an EmojiData object for each distinct emoji in the user input. If an emoji appears more than once, only one EmojiData object is created for it. These objects store useful information about the emojis in the message, which can be used in any way the chatbot designer sees fit. The attributes we calculate are:
- unicode: the Unicode representation of the emoji
- aliases: the aliases of the emoji
- tags: the tags of the emoji
- supportsSkinTone: whether this emoji supports skin tone or not
- skinTone: the skin tone of the emoji
- description: the description of the emoji
- unicodeBlock: the Unicode block (category) of the emoji
- frequencyInSentimentRanking: the frequency of the emoji in the Emoji Sentiment Ranking
- negativeSentiment: the negative sentiment of the emoji
- neutralSentiment: the neutral sentiment of the emoji
- positiveSentiment: the positive sentiment of the emoji
- occurrences: the number of occurrences of the emoji in the text
- positionsInText: the positions of the emoji in the text
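As a rough illustration of the “one object per distinct emoji” idea, the sketch below computes only the unicode, occurrences and positionsInText attributes. EmojiInfo is a hypothetical, reduced stand-in for EmojiData, not the Xatkit class. The sketch returns a map keyed by emoji instead of a set for easier lookup, expects the caller to supply the list of emojis to look for, and assumes positions are UTF-16 indices in the Java string; none of these choices is confirmed by the post.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EmojiDataSketch {

    /** Hypothetical, reduced stand-in for EmojiData (three attributes only). */
    public static final class EmojiInfo {
        public final String unicode;
        public final List<Integer> positionsInText = new ArrayList<>();

        EmojiInfo(String unicode) {
            this.unicode = unicode;
        }

        /** The number of occurrences is simply the number of recorded positions. */
        public int occurrences() {
            return positionsInText.size();
        }
    }

    /**
     * Builds one EmojiInfo per distinct emoji found in the text.
     * Positions are UTF-16 indices (an assumption of this sketch).
     */
    public static Map<String, EmojiInfo> analyze(String text, List<String> knownEmojis) {
        Map<String, EmojiInfo> result = new LinkedHashMap<>();
        for (String emoji : knownEmojis) {
            int index = text.indexOf(emoji);
            while (index >= 0) {
                // A repeated emoji reuses its existing entry instead of creating a new one.
                result.computeIfAbsent(emoji, EmojiInfo::new).positionsInText.add(index);
                index = text.indexOf(emoji, index + emoji.length());
            }
        }
        return result;
    }
}
```

An emoji that appears twice therefore yields a single entry whose occurrences() is 2 and whose positionsInText holds both indices.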
As you can see in the list of attributes we defined for an emoji, some are sentiment-related. Unfortunately, these attributes are not available for all emojis; when they are not available, they hold a “default unset value”. This is because the source of the sentiment information is the Emoji Sentiment Ranking, which only covers the 969 most frequent emojis on Twitter at the time the project was carried out (2015). So the most common emojis will have this information, but most emojis (the least used ones) will not. This ranking can be seen as a European language-independent resource for automated sentiment analysis.
The sentiment we obtain is of the positive/neutral/negative type. It can be useful for distinguishing positive and negative comments (e.g. a customer’s reaction to or opinion about a purchase). As mentioned before, inferring the sentiment of an emoji from its context is difficult, as many factors are involved. But as a first approach, we can use the context-independent sentiment of an emoji, which was obtained thanks to 83 human annotators who labeled the sentiment polarity of over 1.6 million tweets in 13 European languages (4% of the tweets contained emojis).
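One simple way to turn the three scores into a single label is to pick the dominant one, treating emojis without ranking data as unknown. The sketch below assumes this “largest score wins” rule and a hypothetical sentinel value of -1.0 for unset attributes; the post does not say which default value Xatkit uses or how the example bot derives its answer.

```java
public class EmojiPolaritySketch {

    public enum Polarity { POSITIVE, NEUTRAL, NEGATIVE, UNKNOWN }

    // Hypothetical sentinel for emojis that are missing from the sentiment ranking.
    public static final double UNSET = -1.0;

    /** Returns the dominant polarity, or UNKNOWN when any score is unset. */
    public static Polarity dominant(double negative, double neutral, double positive) {
        if (negative == UNSET || neutral == UNSET || positive == UNSET) {
            return Polarity.UNKNOWN;
        }
        if (positive > negative && positive >= neutral) {
            return Polarity.POSITIVE;
        }
        if (negative > positive && negative >= neutral) {
            return Polarity.NEGATIVE;
        }
        // Neutral dominates, or positive and negative are tied.
        return Polarity.NEUTRAL;
    }
}
```

For example, made-up scores of 0.1 negative, 0.3 neutral and 0.6 positive yield POSITIVE, while an emoji with unset scores yields UNKNOWN rather than a misleading NEUTRAL.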
Note that it is possible to estimate the sentiment of the whole sentence by combining the emoji sentiments with the text sentiment (which can be obtained with our StanfordNLP EnglishSentiment post-processor). For instance, if a comment has positive sentiment in the emojis but negative sentiment in the plain text, we could infer that the emojis are perhaps being used to ridicule someone or to emphasize the (negative) intention of the text.
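The mismatch heuristic described above could be sketched as follows. The class, the enum and the returned messages are all hypothetical, and the two inputs are assumed to have been computed beforehand (for instance by a text sentiment post-processor and by aggregating the emoji scores); this is one possible reading of the idea, not a Xatkit feature.

```java
import java.util.Locale;

public class SentimentMismatchSketch {

    public enum Polarity { POSITIVE, NEUTRAL, NEGATIVE }

    /** Compares the text polarity with the emoji polarity and returns a rough interpretation. */
    public static String interpret(Polarity text, Polarity emojis) {
        if (text == Polarity.NEGATIVE && emojis == Polarity.POSITIVE) {
            // The case highlighted in the post: cheerful emojis on top of a negative message.
            return "possible ridicule or emphasis of a negative message";
        }
        if (text == emojis) {
            return "consistent " + text.name().toLowerCase(Locale.ROOT);
        }
        return "mixed signals";
    }
}
```

A bot could use the first outcome to ask a clarifying question instead of replying as if the user were happy.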
The following excerpt of code was extracted from the EmojiSentimentBot example bot. This chatbot tells you if you are happy, neutral, or sad, based on the sentiment of the emojis provided by the user.
This post summarized the need to process emojis in chat conversations, since they are widely used, especially by the youngest generations. To provide solutions for emoji transcription, emoji analysis and sentiment detection, we built an EmojiToTextPreProcessor and an EmojiPostProcessor, which can be used within your Xatkit chatbots (remember not to use them together). We hope to bring you new advances in this area in the future to extract better information about emojis.