Interactive data visualizations for Natural Language Processing
In collaboration with the Amazon MXNet team.
Machine learning is mostly a black box.
In particular, Natural Language Processing has become something most people just accept as a part of their lives. "Hey Siri", "Hey Alexa", and "Ok Google" have all become nearly household phrases, but understanding the data underneath their responses is usually left to the experts. Communicating the importance of these data is even more difficult.
Data visualization methods do exist, but nothing more robust than (pretty much) the brat rapid annotation tool is currently in production.
Further, web apps like this try to visualize things like Twitter sentiment but fail to provide clear context for the content. My work at Emory NLP contributed to a dynamic, interactive visualization system that makes our lab's sentiment analysis easier to communicate.
Instead of showing word dependencies, parts of speech, and other metadata in a cluttered fashion all at once, I prototyped a depth-separated menu for every word that shows its dependencies, part of speech, and coreferences.
Instead of confining users to traditional 2D scrolling to analyze a long list of sentences, I created a 3D sentence scrolling concept that lets them focus on any given sentence while keeping the context of its surrounding content.
Playing with language
What does the structure of grammar look like, and how can you explore it?
To understand how people learn language and grammar, linguistics and NLP researchers often use dependency trees to visualize the grammatical relationships between words in a sentence. However, most diagrams are static and offer little context about where the relationships come from, or how you can see them in normal text.
I took this problem and created a small prototype in Principle that smoothly transitions a normal sentence into a dependency tree that allows you to explore deeper and focus on individual words. I will soon be implementing this with D3.js.
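As a rough illustration of the data behind a "Treeify" step (not the Principle prototype itself), here is a minimal Python sketch that nests a flat sentence into a tree. The input format (a list of tokens plus a list of head indices), the example parse, and the names `build_tree`, `tokens`, and `heads` are assumptions made for this example.

```python
from typing import Dict, List, Optional


def build_tree(tokens: List[str], heads: List[Optional[int]]) -> Dict:
    """Nest a flat sentence into {"word", "children"} nodes.

    Assumption: heads[i] is the index of token i's head word,
    or None when token i is the root of the sentence.
    """
    if len(tokens) != len(heads):
        raise ValueError("tokens and heads must have the same length")

    # One node per token, kept in sentence order.
    nodes = [{"word": word, "children": []} for word in tokens]
    root = None
    for i, head in enumerate(heads):
        if head is None:
            root = nodes[i]
        else:
            nodes[head]["children"].append(nodes[i])

    if root is None:
        raise ValueError("the sentence has no root token")
    return root


# Hypothetical parse: "love" is the root; "I" and "language" depend on it.
tokens = ["I", "love", "language"]
heads = [1, None, 1]
tree = build_tree(tokens, heads)
```

A nested structure like this is the kind of input a hierarchical layout could consume, while the original token order is still available for switching back to plain text.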
Perhaps my favorite part about this was the button I created that transitions between "Treeify" and "Textify" because it visually embodies the transition between visualization states, something that I consider very important in button and interaction design.
How do you "rate" a sentence?
First, we have to look at how the data structures were organized once a sentence was placed through the sentiment analysis algorithm:
From here, we went to the drawing board and took this general structure to see how we could map these values into a digestible visualization.
We realized that the sentiment score of each sentence could be represented as a three-component vector.
By multiplying each component of the sentiment score by 255, we could produce a corresponding RGB value.
This would now produce the following mappings:
Very negative is very red.
Very neutral is very green.
Very positive is very blue.
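The mapping above can be sketched in a few lines of Python. This is a minimal sketch, not our lab's actual code: it assumes each score is a triple ordered (negative, neutral, positive) with every component between 0 and 1, and the name `sentiment_to_rgb` is made up for this example.

```python
from typing import Sequence, Tuple


def sentiment_to_rgb(scores: Sequence[float]) -> Tuple[int, int, int]:
    """Map a (negative, neutral, positive) score to an (R, G, B) color.

    Assumption: each component lies in [0, 1]; values outside that
    range are clamped before scaling.
    """
    if len(scores) != 3:
        raise ValueError("expected three components: negative, neutral, positive")

    channels = []
    for score in scores:
        clamped = min(max(score, 0.0), 1.0)
        channels.append(int(round(clamped * 255)))
    return channels[0], channels[1], channels[2]


very_negative = sentiment_to_rgb([1.0, 0.0, 0.0])   # pure red
very_positive = sentiment_to_rgb([0.0, 0.0, 1.0])   # pure blue
mostly_positive = sentiment_to_rgb([0.2, 0.2, 0.6])  # a bluish blend
```

Mixed sentences land between the three corners, so a mostly positive sentence comes out as a muted blue rather than a pure one.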
Then, we can translate the weight of each word into either opacity or relative scale. All of this put together gives us a dynamic, adaptable visualization system that produces some interesting results.
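One way the word-level step might look, using opacity: the sketch below scales each word's weight against the largest weight in the sentence and emits a CSS-style `rgba(...)` string. The weights, the normalization by the maximum, and the name `word_styles` are all assumptions for illustration, not the system's real implementation.

```python
from typing import List, Sequence, Tuple


def word_styles(
    words: Sequence[str],
    weights: Sequence[float],
    rgb: Tuple[int, int, int],
) -> List[Tuple[str, str]]:
    """Pair each word with an rgba() color whose alpha reflects its weight.

    Assumption: weights are non-negative, and the heaviest word in the
    sentence is drawn fully opaque.
    """
    if len(words) != len(weights):
        raise ValueError("words and weights must have the same length")

    top = max(weights, default=0.0)
    r, g, b = rgb
    styles = []
    for word, weight in zip(words, weights):
        # Fall back to full opacity when every weight is zero.
        alpha = weight / top if top > 0 else 1.0
        styles.append((word, f"rgba({r}, {g}, {b}, {alpha:.2f})"))
    return styles


# Hypothetical weights for a very negative (red) two-word sentence.
styled = word_styles(["not", "bad"], [2.0, 1.0], (255, 0, 0))
```

The same normalized value could drive relative font scale instead of opacity; only the final formatting line would change.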
In the near future, we will be conducting several user studies to explore the effectiveness of different visualization techniques on users' information comprehension.
The nature of my visualization algorithm allows it to produce some pretty interesting pieces of artwork. In particular, analyzing speeches without their words allows you to visualize the emotional polarization of large pieces of text in a beautiful, intuitive way.