Emory NLP

Big project

Interactive data visualizations for Natural Language Processing

In collaboration with the Amazon MXNet team.

Brief

Problem

Machine learning is mostly a black box.

In particular, Natural Language Processing has become something most people just accept as a part of their lives. "Hey Siri", "Hey Alexa", and "Ok Google" have all become nearly household phrases, but understanding the data underneath their responses is usually left to the experts. Communicating the importance of these data is even more difficult.

Data visualization methods for NLP do exist, but almost nothing more robust than the brat rapid annotation tool is currently in production.

Further, existing web apps that try to visualize things like Twitter sentiment fail to provide clear context for the content. My work at Emory NLP contributed to a dynamic, interactive visualization system that makes our lab's sentiment analysis easier to communicate.

Process

Initial Prototyping

Before any actual sentiment analysis got involved, I designed several interactions that played with hiding and showing data within a word. All animations were prototyped with vanilla JavaScript and CSS, or made in Flinto.

Instead of showing word dependencies, parts of speech, and other metadata all at once in a cluttered view, I prototyped a depth-separated menu for every word that shows its dependencies, part of speech, and coreferences.

 

Instead of confining users to traditional 2D scrolling to analyze a long list of sentences, I created a 3D sentence scrolling concept that lets them focus on any given sentence while keeping the context of its surrounding content.

 

Playing with language

What does the structure of grammar look like, and how can you explore it?

To study how people learn language and understand grammar, linguistics and NLP researchers often use dependency trees to visualize the grammatical relationships between words in a sentence. However, most diagrams are static and offer little context about where the relationships come from or how they map onto ordinary text.

I took this problem and created a small prototype in Principle that smoothly transitions a normal sentence into a dependency tree that allows you to explore deeper and focus on individual words. I will soon be implementing this with D3.js.
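
As a rough illustration of the data behind such a transition, here is a minimal TypeScript sketch that turns a flat list of words into a nested dependency tree and back again. It is not the prototype's code: the `Token` and `TreeNode` shapes, the `head` convention (-1 marks the root), and the function names `treeify` and `textify` (borrowed from the button labels) are all assumptions made for this example.

```typescript
// Hypothetical token shape: `head` is the index of the governing word,
// or -1 when the word is the root of the sentence.
interface Token {
  index: number;    // position of the word in the sentence
  text: string;     // the word itself
  head: number;     // index of the word this one depends on
  relation: string; // dependency label, e.g. "nsubj"
}

interface TreeNode {
  token: Token;
  children: TreeNode[];
}

// Build a nested tree from the flat token list ("Treeify").
function treeify(tokens: Token[]): TreeNode | null {
  const byIndex = new Map<number, TreeNode>();
  for (const token of tokens) {
    byIndex.set(token.index, { token, children: [] });
  }

  let root: TreeNode | null = null;
  for (const node of byIndex.values()) {
    if (node.token.head < 0) {
      root = node;
      continue;
    }
    const parent = byIndex.get(node.token.head);
    if (parent) {
      parent.children.push(node);
    }
  }
  return root;
}

// Walk the tree and restore the original word order ("Textify").
function textify(root: TreeNode): string {
  const collected: Token[] = [];
  const visit = (node: TreeNode): void => {
    collected.push(node.token);
    node.children.forEach(visit);
  };
  visit(root);
  return collected
    .sort((a, b) => a.index - b.index)
    .map((token) => token.text)
    .join(" ");
}
```

Because every node keeps its original `index`, the same data can drive both views, which is what makes an animated transition between sentence and tree possible in principle.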

 

Perhaps my favorite part about this was the button I created that transitions between "Treeify" and "Textify", because it visually embodies the transition between visualization states, something that I consider very important in button and interaction design.

Sentiment visualization

How do you "rate" a sentence?

First, we have to look at how the data structures were organized once a sentence was placed through the sentiment analysis algorithm:

From here, we went to the drawing board and took this general structure to see how we could map these values into a digestible visualization.

We realized that the sentiment score of each sentence could be represented as a vector.

By multiplying each component of that vector by 255, we could produce a corresponding RGB value.

This would now produce the following mappings:

Very negative is very red.
Very neutral is very green.
Very positive is very blue.
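
The mapping above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the lab's implementation: it assumes the sentiment vector holds three scores between 0 and 1, ordered negative, neutral, positive, and the names `sentimentToRgb` and `rgbToCss` are made up for this example.

```typescript
// Assumed order of scores: [negative, neutral, positive], each in the range 0..1.
type SentimentVector = [number, number, number];
type Rgb = [number, number, number];

// Scale each score to a 0..255 colour channel:
// negative -> red, neutral -> green, positive -> blue.
function sentimentToRgb(scores: SentimentVector): Rgb {
  const toChannel = (score: number): number =>
    Math.round(Math.min(1, Math.max(0, score)) * 255); // clamp, then scale
  return [toChannel(scores[0]), toChannel(scores[1]), toChannel(scores[2])];
}

// Format the colour so it can be used directly in a CSS style.
function rgbToCss(rgb: Rgb): string {
  return `rgb(${rgb[0]}, ${rgb[1]}, ${rgb[2]})`;
}
```

Under these assumptions, a sentence scored as equally negative and positive with no neutral component, `[0.5, 0, 0.5]`, comes out as `rgb(128, 0, 128)`, a purple that signals mixed, polarized sentiment.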

Then, using the different weights of each word, we can translate that into either opacity or relative scale. All of this put together gives us a dynamic, adaptable visualization system that produces some interesting results:
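
One way the word weights might be turned into opacity is sketched below in TypeScript. The normalization is an assumption made for illustration (each weight is divided by the largest weight in the sentence, and a floor keeps low-weight words legible); the function name `weightsToOpacity` and the `minOpacity` parameter are hypothetical.

```typescript
// Map per-word weights to opacities between minOpacity and 1.
// Assumption: weights are non-negative and only comparable within one sentence.
function weightsToOpacity(weights: number[], minOpacity = 0.2): number[] {
  const max = Math.max(...weights, 0);
  if (max <= 0) {
    // No word stands out, so every word gets the floor opacity.
    return weights.map(() => minOpacity);
  }
  return weights.map(
    (weight) => minOpacity + (1 - minOpacity) * (Math.max(0, weight) / max)
  );
}
```

The same normalized value could instead drive a relative font scale; opacity is used here only because it is the simpler of the two options mentioned above.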

Solution

Press

AWS Collaborates with Emory University to Develop Cloud-Based NLP Research Platform Using Apache MXNet

Demo

I ended up creating a web-based interface for this in React and Redux.

Visit demo.elit.cloud for a live demo :)

EDIT: This demo is no longer active :( 

Release

 

In the near future, we will be conducting several user studies to explore the effectiveness of different visualization techniques on users' information comprehension.

Artwork

The nature of my visualization algorithm allows it to produce some pretty interesting pieces of artwork. In particular, analyzing speeches without their words allows you to visualize the emotional polarization of large pieces of text in a beautiful, intuitive way.

Full of deep purples and muddy greens, Sojourner Truth writes with extremely polarized language and asserts her opinions with fortitude.
Split between dark blues on the edges and bright reds in the middle, Lincoln reeled in his audience with a glimmer of hope, explained the issues that were tearing the nation apart, and closed with optimism.
Dr. King's speech is filled with a sense of hope for the future, marked by an overwhelming theme of dark blue but scattered with relatively neutral greens and, occasionally, extremely negative reds and ambiguous, polarized purples.