Big project

A reading interface that highlights where you look



I recently took a Multimodal Learning Analytics class with Bertrand Schneider (my advisor) at Harvard. The class focused on using multiple types of sensors to more holistically evaluate how people learn. Our final assignment was to create a study to investigate a learning task, and I was super curious if highlighting text where you read can improve how much you remember - more formally, if adaptive reading interfaces can improve information retention.


Why are screens so hard to read and focus on?

What causes eye strain?

How could changing the way we read text on our devices help us remember more and comprehend more deeply?

I think a lot of the answers to these questions lie in blending basic graphic design principles with adaptive interfaces. Specifically, my own problems with focus and reading often have to do with jumping around the page and losing track of where I'm reading. I think simply working with "adaptive contrast" can go a long way.

I got a lot of inspiration from IA Writer, which is an amazing text editor designed to keep you focused and relaxed. It has a "focus mode" that allows you to highlight the current sentence or paragraph. I have always looked at IA Writer as inspiration for incredibly simple but useful interfaces.



I worked with Diana Feng, Jackie Kim, and Angelica Reilly. I thought of the project idea and built and designed the iOS app. All of us designed the study together, recruited participants, and conducted the study. Angelica created the study protocol and Diana cleaned the data and did the analysis.


Looking for inspiration from iOS and other popular mobile interfaces, we first tried to blur (rather than desaturate) the text you aren't looking at on the screen.

After some initial testing, we realized that the blur was way too distracting and opted for changing opacity instead. Unlike IA Writer, however, we gradated the opacity across the text rather than providing extreme contrast.
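To make the idea of a gradated opacity concrete, here is a minimal Python sketch. It is not the app's code (the app was built for iOS); the function name, the linear falloff, and the values `min_opacity=0.3` and `falloff=0.35` are all assumptions chosen for illustration.

```python
def paragraph_opacities(n_paragraphs, focused, min_opacity=0.3, falloff=0.35):
    """Return one opacity per paragraph, fading with distance from the focused one.

    The linear falloff and the default values are illustrative assumptions,
    not the settings used in the prototype.
    """
    opacities = []
    for i in range(n_paragraphs):
        distance = abs(i - focused)  # how many paragraphs away from the gaze
        opacities.append(max(min_opacity, 1.0 - falloff * distance))
    return opacities


# Five paragraphs, reader currently on the third one (index 2).
opacities = paragraph_opacities(5, focused=2)
```

The focused paragraph stays fully opaque, its neighbors dim a little, and anything further away bottoms out at `min_opacity` instead of disappearing, which is the "no extreme contrast" idea described above.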

Coded prototype

There was a lot of black magic mathematical trickery that went into this app. And by trickery I mean that Apple's current APIs for eye and face tracking don't allow you to track eye position on the screen, so we had to write this from scratch in order for the experiment to work. We created a rudimentary algorithm to estimate the eye gaze based on the positions of both pupils as recorded by the infrared sensor. Although the algorithm was accurate under certain scenarios, the exact tracking of the eye gaze movement on the x-axis was often inconsistent. Overall movement on the y-axis was relatively consistent and did not interfere with the dynamic highlighting.
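Because only the vertical estimate was reliable, the highlighting really only needs to know which paragraph the gaze falls in. The sketch below shows one way that step could look in Python. It assumes each eye has already been turned into an (x, y) screen estimate, and that the paragraph bounds are known; `combine_eyes`, `paragraph_at`, and the bounds are hypothetical names and values, not the app's actual algorithm.

```python
def combine_eyes(left, right):
    """Average the two per-eye gaze estimates into one (x, y) screen point."""
    return ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)


def paragraph_at(y, bounds):
    """Return the index of the paragraph whose vertical span is closest to y.

    bounds is a list of (top, bottom) pixel positions, one per paragraph.
    A gaze that lands in the gap between paragraphs snaps to the nearest one.
    """
    def gap(span):
        top, bottom = span
        if top <= y <= bottom:
            return 0.0
        return min(abs(y - top), abs(y - bottom))

    return min(range(len(bounds)), key=lambda i: gap(bounds[i]))


# Hypothetical layout: three paragraphs stacked vertically on the screen.
bounds = [(0, 200), (220, 420), (440, 640)]
x, y = combine_eyes((100, 300), (140, 320))
focused = paragraph_at(y, bounds)
```

Only `y` is used to pick the paragraph, so noise in the x estimate does not affect which paragraph gets highlighted.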

We were curious about two main questions during the prototyping process:

  1. How does transition speed between areas on the screen affect focus?
  2. How much of the surrounding text should we highlight?

Both of these questions could be studies by themselves, but we focused on seeing how much information retention would be affected by our current design choices for the prototype. The end result was pretty smooth and definitely accomplished what we were imagining. We got this working after testing a few other prototypes that had different variations of blurring, opacity, and animation speeds.

Study design

Experimental setup

User testing and feedback were crucial to our early prototyping, so we needed to collect as much data from participants as quickly as possible. We adopted a mixed factorial design to decrease the number of participants needed for the study, essentially bootstrapping our data collection.

Participant conditions

Participants were asked to use Gradia to read two passages on two different topics in neuroscience. One passage highlighted the paragraph at which the subject was looking, and the other passage did not highlight any part of the text. Immediately before and after each reading session, participants were presented with a short set of questions intended to measure their learning gain and detail retention after each session.

The questions and grading rubric for each test condition.

The text, the interface condition, and the order of the two were counterbalanced to prevent order effects. At the end of the study, participants were asked to voluntarily provide information on their past exposure to the reading materials and technology in general. Gaze data were collected during the reading sessions, and physiological arousal data were gathered throughout the study.

Basic flow chart of the study design.

Each session ran for approximately 25 minutes and featured one participant working independently. Participants in Group A read a text with highlighting first and an unaltered text second, while Group B read an unaltered text first and a highlighted text second. Each participant was first given two minutes to complete a short pre-test.

Afterwards, we briefly explained the eye-tracking program and informally calibrated the device with the participants to ensure that it was working and comfortable to use. Participants were then given 90 seconds to read a short text that required no scrolling to view in its entirety. When the 90 seconds were up, participants were given another two minutes to complete post-test questions.

Before repeating the process with the second condition, participants were given a 30-second break to look around the room and rest their eyes. At the conclusion of the study, participants were thanked for their time and informally interviewed about their experience reading with the adaptive highlighting feature.

Basic UX flow during the study
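The counterbalancing can be sketched in a few lines of Python. This is an illustration under assumptions, not the assignment sheet we actually used: the labels `"neurons"`, `"williams"`, `"highlight"`, and `"plain"` and the participant IDs are hypothetical, and the round-robin assignment is just one simple way to keep the cells balanced.

```python
from itertools import product

TEXTS = ("neurons", "williams")      # the two passages (labels are illustrative)
INTERFACES = ("highlight", "plain")  # with and without adaptive highlighting


def schedules():
    """All four orderings: which text comes first x which interface comes first."""
    result = []
    for first_text, first_ui in product(TEXTS, INTERFACES):
        second_text = TEXTS[1 - TEXTS.index(first_text)]
        second_ui = INTERFACES[1 - INTERFACES.index(first_ui)]
        result.append(((first_text, first_ui), (second_text, second_ui)))
    return result


def assign(participant_ids):
    """Cycle participants through the schedules so the cells stay balanced."""
    plan = schedules()
    return {pid: plan[i % len(plan)] for i, pid in enumerate(participant_ids)}


# Hypothetical participant IDs; eight participants fill each ordering twice.
assignment = assign([f"P{i}" for i in range(1, 9)])
```

Every participant sees both texts and both interfaces (the within-subject part), while the order varies between participants.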

Data collected and materials

Physiological measures

An Empatica wristband

One Empatica E4 wristband was used to record heart rate (HR), blood volume pulse (BVP), skin temperature, electrodermal activity (EDA), and 3-axis acceleration. At the beginning and end of each activity, participants were instructed to press the button on the wristband to mark the time. The data were aggregated after the study. HR and EDA data were used as measurements of arousal during data analysis.
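The button presses make it possible to cut the continuous recording into one window per activity. Here is a rough Python sketch of that aggregation step. It assumes the signal has already been loaded as (timestamp, value) pairs and that the presses alternate start, end, start, end; the function name and data layout are hypothetical and do not reflect the wristband's export format.

```python
from statistics import mean


def mean_per_activity(samples, taps):
    """Average a signal inside each (start, end) window marked by button presses.

    samples: list of (timestamp_seconds, value) pairs, e.g. HR or EDA readings.
    taps: sorted press timestamps, assumed to alternate start, end, start, end.
    Returns one average per window, or None for a window with no samples.
    """
    windows = list(zip(taps[0::2], taps[1::2]))
    averages = []
    for start, end in windows:
        values = [v for t, v in samples if start <= t <= end]
        averages.append(mean(values) if values else None)
    return averages
```

A missed or extra press would shift every later window, so in practice the press times would need a manual check against the session notes before running something like this.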

Reading materials

The reading passages were adapted from a college-level neuroscience textbook and are of similar length (three paragraphs each, with word counts per paragraph of 73.0 ± 6.6 and 75.3 ± 2.1 for the two passages respectively, p = 0.589). One passage was on the topic of electrical conduction in neurons, and the other was on a developmental disorder called Williams syndrome. The passages were fitted to the phone screen so that there was no need to scroll while reading.

Reading outcomes

The same set of three questions unique to each passage was given to the participants immediately before and after the reading sessions. In order to minimize the impact of guessing on the scores, each of the two sets of questions contained two short answer questions and one multiple choice question with one correct answer. These sets of questions were designed to measure the main concepts covered in the passages, so the score gains after the reading sessions were treated as, and are referred to as, learning gains in this study.
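As defined here, a learning gain is just the post-test score minus the pre-test score for the same passage. A minimal Python sketch of that calculation, grouped by condition, is below. The record fields (`condition`, `pre`, `post`) are hypothetical names, and no real scores are shown.

```python
from statistics import mean


def learning_gain(pre, post):
    """Learning gain for one reading session: post-test minus pre-test score."""
    return post - pre


def mean_gain_by_condition(records):
    """Average the learning gain separately for each interface condition.

    records: iterable of dicts with the (hypothetical) keys
    'condition', 'pre', and 'post'.
    """
    gains = {}
    for r in records:
        gains.setdefault(r["condition"], []).append(
            learning_gain(r["pre"], r["post"])
        )
    return {condition: mean(values) for condition, values in gains.items()}
```

Since every participant contributes one gain per condition, the two averages come from the same people, which is what the within-subject comparison relies on.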



Test outcomes

Gaze tracking

The x and y coordinates were recorded as raw outputs from the mobile application. We did the analysis in Python, computing gaze movement as the distance between each gaze point and the previous one. To cluster the data, we combined everything into a master CSV file.
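The gaze movement measure can be sketched as follows. This assumes the distance is the straight-line (Euclidean) distance between consecutive points, which the text implies but does not state outright; the function name is illustrative.

```python
import math


def gaze_movement(points):
    """Distance from each gaze sample to the one before it.

    points: list of (x, y) screen coordinates in recording order.
    Returns len(points) - 1 distances; the first sample has no predecessor.
    """
    return [math.dist(a, b) for a, b in zip(points, points[1:])]
```

Small values mean the gaze stayed put between samples, and large values mean a jump across the screen.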

Since a value of 0 in the eye blinking parameters indicates a failure to obtain blinking data, data entries in which either of the eye blinking parameters was 0 were discarded before clustering. The eye blinking parameters and gaze movement of the remaining entries were then normalized using z-transformation. Given that we had a large dataset of all positive values without many significant outliers, z-transformation should be sufficient to normalize the data. Since there was no clear indication of how many clusters to expect in the data, we chose X-means to cluster the normalized data.
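The filtering and normalization steps could look roughly like this in Python. The column names `blink_left` and `blink_right` are hypothetical, and the sketch uses the population standard deviation, which is one common convention for z-scores. The X-means step itself is left out, since it depends on the clustering library used.

```python
from statistics import mean, pstdev


def drop_failed_blinks(rows):
    """Keep only rows where both blink parameters were actually measured.

    A value of 0 in either (hypothetically named) column marks a failed read.
    """
    return [r for r in rows if r["blink_left"] != 0 and r["blink_right"] != 0]


def z_scores(values):
    """Z-transform a column: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Each feature (the two blink parameters and gaze movement) would be normalized separately, so that no single feature dominates the distances the clustering algorithm computes.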


The results of this experiment were admittedly lackluster. There were several flaws in the experiment that can still be addressed:

Reading Interface

  • Accuracy of the interface untested  
  • Limitations in estimating exact eye coordinates

Participant sampling

  • Snowball sampling (N=8)
  • Different subject preferences
  • Varying level of English

Study design

  • Two tests of recall
  • Subject: neuroscience
  • 90-second fixed reading time