A reading interface that highlights where you look
For our final project in Multimodal Learning Analytics, we created a study to investigate whether highlighting the text where you are reading can improve how much you remember; more formally, whether adaptive reading interfaces can improve information retention. I designed and built a custom iOS application to track eye movement and dynamically adjust the appearance of text on the screen.
How could changing the way we read text on our devices help us remember more and comprehend more deeply?
I think a lot of the answers to this question lie in blending basic graphic design principles with adaptive interfaces. Specifically, my own problems with focus and reading often come from jumping around the page and losing track of where I'm reading. I think simply working with "adaptive contrast" can go a long way.
I got a lot of inspiration from IA Writer, an amazing text editor designed to keep you focused and relaxed. It has a "focus mode" that highlights the current sentence or paragraph, and I've always seen it as a model for incredibly simple but useful interfaces.
I worked with Diana Feng, Jackie Kim, and Angelica Reilly. I thought of the project idea and built and designed the iOS app. All of us designed the study together, recruited participants, and conducted the study. Angelica created the study protocol and Diana cleaned the data and did the analysis.
Looking for inspiration from iOS and other popular mobile interfaces, we first tried to blur (rather than desaturate) the text you aren't looking at on the screen.
After some initial testing, we realized that the blur was way too distracting and switched to changing opacity instead. Unlike IA Writer, however, we gradated the opacity subtly across the text instead of using extreme contrast.
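As a rough illustration of gradated (rather than all-or-nothing) contrast, the sketch below does two things. It picks the focused paragraph from the gaze's y coordinate, and it fades the other paragraphs in proportion to their distance from it. This is a minimal sketch, not the app's code. The paragraph bottom edges, the `step` of 0.25 per paragraph, and the `floor` of 0.35 are hypothetical values chosen for illustration, and the real app ran on iOS rather than in Python.

```python
import bisect


def paragraph_at(gaze_y: float, bottoms: list[float]) -> int:
    """Return the index of the paragraph under gaze_y.

    `bottoms` holds each paragraph's bottom edge in screen points, sorted
    top to bottom (hypothetical layout data). Gaze above the first paragraph
    maps to 0 and gaze below the last paragraph maps to the last index.
    """
    index = bisect.bisect_left(bottoms, gaze_y)
    return min(index, len(bottoms) - 1)


def paragraph_opacities(focused: int, count: int,
                        step: float = 0.25, floor: float = 0.35) -> list[float]:
    """Fade each paragraph linearly with its distance from the focused one.

    The focused paragraph stays fully opaque (1.0). Each paragraph further
    away loses `step` opacity but never drops below `floor`, so the rest of
    the page stays legible instead of disappearing.
    """
    return [max(floor, 1.0 - step * abs(i - focused)) for i in range(count)]


# Example: three paragraphs whose bottom edges sit at y = 200, 400 and 600 points.
focused = paragraph_at(450.0, [200.0, 400.0, 600.0])
print(focused, paragraph_opacities(focused, 3))  # prints: 2 [0.5, 0.75, 1.0]
```

Only the y coordinate is used to choose the paragraph. That choice would tolerate the unreliable x-axis tracking described in this post.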
There was a lot of black magic mathematical trickery that went into this app. And by trickery I mean that Apple's current APIs for eye and face tracking don't allow you to track eye position on the screen, so we had to write this from scratch in order for the experiment to work. We created a rudimentary algorithm to estimate the eye gaze based on the positions of both pupils as recorded by the infrared sensor. Although the algorithm was accurate under certain scenarios, the exact tracking of the eye gaze movement on the x-axis was often inconsistent. Overall movement on the y-axis was relatively consistent and did not interfere with the dynamic highlighting.
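The post doesn't spell out the estimation math, so the sketch below shows only one common way to turn eye data into a point on the screen. It is not necessarily the algorithm used in the app. The idea is to average the two eyes into a single "cyclopean" gaze ray and intersect that ray with the screen plane.

The sketch rests on these assumptions:
- A hypothetical device-centred coordinate frame measured in metres.
- The screen lies in the plane z = 0, and the face sits at positive z.
- The eye positions and gaze directions would have to come from the face-tracking data.

```python
from typing import Optional

Vec3 = tuple[float, float, float]


def midpoint(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise average of two 3D vectors."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, (a[2] + b[2]) / 2)


def gaze_on_screen(left_eye: Vec3, right_eye: Vec3,
                   left_dir: Vec3, right_dir: Vec3) -> Optional[tuple[float, float]]:
    """Intersect the averaged gaze ray with the screen plane z = 0.

    Returns (x, y) on the screen in metres, or None when the averaged ray
    is parallel to the screen or points away from it.
    """
    origin = midpoint(left_eye, right_eye)     # point between the two eyes
    direction = midpoint(left_dir, right_dir)  # averaged gaze direction
    if direction[2] >= 0:
        return None
    # Solve origin.z + t * direction.z = 0 for the ray parameter t.
    t = -origin[2] / direction[2]
    return (origin[0] + t * direction[0], origin[1] + t * direction[1])
```

Small angular errors in the direction vectors are magnified by the distance to the screen, which is one plausible source of jitter in the estimates. Converting metres to screen points would also require the device's physical screen dimensions, which this sketch leaves out.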
We were curious about two main questions during the prototyping process:
How does transition speed between areas on the screen affect focus?
How much of the surrounding text should we highlight?
Either question could be a study in its own right, but we focused on how our current design choices for the prototype would affect information retention. The end result was pretty smooth and definitely accomplished what we had imagined. We arrived at it after testing a few other prototypes with different combinations of blurring, opacity, and animation speed.
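One simple knob for transition speed is to smooth the raw gaze signal before choosing which paragraph to highlight. The sketch below uses an exponential moving average. This is a swapped-in technique chosen for illustration, not a description of the app's animation code. `alpha` is a hypothetical parameter: values near 1 follow the eyes almost instantly, while small values make the highlight drift more slowly between paragraphs.

```python
def smooth(samples: list[float], alpha: float) -> list[float]:
    """Exponential moving average: out[i] = alpha * x[i] + (1 - alpha) * out[i - 1].

    The first output equals the first sample. Larger alpha means faster,
    jumpier transitions; smaller alpha means slower, steadier ones.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: list[float] = []
    current = None
    for x in samples:
        current = x if current is None else alpha * x + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


# A gaze jump from y = 0 to y = 100 reaches the new position gradually.
print(smooth([0.0, 100.0, 100.0, 100.0], alpha=0.5))  # prints: [0.0, 50.0, 75.0, 87.5]
```

Feeding the smoothed y value into paragraph selection trades responsiveness for stability. That trade-off is the same one the transition-speed question above is asking about.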
User testing and feedback were crucial to our early prototyping, so we needed to collect as much data from participants as quickly as possible. We adopted a mixed factorial design, in which every participant read under both interface conditions, to reduce the number of participants needed for the study.
Participants were asked to use Gradia to read two passages on different topics in neuroscience. One passage highlighted the paragraph the participant was looking at, and the other did not highlight any part of the text. Immediately before and after each reading session, participants answered a short set of questions intended to measure their learning gain and detail retention.
The passages, the interface conditions, and their order were counterbalanced to prevent order effects. At the end of the study, participants were asked to voluntarily provide information on their past exposure to the reading materials and to technology in general. Gaze data were collected during the reading sessions, and physiological arousal data were gathered throughout the study.
Each session ran for approximately 25 minutes with one participant working independently. Participants in Group A read a highlighted text first and an unaltered text second, while Group B read an unaltered text first and a highlighted text second. Each participant first had two minutes to complete a short pre-test.
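The counterbalancing can be made concrete. Crossing the two passage orders with the two condition orders (highlighted first, as in Group A, or plain first, as in Group B) gives four reading sequences. The sketch below shuffles participant IDs and deals them across those four sequences so each is used equally often. The passage labels, the seed, and the function names are hypothetical, and the post doesn't say how participants were actually assigned.

```python
import itertools
import random

PASSAGES = ("neuron conduction", "Williams syndrome")
CONDITION_ORDERS = (("highlighted", "plain"),   # Group A
                    ("plain", "highlighted"))   # Group B


def reading_sequences() -> list[tuple[tuple[str, str], ...]]:
    """All four (passage, condition) sequences for a two-session study."""
    return [
        tuple(zip(passages, conditions))
        for passages in itertools.permutations(PASSAGES)
        for conditions in CONDITION_ORDERS
    ]


def assign(participants, seed=None) -> dict:
    """Shuffle participants, then cycle through the sequences so that each
    sequence is used as evenly as possible."""
    sequences = reading_sequences()
    ids = list(participants)
    random.Random(seed).shuffle(ids)
    return {pid: sequences[i % len(sequences)] for i, pid in enumerate(ids)}


for pid, sequence in sorted(assign(range(1, 9), seed=7).items()):
    print(pid, sequence)
```

With eight participants, each of the four sequences is used exactly twice. Every passage therefore appears equally often in each position and under each condition.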
Afterwards, we briefly explained the eye-tracking program and informally calibrated the device with the participants to ensure that it was working and comfortable to use. Participants were then given 90 seconds to read a short text that required no scrolling to view in its entirety. When the 90 seconds were up, participants were given another two minutes to complete post-test questions.
Before repeating the process with the second condition, participants were given a 30-second break to look around the room and rest their eyes. At the end of the study, participants were thanked for their time and informally interviewed about their experience reading with the adaptive highlighting feature.
Data collected and materials
One Empatica E4 wristband was used to record heart rate (HR), blood volume pulse (BVP), skin temperature, electrodermal activity (EDA), and 3-axis acceleration. At the beginning and end of each activity, participants were instructed to press the button on the wristband to mark the event. The data were aggregated after the study, and HR and EDA were used as measures of arousal during data analysis.
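Each button press leaves a timestamped tag, so the tags can be used afterwards to cut each signal into per-activity segments.

The sketch assumes the single-column CSV layout of the E4's exported files, as in EDA.csv or HR.csv:
- First row: the session start as a Unix timestamp.
- Second row: the sample rate in Hz.
- Remaining rows: one sample per row.

It also assumes a list of tag timestamps such as those in tags.csv. The function names are hypothetical. Multi-column files such as the accelerometer export would need their extra columns handled.

```python
import csv
import io


def read_e4_signal(text: str) -> tuple[float, float, list[float]]:
    """Parse a single-column E4 export: start time, sample rate, then samples."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    start = float(rows[0][0])   # session start, Unix timestamp (UTC)
    rate = float(rows[1][0])    # samples per second
    values = [float(row[0]) for row in rows[2:]]
    return start, rate, values


def segment_by_tags(start: float, rate: float, values: list[float],
                    tags: list[float]) -> list[list[float]]:
    """Split a signal into the stretches between consecutive tag timestamps."""
    segments = []
    for begin, end in zip(tags, tags[1:]):
        # Convert each timestamp to a sample index, clamped to the recording.
        lo = max(0, round((begin - start) * rate))
        hi = min(len(values), round((end - start) * rate))
        segments.append(values[lo:hi])
    return segments
```

Participants pressed the button at both the start and the end of each activity. The segments therefore alternate between activities and the gaps between them, so the activities themselves would be `segments[0::2]`, assuming no presses were missed.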
The reading passages were adapted from a college-level neuroscience textbook and are of similar length: three paragraphs each, with 73.0 ± 6.6 and 75.3 ± 2.1 words per paragraph for the two passages respectively (p = 0.589). One passage covered electrical conduction in neurons, and the other covered a developmental disorder called Williams syndrome. Both passages were fitted to the phone screen so that no scrolling was needed while reading.
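The sketch below shows where a comparison like this can come from. It computes the pooled two-sample t statistic directly from the reported means and standard deviations, with three paragraphs per passage. The post doesn't state which test produced p = 0.589, so an independent two-sample t-test with pooled variance is an assumption here. Because the summary statistics are rounded, this calculation will not necessarily reproduce the reported p-value exactly.

```python
import math


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> tuple[float, int]:
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


t, df = pooled_t(73.0, 6.6, 3, 75.3, 2.1, 3)
print(f"t = {t:.3f}, df = {df}")  # prints: t = -0.575, df = 4
```

If SciPy is installed, `scipy.stats.ttest_ind_from_stats` takes the same six summary values and also returns the two-sided p-value.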
The same set of three questions, unique to each passage, was given to participants immediately before and after each reading session. To minimize the impact of guessing on the scores, each set contained two short-answer questions and one multiple-choice question with a single correct answer. The questions were designed to measure the main concepts covered in the passages. The change in score from before to after a reading session is therefore treated as, and referred to as, the learning gain in this study.
If you are curious about the scientific paper we wrote discussing all of the results in detail, you can see it here.
After conducting a one-way ANOVA, we actually found no statistically significant difference in reading comprehension and learning scores between participants who had the adaptive interface and those who had the standard interface.
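The sketch below makes the analysis concrete. It computes each participant's learning gain as post-test score minus pre-test score, following the definition of learning gain used in this study. It then computes the one-way ANOVA F statistic comparing groups of gains. This is a from-scratch illustration with hypothetical function names and made-up example scores, not the analysis script used in the study. `scipy.stats.f_oneway` computes the same F statistic along with its p-value.

```python
def learning_gain(pre: float, post: float) -> float:
    """Learning gain as defined in the study: post-test minus pre-test score."""
    return post - pre


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Made-up (pre, post) scores out of 3 for illustration only.
adaptive = [learning_gain(pre, post) for pre, post in [(0, 2), (1, 2), (0, 3)]]
standard = [learning_gain(pre, post) for pre, post in [(1, 2), (0, 2), (1, 3)]]
print(one_way_anova_f([adaptive, standard]))
```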
The x and y coordinates were recorded as raw outputs from the mobile application. In Python, we calculated gaze movement relative to the previous gaze location by taking the distance between consecutive points. To prepare for clustering, we combined all the data into a master CSV file.
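A minimal version of that step might look like the sketch below. It assumes each participant's raw output is a CSV with hypothetical `x` and `y` columns. Movement is measured as the straight-line (Euclidean) distance to the previous sample, and everything is written into one master CSV tagged by participant. The column names and file layout are illustrative assumptions.

```python
import csv
import math
from pathlib import Path


def gaze_steps(points: list[tuple[float, float]]) -> list[float]:
    """Distance from each gaze sample to the previous one.

    The first sample has no predecessor, so its movement is recorded as 0.0.
    """
    if not points:
        return []
    steps = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        steps.append(math.hypot(x1 - x0, y1 - y0))
    return steps


def build_master_csv(files: dict[str, Path], out_path: Path) -> None:
    """Merge per-participant gaze CSVs (hypothetical x/y columns) into one file."""
    with out_path.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["participant", "x", "y", "step"])
        for participant, path in files.items():
            with path.open(newline="") as f:
                points = [(float(row["x"]), float(row["y"])) for row in csv.DictReader(f)]
            for (x, y), step in zip(points, gaze_steps(points)):
                writer.writerow([participant, x, y, step])
```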
Since a value of 0 in either eye-blinking parameter indicates a failure to obtain blinking data, entries in which either parameter was 0 were discarded before clustering. We then used the X-means algorithm to cluster the normalized data.
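The filtering and normalization steps can be sketched as below. The blink column names are hypothetical. Z-score normalization is an assumption, since the post says only that the data were normalized. X-means itself is not reproduced here: it is k-means that decides how many clusters to keep by splitting clusters and scoring the splits with the Bayesian information criterion. The post doesn't say which X-means implementation was used.

```python
import statistics


def drop_missing_blinks(rows: list[dict],
                        blink_keys=("blink_left", "blink_right")) -> list[dict]:
    """Discard rows where either blink value is 0 (blink data wasn't captured)."""
    return [row for row in rows if all(row[key] != 0 for key in blink_keys)]


def zscore(rows: list[dict], keys: list[str]) -> list[dict]:
    """Return copies of rows with each column in `keys` rescaled to mean 0, SD 1."""
    if not rows:
        return []
    stats = {}
    for key in keys:
        column = [row[key] for row in rows]
        stats[key] = (statistics.fmean(column), statistics.pstdev(column))
    # A constant column has SD 0; map it to 0.0 instead of dividing by zero.
    return [
        {**row, **{key: ((row[key] - mean) / sd if sd else 0.0)
                   for key, (mean, sd) in stats.items()}}
        for row in rows
    ]
```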
In the graph below, you can see that between the two primary clusters of gaze movement, the participants with and without the adaptive interface had roughly equal distributions of gaze movement.
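"Roughly equal distributions" can be checked numerically. For each condition, tabulate what fraction of its gaze samples fell into each cluster. The sketch below assumes parallel lists of cluster labels and condition names, which are hypothetical inputs. Similar fractions across conditions correspond to the overlap described above.

```python
from collections import Counter


def cluster_shares(labels: list[int], conditions: list[str]) -> dict[str, dict[int, float]]:
    """For each condition, the fraction of its samples assigned to each cluster."""
    if len(labels) != len(conditions):
        raise ValueError("labels and conditions must be the same length")
    counts = Counter(zip(conditions, labels))
    totals = Counter(conditions)
    shares: dict[str, dict[int, float]] = {}
    for (condition, label), n in counts.items():
        shares.setdefault(condition, {})[label] = n / totals[condition]
    return shares
```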
Improvements to be made
The results of this experiment were admittedly lackluster. There were several flaws in the experiment that can still be addressed:
The reading interface
Accuracy of the interface was untested
Limitations in estimating exact eye coordinates
Because we had so little time to develop the interface prototype and design the rest of the study, we had to rely on rough metrics to judge the accuracy of the custom eye-tracking algorithm. In the future, we are thinking of training a machine learning model that learns to predict eye gaze movement without relying solely on numerical estimates.
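The simplest learned correction, short of a full machine learning model, is a per-participant linear calibration. The participant looks at a few known targets, and a line mapping the raw estimated coordinate to the true one is fitted by least squares. This is a swapped-in baseline for illustration, not the planned model, whose design isn't described here. The function names and the calibration procedure are hypothetical.

```python
def fit_linear_calibration(estimated: list[float],
                           actual: list[float]) -> tuple[float, float]:
    """Least-squares fit of actual ≈ slope * estimated + intercept."""
    n = len(estimated)
    if n < 2 or n != len(actual):
        raise ValueError("need at least two paired calibration samples")
    mean_e = sum(estimated) / n
    mean_a = sum(actual) / n
    sxx = sum((e - mean_e) ** 2 for e in estimated)
    if sxx == 0:
        raise ValueError("calibration estimates must not all be identical")
    sxy = sum((e - mean_e) * (a - mean_a) for e, a in zip(estimated, actual))
    slope = sxy / sxx
    return slope, mean_a - slope * mean_e


def calibrate(raw: float, slope: float, intercept: float) -> float:
    """Apply a fitted calibration to a raw gaze estimate."""
    return slope * raw + intercept


# Hypothetical calibration: targets at y = 100, 300, 500 points and the raw
# estimates recorded while the participant looked at each one.
slope, intercept = fit_linear_calibration([130.0, 310.0, 470.0], [100.0, 300.0, 500.0])
print(calibrate(400.0, slope, intercept))
```

A learned model could replace this line with something nonlinear that uses more eye features. The linear fit is mainly useful as a baseline to beat.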
Snowball sampling (N=8)
Different subject preferences
Varying level of English
We would improve the sampling by running a randomized controlled trial (RCT). Because we needed to recruit subjects manually, we used snowball sampling to get as many people as possible to test the interface as quickly as possible. In the future, we will also control for more variables, such as subject preference, native language, and whether subjects wear glasses. This will help us draw much stronger conclusions about the application.
Two tests of recall
90 second fixed reading time
Our study had only two tests of recall. In the future, we will include a more diverse set of comprehension evaluation methods to account for differences in learning style and preference. Both passages were also limited to neuroscience. Ideally, with hundreds or thousands of subjects, we could also test how personalizing content affects the effectiveness of the interface. Lastly, the fixed 90-second reading time may have put our participants at a disadvantage due to time pressure. Our current hypothesis is that an adaptive interface may help more with focus over a longer reading period than with short-term reading comprehension.
Overall, we were super happy with the progress we made on the prototype and with evaluating it rigorously through experiments. We see adaptive interfaces like this as the future of information consumption and are excited to continue working on similar projects.