Sep 19, 2018

Chapter 8

Beyond Gaze Data: Pupillometry as an Additional Data Source in Eye Tracking

Jessica M. Karch*
Department of Chemistry, University of Massachusetts Boston, Boston, Massachusetts 02125, United States
*E-mail: [email protected].

How students construct mental models, process information, and utilize their working memory during problem-solving processes remains a black box in discipline-based education research. Physiological responses can give researchers insight into how these processes occur. Researchers in other fields have shown one physiological response, pupillary dilation, is correlated to cognitive load, a measure of the amount of working memory used during mental tasks. However, there are many challenges to using pupillary dilation in education-based research, including accounting for changes in dilation over the course of a problem-solving episode (several minutes), appropriately controlling for the environment, and cleaning noisy data. In this chapter, some of these challenges will be discussed and an example of how this method has been applied in chemistry education will be presented.

© 2018 American Chemical Society VandenPlas et al.; Eye Tracking for the Chemistry Education Researcher ACS Symposium Series; American Chemical Society: Washington, DC, 2018.

Introduction

Although gaze tracking has been richly explored in this volume and others (1–5), eye tracking encompasses other complementary measures that can be used to study cognition, including blink rate (6–8) and pupillary dilation. These physiological measures can provide additional ways to quantitatively probe internal processes. Pupillometry, the measurement of pupil size and reactivity, has been of particular interest to scientists throughout history. Charles Darwin found that pupil diameter changed in response to emotion (9) in both humans and animals, and the first documentation of pupil changes not induced by light may date back to 1765 (10–12). Using a camera and a millimeter ruler to measure changes in pupil diameter, Hess and Polt (13) demonstrated in 1964 that not only emotions, but also cognitive processing, affect pupil dilation. Since then, systematic pupillometric studies have been carried out in a wide variety of fields to investigate deception (14, 15), arousal (16), attention (17), and cognitive load (18, 19).

The development of eye-tracking technology to measure pupillary responses has made it possible for psychologists and other researchers to collect pupillometric data to give insight into internal processes. Although researchers cannot directly know what a person is thinking, physiological measures can serve as a proxy for assessing these internal processes (20). Pupillary responses are of interest to physiopsychologists, as they are controlled by the same pathways that control neural processing and have been demonstrated to correlate in magnitude to cognitive processing (18, 21–24). Because pupillary responses correlate to several different cognitive processes, it can be difficult to make psychological inferences (20).
However, pupillary responses can be used as an indicator that a cognitive process is occurring, and this can be validated with other measures, such as a comparison to a control group (25), or with qualitative data, such as an interview. Task-evoked pupillary responses (TEPR) have provided a valuable mechanism for investigating how pupils change in response to a stimulus. This approach has been employed in psychology since the late 19th century (26), although it was not well-known to English-speaking academics until its rediscovery by Hess and Polt in 1964 (13, 22). Researchers have found that the magnitude of a pupillary response correlates to task difficulty in multiplication (13), digit recall (18, 27, 28), and translation (29). Several researchers have also used pupillometry to study in-task learning and expertise. These studies have contributed some powerful findings, which can offer guidance to researchers about how pupillometry can be a useful tool in the chemistry education toolbox.

Pupillometry and Cognitive Load

Pupillometry has been of particular interest to researchers who study learning and working memory, because pupil dilation has long been demonstrated to correlate to the extent of cognitive load being experienced (13, 26, 27). Cognitive load theory (CLT) rests on the idea that working memory is finite (30). The theory of the underlying cognitive architecture is that a small amount of cognitive processing
capacity is allocated to working memory, where information is stored short-term, which interacts with long-term memory. Cognitive load is the extent to which the working memory is being utilized. If students experience cognitive overload, they can often no longer meaningfully participate in learning, because they have exhausted the cognitive resources available for that task. Sweller’s CLT (31) is a theory of instructional design that aims to optimize learning by reducing the extrinsic cognitive load (the demand on working memory that results from the design of a task; e.g., an assignment that uses multiple fonts and colors will require more effort on the student’s part to interpret than one that is cleanly designed) to maximize the amount of intrinsic cognitive load (the demand inherent to the difficulty of a task) a student can undergo. Studying cognitive load in education can be very fruitful for understanding learning and problem-solving (32), and for instructional design (33–35). Some researchers in chemistry education have started collecting online physiological metrics, such as heart rate (36, 37) and electroencephalogram (EEG) (see Chapter 7 of this book), to assess cognitive load during learning tasks; however, only a few have used pupillometric measures, and with limited success (38). Beyond chemistry education, however, pupillometry has been shown to provide powerful insights into learning and the development of expertise. Szulewski et al. (25) compared pupil responses of “expert” and “novice” emergency medical professionals. The expert group consisted of residents and recent medical school graduates, while the novice group consisted of first- and second-year medical students. The two groups were presented with clinical emergency medicine multiple-choice questions. The researchers found that the questions evoked a larger pupillary response in the novice students than in the expert groups. 
The groups were limited to looking at the questions for only 10 s at a time, as the researchers were interested in the participants’ initial responses to the questions. As the novice students had less experience with the questions being asked, the larger pupillary response was interpreted as evidence that the novice students experienced more cognitive load when answering multiple-choice questions about emergency medicine. The researchers saw their study as a first step to exploring the development of expertise in emergency medicine clinicians, and two of the authors published a follow-up review study on the use of pupillometry in medical education in 2017 (39).

Foroughi et al. (40) measured pupil responses to investigate within-task learning. Rather than comparing the pupillary responses of two groups with differing expertise, the researchers looked at how familiarity with a task developed over the course of repeated exposure. Participants were required to engage in a task in which they had to orient themselves in a virtual space. The researchers found that over subsequent trials, as the participants learned how to engage in the task, pupillary responses decreased in magnitude as participants switched from using active working memory to automated information processing. This suggests that practicing and learning tasks reduces the amount of cognitive load experienced, which has implications for learning in all fields.

In chemistry education, Peterson and collaborators (38) combined gaze tracking and pupillometry to try to gain insight into how students learn while using ChemTutor, an online learning platform that focuses on visual representations.
The authors were interested in trying to characterize the in situ learning and cognition that happened while students explored the online platform. They aimed to use pupillometry and gaze tracking as predictors for learning gains based on a pre- and a posttest. Using an average pupil diameter per problem, the authors found that pupil diameter was only a moderate predictor of learning gains. However, they also noted that with a smaller temporal grain size, they may be able to gain more predictive power.

Conducting a Pupillometry Study

Although studies conducted in other fields have yielded useful insights into learning, there are still interesting chemistry-specific questions to ask that can be explored through pupillometry. For example, how a practicing chemist approaches writing an organic mechanism differs from how an undergraduate organic student does (41). Pupillary response methods provide an opportunity to study these differences in cognitive processing in ways that have previously been unavailable. Before embarking on a study that employs pupillary response, it is valuable to consider the affordances and constraints of this technique for investigating questions of interest in chemistry education.

What Questions Can a Chemistry Education Researcher Ask?

Research has shown that pupillary responses can be correlated to higher processing load. If a stimulus is carefully designed and data are carefully taken, the task defined by the stimulus can be correlated to a change in processing load. One possible study would compare the cognitive load response of a practicing chemist with that of a novice, as Szulewski and co-workers did in clinical medicine (25). Pupillometry can also be used to investigate the extrinsic load induced by different forms of representation (31). In education research, there are also more complex questions about what causes cognitive load in situ in learning. The study by Peterson and collaborators (38) illustrates the challenges associated with using pupillometry to investigate these processes using the technology currently available.

Design Considerations

Pupillometric studies have research design considerations that are not necessarily present in other eye-tracking studies. The first design consideration is technical: the choice of an eye tracker. As discussed in more detail in this volume (see Chapter 2), the choice of an appropriate eye tracker depends on the demands of the study.
Many of the most common commercial eye trackers are capable of collecting information about pupil dilations, but they are not necessarily optimized for this. Holmqvist and Nyström (42) recommend using an eye tracker with a fixed distance between the eye and camera, such as an eye tracker with a chin rest, to reduce artifacts that arise when the pixel size of the pupil in the eye-tracking software changes with distance. However, a remote
eye tracker can be suitable if its software accounts for changing head position and eye-to-eye tracker distance (43). Because eye trackers use pupil position and diameter to determine information about gaze, the pupil size may only be reported as an artifact of this computational need and therefore may not be a reliable metric. It is best to choose an eye tracker that has built this capability into its system deliberately; however, even when it is a deliberately collected metric, there are differences in how eye trackers process this information. Three major commercial eye trackers (Tobii, SMI, and EyeLink 1000) support collecting pupil diameter, but process raw pupil data very differently. Tobii’s software, TobiiPro, and SMI’s software attempt to estimate true pupil size in millimeters by using an algorithm based on the shape of the cornea and the measured distance between the eye tracker and the eye (44, 45). EyeLink, on the other hand, reports pupil size in arbitrary units based on the number of pixels in the image of the eye (46). Some eye trackers even report pupil area rather than pupil diameter (42). All three of these measures are internally consistent but not directly comparable with each other. If pupil size is a target experimental metric, it is recommended to reach out directly to the technical support of the eye-tracker company of choice to find out how they process and support the processing of pupillary responses. The choice of eye tracker leads to the first data processing consideration of how to report the pupillary response. Eye trackers’ pupillometric measures are internally consistent but may not reflect the true diameter of the pupil. If the eye tracker reports pupil size in terms of diameter rather than area, it makes sense to report a change in diameter rather than an absolute diameter. 
Even though the eye tracker may not accurately report a true pupil diameter, its bias is systematic, so calculating a change should circumvent this technical issue. Furthermore, absolute diameters may not be physiologically meaningful for direct comparison of subjects, as there exists a range of corneal sizes (22). This pupillary change is commonly reported as percent dilation or the magnitude of dilation. The differences between these metrics will be discussed in more detail in the following section.

A second major design consideration is the signal-to-noise ratio and controlling for the confounding factors that can cause fluctuations in pupil size. The major source of change in pupil size is the light reflex: changes in ambient light can cause constriction in high light conditions or dilation in low light. These pupillary responses can range from 1 mm to over 9 mm; the extent of change depends on the individual’s corneal size and the intensity of the light (22). The light reflex can be accounted for by designing a testing environment with controlled light and by designing a stimulus with constant luminance (see Chapter 3 in this volume for a more detailed discussion of stimulus design). The stimulus can also be colored the same as the surrounding environment to better control the ambient conditions (19). For example, if the wall behind the computer screen is grey, the stimulus can also be colored grey, so that the participant looking away from the screen does not induce a pupillary change. The pupils also respond to visual changes via the accommodation response, in which the size of the pupil changes to focus on an object. This can be accounted for by having a fixed distance between the participant and the stimulus, such as by having an eye tracker with a head mount; however, accommodation is not as major a
source of noise as the light reflex. Age can affect pupillary response, so pupillary responses in elderly participants may not be comparable to those of younger participants (28, 47). The pupil also fluctuates in size spontaneously (known as hippus), providing a random source of error that cannot be well controlled for, although some suggest that this is a result of spontaneous, otherwise unobserved thought (22). Hippus is not observed when the individual is engaged in mental activity, so it should not pose a large source of error for most studies (48).

More complicated confounding factors are those that do not emerge from sources of noise that can be externally controlled for, but rather from internal processes. Emotions or anticipation can affect pupil diameter (22), as can a wandering mind (49). Cognitive processes that result from sources other than the experimentally designed task are also a confounding factor when measuring pupil dilation (22, 42). Sound should be controlled: studies in audiology demonstrate that the cognitive effort involved in listening induces a pupillary response (50, 51). A concurrent think-aloud interview may not be appropriate for a pupillometric study; the confounding effort involved in speech may make the pupil signal from the task of interest more difficult to discern. The important thing to keep in mind is that the pupillary response is a reporter variable; its magnitude is proportional to the cognitive effort being used at any given moment, so diverting attention lowers the signal-to-noise ratio.

A third design consideration unique to pupillometry studies is the need for a baseline acclimation period. A baseline acclimation period can last anywhere from 400 ms (18) to 5 s (25). Because the parameter of interest is generally change in pupil size, or dilation, a baseline value must be measured against which the measured size can be compared.
A baseline acclimation period can, for example, involve asking the participant to focus on a neutral stimulus, such as an x in the middle of the screen. If the experiment consists of multiple tasks, a baseline value should be collected prior to each trial. This is to mitigate any changes in pupil diameter over time due to the accommodation reflex or any fluctuations in ambient light.

Instrument Validation

Memory and vigilance tests are a well-documented way to validate how cognitive load affects pupillary dilation. In these tests, participants are asked to repeat back an increasingly long sequence of digits (18, 21, 27). As the digit sequence increases in length, a corresponding increase in the magnitude of the pupillary response is observed, until participants are asked to repeat back a series that induces cognitive overload. At this point, most participants’ pupils constrict, indicating that they no longer have working memory available to store all of the digits. Replicating this validation study can be a useful exercise in processing pupillometric data, scrutinizing the effect of the experimental setup on the quality of the data collected, and confirming that cognitive load can be assessed by using pupillometry.

Data Processing Considerations

The baseline can be corrected either via division (percent dilation = measured diameter / baseline diameter) or via subtraction (dilation = measured diameter − baseline diameter). Mathôt and collaborators (52) compared the statistical power of these two types of baseline corrections. To do this, they compared sample (simulated) data and real (experimental) data that had been processed using divisive and subtractive baseline correction. They found that, for the simulated data, the subtractive baseline correction led to higher statistical power. For the experimental data, the two ways of processing yielded similar levels of statistical power due to variations in pupil size among participants. Mathôt and collaborators recommend using subtractive baseline correction over divisive baseline correction; and, if using divisive baseline correction, to use the grand mean across all of a participant’s trials rather than the mean baseline size for each trial. However, the choice is ultimately up to the researcher.

Divisive baseline correction may seem more intuitive to some researchers, because it yields change in pupil diameter as a proportional or percentage change. This may be preferable, especially if the eye tracker used to collect data reports pupil size in arbitrary units, because a change of 10% seems more broadly comparable across different participants. On the other hand, the cognitive load effect is additive, so subtracting the basal pupil diameter may best isolate the signal of interest. The most important thing in performing either baseline correction is to ensure that the correction did not fundamentally change the structure of the data. A quick way to check this is to plot the corrected signal and the raw signal to see if the shape of the signal is different. Note that baselines should be taken before each trial, as the baseline diameter of the pupil may change from trial to trial.
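As a minimal illustration of the two corrections described above, the following Python sketch applies subtractive and divisive baseline correction to made-up numbers. It is not the procedure of Mathôt and collaborators; the function names and sample values are hypothetical, and the divisive version assumes the grand mean of one participant's baselines is used as the denominator.

```python
from statistics import mean


def subtractive(trial, baseline):
    """Dilation in the tracker's units: measured diameter minus baseline."""
    return [d - baseline for d in trial]


def divisive(trial, baseline):
    """Proportional size: measured diameter divided by baseline."""
    return [d / baseline for d in trial]


# Hypothetical data for one participant: one baseline diameter (mm) per
# trial, and the diameters (mm) sampled during each trial.
baselines = [3.0, 3.2]
trials = [[3.1, 3.3, 3.4], [3.3, 3.6, 3.5]]

# Subtractive correction uses each trial's own baseline.
sub = [subtractive(t, b) for t, b in zip(trials, baselines)]

# Divisive correction with the grand mean of the participant's baselines,
# so every trial shares the same denominator.
grand_baseline = mean(baselines)
div_grand = [divisive(t, grand_baseline) for t in trials]

print(sub)
print(div_grand)
```

Plotting `sub` or `div_grand` against the raw `trials` is one way to carry out the shape check suggested above.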
Mathôt and collaborators also considered how baseline correction affects this “trial effect,” which they compared to a random effect in a linear mixed model. In fact, they found that subtractive baseline correction was exactly as statistically powerful as including the trial effect as a random effect, which makes sense because in both cases the signal of interest (change in dilation) is isolated, but unchanged. However, divisive baseline correction is more complicated. If the researcher divisively corrects each trial with the baseline of that trial, each percent dilation has been generated from a different denominator. This can make it hard to compare percent changes between trials. This is why, as mentioned above, Mathôt et al. recommend using the grand mean across trials for a participant. That way, each percent change for a participant can be meaningfully compared.

Once the baseline has been corrected for, the results can be reported as either peak or mean dilation during a task of interest. Beatty and Lucero-Wagoner (22) suggest that each has its pros and cons. Mean pupil dilation is calculated by averaging together all of the dilations during a period of interest. The benefit of using mean pupil dilation is that it is less sensitive to random variations. The drawback is that different participants may have spent varying lengths of time on the task, which would result in the mean dilation for each participant being calculated from a nonstandard number of points. A second drawback is that if periods of interest overlap, it could be difficult to determine which series of points
are needed to calculate the mean dilation. This could result in calculating a mean dilation from data that actually resulted from a different stimulus. Peak dilation, the maximum dilation measured during a period of interest, is determined from a single data point. The benefit is that this measure is independent of the task length; the drawback is that it is much more sensitive to random variations and noise. Peak and mean dilation are both useful; the choice depends on the data that are collected and what the researchers are trying to do with their data.

Statistics

The appropriate statistics depend on experimental parameters and the effect of interest. To compare pupil diameters across participants and tasks, analysis of variance (ANOVA), a linear mixed model, and an unpaired t test can all be useful. An unpaired t test can be useful to compare the means of two populations, such as the expert and novice group (25), to see if the pupillary response between the two groups differed. An ANOVA can be used to compare the means of multiple test conditions, such as problems of different difficulties, to see whether and how different tasks evoke pupillary responses. Several eye-tracking studies (53, 54) also implement a linear mixed effects model or a hierarchical linear model. Linear mixed effects models have a random effects parameter that accounts for the differences that result from individual differences or item order effects (55).

Computational Tools

There are tools developed specifically for processing pupillometry data. Sirois and Jackson developed a MATLAB routine that interfaces with Tobii eye trackers, SMART-T (http://smartt.wikidot.com/), in the field of infant cognition (56). Lemercier and collaborators (57) published a study that investigated pupillary dilation in response to taste. Their paper also served as a methodological overview and a guideline for using SMART-T in a study in the field of food sensory science.
A MATLAB toolbox specifically for integration with EyeLink data has also been developed (58), which has been used in several peer-reviewed pupillometry studies (59, 60). There are also several studies that use the open source software R (40, 61, 62) and report their source code in the supplemental information.
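Independently of the tools listed above, the trade-off between mean and peak dilation discussed under Data Processing Considerations can be seen with a few lines of Python. This is only a sketch: the function names and the baseline-corrected values are hypothetical, chosen so that two participants reach the same peak but spend different lengths of time on the task.

```python
from statistics import mean


def mean_dilation(dilations):
    """Average of all baseline-corrected samples in the period of interest."""
    return mean(dilations)


def peak_dilation(dilations):
    """Largest single baseline-corrected sample in the period of interest."""
    return max(dilations)


# Hypothetical baseline-corrected dilations (mm). Both participants show the
# same brief response, but the second stays on the task for longer.
short_trial = [0.05, 0.20, 0.10]
long_trial = [0.05, 0.20, 0.10, 0.02, 0.03, 0.02]

for name, trial in (("short", short_trial), ("long", long_trial)):
    print(name, "mean:", mean_dilation(trial), "peak:", peak_dilation(trial))
```

With these numbers the peak is identical for both trials, while the mean is pulled down by the extra low samples in the longer trial. A single noisy sample, on the other hand, would change the peak far more than the mean.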

An Example in Chemistry Education

As part of a recent study (63), eye-tracking and pupillometric data were recorded while general chemistry students answered questions from the Chemical Concept Inventory (CCI) (64). The initial hypothesis was that the magnitude of the pupil signal should depend on the validated difficulty of the CCI items (65) (i.e., a more difficult question should induce a larger pupil dilation than an easier question). However, an ANOVA indicated that this was not the case; the differences between the mean pupil dilations for the questions were not statistically significant when the
participants were aggregated. However, the raw data showed local peaks and valleys, suggesting that over the course of the 30 s it took participants to answer a CCI question, the load on their working memory fluctuated. Four of the eight CCI questions presented to participants were selected for analysis on the basis of their relative difficulties and the range of representations used in the problem. One problem is presented here as an exemplar.

Data Processing

Data were collected using a 60 Hz Tobii X2-60 eye tracker. The raw data were exported into Excel as a .csv file (see Figure 1). Tobii has an internal algorithm that converts pixels to “true” pupil size, so it gives pupil size in millimeters. The software returns the pupil size of both the left and the right pupil (PupilLeft and PupilRight), and whether each could be recorded (ValidityLeft and ValidityRight: a ‘0’ indicates TRUE and a ‘4’ indicates FALSE). The .csv files were imported into the open source software RStudio (66) for processing. The PupilLeft and PupilRight values were averaged to give a single value for each point, and if only one value could be read (e.g., PupilRight), that value was taken as the value for that data point. A baseline value for each trial was calculated by averaging the last 400 ms of the baseline acclimation period, and subtractive baseline correction was performed to give values of dilation in millimeters. The dilation values were smoothed using a moving Hanning window using the R package zoo (67). Outliers and blinks were not removed for this analysis because a first attempt at doing so computationally using a Hampel filter did not appear to be successful. The code to process data was also written using the R packages plyr (68) and dplyr (69).
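The processing steps in this paragraph can be sketched in Python as follows. This is not the study's R code: the function names, the window length of the smoother, and the sample values are all assumptions made for illustration. The Hann weights are built by hand rather than taken from zoo, and the smoother assumes there are no missing samples in its input.

```python
import math
from statistics import mean


def combine_eyes(left, right, validity_left, validity_right):
    """Average the valid eyes (validity code 0); use one eye if only one is valid."""
    valid = [d for d, v in ((left, validity_left), (right, validity_right)) if v == 0]
    return mean(valid) if valid else None


def baseline_from_window(times_ms, diameters, window_ms=400):
    """Mean diameter over the last `window_ms` of the baseline period."""
    end = times_ms[-1]
    kept = [d for t, d in zip(times_ms, diameters)
            if d is not None and t > end - window_ms]
    return mean(kept)


def hann_weights(n):
    """Hann window with its zero endpoints dropped, normalized to sum to 1."""
    raw = [0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (n + 1)) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def hann_smooth(values, n=5):
    """Centered moving Hann-weighted average (n odd); edge samples stay None."""
    weights = hann_weights(n)
    half = n // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        out[i] = sum(w * values[i - half + k] for k, w in enumerate(weights))
    return out


# Hypothetical baseline period (ms, mm) followed by hypothetical task samples.
baseline_times = [0, 100, 200, 300, 400, 500, 600]
baseline_diams = [3.4, 3.3, 3.2, 3.0, 3.0, 3.0, 3.0]
baseline = baseline_from_window(baseline_times, baseline_diams)

task_diams = [3.1, 3.1, 3.5, 3.1, 3.1]
dilation = [d - baseline for d in task_diams]  # subtractive correction, in mm
smooth = hann_smooth(dilation, n=3)

print(combine_eyes(3.0, 3.2, 0, 0), combine_eyes(3.0, 3.2, 4, 0))
print(baseline, dilation, smooth)
```

Returning None when neither eye is valid keeps unrecorded samples visible instead of silently filling them in; how such gaps should be handled before smoothing is left open here.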

Figure 1. This screenshot shows what the pupil data from the Tobii software look like when exported into Excel. PupilLeft and PupilRight give the diameters of the left and right pupils, respectively.

To figure out where the participants were looking, the parts of the problem that could be mentally stimulating (areas of interest, AOIs) first had to be defined. These AOIs were determined a priori by separating out visual aspects of the problem that might represent a different attentional focus or cognitive task, such as reading the question or studying a graphic (see Figure 2). Then, how a participant’s attention shifted was analyzed. The temporal dimension was broken
up into “segments,” where a segment was defined by the participant’s visual attention staying within a single AOI, either as a fixation or as a saccade. A segment concluded when the participant’s attention shifted to another area of the screen. This was done to more easily examine participants’ patterns of looking. Because our main assumption was that different tasks correspond to different units of cognitive load, we calculated an average dilation during each segment that served as a measure of cognitive load during that segment. After determining when someone looked (segment) and where someone looked (AOI), the mental tasks that the participant was involved in could start to be elucidated. A segment was defined for each time the participant looked at and away from an AOI to track how frequently participants’ attention shifted.

Figure 2. Identification of areas of interest (shown by the boxes) for one of the CCI questions. Adapted with permission from ref (64). Copyright 2002 American Chemical Society.

Segments were computationally determined in R. After baseline correcting and smoothing the data, the data frame was then subsetted so that only relevant information (time, pupil dilation, and the AOI tags) was included. The columns containing information about AOIs were tagged as 0 or 1 to indicate whether, at any given time point, the participant had been looking at an AOI. These 0’s and 1’s were revalued as NA (0) or the AOI name (1) and all of the AOI columns were coalesced into one column to give a new column named “focus.” This column was used to write code to determine “segments”—each time the value (AOI name) in the “focus” column changed, a new segment was defined. The segments increased in value by 1 with each new segment. What the data look like post-processing can be seen in Figure 3.
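The segment logic described above can be sketched in Python as follows. This is not the study's R code: the row layout, AOI column names, and helper names are hypothetical. One assumption is worth flagging: samples that fall in no AOI are given a focus of None and treated as a focus value of their own, so leaving and re-entering an AOI starts new segments.

```python
from statistics import mean


def coalesce_focus(row, aoi_names):
    """Return the name of the first AOI flagged 1 in this row, else None."""
    for name in aoi_names:
        if row[name] == 1:
            return name
    return None


def label_segments(focus):
    """Number segments from 1; a new segment starts whenever focus changes."""
    segments = []
    current = 0
    previous = object()  # sentinel that never equals a real focus value
    for value in focus:
        if value != previous:
            current += 1
            previous = value
        segments.append(current)
    return segments


def mean_by_segment(segments, dilations):
    """Average dilation within each segment, keyed by segment number."""
    grouped = {}
    for seg, d in zip(segments, dilations):
        grouped.setdefault(seg, []).append(d)
    return {seg: mean(vals) for seg, vals in grouped.items()}


# Hypothetical rows: 0/1 AOI tags plus a smoothed dilation value in mm.
aoi_names = ["answers", "equation"]
rows = [
    {"answers": 1, "equation": 0, "smooth": 0.1},
    {"answers": 1, "equation": 0, "smooth": 0.2},
    {"answers": 1, "equation": 0, "smooth": 0.3},
    {"answers": 0, "equation": 1, "smooth": 0.4},
    {"answers": 0, "equation": 1, "smooth": 0.6},
]

focus = [coalesce_focus(r, aoi_names) for r in rows]
segments = label_segments(focus)
means = mean_by_segment(segments, [r["smooth"] for r in rows])

print(focus)
print(segments)
print(means)
```

The per-segment mean corresponds to the average dilation per segment that serves as the measure of cognitive load in this analysis.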

Figure 3. The shape of the data frame post-baseline processing and segment analysis. Only the first 17 rows are shown for the sake of space. At row 14, the “focus” tag changes from answers to equation, which caused the segment to change from 1 to 2. “Dilation” is the raw baseline corrected value in mm, and “smooth” is the value post-Hanning window smoothing. Time is given in ms.

Epoch Analysis

Tasks, or “epochs,” were determined by qualitatively interpreting the quantitative results of fixation patterns on AOIs. An epoch may consist of several segments, or several attentional shifts (19). The gaze videos were imported into NVivo and played back to code for patterns of looking that were not clear from just AOI sequences. For example, replaying the videos was used to differentiate between whether the participant had fixated on a single word or whether they were reading the question within the AOI. Figure 4 (generated in ggplot2 (70, 71)) shows the change in pupillary response for one participant over the course of problem-solving. The points represent the pupillary dilations, which have been smoothed with a moving Hanning window. The shape of each point indicates which epoch was coded for at that particular time. When first reading and then rereading the question, Participant N8 experienced a local peak, which slowly decreased and then peaked again. In the replay video, Participant N8 reread the description several times, suggesting that she needed several rereads to process the information in the question. The second peak in this epoch suggests that
something about understanding the question required some particular mental effort. At around 17 s, she shifted to reading the question, and again there is a local dilatory peak which then tapers off. This can possibly be interpreted as processing the new information she is presented with. She then briefly looks at the equation before comparing the answer choices with the equation. There are small local peaks, but as she nears answering the question, her pupil dilation starts to increase, which suggests that settling on an answer required more working memory than reading the answer choices did. This participant did answer the question correctly (answer choice c).
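The baseline correction and moving Hanning window smoothing that produce a trace like the one in Figure 4 can be sketched as follows. This is an illustrative Python sketch, not the R code used in the study. The window width of 5 samples, the subtractive form of the baseline correction, and the edge handling are all assumptions; the chapter does not specify them.

```python
import math


def hann_weights(width: int) -> list:
    """Symmetric Hann (Hanning) weights with nonzero endpoints, normalized to sum to 1."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    raw = [0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (width + 1)) for i in range(width)]
    total = sum(raw)
    return [w / total for w in raw]


def baseline_correct(dilations: list, baseline: list) -> list:
    """Subtract the mean of the baseline period from every sample (assumed subtractive)."""
    mean_baseline = sum(baseline) / len(baseline)
    return [d - mean_baseline for d in dilations]


def smooth(values: list, width: int = 5) -> list:
    """Centered moving Hann-weighted average.

    Near the edges the window is truncated and the remaining weights are
    renormalized, so the output has the same length as the input.
    """
    weights = hann_weights(width)
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        numerator = 0.0
        denominator = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values):
                numerator += w * values[j]
                denominator += w
        smoothed.append(numerator / denominator)
    return smoothed


# Made-up pupil diameters in mm, for demonstration only.
baseline_period = [3.00, 3.02, 2.98, 3.00]
trial = [3.05, 3.10, 3.40, 3.12, 3.08, 3.06, 3.20]

corrected = baseline_correct(trial, baseline_period)
smoothed = smooth(corrected, width=5)
print([round(v, 3) for v in smoothed])
```

A wider window suppresses more sample-to-sample noise but also flattens short local peaks, so the width interacts directly with how many peaks remain visible for interpretation.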

Figure 4. Pupil dilation plot for participant N8 solving question 8.

By analyzing all of the participant data in this way, the patterns of looking for each participant were identified. Each participant had different patterns in pupillary responses and epochs in problem-solving. This suggests not only that participants’ problem-solving strategies are idiosyncratic (evidenced by the variety of epoch patterns), but also that the cognitive processing and the amount of working memory participants accessed for each task differed. Some participants exhibited several local peaks and valleys (Figure 2), whereas others had very few. Segments often occurred in conjunction with local peaks, suggesting that changing cognitive load may correlate with changes in visual attention. It is possible that participants shifted their attention when they realized that the aspect they were accessing was no longer sufficient to move forward with the problem. That is, processing as a means of moving forward with the problem preceded shifts in attention. However, the reverse is also possible; that is, looking at a new representation on the screen induced cognitive load (the shift in attention preceded processing). What these data suggest is that the amount of working memory and mental resources utilized during a problem changed over the course of the problem.

Challenges in Data Processing

There were several challenges associated with cleaning and processing the data. In Figure 3, it is clear that there are still many outliers in the data. These could be the result of noise or of blinks. There is active debate in the literature as to the best way to remove blink artifacts (52, 72); however, the removal is not algorithmically trivial. When blinks and outliers were removed using Hampel filtering, many visible outliers remained, while other points that seemed to be real data disappeared. There also appear to be several points per time point, even after smoothing the data using a moving Hanning window, because the eye tracker collected 60 data points per second. There were also challenges in pre-processing. Several participants had high validity in their actual trials but blinked during the baseline acclimation period. These trials consequently had to be removed from the data set, because they could not be baseline corrected.

Future Work

This study is a first step toward correlating such changes in cognitive load with distinct problem-solving tasks in an attempt to better understand the cognitive demand of problem-solving on students. Further studies are possible now that this method has been established. For example, following an eye-tracking session, the eye-tracking video showing fixations and saccades could be played for a participant, and an interviewer could ask the participant to narrate the problem-solving process. This narration could later be compared to the epoch and cognitive load analysis, to better understand what difficulties participants may or may not be consciously aware of. Combining gaze tracking and carefully collected pupillometric data enables the identification of parts of a problem or a question that may induce more cognitive load or where students may have different patterns of problem-solving. This can be useful to support instructional design and the teaching of new materials, such as new forms of representations.
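For readers unfamiliar with the Hampel filtering mentioned under Challenges in Data Processing, the following Python sketch shows the general technique: each sample is compared with the median of a sliding window, and samples lying more than a chosen number of scaled median absolute deviations (MAD) from that median are replaced by it. The window half-width of 3 samples and the threshold of 3 are illustrative assumptions, not the settings used in the study, and the sample values are made up.

```python
from statistics import median


def hampel(values: list, half_width: int = 3, n_sigmas: float = 3.0):
    """Return (filtered_values, outlier_indices) using a sliding-window Hampel rule.

    The window is centered on each sample and truncated at the ends of the series.
    Window statistics are computed on the original values, not on already-filtered ones.
    """
    scale = 1.4826  # makes the MAD comparable to a standard deviation for Gaussian data
    filtered = list(values)
    outliers = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        window_median = median(window)
        mad = median([abs(v - window_median) for v in window])
        if abs(values[i] - window_median) > n_sigmas * scale * mad:
            filtered[i] = window_median  # replace the flagged sample with the local median
            outliers.append(i)
    return filtered, outliers


# Made-up pupil diameters in mm; index 4 mimics a blink-like dropout.
trace = [3.00, 3.02, 3.01, 3.03, 1.20, 3.02, 3.04, 3.03, 3.05]
cleaned, outliers = hampel(trace)
print(outliers)
print(cleaned)
```

Both parameters involve a trade-off of the kind reported above: a short window or a low threshold tends to remove genuine rapid changes in dilation, while a long window or a high threshold tends to leave blink artifacts in place.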

Some Last Considerations

Because pupil dilations are so sensitive to noise, and the cognitive load and processing signal is so small compared to the maximum potential extent of dilation (0.5 mm versus over 9 mm), pupillometric data should be collected intentionally, with these sensitivities accounted for. Collecting in situ data can be difficult because of the numerous confounding factors that affect pupillary dilations. However, some studies, especially in usability research, do collect pupillometric data "in the wild," so it can be useful to explore these fields of research for methodological considerations if one is interested in collecting pupil data in more authentic learning environments and interfaces (73, 74). The signal in pupil dilations can be hard to separate from noise, and it is also worth considering that there are many confounding signals: pupil dilation is sensitive to many cognitive processes and emotions, and the task has to be carefully designed to try to isolate the effect of interest. Some confounding factors can be mitigated by careful design of the experiment (for example, by omitting a concurrent think-aloud protocol and avoiding distracting noises in the room), but some result from the fact of doing work with real people and, especially, with real students. Ultimately, despite careful experimental design, it is possible that some participants will be distracted while participating and that the resultant pupillary signal will result from their minds wandering. However, this is a risk taken when trying to correlate any physiological signal with a psychological process. Supplementing physiological measurements with other metrics, such as qualitative data, can validate the interpretation of the results, and having sufficient participants take part in a given study can mitigate noisy or poor data.

In chemistry education studies, the pathways of learning are often much more complex than can be explained by cognitive load theory alone. Prior knowledge (34), individual differences in learning (75), and sociocultural factors (76) may affect how a student solves a given problem or develops expertise. However, physiological factors like pupil dilations can be used to gain insight into the neural processing that occurs when a student solves a problem. In the above example, gaze data and pupil signals were combined to gain deeper insight into what could possibly be happening in the mind of the participant, and next steps involve validating the interpretation of the eye-tracking data with post-interview survey information.

This chapter aims to add another tool to the chemistry education research toolbox by providing examples and a guide for how to use pupillometric data for investigating cognitive load. Eye tracking allows us as researchers to combine where we look with how we look to gain insight into how student cognitive processing occurs.

Acknowledgments

This work was supported by NSF grant DUE-1348722. The author also gratefully acknowledges her graduate research advisor, Hannah Sevian, for support and helpful feedback on this solo project; and Josibel García-Valles, who collected the data this analysis was performed on as part of her undergraduate thesis.

References

1. Havanki, K. L.; VandenPlas, J. R. Eye Tracking Methodology for Chemistry Education Research. In Tools of Chemistry Education Research; Bunce, D. M., Cole, R. S., Eds.; ACS Symposium Series 1166; American Chemical Society: Washington, DC, 2014; pp 191–218.
2. Cullipher, S.; Sevian, H. Atoms versus Bonds: How Students Look at Spectra. J. Chem. Educ. 2015, 92, 1996–2005.
3. Williamson, V. M.; Hegarty, M.; Deslongchamps, G.; Williamson, K. C.; Shultz, M. J. Identifying Student Use of Ball-and-Stick Images versus Electrostatic Potential Map Images via Eye Tracking. J. Chem. Educ. 2013, 90, 159–164.
4. Tang, H.; Kirk, J.; Pienta, N. J. Investigating the Effect of Complexity Factors in Stoichiometry Problems Using Logistic Regression and Eye Tracking. J. Chem. Educ. 2014, 91, 969–975.
5. Lai, M.-L.; Tsai, M.-J.; Yang, F.-Y.; Hsu, C.-Y.; Liu, T.-C.; Lee, S. W.-Y.; Lee, M.-H.; Chiou, G.-L.; Liang, J.-C.; Tsai, C.-C. A Review of Using Eye-Tracking Technology in Exploring Learning from 2000 to 2012. Educ. Res. Rev. 2013, 10, 90–115.
6. Siegle, G. J.; Ichikawa, N.; Steinhauer, S. Blink Before and After You Think: Blinks Occur Prior to and Following Cognitive Load Indexed by Pupillary Responses. Psychophysiology 2008, 45, 679–687.
7. Aarts, H.; Bijleveld, E.; Custers, R.; Dogge, M.; Deelder, M.; Schutter, D.; van Haren, N. E. M. Positive Priming and Intentional Binding: Eye-Blink Rate Predicts Reward Information Effects on the Sense of Agency. Soc. Neurosci. 2012, 7, 105–112.
8. Chermahini, S. A.; Hommel, B. The (b)Link between Creativity and Dopamine: Spontaneous Eye Blink Rates Predict and Dissociate Divergent and Convergent Thinking. Cognition 2010, 115, 458–465.
9. Darwin, C. The Expression of Emotion in Animals and Man; John Murray: London, 1872.
10. Fontana, F. Dei Moti Dell’iride; Stamperia Jacopo Giusti: Lucca, 1765.
11. Loewenfeld, I. E. Mechanisms of Reflex Dilatation of the Pupil. Doc. Ophthalmol. 1958, 12, 185–448.
12. Goldinger, S. D.; Papesh, M. H. Pupil Dilation Reflects the Creation and Retrieval of Memories. Curr. Dir. Psychol. Sci. 2012, 21, 90–95.
13. Hess, E. H.; Polt, J. M. Pupil Size in Relation to Mental Activity during Simple Problem-Solving. Science 1964, 143, 1190–1192.
14. Lubow, R. E.; Fein, O. Pupillary Size in Response to a Visual Guilty Knowledge Test: New Technique for the Detection of Deception. J. Exp. Psychol. Appl. 1996, 2, 164–177.
15. Dionisio, D. P.; Granholm, E.; Hillix, W. A.; Perrine, W. F. Differentiation of Deception Using Pupillary Responses as an Index of Cognitive Processing. Psychophysiology 2001, 38, 205–211.
16. Stanners, R. F.; Coulter, M.; Sweet, A. W.; Murphy, P. The Pupillary Response as an Indicator of Arousal and Cognition. Motiv. Emot. 1979, 3, 319–340.
17. Kang, O. E.; Huffer, K. E.; Wheatley, T. P. Pupil Dilation Dynamics Track Attention to High-Level Information. PLoS One 2014, 9, 1–6.
18. Klingner, J.; Tversky, B.; Hanrahan, P. Effects of Visual and Verbal Presentation on Cognitive Load in Vigilance, Memory, and Arithmetic Tasks. Psychophysiology 2011, 48, 323–332.
19. Klingner, J. Measuring Cognitive Load During Visual Tasks by Combining Pupillometry and Eye Tracking. Ph.D. Dissertation, Department of Computer Science, Stanford University, 2010.
20. Cacioppo, J. T.; Tassinary, L. G. Inferring Psychological Significance from Physiological Signals. Am. Psychol. 1990, 45, 16–28.
21. Beatty, J. Task-Evoked Pupillary Responses, Processing Load, and the Structure of Processing Resources. Psychol. Bull. 1982, 91, 276–292.
22. Beatty, J.; Lucero-Wagoner, B. The Pupillary System. In Handbook of Psychophysiology, 2nd ed.; Cambridge University Press: New York, 2000; pp 142–162.
23. Just, M. A.; Carpenter, P. A. The Intensity Dimension of Thought: Pupillometric Indices of Sentence Processing. Can. J. Exp. Psychol. Can. Psychol. Expérimentale 1993, 47, 310–339.
24. Goldwater, B. C. Psychological Significance of Pupillary Movements. Psychol. Bull. 1972, 77, 340–355.
25. Szulewski, A.; Roth, N.; Howes, D. The Use of Task-Evoked Pupillary Response as an Objective Measure of Cognitive Load in Novices and Trained Physicians: A New Tool for the Assessment of Expertise. Acad. Med. 2015, 90, 981–987.
26. Löwenstein, O. Experimentelle Beiträge Zur Lehre von Den Katatonischen Pupillenveränderungen. Eur. Neurol. 1920, 47, 194–215.
27. Kahneman, D.; Beatty, J. Pupil Diameter and Load on Memory. Science 1966, 154, 1583–1585.
28. Piquado, T.; Isaacowitz, D.; Wingfield, A. Pupillometry as a Measure of Cognitive Effort in Younger and Older Adults. Psychophysiology 2010, 47, 560–569.
29. Hyönä, J.; Tommola, J.; Alaja, A.-M. Pupil Dilation as a Measure of Processing Load in Simultaneous Interpretation and Other Language Tasks. Q. J. Exp. Psychol. Sect. A 1995, 48, 598–612.
30. De Jong, T. Cognitive Load Theory, Educational Research, and Instructional Design: Some Food for Thought. Instr. Sci. 2010, 38, 105–134.
31. Sweller, J.; Ayres, P.; Kalyuga, S. Cognitive Load Theory, Volume 1, Explorations in the Learning Sciences, Instructional Systems and Performance Technologies; Springer: New York, 2011.
32. Sweller, J. Cognitive Load During Problem Solving: Effects on Learning. Cogn. Sci. 1988, 12, 257–285.
33. Seery, M. K.; Donnelly, R. The Implementation of Pre-Lecture Resources to Reduce in-Class Cognitive Load: A Case Study for Higher Education Chemistry. Br. J. Educ. Technol. 2012, 43, 667–677.
34. Cook, M. P. Visual Representations in Science Education: The Influence of Prior Knowledge and Cognitive Load Theory on Instructional Design Principles. Sci. Educ. 2006, 90, 1073–1091.
35. Jarodzka, H.; Holmqvist, K.; Gruber, H. Eye Tracking in Educational Science: Theoretical Frameworks and Research Agendas. J. Eye Mov. Res. 2017, 10, 1–18.
36. Cranford, K. N.; Tiettmeyer, J. M.; Chuprinko, B. C.; Jordan, S.; Grove, N. P. Measuring Load on Working Memory: The Use of Heart Rate as a Means of Measuring Chemistry Students’ Cognitive Load. J. Chem. Educ. 2014, 91, 641–647.
37. Tiettmeyer, J. M.; Coleman, A. F.; Balok, R. S.; Gampp, T. W.; Duffy, P. L.; Mazzarone, K. M.; Grove, N. P. Unraveling the Complexities: An Investigation of the Factors That Induce Load in Chemistry Students Constructing Lewis Structures. J. Chem. Educ. 2017, 94, 282–288.
38. Peterson, J.; Pardos, Z.; Rau, M.; Swigart, A.; Gerber, C.; McKinsey, J. Understanding Student Success in Chemistry Using Gaze Tracking and Pupillometry. In International Conference on Artificial Intelligence in Education; Springer, 2015; pp 358–366.
39. Szulewski, A.; Kelton, D.; Howes, D. Pupillometry as a Tool to Study Expertise in Medicine. Frontline Learn. Res. 2017, 5, 55–65.
40. Foroughi, C. K.; Sibley, C.; Coyne, J. T. Pupil Size as a Measure of Within-task Learning. Psychophysiology 2017, 54, 1436–1443.
41. Bhattacharyya, G.; Bodner, G. M. “It Gets Me to the Product”: How Students Propose Organic Mechanisms. J. Chem. Educ. 2005, 82, 1402–1407.
42. Holmqvist, K.; Nyström, M.; Andersson, R.; Dewhurst, R.; Jarodzka, H.; Van de Weijer, J. Eye Tracking: A Comprehensive Guide to Methods and Measures; Oxford University Press: Oxford, 2011.
43. Klingner, J.; Kumar, R.; Hanrahan, P. Measuring the Task-Evoked Pupillary Response with a Remote Eye Tracker. In Proceedings of the 2008 Symposium on Eye Tracking Research & Applications; ETRA ’08; ACM: New York, NY, USA, 2008; pp 69–72.
44. Tobii Studio: User’s Manual, 3.4.5.; Tobii AB: Stockholm, Sweden, 2016.
45. Tsukahara, J. S.; Harrison, T. L.; Engle, R. W. The Relationship between Baseline Pupil Size and Intelligence. Cognit. Psychol. 2016, 91, 109–123.
46. EyeLink 1000 User Manual, 1.5.0.; SR Research Ltd.: Mississauga, Canada, 2005.
47. Van Gerven, P. W. M.; Paas, F.; Van Merriënboer, J. J. G.; Schmidt, H. G. Memory Load and the Cognitive Pupillary Response in Aging. Psychophysiology 2004, 41, 167–174.
48. Bouma, H.; Baghuis, L. C. J. Hippus of the Pupil: Periods of Slow Oscillations of Unknown Origin. Vision Res. 1971, 11, 1345–1351.
49. Franklin, M. S.; Broadway, J. M.; Mrazek, M. D.; Smallwood, J.; Schooler, J. W. Window to the Wandering Mind: Pupillometry of Spontaneous Thought While Reading. Q. J. Exp. Psychol. 2013, 66, 2289–2294.
50. Zekveld, A. A.; Kramer, S. E.; Festen, J. M. Pupil Response as an Indication of Effortful Listening: The Influence of Sentence Intelligibility. Ear Hear. 2010, 31, 480–490.
51. Koelewijn, T.; de Kluiver, H.; Shinn-Cunningham, B. G.; Zekveld, A. A.; Kramer, S. E. The Pupil Response Reveals Increased Listening Effort When It Is Difficult to Focus Attention. Hear. Res. 2015, 323, 81–90.
52. Mathôt, S.; Fabius, J.; Van Heusden, E.; Van der Stigchel, S. Safe and Sensible Preprocessing and Baseline Correction of Pupil-Size Data. Behav. Res. Methods 2018, 50, 94–106.
53. van Rijn, H.; Dalenberg, J. R.; Borst, J. P.; Sprenger, S. A. Pupil Dilation Co-Varies with Memory Strength of Individual Traces in a Delayed Response Paired-Associate Task. PLoS One 2012, 7, 1–8.
54. Zénon, A.; Sidibé, M.; Olivier, E. Pupil Size Variations Correlate with Physical Effort Perception. Front. Behav. Neurosci. 2014, 8, 1–8.
55. Baayen, R. H.; Davidson, D. J.; Bates, D. M. Mixed-Effects Modeling with Crossed Random Effects for Subjects and Items. J. Mem. Lang. 2008, 59, 390–412.
56. Jackson, I.; Sirois, S. Infant Cognition: Going Full Factorial with Pupil Dilation. Dev. Sci. 2009, 12, 670–679.
57. Lemercier, A.; Guillot, G.; Courcoux, P.; Garrel, C.; Baccino, T.; Schlich, P. Pupillometry of Taste: Methodological Guide–from Acquisition to Data Processing–and Toolbox for MATLAB. Quant. Methods Psychol. 2014, 10, 179–195.
58. Cornelissen, F. W.; Peters, E. M.; Palmer, J. The Eyelink Toolbox: Eye Tracking with MATLAB and the Psychophysics Toolbox. Behav. Res. Methods Instrum. Comput. 2002, 34, 613–617.
59. Preuschoff, K.; ’t Hart, B. M.; Einhauser, W. Pupil Dilation Signals Surprise: Evidence for Noradrenaline’s Role in Decision Making. Front. Neurosci. 2011, 5, 1–12.
60. Einhauser, W.; Koch, C.; Carter, O. Pupil Dilation Betrays the Timing of Decisions. Front. Hum. Neurosci. 2010, 4, 1–9.
61. Wierda, S. M.; van Rijn, H.; Taatgen, N. A.; Martens, S. Pupil Dilation Deconvolution Reveals the Dynamics of Attention at High Temporal Resolution. Proc. Natl. Acad. Sci. 2012, 109, 8456–8460.
62. Irons, J. L.; Jeon, M.; Leber, A. B. Pre-Stimulus Pupil Dilation and the Preparatory Control of Attention. PLoS One 2017, 12, 1–21.
63. Karch, J. M.; Garcia Valles, J.; Sevian, H. Investigating Cognitive Load via Fixation-Aligned Pupillary Responses, 253rd National Meeting of the American Chemical Society, San Francisco, CA, April 2–6, 2017; CHED 54.
64. Mulford, D. R.; Robinson, W. R. An Inventory for Alternate Conceptions among First-Semester General Chemistry Students. J. Chem. Educ. 2002, 79, 739–744.
65. Barbera, J. A Psychometric Analysis of the Chemical Concepts Inventory. J. Chem. Educ. 2013, 90, 546–553.
66. RStudio Team. RStudio: Integrated Development Environment for R, Version 1.1.453; RStudio, Inc.: Boston, MA, 2016.
67. Zeileis, A.; Grothendieck, G. zoo: S3 Infrastructure for Regular and Irregular Time Series. Journal of Statistical Software 2005, 14, 1–27.
68. Wickham, H. The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software 2011, 40, 1–29.
69. Wickham, H.; François, R.; Henry, L.; Müller, K. dplyr: A Grammar of Data Manipulation, R package version 0.7.5., 2018.
70. Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer-Verlag: New York, 2009.
71. Arnold, J. B. ggthemes: Extra Themes, Scales and Geoms for ‘ggplot2’, R package version 3.5.0., 2018.
72. Hershman, R.; Henik, A.; Cohen, N. A Novel Blink Detection Method Based on Pupillometry Noise. Behav. Res. Methods 2018, 50, 107–114.
73. Schwalm, M.; Keinath, A.; Zimmer, H. D. Pupillometry as a Method for Measuring Mental Workload within a Simulated Driving Task. In Human Factors for Assistance and Automation; de Waard, D., Flemisch, F. O., Lorenz, B., Oberheid, H., Brookhuis, K. A., Eds.; Shaker Publishing: Maastricht, Netherlands, 2008; pp 1–13.
74. Pomplun, M.; Sunkara, S. Pupil Dilation as an Indicator of Cognitive Workload in Human-Computer Interaction. In Human-Centred Computing: Cognitive, Social, and Ergonomic Aspects: Proceedings of the 10th International Conference on Human-Computer Interactions, Crete, Greece, 2003; Harris, D., Duffy, V., Smith, M., Stephanidis, C., Eds.; Lawrence Erlbaum Associates, Inc.: Hillsdale, NJ, 2003; pp 542–546.
75. Garrett, R. M. Problem-Solving in Science Education. Stud. Sci. Educ. 1986, 13, 70–95.
76. Atwater, M. M. Social Constructivism: Infusion into the Multicultural Science Education Research Agenda. J. Res. Sci. Teach. Off. J. Natl. Assoc. Res. Sci. Teach. 1996, 33, 821–837.