Article pubs.acs.org/jchemeduc

Cite This: J. Chem. Educ. XXXX, XXX, XXX−XXX

Looking into the Black Box: Using Gaze and Pupillometric Data to Probe How Cognitive Load Changes with Mental Tasks

Jessica M. Karch,* Josibel C. García Valles, and Hannah Sevian

Department of Chemistry, University of Massachusetts Boston, Boston, Massachusetts 02125, United States



ABSTRACT: When characterizing students’ item-solving strategies, methods such as interviews and think-aloud protocols are often used. However, these measures provide limited information about sub- or preconscious signals and cognitive processes that also affect students’ item-solving strategies and abilities. A growing number of researchers in chemical education research have begun to address this gap by using physiological measurements to assess cognitive load (e.g., heart rate and EEG) and to look at item-solving strategies (e.g., via eye tracking). One physiological measure of cognitive load that has been well documented in the psychology literature is pupil dilation. In this study, two streams of eye-tracking data (gaze and pupillometric data) were combined to reveal information about what mental tasks general chemistry students were engaged in as they answered Chemical Concepts Inventory (CCI) questions (gaze stream) and how those mental tasks elicited changing levels of cognitive load (pupillometric stream). We found that, for complex multiple-choice tasks, pupil dilation fluctuated throughout the course of solving the item. For a more straightforward true/false task, there was a marked difference in pupil signal between participants who answered the question correctly and those who answered it incorrectly. Those who answered correctly had linearly increasing pupillary signals, whereas those who answered incorrectly had pupil signals that more closely resembled those observed during the multiple-choice tasks. Interpretations of these differences are supported using retrospective interviews and previously published literature about CCI items.

KEYWORDS: First-Year Undergraduate/General, Chemical Education Research, Problem Solving/Decision Making, Learning Theories, Reactions

FEATURE: Chemical Education Research



Journal of Chemical Education
DOI: 10.1021/acs.jchemed.9b00014 J. Chem. Educ. XXXX, XXX, XXX−XXX
Received: January 6, 2019
Revised: April 5, 2019
© XXXX American Chemical Society and Division of Chemical Education, Inc.

INTRODUCTION

When presented with multiple pieces of information to solve a chemistry problem, students must pick and choose which pieces are relevant and productive for solving the problem. When the information is presented in multiple forms of representation, such as molecular icons, chemical reactions, and equations, students must also be able to translate across different types of information.1−3 However, how students do this and how they construct their mental models of a problem remains a “black box”. Multiple experimental protocols exist to illuminate this black box, including think-aloud and retrospective interviews. Chemistry education researchers are increasingly finding eye tracking to be a powerful approach to investigating the black box in chemistry problem solving.4−7 These methods give researchers insight into which input factors influence students’ performance on problems. However, there is often not enough information to fully understand what happens between reading the problem and producing a response. Typically, students are asked to describe their problem-solving process through cognitive or think-aloud interviews. This requires students to verbalize, which may change what they are thinking, and it fosters an interaction between the interviewer and the student that may change the signal.8,9 These interview techniques are very useful for uncovering conscious thought processes and for allowing researchers to deeply probe students’ own conceptions of what they know. However, they also lose any pre- or subconscious signals, such as processing intensity, that may be useful for understanding student cognition. Psychophysiological responses can be measured to fill this gap. One reliable reporter variable of changes in cognitive intensity is pupil dilation.10

Pupillometry

Because pupils change in response to both cognitive and environmental factors,11 pupillometry (the measurement of change in pupil diameter) has been used under controlled environmental conditions to investigate emotional12,13 and cognitive14,15 processes. Just and Carpenter describe pupil dilation as a measure of the “intensity of... thought” and contend that pupils dilate more during more intense cognitive processing.10 The average pupil size is around 3 mm, although changes in light can induce dilations up to 9 mm.11,16 Pupillary changes moderated by attentional and emotional effort, on the other hand, result in lower-frequency, lower-magnitude dilations of up to 0.5 mm.11,17 In light-controlled environments, these dilations can be distinguished from those that result from luminance changes.18,19 This is due in part to the fact that the two sources of pupillary change have different physiological pathways: changes due to light and to the nearness of visual stimuli are controlled by constrictor muscles, whereas dilations from cognitive effort are moderated by the central nervous system and are controlled by dilator muscles.11 Although the precise biological mechanism that links dilator muscles and cognitive effort is unclear, psychologists have shown a correlation among three variables: activation of the locus coeruleus (the part of the brain activated by stress, which plays a role in retrieving memories),20 secretion of the neurotransmitter norepinephrine, which is produced in the locus coeruleus, and pupil dilation.11,16,21,22 Measuring changes in pupil dilation has been used to investigate higher-order cognitive processing in a variety of tasks, including the emotions evoked in food preference,23 spontaneous thought while reading,24 deception,25 and mental workload during a driving simulation.26

Pupillometry has also been used to study learning.27−31 Szulewski and collaborators found that medical students (“novices”) experienced larger pupil dilations than medical residents (“experts”) when answering conceptual multiple-choice questions.27 In a follow-up review paper, the authors recommended using pupillometry to study the development of expertise.28 Foroughi and collaborators found that decreasing pupil dilation during the exploration of a virtual space evidenced within-task learning.29 In chemistry education, Petersen and collaborators combined gaze tracking and pupillometry to try to predict student success as students used an online learning environment, ChemTutor. They found that the pupil signals were only partially predictive of student success and conjectured that their measurements may not have been fine-grained enough.30 One solution to this grain-size problem may be tracking temporal information,18,19,32 rather than taking a global average.

Cognitive Load Theory

Cognitive Load Theory (CLT) is a framework developed to describe the architecture of human cognition, specifically by conceptualizing the relationship between long-term information storage and working memory. Built around information processing models from cognitive science, CLT was developed as a framework to guide instructional design based on an evolutionary biological model of how human cognition is organized. As an instructional design theory, CLT operationalizes the demand on working memory as “cognitive load”.33 Knowing about cognitive load is important to educational researchers because working memory capacity is limited. When a student experiences cognitive overload, the capacity for cognitive processing is exhausted, which may hinder learning. There are several sources and types of cognitive load: load that results from the design of a stimulus (extrinsic), load from the difficulty of a task for the person engaging with a stimulus (intrinsic), and load associated with the processes involved in learning (germane). The instructional design theory based on CLT suggests that the most effective way to design an instructional task is to reduce the extrinsic load, manage the intrinsic load, and maximize the learning gains associated with germane cognitive load.34 Although these types of cognitive load cannot easily be distinguished from one another through psychophysiological reporter variables such as pupil dilation,35 manipulating these three sources of load can help determine when and how cognitive load occurs. While a precise distinction among types of cognitive load has been a source of some criticism,36 the underlying principle that working memory is limited is generally accepted.35−37 Cook used this foundation to consider how individual differences, such as prior knowledge and expertise, interact with cognitive architecture to affect how students draw upon visual microscopic and macroscopic representations.38

Several physiological measures can be and have been used to investigate cognitive load in education research, including heart rate,39,40 electroencephalography,41,42 and other eye-tracking measures such as fixation duration.4 Using pupillometric data as a measure of cognitive load confers the benefit that the data are already time-synced with a second data stream, gaze data, because both are collected by the same instrument. Gaze data have also been shown to be very useful for investigating students’ item-solving approaches.43−45 This study aims to coordinate these two streams of data, gaze paths and pupil dilation, to investigate how students’ looking reveals information about their item-solving strategies for conceptual chemistry multiple-choice questions, and how students’ pupil dilation reveals information about the cognitive intensity that their item-solving strategies elicit.

Research Questions

Through the lens of Cognitive Load Theory, pupil dilation is a promising reporter variable that can help illuminate the black box of mental processing that occurs between reading a problem and producing a response. Pupillometry, however, carries complications when tasks are complex and occur over time frames longer than a single mental operation. After reviewing how we addressed these complications, based on work we have reported elsewhere,31 we show how we combine pupillometric and gaze data to examine how students solve chemistry multiple-choice questions of varying complexity and difficulty. Two research questions guide our study:

(1) How can gaze data and pupil measures be combined to reveal student item-solving strategies?
(2) How does the cognitive load experienced during item solving change with the different mental tasks students engage in?



METHODS

Setting and Participants

This study included participants enrolled in General Chemistry at a medium-sized, nontraditional university in the Northeastern United States. Participants were volunteers and were offered a choice of a $10 gift card or five extra credit points in their General Chemistry course. Data were collected before the beginning of first-semester General Chemistry. In total, 22 individuals participated in the study, and of those 22, 12 were selected for this analysis on the basis of the quality of their baseline acclimation data and a sample percent threshold of 70%. That is, participants were excluded if there were missing values during the baseline acclimation phase (due to, e.g., blinks), because the pupillometric analysis depended on reliable baseline data to calculate dilations. Trials were also excluded if less than 70% of the data points recorded by the eye tracker were valid. Trials may fall below this threshold for a variety of reasons, including the participant’s eye shape, whether the participant wore glasses, or whether the participant looked away from the trackable area.46 This trial selection was done a priori as the first step of data preprocessing. Participants were assigned code numbers between 1 and 22.

Chemical Concepts Inventory

Participants were tasked with answering eight questions from the Chemical Concepts Inventory (CCI).47 These eight questions were the first nine in the instrument, excluding question 8 (a follow-up to question 7); the original CCI item 9 will be referred to as question 8 in this paper. The CCI is a 22-item instrument designed to measure first-semester general chemistry students’ alternative conceptions about traditional general chemistry topics, such as conservation of atoms and phase changes. The CCI has been demonstrated to have good test−retest reproducibility.48 The eight questions were presented to participants in a randomized order to mitigate a systematic item order effect.49 These questions represented a range of validated difficulties and question types. Of the eight questions, six had five answer choices, one had four answer choices, and one was a true/false question.47

Eye-Tracking Apparatus

Data were collected using a Tobii X2-60 eye tracker with Tobii Pro Studio 3.2.3. The eye tracker was operated via a Dell Precision M6800 with a 17.3 in. screen, 1600 × 900 pixel resolution, and the Windows 7 operating system. The Tobii X2-60 has a sampling frequency of 60 ± 0.1 Hz. The Tobii system outputs pupil-size estimates obtained by applying a corrective algorithm to the diameter of the pupil image on the screen.50 The Tobii I-VT fixation filter was used.

Method Validation

To ensure the validity of the instrument for collecting pupillometric data, the results of previously published vigilance51 and memorization15,51,52 tasks were replicated before participants answered the CCI questions. These tasks consisted of participants looking for errors in a sequence of digits (vigilance task) and repeating back increasingly long sequences of digits (memorization task). As the tasks increased in difficulty, a corresponding increase in the pupillary response was measured, until a plateau was reached. Because these tasks are well documented to produce pupil dilations that increase with cognitive load, the successful replication suggested that the experimental setup was appropriate for collecting pupillometric data during more complex tasks.

Data Collection

Data collection consisted of two stages: a baseline acclimation period and a CCI task period. To establish a baseline pupil diameter, each CCI question was preceded by a 2 s acclimation period, during which participants were asked to focus on an “X” in the middle of the screen. Participants advanced to the next question at their own pace after recording their answer by pressing the letter key associated with their answer choice. The interviewer (J.C.G.V.) determined whether the calibration was sufficient to proceed with the experiment. All of the stimuli (CCI questions and the neutral “X” stimulus) were colored dark gray to match the color of the wall behind the eye-tracking apparatus, to minimize dilations caused by luminance differences. After the eye-tracking tasks were completed, retrospective think-aloud interviews about item-solving strategies were conducted on three questions. The interviews were used post hoc to triangulate interpretations of the mixed-methods analysis. Four of the questions presented to participants were selected for a mixed-methods analysis.31 Two questions (the most and least difficult, questions 5 and 7 in the original CCI) were selected on the basis of their percent correct, which was taken as a proxy for difficulty (i.e., a lower percent correct corresponded to a more difficult question and vice versa).48 Two additional questions (CCI questions 6 and 9) were selected for analysis on the basis of their visual structure (see Figure 1 for an exemplar). These questions were similar in structure to the most difficult question, in that they contained a nontextual symbolic element, a line of description describing that element, and a question placed below the symbolic element and above the answer choices. The least difficult question of the four (question 7) was a true/false question.

Figure 1. CCI question 6, one of the questions selected for the mixed-methods analysis. Each of the multiple-choice questions selected for analysis had a symbolic element (the diagram), a line of description (“questionA”), and a question prompt (“questionB”), as well as five multiple-choice answers. The stimulus background is colored dark gray to match the color of the physical background, reducing dilation induced by luminance change. The areas of interest (AOIs) are distinguished with colored boxes, and the AOI labels used by the eye tracker are overlaid in bold on the corresponding AOI box. Adapted with permission from ref 47. Copyright 2002 American Chemical Society.

Data Processing

All quantitative data analysis and processing were done in RStudio version 1.0.143.53 Several R packages were used to aid analysis.54−57 First, the pupil diameters of the right and left eyes were averaged to give one value. When only one eye could be detected by the eye tracker, that eye’s value was taken as the diameter. A baseline value was calculated for each trial by averaging the pupil diameters over the last 400 ms of the acclimation period, and the averaged pupil diameters were transformed into dilations by subtracting the baseline value (see Supporting Information). Blinks were identified by their rapidly decreasing size,58 very brief duration, and characteristic frequency (around 17 blinks/min).59 Blinks were removed by first calculating a velocity (v) profile for each point using eq 1, where d1 and d2 are pupil diameters at times t1 and t2, and then removing points on the basis of the median absolute deviation (see refs 60 and 61 for more details, and the Supporting Information for R code).60,61

$$ v = \frac{d_1 - d_2}{t_1 - t_2} \tag{1} $$

alpha value of 0.617 (calculated with the R package ‘irr’66). The raters discussed how to refine the codes to reach consensus. After this process, the first author revised the codebook to be consistent with the results of the discussion. A second round of IRR with the revised codebook (see Supporting Information), with the two external raters coding four different cases each, yielded a Krippendorff’s alpha value of 0.912 after discussion (an additional 16.7% of the data).
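The preprocessing steps described in the Methods can be sketched in Python. They are averaging the two eyes, taking a baseline from the last 400 ms of the 2 s acclimation period, excluding trials with fewer than 70% valid samples, and removing blink artifacts with the speed in eq 1 and a median-absolute-deviation cutoff. This is a minimal illustration under stated assumptions, not the authors’ R code (which is in their Supporting Information). The cutoff multiplier `MAD_MULTIPLIER = 3.0`, the convention of taking the larger speed to either neighbouring sample, and all function names are hypothetical choices made here.

```python
from statistics import mean, median
from typing import List, Optional, Tuple

# Values from the text; MAD_MULTIPLIER is an ASSUMPTION, not reported in the text.
SAMPLE_RATE_HZ = 60        # Tobii X2-60 nominal sampling rate
BASELINE_WINDOW_MS = 400   # last 400 ms of the 2 s acclimation period
VALID_FRACTION = 0.70      # trials need at least 70% valid samples
MAD_MULTIPLIER = 3.0       # ASSUMED: speeds above median + n * MAD are removed

# One sample: (time in ms, left diameter in mm, right diameter in mm).
# None marks an eye the tracker could not detect. Times must be strictly increasing.
Sample = Tuple[float, Optional[float], Optional[float]]


def combine_eyes(left: Optional[float], right: Optional[float]) -> Optional[float]:
    """Average both eyes; use the single detected eye if only one is available."""
    if left is not None and right is not None:
        return (left + right) / 2
    return left if left is not None else right


def trial_is_valid(diams: List[Optional[float]], threshold: float = VALID_FRACTION) -> bool:
    """True if at least `threshold` of the samples carry a usable diameter."""
    if not diams:
        return False
    return sum(d is not None for d in diams) / len(diams) >= threshold


def baseline(acclimation: List[Optional[float]]) -> float:
    """Mean diameter over the final 400 ms (24 samples at 60 Hz) of acclimation."""
    n = round(SAMPLE_RATE_HZ * BASELINE_WINDOW_MS / 1000)
    window = acclimation[-n:]
    if len(window) < n or any(d is None for d in window):
        # The study excluded such participants; here we simply signal the problem.
        raise ValueError("incomplete baseline window")
    return mean(window)


def dilation_speeds(times: List[float], diams: List[Optional[float]]) -> List[Optional[float]]:
    """Apply eq 1, v = (d1 - d2) / (t1 - t2), against each valid neighbouring sample.

    Each sample keeps the larger absolute speed to its previous or next valid
    neighbour (an ASSUMED convention).
    """
    speeds: List[Optional[float]] = [None] * len(diams)
    for i, d in enumerate(diams):
        if d is None:
            continue
        candidates = [
            abs((d - diams[j]) / (times[i] - times[j]))
            for j in (i - 1, i + 1)
            if 0 <= j < len(diams) and diams[j] is not None
        ]
        if candidates:
            speeds[i] = max(candidates)
    return speeds


def remove_speed_outliers(times: List[float], diams: List[Optional[float]],
                          n: float = MAD_MULTIPLIER) -> List[Optional[float]]:
    """Blank out samples whose speed exceeds median + n * MAD (median absolute deviation)."""
    speeds = dilation_speeds(times, diams)
    observed = [s for s in speeds if s is not None]
    if not observed:
        return list(diams)
    med = median(observed)
    mad = median([abs(s - med) for s in observed])
    cutoff = med + n * mad
    return [None if (s is not None and s > cutoff) else d for d, s in zip(diams, speeds)]


def preprocess_trial(acclimation: List[Sample],
                     task: List[Sample]) -> Optional[List[Optional[float]]]:
    """Return baseline-corrected dilations (mm), or None if the trial is excluded."""
    times = [t for t, _, _ in task]
    diams = [combine_eyes(left, right) for _, left, right in task]
    if not trial_is_valid(diams):
        return None
    base = baseline([combine_eyes(left, right) for _, left, right in acclimation])
    # Subtracting a constant baseline leaves speeds unchanged, so filtering first is equivalent.
    cleaned = remove_speed_outliers(times, diams)
    return [None if d is None else d - base for d in cleaned]
```

A blink-like dip in an otherwise flat trace produces large speeds for the dip and its two neighbours, so those samples are blanked. The rest of the trace becomes dilations relative to the acclimation baseline.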

Multiple Case Analysis Approach

To align pupil measurements with gaze data, areas of interest (AOIs) were defined in the questions. The AOIs were determined a priori by identifying major visual aspects of the problem. For example, for CCI Q6, the defined AOIs were the description of the diagram (“questionA”), the diagram, the question (“questionB”), and the answers (see Figure 1). The answer choices were grouped as one AOI, because the purpose of this study was not to investigate how students navigated among different answer choices, but rather to identify trends associated with larger grain-sized tasks.62
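Aligning gaze samples with AOIs, as described above, amounts to a point-in-rectangle lookup. The sketch below assumes rectangular AOIs in screen pixels on the 1600 × 900 display. The coordinates are hypothetical placeholders, not the actual boxes in Figure 1; only the AOI names mirror the labels used for CCI Q6. Consecutive samples in the same AOI are collapsed into a visit sequence, the kind of gaze record an analyst might review when coding epochs. The epochs in this study were coded qualitatively, not by this code.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AOI:
    """Axis-aligned rectangular area of interest in screen pixels (y grows downward)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


# HYPOTHETICAL layout on a 1600 x 900 px screen; the real AOI boxes are shown in Figure 1.
AOIS = [
    AOI("questionA", 200, 40, 1400, 100),   # line of description
    AOI("diagram", 400, 110, 1200, 400),    # symbolic element
    AOI("questionB", 200, 410, 1400, 470),  # question prompt
    AOI("answers", 200, 480, 1400, 880),    # all answer choices grouped as one AOI
]


def label_gaze(x: float, y: float, aois: List[AOI] = AOIS) -> Optional[str]:
    """Return the name of the AOI containing the gaze point, or None if outside all AOIs."""
    for aoi in aois:
        if aoi.contains(x, y):
            return aoi.name
    return None


def aoi_visits(gaze: List[Tuple[Optional[float], Optional[float]]]) -> List[str]:
    """Collapse consecutive samples in the same AOI into an ordered list of visits.

    Samples that are missing or fall outside every AOI break a run, so looking
    away and coming back counts as a new visit.
    """
    labels = [
        label_gaze(x, y) if x is not None and y is not None else None
        for x, y in gaze
    ]
    return [name for name, _ in groupby(labels) if name is not None]
```

Grouping all answer choices into a single `answers` box mirrors the larger grain size chosen in the study. Finer, per-choice AOIs would only require additional `AOI` entries.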

After epochs were assigned, an embedded multiple case analysis approach67 was taken to interpret the results. In embedded multiple case studies, subunits are identified in each case, such that comparisons can be made within each case and across cases.68 In this study, each participant was identified as a case, and each item was identified as a subunit. This facilitated an assessment of the epochs approach’s power to reveal idiosyncratic trends in an individual’s item-solving (RQ1), while the cross-case analysis revealed more general information about cognitive load (RQ2). As such, trends in epochs and pupillary dilation patterns were examined in two dimensions: patterns within all items a single participant solved (within-case analysis), and patterns across all of the solutions to each item (cross-case analysis). Within-participant patterns were analyzed to investigate whether participants had idiosyncratic approaches to conceptual chemistry questions (RQ1). Cross-participant and within-item patterns were analyzed to see if certain items elicited common cognitive load responses (RQ1 and RQ2). Across-participant and across-item patterns were analyzed to look at the broad trends in cognitive load responses to CCI items (RQ2) (see Figure 2 for a schematic of this analytic approach). To facilitate the multiple case approach, an analytic framework from decision-making was used to compare across cases more rigorously. This framework focuses on two important phases in multiple-choice item solving, in which

Epochs Coding

A qualitative analysis was performed to descriptively capture the processes participants may have been engaged in while they answered the multiple-choice items.31 Gaze videos were replayed in the qualitative analysis software NVivo to identify distinct item-solving tasks. These tasks were named “epochs”, after the term used in fixation-aligned pupillary response averaging for short tasks consisting of several gaze events.51,63 Whereas epochs in fixation-aligned pupillary response averaging are identified on the basis of similar scanpaths, epochs here were identified through qualitative analysis. Each epoch consists of fixations within a single AOI or within a sequence of AOIs. An epoch concluded when the participant’s attention shifted to a different task, operationalized as the participant no longer returning to a visual element on which he or she had previously been focused. For example, if the participant read a line of text and, while reading it, looked at the visual media element and then returned to reading the text, this was considered one epoch (“reads the description while looking at the diagram”). This epoch would also be assigned if the participant returned to the diagram several times while reading the line of text. However, if the participant read the line of text completely and then shifted to looking at the diagram, this would be considered two epochs (“reads the description” and “looks at the diagram”). The intent behind this distinction is to capture different mental tasks that may have different purposes. In the first case (“reads the description while looking at the diagram”), the participant may have the visual representation in mind while processing the text, whereas in the second case the participant does not look at the representation until after reading the text completely. An “initiation” epoch was coded for all cases.
“Initiation” was an artifact of the baseline acclimation slide that immediately preceded the CCI questions and generally consisted of a fixation of