The goal of the present study was to investigate the attentional modulation of acoustic cue weightings in the perception of an L2 phonemic contrast. One important cognitive factor in understanding speech sounds is the amount of attention given to speech. This is because environmental noise and concurrent daily tasks vary listeners’ attention to speech signals and may thereby interfere with stable phonetic percepts (e.g., Francis & Nusbaum, 2009; Gordon et al., 1993; Mattys, 2004; Mattys & Wiget, 2011). These lines of research examined whether and how individuals’ cognitive characteristics are related to their phonetic categorization, or investigated how listeners adjust their perceptual strategies under intentionally manipulated attentional demands. As one experimental paradigm, the dual-task design contrasts a distracting condition with a non-distracting condition for auditory categorization by increasing task demands. In this paradigm, listeners are instructed to identify a phonetic category while performing another task such as solving arithmetic problems (e.g., Gordon et al., 1993) or recalling letters (e.g., Lee, 2014). Performing another task during speech perception increases working memory load, distracting listeners from attending to the auditory stimuli.
Previous studies of L1 speech perception showed that increased task demands affected listeners’ phonetic categorization in such a way that the listeners re-set their perceptual cue weighting strategies (e.g., Francis & Nusbaum 2000; Gordon et al., 1993; Kong & Lee, 2018). Gordon et al. (1993) examined whether and how the relative importance of multiple acoustic cues is affected during phonetic categorization in two attentional environments, namely distracting and non-distracting conditions, specifically testing attentional modulation in the perception of the English voiced-voiceless stops. While English voiced and voiceless stops are contrasted primarily by Voice Onset Time (VOT: Lisker & Abramson, 1964), F0 at the onset of the following vowel is also an acoustic correlate of the voicing distinction and serves as a secondary cue; F0 is higher after a voiceless stop. Gordon et al. (1993) designed a distractor task in which listeners were instructed to focus on an arithmetic calculation. In this condition, three two-digit numbers were presented for calculation during stop voicing identification, which made English listeners pay less attention to the auditory stimuli. Their finding was that the relative importance of the two acoustic cues differed by attentional condition. While listeners relied mainly on the primary VOT cue when attending solely to the auditory identification task in the non-distracting condition, the importance of the secondary F0 cue increased when the arithmetic task served as a distractor. Gordon et al. (1993) argued that the secondary cue became more influential because it was not in strong competition with VOT, which played a role as the primary cue in that context (Gordon et al., 1993: 12).
The attentional modulation observed in Gordon et al. (1993) was also attested in Kong & Lee (2018) for Korean stop perception. Kong & Lee (2018) examined how Korean-speaking listeners weighted the multiple acoustic cues for a stop laryngeal contrast under different amounts of attention to the perception task. While the three-way laryngeal contrast of Korean stops [i.e., fortis (/p’, t’, k’/), lenis (/p, t, k/), aspirated (/pʰ, tʰ, kʰ/) stops] uses VOT and F0, it differs from the English stop contrast in that cue primacy between VOT and F0 varies depending on which pair of the three stops is compared. According to previous production studies, short-lag VOT primarily distinguishes the fortis from the other two stops, and lower F0 primarily differentiates the lenis from the other two. The cue primacy is less clear between the lenis and aspirated stops because of a sound change in progress in which the importance of VOT has decreased while that of F0 has increased (e.g., Cho et al., 2002; Kang, 2014; Kim et al., 2002; Lee et al., 2020). Employing the dual-task paradigm, Kong & Lee (2018) not only verified the attentional modulation of multiple cues, but also clarified the cue primacy between the lenis and aspirated stops by showing the reduced reliance on the primary VOT cue for the fortis-aspirated stop pair and for the lenis-aspirated stop pair.
More relevant to the current research, there is also experimental evidence that attention plays a role in learners’ auditory processing of L2 speech sounds (e.g., Asano, 2017; Lee, 2014; Mora & Darcy, 2023; Mora & Mora-Plaza, 2019). Mora & Darcy (2023) examined how individual learners’ attention control ability influences their phonological processing by testing Spanish L2 learners of English and English L2 learners of Spanish. Administering a speech-based distractor task (auditory judgement of L2 nonwords), Mora & Darcy (2023) showed that the learners with better attentional control were better able to differentiate target L2 vowels in production, and proposed attentional control as a factor explaining inter-learner variability in L2 phonological learning. Similarly, Asano (2018) examined how attentional demands affected the perception of L2 prosody by testing the perception of the Japanese vowel length contrast with three listener groups: German L2 learners of Japanese, German non-learners, and native Japanese listeners. Attentional demands were added by lengthening the inter-stimulus interval (i.e., non-distracting condition with 300 msec ISI vs. distracting condition with 2500 msec ISI). She found that the increased memory load decreased the L2 learners’ discrimination abilities, suggesting that L2 learners’ difficulties in speech processing might be exacerbated in everyday situations by numerous distracting factors.
Specifically in terms of the L2 stop perception, Lee (2014) examined how attention demands influenced English L2 learners’ perception of the three-way laryngeal contrast of the Korean stops using the dual-task paradigm. Her focus was on the cue weighting of VOT and F0 in the two attentional conditions with and without a letter recall task, revealing that there was attentional modulation of VOT for the identification of the lenis and aspirated stops. Interestingly, English L2 learners relied less on VOT with a distractor to identify the aspirated stop, but the pattern was reversed for the identification of the lenis stop as they relied more on VOT with a distractor. Lee (2014) speculated that English-speaking learners of Korean, although they were all beginners, might utilize F0 as a primary cue similar to native Korean speakers’ cue weighting pattern, and their sensitivity to VOT increased in the presence of a distractor.
While these previous studies consistently demonstrated that attentional load affects how listeners prioritize or re-prioritize acoustic cues during phonetic categorization, the role of a secondary cue under cognitive load seems less clear in the literature. In Lee (2014), when the English-speaking learners of Korean exhibited increased and decreased reliance on VOT for the lenis and aspirated stops, respectively, F0 did not systematically interact with the attentional conditions for the two stops. This pattern with F0 in L2 perception is not in accordance with the phonetic trading relation between VOT and F0 in L1 perception of English stops in Gordon et al. (1993), and it is thus difficult to generalize the role of attention in modulating multiple cues across L1 and L2 perception. One possible account for this null pattern of F0 in L2 perception is that English-speaking learners are less flexible in utilizing F0 because their native language primarily uses VOT for the stop voicing contrast, and F0 is only redundant. Therefore, to better understand the compensating role of non-primary cues as well as the reduced role of a primary cue with a distractor, it may be necessary to examine L2 learners who can already flexibly use both VOT and F0 in their L1 speech processing. Korean-speaking L2 learners of English, who primarily use both VOT and F0 in their L1, are such a population.
In all, the current study aimed to investigate the attentional modulation of acoustic cue weightings in L2 speech perception. For this purpose, we examined the identification of the English stop voicing contrast (i.e., /d/–/t/) by Korean-speaking adult learners of English with two specific research questions: (Q.1) which acoustic cue (VOT or F0) would be more affected by the distracting condition, and (Q.2) would multiple cues compensate for each other when L2 listeners were too distracted to rely on the primary acoustic cue? We are interested in how systematically F0 would pattern when Korean L2 learners of English are distracted during L2 stop perception.
2. Methods: Perception Experiment
Twenty-eight Korean college students (M: 14, F: 14) were recruited for the perception experiment. They were in their 20s (age mean=22, age SD=1.9), and had reportedly been studying English as a second language for 11.6 years on average. Some of the participants had stayed in English-speaking countries, but the length of stay did not exceed one year. They all spoke Seoul Korean (the standard variety of Korean). None of the students reported any hearing disorders. They received monetary compensation for their participation. To use English proficiency as a control variable in the analysis, we collected the participants’ TOEIC scores: the scores ranged from 555 to 975 (median: 780, SD: 107.7). The participants are identical to those reported in Kong & Lee (2018).
As auditory stimuli for the perception task, a set of 35 CV syllables was pseudo-synthesized based on natural productions of /da/ and /ta/ by a male native speaker of American English resident in Wisconsin. Starting from 9 ms VOT (of the base token /da/), we incremented VOT in seven logarithmic steps by concatenating the aspiration portion from the base token /ta/ and the vocalic part of the token /da/: VOT 9 ms, 13 ms, 19 ms, 28 ms, 40 ms, 58 ms, and 100 ms. We tested lag VOT values only, excluding lead VOT values representing true voicing. Tokens of each VOT step were overlaid with five F0 steps, each held flat across the vocalic part: 98 Hz, 106 Hz, 114 Hz, 122 Hz, and 130 Hz. All acoustic manipulations were done in Praat.
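As a rough illustration of the stimulus design, the following Python sketch crosses the seven VOT steps with the five F0 steps listed above to enumerate the 35 stimuli, and reports the ratio between successive VOT steps as a quick check on the spacing. The function and field names are hypothetical and are not taken from the original Praat scripts.

```python
# VOT and F0 values are copied from the stimulus description in the text.
VOT_STEPS_MS = [9, 13, 19, 28, 40, 58, 100]
F0_STEPS_HZ = [98, 106, 114, 122, 130]


def build_stimulus_grid(vot_steps, f0_steps):
    """Return every (VOT, F0) pairing as a list of dicts, one per stimulus."""
    return [{"vot_ms": v, "f0_hz": f} for v in vot_steps for f in f0_steps]


def step_ratios(values):
    """Ratio between successive steps; roughly constant under log spacing."""
    return [round(b / a, 2) for a, b in zip(values, values[1:])]


grid = build_stimulus_grid(VOT_STEPS_MS, F0_STEPS_HZ)
print(len(grid))                  # number of distinct CV stimuli
print(step_ratios(VOT_STEPS_MS))  # successive VOT step ratios
```

The grid contains one entry per VOT-by-F0 combination, which is what each of the four blocks per session would present once.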
Closely replicating Gordon et al.’s (1993) experimental design of speech perception with a distractor task, we administered two sessions of a 2-alternative forced choice task (“da” or “ta”) that differed in the presence or absence of an arithmetic distractor task: i.e., distractor and no-distractor sessions. The arithmetic distractor task had participants compare three different two-digit numbers (e.g., 10 20 30) on a screen and decide whether the absolute difference between the first two numbers (|10–20|=10) was ‘same’ as or ‘different’ from that of the last two numbers (|20–30|=10). In each trial of the distractor session, participants listened to a single auditory CV stimulus followed by the arithmetic distractor task. After answering ‘same’ or ‘different’ based on the calculation, they were asked to choose between ‘da’ and ‘ta’ in response to the auditory stimulus. The session had 4 blocks (i.e., 4 repetitions of the CV stimulus set), after each of which the accuracy on the arithmetic distractor task was reported on the screen so that participants would concentrate on the arithmetic task to achieve better scores. The mean accuracy was 97.2% (SD=2.8%) across the listeners. Similarly, each trial of the no-distractor session presented a CV syllable followed by three zeros (i.e., 0 0 0) on a screen with no request to respond, and then asked the participants to choose between ‘da’ and ‘ta’. The display duration of the three zeros was based on each participant’s average response time in the arithmetic distractor task. This identical trial design was intended to ensure that any difference in the listeners’ use of the acoustic variables could be attributed to the presence of a distractor. Following Gordon et al.’s (1993) experimental design, the presentation order of the two sessions was fixed; the listeners always completed the distractor session before the no-distractor session, so that the duration of the zero slide presentation could be pre-calculated from how long each participant took to finish the arithmetic task. The experiment was programmed and administered in E-Prime (ver. 2).
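The decision rule of the arithmetic distractor task can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the E-Prime implementation; the function names are hypothetical.

```python
def differences_match(first, second, third):
    """True when |first - second| equals |second - third| (the 'same' case)."""
    return abs(first - second) == abs(second - third)


def score_response(numbers, response):
    """Return True if a 'same'/'different' response is correct for one trial."""
    expected = "same" if differences_match(*numbers) else "different"
    return response == expected


# The example trial from the text: |10-20| = 10 and |20-30| = 10.
print(score_response((10, 20, 30), "same"))
```

Per-block accuracy, as reported to participants on screen, would then be the proportion of trials for which `score_response` returns True.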
A mixed-effects logistic regression model was fitted to estimate how well VOT and F0 (continuous fixed-effect variables) explain the listeners’ perceptual decision of the English stop /t/ over /d/ (DV), using the lme4 package in RStudio (Bates et al., 2015; Posit Team, 2023). The analysis structure closely resembles that of the prior study by Kong & Lee (2018). The interaction of a distractor (Distractor: distractor vs. no-distractor) with VOT and F0 was also included in the group-level assessment. At the listener level, the model included (1) by-subject random intercepts and slopes for VOT and F0 to quantify how far individuals’ sensitivity to the acoustic cues deviated from the group-averaged coefficients and (2) subject-by-Distractor random intercepts and random slopes for VOT and F0 to assess each listener’s adjustment of VOT and F0 according to the Distractor condition. Given this model, the fixed-effect coefficients represent the group-averaged sensitivity to VOT and F0 (and their interactions with Distractor), and the random-effect coefficients (the sum of (1) and (2)) represent individual deviations from the group trend in processing the L2 English consonants with and without a distractor. We conducted a series of simple and partial correlation tests between various pairs of these individual coefficients obtained from the model (e.g., between VOT and F0 coefficients in the no-distractor condition, and between VOT coefficients of the no-distractor and distractor conditions) to provide statistical confirmation of noteworthy distributions among them. We used the ppcor package in RStudio (Kim, 2015).
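For concreteness, the model structure described above can be written out as follows. This is a sketch under stated assumptions rather than the exact specification that was fitted: the subscripts s (listener), d (Distractor condition, with D_d = 0 for no-distractor and 1 for distractor), and i (stimulus) and all symbol names are introduced here for illustration, and the inclusion of a Distractor main effect and the absence of a VOT×F0 term are assumptions.

```latex
\begin{aligned}
\operatorname{logit} P(\text{/t/})_{sdi} ={}& (\beta_0 + u_{0s} + w_{0sd}) + \beta_D D_d \\
 &+ (\beta_V + \beta_{VD} D_d + u_{Vs} + w_{Vsd})\,\mathrm{VOT}_i \\
 &+ (\beta_F + \beta_{FD} D_d + u_{Fs} + w_{Fsd})\,\mathrm{F0}_i
\end{aligned}
```

Here the u terms are the by-subject random effects in (1) and the w terms are the subject-by-Distractor random effects in (2). Under this reading, a listener's VOT coefficient in condition d is the bracketed multiplier of VOT, i.e., the group-level slope plus that listener's two random slope components.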
The logistic regression model shows that the Korean L2 learners of English used VOT and F0 in differentiating /t/ from /d/, and that their use of both VOT and F0 was affected by a distractor in the identification task. As shown in Table 1, the fixed-effect coefficients of VOT and F0 were significant, and the VOT coefficient was greater than that of F0 in the no-distractor condition (the reference level of the model): βVOT=6.88, βF0=1.26. In terms of the impact of Distractor on the two acoustic cues, the coefficient of VOT×Distractor was significant (β=–2.64, SE=.46, p<.0001), whereas that of F0×Distractor was only marginally significant (β=–.17, SE=.10, p=.08). These negative coefficients suggest that the distracting listening condition interfered with the L2 listeners so that they could not make the most of the available acoustic information. Importantly, the difference in magnitude between the VOT and F0 interaction coefficients suggests that Distractor affected the primary cue (VOT) more than the secondary one (F0). Figure 1 presents logistic curves based on the fixed-effect coefficients of VOT and F0, depicting steeper slopes for VOT than for F0, and steeper slopes for the no-distractor condition than for the distractor condition.
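To make the reading of these coefficients explicit, the Python sketch below derives the slope of each cue in the distractor condition by adding the interaction coefficient to the reference-level slope, and defines the logistic curve such slopes imply. The intercept default of 0.0 and the scale of the cue values are placeholders, since neither is given in the text; any Distractor effect on the intercept is ignored.

```python
import math

# Fixed-effect estimates quoted in the text (no-distractor is the reference level).
BASE_SLOPE = {"VOT": 6.88, "F0": 1.26}
INTERACTION = {"VOT": -2.64, "F0": -0.17}


def condition_slopes(distractor):
    """Slope of each cue in a condition: reference slope plus interaction term."""
    return {
        cue: BASE_SLOPE[cue] + (INTERACTION[cue] if distractor else 0.0)
        for cue in BASE_SLOPE
    }


def p_voiceless(cue_value, slope, intercept=0.0):
    """Logistic curve for P(/t/); intercept=0.0 is a placeholder, not an estimate."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * cue_value)))


slopes = condition_slopes(distractor=True)
print({cue: round(value, 2) for cue, value in slopes.items()})
```

On these figures the VOT slope drops by roughly 38% of its reference value under distraction and the F0 slope by roughly 13%, which is the asymmetry the paragraph above describes.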
Figure 2 displays individual listeners’ VOT and F0 coefficients estimated from the random and fixed effects of the regression model in the no-distractor (grey circles) and the distractor (orange squares) conditions. In both Distractor conditions, the VOT coefficients were greater than the F0 coefficients, with all the datapoints located below the diagonal line. This suggests that the L2 learners used VOT as the primary acoustic cue in differentiating English /d/–/t/. According to the distributions in Figure 2, there was no specific relationship between the VOT and F0 coefficients in either the distractor or the no-distractor condition. Simple correlation tests confirmed this lack of relationship by yielding non-significant correlation coefficients (no-distractor condition: r=–.02, df=26, p=.88; distractor condition: r=.22, df=26, p=.25).
The group average pattern observed in Section 3.1 also holds across most individuals. When an arrow connects each learner’s coefficients from the no-distractor to the distractor condition, the overall trend is that the datapoints move toward the bottom left corner, meaning that both VOT and F0 coefficients decreased with a distractor in the perceptual identification. Notably, four individuals deviated from this trend. Three individuals, marked with red arrows, exhibited decreased VOT coefficients but increased F0 coefficients with a distractor. Conversely, a single individual, marked with a blue arrow, had a decreased F0 coefficient but an increased VOT coefficient in the distractor condition. While these exceptions may be interpreted as a perceptual adaptation that boosts (non-primary) acoustic cues to cope with a distractor, the cases are too few to support generalization.
Another observation from Figure 2 is that the arrows tend to be longer for the individuals with greater VOT coefficients in the absence of a distractor. The same may apply to the F0 dimension, but the range of F0 coefficients is too narrow for this to be judged by visual inspection. This may mean that the reduction in a listener’s cue coefficient due to a distractor was roughly proportional to the magnitude of that coefficient without a distractor.
For numerical confirmation, we conducted a series of simple correlation tests between individual listeners’ coefficients without a distractor (variable x) and their coefficient differences between the conditions (variable y, e.g., VOT(distractor)–VOT(no-distractor)). In addition, we ran partial correlation tests between variable x and variable y with individuals’ English proficiency as a control variable (z). (Note that the sample is smaller (n=23) in the partial correlation tests because of five missing TOEIC scores.) Table 2 summarizes the correlation coefficients. As the direction and length of the arrows suggested, VOT coefficient differences were negatively correlated with VOT coefficients without a distractor (simple correlation test: r=–.59, df=26, p<.001; partial correlation test: r=–.65, n=23, p<.001). This supports the view that listeners’ use of VOT, the primary cue for the English voicing contrast, was proportionately affected by a distractor. As for the use of F0, a non-primary cue for the contrast, comparable results were not obtained from the two correlation tests (simple correlation test: r=–.35, df=26, p<.1; partial correlation test: r=–.23, n=23, p=.28).
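The two kinds of correlation used here can be illustrated with a short Python sketch. It computes a Pearson correlation and a first-order partial correlation from the three pairwise correlations, which is the standard formula for a single control variable; it is not the ppcor code itself, and the toy values below are hypothetical rather than the study's data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy values: x = coefficient without a distractor,
# y = coefficient difference between conditions, z = proficiency score.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
z = [1.0, -1.0, -1.0, 1.0]
print(pearson(x, y), partial_corr(x, y, z))
```

In this toy case z is uncorrelated with both x and y, so the partial correlation equals the simple one; when the control variable is related to either measure, the two values diverge.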
We also examined the correlation between the F0 coefficient difference and the VOT coefficient, and vice versa, to explore whether reduced sensitivity to one cue (e.g., VOT) is offset by a smaller reduction in sensitivity to the other cue (e.g., F0). While individuals’ VOT coefficient differences were significantly correlated with their F0 coefficient differences in a negative direction (simple correlation test: r=–.57, df=26, p<.005; partial correlation test: r=–.54, n=23, p<.001), the coefficient differences were not meaningfully correlated with individuals’ VOT or F0 coefficients estimated in the no-distractor condition. Figure 3 presents the distribution of individuals’ F0 coefficient differences against those of VOT. The F0 coefficient differences between the distractor and no-distractor conditions were in a negative relationship with the VOT coefficient differences. Although the datapoints representing the top 10 highest VOT coefficients (without a distractor, green circles in panel (a)) tended to have smaller F0 coefficient differences (near the x-intercept), the partial correlation coefficient was not statistically meaningful. The datapoints for the top 10 highest F0 coefficient differences (purple circles in panel (b)) were scattered in a rather incoherent manner. In all, the correlation test results and visual inspection do not seem robust enough to conclude that multiple acoustic cues are flexibly and cooperatively traded off to cope with a perceptual distractor.
4. Discussion and Conclusion
The current study investigated how a cognitive distractor affects Korean L2 learners’ use of multiple acoustic cues in the perception of the English stops /d/-/t/. Specifically, our goals were to examine the differential influence of a distractor on primary and non-primary cues in L2 perception and to further determine whether a non-primary cue compensates for the reduced role of a primary cue in Korean learners’ L2 speech perception with a distractor. When Korean L2 learners identified the English stops /d/-/t/ with a distractor of numeric calculations, their sensitivity to VOT was lessened, echoing the pattern from English L1 listeners in Gordon et al. (1993). Notably, Korean listeners, who use F0 as well as VOT as primary acoustic cues for the L1 stop laryngeal contrast, were also affected by a distractor in using F0 for the L2 stop differentiation, although the effect was only marginal. This may indicate that the Korean listeners used F0 actively enough in English stop perception for its use to be reduced. In L2 perception, the role of F0, a non-primary cue for the English stop voicing contrast, interacted with that of VOT under distraction, as individuals with a greater decrease in VOT sensitivity lost less sensitivity to F0. Despite this systematic trend between the two cues, there was no clear evidence that the Korean L2 listeners used the F0 cue more in the distractor condition in an effort to compensate for the reduced sensitivity to VOT. The observations in the present study are partially in line with previous studies examining L1 speech perception (i.e., Gordon et al., 1993; Kong & Lee, 2018). Similar to Gordon et al. (1993), the primary VOT cue was negatively affected by a distractor in Korean learners’ L2 English stop perception. The secondary F0 cue to the English stop voicing contrast, however, neither compensated for the reduced role of VOT nor remained unaffected by a distractor, which is incongruent with the previous L1 studies.
We may provide two possible accounts for the current findings. One is that the lack of reversed cue-weighting between VOT and F0 for the English stop voicing contrast with a distractor might be ascribed to the Korean listeners’ strong reliance on F0 in the L2 English stop perception. The two acoustic cues are necessarily used in the Korean laryngeal stop contrasts. Gordon et al. (1993) explained that the increased role of the secondary cue with a distractor resulted from a sustained phonetic contribution unaffected by a distractor because attention has not been paid in the first place to it. That is, a secondary cue could be sustainable even with a distractor because its competition with a primary cue became weaker. In Korean, however, unlike the clear prioritization of the multiple cues in English stops, the three-way laryngeal stop contrasts do not have such clear-cut cue primacy between VOT and F0. For the lenis-aspirated contrast, the role of VOT has reduced, and that of F0 has increased. The role of F0 is as important as VOT, both of which Korean listeners have to closely attend to. Reflecting this L1 cue weighting pattern, the statistics in the present study showed that the attentional modulation with F0 as well as VOT was significant: both VOT and F0 were used less with a distractor. That is, because the Korean learners of English still attended to F0, it did not play a significant role in compensating the reduced VOT under distracting condition. In this sense, the lack of compensating effect of F0 and evidence of reduced F0, similar to VOT, due to a distractor might indicate that F0 is more than secondary in the Korean learners’ L2 English stop perception. Overall, the Korean learners’ perceptual flexibility in using multiple cues did not appear to be beneficial under distracting listening conditions. Instead, their inherent reliance on the two acoustic cues played a negative role with a distractor by having listeners closely attend to F0 as well as VOT.
The other possible account is that the lack of a robust interaction effect between F0 and distractor might be due to the L2 setting to which the Korean learners in the present study were exposed. Our participants were college students in Korea, and most of their L2 experience was presumably limited to the classroom setting. In the prevailing Korean L1 context, our participants might not be accustomed to flexibly coping with L2 English input in adverse listening conditions. If we tested Korean learners who are learning and using English in a naturalistic setting, we might be able to obtain more solid observations.
Overall, Korean learners’ flexibility in using both VOT and F0 in their native language did not offer advantages under distracting conditions. Instead, their reliance on both acoustic cues negatively impacted their ability to focus on F0 and VOT in English voicing contrasts. Testing Korean learners of English in a more natural setting might yield more conclusive observations.