1. Introduction
Code-switching refers to the systematic use of two or more languages or language varieties within a single communicative interaction (Myers-Scotton, 1993). It is governed by distinct linguistic (Berk-Seligson, 1986; Myers-Scotton, 2002; Poplack, 1980; Torres Cacoullos & Travis, 2015) and psycholinguistic constraints (Broersma & de Bot, 2006; Grosjean, 2008, 2010; Hartsuiker & Pickering, 2008; Kootstra et al., 2012). Code-switching also involves mechanisms of language control (e.g., Inhibitory Control Model; Green, 1986), broadly understood as the ability to regulate bilingual language activation in real time in order to select the intended language while minimizing interference from the non-target language (de Bruin et al., 2025; Green & Abutalebi, 2013). In this sense, code-switched speech provides an opportunity to investigate short-term phonetic interference.
In code-switched speech, segmental phonetic characteristics of one language have been shown to interact with those of another, yielding systematic patterns of convergence, divergence, or maintenance of cross-language contrasts at the segmental level. Specifically, acoustic realizations may converge toward the non-target language (Antoniou et al., 2011; Balukas & Koops, 2015; Olson & Seo, 2024; Seo & Olson, 2025), diverge from it (Piccinini & Arvaniti, 2015), or exhibit bidirectional influence, with both languages shaping each other simultaneously (Bullock & Toribio, 2009). Evidence for such interaction has been especially robust in consonantal production, particularly in stop consonants, where voice onset time (VOT) patterns often shift toward those of the other language in code-switching contexts. For example, Antoniou et al. (2011) reported that Greek-English bilinguals produced English stops with significantly more Greek-like VOT during code-switching than in a monolingual English condition, suggesting substantial L1-to-L2 phonetic interference even in L2-dominant speakers. Similarly, Balukas & Koops (2015) found that English VOTs were significantly reduced near Spanish code-switch points, shifting in the direction of Spanish. These findings provide clear evidence of cross-linguistic short-term influence in the phonetic domain.
In addition to convergence, code-switching can also produce divergence and bidirectional influence. In spontaneous Spanish-English code-switching, speakers have been shown to exaggerate cross-language contrasts, maintaining distinct VOT patterns despite an overall reduction in VOT in both languages, which reflects enhanced phonetic differentiation rather than convergence (Piccinini & Arvaniti, 2015). Bidirectional effects have also been observed in early bilinguals, such that VOT adjustments pull both languages toward each other at switch points: English VOTs become more Spanish-like, while Spanish VOTs concurrently shift in the direction of English (Bullock & Toribio, 2009). At the same time, phonetic influence is not uniformly expressed across languages or segment types. For example, Balukas & Koops (2015) found that Spanish VOTs remained largely unaffected by English even as English stops showed convergence near switch points. These findings suggest that phonetic outcomes in code-switching are influenced by multiple contextual and speaker-level factors, including language dominance, feature salience, and phonological distance between the languages.
Although consonantal evidence, especially from VOT, has provided a strong foundation for understanding convergence, divergence, and bidirectional influence in code-switching, vowel outcomes remain less fully characterized. Existing vowel-focused work suggests that vowel production can be sensitive to code-switching context, but the magnitude and consistency of such effects appear to vary across language pairs and speakers. For example, Olson & Seo (2024) and Seo & Olson (2025) report context-dependent shifts in English vowel realization in Korean-English code-switching. At the same time, evidence from English-French bilinguals indicates that vowel quality can remain largely stable across monolingual and code-switched contexts: Muldner et al. (2017) report no robust shift in vowel height or backness (F1/F2) associated with switching, while noting subtle low-level phonetic effects and variability across speakers and languages. These findings suggest that cross-linguistic interference in code-switched vowels may be selective, emerging for particular categories, acoustic dimensions, and bilingual profiles rather than as a uniform shift of the vowel system.
Such variability motivates attention to individual differences in bilingual experience and language use. Research incorporating L2 proficiency has shown conflicting patterns. Some studies report greater convergence toward L2-like characteristics among more proficient speakers (Flege, 1987; Herd et al., 2015; Major, 1992; Ulbrich & Ordin, 2014), whereas others document stability or reduced convergence in L1 patterns among highly proficient bilinguals (Chang, 2013; Hévrová, 2021). Although these findings do not directly target code-switching, they motivate the possibility that bilingual experience influences the extent to which phonetic implementation changes across language contexts.
Beyond proficiency, immersion experience (hereafter immersion) in an L2-speaking environment provides a cumulative index of sustained, everyday L2 exposure that is conceptually distinct from attained proficiency. Work in L2 speech has linked extended residence to improved L2 performance (Flege, 1987; Flege & Liu, 2001). Related research on immersion-related phonetic drift, defined as short-term experience-driven change in L1 phonetic categories under L2 influence (Chang, 2012), has examined whether long-term residence affects L1 phonetic realization, though the findings remain mixed (Kim, 2020; Sancier & Fowler, 1997). However, far less is known about how immersion relates to phonetic modulation that emerges specifically during code-switching, particularly in vowel production. This gap is especially evident for Korean-English bilinguals, where it remains unclear whether immersion-related experience conditions convergence, divergence, or stability in Korean vowel realization across Korean-only versus code-switched Korean.
Korean and English diverge substantially in their vowel inventories and prosodic organization. English maintains a relatively large vowel system of roughly twelve vowels, including multiple tense-lax pairs such as /i–ɪ/, /e–ɛ/, and /u–ʊ/, many of which are inherently diphthongal (e.g., /eɪ/, /oʊ/) and exhibit extensive movement in the F1–F2 space (Hillenbrand et al., 1995; Ladefoged & Johnson, 2011). Korean, in contrast, traditionally has a more compact inventory of seven to eight monophthongs (/i, e, ɛ, ɯ, u, o, ʌ, a/) (Lee & Iverson, 2012; Yang, 1996), with the /e–ɛ/ contrast largely merged in contemporary Seoul Korean. Yang (1996) showed that the Korean vowel space is wedge-shaped and expanded in the high-vowel region, whereas the English vowel space is more rectangular and expanded in the low-vowel region. As a result, Korean vowels occupy a comparatively compact acoustic space, with vowel contrasts concentrated within a relatively constrained region of the F1-F2 space (Lee et al., 2006; Yang, 1996).
In addition, Korean lacks stress-based vowel reduction, resulting in relatively uniform vowel qualities across syllables (Ahn, 1999; Jun, 2005). These structural and prosodic differences create distinct demands in bilingual speech production and may increase the potential for cross-linguistic phonetic interaction when speakers alternate between languages. Prior work on L2 speech suggests that English speakers acquiring Korean often show vowel space compression, whereas Korean speakers learning English may expand vowel targets as they attempt to realize the larger and more dispersed English vowel inventory (Baker & Trofimovich, 2005; Chang, 2012). In light of these cross-linguistic differences, the corner vowels /i, a, u/ provide a well-suited testing ground for examining cross-linguistic phonetic interaction across Korean-only and bilingual contexts.
The present study addresses this gap by investigating the acoustic realization of Korean /i, a, u/ across Korean-only and code-switched Korean contexts. We analyze vowel height (F1) and frontness/backness (F2) to examine whether code-switching affects vowel realization across categories, and whether immersion modulates this effect.
2. Methods
Fifty-four undergraduate and graduate students (24 males, 30 females; M=22.7 years, SD=2.54) from a South Korean university participated in the study. All were native speakers of Korean and reported English as their primary second language. The mean self-reported age of English acquisition was 7.25 years (SD=1.96), corresponding to the onset of formal English instruction in the Korean elementary curriculum. Eighteen participants had lived in English-speaking countries or studied English abroad (for periods ranging from two months to eight years), whereas 36 participants had no overseas residence experience. Immersion in English-speaking environments showed a non-uniform distribution, with the majority of participants reporting either no overseas experience or relatively short stays, and a smaller subset reporting more extended immersion. To examine the effect of immersion on code-switched phonetic patterns, immersion was operationalized categorically to contrast the presence versus absence of meaningful English exposure within the present sample. Participants who had lived abroad for more than six months were assigned to the Immersion group (N=12, M=33.5 months, SD=26.5 months, Min=10 months, Max=96 months), while those who had lived abroad for less than six months were assigned to the No Immersion group (N=42, M=0.5 months, SD=1.4 months, Min=0 months, Max=5 months).
Stimuli consisted of 13 disyllabic Korean target nouns embedded in Korean-only sentences and code-switched (Korean–English) sentences, along with 8 English-only sentences containing English target words. The Korean target words were selected such that they began with an oral stop consonant followed by a monophthong vowel. Stop consonants were balanced across bilabial, alveolar, and velar places of articulation and represented the three-way laryngeal contrast in Korean: lenis (/p, t, k/), fortis (/p*, t*, k*/), and aspirated (/pʰ, tʰ, kʰ/). The English target words were monosyllabic and began with either voiced (/b, d, g/) or voiceless (/p, t, k/) stops. These English items served as cross-language reference stimuli, allowing for comparison between Korean stop productions in the code-switched condition and canonical English stop realizations. The consistent inclusion of stop-initial target words reflects the broader research design of which the present study forms a part, as the overall project aims to investigate not only vowel-related acoustic measures but also stop consonant properties, including VOT.
In Korean-only sentences, the target word appeared in sentence-initial position. In code-switched sentences, all code-switched items were intrasentential, and the Korean target words were embedded in utterance-medial position within the carrier phrase “The word ___ means…”. Intrasentential code-switching embeds lexical material from one language within the morphosyntactic frame of another (Bullock & Toribio, 2009; Muysken, 2000) and has been associated with concurrent activation of both linguistic systems within bilingual language mode accounts (Grosjean, 2001). This structure was intended to induce an English-embedded production context in which the Korean target word was produced within an English frame. Sentence length was controlled across conditions to maintain comparability.
The English target words were placed in sentence-final position. This decision was constrained by lexical selection: the words needed to be high-frequency, phonologically simple (CVC), and accessible to participants across proficiency levels. Because the English stimuli served primarily as cross-language reference tokens rather than position-matched counterparts to the Korean items, they were presented in a consistent sentence-final position to ensure uniform prosodic context. Example stimuli are presented in Table 1, and the full list of stimuli is provided in Appendices 1–3.
The study received full IRB approval from the Human Research Protection Programs at a University (KKUIRB-202509-HR-117). Stimulus materials were recorded in a soundproof booth using a Tascam HD-P2 solid-state recorder paired with a Shure KSM 44 condenser microphone. After completing a language-background questionnaire, participants were guided through the recording procedure by a native Korean researcher, who remained present throughout the Korean sentence session. Sentences were presented via E-Prime 2, and participants proceeded through the stimuli at their own pace using the spacebar. Any disfluency or error prompted an immediate re-recording, with only the correct production retained for analysis.
Following the Korean session, the Korean researcher left the booth and a native English researcher entered to conduct the English session (Grosjean, 2001). To induce an English-language condition, participants engaged in a brief standardized conversation in English (e.g., “What is your major?” or “Do you have any pets?”). Participants then recorded 21 English sentences presented in random order. After the English session, participants were given a short, self-paced break. The same researcher subsequently conducted an informal bilingual interview, using both Korean and English (e.g., “Other languages 공부했어요?” / “Have you studied other languages?”) to elicit a bilingual or code-switched condition. Session order was not counterbalanced because immediate prior language exposure can influence subsequent bilingual production; we therefore used a fixed order to hold potential carryover effects constant and to ensure that all participants entered the bilingual condition under comparable exposure conditions. Prior to recording the bilingual sentences, participants were instructed to read naturally, as if speaking to a bilingual interlocutor, without exaggerating the Korean words. Two practice items were administered to ensure comprehension. Each sentence was read once unless an error occurred, in which case it was immediately re-recorded. Across both the English and bilingual sessions, the overall error rate was 5.8%. All participants reported no speech or hearing impairments and were compensated for their participation.
For the vowels included in the target words, the first two formant frequencies (F1, F2) were measured. Formant values (Hz) were extracted using the Burg LPC algorithm with a 25 ms Gaussian window, specifying five formants within a 0–5 kHz range for male speakers and 0–5.5 kHz for female speakers. Measurements were taken at the temporal midpoint of the vowel steady state to minimize coarticulatory effects. All formant frequencies were converted to the Bark scale to reduce anatomical variability while preserving perceptual spacing.
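The Hz-to-Bark conversion mentioned above can be illustrated with a minimal sketch. The text does not state which conversion formula was applied; the function below assumes the widely used Traunmüller (1990) approximation and omits its corrections for very low and very high frequencies. The function name and the example formant values are hypothetical and are not data from the present study.

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in Hz to Bark.

    Assumption: Traunmueller (1990) approximation,
    z = 26.81 * f / (1960 + f) - 0.53, without the edge corrections
    for z < 2 or z > 20.1. The study may have used another formula.
    """
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# Hypothetical formant values for one vowel token (not study data).
f1_hz, f2_hz = 300.0, 2300.0
f1_bark = hz_to_bark(f1_hz)
f2_bark = hz_to_bark(f2_hz)
```

Because the mapping is monotonic, the ordering of tokens along F1 and F2 is unchanged; only the spacing is compressed at higher frequencies.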
Linear mixed-effects regression models were computed in R. Fixed effects included Condition (Korean-only vs. Code-switched), Vowel (/i/, /u/, /a/), and Immersion (No Immersion vs. Immersion), along with all interactions. Random intercepts were included for Speaker and Word, and by-speaker random slopes for Condition were added to the model to allow the effect of speech condition to vary across speakers. Post hoc pairwise comparisons were conducted using estimated marginal means computed with the emmeans package in R (Lenth & Piaskowski, 2025), with Tukey’s HSD adjustment for multiple comparisons.
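For readers who prefer an explicit formulation, the model structure described above can be written out as follows. This is a sketch in our own notation rather than the exact fitted specification: the symbols are introduced here for illustration, and the correlated by-speaker intercept and slope are an assumption, since the text does not state whether that correlation was estimated.

```latex
% y_{swk}: Bark-scaled F1 or F2 for token k of word w produced by speaker s
% x_{swk}: coded predictors for Condition, Vowel, Immersion and all of their
%          two- and three-way interactions (12 fixed-effect coefficients
%          for a 2 x 3 x 2 design, including the intercept)
% C_{swk}: coded Condition (Korean-only vs. code-switched)
\begin{aligned}
y_{swk} &= \mathbf{x}_{swk}^{\top}\boldsymbol{\beta}
          + u_{0s} + u_{1s}\,C_{swk} + w_{0w} + \varepsilon_{swk}, \\
(u_{0s},\, u_{1s})^{\top} &\sim \mathcal{N}(\mathbf{0},\, \boldsymbol{\Sigma}_{u}), \qquad
w_{0w} \sim \mathcal{N}(0,\, \sigma_{w}^{2}), \qquad
\varepsilon_{swk} \sim \mathcal{N}(0,\, \sigma^{2}).
\end{aligned}
```

Here u_{0s} and w_{0w} are the random intercepts for Speaker and Word, and u_{1s} is the by-speaker random slope for Condition, which lets the size of the code-switching effect differ from speaker to speaker.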
3. Results
Results showed a significant main effect of Vowel on F1 [F(2, 26.54)=245.28, p<.001], indicating substantial height differences across /i, a, u/. The effect of Condition on F1 varied depending on vowel category, with a significant Condition×Vowel interaction [F(2, 1894.05)=9.18, p<.001]. Specifically, Tukey HSD post hoc comparisons revealed that only /i/ showed a significant effect of Condition: F1 was higher in the Korean-only condition than in the code-switched condition (E=0.21, p=.027), indicating slightly lower tongue height in the monolingual Korean condition. Additionally, differences in F1 between immersion groups were modulated by vowel category, with a significant Vowel×Immersion interaction [F(2, 1878.80)=3.11, p=.045]. Although no within-vowel comparisons between immersion groups reached significance, the Vowel×Immersion interaction was driven by a comparatively larger between-group difference for /i/ (/a/: E=0.04, p=.999; /i/: E=0.23, p=.718; /u/: E=0.17, p=.922). In contrast, no significant main effects of Condition [F(1, 67.48)=0.14, p=.707] or Immersion [F(1, 54.33)=0.88, p=.353] were observed. Neither the Condition×Immersion interaction [F(1, 57.96)=0.32, p=.572] nor the three-way interaction [F(2, 1878.80)=1.91, p=.149] reached significance.
For F2, significant main effects of both Vowel [F(2, 26.42)=668.32, p<.001] and Condition [F(1, 1190.34)=6.81, p=.009] were found, indicating vowel-dependent differences in frontness/backness and an influence of code-switching across the vowel space. No significant main effect of Immersion was found. Turning to interactions, vowel-dependent F2 patterns varied across speech conditions and immersion groups, with significant interactions between Condition and Vowel [F(2, 1943.91)=9.67, p<.001] and between Vowel and Immersion [F(2, 1932.51)=7.50, p<.001]. The Condition×Immersion interaction was not significant. A significant three-way interaction among Condition, Vowel, and Immersion [F(2, 1932.51)=3.90, p=.020] indicates that condition-related modulation of vowel frontness/backness differed depending on both vowel category and immersion group. Post hoc comparisons confirmed that this effect was particularly evident for /u/ in the No Immersion group: F2 was significantly lower in the code-switched condition than in the Korean-only condition (E=−0.39, p=.020), indicating greater backing of /u/ in code-switched speech. No other within-vowel condition effects reached significance.
Overall, No Immersion speakers showed subtle speech condition-dependent shifts in vowel frontness/backness, particularly for /u/, suggesting greater susceptibility to bilingual co-activation among speakers without immersion experience. These condition-dependent vowel space patterns are illustrated in Figure 1.
Given that length of residence in an English-speaking environment may correlate with L2 proficiency, we conducted a follow-up analysis to evaluate whether the observed immersion-dependent modulation of vowel realization could be attributed to proficiency differences rather than immersion. Speech proficiency of participants was holistically rated by a native speaker of General American English (trained in phonetics), based on their interview performance during the experiment. Ratings were assigned on a 1–7 scale with half-point increments and were weighted toward fluency (65%) and pronunciation/accentedness (35%). The weighted scores were then combined to yield one score per participant. Judgments from the native English rater confirmed that scores exceeding 4.5 corresponded to high English proficiency.
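The weighting scheme described above can be made concrete with a short sketch. It assumes that the rater assigned separate fluency and pronunciation sub-ratings on the same 1–7 half-point scale and that the composite is their weighted sum; the function names, the validation step, and the example ratings are hypothetical and do not reproduce any participant's scores.

```python
FLUENCY_WEIGHT = 0.65          # weight stated in the text
PRONUNCIATION_WEIGHT = 0.35    # weight stated in the text
HIGH_PROFICIENCY_CUTOFF = 4.5  # scores exceeding this count as high


def _check_rating(rating: float) -> None:
    """Accept only ratings from 1 to 7 in half-point steps."""
    if not 1.0 <= rating <= 7.0 or not float(rating * 2).is_integer():
        raise ValueError(f"invalid rating: {rating}")


def composite_proficiency(fluency: float, pronunciation: float) -> float:
    """Weighted sum of the two sub-ratings (assumed combination rule)."""
    _check_rating(fluency)
    _check_rating(pronunciation)
    return FLUENCY_WEIGHT * fluency + PRONUNCIATION_WEIGHT * pronunciation


def is_high_proficiency(score: float) -> bool:
    """Apply the 'exceeding 4.5' criterion reported in the text."""
    return score > HIGH_PROFICIENCY_CUTOFF


# Hypothetical speaker: fluency 5.0, pronunciation 4.0.
example_score = composite_proficiency(5.0, 4.0)
```

With these illustrative inputs the composite is about 4.65, which would fall just above the 4.5 criterion; a speaker rated 4.0 on both dimensions would fall below it.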
The relationship between length of residence and proficiency was then examined using Pearson correlation analysis. Results revealed a moderate positive association between the two measures (r=.43, p<.001), indicating that although length of residence tends to be associated with higher proficiency, the two variables are not strongly coupled (see Figure 2). Importantly, this pattern suggests that the three-way interaction, which was driven by the heightened sensitivity of /u/ frontness/backness to speech condition in the No Immersion group, cannot be fully accounted for by proficiency differences alone. Rather than reflecting a simple proficiency gradient, the observed modulation appears to be more closely linked to residence-related factors, supporting the view that immersion indexes aspects of bilingual phonetic experience that are at least partially independent of overall L2 proficiency.
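To make the size of this association concrete, the following sketch computes a Pearson coefficient from paired observations and converts the reported r into shared variance. The helper name and the toy vectors are hypothetical and serve only to show the computation; they are not the study's residence or proficiency data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Toy vectors (hypothetical, not study data): perfectly linear relation.
toy_r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Shared variance implied by the reported coefficient r = .43.
reported_r = 0.43
shared_variance = reported_r ** 2
```

Squaring the reported coefficient gives roughly 0.18, so under a linear model less than a fifth of the variance in one measure is shared with the other, which is the sense in which the two are not strongly coupled.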
4. Discussion
The present study investigated how immersion modulates vowel production in Korean–English bilingual speakers, with particular attention to formant shifts across Korean-only and code-switched Korean contexts. The results provide converging evidence that residence experience modulates bilingual vowel behavior in ways that cannot be fully explained by L2 proficiency, supporting an interpretation in terms of residence-related phonetic adaptation.
A key finding of the study is the presence of a significant Condition×Vowel×Immersion interaction in F2, indicating that code-switching influenced vowel frontness/backness differently depending on vowel category and immersion. Notably, this interaction was driven primarily by the No Immersion group, who exhibited a significant backing of /u/ in code-switched Korean compared to the Korean-only condition. This suggests that bilinguals without immersion experience are particularly sensitive to contextual language activation, resulting in subtle but significant shifts in vowel frontness/backness for /u/.
The observed /u/-backing in code-switched Korean in the No Immersion group may thus represent an online acoustic adjustment, whereby activation of English phonetic representations temporarily affects Korean vowel realization in bilingual contexts. Such condition-sensitive variation suggests that L2 speakers without immersion may be particularly susceptible to immediate cross-language activation effects during speech production. In contrast, the Immersion group showed more stable vowel realizations across speech conditions, indicating that extended immersion may be associated with increased consistency or stronger language-specific control mechanisms, thereby reducing short-term cross-linguistic interference.
Whereas F2 revealed reliable immersion- and code-switching-dependent effects, F1 showed limited influence. Although the observed F2 shifts are modest in magnitude, their systematic alignment with code-switching condition and immersion suggests that they reflect context-dependent variation rather than random variability. This asymmetry suggests that bilingual phonetic interaction may affect dimensions that are more vulnerable to cross-language overlap or contrast enhancement. Frontness/backness contrasts between Korean and English vowels, particularly for high back vowels such as /u/, may constitute a region in bilingual vowel space where speakers adjust production to preserve contrast under cross-language activation. English /u/ often exhibits substantial fronting in many dialects (Fridland et al., 2014; Grieve et al., 2013; Oder et al., 2013), which may indirectly influence Korean /u/ production in code-switched contexts. Accordingly, the backing of Korean /u/ may reflect a contrast-enhancing adjustment that increases cross-language distinctiveness and promotes greater dispersion among vowel categories in acoustic space (Chang, 2012; Guion, 2003). This pattern is consistent with Adaptive Dispersion Theory (Flemming, 2004; Liljencrants & Lindblom, 1972; Lindblom, 1986), which posits that vowel systems dynamically adjust to preserve contrast and reduce category overlap. The selective sensitivity of F2 therefore supports the idea that bilingual phonetic interaction is not uniform across acoustic dimensions but instead targets specific contrastive zones where the two language systems interact most strongly.
Length of residence and L2 proficiency were only moderately correlated, indicating that residence duration is not strongly tied to proficiency. Although immersion and L2 proficiency often overlap to some extent, the present results suggest that residence duration captures additional dimensions of bilingual experience that cannot be fully reduced to proficiency. In particular, immersion may index not only cumulative exposure to L2 input but also increased opportunities for engagement in contexts where both languages can be simultaneously active and alternated, although actual language use patterns likely vary across individuals. These experiential factors may constrain the extent to which bilingual speakers show condition-dependent phonetic variation during speech production. In that sense, the greater cross-condition consistency observed in L2 speakers with immersion experience may reflect reduced sensitivity to short-term contextual language co-activation effects, at least for categories particularly sensitive to cross-language overlap or dialectal variability, such as high back /u/. However, given the skewed distribution of residence duration in the present sample, the observed correlation may partly reflect a contrast between participants without immersion experience and those with some immersion, which does not capture a strictly gradual increase across residence duration. Future studies including participants with a more balanced range of residence durations could better reveal how immersion gradually affects proficiency.
Two methodological choices represent potential limitations of the present study. First, immersion was treated categorically using a six-month cutoff. Although this approach follows prior work, this threshold lies at the shorter end of the residence continuum and results in unbalanced group sizes. As a result, more gradual effects of immersion may not have been fully captured. A continuous treatment of immersion would allow a more fine-grained examination of how immersion relates to phonetic adjustment. Second, target position differed across speech conditions: Korean targets were sentence-initial in Korean-only sentences but utterance-medial in code-switched sentences. Because prosodic position can influence boundary-related strengthening, phrase-final lengthening, and variation in coarticulatory resistance (Cho et al., 2011), we acknowledge that speech condition and prosodic structure are not entirely separate in the present design. Although the within-speaker comparisons reduce some potential confounds, stricter alignment of prosodic position across speech conditions would strengthen future work.
5. Conclusion
In sum, the present findings demonstrate that immersion modulates bilingual vowel production in a selective and condition-dependent manner. Speakers without immersion exhibit heightened sensitivity of vowel frontness/backness, particularly for /u/, to speech condition, whereas speakers with immersion show more stable cross-condition vowel realizations. These effects cannot be straightforwardly reduced to L2 proficiency, underscoring the role of residence-related bilingual phonetic adaptation.
These results contribute to a growing body of evidence that bilingual phonetic systems are dynamically influenced by immersion, revealing a nuanced interplay between language activation, context-sensitive phonetic modulation, and the maintenance of relatively stable phonetic categories.






