Studies in second language (L2) acquisition have given researchers insight into several factors that contribute to foreign accent in L2 production. Two major factors, among many others, are the onset age of L2 acquisition (AOA) and language experience, often indexed by the length of residence (LOR) in an L2-speaking country (Flege et al., 1995; Guion, 2003; Munro et al., 1996; Tsukada et al., 2005). According to the Speech Learning Model (SLM) proposed by Flege (1995, 2002, 2003), a new L2 speech category can be established independently of the closest L1 category through changes in the perceived similarity between the two sounds. On the premise that the capacity to learn new speech categories remains intact across the life span, the model proposes that young as well as adult learners with sufficient L2 experience are likely to detect subtle acoustic differences between native language (L1) and L2 sounds and to develop new categories in a native-like manner. Several studies have presented evidence that late adult learners can improve their L2 performance, provided that they receive a substantial amount of L2 input (Flege, 2009; Flege & Liu, 2001; Saito, 2015).
The long-term effects of AOA have been reported in different domains of language performance, such as speech perception (Jia et al., 2006), grammatical knowledge (Bartning et al., 2012; McDonald, 2000), inferencing and idioms (Abrahamsson & Hyltenstam, 2009), and speech production (Flege et al., 1995; Kang & Guion, 2008). However, compared with the clear and consistent age effects on native-like proficiency in L2 production, the effects of LOR on L2 production are less straightforward and more variable. While LOR has been shown to benefit certain aspects of language acquisition, some studies do not support the predicted role of time spent in the L2-speaking environment, particularly for late L2 learners. Guion et al. (2000), for example, conducted a cross-language mapping experiment to investigate the effect of L2 experience on the perception of English consonants by native speakers of Japanese. They found that even highly experienced Japanese speakers did not perceive some English consonants in a native-like manner, especially when the English consonant was phonetically similar to the closest Japanese consonant. Similarly, Flege et al. (1992) examined the effect of coda voicing on vowel duration produced by Spanish and Mandarin learners of English with varying amounts of English experience (4 months to 9 years), but, contrary to the hypothesis, LOR contributed little to native-like acquisition of the voicing contrast. Although the experienced groups were more similar to native speakers in vowel duration before voiced stops, the L2 learners, despite their extensive L2 experience, did not produce cues that approximated native-like voicing contrasts.
Tomaschek et al. (2018) raised concerns about confounds in much of the research on the contribution of AOA and LOR to L2 attainment. They found that AOA and LOR commonly showed a negative linear correlation across bilingual studies: early bilinguals were more likely than late bilinguals to have spent a long time in the L2 country. Furthermore, it is generally assumed that the age of immigration, rather than the AOA in the L1 country, marks the onset of exposure to the L2. Given the differences in the quantity and quality of L2 input between the two settings, the effects of AOA cannot be interpreted in isolation. Consistent with this concern, studies on the acquisition of grammatical structures have found no significant relationship between L2 performance and AOA (Hopp, 2010; Roesch & Chondrogianni, 2016; Unsworth et al., 2014). Unsworth et al. (2014), for instance, investigated the effects of AOA and LOR on the acquisition of grammatical gender in English-Dutch and English-Greek bilingual children. They divided the children into three groups based on AOA but found that AOA played little to no role in predicting accurate acquisition of grammatical gender. Similarly, Hopp (2010), investigating case and gender marking in noun phrases by German bilingual children, found that differences in performance were better accounted for by LOR than by AOA.
Furthermore, several studies have examined the influence of L1 phonological knowledge and L2 experience, with mixed findings. Bishop & Smith (1992) investigated the effect of final-stop voicing on preceding vowel duration produced by inexperienced Japanese and experienced Mandarin learners of English. They found that the Japanese learners, whose L1 has a vowel length distinction, produced more native-like vowel duration differences for the voicing contrast than the Mandarin learners, who had longer experience with English. Flege et al. (1992) compared the voicing of word-final stops produced by Mandarin and Spanish speakers with varying L2 experience. Because their L1s lack a voicing contrast in word-final position, many participants devoiced final voiced stops, and only a few produced accurate vowel duration differences, regardless of L2 experience.
Other studies, however, have shown that the influence of L1 knowledge and the outcomes of L2 experience can vary depending on the phonetic features being examined. For example, Bohn & Flege (1992) examined both the temporal and spectral qualities of English vowels produced by inexperienced (0.6 years) and experienced (7.5 years) German speakers of English. They reported that the experienced German speakers' accurate production of the new L2 vowel /æ/ was limited to its spectral features: despite the vowel length contrast in their L1 and years of L2 experience, the German speakers produced a significantly shorter /æ/ than native speakers regardless of L2 experience. That the experienced speakers acquired the new vowel category only in terms of spectral quality supports the view that L2 experience yields different outcomes for different phonetic features in late second language acquisition. Similarly, Toscano & Lansing (2019) found differences in the use of phonetic cues to distinguish English tense and lax vowels among native Korean speakers with varying levels of L2 proficiency. Higher proficiency was associated not only with more accurate spectral qualities but also with a greater temporal difference between tense and lax vowels. Notably, however, longer L2 experience did not lead to more accurate use of temporal cues. These findings suggest that the role of L2 experience may differ for the spectral and temporal features of L2 vowels, which may involve distinct learning processes.
This exploratory study investigates the influence of AOA and LOR on the acquisition of L2 phonological features that differ from the L1. Specifically, it focuses on two language-specific cues in English produced by native Korean learners: the effect of coda voicing on preceding vowel duration and the spectral qualities of English vowel categories. Because relative duration accounts for contextual differences and speech rate across speakers, durational patterns were analyzed in both absolute and relative (vowel-to-word ratio) terms to give a fuller picture of how L2 learners phonetically implement the English coda voicing contrast. The main goal was to examine whether age of acquisition and L2 experience have different effects on the temporal encoding of coda voicing and on the spectral encoding of vowel categories in L2 English by Korean speakers. Given that native Korean learners of English have no prior experience with coda voicing contrasts in their L1, less experienced Korean learners were expected to have more difficulty acquiring the temporal cues associated with the English voicing contrast.
The acquisition of “new” L2 vowel categories is predicted to be influenced by the learners’ age of onset and their experience with the L2. Following previous research, we use the term “new” for English vowels that lack a close Korean equivalent and “similar” for English vowels that have a corresponding Korean vowel. Although these terms help describe the similarity between L1 and L2 vowels, they are relative and may vary across language pairs. Yoon (2007) investigated how inexperienced and experienced Korean speakers perceived the similarity between English /ɛ/ and /æ/. Experienced speakers were more likely to identify /æ/ as a distinct L2 vowel category, which led to more precise production of this “new” L2 vowel. In Lee & Cho (2015), English-to-Korean mapping and English vowel identification tasks revealed that Korean participants tended to map the English pairs /i/-/ɪ/ and /ɛ/-/æ/ onto the single Korean vowels /i/ and /ɛ/, respectively. Taken together, these findings led us to expect that Korean learners with more English experience would show greater improvement in producing vowels such as /ɪ/ and /æ/, which have no clear counterparts in Korean.
The current study used data from the Speech Accent Archive (Weinberger, 2015), a database with thousands of speech files of native and non-native English speakers reading the same passage. The Speech Accent Archive (n.d.) is available at http://accent.gmu.edu. The archive contains transcribed recordings of the following English passage produced by native Korean speakers of various dialects with different AOA and LOR in the United States. Each target word analyzed in the current study is followed by its vowel in parentheses.
“Please call (/ɔ/) Stella (/ɛ/). Ask her to bring these things with her from the store. Six spoons of fresh snow peas, five thick (/ɪ/) slabs of blue (/u/) cheese, and maybe a snack (/æ/) for her brother Bob (/a/). We also need (/i/) a small plastic snake and a big (/ɪ/) toy frog (/ɔ/) for the kids. She can scoop (/u/) these things into three red (/ɛ/) bags (/æ/), and we will go meet (/i/) her Wednesday at the train station.”
To compare vowel duration as a function of coda voicing, 10 words with voiced or voiceless final stops were used, forming five pairs: big-six, bag-snack, need-meet, kid-thick, and Bob-scoop. Because the target words were produced in connected read speech, a relatively naturalistic task, word-final stops were likely to be unreleased; target words produced with a word-final release burst (14% of tokens) were excluded from the analysis. For vowel quality, 13 words were analyzed: two words for each of the seven vowel categories (/i/, /ɪ/, /ɛ/, /æ/, /ɔ/, /a/, /u/) except /a/, which was represented by one word. This yielded a total of 550 tokens (55 speakers × 10 words) for duration and 715 tokens (55 speakers × 13 words) for vowel quality comparisons.
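The design arithmetic above can be checked with a short sketch. The dictionary simply transcribes the marked words and their vowels from the passage, and the pair list copies the voicing pairs named in the text; the token totals correspond to the full 55-speaker design (45 Korean plus 10 native speakers) and do not model the exclusion of released tokens.

```python
from collections import Counter

# Target words marked in the reading passage, mapped to the vowel analysed
VOWEL_TARGETS = {
    "call": "ɔ", "frog": "ɔ",
    "Stella": "ɛ", "red": "ɛ",
    "thick": "ɪ", "big": "ɪ",
    "blue": "u", "scoop": "u",
    "snack": "æ", "bags": "æ",
    "need": "i", "meet": "i",
    "Bob": "a",  # the only /a/ word
}

# Voiced-coda / voiceless-coda word pairs used for the duration analysis
VOICING_PAIRS = [("big", "six"), ("bag", "snack"), ("need", "meet"),
                 ("kid", "thick"), ("Bob", "scoop")]

N_SPEAKERS = 45 + 10  # Korean learners + native English controls

duration_words = [word for pair in VOICING_PAIRS for word in pair]
vowel_counts = Counter(VOWEL_TARGETS.values())

print(len(duration_words), len(VOWEL_TARGETS), len(vowel_counts))
print(N_SPEAKERS * len(duration_words), N_SPEAKERS * len(VOWEL_TARGETS))
```

Running it prints 10 words, 13 words, and 7 vowel categories, followed by the 550 and 715 token totals given in the text.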
For the current study, 45 adult female native speakers of Korean were selected on the basis of biographical data reporting birthplace, native language, other languages acquired, age, gender, age of English onset, English learning method, and place and length of English residence. All were speakers of the Seoul dialect and varied in age (18–52 years), AOA (3–27 years), and LOR in the U.S. (0–38 years). Table 1 shows the participants’ age at the time of testing, AOA, and LOR in the U.S. Only participants reported to have lived in the Northwest region of the U.S. were included, to minimize dialectal variation. As a control group, speech produced by ten female native speakers of English (18–32 years old) residing in the Northwest region was included. Regarding collinearity between AOA and LOR, the Pearson correlation coefficient between the two variables was 0.10, indicating a weak correlation.
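Table 1. Participants’ age at testing, AOA, and LOR in the U.S. (years): range (mean, SD).

|Group|Age|AOA|LOR|
|---|---|---|---|
|Native English (n = 10)|18–32 (24, 4.8)|–|–|
|Korean (n = 45)|18–52 (30, 11.8)|3–27 (11, 4.6)|0–38 (8, 8.9)|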
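Because collinearity between AOA and LOR is the central concern raised by Tomaschek et al. (2018), it may help to see what r = 0.10 implies. With exactly two predictors, the variance inflation factor (VIF) of each is 1/(1 - r²), so r = 0.10 gives a VIF of about 1.01, that is, essentially no inflation of the coefficient variances. The sketch below computes Pearson's r from raw values and the two-predictor VIF; it covers only the AOA and LOR main effects, not the interaction terms in the models, and the function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(r):
    """VIF for each of two predictors whose pairwise correlation is r."""
    return 1.0 / (1.0 - r ** 2)

# r = 0.10 is the AOA-LOR correlation reported in the text
print(round(vif_two_predictors(0.10), 4))
```

A common rule of thumb treats VIF values well below 5 as unproblematic, so under this simple two-predictor view AOA and LOR can reasonably be entered together.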
Vowel durations produced by the 45 native speakers of Korean were measured manually in Praat, an acoustic analysis program (Boersma & Weenink, 2020). Vowel duration was measured from the onset of voicing in the vowel to the point of constriction for the final stop, using the onset and offset of clear second-formant (F2) energy in the spectrogram and waveform displays as references. For ‘six’, word duration included the consonant closure, measured from the offset of F2 in the preceding vowel to the left edge of the fricative release burst. The vowel-to-word duration ratio, computed for each voicing condition, was used to normalize for individual speech rate. Word-final devoicing was not observed in our data. For vowel quality, the first and second formant frequencies (F1, F2) of each vowel were measured at its temporal midpoint. Values were Bark-transformed and z-normalized to enable cross-speaker comparison by minimizing individual variation. The Bark scale also represents perceptual distance between speech sounds more closely than linear frequency measurements do (Syrdal & Gopal, 1986).
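The normalization steps above can be sketched as follows. The text does not say which Hz-to-Bark formula was used; this sketch assumes Traunmüller's (1990) approximation, z = 26.81·f/(1960 + f) - 0.53, without its low- and high-frequency end corrections, and assumes z-scores are computed per speaker across all of that speaker's tokens (Lobanov-style). The formant values in the example are hypothetical and serve only to exercise the functions.

```python
import statistics

def hz_to_bark(f_hz):
    """Hz-to-Bark conversion, ASSUMING Traunmüller's (1990) formula without end corrections."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def z_normalize(values):
    """Z-score one speaker's values (assumed per-speaker normalization): mean 0, SD 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def vowel_to_word_ratio(vowel_ms, word_ms):
    """Vowel duration as a proportion of word duration, normalizing for speech rate."""
    if not 0 < vowel_ms <= word_ms:
        raise ValueError("vowel duration must be positive and no longer than the word")
    return vowel_ms / word_ms

# Hypothetical F1 midpoints (Hz) from one speaker, for illustration only
f1_hz = [310.0, 420.0, 580.0, 760.0, 640.0, 700.0, 350.0]
f1_bark = [hz_to_bark(f) for f in f1_hz]
f1_z = z_normalize(f1_bark)

print([round(z, 2) for z in f1_z])
print(vowel_to_word_ratio(150.0, 400.0))  # hypothetical durations in ms
```

Because z-scoring is applied after the Bark transform, each speaker's normalized F1 values have mean 0 and SD 1 regardless of vocal tract size, which is what makes cross-speaker comparison possible.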
Linear mixed-effects regression models were fitted using the lme4 package for R (Bates et al., 2015; R Core Team, 2022). For vowel duration and the durational (vowel-to-word) ratio, voicing (voiced, voiceless), AOA, and LOR were entered as fixed effects, with random intercepts for item and speaker and by-speaker random slopes for voicing, allowing the effect of voicing to vary across speakers: Duration (or Ratio) ~ Voicing * (AOA + LOR) + (1 + Voicing | Subject) + (1 | Item). Similarly, separate linear mixed-effects models were fitted to normalized F1 and F2 values, with vowel type (7 categories), AOA, and LOR, including their interactions, as fixed effects, random intercepts for item and speaker, and by-speaker random slopes for vowel type: Formant ~ VowelType * (AOA + LOR) + (1 + VowelType | Subject) + (1 | Item).
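Written out as an equation, the duration and ratio model above corresponds to the following for speaker s and item i. Here V is assumed to be a 0/1 indicator (1 = voiced coda) under treatment coding, which the text does not specify; the Voicing * (AOA + LOR) term expands into the three main effects plus the Voicing × AOA and Voicing × LOR interactions. The formant models have the same structure, with the six vowel-type contrasts in place of V.

```latex
\begin{aligned}
y_{si} &= \beta_0 + \beta_1 V_{si} + \beta_2\,\mathrm{AOA}_s + \beta_3\,\mathrm{LOR}_s
        + \beta_4\,V_{si}\,\mathrm{AOA}_s + \beta_5\,V_{si}\,\mathrm{LOR}_s \\
       &\quad + u_{0s} + u_{1s} V_{si} + w_{i} + \varepsilon_{si}, \\
\begin{pmatrix} u_{0s} \\ u_{1s} \end{pmatrix} &\sim \mathcal{N}(\mathbf{0}, \Sigma_u),
\qquad w_i \sim \mathcal{N}(0, \sigma_w^2),
\qquad \varepsilon_{si} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

The by-speaker slope u_{1s} is what lets each speaker have their own voicing effect, while w_i absorbs item-level differences such as the intrinsic duration of each word.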
When linear mixed-effects models were fitted to test for significant predictors of absolute vowel duration, voicing [χ2(1)=7.82, p=0.005] and LOR [χ2(1)=5.62, p=0.018] emerged as the two significant predictors. The interaction between voicing and LOR had a negative coefficient that was not statistically significant (p=0.08), suggesting a trend for the effect of voicing on vowel duration to be less prominent with longer LOR. The coefficient values of the best-fitting model are reported in Table 2.
As shown in Table 2, vowel duration was 29.09 ms longer before voiced stops, and it also increased significantly with LOR (by 9.71 ms): Korean learners with longer LOR produced longer vowels regardless of voicing. AOA, however, was not a significant predictor (p=0.76), suggesting that vowel duration as a function of voicing did not vary with AOA. The longer absolute vowel durations with increasing LOR may reflect experienced learners’ slower, more carefully articulated production in an attempt to make themselves more intelligible (Figure 1).
More importantly, vowel duration relative to total word duration (i.e., the durational ratio) was examined as a primary cue to coda voicing. The overall ratio difference between voiced and voiceless contexts was larger for the Native group (mean=0.22) than for the Korean group (mean=0.19). Linear mixed-effects models identified voicing [χ2(1)=8.58, p=0.003] and AOA [χ2(1)=3.64, p=0.05], but not LOR [χ2(1)=0.74, p=0.39], as significant predictors. The coefficient values of the best-fitting model are reported in Table 3.
The durational ratio was significantly larger before voiced than before voiceless stops (by 0.129), and it decreased by 0.056 with increasing onset age regardless of coda voicing. The negative coefficient for the Voicing × AOA interaction suggested that the voicing effect on the ratio became weaker with later age of acquisition. However, this interaction was only marginally significant, possibly because of the smaller number of tokens from participants with later AOA. Compared with the native English speakers’ production (dotted lines in Figure 2), Korean learners with later AOA showed smaller ratios, most notably before voiced codas.
As illustrated in Figure 2, the later the onset age, the smaller the ratio. Because most participants are concentrated around an onset age of 15, the shaded error band widens with increasing age. Nevertheless, the downward trajectory, together with a smaller ratio difference between voiced and voiceless codas, is notable with increasing AOA.
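The negative Voicing × AOA coefficient is easiest to read as a change in the voicing effect itself: in such a model, the voiced-minus-voiceless ratio difference at a given AOA equals the voicing coefficient plus the interaction coefficient times AOA. The sketch below uses the two coefficients reported above (0.129 and -0.056) and, because the text does not report them, a hypothetical intercept and a hypothetical interaction coefficient. It also assumes that AOA enters the model centred, so that 0 is the mean onset age. It illustrates the arithmetic only and does not reproduce Table 3.

```python
# Coefficients reported in the text for the durational-ratio model
BETA_VOICING = 0.129   # voiced vs voiceless coda
BETA_AOA = -0.056      # per unit of AOA, in whatever units the model used

# HYPOTHETICAL values, not reported in the text: chosen only to illustrate
BETA_INTERCEPT = 0.40
BETA_VOICING_X_AOA = -0.02

def predicted_ratio(voiced, aoa):
    """Fixed-effects prediction of the vowel-to-word ratio (random effects set to 0).

    aoa is ASSUMED to be centred, so aoa = 0 stands for the mean onset age.
    """
    v = 1.0 if voiced else 0.0
    return (BETA_INTERCEPT + BETA_VOICING * v + BETA_AOA * aoa
            + BETA_VOICING_X_AOA * v * aoa)

def voicing_effect(aoa):
    """Voiced-minus-voiceless ratio difference at a given (centred) AOA."""
    return predicted_ratio(True, aoa) - predicted_ratio(False, aoa)

for aoa in (-1.0, 0.0, 1.0):
    print(aoa, round(voicing_effect(aoa), 3))
```

Note that the AOA main effect cancels out of the difference: it shifts both voicing conditions equally, which is why only the interaction term can shrink the voicing contrast as AOA increases.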
The linear mixed-effects model for normalized F1 returned no main effect of AOA [χ2(1)=0.09, p=0.765] or LOR [χ2(1)=0.53, p=0.467], and no significant interaction between vowel type and AOA [χ2(6)=4.94, p=0.551]. However, there was a significant main effect of vowel type [χ2(6)=56.38, p<0.001] as well as a significant interaction between vowel type and LOR [χ2(6)=16.47, p=0.011], indicating that the direction of F1 change for each vowel varied as a function of LOR. Tukey post hoc tests showed a significant decrease in F1 for /u/ and an increase for /ɪ/, /æ/, and /ɔ/ with longer LOR. For normalized F2, neither AOA [χ2(1)=2.46, p=0.292] nor LOR [χ2(1)=0.82, p=0.664] had a significant main effect or interaction with vowel type.
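Several of the p-values reported for the duration, ratio, and F1 models can be recomputed from their χ² statistics and degrees of freedom as a consistency check. A minimal sketch, assuming the statistics come from likelihood-ratio comparisons of nested models (as produced, for example, by anova() on two lme4 fits) and using only the Python standard library: df = 1 uses the complementary error function, and even df use the closed-form Poisson sum.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Only closed forms are handled: df = 1 via erfc, and even df via
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("only df = 1 or even df are supported in this sketch")

# (label, chi-square statistic, df, p as reported in the text)
reported = [
    ("Voicing, absolute duration", 7.82, 1, 0.005),
    ("LOR, absolute duration", 5.62, 1, 0.018),
    ("Voicing, durational ratio", 8.58, 1, 0.003),
    ("AOA main effect, F1", 0.09, 1, 0.765),
    ("LOR main effect, F1", 0.53, 1, 0.467),
    ("Vowel type x AOA, F1", 4.94, 6, 0.551),
    ("Vowel type x LOR, F1", 16.47, 6, 0.011),
]

for label, stat, df, p_text in reported:
    p = chi2_sf(stat, df)
    print(f"{label}: chi2({df}) = {stat:.2f}, p = {p:.3f} (reported {p_text})")
```

Small discrepancies in the third decimal are expected, since the statistics in the text are themselves rounded to two decimals.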
To better illustrate the effects of LOR on vowel quality, participants were divided into two LOR groups: Inexperienced [mean LOR=1.8 years (SD=1.6)] and Experienced [mean LOR=14.4 years (SD=6.5)]. As shown in the vowel density plot in Figure 3, the native English group produced the most distinct vowel categories. Post hoc group comparisons showed that both Korean groups produced English /u/ with higher F1 (and lower F2) values overall than the native English group, indicating a lower and further back tongue position.
The Inexperienced group differed from the native English group in F1 for /ɪ/, /æ/, and /ɔ/. Pairwise t-tests on F1 showed that the Inexperienced group produced lower F1 values (higher in the vowel space) than the native English group for /ɪ/ [t(40)=–8.05, p<0.001], /æ/ [t(40)=–5.67, p<0.0001], and /ɔ/ [t(40)=–6.91, p<0.001], whereas the Experienced group produced a significantly higher F1 (lower in the vowel space) for /æ/ [t(40)=–6.07, p<0.0001].
Based on visual inspection of Figure 3, one notable difference between the two Korean groups is the clearer separation of the new (i.e., /ɪ/, /æ/) and similar (i.e., /i/, /ɛ/) vowels in the Experienced group’s production. Moreover, the more tightly clustered density contours for each vowel category indicate smaller dispersion, and hence less variability, across vowel tokens for the Experienced group.
With increased L2 experience, we observe not only the establishment of new English vowel categories but also a merger of the two low back vowels, /ɔ/ and /a/, in the Experienced group [t(40)=3.46, p=0.11] but not in the Inexperienced group [t(40)=5.05, p=0.0002]. Since the native English group also merged /ɔ/ and /a/ (as in ‘frog’ and ‘Bob’) [t(40)=0.68, p=0.99], the experienced learners’ production appears to reflect this phonetic feature of the Northwestern regional dialect.
In terms of absolute vowel duration, the Korean learners were able to use duration as the primary cue to the voicing contrast. Distinguishing vowel duration by coda voicing might be expected to be difficult for inexperienced Korean learners of English, since Korean lacks a voicing contrast in final stops, which are neutralized (Kim & Jongman, 1996; Sohn, 1999). Regardless of AOA or LOR, however, vowel duration differed significantly as a function of coda voicing in both absolute and relative terms. The use of vowel duration as a salient correlate of English voicing, regardless of onset age or L2 experience, may be attributed to the systematic vowel shortening before fortis and aspirated geminates in Korean (Choi & Jun, 1998) and to the distinct phonetic realization of short and long vowels in various word positions in Korean (Chung et al., 1999). This result is in line with Baker (2010), in which all Korean learners, including those with less than one year of L2 experience, produced native-like vowel duration differences for the voicing contrast, and which showed that vowel duration is easier for Korean learners to acquire than closure duration.
For absolute vowel duration, Korean learners with more L2 experience produced longer vowels overall, regardless of voicing. In addition, the negative coefficient for the interaction between voicing and LOR suggested a possible weakening of the voicing effect on vowel duration with increasing L2 experience. Although this interaction was not statistically significant, the greater deviation from native English speakers with longer LOR raises an interesting point. As noted above, one possible explanation is that experienced speakers may have made a greater effort to be intelligible, which could result in longer absolute vowel durations than those of native English speakers. However, the smaller number of tokens from participants with longer LOR may limit the reliability of this effect.
Our results for relative vowel duration also align with Baker (2010) in that Korean learners’ AOA, not the length of their L2 experience, was associated with accuracy. Korean learners with a later onset age showed smaller ratios, especially before voiced codas, suggesting that onset age was a stronger predictor of accuracy in relative vowel duration than the length of L2 experience. The different effects of age of acquisition on absolute and relative duration also indicate that acquiring language-specific relative timing patterns may benefit more from early exposure to the language than from L2 experience. Considering that English-speaking children reliably signal voicing in stop codas as early as 2 years of age (Ko, 2007; Krause, 1982; Smit et al., 1990; Song et al., 2012), the non-native-like production of the experienced learners suggests that even an extensive amount of L2 experience may not completely override L1 patterns entrenched from the outset.
Redford & Oh (2017) explored how the timing of speech sounds is acquired and executed in first language acquisition (by children) and second language acquisition (by adult L2 learners). To investigate whether the production of L2 timing is affected by learners' age of acquisition and language experience, they examined the effects of final coda voicing on relative vowel duration produced by Korean adult learners of English and school-aged English-speaking children. Despite greater variability in their speech, the English-speaking children showed adult-like timing control, whereas the Korean adult learners of English produced temporal patterns that differed from those of native English-speaking adults. Together with the present findings, these results suggest that Korean learners’ age upon arrival in a target-language country may have a stronger impact than the amount of experience on their acquisition of phonological knowledge related to relative timing, particularly within rhythmically varied prosodic domains.
It has been noted in previous studies (Bohn & Flege, 1990; Baker & Trofimovich, 2005) that the impact of AOA and LOR on L2 phonology may vary depending on the phonetic features being examined. In line with this observation, our study found that Korean learners' ability to accurately produce English vowel spectral qualities (F1 and F2) was more strongly related to their amount of L2 experience than to their age of arrival. This suggests that L2 experience over time, combined with speech motor practice, is more likely to improve vowel quality than temporal patterns. Inexperienced Korean learners showed non-native-like production of three new English vowels (/ɪ/, /æ/, and /ɔ/), whereas experienced learners successfully established these new vowels as categories distinct from their similar-sounding Korean counterparts. Moreover, the experienced learners’ merger of the two low back vowels, /ɔ/ as in ‘frog’ and /a/ as in ‘Bob’, resembles that of West Coast English speakers, highlighting the influence of L2 input on the acquisition of L2 phonological categories. Although studies have suggested that vowel quality does not become fully adult-like until relatively late childhood, L2 learners’ development of vowel quality may benefit from substantial L2 experience and training that improves their motor control.
However, we acknowledge that differences in phonetic production may not be attributable solely to age or experience, and that production differences do not necessarily reflect differences in perception. For instance, Cebrian (2007) found no effect of L2 experience on the identification of English tense and lax vowels by native Catalan speakers, who relied more on temporal than on spectral cues regardless of L2 experience. Variation in outcomes across studies could be due to differences in the vowel inventories and/or acoustic characteristics of the languages studied. Furthermore, Baker (2010) pointed out that the different phonetic cues that make up the same L2 target may be affected differently by onset age and L2 experience. For example, the acquisition of stop voicing, which relies on two phonetic features (vowel duration and closure duration), may be affected by different factors for each feature.
Although the acquisition of L2 phonology is a complex process involving multiple factors, including AOA and L2 experience, our findings suggest that early exposure to the L2 may be more advantageous for some aspects of L2 phonology, such as relative vowel duration conditioned by coda voicing, whereas other aspects, such as vowel spectral qualities, improve more with increased L2 experience. These findings shed light on the complex interplay between age of acquisition and L2 experience in the multifaceted process of L2 phonological acquisition.