One of the most cross-linguistically salient correlates of the voicing contrast in word-initial position is the difference in voice onset time (VOT) (Lisker & Abramson, 1964; Keating, 1984). However, the realization of VOT as a function of voicing contrasts differs across languages, which creates different category boundaries along the VOT continuum. In English, for example, VOT has been shown to be the most reliable cue for distinguishing word-initial voiced and voiceless stops: VOT is significantly shorter for voiced than for voiceless stops. Voiced stops in Spanish, on the other hand, are produced with voicing that leads the consonant release (i.e., the vocal folds vibrate before the release), whereas VOT for voiceless stops is similar to that for English voiced stops (Lisker & Abramson, 1964). Word-initial stops in Korean fall into three voiceless categories (aspirated, lenis and tense), which are primarily characterized by VOT, fundamental frequency (F0) and the difference in amplitude between the first and second harmonics (H1-H2) (Cho et al., 2002; Kang & Guion, 2006). In Seoul Korean, VOT is shortest for tense (fortis) stops and longer for lenis and aspirated stops, while F0 is higher for aspirated and tense stops and much lower for lenis stops. Recent studies have suggested that younger Seoul Korean speakers use F0 as a more reliable and salient cue than the VOT difference to distinguish lenis and aspirated stops (Kang & Guion, 2008; Kong et al., 2011; Silva, 2006).
The different phonetic realization of stops across languages creates challenges, especially for bilinguals, and the difficulty of forming independent categories is often more evident in bilingual children who were exposed to the L2 before the L1 was fully established. As children show different milestones for the acquisition of language-specific contrasts, bilinguals are often shown to experience delays in the acquisition of L1 stop contrasts (Davis, 1995; Deucher & Clark, 1996; Macken & Barton, 1980; Westbury & Keating, 1986). For instance, Deucher & Clark (1996) found that children before the age of two had a strong initial preference for short-lag over long-lag VOT. In a different study on Korean monolingual children’s production of Korean stops, children began to form a bimodal VOT contrast at the age of three, while the F0 contrast between lenis and aspirated stops had not emerged even in the production of four-year-old children (Kim & Stoel-Gammon, 2009).
Different developmental milestones at which children master language-specific elements in the L1 have also been shown in bilingual children’s production (Johnson & Wilson, 2002; Lee & Iverson, 2012). Johnson & Wilson (2002) examined two Japanese-English bilingual children’s VOT production of Japanese and English stops. Although Japanese distinguishes pre-voiced from short-lag VOT, the bilingual children produced a short- versus long-lag VOT contrast, as in English. The older bilingual child made a greater difference by producing longer VOT for English voiceless stops, indicating children’s strong preference for more distinctive and universal cues.
Studies have shown that acquiring L2 categories before the full development of L1 phonological categories can affect the native-likeness of L1 speech sounds. Lee & Iverson (2012) examined the VOT and voice-onset F0 of Korean and English stops produced by thirty simultaneous Korean (L1)–English (L2) bilingual children in two age groups, five and ten years old. The older children, with longer exposure to both languages, were able to create five distinctive stop categories, whereas the younger children primarily used VOT to distinguish English and Korean stops. For instance, English voiceless and Korean lenis stops, which the older children distinguished by F0, were distinguished by VOT in the younger children's production. The discrepancy in the use of VOT and F0 cues for Korean and English stops in the younger bilingual children’s production was interpreted as an effect of early exposure to English, as well as shorter exposure to both languages, on the development of phonological representations in Korean. Given the different rates and milestones of development in each language, early exposure to the L1 does not necessarily lead bilingual children to acquire L1 phonetic features earlier or more accurately than the corresponding L2 features.
Depending on age at the time of testing, bilingual speakers' production of both L1 and L2 may be at the developmental or the ultimate stage of language acquisition. Kang & Nagy (2012), for instance, examined whether Korean heritage adult speakers residing in Toronto would exhibit the shift in cue weighting from VOT to F0 in the word-initial aspirated-lenis stop contrast. Two generations of Korean heritage adult speakers were compared to native speakers of Korean. The results showed that both groups shared a similar pattern with Seoul Korean speakers in terms of the VOT-F0 trade-off. Similarly, Kang & Guion (2006) compared Korean-English early (adult) bilinguals’ (mean age of L2 learning=3.8) production of Korean and English stops and found that they were able to create distinctive stop systems for the two languages. Both studies examined adult bilinguals whose L1 and L2 are at the ultimate stage of language development.
On the other hand, Kehoe (2002) investigated child bilinguals’ production of German and Spanish vowels and found that the bilinguals’ German vowel length was not as distinctive as that produced by German monolingual children. The child bilinguals’ delay in one language over the other suggests that languages with a more complex phonological system (in this case, a larger vowel inventory) require more L2 experience to obtain accuracy. In a different study, Sundara et al. (2006) compared simultaneous French-English child and adult bilinguals to age-matched French and English monolinguals to examine whether bilingual children were able to discriminate the English /d/ and /ð/ contrast. The results showed that, unlike the adult bilinguals who were comparable to the English monolinguals, the child bilinguals obtained significantly lower scores than the English monolingual children. Together with the effect of phonological complexity of the target language, the bilingual children’s L1 at the time of testing is crucial in understanding the degree of sensitivity to new L2 sounds, which provides some insights into ultimate L2 attainment.
The current study examined Korean adults and children with different amounts of exposure to English. Korean stops produced by bilingual children were compared to those produced by Korean monolingual children to examine whether bilingual children who were exposed to two languages at an early stage of language development were able to make three-way stop contrasts in a native-like manner. More specifically, as the primary acoustic correlate that distinguishes lenis and aspirated stops has been shown to change from VOT to F0 (i.e., lower F0 for lenis) among young Korean speakers (Kang & Guion, 2008), the use of VOT and F0 cues in bilinguals’ Korean as well as English production was examined as a means to assess the degree and direction of the L1-L2 interaction. The prediction is that the F0 difference in bilingual children’s production may not be as distinctive as that in monolingual children’s production, because the F0 difference between English voiced and voiceless stops is not a prominent cue. Additionally, the younger the bilinguals are at the time of testing, the stronger the influence of the L2 will be on which acoustic cue is employed to categorize L1 stops.
The data were collected from four groups of Seoul Korean speakers varying in age and length of residence in the United States as well as two groups of age-matched native English speakers: KEA (Korean Experienced Adults), KIA (Korean Inexperienced Adults), KEC (Korean Experienced Children), KIC (Korean Inexperienced Children), NEA (Native English-speaking Adults), NEC (Native English-speaking Children).
As shown in Table 1, each group consists of four male and six female speakers. All the adult participants were undergraduate students in the Pacific Northwest, but the inexperienced adults and children had arrived in the U.S. approximately 6 months before the time of testing. None of the inexperienced speakers reported having lived in an English-speaking country prior to coming to the U.S. The children in the KEC group, the bilingual children who were reported to use the L2 (English) more than the L1 on a daily basis, were all born and raised in the U.S. in Korean-speaking families and were exposed to both languages before the age of three. Age of acquisition marks the age at which the children first attended an English-setting institution in the U.S. Information about the children's language background was obtained via parental questionnaires.
Four-syllable Korean phrases were used to elicit two sets of the three Korean laryngeal stop categories (lenis, tense, aspirated) differing in place of articulation (POA). The POA-matched target consonants were embedded in the phrase-initial CVC syllable (see Table 2). In order to reduce orthographic effects, stimuli were presented as images and thus needed to be easy and familiar to children. Although the vowels varied due to these constraints, our analysis is not compromised, as each stop category is compared within the same vowel context.
| Place | Lenis | Tense | Aspirated |
|---|---|---|---|
| Velar | /kʌt.sɨm.ni.da/ ‘walk’ | /k*ʌt.sɨm.ni.da/ ‘turn off’ | /kʰʌt.sɨm.ni.da/ ‘grow’ |
| Alveolar | /tat.sɨm.ni.da/ ‘shut’ | /t*at.sɨm.ni.da/ ‘pick’ | /tʰat.sɨm.ni.da/ ‘burn’ |
As for the English stimuli, three monosyllabic words with word-initial voiced stops (‘bed, dog, good’) were matched in terms of vowel quality and coda consonants to words with word-initial voiceless stops (‘pet, talk, cut’). Again, a picture task was used to elicit the speech sounds.
The experiment was conducted in the phonetics lab at a university. Pictures representing the target stop consonants were presented in random order on a computer screen. The participants wore a head-mounted Shure microphone (Model SM 10A), and the speech was recorded on a flash digital recorder (Marantz PMD 670) at a 22.05 kHz sampling rate with 16-bit quantization. Both Korean and English stimuli were produced in isolation and were elicited three times after a familiarization process.
VOT was measured using Praat as the duration from the beginning of the stop burst release to the onset of the first full pitch pulse of the initial vowel. The few pre-voiced stops in English (less than 5%) were replaced by the mean voiced VOT value of that speaker’s production. None of the Korean stops was pre-voiced.
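The measurement and replacement steps can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually used: the function names and the millisecond units are hypothetical, the time points are assumed to be hand-labeled landmarks in seconds, and "mean voiced VOT value" is read here as the mean of the same speaker's voiced tokens that were not pre-voiced.

```python
from statistics import mean


def vot_ms(burst_s, voicing_onset_s):
    """VOT in ms: onset of the first full pitch pulse minus burst release.

    A negative value indicates pre-voicing (voicing starts before the burst).
    """
    return (voicing_onset_s - burst_s) * 1000.0


def replace_prevoiced(vots):
    """Replace pre-voiced (negative) VOT values for one speaker.

    Assumption: the replacement value is the mean of that speaker's
    remaining (non-negative) voiced tokens. If no such token exists,
    the values are returned unchanged.
    """
    positive = [v for v in vots if v >= 0]
    if not positive:
        return list(vots)
    fill = mean(positive)
    return [v if v >= 0 else fill for v in vots]
```

For example, a speaker's voiced tokens of 10 ms, -80 ms and 20 ms would become 10 ms, 15 ms and 20 ms under this reading.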
Adult and Child groups were separately examined using ANOVAs. In case of a significant group and stop interaction, separate ANOVAs testing the effect of group on each stop type were conducted. The alpha level was adjusted for each analysis.
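The adjustment of the alpha level can be illustrated with a short Python sketch. It assumes a Bonferroni-style correction, which is consistent with the 0.017 threshold reported for three comparisons in the results but is not named in the text; the function name is hypothetical.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha under a Bonferroni-style correction (assumed)."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# Three follow-up comparisons at a family-wise alpha of 0.05.
adjusted = bonferroni_alpha(0.05, 3)
print(round(adjusted, 3))  # 0.017
```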
As for the temporal variability, the mean and standard deviation were calculated across the three repetitions of the VOT values. The standard deviation was then divided by the mean duration to produce a normalized measure of temporal variability (i.e., the coefficient of variation).
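The normalization amounts to the following calculation, sketched here in Python. The function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(vots):
    """Standard deviation of the repetitions divided by their mean.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    m = mean(vots)
    if m == 0:
        raise ValueError("mean VOT is zero; the ratio is undefined")
    return stdev(vots) / m


# Three hypothetical repetitions of one stop by one speaker (ms).
cv = coefficient_of_variation([60.0, 70.0, 80.0])
```

Because the measure is a ratio, it allows variability to be compared across stop types whose mean VOT differs greatly, such as tense and aspirated stops.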
A three-way Group (KE, KI) by Stop type (Korean lenis, tense, aspirated) by Place (alveolar, velar) univariate repeated measures analysis with VOT as the dependent measure was conducted separately for the adult and child groups. For the adults, there was no significant effect of Group [F(1, 18)=0.237, p>0.05], nor was there a significant interaction of Stop and Group [F(2, 36)=2.163, p>0.05] or of Stop, Place and Group [F(2, 36)=2.163, p>0.05]. For the children, however, there were a significant effect of Group [F(1, 18)=8.717, p<0.05] and a significant interaction between Stop and Group [F(2, 36)=4.133, p<0.05]. The lack of a Place effect or of its interaction with other variables suggests that neither the adult nor the child groups differed across Place. Pairwise comparisons (alpha level adjusted to 0.017 for three separate group comparisons) testing the effect of Group on each Stop showed a significant Group effect for tense and aspirated stops (p<0.017). Tukey’s HSD tests (p<0.05) returned significantly longer VOT values for tense and aspirated stops in the KEC group’s production than in the KIC group’s. In particular, the KEC group’s distinctly longer VOT for aspirated stops is illustrated in Figure 1.
As expected for younger speakers of Seoul Korean, the VOT difference no longer appears to be a primary cue to the aspirated-lenis stop contrast. A significant VOT difference, although a smaller one, was nevertheless found between aspirated and lenis stops in the KIC group’s production; this is taken up in the Discussion section.
In order to determine whether English voiced and voiceless stops were acquired in a native-like manner, the NEA and NEC groups were compared to the age-matched Korean groups, respectively. If the Group by Vowel interaction was significant, 3-way comparisons were conducted. The results showed a significant effect of Child Group [F(4, 52)=3.889, p<0.05]. More specifically, the KIC group produced voiced stops [F(2, 30)=6.191, p<0.05] with significantly longer VOT than the KEC group (p<0.017). As for native-likeness, all four Korean groups appeared to have acquired a voicing contrast in English within the native speaker range.
Given that the Korean groups’ VOT values for voiceless stops fall within the range of English monolingual values, Korean and English stops were compared within each group to examine the extent to which newly acquired L2 stop categories are distinct from L1 categories. Pairwise comparisons in repeated measures for the KEA and KIA groups showed no significant difference among Korean aspirated, Korean lenis and English voiceless stops. English voiced stops, however, had longer VOT than Korean tense stops in the KEA group’s production (p<0.017).
The child groups showed a greater degree of independence across stop categories. Again, as can be seen in Figure 1, aspirated stops were substantially longer than lenis and voiceless stops in the KEC group’s production (p<0.017). As for the KIC group, aspirated stops were significantly longer than lenis but shorter than voiceless stops. The non-native-like voiced stops produced by the KIC group were also significantly longer than tense stops (p<0.017). Aspirated and lenis stops, however, were not statistically different. The VOT merger of these two categories is consistent with the sound change documented in previous research. In summary, the KIA group showed the least distinction among Korean and English stop categories, followed by the KEA, KEC and KIC groups. Despite the same amount of L2 exposure (i.e., 6–7 months), the KIC group created a greater number of independent L2 categories than the KIA group.
A two-way Group (KE, KI) by Stop type (Korean lenis, tense, aspirated) univariate repeated measures analysis with normalized F0 as the dependent measure returned a significant effect of Stop [F(2, 114)=12.696, p<0.05] but no main effect of Group or interaction between the two (p>0.05) for the adult groups. As for the child groups, there were significant main effects of Group [F(2, 114)=48.593, p<0.05] and Stop [F(4, 226)=23.716, p<0.05] and a significant interaction [F(4, 226)=3.078, p<0.05]. Separate ANOVAs on each stop type returned a significant effect of Group for Korean tense [F(1, 38)=6.715, p<0.017] and aspirated stops [F(1, 38)=12.586, p<0.017]. As shown in Figure 2, the KIC group produced significantly higher F0 values for both tense and aspirated stops than the KEC group. Pairwise comparisons (Tukey’s HSD tests, p<0.05) showed a similar pattern of results for the KEA, KIA and KIC groups in that two categories are formed along the F0 dimension: one comprising English voiced, English voiceless and Korean lenis stops, and the other Korean aspirated and tense stops. The KEC group, on the other hand, exhibited a more fine-grained and gradient distinction across the stop types, with English voiced stops showing significantly lower F0 values than English voiceless and Korean lenis stops. As shown in Figure 2, the F0 values for aspirated and tense stops produced by the KEC group were significantly higher, but to a much lesser degree than those produced by the KIC group.
As shown in Figure 3 above, the tokens of Korean stops produced by the KEC group were widely dispersed along the VOT dimension for both female and male speakers. This dispersion is most evident for lenis stops. The temporal variability shown in Figure 4 indicates that the KEC group produced the same stops with more variable VOT across repetitions than the KIC group [F(1,38)=4.659, p<0.05]. The variability was especially high for lenis stops (p<0.05) in the KEC group’s production, suggesting lower stability of temporal contrasts across repetitions.
The results of the inexperienced adult groups confirm previous observations about the loss of the VOT distinction and the development of an F0 distinction for the lenis vs. aspirated contrast in adults’ production. Children, on the other hand, displayed varying degrees of L2 effects as well as some influence of the linguistic input from older speakers. The KIC group showed a greater use of F0 than VOT to distinguish aspirated from lenis stops, whereas the KEC group relied on VOT to a greater extent for the distinction.
Taken together with English stop categories, the Korean adults did not distinguish Korean lenis and English voiceless stops by either VOT or F0, suggesting that voiceless stops were identified as instances of an existing L1 category, lenis stops, regardless of the amount of L2 experience. However, the KIC group separated Korean lenis stops from English voiceless stops with significantly shorter VOT and their tense stops were uniquely distinctive from voiced stops. Within 6 months of L2 experience, the KIC group created greater independence between the two phonological systems than the KIA group.
Substantially longer VOT values for aspirated stops by the KEC group may be attributed to the Korean input by the older speakers of Seoul Korean whose contrast between lenis and aspirated stops is still signaled mainly by VOT. The fact that the KIC group also produced longer VOT for aspirated than lenis stops provides some evidence for a substantial effect of parental linguistic input. Ko (2018) found that VOT plays a significant role in discriminating lenis from aspirated stops in child-directed speech (CDS) and that the contribution of the F0 difference increases with greater gains in children's vocabulary. The author argued that VOT, as a more salient acoustic cue, was employed to improve perceptual discrimination of the two stop categories, suggesting that the input children receive from their parents is likely to differ from the observed tonogenetic sound change.
Although maternal VOT has been found to be closely associated with the VOT production of bilingual children (Stoehr et al., 2019), it has also been shown that the ongoing diachronic change is not consistently reflected in maternal linguistic input to Korean infants. Choi et al. (2019) found wide speaker variation in the maternal output of stop consonants: the degree of adaptation to the sound change varied greatly across the infants’ mothers. In fact, the temporal instability reflected in the higher variability of lenis stops in the KEC group’s production may reflect the mixed nature of the L1 input in the L2-speaking setting. Note, however, that further studies on the characteristics of bilingual mothers’ speech production are needed to determine whether the variation in the L1 stops is due to L1 attrition as a result of varying degrees of L2 experience or to individual differences in the retention of VOT for the aspirated and lenis stop contrast.
Under the assumption that the disproportionate lengthening of VOT, coupled with significantly lower F0 for aspirated stops, is largely driven by factors other than the input from L1 users, the question is whether the lengthening of VOT for aspirated stops serves 1) to dissimilate the L1 category from the newly created L2 category (i.e., merged lenis/voiceless versus aspirated stops) or 2) to enhance contrasts within the L1 stop system (i.e., aspirated versus lenis stops). The KEC group’s more native-like production of the dominant L2 than of the L1 may provide insight into the degree and direction of the interaction between the two languages. One proposed explanation is phonetic category dissimilation: an existing L1 stop category shifts away from a newly established L2, or combined L1 and L2, stop category to maintain phonetic contrast between the two sounds (Speech Learning Model by Flege, 1995, 2002, 2003).
The hypothesis on which category dissimilation is based finds its evidence in bilinguals’ L1 and L2 vowel system(s). Flege et al. (2003) found that early bilinguals overshot the formant movement in English /eɪ/ to create a greater phonetic distinction from the steady-state Italian /e/. Flege & Eefting (1988) found an instance of L1 sounds shifting away from the newly acquired L2. Early Spanish–English bilinguals produced shorter VOT for Spanish voiceless consonants compared to English monolinguals in order to maintain phonetic contrast between the two stop categories. In the same vein, aspirated stops may have been pushed further away from English voiceless stops to maintain sufficient contrast within the shared phonological space.
Alternatively, the KEC group may have employed VOT as the primary cue and F0 as a secondary cue to discriminate Korean stops, as in the English two-way stop system. Given the input at an early stage of L1 and L2 acquisition, the bilingual children are likely to attend to the cross-linguistically similar acoustic feature. That is, heavy exposure to the short- and long-lag VOT distinction in English before the establishment of phonological representations of the Korean three-way stop contrast may have drawn the bilingual children’s attention to the acoustic cues that are primarily used in the dominant language, English. It is also likely that VOT was exaggerated to compensate for the absence of a robust F0 difference. The fact that the bilingual children failed to rely on two different acoustic dimensions suggests that being exposed to the L2 before native-like attainment of the L1 may lead to a delay in the acquisition of phonological representations of L1 speech sounds. The native-like production of English stops and the heavy use of VOT for Korean stops indicate that the effect of the L2 on the L1 is stronger than the reverse.
Despite the delay in the acquisition of L1 stop contrasts, our results confirm previous findings of relatively early acquisition of Korean fortis stops. In Jun (2007), Korean infants from two months to twenty-two months were shown to acquire tense stops much earlier than lenis and aspirated stops, as VOT is the sole acoustic parameter that distinguishes tense from the other two stops. Choi et al. (2019) also found that infants as young as five to six months can discriminate fortis from aspirated stops, followed by fortis and lenis stops at around eight to nine months. It was not until ten months of age that Korean infants began to employ the relevant acoustic cues to discriminate lenis and aspirated stops. Notwithstanding the high variability, the KEC group produced lenis stops in a more native-like manner than aspirated stops in terms of the mean VOT values, suggesting that learning is proceeding in the right direction. Extensive exposure to L2 English voiceless stops is likely to have facilitated the learning of phonetically similar L1 stops, resulting in a combined category of L1 and L2 stops, each like that of a monolingual. However, the results should be interpreted with caution, as one of the important cues, the amplitude difference between the first and second harmonics (H1-H2), has not been taken into account.