Table 5. Recognition speed (xRT) by encoder model: Transformer (T), Conformer (C), and E-Branchformer (E)

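| Model | Encoder only, beam=1: 10 s | Encoder only, beam=1: 20 s | Encoder & decoder, beam=3: 10 s | Encoder & decoder, beam=3: 20 s |
|-------|----------------------------|----------------------------|---------------------------------|---------------------------------|
| T     | 0.71 (±0.01)               | 1.13 (±0.01)               | 2.50 (±0.05)                    | 3.79 (±0.05)                    |
| C     | 0.81 (±0.00)               | 1.23 (±0.01)               | 2.57 (±0.02)                    | 3.85 (±0.02)                    |
| E     | 0.86 (±0.01)               | 1.27 (±0.03)               | 2.61 (±0.01)                    | 3.92 (±0.01)                    |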
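Table 5 reports speed as xRT. A minimal Python sketch of how such a figure can be computed is given below, assuming the conventional definition of xRT as wall-clock processing time divided by audio duration and assuming that the ± values are sample standard deviations over utterances. The text confirms neither assumption. The names `measure_xrt` and `recognize` are hypothetical, and the timer is injectable so the function can be tested without real audio.

```python
import statistics
import time
from typing import Callable, Sequence, Tuple


def measure_xrt(
    recognize: Callable[[object], object],
    utterances: Sequence[Tuple[object, float]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Mean and sample standard deviation of xRT over a set of utterances.

    recognize:  hypothetical decoding function (audio -> hypothesis)
    utterances: (audio, duration_in_seconds) pairs
    clock:      timer, injectable for deterministic testing
    """
    ratios = []
    for audio, duration in utterances:
        start = clock()
        recognize(audio)
        elapsed = clock() - start
        # xRT for one utterance: processing time divided by audio duration
        ratios.append(elapsed / duration)
    mean = statistics.mean(ratios)
    # statistics.stdev needs at least two samples; report 0.0 for a single one
    std = statistics.stdev(ratios) if len(ratios) > 1 else 0.0
    return mean, std
```

In use, `recognize` would wrap one model's decoding call, and `utterances` would hold one length group (for example, the ten ~10 s utterances). Calling the function once per model and per decoding configuration fills one cell of the table.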
Recognition speed was measured on 10 randomly selected utterances from the KMSAV dataset for each of two lengths, approximately 10 s and 20 s. The first two columns report speed with the CTC weight set to 1, so that only the encoder is used, and with the beam size set to 1 to minimize computation. The last two columns report speed in a typical recognition setting, using both the encoder and the decoder with a beam size of 3.
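Setting the CTC weight to 1 removes the decoder because of how hybrid CTC/attention models score hypotheses. In the usual formulation, assumed here because the text does not give the scoring rule, a hypothesis y for input x receives the score λ·log p_CTC(y|x) + (1 − λ)·log p_att(y|x), where λ is the CTC weight. With λ = 1 the attention-decoder term vanishes. When the beam size is also 1, best-path (greedy) CTC decoding is a cheap approximation of the remaining encoder-only search. The Python sketch below illustrates that approximation. The function name `ctc_greedy_decode` and the blank index 0 are illustrative assumptions, and this is not necessarily the search procedure used for Table 5.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding over per-frame token scores (e.g. log-probabilities).

    1. pick the highest-scoring token in every frame,
    2. merge consecutive repeats,
    3. drop the blank token.
    """
    output: List[int] = []
    prev: Optional[int] = None
    for scores in frame_scores:
        token = max(range(len(scores)), key=lambda i: scores[i])
        # emit a token only when it differs from the previous frame's token;
        # a blank between two identical tokens therefore keeps both of them
        if token != prev and token != blank:
            output.append(token)
        prev = token
    return output
```

Best-path decoding picks the single most likely alignment. A beam-1 CTC prefix search instead sums over alignments, so the two methods can occasionally return different hypotheses.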