Table 5. Recognition speed (xRT) by encoder model: Transformer (T), Conformer (C), and E-Branchformer (E)
| Model | Encoder only, beam=1, 10 s | Encoder only, beam=1, 20 s | Encoder & decoder, beam=3, 10 s | Encoder & decoder, beam=3, 20 s |
| --- | --- | --- | --- | --- |
| T | 0.71 (±0.01) | 1.13 (±0.01) | 2.50 (±0.05) | 3.79 (±0.05) |
| C | 0.81 (±0.00) | 1.23 (±0.01) | 2.57 (±0.02) | 3.85 (±0.02) |
| E | 0.86 (±0.01) | 1.27 (±0.03) | 2.61 (±0.01) | 3.92 (±0.01) |
Recognition speed was measured on 10 randomly selected utterances from the KMSAV dataset, with lengths of approximately 10 and 20 seconds. The first two result columns report the speed with the CTC weight set to 1, so that only the encoder is used, and with the beam size set to 1 to minimize the computational load. The last two result columns report the speed in a typical recognition setting, using both the encoder and the decoder with a beam size of 3.
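A measurement of this kind could be scripted roughly as follows. This is only a minimal sketch, not the procedure used for Table 5: the `recognize` callable is a hypothetical stand-in for the recognizer, the speed figure is assumed to be decoding time divided by audio duration, and the value in parentheses is assumed to be the sample standard deviation. The text above defines none of these.

```python
import statistics
import time
from typing import Callable, Sequence, Tuple

# One utterance: (audio samples, duration in seconds).
Utterance = Tuple[Sequence[float], float]


def measure_speed(
    recognize: Callable[[Sequence[float]], object],
    utterances: Sequence[Utterance],
) -> Tuple[float, float]:
    """Return (mean, standard deviation) of per-utterance speed ratios.

    ASSUMPTION: the ratio is elapsed decoding time / audio duration.
    `recognize` is a hypothetical callable wrapping the recognizer.
    """
    ratios = []
    for samples, duration_s in utterances:
        start = time.perf_counter()
        recognize(samples)  # hypothetical recognizer call
        elapsed = time.perf_counter() - start
        ratios.append(elapsed / duration_s)
    # stdev needs at least two data points; report 0.0 otherwise.
    spread = statistics.stdev(ratios) if len(ratios) > 1 else 0.0
    return statistics.mean(ratios), spread


def format_cell(mean: float, spread: float) -> str:
    """Format a result in the 'mean (±spread)' style of the table cells."""
    return f"{mean:.2f} (±{spread:.2f})"
```

Calling `measure_speed` once per length group (about 10 s and about 20 s) and per decoding configuration would give one table cell per call. `time.perf_counter` is used because it is a monotonic, high-resolution clock suited to timing short intervals.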