Table 4. Ablation study results

Prosody embedding / MixLN / Mel2VAD predictor  CER (↓, %)      SECS (↑)        F0 PCC (↑)      Energy PCC (↑)
                                               Seen    Unseen  Seen    Unseen  Seen    Unseen  Seen    Unseen
× × ×                                          28.7    27.6    0.762   0.767   0.739   0.742   0.970   0.971
× ×                                            28.5    27.3    0.762   0.767   0.739   0.743   0.970   0.971
×                                              23.9    22.3    0.750   0.758   0.741   0.745   0.971   0.973
× ×                                            27.0    26.1    0.765   0.769   0.740   0.743   0.971   0.972
                                               23.7    22.1    0.751   0.759   0.745   0.747   0.971   0.973
MixLN, mix layer normalization; CER, character error rate; SECS, speaker embedding cosine similarity; F0 PCC, F0 Pearson correlation coefficient; Energy PCC, energy Pearson correlation coefficient.
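
The metrics abbreviated above follow standard definitions, which can be sketched as below. This is a minimal illustration only: the function names are hypothetical, and the sketch assumes that the reference and hypothesis transcripts are plain strings, that speaker embeddings are already extracted as equal-length vectors, and that the F0 or energy contours being compared are already time-aligned and of equal length. The source does not state which speech recognizer, speaker encoder, or alignment procedure was used, so none of that is modeled here.

```python
import math


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent: Levenshtein distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(reference)


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length embedding vectors (SECS)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson(x, y) -> float:
    """Pearson correlation coefficient between two equal-length contours (PCC)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    dev_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    dev_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (dev_x * dev_y)
```

Under these definitions, lower CER indicates more intelligible speech, while SECS and PCC values closer to 1 indicate closer agreement with the reference speaker and the reference prosody contours, respectively.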