Table 1. The range of hyperparameters in the transformer encoder network
Hyperparameters            Values
output size                256
input layer                2d conv
normalized before          True, False
attention heads            1, 2, 4, 8
linear units               512, 1024, 2048, 4096
num blocks                 2, 4, 6, 8, 12
dropout rate               0.0, 0.1, 0.2, 0.3, 0.4
positional dropout rate    0.1
attention dropout rate     0.0, 0.1, 0.2, 0.3, 0.4
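
To make the size of this search space concrete, the ranges in Table 1 can be written as a plain Python dictionary and enumerated. This is only an illustrative sketch: the snake_case key names and the helper functions `count_configurations` and `iter_configurations` are assumptions introduced here, and the table does not state whether the ranges were searched exhaustively, randomly, or one hyperparameter at a time.

```python
from itertools import product
from math import prod

# Ranges transcribed from Table 1. The key names are hypothetical
# snake_case renderings of the table's row labels.
SEARCH_SPACE = {
    "output_size": [256],
    "input_layer": ["2d conv"],
    "normalized_before": [True, False],
    "attention_heads": [1, 2, 4, 8],
    "linear_units": [512, 1024, 2048, 4096],
    "num_blocks": [2, 4, 6, 8, 12],
    "dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4],
    "positional_dropout_rate": [0.1],
    "attention_dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4],
}


def count_configurations(space):
    """Number of distinct settings in the full Cartesian product."""
    return prod(len(values) for values in space.values())


def iter_configurations(space):
    """Yield every combination as a dict (assumes an exhaustive grid)."""
    keys = list(space)
    for combo in product(*(space[key] for key in keys)):
        yield dict(zip(keys, combo))


if __name__ == "__main__":
    print(count_configurations(SEARCH_SPACE))  # 4000
    print(next(iter_configurations(SEARCH_SPACE)))
```

Under the exhaustive-grid assumption, the full product contains 2 x 4 x 4 x 5 x 5 x 5 = 4000 settings, since the three fixed rows (output size, input layer, positional dropout rate) each contribute a factor of one.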