+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| Name             | Train Info                                            | Result                                         | Comments                                             |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| 1.5.0            | Ancient Arch                                          | base                                           | Network for v1.5.0, before the new Arch              |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net003           | 768->512x2->SCReLU->1                                 | -82.2 Elo vs base                              | New arch                                             |
|                  | 100 epochs                                            |                                                | Ciekce says it's "giga-overfitted"                   |
|                  | wdl=0.12                                              |                                                |                                                      |
|                  | lr=0.01                                               |                                                |                                                      |
|                  | lr_drop at 30                                         |                                                |                                                      |
|                  | 35M 5k-6k nodes self-play data                        |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net004           | Same as net003, but 75 epochs & wdl=0.15 & uses CReLU | -74.4 Elo vs base                              | There's hope if we have more data                    |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net005           | Same as net004, but 200 epochs & 256 hidden neurons   | STC vs base                                    | Ciekce was probably correct                          |
|                  |                                                       | 74 - 105 - 121 [0.448] 300                     | Probably more data will help                         |
|                  |                                                       | -36.0 +/- 30.5, LOS: 1.0 %                     |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net006           | 768->384x2->CReLU->1                                  | STC vs base                                    | First improvement over master.                       |
|                  | 60 epochs                                             | 171 - 125 - 304 [0.538] 600                    | However, it does relatively worse as TC increases.   |
|                  | wdl=0.15                                              | 26.7 +/- 19.5, LOS: 99.6 %                     |                                                      |
|                  | lr=0.01                                               |                                                |                                                      |
|                  | lr_drop at 30                                         |                                                |                                                      |
|                  | 150M 5.5k nodes self-play data                        |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net007a          | 768->384x2->CReLU->1                                  | STC vs base                                    | Trying some new params                               |
|                  | 50 epochs                                             | 77 - 35 - 89 [0.604] 201                       | Much stronger than previous arch                     |
|                  | wdl=0.25                                              | 73.7 +/- 36.1, LOS: 100.0 %                    |                                                      |
|                  | lr=0.002                                              |                                                |                                                      |
|                  | lr *= 0.10 every 30 epochs                            | LTC vs base                                    |                                                      |
|                  | data reshuffled from net006                           | 38 - 19 - 58 [0.583] 115                       |                                                      |
|                  |                                                       | 57.9 +/- 44.9, LOS: 99.4 %                     |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net007b          | 768->256x2->CReLU->1                                  | STC vs base                                    | Trying a smaller network to see if 384 was necessary |
|                  | 70 epochs                                             | 42 - 25 - 26 [0.591] 93                        | Seems there isn't enough data for 384 yet            |
|                  | wdl=0.35                                              | 64.2 +/- 61.2, LOS: 98.1 %                     |                                                      |
|                  | lr=0.004                                              |                                                |                                                      |
|                  | lr *= 0.10 every 30 epochs                            | STC vs 007a                                    |                                                      |
|                  | data from net007a                                     | 234 - 201 - 565 [0.516] 1000                   |                                                      |
|                  |                                                       | 11.5 +/- 14.2, LOS: 94.3 %                     |                                                      |
|                  |                                                       |                                                |                                                      |
|                  |                                                       | LTC vs 007a                                    |                                                      |
|                  |                                                       | 67 - 52 - 181 [0.525] 300                      |                                                      |
|                  |                                                       | 17.4 +/- 24.8, LOS: 91.5 %                     |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net008a          | 768->384x2->CReLU->1                                  | STC vs 007b                                    | Attempting lower lr and wdl.                         |
|                  | 40 epochs                                             | 89 - 106 - 231 [0.480] 426                     |                                                      |
|                  | wdl=0.15                                              | -13.9 +/- 22.3, LOS: 11.2 %                    |                                                      |
|                  | lr=0.001                                              |                                                |                                                      |
|                  | lr *= 0.35 every 15 epochs                            |                                                |                                                      |
|                  | data from net007a                                     |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net008b          | 768->256x2->CReLU->1                                  | STC vs 007b                                    | 008a but smaller                                     |
|                  | 60 epochs                                             | 284 - 174 - 542 [0.555] 1000                   |                                                      |
|                  | wdl=0.15                                              | 38.4 +/- 14.5, LOS: 100.0 %                    |                                                      |
|                  | lr=0.002                                              |                                                |                                                      |
|                  | lr *= 0.35 every 15 epochs                            |                                                |                                                      |
|                  | data from net007a                                     |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net009a          | 768->384x2->CReLU->1                                  | STC vs 008b                                    | Probably something went wrong in training            |
|                  | 35 epochs                                             | 21 - 48 - 74 [0.406] 143                       |                                                      |
|                  | wdl=0.15                                              | -66.4 +/- 39.6, LOS: 0.1 %                     |                                                      |
|                  | lr=0.001                                              |                                                |                                                      |
|                  | lr *= 0.3 every 15 epochs                             |                                                |                                                      |
|                  | 177M 6k nodes self-play data from 008b                |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net009b          | 768->256x2->CReLU->1                                  | STC vs 008b                                    |                                                      |
|                  | 50 epochs                                             | 12 - 26 - 46 [0.417] 84                        |                                                      |
|                  | wdl=0.15                                              | -58.5 +/- 50.1, LOS: 1.2 %                     |                                                      |
|                  | lr=0.001                                              |                                                |                                                      |
|                  | lr *= 0.4 every 15 epochs                             |                                                |                                                      |
|                  | data from net009a                                     |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net010a          | net009a retrained, shuffled, wdl=0.05                 | STC vs 008b                                    | Bad data most likely                                 |
|                  |                                                       | 78 - 112 - 234 [0.460] 424                     |                                                      |
|                  |                                                       | -27.9 +/- 22.1, LOS: 0.7 %                     |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net010b          | 010a with 256 neurons                                 | STC vs 008b                                    |                                                      |
|                  |                                                       | 7 - 19 - 43 [0.413] 69                         |                                                      |
|                  |                                                       | -61.0 +/- 50.1, LOS: 0.9 %                     |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net011           | 768->384x2->CReLU->1                                  | STC vs 008b                                    |                                                      |
|                  | 39 epochs                                             | 54 - 109 - 113 [0.400] 276                     |                                                      |
|                  | wdl=0.20                                              | -70.2 +/- 31.7, LOS: 0.0 %                     |                                                      |
|                  | lr=0.001                                              |                                                |                                                      |
|                  | lr *= 0.3 every 15 epochs                             |                                                |                                                      |
|                  | new 180M 6k node self-play data                       |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net013           | 768->256x2->CReLU->1                                  | STC vs 008b                                    | Likely overfitting                                   |
|                  | 48 epochs                                             | 214 - 299 - 409 [0.454] 922                    |                                                      |
|                  | wdl=0.25                                              | -32.1 +/- 16.7, LOS: 0.0 %                     |                                                      |
|                  | lr=0.001                                              |                                                |                                                      |
|                  | lr *= 0.1 every 30 epochs                             |                                                |                                                      |
|                  | new 294M 5k node self-play data from 008b             |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net014           | 768->384x2->CReLU->1                                  | Epoch 60:                                      |                                                      |
|                  | wdl=0.30                                              | STC vs 008b                                    |                                                      |
|                  | lr=0.001                                              | 86 - 99 - 145 [0.480] 330                      |                                                      |
|                  | lr *= 0.1 every 30 epochs                             | -13.7 +/- 28.1, LOS: 17.0 %                    |                                                      |
|                  | shuffled from net013                                  |                                                |                                                      |
|                  |                                                       | Epoch 65:                                      |                                                      |
|                  |                                                       | STC vs 008b                                    |                                                      |
|                  |                                                       | 33 - 49 - 58 [0.443] 140                       |                                                      |
|                  |                                                       | -39.9 +/- 44.3, LOS: 3.9 %                     |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net015           | 768->384x2->CReLU->1                                  | STC vs 008b                                    |                                                      |
|                  | 50 epochs                                             | 207 - 225 - 348 [0.488] 780                    |                                                      |
|                  | wdl=0.15                                              | -8.0 +/- 18.1, LOS: 19.3 %                     |                                                      |
|                  | lr=0.0011                                             |                                                |                                                      |
|                  | lr *= 0.24 every 20 epochs                            |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net016           | 768->384x2->CReLU->1                                  | STC vs 008b                                    |                                                      |
|                  | 35 epochs                                             | 328 - 338 - 554 [0.496] 1220                   |                                                      |
|                  | wdl=0.25                                              | -2.8 +/- 14.4, LOS: 34.9 %                     |                                                      |
|                  | lr=0.001                                              |                                                |                                                      |
|                  | lr *= 0.1 every 15 epochs                             |                                                |                                                      |
|                  | added another 175M depth=8 data                       |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net017           | 768->384x2->CReLU->1                                  | STC vs 008b                                    |                                                      |
|                  | 40 epochs                                             | 230 - 248 - 342 [0.489] 820                    |                                                      |
|                  | wdl=0.30                                              | -7.6 +/- 18.1, LOS: 20.5 %                     |                                                      |
|                  | lr=0.001                                              |                                                |                                                      |
|                  | lr *= 0.3 every 16 epochs                             |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| net028 (xuebeng) | 768->512x2->SCReLU->1                                 | LTC vs 008b                                    | Finally!                                             |
|                  | 50 epochs                                             | 616 - 497 - 1088 [0.527] 2201                  |                                                      |
|                  | wdl=0.35                                              | 18.8 +/- 10.3, LOS: 100.0 %, DrawRatio: 49.4 % |                                                      |
|                  | lr=0.001                                              |                                                |                                                      |
|                  | lr *= 0.1 every 20 epochs                             |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| v2_03            | 768->512x2->CReLU->1                                  | STC vs xuebeng                                 | Not sure what went wrong                             |
|                  | 45 epochs                                             | 130 - 213 - 250 [0.430] 593                    |                                                      |
|                  | wdl=0.40                                              | -49.0 +/- 21.3, LOS: 0.0 %, DrawRatio: 42.2 %  |                                                      |
|                  | lr=0.001                                              |                                                |                                                      |
|                  | lr *= 0.3 every 15 epochs                             |                                                |                                                      |
|                  | new 500M depth=8 data from net028                     |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
| v2_10            | 768->768x2->CReLU->1                                  | LTC vs xuebeng                                 |                                                      |
|                  | 50 epochs                                             | +1.6 Elo                                       |                                                      |
|                  | wdl=0.25                                              |                                                |                                                      |
|                  | 1.3B combined data                                    |                                                |                                                      |
+------------------+-------------------------------------------------------+------------------------------------------------+------------------------------------------------------+
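The Result column gives each match as wins - losses - draws, the score fraction in brackets, the number of games, then an Elo estimate with a 95% error margin and the likelihood of superiority (LOS). These look like cutechess-cli/fastchess style summaries; the Python sketch below recomputes those figures from the raw counts, assuming the usual W - L - D ordering, the logistic Elo model, a normal approximation for the margin, and an LOS that ignores draws. `match_stats` and `score_to_elo` are illustrative names, not functions of any tool.

```python
import math

# Two-sided 95% z-score (inverse standard normal CDF at 0.975).
Z_95 = 1.959963984540054


def score_to_elo(score: float) -> float:
    """Logistic Elo difference implied by an expected score strictly in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins: int, losses: int, draws: int):
    """Return (score, elo, elo_margin_95, los_percent) for a W - L - D result."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n

    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (
        wins * (1.0 - score) ** 2
        + losses * (0.0 - score) ** 2
        + draws * (0.5 - score) ** 2
    ) / n
    stderr = math.sqrt(variance / n)

    # Map the 95% interval on the score to Elo and report its half-width.
    low = score - Z_95 * stderr
    high = score + Z_95 * stderr
    margin = (score_to_elo(high) - score_to_elo(low)) / 2.0

    # Likelihood of superiority from decisive games only (draws ignored).
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

    return score, score_to_elo(score), margin, 100.0 * los


if __name__ == "__main__":
    # net006 vs base at STC: 171 - 125 - 304
    s, elo, margin, los = match_stats(171, 125, 304)
    print(f"[{s:.3f}] {elo:.1f} +/- {margin:.1f}, LOS: {los:.1f} %")
```

Fed net006's STC counts, the sketch lands on the [0.538] 26.7 +/- 19.5, LOS: 99.6 % shown in that row. Pentanomial statistics, which some tools report for paired openings, are not modelled here.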
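Each run in Train Info is described by a starting `lr`, a step-decay rule (`lr *= 0.1 every 30 epochs`, or `lr_drop at 30`) and a `wdl` value. A minimal sketch of both, under stated assumptions: epochs are counted from 1 and the drop takes effect after each full step, and `wdl` is the weight given to the game result in the training target, with the remaining `1 - wdl` taken from the search score passed through a sigmoid (a common convention in NNUE trainers such as bullet). The 400 centipawn scale and both function names are hypothetical, not taken from these runs.

```python
import math


def step_lr(initial_lr: float, gamma: float, step: int, epoch: int) -> float:
    """Learning rate for a 1-based `epoch` under step decay.

    The rate is multiplied by `gamma` once every `step` epochs: epochs 1..step
    use initial_lr, epochs step+1..2*step use initial_lr * gamma, and so on.
    """
    return initial_lr * gamma ** ((epoch - 1) // step)


def blended_target(score_cp: float, result: float, wdl: float, scale: float = 400.0) -> float:
    """Training target mixing the game result with the search score.

    result: 1.0 win, 0.5 draw, 0.0 loss for the side to move (assumed convention).
    wdl:    weight on the game result; the remaining 1 - wdl goes to the score.
    scale:  assumed centipawn scale for the sigmoid (hypothetical default).
    """
    score_prob = 1.0 / (1.0 + math.exp(-score_cp / scale))
    return wdl * result + (1.0 - wdl) * score_prob


if __name__ == "__main__":
    # net007a: lr=0.002, lr *= 0.10 every 30 epochs, 50 epochs, wdl=0.25
    for epoch in (1, 30, 31, 50):
        print(epoch, step_lr(0.002, 0.10, 30, epoch))
    # A drawn game in which the search scored the side to move at +100 cp.
    print(blended_target(100.0, 0.5, wdl=0.25))
```

Under this convention net007a trains epochs 1-30 at 0.002 and epochs 31-50 at 0.0002; a trainer that counts epochs from zero would shift the boundary by one. A higher `wdl` (net028 uses 0.35) leans the target further toward the actual game outcome.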
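Architecture strings such as `768->384x2->CReLU->1` describe a perspective network: 768 binary inputs (2 colours x 6 piece types x 64 squares) feed one shared hidden layer of N neurons that is evaluated once from each side's point of view (the `x2`); the two accumulators go through CReLU (clamp to [0, 1]) or SCReLU (clamp, then square) and are concatenated into a single output neuron. The floating-point Python sketch below shows that shape only. The feature indexing, piece ordering, side-to-move-first concatenation, random weights and the `PerspectiveNet` class are all assumptions for illustration; real engines use quantized integer weights and update the accumulators incrementally.

```python
import random
from typing import Callable, List, Tuple

INPUTS = 768  # 2 colours x 6 piece types x 64 squares

Piece = Tuple[int, int, int]  # (colour, piece_type, square); hypothetical encoding


def crelu(x: float) -> float:
    """Clipped ReLU: clamp to [0, 1]."""
    return min(max(x, 0.0), 1.0)


def screlu(x: float) -> float:
    """Squared clipped ReLU: clamp to [0, 1], then square."""
    c = min(max(x, 0.0), 1.0)
    return c * c


def feature_index(perspective: int, colour: int, piece: int, square: int) -> int:
    """Hypothetical 768-input index as seen by `perspective` (0 = white, 1 = black).

    Squares run 0..63 from a1. From black's view colours are swapped and the
    board is mirrored vertically, so each side sees its own pieces as colour 0.
    """
    if perspective == 1:
        colour = 1 - colour
        square ^= 56  # flip the rank, keep the file
    return colour * 384 + piece * 64 + square


class PerspectiveNet:
    """Float sketch of 768->Nx2->activation->1 with random (untrained) weights."""

    def __init__(self, hidden: int, activation: Callable[[float], float], seed: int = 0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.activation = activation
        # One shared feature transformer, used for both perspectives.
        self.ft_weights = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(INPUTS)]
        self.ft_bias = [0.0] * hidden
        # The output neuron sees both accumulators, hence 2 * hidden weights.
        self.out_weights = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden)]
        self.out_bias = 0.0

    def accumulator(self, pieces: List[Piece], perspective: int) -> List[float]:
        """Sum the feature-transformer columns of every active input feature."""
        acc = list(self.ft_bias)
        for colour, piece, square in pieces:
            column = self.ft_weights[feature_index(perspective, colour, piece, square)]
            for i in range(self.hidden):
                acc[i] += column[i]
        return acc

    def evaluate(self, pieces: List[Piece], side_to_move: int) -> float:
        """Side-to-move accumulator first, then the opponent's, then one output."""
        us = self.accumulator(pieces, side_to_move)
        them = self.accumulator(pieces, 1 - side_to_move)
        activated = [self.activation(x) for x in us + them]
        return self.out_bias + sum(w * a for w, a in zip(self.out_weights, activated))


if __name__ == "__main__":
    # White king on e1 (square 4), black king on e8 (square 60); piece 5 = king here.
    position = [(0, 5, 4), (1, 5, 60)]
    net = PerspectiveNet(hidden=16, activation=screlu)
    print(net.evaluate(position, side_to_move=0), net.evaluate(position, side_to_move=1))
```

Because the example position is colour-symmetric, both perspectives produce identical accumulators and the sketch returns the same value with either side to move, a quick sanity check of the perspective flip. Swapping `screlu` for `crelu` mirrors the CReLU/SCReLU alternation between rows of the table.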