The dynamical system view gives very clean conditions for looped transformers to be stable https://twitter.com/hayden_prairie/status/2044453231913537927
Context: Quoting @hayden_prairie: "We’ve been thinking a lot about scaling laws, wondering if there is a more effective way to scale FLOPs without increasing parameters. Turns out the answer is YES – by looping blocks of layers during training. We find that predictable scaling laws exist for layer looping, https://x.com/hayden_prairie/status/2044453231913537927/photo/1"
| Time | Views | Likes | Bookmarks | RTs | Replies |
|---|---|---|---|---|---|
| 11:00 AM UTC | +262 | +4 | — | — | — |
| 10:50 AM UTC | +254 | +1 | +3 | — | — |
| 10:40 AM UTC | +255 | +2 | +1 | — | — |
| 10:30 AM UTC | +237 | +3 | +3 | — | — |
| 10:20 AM UTC | +201 | +2 | +1 | +1 | — |
| 10:10 AM UTC | +174 | +1 | — | — | — |
| 10:00 AM UTC | +204 | — | — | — | — |
| 9:50 AM UTC | +39 | — | — | — | — |
| 9:40 AM UTC | +366 | +6 | +3 | — | — |
| 9:30 AM UTC | +152 | +1 | +1 | — | — |