Horace He @cHHillee
To stretch this analogy further, when accelerators (humans) are severely limited by bandwidth, you have no choice but to move everything into SRAM (make everything fully autonomous). However, this prohibits, say, keeping the KV cache in DRAM (having humans contribute). 2/4