☆ Yσɠƚԋσʂ ☆@lemmy.ml to Technology@lemmy.ml · English · 9 months ago
1-bit LLM performs similarly to full-precision Transformer LLMs with the same model size and training tokens but is much more efficient in terms of latency, memory, throughput, and energy consumption. (arxiv.org)
cross-posted to: hackernews@lemmy.smeargle.fans
Q*Bert Reynolds@sh.itjust.works · 9 months ago
Says 1-bit then goes on to describe inputs as -1, 0, or 1. That’s 2-bit. Am I missing something here?
will_a113@lemmy.ml · English · 9 months ago
It’s actually 1.58 bits, weirdly. The addition of 0 here was the significant change/improvement in this experiment. The paper isn’t too dense and has some decent tables that explain things fairly accessibly.
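For anyone wondering where the 1.58 comes from: a value that can be -1, 0, or 1 has three states, so it carries log2(3) ≈ 1.585 bits of information, which is less than the 2 bits you would spend storing each value separately. Here is a rough Python sketch of that arithmetic. The `pack_trits` / `unpack_trits` helpers are hypothetical names made up for illustration, and the five-values-per-byte packing is only an assumption to show that you can get close to 1.58 bits in practice (3^5 = 243 fits in one byte, so 8/5 = 1.6 bits per value). It is not the storage format described in the paper.

```python
import math

# Information content of one ternary value drawn from {-1, 0, 1}.
bits_per_weight = math.log2(3)


def pack_trits(weights):
    """Pack exactly five values from {-1, 0, 1} into one integer in 0..242.

    Hypothetical helper for illustration only: each value is shifted to a
    base-3 digit (0, 1, 2) and the five digits are read as one base-3 number.
    """
    if len(weights) != 5:
        raise ValueError("expected exactly five ternary values")
    value = 0
    for w in weights:
        if w not in (-1, 0, 1):
            raise ValueError("values must be -1, 0, or 1")
        value = value * 3 + (w + 1)
    return value


def unpack_trits(value):
    """Inverse of pack_trits: recover the five ternary values."""
    digits = []
    for _ in range(5):
        value, digit = divmod(value, 3)
        digits.append(digit - 1)
    # Digits come out least significant first, so reverse them.
    return digits[::-1]
```

So the "1-bit" in the title is loose naming: the values are ternary, and 1.58 bits is the theoretical floor for storing one of them.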