Posted by redm 8 hours ago

BitNet: 100B Param 1-Bit model for local CPUs(github.com)
254 points | 121 comments | page 2
algoth1 7 hours ago|
Headline: 100B. Falcon 3 family: 10B. An order of magnitude off
yubainu 5 hours ago||
That's amazing. I'm developing sub-tools for LLMs as a hobby on an RTX3050 (4GB), but I can only run lightweight models like 1B and 2B. Is it possible to use your tool to make the CPU take over some of the VRAM movement?
WhitneyLand 7 hours ago||
If they had a big result, like native 1.58-bit quality clearly matching top peers, they would be saying so prominently in the repo.

The engineering/optimization work is nice, but it's not what people have been waiting for. The real question is whether the BitNet idea, which seemed so promising, can really deliver in a competitive way.

StilesCrisis 6 hours ago||
The output from this model is horrible! It's GPT-2 level babble and repeats entire paragraphs verbatim. It also reuses the same fake citation `(Jenkins, 2010)` over and over again. From the start of their video (which scrolls by fast enough that you don't see the slop clearly...)

```
Ecosystem Services and their impact on the Ecosystem

Ecosystem services refer to the services provided by ecosystems to the human society. These services include water, air, energy, nutrients, and soil (Jenkins, 2010). For instance, water is the most important service provided by an ecosystem and it helps in the conservation of water, irrigation and sanitation (Jenkins, 2010). On the other hand, air provides the oxygen needed for life.

The water cycle is a significant ecosystem service because it involves the cycling of water among the different parts of an ecosystem. It also involves the movement of water through the atmosphere, from one place to another. It is also the process of evaporation and condensation of water from the atmosphere. It also involves the movement of water from the air to the soil and water into the oceans.

The water cycle is a significant ecosystem service because it involves the cycling of water among the different parts of an ecosystem. It also involves the movement of water through the atmosphere, from one place to another. It is also the process of evaporation and condensation of water from the atmosphere. It also involves the movement of water from the air to the soil and water into the oceans.
```

naasking 5 hours ago|
It's a two-year-old base model that's only 3B parameters, trained on only 100B tokens. It's still a research project at this point.
gardnr 5 hours ago||
The new model they just released has impressive benchmark results: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T

Except on GSM8K and math...

naasking 4 hours ago||
Thanks for the link, the GSM8K result actually leads the pack in that table, but math is indeed underwhelming. Qwen 2.5 is in the lead, but bitnet isn't far behind and it takes 1/6th as much memory during inference, and was trained on less than 1/4 the number of tokens. Pretty cool.
QuadmasterXLII 8 hours ago||
The headline says a hundred billion parameters, but none of the official models are over 10 billion parameters. Curious.
Tuna-Fish 8 hours ago|
The project is an inference framework that should support a 100B-parameter model at 5-7 tok/s on CPU. No one has quantized a 100B-parameter model to 1 trit per weight yet, but the existence of this framework is an incentive for someone to do so.
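
For a sense of scale, here is a rough back-of-the-envelope sketch of the weight storage a 100B-parameter model would need at different bit widths. The packing schemes listed (2 bits per trit, 5 trits per byte) are assumptions for illustration and may not match how bitnet.cpp actually lays out weights. The estimate also ignores activations, the KV cache, and any layers kept at higher precision.

```python
import math

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Size of the weights alone, in GiB (no activations or KV cache)."""
    return n_params * bits_per_weight / 8 / 2**30

N = 100e9  # 100B parameters, as in the headline

# Hypothetical storage schemes, in bits per weight.
schemes = {
    "fp16": 16.0,
    "int4": 4.0,
    "2 bits per trit": 2.0,
    "5 trits per byte": 8 / 5,               # works because 3**5 = 243 <= 256
    "entropy limit log2(3)": math.log2(3),   # about 1.585 bits
}

for name, bits in schemes.items():
    print(f"{name:>22}: {bits:6.3f} bits/weight -> {weight_gib(N, bits):7.1f} GiB")
```

Under these assumptions the ternary weights land in the roughly 18-24 GiB range, which is why a 100B model on a desktop CPU is at least plausible on memory grounds.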
philvas 7 hours ago||
Steve Jobs would have loved the Microsoft repo with a demo on a Mac
a1o 5 hours ago||
> A demo of bitnet.cpp running a BitNet b1.58 3B model on Apple M2

With how much RAM? How much storage does it require?

janalsncm 3 hours ago||
They have a demo video in the readme. I think they are trying to convey that BitNet is fast, which it is. But it is worth taking a moment to pause and actually see what the thing is doing so quickly.

It seems to keep repeating that the water cycle is the main source of energy for all living things on the planet and then citing Jenkins 2010. There are also a ton of sentences beginning with “It also…”

I don’t even think it’s correct. The sun is the main source of energy for most living things but there’s also life near hydrothermal vents etc.

I don’t know who Jenkins is, but this model appears to be very fond of them and the particular fact about water.

I suppose fast and inaccurate is better than slow and inaccurate.

bee_rider 7 hours ago||
What’s the lower limit on the number of bits per parameter? If you use CSR-style sparse matrices to store the weights can it be less than 1?
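
One hedged way to reason about this: the floor is the entropy of the weight distribution, not a fixed number. The sketch below assumes, hypothetically, i.i.d. ternary weights where a fraction `p_zero` are zero and the rest split evenly between -1 and +1. It compares that entropy to a simplified CSR-style cost where each nonzero stores a column index plus a sign bit. The 16-bit index width and both function names are illustrative assumptions, and nothing here says real BitNet weights are actually that sparse.

```python
import math

def ternary_entropy_bits(p_zero: float) -> float:
    """Shannon entropy (bits/weight) of i.i.d. weights in {-1, 0, +1},
    with P(0) = p_zero and the remainder split evenly between -1 and +1."""
    probs = [p_zero, (1 - p_zero) / 2, (1 - p_zero) / 2]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def csr_bits_per_weight(p_zero: float, index_bits: int = 16) -> float:
    """Approximate CSR cost averaged over all weights: each nonzero stores
    a column index plus 1 sign bit. Row-pointer overhead is ignored."""
    return (1 - p_zero) * (index_bits + 1)

for p_zero in (1 / 3, 0.5, 0.8, 0.95, 0.99):
    print(f"p_zero={p_zero:.2f}  "
          f"entropy={ternary_entropy_bits(p_zero):.3f}  "
          f"csr={csr_bits_per_weight(p_zero):.3f}")
```

With uniform ternary weights the entropy is log2(3), about 1.585 bits, so sub-1-bit storage is only possible if the weights are skewed. Under these assumptions the entropy falls below 1 bit at 80% zeros, while the CSR layout with 16-bit indices needs more than about 94% zeros to do the same, so an entropy coder would get there long before CSR does.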
syntaxing 8 hours ago|
Misleading title, but this is pretty exciting. Interesting how this is based on llama.cpp. It's nice to see some momentum since they released the paper in 2023.