via Decrypt · By Decrypt Editorial
Google's DiffusionGemma AI Hits 1,000 Tokens Per Second—And It's Free
Decrypt Editorial (10:01 PM UTC)
In brief
- Google released DiffusionGemma, a free open-weight model that generates entire 256-token blocks simultaneously via text diffusion—hitting over 1,000 tokens per second on an NVIDIA H100, four times faster than standard autoregressive models.
- The custom drafter module DiffusionGemma needs for local inference doesn't exist in any public runtime yet—not in mlx-lm, not in LM Studio—making it…