via Decrypt · By Decrypt Editorial

Google's DiffusionGemma AI Hits 1,000 Tokens Per Second—And It's Free

(10:01 PM UTC)
Approved by James Mitchell

In brief

  • Google released DiffusionGemma, a free open-weight model that uses text diffusion to generate entire 256-token blocks simultaneously. It exceeds 1,000 tokens per second on an NVIDIA H100, four times faster than standard autoregressive models.
  • The custom drafter module DiffusionGemma needs for local inference doesn't exist in any public runtime yet—not in mlx-lm, not in LM Studio—making it…
