Technical Report: Chinese | English
Arxiv: arXiv:2509.05276
Models: Available Models
Inspired by brain mechanisms, SpikingBrain integrates hybrid efficient attention, MoE modules, and spike encoding into its architecture, supported by a universal conversion pipeline compatible with the open-source model ecosystem. This enables continual pre-training with less than 2% of the data while achieving performance comparable to mainstream open-source models. We further adapt frameworks, operators, parallel strategies, and communication primitives for non-NVIDIA (MetaX) clusters, ensuring stable large-scale training and inference. SpikingBrain achieves over 100× speedup in time to first token (TTFT) for 4M-token sequences, while spike encoding delivers over 69% sparsity at the micro level. Combined with macro-level MoE sparsity, these advances provide valuable guidance for the design of next-generation neuromorphic chips.
This repository provides the full implementation and weights of SpikingBrain-7B, including the HuggingFace version, vLLM inference version, and quantized version, enabling flexible deployment and research across different scenarios.
SpikingBrain-7B/
├── hf_7B_model/      # HuggingFace version
├── run_model/        # Model run examples
├── vllm_hymeta/      # vLLM plugins and inference support
├── W8ASpike/         # Quantized inference version
├── setup.py
├── requirements.txt
└── README.md
vllm-hymeta is the plugin adaptation of HyMeta (Hybrid Models built on MetaX GPUs) for the vLLM inference framework, providing efficient inference support on NVIDIA GPUs.
By leveraging the plugin mechanism in vLLM, hardware backends can be integrated in a modular fashion, bringing the following benefits:
- Decoupled codebase: Backend-specific code remains independent, keeping the vLLM core cleaner.
- Reduced maintenance cost: vLLM developers can focus on general functionality without being affected by backend-specific implementations.
- Faster integration: New backends can be integrated quickly and evolve independently with less engineering effort.
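For illustration, the sketch below shows how such a plugin can register itself with vLLM through Python entry points. The package metadata, module layout, and the HyMetaForCausalLM architecture name are illustrative assumptions, not the actual vllm_hymeta packaging.

# setup.py -- minimal sketch of an out-of-tree vLLM plugin (illustrative).
from setuptools import setup

setup(
    name="vllm-hymeta",
    version="0.1.0",
    packages=["vllm_hymeta"],
    entry_points={
        # vLLM discovers plugins through this entry-point group and calls
        # the referenced function at engine startup.
        "vllm.general_plugins": ["hymeta = vllm_hymeta:register"],
    },
)

# vllm_hymeta/__init__.py -- the registration hook named above.
def register():
    # Register the custom architecture with vLLM's model registry so the
    # engine can resolve it from a checkpoint's config; the class path
    # here is hypothetical.
    from vllm import ModelRegistry
    ModelRegistry.register_model(
        "HyMetaForCausalLM", "vllm_hymeta.models:HyMetaForCausalLM"
    )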
sudo docker run -itd \
--entrypoint /bin/bash \
--network host \
--name hymeta-bench \
--shm-size 160g \
--gpus all \
--privileged \
-v /host_path:/container_path \
docker.1ms.run/vllm/vllm-openai:v0.10.0
git clone https://github.com/BICLab/SpikingBrain-7B.git
cd SpikingBrain-7B
pip install .
Recommended environment for installing vllm-hymeta on NVIDIA GPUs:
decorator
pyyaml
scipy
setuptools
setuptools-scm
flash_attn==2.7.3
flash-linear-attention==0.1
vllm==0.10.0
torch==2.7.1
You can serve a model with vLLM in the simplest way using the following command:
vllm serve <your_model_path> \
--served-model-name <model_name> \
--gpu-memory-utilization <ratio> \
--block-size <size> \
--dtype bfloat16 \
--port <port_number>
You may also set --tensor-parallel-size and --pipeline-parallel-size when launching if you want to run with multiple GPUs.
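Once the server is running, you can query it through vLLM's OpenAI-compatible API. A minimal sketch, assuming the server listens on localhost at the port you chose and that the model name matches --served-model-name:

# query_server.py -- send a chat request to the vLLM server started above.
from openai import OpenAI

# Base URL and model name are placeholders; adjust them to your serve flags.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="model_name",  # the value passed to --served-model-name
    messages=[{"role": "user", "content": "Hello, SpikingBrain!"}],
    max_tokens=128,
)
print(response.choices[0].message.content)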
W8ASpike is the quantized inference version of SpikingBrain-7B, aiming to reduce inference cost under low-precision settings and explore the potential of Spiking Neural Networks (SNNs).
The current implementation adopts pseudo-spiking, where activations are approximated as spike-like signals at the tensor level, rather than true asynchronous event-driven spiking on neuromorphic hardware.
- Pseudo-spiking: Efficient approximation at the tensor level, suitable for prototyping and research.
- True-spiking: Requires asynchronous hardware and event-driven operator support, which is beyond the scope of this repository.
The activation spike encoding process here is inspired by the pseudo-spiking interfaces from BICLab/Int2Spike. For additional PyTorch-based spiking interfaces, please refer to the Int2Spike library.
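To make the idea concrete, here is a minimal tensor-level sketch of pseudo-spiking, assuming a simple integer spike-count encoding; it illustrates the general concept rather than reproducing the W8ASpike or Int2Spike implementation.

# pseudo_spike.py -- illustrative tensor-level pseudo-spiking (assumption:
# activations are quantized to signed integer "spike counts").
import torch

def pseudo_spike_encode(x: torch.Tensor, levels: int = 127):
    # Per-tensor scale so the largest activation maps to `levels` counts.
    scale = x.abs().max().clamp(min=1e-8) / levels
    # Round to integer spike counts; the sign plays the role of polarity.
    counts = torch.round(x / scale)
    return counts, scale

def pseudo_spike_decode(counts: torch.Tensor, scale: torch.Tensor):
    # Reconstruct an approximate dense activation from the counts.
    return counts * scale

x = torch.randn(4, 8)
counts, scale = pseudo_spike_encode(x)
x_hat = pseudo_spike_decode(counts, scale)
print((x - x_hat).abs().max())  # quantization error is bounded by scale/2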
The model weights are hosted on ModelScope. Please select the appropriate version based on your needs:
Example scripts are provided in run_model/ for running the model with the released checkpoints.
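For reference, here is a minimal loading sketch for the HuggingFace version, assuming the standard transformers API with trust_remote_code enabled for the custom architecture (the checkpoint path is a placeholder):

# load_hf.py -- load the HuggingFace release and generate a short reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/SpikingBrain-7B-HF"  # placeholder: local checkpoint dir
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",  # requires accelerate
)

inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))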
Table 1: Performance evaluation of the SpikingBrain-7B pre-trained model. All models are tested with the HuggingFace framework and evaluated using a perplexity-based method. Except for Qwen2.5, the other baselines are trained on limited Chinese data, resulting in clear disadvantages on CMMLU and C-Eval.
Table 2: Performance evaluation of the SpikingBrain-76B pre-trained model. All models are tested with the vLLM framework and evaluated using a perplexity-based method. Except for Qwen2.5, the other baselines are trained on limited Chinese data, resulting in clear disadvantages on CMMLU and C-Eval.
If you find our work useful, please consider citing SpikingBrain:
@article{pan2025spikingbrain,
title={SpikingBrain Technical Report: Spiking Brain-inspired Large Models},
author={Pan, Yuqi and Feng, Yupeng and Zhuang, Jinghao and Ding, Siyu and Liu, Zehao and Sun, Bohan and Chou, Yuhong and Xu, Han and Qiu, Xuerui and Deng, Anlin and others},
journal={arXiv preprint arXiv:2509.05276},
year={2025}
}