NVIDIA Proves 4-Bit NVFP4 Pretraining Works at 10-Trillion-Token Scale
NVIDIA has trained a 12-billion-parameter model on 10 trillion tokens using 4-bit NVFP4 precision, achieving accuracy comparable to FP8. Because each 4-bit value takes half as many bits as an FP8 value, the result could significantly reduce the memory and compute needed to train AI models, making large-scale training cheaper and more accessible.
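To make "4-bit precision" concrete, here is a minimal sketch of block-scaled 4-bit quantization in plain Python. It is an illustration under stated assumptions, not NVIDIA's training recipe: it assumes the E2M1 value grid and 16-element blocks commonly described for NVFP4, keeps each block's scale as an ordinary Python float rather than a low-precision one, and uses the hypothetical helper names `quantize_block` and `fake_quantize`.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)

# Assumption: 16 elements share one scale factor.
BLOCK = 16


def quantize_block(block):
    """Quantize one block to the E2M1 grid and map it back to real values."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Scale so the largest magnitude in the block lands on 6.0, the top of
    # the grid. Assumption: the scale stays in full precision here.
    scale = amax / E2M1_GRID[-1]
    out = []
    for v in block:
        # Snap the scaled magnitude to the nearest grid point
        # (ties go to the smaller magnitude in this sketch).
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append(math.copysign(mag * scale, v))
    return out


def fake_quantize(values, block=BLOCK):
    """Apply quantize_block to consecutive blocks of a flat list."""
    out = []
    for i in range(0, len(values), block):
        out.extend(quantize_block(values[i:i + block]))
    return out
```

For example, `fake_quantize([-6.0, 2.4])` returns `[-6.0, 2.0]`: the block's largest magnitude maps exactly onto 6.0, while 2.4 snaps to the nearest grid point, 2.0. Per-block scaling is what lets such a coarse eight-magnitude grid track tensors whose ranges vary widely.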