Nf4.rar

⚠️ Alternative Interpretation: RNF4 mediates the degradation of the PML-RARα fusion protein.

: To reduce the memory footprint of LLMs (such as LLaMA) enough to load and fine-tune them on a single GPU (e.g., a 24GB RTX 3090) while matching the task performance of full 16-bit fine-tuning.

: Compresses 16-bit weights to 4 bits, reducing weight memory by ~75% (e.g., a 65B-parameter model can be loaded in ~35GB instead of ~130GB).
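
The ~75% figure is simply the ratio of 4 bits to 16 bits per weight. The minimal back-of-the-envelope check below counts weight storage only. Its quantization-constant overhead assumes QLoRA-style blockwise settings (64 weights per block, with either FP32 scales or double-quantized 8-bit scales), and the helper name `weight_gb` is hypothetical:

```python
# Weight-only memory estimate for a 65B-parameter model.
# Assumption: activations, KV cache, adapters, optimizer state and
# framework overhead are ignored; 1 GB = 1e9 bytes.

PARAMS = 65e9


def weight_gb(params: float, bits_per_param: float) -> float:
    """Hypothetical helper: storage in GB for `params` weights at `bits_per_param` bits each."""
    return params * bits_per_param / 8 / 1e9


fp16_gb = weight_gb(PARAMS, 16)  # 16-bit baseline
nf4_gb = weight_gb(PARAMS, 4)    # 4-bit codes only

# Blockwise quantization also stores one absmax scale per block.
# Assumed settings: 64 weights per block.
scale_bits_fp32 = 32 / 64                      # one FP32 scale per 64 weights
scale_bits_double = 8 / 64 + 32 / (64 * 256)   # 8-bit scales, plus one FP32 per 256 scales

nf4_with_scales_gb = weight_gb(PARAMS, 4 + scale_bits_double)
reduction = 1 - nf4_gb / fp16_gb

print(f"16-bit weights:         {fp16_gb:.1f} GB")
print(f"NF4 codes only:         {nf4_gb:.1f} GB")
print(f"NF4 + double-quant:     {nf4_with_scales_gb:.1f} GB")
print(f"Scale overhead:         {scale_bits_fp32:.3f} vs {scale_bits_double:.3f} bits/param")
print(f"Reduction (codes only): {reduction:.0%}")
```

Under these assumptions the weights come to roughly 33.5 GB; the rest of the quoted ~35GB presumably reflects components this estimate deliberately ignores.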

: Recent research (April 2026) has further optimized this by creating Fast NF4 Dequantization Kernels that achieve 2.0–2.2× speedups on NVIDIA GPUs.
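
However fast the kernel, NF4 dequantization itself is simple arithmetic: look up each 4-bit code in a 16-entry NF4 table and multiply by its block's absmax scale. The pure-Python sketch below illustrates the quantize/dequantize round trip. It assumes the QLoRA construction of the NF4 table (standard-normal quantiles with offset 0.9677083, an exact zero, normalized to [-1, 1]), ignores bit packing and double quantization, and its function names are hypothetical rather than any library's API:

```python
from statistics import NormalDist


def nf4_table(offset: float = 0.9677083) -> list[float]:
    """Build the 16 NF4 levels (assumed QLoRA construction):
    8 positive and 7 negative normal quantiles plus an exact zero,
    normalized so the levels span [-1, 1]."""
    nd = NormalDist()

    def linspace(a: float, b: float, n: int) -> list[float]:
        return [a + (b - a) * i / (n - 1) for i in range(n)]

    pos = [nd.inv_cdf(p) for p in linspace(offset, 0.5, 9)[:-1]]
    neg = [-nd.inv_cdf(p) for p in linspace(offset, 0.5, 8)[:-1]]
    levels = sorted(neg + [0.0] + pos)
    top = levels[-1]
    return [v / top for v in levels]


NF4 = nf4_table()


def quantize_block(block: list[float]) -> tuple[list[int], float]:
    """Map each weight to the index of the nearest NF4 level after absmax scaling."""
    absmax = max(abs(w) for w in block) or 1.0  # avoid dividing by zero for all-zero blocks
    codes = [min(range(16), key=lambda i: abs(NF4[i] - w / absmax)) for w in block]
    return codes, absmax


def dequantize_block(codes: list[int], absmax: float) -> list[float]:
    """Per-weight dequantization: table lookup times the block scale."""
    return [NF4[c] * absmax for c in codes]


block = [0.5, -1.2, 0.03, 2.0, -0.7, 0.0]
codes, absmax = quantize_block(block)
restored = dequantize_block(codes, absmax)
print(codes)
print([round(x, 3) for x in restored])
```

Production implementations (for example in bitsandbytes) pack two 4-bit codes per byte and run on the GPU; this sketch only shows the lookup-and-scale arithmetic such kernels perform and says nothing about their speed.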

The paper explains why NF4 is superior to standard 4-bit integer (Int4) and 4-bit floating-point (Float4) formats: