Tuesday, June 10, 2025
Techcratic

dipampaul17/KVSplit: Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with

Hacker News by Hacker News
May 16, 2025

2025-05-16 16:04:00
github.com

Differentiated KV Cache Quantization for Apple Silicon


KV Cache Memory Usage

Run larger context windows and heavier LLMs on your Mac by applying different quantization precision to keys vs values in the attention mechanism’s KV cache. KVSplit enables you to:

  • Reduce memory usage by up to 72% with minimal quality loss
  • Run 2-3x longer contexts in the same memory budget
  • Maintain or improve inference speed compared to FP16
  • Optimize for Apple Silicon with full Metal support

Configuration   VRAM @ 8K tokens (% of FP16)   Tokens/sec   Perplexity Change
FP16 (base)     176.00 MB (100%)               54,360       n/a
K8V8 (8-bit)    93.50 MB (53%)                 51,503       +0.03%
K8V4            71.50 MB (41%)                 57,438       +0.86%
K4V8            71.50 MB (41%)                 58,690       +6.06%
K4V4 (4-bit)    49.50 MB (28%)                 55,193       +6.15%

Memory Savings by Sequence Length

Configuration     128 tokens   2048 tokens   4096 tokens   8192 tokens
FP16 (baseline)   5.50 MB      44.00 MB      88.00 MB      176.00 MB
K8V8 (8-bit)      2.92 MB      23.38 MB      46.75 MB      93.50 MB
K8V4 (mixed)      2.23 MB      17.88 MB      35.75 MB      71.50 MB
K4V8 (mixed)      2.23 MB      17.88 MB      35.75 MB      71.50 MB
K4V4 (4-bit)      1.55 MB      12.38 MB      24.75 MB      49.50 MB
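
The figures in this table can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not code from the repository. It rests on three assumptions that are not stated in the README: MB means MiB; the 8-bit and 4-bit formats cost about 8.5 and 4.5 bits per element once per-block scale factors are counted (which matches the 93.50/176.00 and 49.50/176.00 ratios); and the cache is allocated in multiples of 256 tokens (which would explain why the 128-token column is exactly 1/8 of the 2048-token column rather than 1/16). The function name `kv_cache_mb` is hypothetical.

```python
# Sketch: reproduce the KV cache sizes listed in the table above.
MIB = 1024 * 1024

# FP16 bytes per cached token, derived from the table (176.00 MB at 8192 tokens).
FP16_BYTES_PER_TOKEN = 176.00 * MIB / 8192

# Assumed effective bits per element, including per-block scale overhead.
BITS = {"fp16": 16.0, "q8": 8.5, "q4": 4.5}


def kv_cache_mb(n_tokens, key_type, value_type, pad_to=256):
    """Estimate the KV cache size in MB for one sequence."""
    # Assumption: the cache is allocated in multiples of `pad_to` tokens.
    padded = -(-n_tokens // pad_to) * pad_to
    # In FP16, keys and values each account for half of the per-token bytes.
    half = FP16_BYTES_PER_TOKEN / 2
    scale = (BITS[key_type] + BITS[value_type]) / 16.0
    return padded * half * scale / MIB


if __name__ == "__main__":
    configs = [("FP16", "fp16", "fp16"), ("K8V8", "q8", "q8"),
               ("K8V4", "q8", "q4"), ("K4V8", "q4", "q8"),
               ("K4V4", "q4", "q4")]
    for name, k, v in configs:
        row = [f"{kv_cache_mb(n, k, v):7.2f}" for n in (128, 2048, 4096, 8192)]
        print(f"{name:5s}", *row)
```

Because keys and values contribute equally to the total, K8V4 and K4V8 come out at the same size. The difference between them is quality, not memory.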

Features

  • Independent quantization of keys and values in the KV cache
  • Optimized for Apple Silicon with Metal support
  • Comprehensive benchmarking suite with perplexity measurement
  • Memory usage and performance analysis tools
  • Publication-quality visualization tools
  • Easy setup and usage

Prerequisites

  • macOS (tested on Apple Silicon)
  • Homebrew package manager
  • Xcode Command Line Tools

⚡ One-Command Installation

# Clone the repository
git clone https://github.com/dipampaul17/KVSplit.git
cd KVSplit

# Run the installer script
chmod +x scripts/install_kvsplit.sh
./scripts/install_kvsplit.sh

The installer will:

  • Set up the project structure
  • Clone and build llama.cpp with Metal support
  • Configure for differentiated KV cache quantization
  • Download a small test model (optional)
  • Set up Python environment for visualization

Want to see the benefits immediately? Run a quick comparison with your model:

# Run quick comparison with different configurations
python scripts/quick_compare.py --model models/your-model.gguf

This will show you a side-by-side comparison of FP16, K8V8, K8V4, K4V8, and K4V4 with memory usage, speed, and quality metrics.

Memory vs Quality

Configuration   VRAM @ 8K tokens   Memory Savings   Quality Impact
FP16 (base)     176.00 MB          n/a              n/a
K8V8 (8-bit)    93.50 MB           47%              +0.03%
K8V4            71.50 MB           59%              +0.86%
K4V8            71.50 MB           59%              +6.06%
K4V4 (4-bit)    49.50 MB           72%              +6.15%

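Using KVSplit doesn’t just save memory. In these benchmarks, three of the four quantized configurations also ran faster than FP16, by 1.5% to 8.0%; K8V8 was the exception, at 5.3% slower.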

Configuration   Tokens/sec (8K ctx)   Speedup vs FP16
FP16            54,360                n/a
K8V8            51,503                -5.3%
K8V4            57,438                +5.7%
K4V8            58,690                +8.0%
K4V4            55,193                +1.5%
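
The speedup column is a relative change against the FP16 throughput. As a minimal sketch of that arithmetic (the dictionary and the function name `speedup_pct` are illustrative, and only the tokens/sec values come from the table above):

```python
# Sketch: derive the "Speedup vs FP16" column from the tokens/sec figures above.
TOKENS_PER_SEC = {
    "FP16": 54360,
    "K8V8": 51503,
    "K8V4": 57438,
    "K4V8": 58690,
    "K4V4": 55193,
}


def speedup_pct(config, baseline="FP16"):
    """Relative throughput change in percent; negative means slower."""
    return (TOKENS_PER_SEC[config] / TOKENS_PER_SEC[baseline] - 1.0) * 100.0


if __name__ == "__main__":
    for name in ("K8V8", "K8V4", "K4V8", "K4V4"):
        print(f"{name}: {speedup_pct(name):+.1f}%")
```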

kvsplit/
├── llama.cpp/      # Optimized llama.cpp build
├── models/         # LLM model files
├── scripts/        # Utility scripts
│   ├── benchmark_kvsplit.py    # Comprehensive benchmark tool
│   ├── install_kvsplit.sh      # One-command installer
│   ├── quick_compare.py        # Quick comparison utility
│   ├── capture_memory.sh       # GIF creation for memory visualization
│   └── visualize_results.py    # Generate publication-quality plots
├── results/        # Benchmark results (CSV/JSON)
├── plots/          # Generated visualizations
└── README.md       # This file
Configuration Summary

KV cache memory is dominated by storing key and value vectors for each token. Our research has revealed a critical insight: keys are significantly more sensitive to quantization than values.

  • Asymmetric Impact: Keys require higher precision than values for maintaining quality
  • Sweet Spot: K8V4 (8-bit keys, 4-bit values) provides optimal balance
    • Only 0.86% perplexity degradation vs. FP16
    • 59% memory reduction
    • Faster inference than FP16
  • Confirmation: K4V8 configuration shows 7x more quality degradation than K8V4, despite using the same total bits

This asymmetry allows for more efficient memory usage without compromising model quality, enabling longer context windows and larger models on consumer hardware.
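
One way to put this trade-off to work is to pick the smallest cache that stays within a perplexity budget. The sketch below hard-codes the measurements reported above. The `Config` type and `pick_config` helper are hypothetical names for illustration, and results for other models may differ from these numbers.

```python
# Sketch: choose the most memory-efficient configuration within a quality budget.
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    kv_mb_at_8k: float       # KV cache size at 8K tokens, from the tables above
    ppl_increase_pct: float  # perplexity change vs FP16, from the tables above


RESULTS = [
    Config("FP16", 176.00, 0.00),
    Config("K8V8", 93.50, 0.03),
    Config("K8V4", 71.50, 0.86),
    Config("K4V8", 71.50, 6.06),
    Config("K4V4", 49.50, 6.15),
]


def pick_config(max_ppl_increase_pct):
    """Return the smallest-cache config whose perplexity increase fits the budget."""
    if max_ppl_increase_pct < 0:
        raise ValueError("budget must be non-negative")
    candidates = [c for c in RESULTS if c.ppl_increase_pct <= max_ppl_increase_pct]
    # Ties on memory (K8V4 vs K4V8) are broken in favour of lower perplexity.
    return min(candidates, key=lambda c: (c.kv_mb_at_8k, c.ppl_increase_pct))


if __name__ == "__main__":
    for budget in (0.0, 0.1, 1.0, 10.0):
        print(f"budget {budget:>4}% -> {pick_config(budget).name}")
```

With a 1% budget this selects K8V4. K4V8 never wins, because it costs the same memory as K8V4 at roughly seven times the perplexity increase.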

Running with Different KV Cache Precisions

# Baseline (FP16)
./llama.cpp/build/bin/llama-cli -m models/your-model.gguf -p "Your prompt" \
  -t 8 --flash-attn

# ⭐ RECOMMENDED: 8-bit keys, 4-bit values (K8V4)
# Best balance of quality and memory savings
./llama.cpp/build/bin/llama-cli -m models/your-model.gguf -p "Your prompt" \
  -t 8 --flash-attn --kvq-key 8 --kvq-val 4

# 4-bit keys, 8-bit values (K4V8)
# Shows why key precision matters more than value precision
./llama.cpp/build/bin/llama-cli -m models/your-model.gguf -p "Your prompt" \
  -t 8 --flash-attn --kvq-key 4 --kvq-val 8

# 4-bit keys and values (K4V4)
# Maximum memory savings (72% reduction) with acceptable quality
./llama.cpp/build/bin/llama-cli -m models/your-model.gguf -p "Your prompt" \
  -t 8 --flash-attn --kvq 4
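
When scripting several runs, it can help to build these command lines programmatically. The following is a small sketch, not part of the repository: `build_llama_cli_args` is a hypothetical helper, and it uses only the flags documented in this README (`--kvq-key`, `--kvq-val`), which are assumed to be provided by the build the installer produces rather than verified here.

```python
# Sketch: assemble a llama-cli command line for a given key/value precision.
import shlex


def build_llama_cli_args(model, prompt, key_bits=None, value_bits=None,
                         threads=8,
                         binary="./llama.cpp/build/bin/llama-cli"):
    """Return an argv list; omitting both bit widths gives the FP16 baseline."""
    args = [binary, "-m", model, "-p", prompt, "-t", str(threads), "--flash-attn"]
    if key_bits is not None:
        args += ["--kvq-key", str(key_bits)]
    if value_bits is not None:
        args += ["--kvq-val", str(value_bits)]
    return args


if __name__ == "__main__":
    argv = build_llama_cli_args("models/your-model.gguf", "Your prompt",
                                key_bits=8, value_bits=4)
    # Print a shell-safe command; pass `argv` to subprocess.run to execute it.
    print(shlex.join(argv))
```

Passing the key and value widths separately keeps the intended configuration explicit, which avoids any ambiguity about what a single combined setting means.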

Long Context Example (32K)

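# Run with a 32K context using K8V4 (8-bit keys, 4-bit values).
# KV cache size depends on the model; scaling the 8K figures above linearly
# gives roughly 704 MB in FP16 versus 286 MB with K8V4.
./llama.cpp/build/bin/llama-cli -m models/your-model.gguf \
  -c 32768 -n 4096 -t 8 --flash-attn --kvq-key 8 --kvq-val 4 \
  -f your-long-document.txt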
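
To estimate how much context fits in a fixed KV cache budget, the per-token cost from the 8K measurements can be scaled linearly. This is a rough sketch under stated assumptions: the cache grows linearly with context length, allocation padding is ignored, and the numbers apply only to the model benchmarked above. `max_context_tokens` is a hypothetical helper name.

```python
# Sketch: how many tokens of context fit in a given KV cache budget?
KV_MB_AT_8K = {
    "FP16": 176.00,
    "K8V8": 93.50,
    "K8V4": 71.50,
    "K4V8": 71.50,
    "K4V4": 49.50,
}


def max_context_tokens(budget_mb, config):
    """Largest context whose KV cache fits in `budget_mb`, assuming linear growth."""
    mb_per_token = KV_MB_AT_8K[config] / 8192
    return int(budget_mb // mb_per_token)


if __name__ == "__main__":
    budget = KV_MB_AT_8K["FP16"]  # the memory FP16 needs for 8K tokens
    for name in KV_MB_AT_8K:
        tokens = max_context_tokens(budget, name)
        print(f"{name}: {tokens} tokens ({tokens / 8192:.2f}x)")
```

In the budget that FP16 needs for 8K tokens, this gives about 2.5x the context for K8V4 and about 3.6x for K4V4.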

🚩 Command-Line Arguments

Flag           Description                         Recommendation
-t 8           Number of threads                   8 is optimal for most Apple Silicon chips
--flash-attn   Enables optimized attention         Recommended for Apple Silicon
--kvq N        Sets both key and value bits to N   Use --kvq 8 for K8V8 or --kvq 4 for K4V4
--kvq-key N    Sets key bits only                  Key precision has major quality impact (use 8 for K8V4)
--kvq-val N    Sets value bits only                Value precision has minor quality impact (use 4 for K8V4)
-c N           Context size in tokens              Longer contexts benefit more from KVSplit
-n N           Number of tokens to generate        Adjust based on your needs
-f FILE        Input file                          For processing documents
-m MODEL       Model path                          Path to your .gguf model file

📏 Advanced Benchmarking

For comprehensive performance analysis, use our full benchmark suite:

# Run the full benchmark suite (all configurations and sequence lengths)
python scripts/benchmark_kvsplit.py

# Run a specific configuration test
python scripts/benchmark_kvsplit.py --config K8V4 --seq-len 4096

# Generate publication-quality visualizations
python scripts/visualize_results.py

The benchmarking script provides thorough measurements of:

  • 📊 Memory Usage: VRAM and KV cache specifically
  • ⚡ Performance: Tokens per second across different sequence lengths
  • 🎯 Quality: Perplexity measurement using llama-perplexity
  • 📈 Scaling: How memory usage and performance scale with sequence length

Results are saved in CSV/JSON formats with automatic summary statistics, and the visualization script generates publication-quality plots showing key insights.
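
For readers unfamiliar with the quality metric, perplexity is the exponential of the average negative log-likelihood per token, and the percentages reported in this README are relative changes against the FP16 value. The sketch below shows the standard definition only. It is not how llama-perplexity is implemented, and the function names and sample values are made up for illustration.

```python
# Sketch: the standard perplexity definition and a relative-change helper.
import math


def perplexity(token_logprobs):
    """exp of the mean negative natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def relative_change_pct(ppl, baseline_ppl):
    """Perplexity change in percent; positive means worse than the baseline."""
    return (ppl / baseline_ppl - 1.0) * 100.0


if __name__ == "__main__":
    # Illustrative values only: every token predicted with probability 0.5.
    print(perplexity([math.log(0.5)] * 4))
    # A made-up pair of perplexities that differ by 0.86%.
    print(f"{relative_change_pct(10.086, 10.0):+.2f}%")
```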

License: MIT

🎬 Visual Memory Savings

You can visualize memory savings with our capture tool:

# Capture memory reduction in Activity Monitor
./scripts/capture_memory.sh

🍎 Apple Silicon Optimization

  • Metal Performance: Fully optimized for Apple’s Metal framework
  • Memory Efficiency: Critical for memory-constrained M1/M2/M3 devices
  • Activity Monitor: Use our capture_memory.sh script to visualize real-time memory reductions
  • Alignment: 256B page alignment in llama.cpp means actual memory savings might differ slightly from theoretical calculations

Key Features

  • Differentiated Precision: Independent key and value bit precision (K8V4, K4V8, etc)
  • Apple Silicon Optimization: Full Metal support for M1/M2/M3 chips
  • Comprehensive Benchmarking: Memory, speed, and quality metrics
  • Publication-Quality Visualization: Beautiful plots for analysis
  • Simple User Interface: One-command install and quick comparison tools
  • Memory Visualization: Tools to capture and visualize memory savings

This project implements ideas from recent research including:

  • “More for Keys, Less for Values: Adaptive KV Cache Quantization” (2024)
  • “Unifying KV Cache Compression for Large Language Models with LeanKV” (2025)

Additional credits:

Contributions are welcome! Please open an issue or submit a pull request.

🧠 Configuration Recommendations

  • Best Overall: 🌟 K8V4 🌟 (8-bit keys, 4-bit values)

    • 59% memory reduction with only 0.86% quality loss
    • Improved inference speed (+5.7% vs FP16)
    • Great balance of quality and efficiency
  • Absolute Maximum Memory Savings: K4V4 (4-bit keys and values)

    • 72% memory reduction with ~6% quality loss
    • Good for memory-constrained devices
    • Acceptable for less sensitive applications
  • Best for Very Long Contexts: K8V4 or K4V4

    • Memory savings compound with context length
    • Run 2-3x longer contexts in the same memory budget

