• About TC
  • Affiliate Disclaimer
  • Privacy Policy
  • TOS
  • Contact
Tuesday, July 1, 2025
Techcratic
  • TC
  • AI
    Artificial Intelligence

    Instruction-Following Pruning for Large Language Models

    Artificial Intelligence

    How to Combine Streamlit, Pandas, and Plotly for Interactive Data Apps

    Artificial Intelligence

    Tailor responsible AI with new safeguard tiers in Amazon Bedrock Guardrails

    Artificial Intelligence

    Automate Data Quality Reports with n8n: From CSV to Professional Analysis

    Artificial Intelligence

    NewDay builds A Generative AI based Customer service Agent Assist with over 90% accuracy

    Artificial Intelligence

    5 Things You Need to Know About Agentic AI

    Artificial Intelligence

    Normalizing Flows are Capable Generative Models

    Artificial Intelligence

    Update on the AWS DeepRacer Student Portal

    Artificial Intelligence

    INRFlow: Flow Matching for INRs in Ambient Space

  • App Zone
    Top 3 Launcher Apps for Apple: Features, Pros, and Cons

    Top 3 Launcher Apps for Apple: Features, Pros, and Cons

    Top 3 Launcher Apps for Android: Features, Pros, and Cons

    Top 3 Launcher Apps for Android: Features, Pros, and Cons

    Top 3 Card Game Apps of 2025: Features, Pros, and Cons

    Top 3 Card Game Apps of 2025: Features, Pros, and Cons

    Top 3 Medical Apps of 2025: Features, Pros, and Cons

    Top 3 Medical Apps of 2025: Features, Pros, and Cons

    Top 3 Travel Apps of 2025: Features, Pros, and Cons

    Top 3 Travel Apps of 2025: Features, Pros, and Cons

    Top 3 Casual Game Apps for 2025: Features, Pros, and Cons

    Top 3 Casual Game Apps for 2025: Features, Pros, and Cons

    Top 3 Food Apps for 2025: Features, Pros, and Cons

    Top 3 Food Apps for 2025: Features, Pros, and Cons

    Top 3 Sport Apps for 2025: Features, Pros, and Cons

    Top 3 Sport Apps for 2025: Features, Pros, and Cons

    Top 3 Productivity Apps for 2025: Features, Pros, and Cons

    Top 3 Productivity Apps for 2025: Features, Pros, and Cons

  • Apple
    Apple hit with $110M damages in 3G patents lawsuit

    Apple hit with $110M damages in 3G patents lawsuit

    Photos iOS 26 vs iOS 18: Compared

    Photos iOS 26 vs iOS 18: Compared

    Here’s everything new for Apple Photos in iOS 26

    Here’s everything new for Apple Photos in iOS 26

    Apple gains ground with new Macs despite market challenges

    Apple gains ground with new Macs despite market challenges

    Anker Power Bank, Zolo, MagGo, recall

    Anker Power Bank, Zolo, MagGo, recall

    Developer for Linux on Apple Silicon Macs resigns, citing ‘major failure of leadership’

    New ‘MacBook’ rumor sounds like Apple’s taking the iPad approach

    Apple Music 10 year celebration

    Apple Music 10 year celebration

    Brazil App Store anticompetitive, fine, legal anti-steering

    Brazil App Store anticompetitive, fine, legal anti-steering

    Apple may give Siri a brain transplant with the help of Claude or ChatGPT

    Apple may give Siri a brain transplant with the help of Claude or ChatGPT

  • Retro Rewind
    Retro Rewind: Electronic Games April 1995

    Retro Rewind: Electronic Games April 1995

    Retro Rewind: Electronic Gaming Monthly Magazine Number 55 February 1994

    Retro Rewind: Electronic Gaming Monthly Magazine Number 57 April 1994

    Retro Rewind: Blast from the Past – 35 Iconic Commercials of 1988!

    Retro Rewind: Blast from the Past – 35 Iconic Commercials of 1988!

    Retro Rewind: PC World Magazine August 1998

    Retro Rewind: PC World Magazine August 1998

    Retro Rewind: Computer Shopper Magazine September 1997

    Retro Rewind: Computer Shopper Magazine September 1997

    Retro Rewind: PC Magazine December 2015

    Retro Rewind: PC Magazine December 2015

    Retro Rewind: EDGE Magazine RETRO #1: The Guide to Classic Videogame Playing and Collecting

    Retro Rewind: EDGE Magazine RETRO #1: The Guide to Classic Videogame Playing and Collecting

    Retro Rewind: Computer Gaming World Magazine Issue 73 December 1998

    Retro Rewind: Computer Gaming World Magazine Issue 73 December 1998

    Retro Rewind: Electronic Gaming Monthly Magazine Number 55 February 1994

    Retro Rewind: Electronic Gaming Monthly Magazine Number 55 February 1994

  • Tech Deals
    Skytech King 95 Gaming PC Desktop, Intel i7 14700F 2.1 GHz (5.3GHz Turbo), NVIDIA RTX…

    Skytech King 95 Gaming PC Desktop, Intel i7 14700F 2.1 GHz (5.3GHz Turbo), NVIDIA RTX…

    ASRock – B550M PRO SE – ASRock B550M Pro SE Gaming Desktop Motherboard – AMD PRO565…

    ASRock – B550M PRO SE – ASRock B550M Pro SE Gaming Desktop Motherboard – AMD PRO565…

    Soundcore A30i by Anker, Smart Noise Cancelling Earbuds, Lipstick-Shaped Stylish Design,…

    Soundcore A30i by Anker, Smart Noise Cancelling Earbuds, Lipstick-Shaped Stylish Design,…

    ADATA Premier 256GB MicroSDHC/SDXC UHS-I Class 10 V10 A1 Memory Card with Adapter Read…

    ADATA Premier 256GB MicroSDHC/SDXC UHS-I Class 10 V10 A1 Memory Card with Adapter Read…

    acer Wireless Mouse for Laptop, 2.4GHz Computer Mouse 3 Adjustable DPI Office Cordless…

    acer Wireless Mouse for Laptop, 2.4GHz Computer Mouse 3 Adjustable DPI Office Cordless…

    STGAubron Gaming PC Computer Desktop, GeForce GTX 1660 Ti 6G, Intel Core I7 up to 3.9…

    STGAubron Gaming PC Computer Desktop, GeForce GTX 1660 Ti 6G, Intel Core I7 up to 3.9…

    Sonic & SEGA All-Stars Racing – Xbox 360

    Sonic & SEGA All-Stars Racing – Xbox 360

    Carnival Games – Nintendo Wii (Renewed)

    Carnival Games – Nintendo Wii (Renewed)

    Transformers Devastation – PlayStation 3

    Transformers Devastation – PlayStation 3

  • Tech Eats
    Cheesy Broccoli Rice Mug: 5-Minute Super Comfort Food

    Cheesy Broccoli Rice Mug: 5-Minute Super Comfort Food

    Top 10 Vegetarian Recipes for 2025: Easy and Nutritious Meals for Busy People

    Top 10 Vegetarian Recipes for 2025: Easy and Nutritious Meals for Busy People

    Bacon Mug Lasagna: 5-Minute Microwave Meat Lover’s Dream

    Bacon Mug Lasagna: 5-Minute Microwave Meat Lover’s Dream

    Bacon Fried Rice Mug: 5-Minute Microwave Meal

    Bacon Fried Rice Mug: 5-Minute Microwave Meal

    Bacon & Cheddar Mug Biscuit: 2-Minute Savory Comfort

    Bacon & Cheddar Mug Biscuit: 2-Minute Savory Comfort

    Loaded Bacon Cheesy Potato Mug: 5-Minute Comfort Food

    Loaded Bacon Cheesy Potato Mug: 5-Minute Comfort Food

    Peanut Butter Banana Mug Muffin: 5-Minute Protein Snack

    Peanut Butter Banana Mug Muffin: 5-Minute Protein Snack

    Oreo Mug Cake: 2-Minute Cookie & Cake Combo!

    Oreo Mug Cake: 2-Minute Cookie & Cake Combo!

    Tiramisu Mug Cake: Coffee Lover’s Dream in 2 Minutes!

    Tiramisu Mug Cake: Coffee Lover’s Dream in 2 Minutes!

  • Tesla
    Garmin GPS Mount – Ultra-Sticky Dash Holder for Car & Truck Dashboard & Windshield,…

    Garmin GPS Mount – Ultra-Sticky Dash Holder for Car & Truck Dashboard & Windshield,…

    Elon Musk goes from sleeping on Tesla’s factory floor to sleeping in sales office

    Elon Musk goes from sleeping on Tesla’s factory floor to sleeping in sales office

    2 PCS H13/9008 Car LED Light Canbus Error-free Decoder, Plug-and-play Retrofit Radio…

    2 PCS H13/9008 Car LED Light Canbus Error-free Decoder, Plug-and-play Retrofit Radio…

    Tesla fires Musk’s chief of staff who became head of North America and Europe

    Tesla fires Musk’s chief of staff who became head of North America and Europe

    Wireless Charge Mat for 2024 2025 Tesla Cybertruck,Center Console Wireless Charger…

    Wireless Charge Mat for 2024 2025 Tesla Cybertruck,Center Console Wireless Charger…

    Truck Bed Cargo Mesh Net for Tesla Cybertruck 2024,with 6 Carabiners Stretchable Storage…

    Truck Bed Cargo Mesh Net for Tesla Cybertruck 2024,with 6 Carabiners Stretchable Storage…

    Motor Trend Premium FlexTough Deep Dish Rear Rubber Floor Mat Liners, Heavy Duty…

    Motor Trend Premium FlexTough Deep Dish Rear Rubber Floor Mat Liners, Heavy Duty…

    JOYTUTUS Frunk Front Trunk Mat Compatible with Cybertruck 2024 2023 TPE Frunk Mat Liner…

    JOYTUTUS Frunk Front Trunk Mat Compatible with Cybertruck 2024 2023 TPE Frunk Mat Liner…

    Dash Camera, 4K/1080p Dash Camera Front and Rear, Built-in 5GWiFi, Dash Cam with 64GB SD…

    Dash Camera, 4K/1080p Dash Camera Front and Rear, Built-in 5GWiFi, Dash Cam with 64GB SD…

  • UFO
    The Most Terrifying Unsolved UFO Mysteries | Best of Close Encounters

    The Most Terrifying Unsolved UFO Mysteries | Best of Close Encounters

    FINALLY! Biggest ALIEN SEARCH OPERATION's Results are Out | Breakthrough Listen Project Results

    FINALLY! Biggest ALIEN SEARCH OPERATION's Results are Out | Breakthrough Listen Project Results

    CINOTON 160W UFO LED High Bay Light, Aluminum LED Shop Lights with 24000LM, 5000K Commercial Bay Lighting for Warehouse Garage Workshop Factory Hall, 6′ Cable & Safety Rope, ETL Listed 2 Pack

    CINOTON 160W UFO LED High Bay Light, Aluminum LED Shop Lights with 24000LM, 5000K Commercial Bay Lighting for Warehouse Garage Workshop Factory Hall, 6′ Cable & Safety Rope, ETL Listed 2 Pack

    MindBlowing Alien Encounter Giant Mouse Discovered on Mars

    MindBlowing Alien Encounter Giant Mouse Discovered on Mars

    Franco Collectibles Adventure Time Bedding Super Soft Cozy Plush Throw, 46 in x 60 in, (Officially Licensed Product)

    Franco Collectibles Adventure Time Bedding Super Soft Cozy Plush Throw, 46 in x 60 in, (Officially Licensed Product)

    Alien 3's Workprint: What Else Was Cut From the Film?

    Alien 3's Workprint: What Else Was Cut From the Film?

    Simple Area 51 Minimal UFO Tattoo Line Art Graphic Tee UFO T-Shirt

    Simple Area 51 Minimal UFO Tattoo Line Art Graphic Tee UFO T-Shirt

    UFO hearing: Pentagon shows declassified photos and video, clip of unexplainable floating object

    UFO hearing: Pentagon shows declassified photos and video, clip of unexplainable floating object

    Unveiling the Truth: ET's Among Us | Sci-Fi Movie | UFO Documentary | Free Movie

    Unveiling the Truth: ET's Among Us | Sci-Fi Movie | UFO Documentary | Free Movie

No Result
View All Result
  • TC
  • AI
    Artificial Intelligence

    Instruction-Following Pruning for Large Language Models

    Artificial Intelligence

    How to Combine Streamlit, Pandas, and Plotly for Interactive Data Apps

    Artificial Intelligence

    Tailor responsible AI with new safeguard tiers in Amazon Bedrock Guardrails

    Artificial Intelligence

    Automate Data Quality Reports with n8n: From CSV to Professional Analysis

    Artificial Intelligence

    NewDay builds A Generative AI based Customer service Agent Assist with over 90% accuracy

    Artificial Intelligence

    5 Things You Need to Know About Agentic AI

    Artificial Intelligence

    Normalizing Flows are Capable Generative Models

    Artificial Intelligence

    Update on the AWS DeepRacer Student Portal

    Artificial Intelligence

    INRFlow: Flow Matching for INRs in Ambient Space

  • App Zone
    Top 3 Launcher Apps for Apple: Features, Pros, and Cons

    Top 3 Launcher Apps for Apple: Features, Pros, and Cons

    Top 3 Launcher Apps for Android: Features, Pros, and Cons

    Top 3 Launcher Apps for Android: Features, Pros, and Cons

    Top 3 Card Game Apps of 2025: Features, Pros, and Cons

    Top 3 Card Game Apps of 2025: Features, Pros, and Cons

    Top 3 Medical Apps of 2025: Features, Pros, and Cons

    Top 3 Medical Apps of 2025: Features, Pros, and Cons

    Top 3 Travel Apps of 2025: Features, Pros, and Cons

    Top 3 Travel Apps of 2025: Features, Pros, and Cons

    Top 3 Casual Game Apps for 2025: Features, Pros, and Cons

    Top 3 Casual Game Apps for 2025: Features, Pros, and Cons

    Top 3 Food Apps for 2025: Features, Pros, and Cons

    Top 3 Food Apps for 2025: Features, Pros, and Cons

    Top 3 Sport Apps for 2025: Features, Pros, and Cons

    Top 3 Sport Apps for 2025: Features, Pros, and Cons

    Top 3 Productivity Apps for 2025: Features, Pros, and Cons

    Top 3 Productivity Apps for 2025: Features, Pros, and Cons

  • Apple
    Apple hit with $110M damages in 3G patents lawsuit

    Apple hit with $110M damages in 3G patents lawsuit

    Photos iOS 26 vs iOS 18: Compared

    Photos iOS 26 vs iOS 18: Compared

    Here’s everything new for Apple Photos in iOS 26

    Here’s everything new for Apple Photos in iOS 26

    Apple gains ground with new Macs despite market challenges

    Apple gains ground with new Macs despite market challenges

    Anker Power Bank, Zolo, MagGo, recall

    Anker Power Bank, Zolo, MagGo, recall

    Developer for Linux on Apple Silicon Macs resigns, citing ‘major failure of leadership’

    New ‘MacBook’ rumor sounds like Apple’s taking the iPad approach

    Apple Music 10 year celebration

    Apple Music 10 year celebration

    Brazil App Store anticompetitive, fine, legal anti-steering

    Brazil App Store anticompetitive, fine, legal anti-steering

    Apple may give Siri a brain transplant with the help of Claude or ChatGPT

    Apple may give Siri a brain transplant with the help of Claude or ChatGPT

  • Retro Rewind
    Retro Rewind: Electronic Games April 1995

    Retro Rewind: Electronic Games April 1995

    Retro Rewind: Electronic Gaming Monthly Magazine Number 55 February 1994

    Retro Rewind: Electronic Gaming Monthly Magazine Number 57 April 1994

    Retro Rewind: Blast from the Past – 35 Iconic Commercials of 1988!

    Retro Rewind: Blast from the Past – 35 Iconic Commercials of 1988!

    Retro Rewind: PC World Magazine August 1998

    Retro Rewind: PC World Magazine August 1998

    Retro Rewind: Computer Shopper Magazine September 1997

    Retro Rewind: Computer Shopper Magazine September 1997

    Retro Rewind: PC Magazine December 2015

    Retro Rewind: PC Magazine December 2015

    Retro Rewind: EDGE Magazine RETRO #1: The Guide to Classic Videogame Playing and Collecting

    Retro Rewind: EDGE Magazine RETRO #1: The Guide to Classic Videogame Playing and Collecting

    Retro Rewind: Computer Gaming World Magazine Issue 73 December 1998

    Retro Rewind: Computer Gaming World Magazine Issue 73 December 1998

    Retro Rewind: Electronic Gaming Monthly Magazine Number 55 February 1994

    Retro Rewind: Electronic Gaming Monthly Magazine Number 55 February 1994

  • Tech Deals
    Skytech King 95 Gaming PC Desktop, Intel i7 14700F 2.1 GHz (5.3GHz Turbo), NVIDIA RTX…

    Skytech King 95 Gaming PC Desktop, Intel i7 14700F 2.1 GHz (5.3GHz Turbo), NVIDIA RTX…

    ASRock – B550M PRO SE – ASRock B550M Pro SE Gaming Desktop Motherboard – AMD PRO565…

    ASRock – B550M PRO SE – ASRock B550M Pro SE Gaming Desktop Motherboard – AMD PRO565…

    Soundcore A30i by Anker, Smart Noise Cancelling Earbuds, Lipstick-Shaped Stylish Design,…

    Soundcore A30i by Anker, Smart Noise Cancelling Earbuds, Lipstick-Shaped Stylish Design,…

    ADATA Premier 256GB MicroSDHC/SDXC UHS-I Class 10 V10 A1 Memory Card with Adapter Read…

    ADATA Premier 256GB MicroSDHC/SDXC UHS-I Class 10 V10 A1 Memory Card with Adapter Read…

    acer Wireless Mouse for Laptop, 2.4GHz Computer Mouse 3 Adjustable DPI Office Cordless…

    acer Wireless Mouse for Laptop, 2.4GHz Computer Mouse 3 Adjustable DPI Office Cordless…

    STGAubron Gaming PC Computer Desktop, GeForce GTX 1660 Ti 6G, Intel Core I7 up to 3.9…

    STGAubron Gaming PC Computer Desktop, GeForce GTX 1660 Ti 6G, Intel Core I7 up to 3.9…

    Sonic & SEGA All-Stars Racing – Xbox 360

    Sonic & SEGA All-Stars Racing – Xbox 360

    Carnival Games – Nintendo Wii (Renewed)

    Carnival Games – Nintendo Wii (Renewed)

    Transformers Devastation – PlayStation 3

    Transformers Devastation – PlayStation 3

  • Tech Eats
    Cheesy Broccoli Rice Mug: 5-Minute Super Comfort Food

    Cheesy Broccoli Rice Mug: 5-Minute Super Comfort Food

    Top 10 Vegetarian Recipes for 2025: Easy and Nutritious Meals for Busy People

    Top 10 Vegetarian Recipes for 2025: Easy and Nutritious Meals for Busy People

    Bacon Mug Lasagna: 5-Minute Microwave Meat Lover’s Dream

    Bacon Mug Lasagna: 5-Minute Microwave Meat Lover’s Dream

    Bacon Fried Rice Mug: 5-Minute Microwave Meal

    Bacon Fried Rice Mug: 5-Minute Microwave Meal

    Bacon & Cheddar Mug Biscuit: 2-Minute Savory Comfort

    Bacon & Cheddar Mug Biscuit: 2-Minute Savory Comfort

    Loaded Bacon Cheesy Potato Mug: 5-Minute Comfort Food

    Loaded Bacon Cheesy Potato Mug: 5-Minute Comfort Food

    Peanut Butter Banana Mug Muffin: 5-Minute Protein Snack

    Peanut Butter Banana Mug Muffin: 5-Minute Protein Snack

    Oreo Mug Cake: 2-Minute Cookie & Cake Combo!

    Oreo Mug Cake: 2-Minute Cookie & Cake Combo!

    Tiramisu Mug Cake: Coffee Lover’s Dream in 2 Minutes!

    Tiramisu Mug Cake: Coffee Lover’s Dream in 2 Minutes!

  • Tesla
    Garmin GPS Mount – Ultra-Sticky Dash Holder for Car & Truck Dashboard & Windshield,…

    Garmin GPS Mount – Ultra-Sticky Dash Holder for Car & Truck Dashboard & Windshield,…

    Elon Musk goes from sleeping on Tesla’s factory floor to sleeping in sales office

    Elon Musk goes from sleeping on Tesla’s factory floor to sleeping in sales office

    2 PCS H13/9008 Car LED Light Canbus Error-free Decoder, Plug-and-play Retrofit Radio…

    2 PCS H13/9008 Car LED Light Canbus Error-free Decoder, Plug-and-play Retrofit Radio…

    Tesla fires Musk’s chief of staff who became head of North America and Europe

    Tesla fires Musk’s chief of staff who became head of North America and Europe

    Wireless Charge Mat for 2024 2025 Tesla Cybertruck,Center Console Wireless Charger…

    Wireless Charge Mat for 2024 2025 Tesla Cybertruck,Center Console Wireless Charger…

    Truck Bed Cargo Mesh Net for Tesla Cybertruck 2024,with 6 Carabiners Stretchable Storage…

    Truck Bed Cargo Mesh Net for Tesla Cybertruck 2024,with 6 Carabiners Stretchable Storage…

    Motor Trend Premium FlexTough Deep Dish Rear Rubber Floor Mat Liners, Heavy Duty…

    Motor Trend Premium FlexTough Deep Dish Rear Rubber Floor Mat Liners, Heavy Duty…

    JOYTUTUS Frunk Front Trunk Mat Compatible with Cybertruck 2024 2023 TPE Frunk Mat Liner…

    JOYTUTUS Frunk Front Trunk Mat Compatible with Cybertruck 2024 2023 TPE Frunk Mat Liner…

    Dash Camera, 4K/1080p Dash Camera Front and Rear, Built-in 5GWiFi, Dash Cam with 64GB SD…

    Dash Camera, 4K/1080p Dash Camera Front and Rear, Built-in 5GWiFi, Dash Cam with 64GB SD…

  • UFO
    The Most Terrifying Unsolved UFO Mysteries | Best of Close Encounters

    The Most Terrifying Unsolved UFO Mysteries | Best of Close Encounters

    FINALLY! Biggest ALIEN SEARCH OPERATION's Results are Out | Breakthrough Listen Project Results

    FINALLY! Biggest ALIEN SEARCH OPERATION's Results are Out | Breakthrough Listen Project Results

    CINOTON 160W UFO LED High Bay Light, Aluminum LED Shop Lights with 24000LM, 5000K Commercial Bay Lighting for Warehouse Garage Workshop Factory Hall, 6′ Cable & Safety Rope, ETL Listed 2 Pack

    CINOTON 160W UFO LED High Bay Light, Aluminum LED Shop Lights with 24000LM, 5000K Commercial Bay Lighting for Warehouse Garage Workshop Factory Hall, 6′ Cable & Safety Rope, ETL Listed 2 Pack

    MindBlowing Alien Encounter Giant Mouse Discovered on Mars

    MindBlowing Alien Encounter Giant Mouse Discovered on Mars

    Franco Collectibles Adventure Time Bedding Super Soft Cozy Plush Throw, 46 in x 60 in, (Officially Licensed Product)

    Franco Collectibles Adventure Time Bedding Super Soft Cozy Plush Throw, 46 in x 60 in, (Officially Licensed Product)

    Alien 3's Workprint: What Else Was Cut From the Film?

    Alien 3's Workprint: What Else Was Cut From the Film?

    Simple Area 51 Minimal UFO Tattoo Line Art Graphic Tee UFO T-Shirt

    Simple Area 51 Minimal UFO Tattoo Line Art Graphic Tee UFO T-Shirt

    UFO hearing: Pentagon shows declassified photos and video, clip of unexplainable floating object

    UFO hearing: Pentagon shows declassified photos and video, clip of unexplainable floating object

    Unveiling the Truth: ET's Among Us | Sci-Fi Movie | UFO Documentary | Free Movie

    Unveiling the Truth: ET's Among Us | Sci-Fi Movie | UFO Documentary | Free Movie

No Result
View All Result
Techcratic
No Result
View All Result
Home Hacker News

Tokasaurus: An LLM Inference Engine for High-Throughput Workloads

Hacker News by Hacker News
June 5, 2025
in Hacker News
Reading Time: 10 mins read
129
A A
0

2025-06-05 17:27:00
scalingintelligence.stanford.edu


We’re releasing Tokasaurus, a new LLM inference engine optimized for throughput-intensive workloads. With small models, Tokasaurus benefits from very low CPU overhead and dynamic Hydragen grouping to exploit shared prefixes. For larger models, Tokasaurus supports async tensor parallelism for GPUs with NVLink and a fast implementation of pipeline parallelism for GPUs without. On throughput-focused benchmarks, Tokasaurus can outperform vLLM and SGLang by up to 3x+.


Table of Contents


Intro

As LLMs get smarter, faster, and cheaper, the community keeps finding new ways to use them. Our own recent work has explored using models to scan every file in a codebase, sample 10,000 attempts for math and code problems, and collaborate with other models to minimize cloud costs. Inference is now also an important part of the training process, where we use models to generate synthetic data or as part of RL pipelines that generate and train on model completions.

Crucially, these new inference workloads look quite different than the original LLM use case of serving a chatbot. Here, we care primarily about the total time and cost required to complete a large batch of sequences, and we care much less (if at all) about the individual latency of a single generation. In other words, we want high throughput!

Open-source inference engines (i.e. dedicated systems for running efficient LLM inference) like FlexGen, vLLM, and SGLang have been enormously valuable to the community. Inspired by and learning from these projects, we built a new engine, Tokasaurus, designed from the ground up to handle throughput-focused workloads. We’ve optimized Tokasaurus for efficiently serving large and small models alike, allowing it to outperform existing engines on throughput benchmarks. In the rest of this blog, we’ll walk through some of these optimizations and show off a few settings where Tokasaurus really shines.


Optimizing Small Models

To benchmark Tokasaurus with small models, we’ll use two workloads:

  • Completing chatbot prompts from the ShareGPT dataset (this is a common benchmark for testing inference engines).
  • Reproducing an experiment from Large Language Monkeys, where we take 128 problems from the GSM8K math dataset and sample 1024 answers to every problem. The distinguishing feature of this workload is that there’s a lot of prefix sharing across sequences.
Tokasaurus small models
Tokasaurus large batch sampling

Tokasaurus outperforms vLLM and SGLang on both of these benchmarks, in particular achieving over 2x the throughput of other engines on the Large Language Monkeys workload. Two main features contribute to these wins with small models:

Minimizing CPU Overhead

LLM engines perform many different tasks on the CPU, like handling web requests, tokenizing inputs/detokenizing outputs, managing KV cache allocation, and preparing inputs for the model. If these CPU-side tasks cause the GPU-side model to stall, throughput can take a big hit. To avoid these stalls, inference engines commonly make many CPU-side tasks asynchronous: while the GPU runs a forward pass for batch N, the CPU-side of the engine post-processes the results from batch N-1 and prepares the inputs for batch N+1.

Tokasaurus goes one step further, making the CPU-side of the engine (what we call the manager) both asynchronous and adaptive. The manager’s goal is to maintain a deep queue of inputs for the model to run forward passes on. The manager monitors the size of this queue and can detect if the model is close to exhausting it (and therefore stalling the GPU). In these cases, the manager will automatically start skipping optional steps (like checking for stop strings and onboarding new sequences) until the model’s input queue is sufficiently deep again. This combination of asynchrony and adaptivity lets Tokasaurus serve small models with much less CPU overhead.

Dynamic Prefix Identification and Exploration

Prefix sharing comes up all the time in LLM inference — not just when repeatedly sampling like in the Large Language Monkeys benchmark, but also when asking many questions about a long document or reusing a system prompt across many chatbot conversations.

Shared prefixes allow attention to be computed more efficiently. We first explored this idea last year with Hydragen (aka cascade attention and bifurcated attention), but at the time we didn’t address the problem of detecting these shared prefixes in an engine where sequences are constantly starting and finishing. With Tokasaurus, we solve this detection problem by running a greedy depth-first search algorithm before every model forward pass that iteratively finds the longest shared prefixes possible. Hydragen is most impactful for small models, which spend a relatively larger fraction of total FLOPs on attention.


Optimizing Bigger Models

Tokasaurus can also efficiently serve bigger models across multiple GPUs! Here, the most important optimizations are our implementations of pipeline parallelism (PP) and tensor parallelism (TP), which allow us to maximize throughput on GPUs with or without NVLink.

Pipeline Parallelism for the GPU Poor

One of our original goals with Tokasaurus was to efficiently run multi-GPU inference on our lab’s L40S GPUs, which don’t have fast inter-GPU NVLink connections. Without NVLink, the communication costs incurred running TP across a node of eight GPUs are substantial. Therefore, efficient support for PP (which requires much less inter-GPU communication) was a high priority. PP needs a large batch in order to run efficiently, since batches from the manager are subdivided into microbatches that are spread out across pipeline stages. When optimizing for throughput, we’re generally already using the largest batch size that fits in GPU memory, so PP is often a natural fit for throughput-focused workloads. When benchmarking against vLLM’s and SGLang’s pipeline parallel implementations using Llama-3.1-70B on eight L40S GPUs, Tokasaurus improves throughput by over 3x:

Tokasaurus small models

Async Tensor Parallel for the GPU Rich

If you do have GPUs with NVLink (e.g. B200s and certain models of H100s and A100s), Tokasaurus has something for you too! Models in Tokasaurus can be torch compiled end-to-end, allowing us to take advantage of Async Tensor Parallelism (Async-TP). This is a relatively new feature in PyTorch that can overlap inter-GPU communication with other computations, partially hiding the cost of communication. In our benchmarks, we found that Async-TP adds a lot of CPU overhead to the model forward pass and only starts improving throughput with very large batch sizes (e.g. 6k+ tokens). Tokasaurus maintains torch-compiled versions of our models with and without Async-TP enabled, allowing us to automatically switch on Async-TP whenever the batch size is big enough:

Tokasaurus small models

Try it Out

Tokasaurus started as an internal lab effort to run our inference experiments faster, and we’re excited to share it more broadly! You can check out the Tokasaurus code on GitHub and install the package from PyPI with:

Currently, we support models from the Llama-3 and Qwen-2 families and support any combination of data, tensor, and pipeline parallel within a single node.

Tokasaurus is written in pure Python (although we do use attention and sampling ops from the excellent FlashInfer package). We hope that this makes the engine easier to fork and hack on, à la GPT-fast.

Benchmarking Details

The commands for reproducing our benchmarks are available here. For each benchmark, we configure all engines with the same KV cache size and maximum number of running requests. We’ve made a best effort to tune each engine’s remaining parameters. We report the average throughput across runs after completing a warmup run. For each benchmark, all engines are run on the same machine.

We use this script from SGLang for our ShareGPT benchmarks and this custom script for the Large Language Monkeys benchmark. To standardize our benchmarking scripts and interface, all experiments send requests through the OpenAI API. We also experimented with vLLM’s Python API (i.e. LLM.generate()) on the Large Language Monkeys benchmark with Llama-1B and measured roughly a 5% throughput increase (thanks to the vLLM team for the tip!).

Acknowledgements

Huge thanks to Prime Intellect and Together AI for providing us with compute for this project.

Also, we’re grateful to Dan Biderman, Simon Guo, Manat Kaur, and Avanika Narayan for beta testing the engine!


If you find Tokasaurus useful, please use the following citation:

@misc{juravsky2025tokasaurus,
  author       = {Jordan Juravsky and Ayush Chakravarthy and Ryan Ehrlich and Sabri Eyuboglu and Bradley Brown and Joseph Shetaye and Christopher R{\'e} and Azalia Mirhoseini},
  title        = {Tokasaurus: An LLM Inference Engine for High-Throughput Workloads},
  year         = {2025},
  howpublished = {\url{https://scalingintelligence.stanford.edu/blogs/tokasaurus/}}
}

Source Link


Keep your files stored safely and securely with the SanDisk 2TB Extreme Portable SSD. With over 69,505 ratings and an impressive 4.6 out of 5 stars, this product has been purchased over 8K+ times in the past month. At only $129.99, this Amazon’s Choice product is a must-have for secure file storage.

Help keep private content private with the included password protection featuring 256-bit AES hardware encryption. Order now for just $129.99 on Amazon!


Start your free Amazon Prime trial
today and unlock unlimited streaming and more!

Help Power Techcratic’s Future – Scan To Support

If Techcratic’s content and insights have helped you, consider giving back by supporting the platform with crypto. Every contribution makes a difference, whether it’s for high-quality content, server maintenance, or future updates. Techcratic is constantly evolving, and your support helps drive that progress.

As a solo operator who wears all the hats, creating content, managing the tech, and running the site, your support allows me to stay focused on delivering valuable resources. Your support keeps everything running smoothly and enables me to continue creating the content you love. I’m deeply grateful for your support, it truly means the world to me! Thank you!

BITCOIN

Bitcoin Logo

Bitcoin QR Code

bc1qlszw7elx2qahjwvaryh0tkgg8y68enw30gpvge

Scan the QR code with your crypto wallet app

DOGECOIN

Dogecoin Logo

Dogecoin QR Code

D64GwvvYQxFXYyan3oQCrmWfidf6T3JpBA

Scan the QR code with your crypto wallet app

ETHEREUM

Ethereum Logo

Ethereum QR Code

0xe9BC980DF3d985730dA827996B43E4A62CCBAA7a

Scan the QR code with your crypto wallet app

Please read the Privacy and Security Disclaimer on how Techcratic handles your support.

Disclaimer: As an Amazon Associate, Techcratic may earn from qualifying purchases.

Tags: Hacker News
Share161Share28ShareShare4ShareTweet101
Previous Post

TS Imagine develops customer service bot atop Snowflake

Next Post

Samsung is giving away free 32-inch smart monitors, here’s how to qualify

Hacker News

Hacker News

Stay updated with Hacker News, where technology meets entrepreneurial spirit. Get the latest on tech trends, startup news, and discussions from the tech community. Read the latest updates here at Techcratic.

Related Posts

Snake Keyloggers Exploit Java Utilities to Evade Detection by Security Tools
Hacker News

Snake Keyloggers Exploit Java Utilities to Evade Detection by Security Tools

July 1, 2025
1.3k
topling/toplingdb: ToplingDB is a cloud native LSM Key-Value Store with searchable compression algo and distributed compaction
Hacker News

topling/toplingdb: ToplingDB is a cloud native LSM Key-Value Store with searchable compression algo and distributed compaction

July 1, 2025
1.3k
RP2350pc Open Source Hardware all in one computer with RP2350B, 8MB PSRAM, 16MB Flash, Four USB host, DVI/HDMI output and Audio Codec for retro computer emulation and education
Hacker News

RP2350pc Open Source Hardware all in one computer with RP2350B, 8MB PSRAM, 16MB Flash, Four USB host, DVI/HDMI output and Audio Codec for retro computer emulation and education

July 1, 2025
1.3k
New C4 Bomb Attack Breaks Through Chrome’s AppBound Cookie Protections
Hacker News

New C4 Bomb Attack Breaks Through Chrome’s AppBound Cookie Protections

July 1, 2025
1.3k
stan-smith/OpenFLOW: Make beautiful isometric infrastructure diagrams
Hacker News

stan-smith/OpenFLOW: Make beautiful isometric infrastructure diagrams

July 1, 2025
1.3k
Public Signal Backups Testing – Call for Testing
Hacker News

Public Signal Backups Testing – Call for Testing

June 30, 2025
1.3k
The New Skill in AI is Not Prompting, It’s Context Engineering
Hacker News

The New Skill in AI is Not Prompting, It’s Context Engineering

June 30, 2025
1.3k
Data Centers, Temperature, and Power
Hacker News

Data Centers, Temperature, and Power

June 30, 2025
1.3k
Load More
Next Post
FIRE STICK

Samsung is giving away free 32-inch smart monitors, here's how to qualify

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Your Tech Resources

  • 30 Second Tech ™
  • AI
  • App Zone ™
  • Apple
  • Ars Technica
  • CNET
  • ComputerWorld
  • Crypto News
  • Cybersecurity
  • Endgadget
  • Forbes
  • Fossbytes
  • Gaming
  • GeekWire
  • Gizmodo
  • Google News
  • Hacker News
  • Harvard Tech
  • I Like Cats ™
  • I Like Dogs ™
  • LifeHacker
  • MacRumors
  • Macworld
  • Mashable
  • Microsoft
  • MIT Tech
  • PC World
  • Photofocus
  • Physics
  • Random Tech
  • Retro Rewind ™
  • Robot Report
  • SiliconANGLE
  • SlashGear
  • Smartphone
  • StackSocial
  • Tech Art
  • Tech Careers
  • Tech Deals
  • Techcratic ™
  • TechCrunch
  • Techdirt
  • TechRepublic
  • Techs Got To Eat ™
  • TechSpot
  • Tesla
  • The Verge
  • TNW
  • Trusted Reviews
  • UFO
  • VentureBeat
  • Visual Capitalist
  • Wired
  • ZDNet

Tech News

  • 30 Second Tech ™
  • AI
  • Apple Insider
  • Ars Technica
  • CNET
  • ComputerWorld
  • Crypto News
  • Cybersecurity
  • Endgadget
  • ExtremeTech
  • Fossbytes
  • Gaming
  • GeekWire
  • Gizmodo

Tech News

  • Harvard Tech
  • MacRumors
  • Macworld
  • Mashable
  • Microsoft
  • MIT Tech
  • Physics
  • PC World
  • Random Tech
  • Retro Rewind ™
  • SiliconANGLE
  • SlashGear
  • Smartphone
  • StackSocial
  • Tech Careers

Tech News​

  • Tech Art
  • TechCrunch
  • Techdirt
  • TechRepublic
  • Techs Got To Eat ™
  • TechSpot
  • Tesla
  • The Verge
  • TNW
  • Trusted Reviews
  • UFO
  • VentureBeat
  • Visual Capitalist
  • Wired
  • ZDNet

Site Links

  • About Techcratic
  • Affiliate Disclaimer
  • Affiliate Link Policy
  • Contact Techcratic
  • Dealors Discount Store
  • Privacy and Security Disclaimer
  • Privacy Policy
  • RSS Feed
  • Site Map
  • Support Techcratic
  • Techcratic
  • Tech Deals
  • TOS
  • 𝕏
Click For A Secret Deal

Techcratic – Your All In One Tech Hub © 2020 – 2025
All Rights Reserved
∞

No Result
View All Result
  • 30 Second Tech ™
  • AI
  • App Zone ™
  • Apple
  • Ars Technica
  • CNET
  • Crypto News
  • Cybersecurity
  • Endgadget
  • Gaming
  • I Like Cats ™
  • I Like Dogs ™
  • MacRumors
  • Macworld
  • Tech Deals
  • Techcratic ™
  • Techs Got To Eat ™
  • Tesla
  • UFO
  • Wired