Thomas Coughlin, Contributor
2025-07-08 00:47:00
www.forbes.com
Artificial Intelligence
In April of this year, Kioxia’s Rory Bolt briefed me on AiSAQ, an open-source project intended to promote the expanded use of SSDs in RAG AI solutions. The focus of AI is shifting from generating foundation models with massive and expensive training runs to cost-effective and scalable ways to build inference solutions that solve real-world problems.
Retrieval-Augmented Generation (RAG) is an approach to AI that combines traditional information retrieval systems with large language models (LLMs). RAG enhances the performance of LLMs by allowing them to access and incorporate information from external knowledge sources, such as databases, websites, and internal documents, before generating a response. This approach helps LLMs produce more accurate, contextually relevant, and up-to-date answers, especially when dealing with specific domains or real-time data.
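To make the retrieval step concrete, here is a minimal, self-contained Python sketch of a RAG query flow. It is an illustration only, not Kioxia’s implementation: the embed() function is a toy hash-seeded stand-in for a real sentence-embedding model, and the documents list is a hypothetical knowledge source.

```python
# Minimal sketch of a RAG query flow (illustration, not Kioxia's code).
# embed() is a toy stand-in for a real sentence-embedding model: it is
# hash-seeded so the example is deterministic and runs with numpy alone.
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic embedding (placeholder for a real model)."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)  # unit vector, so dot = cosine similarity

# 1. Offline: embed the external knowledge source into a vector store.
documents = [
    "AiSAQ stores index vectors entirely on SSDs.",
    "DiskANN keeps compressed vectors in DRAM and full vectors on SSD.",
    "RAG retrieves external documents before the LLM generates an answer.",
]
doc_vectors = np.stack([embed(d) for d in documents])

# 2. Online: embed the query and retrieve the closest documents.
query = "How does RAG improve LLM answers?"
scores = doc_vectors @ embed(query)       # cosine similarity per document
top_k = np.argsort(scores)[::-1][:2]      # indices of the best matches
context = "\n".join(documents[i] for i in top_k)

# 3. The retrieved context is prepended to the prompt sent to the LLM.
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

In a production system the doc_vectors array is replaced by a vector database index, and it is exactly where that index and its vectors live, in DRAM or on SSD, that DiskANN and AiSAQ address.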
Kioxia has used AI to improve the output of its NAND fabs since 2017, mostly using machine vision to monitor trends and defect rates. In 2020, Kioxia used AI to generate Phaedo, the world’s first AI-designed manga, drawing on artwork and stories based on Osamu Tezuka’s work.
I was told that although larger data centers feed data to their AI models using hard drives, many in-house solutions train using data on SSDs. These solutions often start from foundation LLMs created with very large data sets and apply RAG with in-house, and often more up-to-date, data to tune the foundation model for a particular application and to avoid hallucinations. The image below illustrates how a database can be used to tune the output of the original LLM.
How Retrieval-Augmented Generation works to improve LLM Inference
Here the customer query is answered using the LLM together with domain-specific and up-to-date information held in a vector database. Such RAG solutions can keep the database index and vectors entirely in DRAM, but that approach can use a lot of memory, making it very expensive, particularly for large databases.
Microsoft developed DiskANN, which moved the bulk of the vector database content to SSDs. This reduced the DRAM footprint required for the database, enabling greater scaling of vector databases. DiskANN is used in products such as Azure Vector DB and Cosmos DB.
Kioxia’s All-in-Storage ANNS with Product Quantization, or AiSAQ, completes the move of database vectors into storage, further reducing DRAM requirements. These three approaches are represented in the drawing below.
Comparison of database DRAM requirements for DRAM- and SSD-based RAG architectures
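As a rough illustration of the scaling shown in the figure, the Python sketch below estimates the DRAM footprint of each architecture. All of the per-vector byte counts (embedding width, product-quantization code size, graph neighbor lists, residual buffers) are assumptions chosen for illustration, not Kioxia’s or Microsoft’s published figures.

```python
# Back-of-the-envelope DRAM footprints for the three architectures in the
# figure above. Every byte count here is an illustrative assumption.
GiB = 2**30

def dram_footprint(n_vectors: int,
                   dim: int = 768,           # assumed embedding width
                   bytes_per_float: int = 4, # float32 components
                   pq_code_bytes: int = 32,  # assumed PQ code per vector
                   graph_bytes: int = 256):  # assumed neighbor list per vector
    # All-in-DRAM: full vectors plus the search graph live in memory.
    full = n_vectors * (dim * bytes_per_float + graph_bytes)
    # DiskANN-style: only compressed PQ codes stay in DRAM; the rest is on SSD.
    diskann = n_vectors * pq_code_bytes
    # AiSAQ-style: vectors and index on SSD; DRAM holds only small buffers,
    # so the footprint is roughly constant (64 MiB assumed here).
    aisaq = 64 * 2**20
    return full / GiB, diskann / GiB, aisaq / GiB

for n in (10**7, 10**8, 10**9):
    in_dram, diskann, aisaq = dram_footprint(n)
    print(f"{n:>13,} vectors: in-DRAM {in_dram:8.1f} GiB | "
          f"DiskANN {diskann:7.1f} GiB | AiSAQ {aisaq:5.2f} GiB")
```

The arithmetic makes the figure’s point: the all-in-DRAM footprint grows linearly with the vector count, DiskANN’s grows much more slowly, and AiSAQ’s stays roughly constant regardless of database size.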
Kioxia says that this approach enables greater scalability for RAG workflows and thus better accuracy in the models. The image below shows the significant reduction in DRAM required for large databases compared to the DRAM-based and DiskANN approaches, along with the improved query accuracy.
AiSAQ reduces DRAM costs, improves speed and inference accuracy
In early July, Kioxia announced further improvements to AiSAQ. The new open-source release provides flexible controls that let system architects set the balance point between search performance and the number of vectors, which are opposing factors given a fixed SSD storage capacity in the system. This lets architects of RAG systems fine-tune that balance for specific workloads and their requirements, without any hardware modifications.
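The trade-off is easy to see with a little arithmetic. The sketch below assumes a hypothetical fixed SSD budget and treats the density of the search graph as the tuning knob: a denser graph generally improves search performance but consumes more bytes per vector, so fewer vectors fit. All byte costs are illustrative assumptions, not AiSAQ’s actual on-disk format.

```python
# Sketch of the tunable trade-off described above: with a fixed SSD budget,
# spending more bytes per vector (here, a denser search graph) improves
# search performance but caps how many vectors fit. All byte costs are
# assumptions for illustration only.
TiB = 2**40
ssd_budget = 4 * TiB                 # hypothetical fixed SSD capacity
dim, bytes_per_float = 768, 4        # assumed float32 embeddings
vector_bytes = dim * bytes_per_float

for graph_degree in (16, 32, 64, 128):  # knob: denser graph = faster search
    edge_bytes = graph_degree * 4       # 4-byte neighbor IDs per edge
    per_vector = vector_bytes + edge_bytes
    max_vectors = ssd_budget // per_vector
    print(f"degree {graph_degree:3d}: {per_vector} B/vector -> "
          f"up to {max_vectors:,} vectors on the SSD")
```

Exposing knobs like this in software is what lets the same hardware be re-balanced between capacity and performance as workloads change.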
Kioxia’s AiSAQ enables more scalable RAG AI inference systems by moving database vectors entirely into storage, avoiding DRAM growth as database sizes increase.