2025-01-31 09:20:00
github.com
The RamaLama project’s goal is to make working with AI boring
through the use of OCI containers.
RamaLama tool facilitates local management and serving of AI Models.
On first run RamaLama inspects your system for GPU support, falling back to CPU support if no GPUs are present.
RamaLama uses container engines like Podman or Docker to pull the appropriate OCI image with all of the software necessary to run an AI Model for your systems setup.
Running in containers eliminates the need for users to configure the host system for AI. After the initialization, RamaLama runs the AI Models within a container based on the OCI image.
RamaLama then pulls AI Models from model registries. Starting a chatbot or a rest API service from a simple single command. Models are treated similarly to how Podman and Docker treat container images.
When both Podman and Docker are installed, RamaLama defaults to Podman, The RAMALAMA_CONTAINER_ENGINE=docker
environment variable can override this behaviour. When neither are installed RamaLama will attempt to run the model with software on the local system.
RamaLama supports multiple AI model registries types called transports.
Supported transports:
RamaLama uses the Ollama registry transport by default. Use the RAMALAMA_TRANSPORTS environment variable to modify the default. export RAMALAMA_TRANSPORT=huggingface
Changes RamaLama to use huggingface transport.
Individual model transports can be modifies when specifying a model via the huggingface://
, oci://
, or ollama://
prefix.
ramalama pull huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
To make it easier for users, RamaLama uses shortname files, which container
alias names for fully specified AI Models allowing users to specify the shorter
names when referring to models. RamaLama reads shortnames.conf files if they
exist . These files contain a list of name value pairs for specification of
the model. The following table specifies the order which RamaLama reads the files
. Any duplicate names that exist override previously defined shortnames.
Shortnames type | Path |
---|---|
Distribution | /usr/share/ramalama/shortnames.conf |
Administrators | /etc/ramamala/shortnames.conf |
Users | $HOME/.config/ramalama/shortnames.conf |
$ cat /usr/share/ramalama/shortnames.conf
[shortnames]
"tiny" = "ollama://tinyllama"
"granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"granite:7b" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"ibm/granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"merlinite" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
"merlinite:7b" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
...
RamaLama is available via PyPi https://pypi.org/project/ramalama
Tip
If you are a macOS user, this is the preferred method.
Install RamaLama by running this one-liner:
curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.sh | bash
Hardware | Enabled |
---|---|
CPU | ✅ |
Apple Silicon GPU (Linux / Asahi) | ✅ |
Apple Silicon GPU (macOS) | ✅ |
Apple Silicon GPU (podman-machine) | ✅ |
Nvidia GPU (cuda) | ✅ |
AMD GPU (rocm) | ✅ |
You can run
a chatbot on a model using the run
command. By default, it pulls from the Ollama registry.
Note: RamaLama will inspect your machine for native GPU support and then will
use a container engine like Podman to pull an OCI container image with the
appropriate code and libraries to run the AI Model. This can take a long time to setup, but only on the first run.
$ ramalama run instructlab/merlinite-7b-lab
Copying blob 5448ec8c0696 [--------------------------------------] 0.0b / 63.6MiB (skipped: 0.0b = 0.00%)
Copying blob cbd7e392a514 [--------------------------------------] 0.0b / 65.3MiB (skipped: 0.0b = 0.00%)
Copying blob 5d6c72bcd967 done 208.5MiB / 208.5MiB (skipped: 0.0b = 0.00%)
Copying blob 9ccfa45da380 [--------------------------------------] 0.0b / 7.6MiB (skipped: 0.0b = 0.00%)
Copying blob 4472627772b1 [--------------------------------------] 0.0b / 120.0b (skipped: 0.0b = 0.00%)
>
After the initial container image has been downloaded, you can interact with
different models, using the container image.
$ ramalama run granite3-moe
> Write a hello world application in python
print("Hello World")
In a different terminal window see the running podman container.
$ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
91df4a39a360 quay.io/ramalama/ramalama:latest /home/dwalsh/rama... 4 minutes ago Up 4 minutes gifted_volhard
You can list
all models pulled into local storage.
$ ramalama list
NAME MODIFIED SIZE
ollama://smollm:135m 16 hours ago 5.5M
huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf 14 hours ago 460M
ollama://moondream:latest 6 days ago 791M
ollama://phi4:latest 6 days ago 8.43 GB
ollama://tinyllama:latest 1 week ago 608.16 MB
ollama://granite3-moe:3b 1 week ago 1.92 GB
ollama://granite3-moe:latest 3 months ago 1.92 GB
ollama://llama3.1:8b 2 months ago 4.34 GB
ollama://llama3.1:latest 2 months ago 4.34 GB
You can pull
a model using the pull
command. By default, it pulls from the Ollama registry.
$ ramalama pull granite3-moe
31% |████████ | 250.11 MB/ 783.77 MB 36.95 MB/s 14s
You can serve
multiple models using the serve
command. By default, it pulls from the Ollama registry.
$ ramalama serve --name mylama llama3
You can stop a running model if it is running in a container.
To use a UI, run a ramalama serve
command, then connect via your browser at:
127.0.0.1:8080
+---------------------------+
| |
| ramalama run granite3-moe |
| |
+-------+-------------------+
|
|
| +------------------+ +------------------+
| | Pull inferencing | | Pull model layer |
+-----------| runtime (cuda) |---------->| granite3-moe |
+------------------+ +------------------+
| Repo options: |
+-+-------+------+-+
| | |
v v v
+---------+ +------+ +----------+
| Hugging | | OCI | | Ollama |
| Face | | | | Registry |
+-------+-+ +---+--+ +-+--------+
| | |
v v v
+------------------+
| Start with |
| cuda runtime |
| and |
| granite3-moe |
+------------------+
Regard this alpha, everything is under development, so expect breaking changes, luckily it’s easy to reset everything and re-install:
rm -rf /var/lib/ramalama # only required if running as root user
rm -rf $HOME/.local/share/ramalama
and install again.
This project wouldn’t be possible without the help of other projects like:
llama.cpp
whisper.cpp
vllm
podman
huggingface
so if you like this tool, give some of these repos a ⭐, and hey, give us a ⭐ too while you are at it.
Open to contributors
Keep your files stored safely and securely with the SanDisk 2TB Extreme Portable SSD. With over 69,505 ratings and an impressive 4.6 out of 5 stars, this product has been purchased over 8K+ times in the past month. At only $129.99, this Amazon’s Choice product is a must-have for secure file storage.
Help keep private content private with the included password protection featuring 256-bit AES hardware encryption. Order now for just $129.99 on Amazon!
Help Power Techcratic’s Future – Scan To Support
If Techcratic’s content and insights have helped you, consider giving back by supporting the platform with crypto. Every contribution makes a difference, whether it’s for high-quality content, server maintenance, or future updates. Techcratic is constantly evolving, and your support helps drive that progress.
As a solo operator who wears all the hats, creating content, managing the tech, and running the site, your support allows me to stay focused on delivering valuable resources. Your support keeps everything running smoothly and enables me to continue creating the content you love. I’m deeply grateful for your support, it truly means the world to me! Thank you!
BITCOIN bc1qlszw7elx2qahjwvaryh0tkgg8y68enw30gpvge Scan the QR code with your crypto wallet app |
DOGECOIN D64GwvvYQxFXYyan3oQCrmWfidf6T3JpBA Scan the QR code with your crypto wallet app |
ETHEREUM 0xe9BC980DF3d985730dA827996B43E4A62CCBAA7a Scan the QR code with your crypto wallet app |
Please read the Privacy and Security Disclaimer on how Techcratic handles your support.
Disclaimer: As an Amazon Associate, Techcratic may earn from qualifying purchases.