Pre-made Models

AIKit comes with pre-made models that you can use out-of-the-box!

If it doesn't include a specific model, you can always create your own images, and host in a container registry of your choice!

CPU

note

AIKit supports both AMD64 and ARM64 CPUs. You can run the same command on either architecture, and Docker will automatically pull the correct image for your CPU. Depending on your CPU capabilities, AIKit will automatically select the most optimized instruction set.

Model	Optimization	Parameters	Command	Model Name	License
🦙 Llama 3.2	Instruct	1B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.2:1b`	`llama-3.2-1b-instruct`	Llama
🦙 Llama 3.2	Instruct	3B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.2:3b`	`llama-3.2-3b-instruct`	Llama
🦙 Llama 3.1	Instruct	8B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.1:8b`	`llama-3.1-8b-instruct`	Llama
🦙 Llama 3.3	Instruct	70B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.3:70b`	`llama-3.3-70b-instruct`	Llama
Ⓜ️ Mixtral	Instruct	8x7B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b`	`mixtral-8x7b-instruct`	Apache
🅿️ Phi 3.5	Instruct	3.8B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/phi3.5:3.8b`	`phi-3.5-3.8b-instruct`	MIT
🔡 Gemma 2	Instruct	2B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/gemma2:2b`	`gemma-2-2b-instruct`	Gemma
⌨️ Codestral 0.1	Code	22B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/codestral:22b`	`codestral-22b`	MNLP
QwQ		32B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/qwq:32b`	`qwq-32b-preview`	Apache 2.0

NVIDIA CUDA

Model	Optimization	Parameters	Command	Model Name	License
🦙 Llama 3.2	Instruct	1B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.2:1b`	`llama-3.2-1b-instruct`	Llama
🦙 Llama 3.2	Instruct	3B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.2:3b`	`llama-3.2-3b-instruct`	Llama
🦙 Llama 3.1	Instruct	8B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.1:8b`	`llama-3.1-8b-instruct`	Llama
🦙 Llama 3.3	Instruct	70B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.3:70b`	`llama-3.3-70b-instruct`	Llama
Ⓜ️ Mixtral	Instruct	8x7B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b`	`mixtral-8x7b-instruct`	Apache
🅿️ Phi 3.5	Instruct	3.8B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/phi3.5:3.8b`	`phi-3.5-3.8b-instruct`	MIT
🔡 Gemma 2	Instruct	2B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/gemma2:2b`	`gemma-2-2b-instruct`	Gemma
⌨️ Codestral 0.1	Code	22B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/codestral:22b`	`codestral-22b`	MNLP
QwQ		32B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/qwq:32b`	`qwq-32b-preview`	Apache 2.0
📸 Flux 1 Dev	Text to image	12B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/flux1:dev`	`flux-1-dev`	FLUX.1 [dev] Non-Commercial License

note

Please see models folder for pre-made model definitions.

If not being offloaded to GPU VRAM, minimum of 8GB of RAM is required for 7B models, 16GB of RAM to run 13B models, and 32GB of RAM to run 8x7B models.

All pre-made models include CUDA v12 libraries. They are used with NVIDIA GPU acceleration. If a supported NVIDIA GPU is not found in your system, AIKit will automatically fallback to CPU with the most optimized runtime (avx2, avx, or fallback).

Apple Silicon (experimental)

note

To enable GPU acceleration on Apple Silicon, please see Podman Desktop documentation.

Apple Silicon is an experimental runtime and it may change in the future. This runtime is specific to Apple Silicon only, and it will not work as expected on other architectures, including Intel Macs.

Only gguf models are supported on Apple Silicon.

Model	Optimization	Parameters	Command	Model Name	License
🦙 Llama 3.2	Instruct	1B	`podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/llama3.2:1b`	`llama-3.2-1b-instruct`	Llama
🦙 Llama 3.2	Instruct	3B	`podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/llama3.2:3b`	`llama-3.2-3b-instruct`	Llama
🦙 Llama 3.1	Instruct	8B	`podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/llama3.1:8b`	`llama-3.1-8b-instruct`	Llama
🅿️ Phi 3.5	Instruct	3.8B	`podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/phi3.5:3.8b`	`phi-3.5-3.8b-instruct`	MIT
🔡 Gemma 2	Instruct	2B	`podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/gemma2:2b`	`gemma-2-2b-instruct`	Gemma

Deprecated Models

The following pre-made models are deprecated and no longer updated. Images will continue to be pullable, if needed.

If you need to use these specific models, you can always create your own images, and host in a container registry of your choice!

CPU

Model	Optimization	Parameters	Command	License
🐬 Orca 2		13B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/orca2:13b`	Microsoft Research
🅿️ Phi 2	Instruct	2.7B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/phi2:2.7b`	MIT
🅿️ Phi 3	Instruct	3.8B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/phi3:3.8b`	`phi-3-3.8b`
🦙 Llama 3	Instruct	8B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3:8b`	`llama-3-8b-instruct`
🦙 Llama 3	Instruct	70B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3:70b`	`llama-3-70b-instruct`
🦙 Llama 2	Chat	7B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama2:7b`	`llama-2-7b-chat`
🦙 Llama 2	Chat	13B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama2:13b`	`llama-2-13b-chat`
🔡 Gemma 1.1	Instruct	2B	`docker run -d --rm -p 8080:8080 ghcr.io/sozercan/gemma:2b`	`gemma-2b-instruct`

NVIDIA CUDA

Model	Optimization	Parameters	Command	License
🐬 Orca 2		13B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/orca2:13b-cuda`	Microsoft Research
🅿️ Phi 2	Instruct	2.7B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/phi2:2.7b-cuda`	MIT
🅿️ Phi 3	Instruct	3.8B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/phi3:3.8b`	`phi-3-3.8b`
🦙 Llama 3	Instruct	8B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3:8b`	`llama-3-8b-instruct`
🦙 Llama 3	Instruct	70B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3:70b`	`llama-3-70b-instruct`
🦙 Llama 2	Chat	7B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama2:7b`	`llama-2-7b-chat`
🦙 Llama 2	Chat	13B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama2:13b`	`llama-2-13b-chat`
🔡 Gemma 1.1	Instruct	2B	`docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/gemma:2b`	`gemma-2b-instruct`

CPU​

NVIDIA CUDA​

Apple Silicon (experimental)​

Deprecated Models​

CPU​

NVIDIA CUDA​

CPU

NVIDIA CUDA

Apple Silicon (experimental)

Deprecated Models

CPU

NVIDIA CUDA