ExLlamaV2 (GPTQ and EXL2)
ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs.
This backend:
- provides support for GPTQ and EXL2 quantized models (a configuration sketch is shown below)
- requires the CUDA runtime
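
For orientation, here is a minimal aikitfile sketch for this backend. It assumes the backend is selected with a `backends` entry named `exllama2`; the model name and source URL are placeholders, so treat the test files linked below as the authoritative examples:

```yaml
#syntax=ghcr.io/sozercan/aikit:latest
apiVersion: v1alpha1
runtime: cuda          # this backend requires the CUDA runtime
backends:
  - exllama2           # select the ExLlamaV2 backend
models:
  # placeholder model entry: point this at an EXL2- or GPTQ-quantized model
  - name: my-model
    source: https://example.com/path/to/model.safetensors
```

An image built from such a file (for example with `docker buildx build . -t my-model -f aikitfile.yaml --load`) can then be served like any other aikit model image.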
Note: This is an experimental backend and it may change in the future.
Example
Warning: When using the examples below, make sure to change the syntax directive to `#syntax=ghcr.io/sozercan/aikit:latest`.
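
Concretely, the first line of each aikitfile should carry the BuildKit syntax directive, pointing at the latest aikit frontend image:

```yaml
#syntax=ghcr.io/sozercan/aikit:latest
```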
EXL2
https://github.com/sozercan/aikit/blob/main/test/aikitfile-exllama2-exl2.yaml
GPTQ
https://github.com/sozercan/aikit/blob/main/test/aikitfile-exllama2-gptq.yaml