
ExLlamaV2 (GPTQ and EXL2)

ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs.

This backend:

- provides support for GPTQ and EXL2 models
- requires the CUDA runtime
note

This is an experimental backend and it may change in the future.

Examples

warning

Make sure to change the `#syntax` directive to `#syntax=ghcr.io/sozercan/aikit:latest` in the examples below.

EXL2

https://github.com/sozercan/aikit/blob/main/test/aikitfile-exllama2-exl2.yaml
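
The linked test file is the canonical example. As a rough sketch only, an EXL2 aikitfile follows the same general shape as other aikitfile examples; the model name and source URL below are placeholders, not values from the linked file:

```yaml
#syntax=ghcr.io/sozercan/aikit:latest
apiVersion: v1alpha1
debug: true
runtime: cuda                # this backend requires the CUDA runtime
backends:
  - exllama2
models:
  # placeholder name and source; point these at an EXL2-quantized model
  - name: my-model-exl2
    source: https://huggingface.co/your-org/your-model-exl2
config: |
  - name: my-model-exl2
    backend: exllama2
```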

GPTQ

https://github.com/sozercan/aikit/blob/main/test/aikitfile-exllama2-gptq.yaml
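
The GPTQ case is analogous: the backend stays `exllama2`, and only the model entry changes to point at a GPTQ-quantized checkpoint. Again, the names and URL below are placeholders; see the linked test file for the exact configuration:

```yaml
models:
  # placeholder name and source; point these at a GPTQ-quantized model
  - name: my-model-gptq
    source: https://huggingface.co/your-org/your-model-GPTQ
config: |
  - name: my-model-gptq
    backend: exllama2
```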