  • Go 80.5%
  • Python 13.8%
  • Shell 3.9%
  • Dockerfile 1.8%
engel75 736b276222
fix(llama-guard): avoid double BOS when tokenizing client-rendered prompt
processor.tokenizer() defaults to add_special_tokens=True, which prepends
a BOS token. The client-rendered prompt path already includes the literal
'<|begin_of_text|>' marker at the start of the string, so the tokenizer
was emitting two BOS tokens in a row.

Confirmed via DEBUG_FULL_PROMPT decoded_with_special log:
  before: '<|begin_of_text|><|begin_of_text|><|header_start|>user<|header_end|>...'
  after:  '<|begin_of_text|><|header_start|>user<|header_end|>...'

Token count drops by 1 and the prompt now matches what apply_chat_template
emits exactly. Note: the chat_template path is unaffected because
apply_chat_template handles BOS itself; only the bypass path needed this
guard.
2026-04-30 08:13:11 +00:00
.claude/skills/gitnexus chore: complete bifrost-docker → llama-guardrails rename 2026-04-29 13:12:22 +00:00
.forgejo/workflows workflow needs docker 2026-04-29 20:01:44 +00:00
llama-guard fix(llama-guard): avoid double BOS when tokenizing client-rendered prompt 2026-04-30 08:13:11 +00:00
llama-guardrails fix(plugin): tolerate string-encoded numerics/booleans in plugin Config 2026-04-30 07:09:43 +00:00
prompt-guard initial commit 2026-04-14 11:49:25 +02:00
.gitignore chore(compose): use pre-built bifrost image, mount plugin/config as volumes 2026-04-29 14:48:41 +00:00
AGENTS.md chore: complete bifrost-docker → llama-guardrails rename 2026-04-29 13:12:22 +00:00
CLAUDE.md chore: complete bifrost-docker → llama-guardrails rename 2026-04-29 13:12:22 +00:00
docker-compose.yml feat(llama-guard): add DEBUG_FULL_PROMPT for tokenizer-input diagnostics 2026-04-29 21:42:01 +00:00
LICENSE chore: complete bifrost-docker → llama-guardrails rename 2026-04-29 13:12:22 +00:00
README.md docs: document standalone plugin build and local test workflow 2026-04-29 14:49:29 +00:00
test_header_handling.sh chore: complete bifrost-docker → llama-guardrails rename 2026-04-29 13:12:22 +00:00

Bifrost Guardrails Plugin

Content safety plugin for bifrost-http v1.4.23, implementing two-stage guardrails: Prompt Guard (input) and Llama Guard (input + output).

Architecture

                           ┌─────────────────────────────────────────────┐
                           │          bifrost-http (v1.4.23)             │
                           │                                             │
  HTTP Request ───────────►│  PreLLMHook: checks request before LLM call │
                           │  PostLLMHook: checks response after LLM call │
                           └──────────┬──────────────────────┬───────────┘
                                      │                      │
                        ┌─────────────▼────────────┐  ┌──────▼──────────┐
                        │  llama_guardrails.so     │  │                 │
                        │  (Go plugin, CGO build)  │  │                 │
                        │                          │  │                 │
                        │  ┌──────────────────┐    │  │                 │
                        │  │  PromptGuardPool │    │  │                 │
                        │  │  (round-robin)   │    │  │                 │
                        │  └────────┬─────────┘    │  │                 │
                        │           │              │  │                 │
                        │  ┌────────▼─────────┐    │  │                 │
                        │  │  LlamaGuardPool  │    │  │                 │
                        │  │  (round-robin)   │    │  │                 │
                        │  └──────────────────┘    │  │                 │
                        └──────────────────────────┘  │                 │
                                                      │                 │
                        ┌─────────────────────────────▼────┐  ┌─────────▼───────────┐
                        │        prompt-guard service      │  │    llama-guard      │
                        │        (CPU, ~2 GB RAM)          │  │    service          │
                        │        port 8010                 │  │    (GPU, ~6-8 GB)   │
                        │                                  │  │    port 8011        │
                        │        Llama-Prompt-Guard-2-86M  │  │                     │
                        │        AutoModelForSeqClassifier │  │  Llama-Guard-4-12B  │
                        │                                  │  │ 4-bit NF4 quantized │
                        └──────────────────────────────────┘  └─────────────────────┘

Components

llama_guardrails.so (Go plugin)

Built as a Go plugin (.so) with CGO_ENABLED=1 and DYNAMIC=1, loaded by bifrost-http at runtime as a volume-mounted ELF shared object. Exposes four hook functions:

| Hook | Trigger | Guard checked | Blocks? |
|------|---------|---------------|---------|
| PreLLMHook | Before LLM call | Prompt Guard → Llama Guard | Yes: short-circuits with a safety message |
| PostLLMHook | After LLM call | Llama Guard | Yes: replaces the response with a safety message |
| HTTPTransportPreHook | On HTTP transport layer | Prompt Guard | Yes |
| HTTPTransportStreamChunkHook | On each streaming chunk | Llama Guard | Yes |

The Go plugin maintains two round-robin connection pools (PromptGuardPool, LlamaGuardPool) to distribute load across multiple replica URLs.
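As a rough illustration of the round-robin idea, here is a minimal sketch. The type name `RoundRobinPool`, its methods, and the replica URLs are hypothetical and not taken from the plugin source; the real PromptGuardPool and LlamaGuardPool may add health checks, retries, or HTTP clients.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RoundRobinPool hands out replica URLs in rotation.
// Hypothetical type for illustration; not the plugin's actual pool.
type RoundRobinPool struct {
	urls []string
	next atomic.Uint64 // monotonically increasing request counter
}

// NewRoundRobinPool copies the URL list so later changes by the caller
// do not affect the pool.
func NewRoundRobinPool(urls []string) *RoundRobinPool {
	return &RoundRobinPool{urls: append([]string(nil), urls...)}
}

// Next returns the next URL in rotation, or false when the pool is empty.
// The atomic counter makes it safe to call from concurrent hooks.
func (p *RoundRobinPool) Next() (string, bool) {
	if len(p.urls) == 0 {
		return "", false
	}
	n := p.next.Add(1) - 1
	return p.urls[n%uint64(len(p.urls))], true
}

func main() {
	// Replica hostnames below are made up for the example.
	pool := NewRoundRobinPool([]string{
		"http://prompt-guard-a:8010",
		"http://prompt-guard-b:8010",
	})
	for i := 0; i < 3; i++ {
		url, _ := pool.Next()
		fmt.Println(url)
	}
}
```

With two replicas, three consecutive calls return the first URL, the second, and then the first again.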

prompt-guard (Python, CPU)

FastAPI service running Llama-Prompt-Guard-2-86M, a small 86M-parameter classifier with three labels (BENIGN, INJECTION, JAILBREAK). No GPU required.

Request: POST /scan with body {"text": "..."}

Response: {"label": "INJECTION", "scores": {"BENIGN": 0.01, "INJECTION": 0.98, "JAILBREAK": 0.01}}

If scores[label] > threshold (default 0.9), the request is blocked with: "Request blocked: {label} detected."
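The threshold rule can be sketched in Go roughly as follows. `ScanResult` and `BlockMessage` are hypothetical names, and skipping the BENIGN label is an assumption: the rule as written does not say how a high-confidence BENIGN result is treated, and the plugin may handle it differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScanResult mirrors the /scan response shape shown above.
// Hypothetical type for illustration.
type ScanResult struct {
	Label  string             `json:"label"`
	Scores map[string]float64 `json:"scores"`
}

// BlockMessage decodes a /scan response body and reports whether the
// request should be blocked, together with the message to return.
func BlockMessage(body []byte, threshold float64) (string, bool, error) {
	var r ScanResult
	if err := json.Unmarshal(body, &r); err != nil {
		return "", false, fmt.Errorf("decode scan response: %w", err)
	}
	// Assumption: a BENIGN label never blocks, whatever its score.
	if r.Label != "BENIGN" && r.Scores[r.Label] > threshold {
		return fmt.Sprintf("Request blocked: %s detected.", r.Label), true, nil
	}
	return "", false, nil
}

func main() {
	body := []byte(`{"label": "INJECTION", "scores": {"BENIGN": 0.01, "INJECTION": 0.98, "JAILBREAK": 0.01}}`)
	msg, blocked, err := BlockMessage(body, 0.9)
	if err != nil {
		panic(err)
	}
	fmt.Println(blocked, msg)
}
```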

llama-guard (Python, GPU)

FastAPI service running Llama-Guard-4-12B with 4-bit NF4 quantization (~6-8 GB VRAM). Performs content safety classification on both user prompts (pre-LLM) and LLM responses (post-LLM).

Request: POST /classify with body {"text": "..."}

Response: {"label": "unsafe", "category": "S9", "score": 0.95}

Categories follow the Llama Guard taxonomy (S1-S12). Input is truncated to 8192 tokens server-side to bound KV cache memory.

Build & Run

1. Build the plugin (produces ./llama-guardrails/llama_guardrails.so)

DOCKER_BUILDKIT=1 docker build \
  --target export \
  --output type=local,dest=./llama-guardrails \
  -f llama-guardrails/Dockerfile.plugin \
  ./llama-guardrails

2. Start the stack (uses the pre-built bifrost image and mounts the plugin as a volume)

docker compose up -d --wait

3. (Optional) Run Go unit tests

cd llama-guardrails && GOWORK=off go test ./... -count=1

Architecture Note: Plugin Build

The Go plugin (llama_guardrails.so) and the Bifrost runtime are built separately:

  • Bifrost is the pre-built image forge.engelmann.me/engel75/bifrost:v1.4.23-ew-4
  • The plugin is compiled as a standalone ELF .so via Dockerfile.plugin and mounted into the Bifrost container at runtime

ABI alignment between plugin and runtime is guaranteed by compiling the plugin against the exact same Bifrost source tree (go mod edit -replace github.com/maximhq/bifrost/core=/bifrost/core). See llama-guardrails/Dockerfile.plugin for details.

Configuration

Plugin config in llama-guardrails/config-data.json:

{
  "prompt_guard_urls": ["http://prompt-guard:8010"],
  "llama_guard_urls": ["http://llama-guard:8011"],
  "llama_guard_model": "meta-llama/Llama-Guard-4-12B",
  "timeout_ms": 10000,
  "prompt_guard_threshold": 0.9,
  "log_blocked_requests": true,
  "debug": true
}

Data Flow

Request path (PreLLMHook)

  1. Go plugin extracts last user message from schemas.BifrostRequest
  2. Sends it to prompt-guard via POST /scan
    • If score > threshold → short-circuit with "Request blocked: {label} detected."
  3. Sends the user message to llama-guard via POST /classify
    • If label=unsafe → short-circuit with "Request blocked: content violates safety policy (categories: S9)."
  4. If both pass, request proceeds to LLM
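The ordering of the request path can be sketched as below. The function types `ScanFunc` and `ClassifyFunc` and the function `CheckRequest` are hypothetical stand-ins for the HTTP calls to the two services. Two things are assumptions: a BENIGN label never blocks, and a service error is returned to the caller rather than silently allowing or blocking the request.

```go
package main

import (
	"fmt"
	"strings"
)

// ScanFunc stands in for the prompt-guard POST /scan call (hypothetical).
type ScanFunc func(text string) (label string, score float64, err error)

// ClassifyFunc stands in for the llama-guard POST /classify call (hypothetical).
type ClassifyFunc func(text string) (unsafe bool, categories []string, err error)

// CheckRequest runs Prompt Guard first and Llama Guard second.
// A non-empty message means the request should be short-circuited.
func CheckRequest(text string, threshold float64, scan ScanFunc, classify ClassifyFunc) (string, error) {
	label, score, err := scan(text)
	if err != nil {
		return "", fmt.Errorf("prompt-guard: %w", err)
	}
	// Assumption: BENIGN never blocks, even with a high score.
	if label != "BENIGN" && score > threshold {
		return fmt.Sprintf("Request blocked: %s detected.", label), nil
	}

	unsafe, categories, err := classify(text)
	if err != nil {
		return "", fmt.Errorf("llama-guard: %w", err)
	}
	if unsafe {
		return fmt.Sprintf("Request blocked: content violates safety policy (categories: %s).",
			strings.Join(categories, ", ")), nil
	}
	return "", nil // both guards passed; the request proceeds to the LLM
}

func main() {
	// Stubbed guards so the sketch runs without the services.
	scan := func(string) (string, float64, error) { return "BENIGN", 0.99, nil }
	classify := func(string) (bool, []string, error) { return true, []string{"S9"}, nil }

	msg, err := CheckRequest("example user message", 0.9, scan, classify)
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```

Because Prompt Guard runs first, a blocked injection attempt never reaches the more expensive GPU-backed Llama Guard call.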

Response path (PostLLMHook)

  1. Go plugin extracts assistant message from schemas.BifrostResponse
  2. Sends it to llama-guard via POST /classify
    • If label=unsafe → replaces response with "I'm unable to provide this response as it violates the platform safety policy."
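A sketch of the response-side decision follows, assuming the /classify response shape shown earlier. `ClassifyResult` and `FilterResponse` are hypothetical names; how the plugin rewrites the actual schemas.BifrostResponse is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SafetyReplacement is the message quoted in the response path above.
const SafetyReplacement = "I'm unable to provide this response as it violates the platform safety policy."

// ClassifyResult mirrors the /classify response shape.
// Hypothetical type for illustration.
type ClassifyResult struct {
	Label    string  `json:"label"`
	Category string  `json:"category"`
	Score    float64 `json:"score"`
}

// FilterResponse returns the assistant text unchanged unless llama-guard
// labelled it "unsafe", in which case the safety message is returned.
func FilterResponse(assistantText string, classifyBody []byte) (string, error) {
	var r ClassifyResult
	if err := json.Unmarshal(classifyBody, &r); err != nil {
		return "", fmt.Errorf("decode classify response: %w", err)
	}
	if r.Label == "unsafe" {
		return SafetyReplacement, nil
	}
	return assistantText, nil
}

func main() {
	body := []byte(`{"label": "unsafe", "category": "S9", "score": 0.95}`)
	out, err := FilterResponse("original assistant text", body)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```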

Memory & VRAM

| Service | Model | Memory | Notes |
|---------|-------|--------|-------|
| prompt-guard | Prompt-Guard-2-86M | ~2 GB RAM | CPU, no GPU |
| llama-guard | Llama-Guard-4-12B | ~6-8 GB VRAM | GPU required, 4-bit NF4 quantized |

Llama Guard input is truncated at 8192 tokens server-side to prevent KV cache from blowing up on long conversations. Only the last user message is sent for classification (not the full conversation history).
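Selecting only the last user message could look roughly like this. The `Message` type is a simplified, hypothetical stand-in for the chat messages in schemas.BifrostRequest, whose real field names and content types are not reproduced here.

```go
package main

import "fmt"

// Message is a simplified chat message.
// Hypothetical type; the Bifrost schema types are richer than this.
type Message struct {
	Role    string
	Content string
}

// LastUserMessage scans the conversation from the end and returns the
// content of the most recent message with role "user".
func LastUserMessage(msgs []Message) (string, bool) {
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role == "user" {
			return msgs[i].Content, true
		}
	}
	return "", false
}

func main() {
	history := []Message{
		{Role: "system", Content: "You are a helpful assistant."},
		{Role: "user", Content: "first question"},
		{Role: "assistant", Content: "first answer"},
		{Role: "user", Content: "second question"},
	}
	text, ok := LastUserMessage(history)
	fmt.Println(ok, text)
}
```

Sending a single message keeps the classifier input short, at the cost of not seeing context from earlier turns.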