No description

Go 80.5%
Python 13.8%
Shell 3.9%
Dockerfile 1.8%

Find a file

engel75 736b276222 All checks were successful Build and Push Bifrost with Llama Guardrails / build-and-push (push) Successful in 23m13s Details fix(llama-guard): avoid double BOS when tokenizing client-rendered prompt processor.tokenizer() defaults to add_special_tokens=True, which prepends a BOS token. The client-rendered prompt path already includes the literal '<\|begin_of_text\|>' marker at the start of the string, so the tokenizer was emitting two BOS tokens in a row. Confirmed via DEBUG_FULL_PROMPT decoded_with_special log: before: '<\|begin_of_text\|><\|begin_of_text\|><\|header_start\|>user<\|header_end\|>...' after: '<\|begin_of_text\|><\|header_start\|>user<\|header_end\|>...' Token count drops by 1 and the prompt now matches what apply_chat_template emits exactly. Note: the chat_template path is unaffected because apply_chat_template handles BOS itself; only the bypass path needed this guard.		2026-04-30 08:13:11 +00:00
.claude/skills/gitnexus	chore: complete bifrost-docker → llama-guardrails rename	2026-04-29 13:12:22 +00:00
.forgejo/workflows	workflow needs docker	2026-04-29 20:01:44 +00:00
llama-guard	fix(llama-guard): avoid double BOS when tokenizing client-rendered prompt	2026-04-30 08:13:11 +00:00
llama-guardrails	fix(plugin): tolerate string-encoded numerics/booleans in plugin Config	2026-04-30 07:09:43 +00:00
prompt-guard	initial commit	2026-04-14 11:49:25 +02:00
.gitignore	chore(compose): use pre-built bifrost image, mount plugin/config as volumes	2026-04-29 14:48:41 +00:00
AGENTS.md	chore: complete bifrost-docker → llama-guardrails rename	2026-04-29 13:12:22 +00:00
CLAUDE.md	chore: complete bifrost-docker → llama-guardrails rename	2026-04-29 13:12:22 +00:00
docker-compose.yml	feat(llama-guard): add DEBUG_FULL_PROMPT for tokenizer-input diagnostics	2026-04-29 21:42:01 +00:00
LICENSE	chore: complete bifrost-docker → llama-guardrails rename	2026-04-29 13:12:22 +00:00
README.md	docs: document standalone plugin build and local test workflow	2026-04-29 14:49:29 +00:00
test_header_handling.sh	chore: complete bifrost-docker → llama-guardrails rename	2026-04-29 13:12:22 +00:00

README.md

Bifrost Guardrails Plugin

Content safety plugin for bifrost-http v1.4.23, implementing two-stage guardrails: Prompt Guard (input) and Llama Guard (input + output).

Architecture

                           ┌─────────────────────────────────────────────┐
                           │          bifrost-http (v1.4.23)             │
                           │                                             │
  HTTP Request ───────────►│  PreLLMHook: checks request before LLM call │
                           │  PostLLMHook: checks response after LLM call │
                           └──────────┬──────────────────────┬───────────┘
                                      │                      │
                        ┌─────────────▼───────────┐  ┌──────▼──────────┐
                        │  llama-guardrails.so  │  │                 │
                        │  (Go plugin, CGO build)  │  │                 │
                        │                          │  │                 │
                        │  ┌──────────────────┐   │  │                 │
                        │  │  PromptGuardPool │   │  │                 │
                        │  │  (round-robin)   │   │  │                 │
                        │  └────────┬─────────┘   │  │                 │
                        │           │              │  │                 │
                        │  ┌────────▼─────────┐   │  │                 │
                        │  │  LlamaGuardPool  │   │  │                 │
                        │  │  (round-robin)   │   │  │                 │
                        │  └──────────────────┘   │  │                 │
                        └──────────────────────────┘  │                 │
                                                      │                 │
                        ┌──────────────────────────────▼──┐  ┌──────────▼────────┐
                        │        prompt-guard service      │  │    llama-guard    │
                        │        (CPU, ~2 GB RAM)          │  │    service        │
                        │        port 8010                 │  │    (GPU, ~6-8 GB) │
                        │                                  │  │    port 8011      │
                        │        Llama-Prompt-Guard-2-86M  │  │                   │
                        │        AutoModelForSeqClassifier │  │  Llama-Guard-4-12B │
                        │                                  │  │  4-bit NF4 quantized│
                        └──────────────────────────────────┘  └───────────────────┘

Components

llama-guardrails.so (Go plugin)

Built as a Go plugin (.so) with CGO_ENABLED=1 and DYNAMIC=1, loaded by bifrost-http at runtime as a volume-mounted ELF shared object. Exposes four hook functions:

Hook	Trigger	Guard checked	Blocks?
`PreLLMHook`	Before LLM call	Prompt Guard → Llama Guard	Yes — short-circuit with safety message
`PostLLMHook`	After LLM call	Llama Guard	Yes — replaces response with safety message
`HTTPTransportPreHook`	On HTTP transport layer	Prompt Guard	Yes
`HTTPTransportStreamChunkHook`	On each streaming chunk	Llama Guard	Yes

The Go plugin maintains two round-robin connection pools (PromptGuardPool, LlamaGuardPool) to distribute load across multiple replica URLs.

prompt-guard (Python, CPU)

FastAPI service running Llama-Prompt-Guard-2-86M. A small 86M-parameter classifier with ~3 labels (BENIGN, INJECTION, JAILBREAK). No GPU required.

Request: POST /scan → {"text": "..."} Response: {"label": "INJECTION", "scores": {"BENIGN": 0.01, "INJECTION": 0.98, "JAILBREAK": 0.01}}

If scores[label] > threshold (default 0.9), the request is blocked with: "Request blocked: {label} detected."

llama-guard (Python, GPU)

FastAPI service running Llama-Guard-4-12B with 4-bit NF4 quantization (~6–8 GB VRAM). Performs content safety classification on both user prompts (pre-LLM) and LLM responses (post-LLM).

Request: POST /classify → {"text": "..."} Response: {"label": "unsafe", "category": "S9", "score": 0.95}

Categories follow the Llama Guard taxonomy (S1–S12). Input is truncated to 8192 tokens server-side to bound KV cache memory.

Build & Run

1. Build the plugin (produces `./llama-guardrails/llama_guardrails.so`)

DOCKER_BUILDKIT=1 docker build \
  --target export \
  --output type=local,dest=./llama-guardrails \
  -f llama-guardrails/Dockerfile.plugin \
  ./llama-guardrails

2. Start the stack (uses pre-built bifrost image, mounts the plugin as volume)

docker compose up -d --wait

3. (Optional) Run Go unit tests

cd llama-guardrails && GOWORK=off go test ./... -count=1

Architecture Note: Plugin Build

The Go plugin (llama_guardrails.so) and the Bifrost runtime are built separately:

Bifrost is the pre-built image forge.engelmann.me/engel75/bifrost:v1.4.23-ew-4
The plugin is compiled as a standalone ELF .so via Dockerfile.plugin and mounted into the Bifrost container at runtime

ABI alignment between plugin and runtime is guaranteed by compiling the plugin against the exact same Bifrost source tree (go mod edit -replace github.com/maximhq/bifrost/core=/bifrost/core). See llama-guardrails/Dockerfile.plugin for details.

Configuration

Plugin config in llama-guardrails/config-data.json:

{
  "prompt_guard_urls": ["http://prompt-guard:8010"],
  "llama_guard_urls": ["http://llama-guard:8011"],
  "llama_guard_model": "meta-llama/Llama-Guard-4-12B",
  "timeout_ms": 10000,
  "prompt_guard_threshold": 0.9,
  "log_blocked_requests": true,
  "debug": true
}

Data Flow

Request path (PreLLMHook)

Go plugin extracts last user message from schemas.BifrostRequest
Sends to prompt-guard → POST /scan
- If score > threshold → short-circuit with "Request blocked: {label} detected."
Sends user message to llama-guard → POST /classify
- If label=unsafe → short-circuit with "Request blocked: content violates safety policy (categories: S9)."
If both pass, request proceeds to LLM

Response path (PostLLMHook)

Go plugin extracts assistant message from schemas.BifrostResponse
Sends to llama-guard → POST /classify
- If label=unsafe → replaces response with "I'm unable to provide this response as it violates the platform safety policy."

Memory & VRAM

Service	Model	Memory	Notes
prompt-guard	Prompt-Guard-2-86M	~2 GB RAM	CPU, no GPU
llama-guard	Llama-Guard-4-12B	~6–8 GB VRAM	GPU required, 4-bit NF4 quantized

Llama Guard input is truncated at 8192 tokens server-side to prevent KV cache from blowing up on long conversations. Only the last user message is sent for classification (not the full conversation history).

README.md Unescape Escape