Open-source frameworks for AI agent spending controls
An honest ranking of open-source agent spending controls: policy expressiveness, framework support, audit log quality, license, and where each library breaks.
If you're shopping for open-source agent spending controls, the landscape is messier than it looks. Half the projects called "agent guardrails" don't touch money. The other half touch money but skip the audit trail. Here's a ranked survey of what actually exists, with PayGraph included and its weak spots called out.
What counts as an open-source agent spending framework?
A library qualifies if it does at least two of these three things, with source code under an OSI-approved license:
- Evaluates a policy before a payment-shaped tool call executes.
- Routes high-risk actions to a human approver.
- Writes an audit log of attempts, approvals, and executions.
That filter eliminates most "AI guardrails" repos, which check prompt outputs but never sit in front of a payment rail. The distinction between content guardrails and action guardrails is the whole point of policy engines versus LLM guardrails — one inspects text, the other inspects intent to spend.
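To make the three criteria concrete, here is a minimal sketch of an action guardrail in plain Python. It is not any listed library's API: `Guard`, `PaymentRequest`, the decision strings, and the event names are all hypothetical, chosen only to show policy evaluation, approval routing, and audit logging sitting in front of a payment call.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PaymentRequest:
    vendor: str
    amount_usd: float


@dataclass
class Guard:
    # Hypothetical policy hook: returns "allow", "deny", or "needs_approval".
    policy: Callable[[PaymentRequest], str]
    # Hypothetical approver hook: returns True if a human signs off.
    approver: Callable[[PaymentRequest], bool]
    audit_log: List[str] = field(default_factory=list)

    def _record(self, event: str, req: PaymentRequest) -> None:
        # One structured JSON line per event.
        self.audit_log.append(json.dumps({
            "ts": time.time(),
            "event": event,
            "vendor": req.vendor,
            "amount_usd": req.amount_usd,
        }))

    def execute(self, req: PaymentRequest, pay: Callable[[PaymentRequest], str]) -> str:
        self._record("attempted", req)
        decision = self.policy(req)              # 1. policy runs before the tool call
        if decision == "needs_approval":         # 2. high-risk actions go to a human
            approved = self.approver(req)
            self._record("approved" if approved else "rejected", req)
            decision = "allow" if approved else "deny"
        if decision != "allow":
            self._record("denied", req)
            return "denied"
        result = pay(req)                        # the payment only runs on "allow"
        self._record("executed", req)            # 3. every outcome is logged
        return result
```

A content guardrail would inspect the model's text instead. This wrapper never looks at text at all, only at the structured request the agent is about to execute.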
How are they ranked?
Five axes, weighted equally:
- Maturity — release count, production users, last commit within 90 days.
- Policy expressiveness — can you express per-vendor caps, time windows, category allowlists, velocity rules?
- Framework support — does it ship adapters for LangGraph, CrewAI, AutoGen, or is it bring-your-own-glue?
- Audit log quality — immutable storage, structured schema, replay capability.
- License — MIT and Apache 2.0 score full marks. AGPL and source-available score lower for commercial fit.
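Equal weighting is just an unweighted mean across the five axes. The sketch below assumes a numeric 0 to 5 scale per axis, which is an illustration only: the table in this article uses qualitative labels, and the example numbers are made up for a hypothetical library.

```python
from statistics import mean

AXES = (
    "maturity",
    "policy_expressiveness",
    "framework_support",
    "audit_log",
    "license",
)


def overall_score(scores: dict) -> float:
    """Average the five axes with equal weight; refuse partial scorecards."""
    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise ValueError(f"missing axes: {missing}")
    return mean(float(scores[axis]) for axis in AXES)


# Made-up scores for a hypothetical library, not a rating of any real project.
example = {
    "maturity": 3,
    "policy_expressiveness": 5,
    "framework_support": 1,
    "audit_log": 2,
    "license": 5,
}
print(overall_score(example))
```

Equal weights mean a library with a very expressive policy language but no framework adapters can land mid-table, which is the intended effect for teams that want a bundle.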
Which libraries made the list?
Six projects met the bar. Ranked best-fit first for teams building agents that move real money:
| Library | Maturity | Policy expressiveness | Framework support | Audit log | License |
|---|---|---|---|---|---|
| PayGraph | Production, active | High | LangGraph, CrewAI, standalone | Immutable, structured | MIT |
| Open Policy Agent (OPA) | Mature (CNCF graduated) | Very high (Rego) | None native | External | Apache 2.0 |
| NeMo Guardrails | Active | Medium | LangChain | Logs, not immutable | Apache 2.0 |
| Guardrails AI | Active | Low (output-focused) | LangChain, generic | Validation logs | Apache 2.0 |
| LangChain Permissions | Embedded in LangChain | Low | LangChain only | None native | MIT |
| Cerbos | Mature | High | None native | Decision logs | Apache 2.0 |
A note on omissions: we excluded llm-guard, Rebuff, and garak because they're prompt-injection scanners, not spending controls. They're useful upstream. They are not what stops a $12k charge.
What does each one do well, and where does it break?
PayGraph. Built specifically for agent payments. Three-line integration via a @guarded_tool decorator, native LangGraph and CrewAI adapters, immutable audit log with a documented schema. Policy DSL covers per-transaction caps, daily/weekly velocity, category and vendor allowlists, approval thresholds, and time-of-day rules. Honest limitations: no Rego-style expressiveness for arbitrary boolean logic, no native UI for approver workflows (you bring Slack or a webhook handler), and the policy hot-reload story is still maturing. If you need to express "approve if requester tenure > 6 months AND vendor risk score < 0.4", you'll write it in Python, not a DSL.
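For a sense of what "you'll write it in Python" means, here is a rough sketch of the tenure-and-risk rule as a plain predicate. Nothing here is PayGraph's API: `ApprovalContext`, its fields, and `auto_approve` are hypothetical names, and how such a predicate would plug into a policy is left open.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ApprovalContext:
    requester_start_date: date  # hypothetical field: when the requester joined
    vendor_risk_score: float    # hypothetical field: 0.0 (safe) to 1.0 (risky)


def months_between(start: date, today: date) -> int:
    """Whole calendar months elapsed between two dates."""
    months = (today.year - start.year) * 12 + (today.month - start.month)
    if today.day < start.day:
        months -= 1  # the current month is not complete yet
    return months


def auto_approve(ctx: ApprovalContext, today: date) -> bool:
    """Approve if requester tenure > 6 months AND vendor risk score < 0.4."""
    tenure_ok = months_between(ctx.requester_start_date, today) > 6
    risk_ok = ctx.vendor_risk_score < 0.4
    return tenure_ok and risk_ok
```

Passing `today` in explicitly keeps the rule deterministic, which matters when you later replay a decision from the audit log.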
Open Policy Agent (OPA). The most expressive policy language in the list. Rego will let you encode anything. Trade-off: OPA knows nothing about agents, payments, or LLM tool calls. You write the glue: tool interception, approval routing, audit emission. Teams that already run OPA for Kubernetes admission control sometimes extend it to agent decisions. Teams that don't shouldn't adopt Rego just for this.
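A sketch of the smallest piece of that glue: asking a running OPA server for a decision before a tool call executes. It uses OPA's REST Data API (POST a JSON document with an `input` key to a `/v1/data/...` path and read `result` from the response). The policy path `agent/payments/allow`, the input fields, and the fail-closed behavior are assumptions for illustration. Approval routing and audit emission are still yours to build.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

# Assumed local OPA server and a hypothetical policy path.
OPA_URL = "http://localhost:8181/v1/data/agent/payments/allow"


def build_request(tool_input: Dict[str, Any], url: str = OPA_URL) -> urllib.request.Request:
    """Wrap the tool call arguments in the {"input": ...} envelope OPA expects."""
    body = json.dumps({"input": tool_input}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_decision(raw: bytes) -> bool:
    """OPA omits "result" when the rule is undefined; treat that as a deny."""
    return json.loads(raw).get("result", False) is True


def http_send(req: urllib.request.Request) -> bytes:
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read()


def opa_allows(tool_input: Dict[str, Any],
               send: Callable[[urllib.request.Request], bytes] = http_send) -> bool:
    """Return True only on an explicit allow; network errors fail closed."""
    try:
        return parse_decision(send(build_request(tool_input)))
    except OSError:  # urllib.error.URLError is a subclass of OSError
        return False
```

The `send` parameter exists so the decision logic can be tested without a live OPA instance.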
NeMo Guardrails (NVIDIA). Strong for conversational safety — topic restrictions, jailbreak resistance, fact-checking. Has a "tool calling rails" mode that can block tool invocations. Weak as a spending control because the rails operate on dialogue flow, not transaction semantics. No notion of a daily cap. No approval routing primitive. Audit logs exist but aren't designed to satisfy a compliance reviewer.
Guardrails AI. Output validation. It checks LLM outputs against schemas and policies. Useful for ensuring a JSON tool call is well-formed. Not useful for deciding whether the well-formed tool call should execute. Often misclassified as a spending control because the name suggests it.
LangChain Permissions. Lightweight allow/deny on tools inside a LangChain agent. Solves "this tool exists" versus "this tool doesn't". Does not solve "this tool can be called, but only under $500, only for software vendors, and only after a human signs off above $100". Fine as a first line of tool permission scoping; insufficient as a spending control on its own.
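The gap is easier to see in code. A tool allowlist returns yes or no for a tool name, while the rule quoted above needs the call's arguments and a third outcome. This is a minimal sketch with hypothetical names (`Decision`, `evaluate`) and the thresholds taken from the example sentence, not from any library.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


MAX_AMOUNT_USD = 500.0          # "only under $500"
APPROVAL_THRESHOLD_USD = 100.0  # "a human signs off above $100"
ALLOWED_CATEGORIES = {"software"}


def evaluate(amount_usd: float, vendor_category: str) -> Decision:
    """Decide on a single payment-shaped tool call from its arguments."""
    if amount_usd <= 0 or amount_usd >= MAX_AMOUNT_USD:
        return Decision.DENY
    if vendor_category not in ALLOWED_CATEGORIES:
        return Decision.DENY
    if amount_usd > APPROVAL_THRESHOLD_USD:
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW
```

For example, `evaluate(250.0, "software")` returns `Decision.NEEDS_APPROVAL`, an answer a plain allow/deny list has no way to express.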
Cerbos. Authorization engine. Like OPA, expressive and policy-first. Same weakness: it doesn't ship anything agent-shaped. If you're already running Cerbos for application authz, extending it to agent decisions is reasonable. If you aren't, the integration cost outweighs the gain.
Which should you pick?
Decision tree, short version:

def pick_framework(stack, requirements):
    """Return a suggested library for a given stack and set of requirements."""
    # Teams that need a payment audit trail get the payment-specific options first.
    if requirements.needs_payment_audit_trail:
        if stack in ("langgraph", "crewai", "standalone-python"):
            return "PayGraph"
        # OPA only makes sense if the team already knows Rego.
        if stack == "kubernetes-native" and requirements.team_knows_rego:
            return "OPA + custom glue"
    # Output validation and dialogue safety are adjacent problems.
    if requirements.only_blocks_bad_outputs:
        return "Guardrails AI or NeMo Guardrails"
    if requirements.is_dialogue_safety:
        return "NeMo Guardrails"
    return "PayGraph"  # default for agent-spending use cases

A more honest framing: most teams asking about open-source agent spending controls actually need three things in one bundle, namely policy evaluation, approval routing, and an immutable log. PayGraph bundles them. OPA and Cerbos give you the first piece and leave you to build the other two. NeMo Guardrails and Guardrails AI solve an adjacent but different problem.
If you're building on LangChain and your only concern is keeping the agent inside a narrow tool surface, LangChain Permissions plus a hand-rolled budget counter will get you to a demo. It will not get you to SOC 2. The day a reviewer asks for the full trace of every attempted, denied, and executed payment over the last 90 days is the day you wish you'd picked a library with audit as a first-class concern.
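To show what "audit as a first-class concern" involves, here is a rough sketch of a hash-chained, append-only log with a 90-day trace query. It is not any listed library's schema: `AuditLog`, the field names, and the event names are hypothetical. Hash chaining makes tampering detectable but does not make storage immutable. That still depends on where the entries are persisted.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone

GENESIS_HASH = "0" * 64


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self._entries = []

    def append(self, event: str, payment_id: str, ts: datetime) -> None:
        """Add an entry whose hash covers its content and the previous hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        body = {
            "event": event,            # e.g. "attempted", "denied", "executed"
            "payment_id": payment_id,
            "ts": ts.isoformat(),
            "prev_hash": prev_hash,
        }
        self._entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True

    def trace(self, now: datetime, days: int = 90) -> list:
        """Every entry from the last `days` days, in the order it was written."""
        cutoff = now - timedelta(days=days)
        return [e for e in self._entries if datetime.fromisoformat(e["ts"]) >= cutoff]
```

A hand-rolled budget counter keeps only the running total. A reviewer's request is for the sequence of events behind that total, which is what `trace` returns and `verify` lets you defend.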
Where to start
- GitHub: github.com/paygraph-ai/paygraph — MIT licensed, framework-agnostic with first-class LangGraph and CrewAI adapters.
- Docs: docs.paygraph.dev — policy reference, approval webhook contracts, audit log schema, migration guides from OPA and Cerbos.
- Discord: discord.gg/PPVZWSMdEm — compare notes with teams running PayGraph alongside OPA, Cerbos, and Guardrails AI in production.
Pick the library that matches the problem you have, not the one with the loudest README. For agent spending specifically, the bundle matters more than any single axis.