The Checkbox That Didn't Click: Fixing RBAC Permissions in a 97-Service Agent OS
The screenshot looked great. Twenty-four roles. Twenty-two resources. Six actions per resource — read, create, update, delete, execute, admin. A clean permission matrix with amber checkboxes on a dark grid. Ship it.
Except none of the checkboxes actually worked.
The Symptom
Users reported that the Roles & Permissions page in AitherVeil's admin panel showed all permissions as unchecked — even for the admin role that clearly had full access. Clicking a checkbox appeared to do nothing. The grid was read-only theater.
Worse: the "New Role" button opened a form with name and description fields, but no way to assign permissions during creation. You'd create a role, then somehow have to toggle permissions one by one, except that didn't work either.
The Root Cause: Three Parts vs. Two
AitherOS's RBAC system stores permissions as structured objects with three fields:
from dataclasses import dataclass

@dataclass
class Permission:
    resource: str       # e.g. "persona"
    action: str         # e.g. "read"
    scope: str = "*"    # e.g. "*" (all scopes)

    def to_string(self) -> str:
        return f"{self.resource}:{self.action}:{self.scope}"
When the GET /identity/roles endpoint serializes roles, it calls to_string() on each permission. The response looks like:
{
  "id": "editor",
  "name": "editor",
  "permissions": ["persona:read:*", "persona:update:*", "canvas:create:*"]
}
Three-part strings: resource:action:scope.
The frontend permission matrix built check strings like this:
const perm = `${resource}:${action}` // "persona:read"
const has = role.permissions.includes(perm)
Two-part strings: resource:action.
"persona:read" !== "persona:read:*". Every. Single. Time. The includes() check failed for every permission on every role. The matrix rendered all checkboxes as empty.
It's the kind of bug that's invisible from either side. The backend team sees correct 3-part strings. The frontend team sees a reasonable 2-part convention. Nobody notices until an admin opens the page and sees a blank grid for a role they know has permissions.
The Second Bug: Removals That Never Remove
Even if the checkbox display was fixed, toggling a permission off would silently fail. The toggle handler sent a DELETE request with a 2-part path:
DELETE /roles/editor/permissions/persona:read
The backend removal logic did a strict string comparison:
def remove_permission_from_role(self, role_id, permission, ...):
    perm_str = permission  # "persona:read"
    role.permissions = [
        p for p in role.permissions
        if p.to_string() != perm_str  # "persona:read:*" != "persona:read"
    ]
The stored permission's to_string() returns "persona:read:*". The input is "persona:read". They never match. The filter keeps everything. The permission is never removed. The endpoint returns 200 OK.
A silent failure at its finest. The API says success, the permission persists, and the user has no idea their change was discarded.
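The no-op is easy to reproduce in isolation. The following is a minimal sketch, not the project's code: `Role` and `remove_permission_buggy` are hypothetical stand-ins reduced to the one comparison that matters.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Permission:
    resource: str
    action: str
    scope: str = "*"

    def to_string(self) -> str:
        return f"{self.resource}:{self.action}:{self.scope}"


@dataclass
class Role:  # hypothetical: only the field needed for the demonstration
    permissions: list[Permission] = field(default_factory=list)


def remove_permission_buggy(role: Role, permission: str) -> None:
    """Pre-fix behavior: compare the raw input against 3-part strings."""
    role.permissions = [p for p in role.permissions if p.to_string() != permission]


editor = Role([Permission("persona", "read"), Permission("persona", "update")])
remove_permission_buggy(editor, "persona:read")  # the 2-part string the UI sent

remaining = [p.to_string() for p in editor.permissions]
print(remaining)  # ['persona:read:*', 'persona:update:*'] (nothing was removed)
```

No exception, no warning, and the list is unchanged, which is exactly what the endpoint then reported as a success.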
The Fix: Normalize Everything
Backend: Canonicalize Before Comparing
One-line fix in the RBAC module. Before comparing, run the input through the same parse-and-serialize round trip, Permission.from_string() followed by to_string():
# Before:
perm_str = permission
# After:
perm_str = Permission.from_string(permission).to_string()
Now "persona:read" gets parsed into Permission(resource="persona", action="read", scope="*") and serialized back to "persona:read:*" before the comparison. The match succeeds. The permission is removed.
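The post does not show `Permission.from_string()` itself, so here is one way it could look. This is a sketch under two assumptions: a missing scope falls back to `"*"`, and anything other than two or three non-empty parts is rejected. The `canonical()` helper is a hypothetical name for the round trip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    resource: str
    action: str
    scope: str = "*"

    @classmethod
    def from_string(cls, value: str) -> "Permission":
        # Assumed rule: accept "resource:action" or "resource:action:scope".
        parts = value.split(":")
        if len(parts) not in (2, 3) or not all(parts):
            raise ValueError(f"malformed permission string: {value!r}")
        return cls(*parts)  # with two parts, scope keeps its default "*"

    def to_string(self) -> str:
        return f"{self.resource}:{self.action}:{self.scope}"


def canonical(value: str) -> str:
    """Round-trip a permission string into its 3-part form."""
    return Permission.from_string(value).to_string()


print(canonical("persona:read"))      # persona:read:*
print(canonical("persona:read:*"))    # persona:read:*
print(canonical("persona:read:own"))  # persona:read:own
```

Because both spellings of the same grant collapse to one string, the equality check in the removal filter no longer depends on which format the caller happened to send.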
Frontend: Match All Formats
The frontend now uses a hasPerm() helper that handles every format the backend might return:
function hasPerm(permissions: string[], resource: string, action: string): boolean {
  if (permissions.some(p => p === '*:*:*' || p === '*:*')) return true
  if (permissions.some(p => p === `${resource}:*:*` || p === `${resource}:*`)) return true
  const two = `${resource}:${action}`
  const three = `${resource}:${action}:*`
  return permissions.some(p => p === two || p === three)
}
And a normPerm() helper that always sends 3-part strings to the API:
function normPerm(resource: string, action: string): string {
  return `${resource}:${action}:*`
}
This is defensive by design. If the backend ever returns 2-part strings (from older data or a different code path), the frontend still works. If it returns 3-part, it still works. Wildcards at the resource or global level are handled.
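One way to keep the two sides from drifting apart again is to mirror the matching rules in a backend-side test. The sketch below is a Python transliteration of hasPerm(), written for illustration and not taken from the project. The name `has_perm` is hypothetical.

```python
def has_perm(permissions: list[str], resource: str, action: str) -> bool:
    """Same rules as hasPerm(): global wildcard, resource wildcard, exact grant."""
    granted = set(permissions)
    candidates = {
        "*:*:*", "*:*",                                     # global wildcard
        f"{resource}:*:*", f"{resource}:*",                 # every action on the resource
        f"{resource}:{action}:*", f"{resource}:{action}",   # the exact grant
    }
    return not granted.isdisjoint(candidates)


print(has_perm(["persona:read:*"], "persona", "read"))    # True
print(has_perm(["persona:read"], "persona", "read"))      # True
print(has_perm(["persona:*"], "persona", "delete"))       # True
print(has_perm(["canvas:create:*"], "persona", "read"))   # False
print(has_perm(["persona:read:own"], "persona", "read"))  # False
```

The last case is worth noticing: under these rules a grant with a non-wildcard scope does not tick the checkbox, because only `*` scopes and scopeless strings are matched.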
Building Proper Custom Roles
With the permission toggle actually working, we rebuilt the role creation experience from scratch.
The New "Create Role" Flow
- Name and description — standard form fields
- Inheritance — check existing roles to inherit their permissions
- Permission matrix — an expandable, interactive grid identical to the one used for editing existing roles
The matrix supports bulk operations:
- Click a row header to toggle all 6 actions for a resource
- Click a column header to toggle all 22 resources for an action
- Click the corner checkbox to select/deselect everything
Permissions are stored in local state as an array of 3-part strings. When you click "Create Role," the entire permission set is sent in a single POST /identity/roles request.
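The row toggle reduces to list arithmetic on those 3-part strings. The sketch below is written in Python to stay consistent with the backend snippets, although the real logic lives in the page component. The toggle semantics are an assumption (a fully granted row is cleared, anything else is filled in), and `toggle_row` is a hypothetical name.

```python
ACTIONS = ["read", "create", "update", "delete", "execute", "admin"]


def toggle_row(permissions: list[str], resource: str) -> list[str]:
    """Toggle all six actions for one resource and return the new list.

    Assumed semantics: if every action is already granted, clear the row;
    otherwise grant whatever is missing.
    """
    row = [f"{resource}:{action}:*" for action in ACTIONS]
    if all(p in permissions for p in row):
        return [p for p in permissions if p not in row]
    return permissions + [p for p in row if p not in permissions]


state = ["persona:read:*", "canvas:create:*"]
state = toggle_row(state, "persona")
print(len(state))  # 7: one canvas grant plus all six persona actions
state = toggle_row(state, "persona")
print(state)       # ['canvas:create:*']
```

A column toggle would follow the same pattern, iterating over resources for a fixed action instead of actions for a fixed resource.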
Editing Existing Roles
Each role now expands to show its full permission matrix. Clicking a checkbox fires an immediate API call (POST to add, DELETE to remove) and updates the UI from the response. For bulk operations (toggle a whole row or column), a single PUT /identity/roles/{id} replaces the entire permission array.
System roles (admin, super_admin, and anything with is_system: true) show as read-only with a lock icon. You can view their permissions but not modify them.
Role Duplication
Want a role that's almost like an existing one? Click the copy icon. It creates a {name}-copy role with identical permissions and no inheritance, so you can customize from there.
What We Actually Shipped
| Change | File | Impact |
|---|---|---|
| Normalize permission strings before comparison | RBAC module | Fixes silent removal failures |
| Rewrite permission matrix with 3-part format handling | roles/page.tsx | Checkboxes actually show and toggle state |
| Full permission matrix in role creation form | roles/page.tsx | Custom roles with permissions at creation time |
| Bulk toggle (row, column, all) | roles/page.tsx | Fast permission configuration |
| Inline role editing | roles/page.tsx | Edit name/description without navigation |
| Role duplication | roles/page.tsx | Quick custom role creation from templates |
| System role protection | roles/page.tsx | Prevents accidental modification of critical roles |
| Error toast on every API call | roles/page.tsx | No more silent failures |
Lessons
Serialization format mismatches are the worst bugs. Both sides think they're right because they are — within their own context. The backend's 3-part format is correct per the data model. The frontend's 2-part assumption is a reasonable convention. Neither side fails loudly. You need integration testing that asserts on actual rendered state, not just API responses.
Silent successes are worse than loud failures. The remove endpoint returned 200 even when nothing was removed. The filter expression [p for p in perms if p.to_string() != perm_str] doesn't fail when nothing matches; it just returns a new list with the same contents. Adding a check like if len(before) == len(after): raise ValueError would have caught this immediately.
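A guarded removal might look like the following. It is a sketch on plain strings, not the project's actual method. `remove_permission` is a hypothetical name, and how the error surfaces to the client (for example as a 404) is left to the API layer.

```python
def remove_permission(permissions: list[str], perm_str: str) -> list[str]:
    """Return permissions without perm_str, failing loudly on a no-op."""
    remaining = [p for p in permissions if p != perm_str]
    if len(remaining) == len(permissions):
        # Nothing matched: surface it instead of reporting success.
        raise ValueError(f"permission not found on role: {perm_str!r}")
    return remaining


perms = ["persona:read:*", "persona:update:*"]
print(remove_permission(perms, "persona:read:*"))  # ['persona:update:*']

try:
    remove_permission(perms, "persona:read")  # the old 2-part input
except ValueError as exc:
    print(exc)  # permission not found on role: 'persona:read'
```

With this guard in place, the original format mismatch would have shown up as an error on the first toggle rather than as a checkbox that quietly did nothing.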
Admin panels are infrastructure. Every RBAC page that doesn't work is a security gap — it means permissions are managed by direct API calls, database edits, or not at all. The faster admins can create and configure roles visually, the tighter your actual security posture becomes.
The checkboxes click now.