Tags: engineering, security, rbac, architecture

The Checkbox That Didn't Click: Fixing RBAC Permissions in a 97-Service Agent OS

March 9, 2026 · 8 min read · David Parkhurst

The screenshot looked great. Twenty-four roles. Twenty-two resources. Six actions per resource — read, create, update, delete, execute, admin. A clean permission matrix with amber checkboxes on a dark grid. Ship it.

Except none of the checkboxes actually worked.

The Symptom

Users reported that the Roles & Permissions page in AitherVeil's admin panel showed all permissions as unchecked — even for the admin role that clearly had full access. Clicking a checkbox appeared to do nothing. The grid was read-only theater.

Worse: the "New Role" button opened a form with a name and description field, but no way to actually assign permissions during creation. You'd create a role, then have to somehow toggle permissions one by one — except that didn't work either.

The Root Cause: Three Parts vs. Two

AitherOS's RBAC system stores permissions as structured objects with three fields:

from dataclasses import dataclass

@dataclass
class Permission:
    resource: str   # e.g. "persona"
    action: str     # e.g. "read"
    scope: str = "*"  # e.g. "*" (all scopes)

    def to_string(self) -> str:
        return f"{self.resource}:{self.action}:{self.scope}"

When the GET /identity/roles endpoint serializes roles, it calls to_string() on each permission. The response looks like:

{
  "id": "editor",
  "name": "editor",
  "permissions": ["persona:read:*", "persona:update:*", "canvas:create:*"]
}

Three-part strings: resource:action:scope.

The frontend permission matrix built check strings like this:

const perm = `${resource}:${action}`  // "persona:read"
const has = role.permissions.includes(perm)

Two-part strings: resource:action.

"persona:read" !== "persona:read:*". Every. Single. Time. The includes() check failed for every permission on every role. The matrix rendered all checkboxes as empty.

It's the kind of bug that's invisible from either side. The backend team sees correct 3-part strings. The frontend team sees a reasonable 2-part convention. Nobody notices until an admin opens the page and sees a blank grid for a role they know has permissions.

The Second Bug: Removals That Never Remove

Even if the checkbox display were fixed, toggling a permission off would still fail silently. The toggle handler sent a DELETE request with a 2-part permission in the path:

DELETE /roles/editor/permissions/persona:read

The backend removal logic did a strict string comparison:

def remove_permission_from_role(self, role_id, permission, ...):
    perm_str = permission  # "persona:read"
    role.permissions = [
        p for p in role.permissions
        if p.to_string() != perm_str  # "persona:read:*" != "persona:read"
    ]

The stored permission's to_string() returns "persona:read:*". The input is "persona:read". They never match. The filter keeps everything. The permission is never removed. The endpoint returns 200 OK.
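The failure is easy to reproduce in isolation. The sketch below reuses the Permission dataclass shown earlier; the Role container and the function name remove_permission_buggy are stand-ins for illustration, not the actual AitherOS code.

```python
from dataclasses import dataclass, field


@dataclass
class Permission:
    resource: str
    action: str
    scope: str = "*"

    def to_string(self) -> str:
        return f"{self.resource}:{self.action}:{self.scope}"


@dataclass
class Role:  # hypothetical stand-in for the real role model
    id: str
    permissions: list[Permission] = field(default_factory=list)


def remove_permission_buggy(role: Role, permission: str) -> None:
    # Same strict comparison as the original handler: a 2-part input never
    # equals the 3-part serialized form, so the filter keeps every entry.
    role.permissions = [p for p in role.permissions if p.to_string() != permission]


editor = Role("editor", [Permission("persona", "read"), Permission("persona", "update")])
remove_permission_buggy(editor, "persona:read")
remaining = [p.to_string() for p in editor.permissions]
print(remaining)  # ['persona:read:*', 'persona:update:*']
```

Both permissions survive the call, and nothing in the function signals that the removal was a no-op.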

A silent failure at its finest: the API says success, the permission persists, and the user has no idea their change was discarded.

The Fix: Normalize Everything

Backend: Canonicalize Before Comparing

It's a one-line fix in the RBAC module. Before comparing, round-trip the input through Permission.from_string() and to_string() so it takes the same canonical form as the stored permissions:

# Before:
perm_str = permission
# After:
perm_str = Permission.from_string(permission).to_string()

Now "persona:read" gets parsed into Permission(resource="persona", action="read", scope="*") and serialized back to "persona:read:*" before the comparison. The match succeeds. The permission is removed.
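The post does not show from_string() itself, so here is a minimal sketch of what such a parser could look like. The parsing rules (accept two or three colon-separated parts, default a missing scope to "*", reject anything else) are assumptions consistent with the behavior described above, and canonicalize() is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class Permission:
    resource: str
    action: str
    scope: str = "*"

    @classmethod
    def from_string(cls, value: str) -> "Permission":
        # Assumed behavior: "resource:action" or "resource:action:scope";
        # a missing scope falls back to the dataclass default "*".
        parts = value.split(":")
        if len(parts) == 2:
            return cls(parts[0], parts[1])
        if len(parts) == 3:
            return cls(parts[0], parts[1], parts[2])
        raise ValueError(f"expected 2 or 3 colon-separated parts, got {value!r}")

    def to_string(self) -> str:
        return f"{self.resource}:{self.action}:{self.scope}"


def canonicalize(value: str) -> str:
    """Round-trip a permission string into its 3-part form (hypothetical helper)."""
    return Permission.from_string(value).to_string()


print(canonicalize("persona:read"))    # persona:read:*
print(canonicalize("persona:read:*"))  # persona:read:*
```

Because the round trip is idempotent, it is safe to apply to inputs that are already in 3-part form.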

Frontend: Match All Formats

The frontend now uses a hasPerm() helper that accepts both the 2-part and 3-part formats, plus global and resource-level wildcards:

function hasPerm(permissions: string[], resource: string, action: string): boolean {
    if (permissions.some(p => p === '*:*:*' || p === '*:*')) return true
    if (permissions.some(p => p === `${resource}:*:*` || p === `${resource}:*`)) return true
    const two = `${resource}:${action}`
    const three = `${resource}:${action}:*`
    return permissions.some(p => p === two || p === three)
}

And a normPerm() helper that always sends 3-part strings to the API:

function normPerm(resource: string, action: string): string {
    return `${resource}:${action}:*`
}

This is defensive by design. If the backend ever returns 2-part strings (from older data or a different code path), the frontend still works. If it returns 3-part, it still works. Wildcards at the resource or global level are handled.

Building Proper Custom Roles

With the permission toggle actually working, we rebuilt the role creation experience from scratch.

The New "Create Role" Flow

  1. Name and description — standard form fields
  2. Inheritance — check existing roles to inherit their permissions
  3. Permission matrix — an expandable, interactive grid identical to the one used for editing existing roles

The matrix supports bulk operations:

  • Click a row header to toggle all 6 actions for a resource
  • Click a column header to toggle all 22 resources for an action
  • Click the corner checkbox to select/deselect everything

Permissions are stored in local state as an array of 3-part strings. When you click "Create Role," the entire permission set is sent in a single POST /identity/roles request.
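The bulk operations reduce to set arithmetic over 3-part strings. The page itself is TypeScript; the sketch below expresses the same idea in Python. The toggle rule (clear a row or column only when it is already fully selected, otherwise fill it) and the payload field names other than "name" and "permissions" are assumptions, not confirmed details of the real page.

```python
ACTIONS = ["read", "create", "update", "delete", "execute", "admin"]


def norm_perm(resource: str, action: str) -> str:
    """Always produce the 3-part form, mirroring normPerm() on the frontend."""
    return f"{resource}:{action}:*"


def toggle_row(selected: set[str], resource: str) -> set[str]:
    """Row header: clear the row if fully selected, otherwise fill it."""
    row = {norm_perm(resource, action) for action in ACTIONS}
    return selected - row if row <= selected else selected | row


def toggle_column(selected: set[str], resources: list[str], action: str) -> set[str]:
    """Column header: same rule, applied to one action across all resources."""
    column = {norm_perm(resource, action) for resource in resources}
    return selected - column if column <= selected else selected | column


def build_create_payload(name: str, description: str, selected: set[str]) -> dict:
    # "description" as a payload key is assumed; sorting keeps requests stable.
    return {"name": name, "description": description, "permissions": sorted(selected)}


selected = toggle_row(set(), "persona")                       # fills 6 cells
selected = toggle_column(selected, ["persona", "canvas"], "read")  # adds canvas:read:*
payload = build_create_payload("reviewer", "Read-mostly role", selected)
```

Keeping the selection as a set of canonical strings means the create request can send the state as-is, with no per-cell translation step.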

Editing Existing Roles

Each role now expands to show its full permission matrix. Clicking a checkbox fires an immediate API call (POST to add, DELETE to remove) and the UI updates from the response. For bulk operations (toggling a whole row or column), a single PUT /identity/roles/{id} replaces the entire permission array.

System roles (admin, super_admin, and anything with is_system: true) show as read-only with a lock icon. You can view their permissions but not modify them.

Role Duplication

Want a role that's almost like an existing one? Click the copy icon. It creates a {name}-copy role with identical permissions and no inheritance, so you can customize from there.
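As a rough sketch, duplication amounts to building a new role payload from the source role. The "inherits" key is a hypothetical name for the inheritance field, which the post does not spell out.

```python
def duplicate_role(role: dict) -> dict:
    """Build a create-role payload that copies permissions but not inheritance."""
    return {
        "name": f"{role['name']}-copy",
        # Copy the list so edits to the duplicate never touch the source role.
        "permissions": list(role.get("permissions", [])),
        "inherits": [],  # hypothetical field name; the copy starts with no parents
    }


source = {"name": "editor", "permissions": ["persona:read:*"], "inherits": ["viewer"]}
copy = duplicate_role(source)
print(copy["name"])  # editor-copy
```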

What We Actually Shipped

| Change | File | Impact |
|---|---|---|
| Normalize permission strings before comparison | RBAC module | Fixes silent removal failures |
| Rewrite permission matrix with 3-part format handling | roles/page.tsx | Checkboxes actually show and toggle state |
| Full permission matrix in role creation form | roles/page.tsx | Custom roles with permissions at creation time |
| Bulk toggle (row, column, all) | roles/page.tsx | Fast permission configuration |
| Inline role editing | roles/page.tsx | Edit name/description without navigation |
| Role duplication | roles/page.tsx | Quick custom role creation from templates |
| System role protection | roles/page.tsx | Prevents accidental modification of critical roles |
| Error toast on every API call | roles/page.tsx | No more silent failures |

Lessons

Serialization format mismatches are the worst bugs. Both sides think they're right because they are — within their own context. The backend's 3-part format is correct per the data model. The frontend's 2-part assumption is a reasonable convention. Neither side fails loudly. You need integration testing that asserts on actual rendered state, not just API responses.
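One way to picture such a test: feed a role exactly as the API serializes it into the matrix logic, then assert on which cells come out checked. The sketch below is a Python stand-in for the TypeScript page; render_matrix and checked_cells are hypothetical helpers, and has_perm_fixed is a port of the hasPerm() rules shown earlier.

```python
RESOURCES = ["persona", "canvas"]
ACTIONS = ["read", "create", "update", "delete", "execute", "admin"]

# Fixture copied from the serialized role shown in the API response.
EDITOR = {"id": "editor", "permissions": ["persona:read:*", "persona:update:*", "canvas:create:*"]}


def has_perm_two_part(perms, resource, action):
    # The original frontend check: 2-part key against 3-part data.
    return f"{resource}:{action}" in perms


def has_perm_fixed(perms, resource, action):
    # Port of hasPerm(): global wildcard, resource wildcard, 2- or 3-part match.
    accepted = {
        "*:*:*", "*:*",
        f"{resource}:*:*", f"{resource}:*",
        f"{resource}:{action}", f"{resource}:{action}:*",
    }
    return any(p in accepted for p in perms)


def render_matrix(role, check):
    """Compute the checked state of every cell, as the grid would."""
    return {(r, a): check(role["permissions"], r, a) for r in RESOURCES for a in ACTIONS}


def checked_cells(matrix):
    return {cell for cell, on in matrix.items() if on}


expected = {("persona", "read"), ("persona", "update"), ("canvas", "create")}
buggy = checked_cells(render_matrix(EDITOR, has_perm_two_part))
fixed = checked_cells(render_matrix(EDITOR, has_perm_fixed))
print(buggy == expected, fixed == expected)  # False True
```

An assertion on the rendered cells fails against the 2-part check even though the API fixture is perfectly valid, which is exactly the gap that API-only tests leave open.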

Silent successes are worse than loud failures. The remove endpoint returned 200 even when nothing was removed. The filter expression [p for p in perms if p.to_string() != perm_str] doesn't fail when nothing matches — it just returns the original list. Adding a check like if len(before) == len(after): raise ValueError would have caught this immediately.
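A minimal sketch of that guard, working on serialized strings for brevity. The function name and the inline normalization are illustrative assumptions; an HTTP layer could translate the exception into an error response instead of returning 200.

```python
def remove_permission(permissions: list[str], permission: str) -> list[str]:
    """Return the list without `permission`, raising if nothing was removed."""
    parts = permission.split(":")
    if len(parts) == 2:
        parts.append("*")  # normalize "resource:action" to the 3-part form
    target = ":".join(parts)

    after = [p for p in permissions if p != target]
    if len(after) == len(permissions):
        # Nothing matched: fail loudly instead of reporting success.
        raise ValueError(f"permission {target!r} not found on role")
    return after


perms = ["persona:read:*", "persona:update:*"]
print(remove_permission(perms, "persona:read"))  # ['persona:update:*']
```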

Admin panels are infrastructure. Every RBAC page that doesn't work is a security gap — it means permissions are managed by direct API calls, database edits, or not at all. The faster admins can create and configure roles visually, the tighter your actual security posture becomes.

The checkboxes click now.