This is a guest post by Chris Bell, CTO of Knock
There’s a lot of talk right now about building AI agents, but not a lot out there about what it takes to make those agents truly useful.
An agent is an autonomous system designed to make decisions and perform actions to achieve a specific goal, or set of goals, without human input.
Yet no matter how good your agent is at making decisions, you will sometimes need a person to provide guidance or input on the agent’s path towards its goal. After all, an agent that cannot interact with or respond to the outside world, and the systems that govern it, will be limited in the problems it can solve.
That’s where the “human-in-the-loop” interaction pattern comes in: you bring a human into the agent’s loop and require input from that human before the agent can continue on its task.
In this blog post, we’ll use Knock and the Cloudflare Agents SDK to build an AI Agent for a virtual card issuing workflow that requires human approval when a new card is requested.
You can find the complete code for this example in the repository.
Knock is messaging infrastructure you can use to send multi-channel messages across in-app, email, SMS, push, and Slack, without writing any integration code.
With Knock, you gain complete visibility into the messages being sent to your users while also handling reliable delivery, user notification preferences, and more.
You can use Knock to power human-in-the-loop flows for your agents using Knock’s Agent Toolkit, which is a set of tools that expose Knock’s APIs and messaging capabilities to your AI agents.
Using the Agents SDK as the foundation of our AI Agent
The Agents SDK provides an abstraction for building stateful, real-time agents on top of Durable Objects that are globally addressable and persist state using an embedded, zero-latency SQLite database.
Building an AI agent without the Agents SDK and the Cloudflare platform means thinking about WebSocket servers, state persistence, and how to scale our service horizontally. Because a Durable Object backs the Agents SDK, we get these benefits for free, along with a globally addressable piece of compute with built-in storage that’s completely serverless and scales to zero.
In the example, we’ll use these features to build an agent that users interact with in real-time via chat, and that can be paused and resumed as needed. The Agents SDK is the ideal platform for powering asynchronous agentic workflows, such as those required in human-in-the-loop interactions.
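If you’re following along from scratch, the agent class needs a Durable Object binding in your Wrangler configuration. A minimal sketch, assuming the project layout used in this post (the names are illustrative; check the example repository for the exact config):

```toml
# wrangler.toml — minimal sketch; names are illustrative
name = "ai-agent"
main = "src/index.ts"
compatibility_date = "2025-01-01"

[[durable_objects.bindings]]
name = "AIAgent"
class_name = "AIAgent"

[[migrations]]
tag = "v1"
new_sqlite_classes = ["AIAgent"]
```

The `new_sqlite_classes` migration opts the Durable Object into the embedded SQLite storage the Agents SDK uses for state.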
Setting up our Knock messaging workflow
Within Knock, we design our approval workflow using the visual workflow builder to create the cross-channel messaging logic. We then create the notification templates for each channel we want to send messages to.
Knock will automatically apply the user’s preferences as part of the workflow execution, ensuring that your users’ notification settings are respected.
You can find an example workflow that we’ve already created for this demo in the repository, and you can import this workflow template into your account via the Knock CLI.
We’ve built the AI Agent as a chat interface on top of the `AIChatAgent` abstraction from Cloudflare’s Agents SDK (docs). The Agents SDK takes care of the bulk of the complexity, leaving us to implement our LLM-calling code and system prompt.
// src/index.ts
import { AIChatAgent } from "agents/ai-chat-agent";
import { openai } from "@ai-sdk/openai";
import { createDataStreamResponse, streamText } from "ai";

export class AIAgent extends AIChatAgent {
  async onChatMessage(onFinish) {
    return createDataStreamResponse({
      execute: async (dataStream) => {
        try {
          const stream = streamText({
            model: openai("gpt-4o-mini"),
            system: `You are a helpful assistant for a financial services company. You help customers with credit card issuing.`,
            messages: this.messages,
            onFinish,
            maxSteps: 5,
          });

          stream.mergeIntoDataStream(dataStream);
        } catch (error) {
          console.error(error);
        }
      },
    });
  }
}
On the client side, we’re using the `useAgentChat` hook from the `agents/ai-react` package to power the real-time user-to-agent chat. We’ve modeled our agent as a chat per user, which we set up using the `useAgent` hook by specifying the name of the process as the `userId`.
// src/index.ts
import { useAgent } from "agents/react";
import { useAgentChat } from "agents/ai-react";

function Chat({ userId }: { userId: string }) {
  const agent = useAgent({ agent: "AIAgent", name: userId });

  const { messages, input, handleInputChange, handleSubmit, isLoading } =
    useAgentChat({ agent });

  // ...
}
This means we have an agent process, and therefore a Durable Object, per user. For our human-in-the-loop use case, this becomes important later on when we talk about resuming our deferred tool call.
We give the agent our card-issuing capability by exposing an `issueCard` tool. However, instead of writing the approval flow and cross-channel logic ourselves, we delegate it entirely to Knock by wrapping the issue card tool in our `requireHumanInput` method.
Now when the user asks to request a new card, we make a call out to Knock to initiate our card request, which will notify the appropriate admins in the organization to request an approval.
To set this up, we need to use Knock’s Agent Toolkit, which exposes methods to work with Knock in our AI agent and power cross-channel messaging.
import { createKnockToolkit } from "@knocklabs/agent-toolkit/ai-sdk";
import { tool } from "ai";
import { z } from "zod";

import { AIAgent } from "./index";
import { issueCard } from "./api";
import { BASE_URL } from "./constants";

async function initializeToolkit(agent: AIAgent) {
  const toolkit = await createKnockToolkit({
    serviceToken: agent.env.KNOCK_SERVICE_TOKEN,
  });

  const issueCardTool = tool({
    description: "Issue a new credit card to a customer.",
    parameters: z.object({
      customerId: z.string(),
    }),
    execute: async ({ customerId }) => {
      return await issueCard(customerId);
    },
  });

  // Rename the wrapped tool so it doesn't shadow the `issueCard` API import above
  const { issueCard: wrappedIssueCard } = toolkit.requireHumanInput(
    { issueCard: issueCardTool },
    {
      workflow: "approve-issued-card",
      actor: agent.name,
      recipients: ["admin_user_1"],
      metadata: {
        approve_url: `${BASE_URL}/card-issued/approve`,
        reject_url: `${BASE_URL}/card-issued/reject`,
      },
    }
  );

  return { toolkit, tools: { issueCard: wrappedIssueCard } };
}
There’s a lot going on here, so let’s walk through the key parts:
- We wrap our `issueCard` tool in the `requireHumanInput` method, exposed from the Knock Agent Toolkit
- We set the messaging workflow to be invoked to our `approve-issued-card` workflow
- We pass the `agent.name` as the `actor` of the request, which translates to the user ID
- We set the recipient of this workflow to be the user `admin_user_1`
- We pass the approve and reject URLs so that they can be used in our message templates
- The wrapped tool is then returned as `issueCard`
Under the hood, these options are passed to Knock’s workflow trigger API to invoke a workflow per recipient. The set of recipients listed here could be dynamic, or could target a group of users through Knock’s subscriptions API.
We can then pass the wrapped issue card tool to our LLM call in the `onChatMessage` method on the agent so that the tool can be called as part of the interaction with the agent.
export class AIAgent extends AIChatAgent {
  // ... other methods

  async onChatMessage(onFinish) {
    const { tools } = await initializeToolkit(this);

    return createDataStreamResponse({
      execute: async (dataStream) => {
        const stream = streamText({
          model: openai("gpt-4o-mini"),
          system:
            "You are a helpful assistant for a financial services company. You help customers with credit card issuing.",
          messages: this.messages,
          onFinish,
          tools,
          maxSteps: 5,
        });

        stream.mergeIntoDataStream(dataStream);
      },
    });
  }
}
Now when the agent calls the `issueCard` tool, we invoke Knock to send our approval notifications, deferring the tool call to issue the card until we receive an approval. Knock’s workflows take care of sending the message to the set of recipients specified, generating and delivering messages according to each user’s preferences.
Using Knock workflows for our approval message makes it easy to build cross-channel messaging to reach the user according to their communication preferences. We can also leverage delays, throttles, batching, and conditions to orchestrate more complex messaging.
Once the message has been sent to our approvers, the next step is to handle the approval coming back, bringing the human into the agent’s loop.
The approval request is asynchronous, meaning that the response can come at any point in the future. Fortunately, Knock takes care of the heavy lifting here for you, routing the event to the agent worker via a webhook that tracks the interaction with the underlying message. In our case, that’s a click on the “approve” or “reject” button.
First, we set up a `message.interacted` webhook handler within the Knock dashboard to forward the interactions to our worker, and ultimately to our agent process.
In our example, we route the approval click back to the worker to handle, appending a Knock message ID to the end of the `approve_url` and `reject_url` to track engagement against the specific message sent. We do this via liquid inside our message templates in Knock: `{{ data.approve_url }}?messageId={{ current_message.id }}`. One caveat: in a production application, we’d likely handle the approval click in a different application from the one running the agent. We co-located it here for the purposes of this demo only.
Once the link is clicked, we have a handler in our worker to mark the message as interacted using Knock’s message interaction API, passing through the status as metadata so that it can be used later.
import Knock from "@knocklabs/node";
import { Hono } from "hono";

const app = new Hono();
const client = new Knock();

app.get("/card-issued/approve", async (c) => {
  const { messageId } = c.req.query();

  if (!messageId) return c.text("No message ID found", 400);

  await client.messages.markAsInteracted(messageId, {
    status: "approved",
  });

  return c.text("Approved");
});
The message interaction will flow from Knock to our worker via the webhook we set up, ensuring that the process is fully asynchronous. The webhook payload includes the full message, along with metadata about the user that generated the original request and details about the request itself, which in our case contains the tool call.
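The worker only needs a small slice of that payload to route the event. A rough sketch of the shape we rely on (field names beyond `data.actors` are simplified; consult Knock’s webhook documentation for the full schema):

```typescript
// Illustrative, simplified shape of a `message.interacted` webhook payload.
interface InteractionWebhook {
  event: string;
  data: {
    actors: string[];
    metadata?: Record<string, string>;
  };
}

// Hypothetical helper: pull out the user that triggered the original request,
// which doubles as the name of the agent Durable Object to route to.
function actorFromWebhook(body: InteractionWebhook): string | undefined {
  return body?.data?.actors?.[0];
}
```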
import { getAgentByName, routeAgentRequest } from "agents";
import { Hono } from "hono";

const app = new Hono();

app.post("/incoming/knock/webhook", async (c) => {
  const body = await c.req.json();
  const env = c.env as Env;

  // Find the user ID from the tool call for the calling user
  const userId = body?.data?.actors?.[0];

  if (!userId) {
    return c.text("No user ID found", 400);
  }

  // Find the agent DO for the user
  const existingAgent = await getAgentByName(env.AIAgent, userId);

  if (existingAgent) {
    // Route the request to the agent DO to process
    const result = await existingAgent.handleIncomingWebhook(body);
    return c.json(result);
  } else {
    return c.text("Not found", 404);
  }
});
We leverage the agent’s ability to be addressed by a named identifier, in our case the `userId`, to route the request from the worker to the agent. Because the agent is backed by a Durable Object, going from an incoming worker request to finding and resuming the agent is trivial.
We then use the context about the original tool call, passed through to Knock and round tripped back to the agent, to resume the tool execution and issue the card.
export class AIAgent extends AIChatAgent {
  // ... other methods

  async handleIncomingWebhook(body: any) {
    const { toolkit } = await initializeToolkit(this);

    const result = toolkit.handleMessageInteraction(body);

    if (!result) {
      return { error: "No deferred tool call given" };
    }

    // If we received an "approved" status then we know the call was approved,
    // so we can resume the deferred tool call execution
    if (result.interaction.status === "approved") {
      const toolCallResult = await toolkit.resumeToolExecution(result.toolCall);

      const { response } = await generateText({
        model: openai("gpt-4o-mini"),
        prompt: `You were asked to issue a card for a customer. The card is now approved. The result was: ${JSON.stringify(toolCallResult)}.`,
      });

      const message = responseToAssistantMessage(
        response.messages[0],
        result.toolCall,
        toolCallResult
      );

      // Save the message so that it's displayed to the user
      this.persistMessages([...this.messages, message]);
    }

    return { status: "success" };
  }
}
Again, there’s a lot going on here, so let’s step through the important parts:
- We attempt to transform the body, which is the webhook payload from Knock, into a deferred tool call via the `handleMessageInteraction` method
- If the metadata status we passed through to the interaction call earlier is “approved”, we resume the tool call via the `resumeToolExecution` method
- Finally, we generate a message from the LLM and persist it, ensuring that the user is informed of the approved card
With this last piece in place, we can now request a new card be issued, have an approval request be dispatched from the agent, send the approval messages, and route those approvals back to our agent to be processed. The agent will asynchronously process our card issue request and the deferred tool call will be resumed for us, with very little code.
Protecting against duplicate approvals
One issue with the above implementation is that we’re prone to issuing multiple cards if someone clicks the approve button more than once. To rectify this, we want to keep track of the tool calls being issued and ensure that each call is processed at most once.
To power this, we leverage the agent’s built-in state, which can persist information without reaching for another persistence store like a database or Redis (although we absolutely could if we wished). We can track tool calls by their ID and capture their current status, right inside the agent process.
type ToolCallStatus = "requested" | "approved" | "rejected";

export interface AgentState {
  toolCalls: Record<string, ToolCallStatus>;
}

class AIAgent extends AIChatAgent {
  initialState: AgentState = {
    toolCalls: {},
  };

  setToolCallStatus(toolCallId: string, status: ToolCallStatus) {
    this.setState({
      ...this.state,
      toolCalls: { ...this.state.toolCalls, [toolCallId]: status },
    });
  }

  // ...
}
Here, we create the initial state for the tool calls as an empty object. We also add a quick setter helper method to make interactions easier.
Next up, we need to record the tool call being made. To do so, we can use the `onAfterCallKnock` option in the `requireHumanInput` helper to capture that the tool call has been requested for the user.
// Rename to avoid shadowing the imported `issueCard` API call
const { issueCard: wrappedIssueCard } = toolkit.requireHumanInput(
  { issueCard: issueCardTool },
  {
    // Keep track of the tool call state once it's been sent to Knock
    onAfterCallKnock: async (toolCall) =>
      agent.setToolCallStatus(toolCall.id, "requested"),
    // ... as before
  }
);
Finally, we then need to check the state when we’re processing the incoming webhook, and mark the tool call as approved (some code omitted for brevity).
export class AIAgent extends AIChatAgent {
  async handleIncomingWebhook(body: any) {
    const { toolkit } = await initializeToolkit(this);

    const result = toolkit.handleMessageInteraction(body);
    const toolCallId = result.toolCall.id;

    // Make sure this is a tool call that can be processed
    if (this.state.toolCalls[toolCallId] !== "requested") {
      return { error: "Tool call is not requested" };
    }

    if (result.interaction.status === "approved") {
      const toolCallResult = await toolkit.resumeToolExecution(result.toolCall);
      this.setToolCallStatus(toolCallId, "approved");

      // ... rest as before
    }
  }
}
Using the Agents SDK and Knock, it’s easy to build advanced human-in-the-loop experiences that defer tool calls.
Knock’s workflow builder and notification engine give you the building blocks to create sophisticated cross-channel messaging for your agents. You can easily create escalation flows that send messages through SMS, push, email, or Slack while respecting the notification preferences of your users. Knock also gives you complete visibility into the messages your users are receiving.
The Durable Object abstraction underneath the Agents SDK means we get a globally addressable agent process that’s easy to pause and resume. The persistent storage in the Durable Object means we can retain the complete chat history per user, along with any other state required to resume the agent (like our tool calls). Finally, the serverless nature of the underlying Durable Object means we can horizontally scale to support a large number of users with no effort.
If you’re looking to build your own AI Agent chat experience with multiplayer human-in-the-loop support, you’ll find the complete code from this guide on GitHub.