Kabir Goel

Apr 25, 2025

Prediction: People will be defining voice agent behavior using React-esque semantics sooner or later. I don’t mean as part of a web app, I mean that voice-only agents will have conversational flows defined using React/JSX. Something like this, essentially (this will seem familiar if you’ve built a voice agent flow with Pipecat Flows or another orchestration tool):

export function ConversationFlow({ patientName, patientBirthday }) {
  const [authStatus, setAuthStatus] = useState('unauthenticated');

  return (
    <Conversation>
      <Node>
        <System>You are a friendly voice assistant.</System>
        <System>Ask the user to confirm their name and their birthday.</System>
        <Tool name="verify" properties={{ name: ..., birthday: ... }} execute={(props) => {
          if (props.name === patientName && props.birthday === patientBirthday) {
            setAuthStatus('authenticated');
          } else {
            setAuthStatus('forbidden');
          }
        }} />
      </Node>
      {authStatus === 'authenticated' ? <AuthenticatedFlow /> : <UnauthorizedFlow />}
    </Conversation>
  );
}

This probably won’t be the exact form, since you need to keep in mind that:

You can’t “re-render” because you’re laying things out in time instead of space.
The ordering of elements is meaningful, and state (such as authStatus) can only flow downwards since you can’t go back in time.

…and design the affordances accordingly. But this model definitely feels more amenable to building good abstractions than defining conversation flows in imperative code.
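
To make those two constraints a bit more concrete, here is a rough sketch in plain JavaScript (no JSX, no real framework). Everything in it is made up for illustration: `runConversation`, `buildFlow`, and the `say`/`run` node shape are hypothetical names, and the caller's answers are passed in directly instead of coming from speech. The idea is that nodes are visited exactly once, in order, and each node only sees state produced by the nodes before it.

```javascript
// Hypothetical sketch: a conversation is an ordered list of nodes, walked once.
// There is no re-render step; a node that has been spoken is never revisited.
function runConversation(nodes, initialState = {}) {
  // Frozen so a node cannot rewrite what earlier turns established.
  let state = Object.freeze({ ...initialState });
  const transcript = [];

  for (const node of nodes) {
    // A node may be a function of the state so far, which plays the role
    // of the authStatus ternary in the JSX example.
    const resolved = typeof node === 'function' ? node(state) : node;
    transcript.push(resolved.say);

    // Updates returned by this node are only visible to later nodes.
    const updates = resolved.run ? resolved.run(state) : {};
    state = Object.freeze({ ...state, ...updates });
  }

  return { state, transcript };
}

// Hypothetical flow mirroring the verify step: `claimed` stands in for
// whatever the caller actually said.
function buildFlow(patient, claimed) {
  return [
    {
      say: 'Please confirm your name and birthday.',
      run: () => ({
        authStatus:
          claimed.name === patient.name && claimed.birthday === patient.birthday
            ? 'authenticated'
            : 'forbidden',
      }),
    },
    (state) =>
      state.authStatus === 'authenticated'
        ? { say: 'Thanks, you are verified.' }
        : { say: 'Sorry, I cannot help with that.' },
  ];
}

const patient = { name: 'Ada', birthday: '1990-01-01' };
const result = runConversation(buildFlow(patient, patient));
console.log(result.transcript.join('\n'));
```

The second node is a function precisely because it sits later in time: it can branch on `authStatus`, but nothing it does can change what the first node already said.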