Run Multiple AI Agents by Voice: One Person, Four Agents, Zero Keystrokes

TL;DR: If you run one AI agent, you spend most of your time waiting. If you run several, you become the bottleneck: every prompt costs a window switch, a typed paragraph, and an Enter press. Learning to run multiple AI agents by voice removes that bottleneck. With Infina you dictate a prompt, say "send", say "open" plus the next app's name, and repeat, all hands-free from 2 to 3 feet away, transcribed on-device. It is $99 one-time (at the time of writing) with a 7-day money-back guarantee, and for parallel-agent work we think it is among the cheapest throughput upgrades you can buy.

Infina is our product, so we are biased. The math below is not ours, though; it is just what parallel agents do to a keyboard.

The economics: one agent wastes you, four agents expose you

Agents changed the shape of the workday. Claude Code, Cursor, and Codex will each happily grind on a task for minutes at a time, which creates a scheduling problem that never existed before.

With one agent, you wait. You send a prompt, the agent works, and you sit there watching tokens stream. Your most expensive resource (your judgment) idles for most of the hour.

With several agents, you are the bottleneck. The obvious fix for all that waiting is to run 2 to 4 sessions on different tasks so something always needs your input. But now every rotation costs you: Cmd-Tab to the right window, click into the input, type a paragraph, press Enter, find the next window.

That per-prompt friction is exactly why most people who try a parallel AI agents workflow quietly slide back to one agent. Not because the agents could not keep up, but because their hands could not.

Voice removes the bottleneck. By commonly cited estimates, speaking is roughly three times faster than typing, and with a full hands-free loop the window switching goes away too. The agents stop waiting on your fingers.
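To see what that friction adds up to, here is a back-of-the-envelope sketch. Every input is an illustrative assumption, not a measurement: 24 prompts an hour, 40 words per prompt, 40 words per minute typed versus 120 spoken (the commonly cited 3x), and a guessed per-prompt overhead for switching windows and sending.

```python
def prompting_minutes_per_hour(prompts_per_hour, words_per_prompt,
                               words_per_minute, overhead_seconds):
    """Minutes per hour spent issuing prompts instead of reviewing output."""
    compose_minutes = words_per_prompt / words_per_minute
    overhead_minutes = overhead_seconds / 60
    return prompts_per_hour * (compose_minutes + overhead_minutes)


# Assumed inputs: 10 s of window juggling per typed prompt, 2 s per spoken one.
typed = prompting_minutes_per_hour(24, 40, 40, 10)
spoken = prompting_minutes_per_hour(24, 40, 120, 2)
print(f"typed: {typed:.1f} min/hour, spoken: {spoken:.1f} min/hour")
```

Under these assumptions typing eats about 28 minutes of every hour and voice about 9. Swap in your own pace and prompt length; the gap matters more than the exact figures.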

How to run multiple AI agents by voice: the loop

Here is the entire workflow with Infina's hands-free mode on (double-tap Cmd to enable it):

Read agent one's output on screen.
Speak a sentence that starts with "type", then the next instruction. Infina types it into the focused window.
Say "send". Infina presses Enter.
Say "open Terminal" (or Cursor, or any app by name). You are now at agent two.
Repeat until every agent has work, then go think.
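For readers who think in code, the steps above can be modeled as a tiny command grammar. This is a hypothetical sketch for illustration only, not Infina's implementation: the `Action` type and the `parse_phrase` and `plan_rotation` helpers are names invented here, and the real app works on audio rather than text strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    kind: str          # "type", "send", or "open"
    payload: str = ""  # prompt text for "type", app name for "open"


def parse_phrase(phrase: str) -> Optional[Action]:
    """Map one spoken phrase to an action, or None if it is not a command."""
    parts = phrase.strip().split(maxsplit=1)
    if not parts:
        return None
    keyword = parts[0].lower().strip(".,!?:")
    rest = parts[1].strip() if len(parts) > 1 else ""
    if keyword == "type" and rest:
        return Action("type", rest)
    if keyword == "send" and not rest:
        return Action("send")
    if keyword == "open" and rest:
        return Action("open", rest.rstrip(".!?"))
    return None


def plan_rotation(phrases: List[str]) -> List[Action]:
    """Keep only the phrases that parse as commands, in spoken order."""
    actions = (parse_phrase(p) for p in phrases)
    return [a for a in actions if a is not None]


rotation = plan_rotation([
    "Type. Rerun the failing tests and fix only the assertion messages.",
    "Send.",
    "Open Cursor.",
])
for action in rotation:
    print(action.kind, action.payload)
```

One rotation is three actions: type, send, open. The whole loop is that triple repeated once per agent.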

Each rotation takes seconds and zero keystrokes. It works from 2 to 3 feet away, so "check on the agents" no longer means "sit down at the keys".

Dictation apps still make you touch the keyboard to trigger the mic and to send. Infina completes the whole prompt, send, switch-app loop hands-free, which is the difference between dictation and actual agent orchestration by voice. The category is defined in the article on hands-free voice prompting.

A concrete 3-agent setup

A layout we see work well, mixing tools deliberately:

Claude Code, session one (terminal): the big refactor. Long-running, occasional check-ins. The deep dive on this piece is the guide to hands-free Claude Code.
Claude Code, session two (second terminal tab or window): tests and cleanup chores that trail the refactor.
Cursor: UI work, where you want to see the result render. Voice specifics are in the guide to voice typing for Cursor.

Add a Codex session as a fourth lane if you have an isolated task like docs or a migration script. Beyond four, most people find review (not prompting) becomes the limit, which is fine: the point is that the limit is now your judgment, not your typing.

Rotation in practice sounds like this:

"Type. The refactor plan looks right, go ahead, but keep the public API unchanged. Send." "Open Terminal." "Type. Rerun the failing tests and fix only the assertion messages. Send." "Open Cursor." One more instruction, and every lane is moving again.

Because Infina types at the OS level into whatever app is focused, this works across any mix of terminals, editors, and chat windows with no per-app setup.

Why hands-free matters here specifically

For a single chat window, push-to-talk dictation is honestly enough: hold Option, speak, press Enter. Multi-agent work is where the hands-free part earns its keep, for a physical reason.

Your hands are busy with the real work. While agents grind, you are scrolling a diff, sketching on a whiteboard, holding coffee, annotating a printout. Queueing the next instruction should not force you to put any of that down and reacquire the keyboard.

The rotation is the workload. With four agents you might issue dozens of prompts an hour. At that frequency, the trigger-and-send friction of ordinary dictation is not a rounding error; it is the whole tax.

Distance keeps you in review mode. From a step back you read outputs like an editor instead of a typist. Say the correction the moment you spot it, then keep reading. You review while they work; you never stop to type.

That is the quiet punchline of running multiple AI agents by voice: it does not make any one agent faster. It makes you a better scheduler of all of them.

Honest limits

The plain-spoken fine print:

Hands-free is our newest feature and labeled experimental in the app. It ships off by default and prefers a quiet-ish room; hold-Option push-to-talk is the always-reliable fallback for every step except the switching.
English only in the base product. More languages come with the optional $10/month cloud add-on, which also adds polished output from large language models for the emails and docs side of your day.
Mac only, Apple Silicon required for the on-device models. Transcription runs on the Neural Engine, works offline, and your audio never leaves your Mac by default.
Raw output by design. Agents do not need polished prose, so the base product optimizes for speed. Polish is an optional add-on to an app you own, not a $15/month subscription.
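The one-time versus subscription comparison in the last item comes down to simple arithmetic. The sketch below uses the $99 and $15/month figures from this article, treats the $15/month plan as a stand-in for a typical dictation subscription, and leaves out the optional $10/month cloud add-on. The `break_even_months` helper is a name made up for this example.

```python
import math


def break_even_months(one_time_price: float, monthly_price: float) -> int:
    """First month in which the subscription's running total reaches the one-time price."""
    return math.ceil(one_time_price / monthly_price)


months = break_even_months(99, 15)
print(months)  # 7
```

On those figures the one-time purchase is the cheaper option from month seven onward.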

FAQ

How many AI agents can one person realistically run by voice? Most people settle at 2 to 4: two Claude Code sessions plus Cursor is a common mix. Past that, reviewing outputs becomes the limit rather than prompting, which is exactly the bottleneck you want to have.

Do I need different tools for different agents? No. Infina types into whatever app is focused, at the OS level, so the same loop drives Claude Code terminals, Cursor, Codex, and any chat window. "Open" plus the app name moves you between them.

What is the actual command sequence? With hands-free on, speak a sentence that starts with "type", then your prompt; Infina types it into the focused window. Say "send" to press Enter, then "open" followed by an app name to move to the next agent, and repeat.

Why not just use a normal dictation app for this? A normal dictation app types text, but triggering the mic, pressing Enter, and switching windows stay on your hands, and at dozens of prompts an hour that friction is the whole cost. Infina runs the complete loop by voice, from 2 to 3 feet away.

Does this require the internet? No. By default transcription and hands-free listening both run on-device (Apple Silicon required), so the whole multi-agent loop works offline and no audio leaves your Mac.

How much does Infina cost? $99 one-time at the time of writing, every 1.x update included, with a 7-day no-questions-asked money-back guarantee. No subscription; the optional cloud add-on is $10/month with its own 7-day trial. Full details are on the pricing page.

The bottom line

Agents made compute cheap and made your attention the scarce input. One agent squanders your attention on waiting; several agents squander it on typing and window juggling.

Voice fixes the allocation. Speak the instruction, send it, switch, and let your eyes and judgment do the only work that still needs a human.

One $99 purchase, no subscription, and every hour of reclaimed rotation time is pure margin. If it does not pay for itself in your first week of parallel-agent work, the refund is one email.