Why Prompt by Voice? The Full Case for Voice Prompting

TL;DR: Prompts are long, conversational, throwaway text, which is exactly the text dictation handles best, and speaking is commonly cited at roughly three times typing speed. With Infina the loop goes fully hands-free: from a couple of feet away you say "type" plus your prompt and it gets typed, say "send" and Enter is pressed, say "open Claude Code" or "open Cursor" and you are in front of the next agent. Once hands-free mode is toggled on, no step needs the keyboard, and no other dictation app completes that prompt, send, and switch-apps loop hands-free in plain English. Infina costs $99 once as of July 2026, runs on-device by default, and comes with a 7-day no-questions refund.

The text changed. The input method didn't

For most of computing history, the words you typed were the product: the email, the essay, the code. It made sense to craft them at a keyboard.

In the AI era, the words you produce are mostly instructions. You describe a feature to Claude Code, a refactor to Cursor, a draft to ChatGPT, and the tool produces the artifact. Your output became the prompt.

That flips the economics of input. When describing what you want is the job, the speed at which you can describe things is the speed at which you work. We make the full version of that argument in "typing is the bottleneck", but the short version is: the keyboard now caps the one channel that matters.

Prompts are the perfect text to speak

So why prompt by voice specifically? Because prompts have three properties that make them the single best fit for dictation of any text you produce all day.

Prompts are long. A good prompt front-loads context: what you want, what to avoid, which files matter, what "done" looks like. Detailed prompts routinely run 100 to 300 words. At typing speed that is a real chunk of effort, so people compress, and compressed prompts produce vague results and extra retry rounds.

Prompts are conversational. A prompt reads like something you would say to a sharp colleague: "take the login flow, keep the animations, but move the error handling into the hook." That is spoken-language register. You do not have to translate your thought into prose; you just say the thought.

Prompts are throwaway. Nobody publishes their prompts. The model reads them once, does the work, and the text is gone. Filler words and slightly loose grammar cost you little, because large language models handle messy, natural phrasing well.

Long, conversational, and disposable: that is the exact profile of text where dictation shines and careful typing is wasted effort.

Raw beats polished for AI prompts

Here is the counterintuitive part: for prompting, you do not want your dictation cleaned up.

Polish pipelines rewrite your words, and a rewrite can smooth away the precise phrasing you chose: a variable name, a "do NOT touch the schema," a deliberately blunt constraint. The model downstream never needed the polish anyway. It needed your words, fast and intact.

That is why Infina's base product is raw, on-device dictation by design: your speech is transcribed on your Mac's Neural Engine, formatted with fast on-device rules, and typed exactly where your cursor is. Nothing rewrites your intent, nothing leaves your Mac, and it works offline.

And when you do want polished prose for emails or documents, we are not conceding that to anyone: the optional $10/month cloud add-on runs larger models through our cloud AI providers (Together AI and Groq) for sharper transcription, polished cleanup, and more languages. The subscription dictation apps charge $15/month forever for that as their whole product; with Infina it is an optional layer, with its own 7-day free trial, on top of an app you own.

The arithmetic: roughly three times faster

You do not need a study for this one; the numbers are checkable in a minute.

Commonly cited averages put typing around 40 words per minute, with practiced typists reaching 60 to 80. Conversational speech is commonly cited at 130 to 160 words per minute. Take the middle of the speech range, about 145 words per minute: that is roughly 3.6 times the 40 wpm average and about 2.4 times a practiced 60 wpm. Roughly three times is a fair single figure for that spread.
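The comparison above can be checked in a few lines. This is only a sketch of the arithmetic: the words-per-minute figures are the ones quoted in the text, not measurements, and the constant and function names are made up for illustration.

```python
# Figures as quoted in the text; they are commonly cited ranges, not measurements.
TYPING_AVERAGE_WPM = 40
TYPING_PRACTICED_WPM = (60, 80)
SPEECH_WPM = (130, 160)


def midpoint(low: float, high: float) -> float:
    """Middle of a words-per-minute range."""
    return (low + high) / 2


def speed_ratio(speech_wpm: float, typing_wpm: float) -> float:
    """How many times faster speaking is than typing."""
    return speech_wpm / typing_wpm


speech_mid = midpoint(*SPEECH_WPM)
ratio_vs_average = speed_ratio(speech_mid, TYPING_AVERAGE_WPM)
ratio_vs_practiced = speed_ratio(speech_mid, TYPING_PRACTICED_WPM[0])

print(f"speech midpoint: {speech_mid:.0f} wpm")
print(f"vs a {TYPING_AVERAGE_WPM} wpm typist: {ratio_vs_average:.1f}x")
print(f"vs a {TYPING_PRACTICED_WPM[0]} wpm typist: {ratio_vs_practiced:.1f}x")
```

Swap in your own typing speed from any online typing test to see where your personal ratio lands.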

Now apply it to a heavy AI day. Say you write 30 prompts averaging 150 words: 4,500 words of instructions.

Typed at 60 words per minute: about 75 minutes of keyboard time.
Spoken at 140 words per minute: about 32 minutes.

That is around 43 minutes returned, every working day, from the same prompts. The example is conservative: 60 against 140 words per minute is a ratio of about 2.3 to one, not three. Even at two-to-one, the daily return is large. We break the evidence and the caveats down properly in "speaking vs typing productivity".
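The daily figures can be reproduced directly. A minimal sketch, using the prompt count, prompt length, and speeds from the example above; the names are illustrative and every constant is meant to be replaced with your own numbers.

```python
def minutes_needed(total_words: int, words_per_minute: float) -> float:
    """Minutes required to produce total_words at a steady rate."""
    return total_words / words_per_minute


# Inputs taken from the worked example in the text.
PROMPTS_PER_DAY = 30
WORDS_PER_PROMPT = 150
TYPING_WPM = 60
SPEAKING_WPM = 140

total_words = PROMPTS_PER_DAY * WORDS_PER_PROMPT
typed_minutes = minutes_needed(total_words, TYPING_WPM)
spoken_minutes = minutes_needed(total_words, SPEAKING_WPM)
saved_minutes = typed_minutes - spoken_minutes

print(
    f"{total_words} words: typed {typed_minutes:.0f} min, "
    f"spoken {spoken_minutes:.0f} min, saved {saved_minutes:.0f} min"
)
```

The calculation assumes steady output at the quoted rate and counts no time for pauses or corrections in either mode, so treat the result as an upper-level estimate rather than a stopwatch figure.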

There is a second effect the arithmetic misses: when words get cheap, you stop rationing them. Spoken prompts tend to carry more context because adding context costs seconds, not minutes. More context per prompt means fewer retry loops, which is a speed gain the words-per-minute math does not even count.

The full voice prompting loop, hands-free

Faster dictation alone is a nice upgrade. The compounding step is removing your hands from the loop entirely.

Every mainstream dictation app is push-to-talk: hold a hotkey, speak, release, then press Enter yourself and Cmd-Tab to the next window yourself. The speaking went hands-free; the loop around it did not.

Infina's hands-free mode closes the whole cycle. Double-tap Cmd to toggle it on, then from a couple of feet away:

Say "type" plus your prompt: "type refactor the auth middleware, keep the session logic, add tests." Infina types it into the focused app. "Type" itself is the trigger; there is no hotkey.
Say "send". Infina presses Enter.
Say "open Claude Code" (or "open Cursor", "open Notes"). Infina switches apps.
Repeat.

Prompt, send, switch, from a couple of feet back, hands around a coffee mug. While one agent works, you brief the next one. That is how a single person keeps two or three AI agents busy at once, and it is the workflow we unpack in "hands-free voice prompting".
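To make the shape of that loop concrete, here is a toy sketch of how three spoken commands could be told apart once speech has been transcribed. It is not Infina's implementation: the Action type, the parse_utterance function, and the matching rules are all hypothetical, and the code only classifies text. It does not type, press Enter, or switch apps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str           # "type", "send", "open", or "ignore"
    argument: str = ""  # prompt text for "type", app name for "open"


def parse_utterance(utterance: str) -> Action:
    """Classify one transcribed utterance (hypothetical grammar, for illustration)."""
    text = utterance.strip()
    lowered = text.lower()
    # A bare "send", with or without trailing punctuation, means "press Enter".
    if lowered.rstrip(".!?") == "send":
        return Action("send")
    # "type <prompt>" carries the prompt text unchanged.
    if lowered.startswith("type "):
        return Action("type", text[len("type "):].strip())
    # "open <app>" carries the app name.
    if lowered.startswith("open "):
        return Action("open", text[len("open "):].strip())
    # Anything else is treated as ordinary speech and ignored.
    return Action("ignore")


for spoken in [
    "type refactor the auth middleware, keep the session logic, add tests.",
    "send",
    "open Claude Code",
    "did you see the game last night",
]:
    print(parse_utterance(spoken))
```

The point of the sketch is the design choice the text describes: the first word acts as the trigger, so ordinary conversation that does not start with a command word falls through untouched.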

To be precise about the claim: plenty of tools dictate, and accessibility-grade voice control can drive a whole computer. But no other dictation app completes the prompt, send, and switch-apps loop hands-free in plain English. That specific loop is the moat.

Honesty about the mode itself: hands-free is our newest surface, labeled experimental, and it ships off by default. Hold-Option push-to-talk dictation is the mature everyday path, and it alone delivers the speed arithmetic above. The practical setup for both is in "how to dictate prompts to AI".

Where typing still wins

The case for voice prompting is strong precisely because it is scoped. Keep the keyboard for:

Editing code. Surgical edits, renames, and navigation are keyboard work. Voice writes the instructions; your hands handle the scalpel.
Precise symbol soup. A regex or a one-line shell incantation is often faster typed than spoken.
Shared spaces. Talking to your Mac in an open-plan office is a social cost. Voice prompting is happiest at home offices and private rooms.

None of that dents the core claim, because none of it is where your word count lives. The bulk of an AI-heavy day is describing intent, and describing intent is spoken work.

What it costs, and what it returns

Infina is $99 one-time as of July 2026 (the price ladder rises as seats sell), with every 1.x update included and a 7-day no-questions money-back guarantee instead of a trial. Details on pricing.

Set the $99 against the arithmetic above: tens of minutes a day, one purchase, no subscription. If it does not pay for itself in your first week, the refund is one email.
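One way to test the "pays for itself in your first week" claim is to ask what an hour of your time would need to be worth for a week of saved minutes to equal the price. A rough sketch: the price and the word counts come from the text, while the five-day working week and the function name are assumptions added for illustration.

```python
PRICE_USD = 99              # one-time price quoted in the text
WORKING_DAYS_PER_WEEK = 5   # assumption, not stated in the text

# Daily saving from the worked example: 4,500 words at 60 wpm typed vs 140 wpm spoken.
DAILY_WORDS = 4500
saved_minutes_per_day = DAILY_WORDS / 60 - DAILY_WORDS / 140


def break_even_hourly_value(price: float, minutes_per_day: float, days: int) -> float:
    """Hourly value of time at which the saved time equals the price."""
    saved_hours = minutes_per_day * days / 60
    return price / saved_hours


rate = break_even_hourly_value(PRICE_USD, saved_minutes_per_day, WORKING_DAYS_PER_WEEK)
print(f"break-even value of time over one week: ${rate:.2f} per hour")
```

If your prompt volume is lower than the heavy-day example, the break-even rate rises in proportion, which is the number worth checking against your own week.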

Requirements, stated plainly: Mac only, Apple Silicon for the on-device models, and English-only in the base product (the cloud add-on handles more languages).

FAQ

Why prompt by voice instead of typing? Prompts are long, conversational, throwaway text, the exact profile where dictation excels, and speaking is commonly cited at roughly three times typing speed. When your job is describing what you want to AI tools, voice raises the ceiling on your main output channel.

Doesn't messy spoken language confuse the AI? No. Large language models handle natural, slightly loose phrasing well; they were trained on vast amounts of informal text. What matters in a prompt is intent and context, and speaking makes it cheap to include more of both.

Do I need the hands-free mode to benefit? No. Hold-Option push-to-talk dictation delivers the speed gain on its own. Hands-free mode (double-tap Cmd to toggle, off by default, labeled experimental) adds the "send" and "open [app]" loop for people running multiple AI agents.

Is my audio sent to the cloud? Not by default. Infina transcribes on your Mac using an on-device model on the Apple Neural Engine, works offline, and stores no audio or transcripts by default. Cloud processing exists only as the optional $10/month add-on.

What does Infina cost? $99 one-time as of July 2026, including every 1.x update, with a 7-day no-questions-asked refund. No subscription for the core app; the cloud add-on is an optional $10/month with its own 7-day trial.

Which AI tools does it work with? All of them. Infina types at the OS level into whatever app is focused: Claude Code, Cursor, Codex, ChatGPT, any terminal, editor, or chat window. No per-app extension needed.

The bottom line

The AI era quietly changed what your keyboard is for. Your words became instructions, the volume of them exploded, and the artifact you used to type is now produced by the model.

Prompting by voice is not a gimmick on top of that shift; it is the matching input method. Prompts are long, conversational, and disposable, speech is roughly three times faster by open arithmetic, and raw dictation preserves your intent better than any polish pipeline.

Infina is built for exactly this: raw on-device dictation for $99 once as of July 2026, in the only dictation app on the Mac that completes the prompt, send, and switch-apps loop hands-free. Try it against your own prompt log for a week; the refund is there if the arithmetic does not show up in your day.