TL;DR: Gemini CLI is Google's free, open-source terminal AI agent, and as of July 4, 2026 it has no built-in voice input. Infina fixes that: hold Option and dictate your prompt on-device, or go fully hands-free. Every other dictation app chains you to the keyboard, with a hotkey press for every single dictation. With Infina you push your chair back, eat lunch two feet from the screen, and run the whole loop by voice: "type" plus your prompt, "send" to submit, "open Terminal" to switch to the next session. $99 once, no subscription, 7-day money-back guarantee.
Gemini CLI is free. Your typing time is not
Gemini CLI is Google's open-source AI agent for the terminal, Apache 2.0 licensed with a generous free tier on a personal Google account, as checked on July 4, 2026.
You drive it in plain English: describe the task, and it plans steps, edits files, runs shell commands, and adjusts. Which means your main input to it is paragraphs of natural language.
There is no microphone in it. As of July 4, 2026, Gemini CLI takes typed text only.
So the agent is free, but you pay for it in typing: task briefs, corrections, follow-ups, all day at 40 words a minute when you speak at 150. Voice typing for Gemini CLI gives those hours back.
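To make that gap concrete, here is a rough back-of-the-envelope sketch using the two rates quoted above. The daily volume of 2,000 prompt words is a hypothetical figure chosen for illustration, not a measurement.

```python
def minutes_to_enter(words: int, words_per_minute: float) -> float:
    """Minutes needed to produce `words` at a steady rate."""
    return words / words_per_minute


TYPING_WPM = 40             # typing rate quoted in the article
SPEAKING_WPM = 150          # speaking rate quoted in the article
DAILY_PROMPT_WORDS = 2000   # hypothetical daily volume (assumption)

typing_minutes = minutes_to_enter(DAILY_PROMPT_WORDS, TYPING_WPM)
speaking_minutes = minutes_to_enter(DAILY_PROMPT_WORDS, SPEAKING_WPM)
saved_minutes = typing_minutes - speaking_minutes
speedup = SPEAKING_WPM / TYPING_WPM

print(f"typing:   {typing_minutes:.1f} min/day")
print(f"speaking: {speaking_minutes:.1f} min/day")
print(f"saved:    {saved_minutes:.1f} min/day ({speedup:.2f}x faster)")
```

Swap in your own word count; the ratio between the two rates stays the same whatever the volume.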
The terminal is just another text box
Infina types at the OS level, into whatever app has focus. A terminal is not a special case: to Infina, the Gemini CLI prompt line is a text field like any other.
That means the same dictation works in Terminal.app, iTerm, and Warp, with no plugin, no MCP server, no per-tool setup. We cover the general case in voice typing for the terminal, and Warp specifics in voice typing for Warp.
The basic workflow:
- Focus the Gemini CLI session.
- Hold Option (⌥) and speak: "look at the failing checkout tests, figure out the root cause, and propose a fix without changing the API surface."
- Release. The text lands at the prompt.
- Press Enter.
Transcription runs entirely on your Mac, using an on-device speech model that runs on the Apple Neural Engine. It works offline, and your audio never leaves your device.
And because Gemini understands unpolished speech, filler words and all, raw fast dictation is exactly the right tool. You are briefing an agent, not publishing prose.
One tip for spoken prompts: lead with the goal, then context, then constraints. "Add retry logic to the upload queue, the code is in src/uploads, do not add new dependencies" survives being said out loud better than a sentence you have to restart.
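If it helps to see the goal, context, constraints order as a template, here is a tiny sketch. The function name `spoken_prompt` and its parameters are made up for illustration; nothing in Gemini CLI or Infina requires this shape.

```python
def spoken_prompt(goal: str, context: str = "", constraints: str = "") -> str:
    """Join the parts of a brief in the order: goal, then context, then constraints.

    Empty parts are skipped, so a goal-only brief still works.
    """
    parts = [goal, context, constraints]
    return ", ".join(part.strip() for part in parts if part.strip())


brief = spoken_prompt(
    goal="Add retry logic to the upload queue",
    context="the code is in src/uploads",
    constraints="do not add new dependencies",
)
print(brief)
```

The point is the ordering, not the helper: say the goal first so the agent has the task even if the rest of the sentence trails off.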
Hands-free voice typing for Gemini CLI
Push-to-talk is where to start, but it still keeps a finger on the keyboard for every prompt. That is the ceiling of every mainstream dictation app: hotkey, speak, hotkey, Enter, forever.
Infina's hands-free mode removes the keyboard from the loop entirely:
- Double-tap Cmd (⌘) to turn hands-free mode on. Listening runs on-device, so nothing is recorded or sent anywhere while it waits.
- Speak a sentence that starts with "type": "type run the linter and fix everything it flags in the components folder." Infina types it into Gemini CLI.
- Say "send". Infina presses Enter.
- Say "open Terminal", "open Warp", or "open Cursor" to jump to another session, and repeat.
From 2 to 3 feet away it just works: lean back, watch the agent chew through the task list, and queue the next brief between bites of lunch.
Honest notes: hands-free ships off by default and is labeled experimental, and the base product is English-only. Push-to-talk is the fallback that always works.
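One way to picture the hands-free loop described above is as a small command router: each spoken sentence is either text to type, a "send", an app switch, or something to ignore. The sketch below is a hypothetical illustration of that idea only. It is not Infina's implementation, and `Action` and `route_utterance` are invented names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str          # "type", "send", "open", or "ignore"
    payload: str = ""  # text to type, or the app name to open


def route_utterance(utterance: str) -> Action:
    """Map one transcribed sentence to an action (illustrative rules only)."""
    text = utterance.strip()
    lowered = text.lower()
    if lowered.rstrip(".!") == "send":
        return Action("send")
    if lowered.startswith("open "):
        return Action("open", text[len("open "):].strip().rstrip("."))
    if lowered.startswith("type "):
        return Action("type", text[len("type "):].strip())
    # Anything else is treated as background speech and dropped.
    return Action("ignore")


for spoken in ["type run the linter", "Send.", "open Warp", "pass the salt"]:
    print(spoken, "->", route_utterance(spoken))
```

Requiring a leading keyword is what lets background conversation pass through without landing in the prompt.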
One voice, several agents
Gemini CLI's free tier makes it the easiest agent to keep open next to the others. A common desk as of mid-2026: Gemini CLI in one tab, Codex in another, Claude Code in a third.
Hands-free, that desk runs like this:
- Gemini CLI finishes a refactor. Say "type" plus the review note, then "send".
- Say "open Warp" to jump to the window where Codex is running, dictate its next task, then say "send".
- Lean back while both work.
Typing, you babysit one agent at a time. Speaking, you conduct several. That is the real productivity story: thousands of words of prompts a day without touching the keyboard, more agents in flight, more shipped.
What it costs (and what it does not)
Gemini CLI costs nothing. Infina costs $99 one-time as of July 4, 2026, with every 1.x update included and a 7-day no-questions money-back guarantee. No subscription.
Compare that to the subscription dictation apps at $15 a month forever, which still stop at typing text: you press the hotkey, you press Enter, you Cmd-Tab between apps. Infina's base product is raw on-device dictation built for AI prompting, plus the hands-free loop no other dictation app runs.
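The break-even point is simple arithmetic. The sketch below uses only the two prices quoted above, $99 once and $15 a month, and ignores taxes, discounts, and the optional add-on.

```python
import math

ONE_TIME_PRICE = 99        # Infina, one-time (from the article)
MONTHLY_SUBSCRIPTION = 15  # subscription dictation apps (from the article)


def subscription_cost(months: int) -> int:
    """Total paid to a $15/month subscription after `months` months."""
    return MONTHLY_SUBSCRIPTION * months


# First month in which the subscription total meets or exceeds the one-time price.
break_even_months = math.ceil(ONE_TIME_PRICE / MONTHLY_SUBSCRIPTION)

print(f"break-even after {break_even_months} months")
for months in (6, 12, 24):
    print(f"{months:>2} months of subscription: ${subscription_cost(months)}")
```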
If you also want polished dictation for email and docs, the optional cloud add-on ($10/month, cancel anytime, 7-day free trial) adds sharper cloud transcription, LLM-polished cleanup, and multiple languages. That beats the subscription apps at their own game, on top of a license you own.
Full details on pricing.
FAQ
Does Gemini CLI have voice input built in? No. As of July 4, 2026, Gemini CLI takes typed text only; there is no built-in dictation on any platform. You add voice with a system-level dictation tool like Infina, which types into the focused terminal.
How do I dictate prompts to Gemini in the terminal? Focus your Gemini CLI session, hold Option, and speak the prompt; release and press Enter. With Infina's hands-free mode on, skip the keyboard: say a sentence starting with "type", then say "send" to submit it.
Does this work in iTerm and Warp, or only Terminal.app? All of them. Infina types at the OS level into whatever app is focused, so Terminal.app, iTerm, Warp, and the VS Code terminal all behave the same, with no plugin or extension required.
Do I have to speak punctuation for Gemini CLI prompts? No. Gemini understands conversational, unpunctuated speech, so raw dictation is the right default. Speed and fidelity to your intent matter for agent prompts; typographic polish does not.
Is my audio sent to Google or anyone else when I dictate? No. By default Infina transcribes your speech entirely on your Mac (Apple Silicon required); your audio never leaves your device, and dictation works offline. Cloud processing is an optional $10/month add-on, strictly opt-in.
Gemini CLI is free, so why pay $99 for dictation? Because the agent being free makes your typing the last bottleneck. $99 once buys back hours of prompt-typing every week, with no subscription and a 7-day refund if it does not stick.
The bottom line
Gemini CLI made a serious terminal agent free for everyone. It did not make talking to it free: as of July 4, 2026 you still brief it by typing.
Speaking is more than three times faster than typing, and agents do not need polished text. Start with Option-hold dictation into the Gemini CLI prompt, then switch on hands-free and run prompt, send, and switch-app by voice from a few feet back.
That full loop is the thing only Infina does: $99 once, on-device by default, risk-free for 7 days.