AI and I

I decided to use AI to transcribe a short voice memo in Hebrew. Five minutes of recording. You’d think: 21st century, neural networks, the future has arrived.

Naive man.

First, ChatGPT confidently announced that it could work with audio. Then it spent ten minutes trying to do exactly that. Then it confessed, blushing and stammering, that here, as it turned out, it didn’t actually have a speech recognition model. I’m still trying to figure out what was meant by “here.”

After that, the two of us spent another fifteen minutes or so working out how to even install Whisper on a Mac via brew, pipx, ffmpeg, and the other ritual incantations of digital shamanism.

Then Whisper spent about five minutes reading the file. And then another twenty or so slowly spitting out the text in tiny portions, like an ancient dot-matrix printer suffering from a severe case of depression.
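For anyone tempted to repeat the ritual, here is a minimal sketch of what the incantations might look like. It is not a record of what was actually typed that evening. It assumes Homebrew is already installed, that Whisper means the open-source `openai-whisper` package with its `whisper` command-line tool, and that the recording is a hypothetical file called `memo.m4a`. The function names and the choice of the `small` model are illustrative assumptions too.

```python
import shutil
import subprocess
from pathlib import Path

# One-time setup on macOS, run in a terminal (assumes Homebrew is installed):
#   brew install ffmpeg pipx
#   pipx install openai-whisper


def build_whisper_command(audio_path, language="he", model="small", output_dir="."):
    """Return the argument list for the openai-whisper command-line tool.

    "he" is Whisper's language code for Hebrew. The model size is an
    assumption: larger models tend to be more accurate but slower.
    """
    return [
        "whisper", str(Path(audio_path)),
        "--language", language,
        "--model", model,
        "--output_format", "txt",
        "--output_dir", str(Path(output_dir)),
    ]


def transcribe(audio_path, **options):
    """Run Whisper on one file; raise if the CLI is not on PATH."""
    if shutil.which("whisper") is None:
        raise RuntimeError("whisper not found; see the setup comments above")
    subprocess.run(build_whisper_command(audio_path, **options), check=True)
```

Calling `transcribe("memo.m4a")` should leave a plain-text transcript next to the script, provided the setup went through. How long it takes depends on the machine and the model size.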

Total: a five-minute voice memo turned into roughly fifty minutes of my life.

The funniest part is that if I’d just dictated the text straight into Notes or Pages using macOS’s built-in dictation, the whole thing would have happened practically in real time. With errors, yes, but ones I’d have fixed in a couple of minutes, cursing under my breath. (Have you, by the way, ever tried cursing under your breath after forgetting to turn off the microphone? A rare form of entertainment.) Total: seven minutes. By hand I’d have typed it in about twenty.

And in the meantime, as punishment, ChatGPT is drawing caricatures of itself. Flattering everyone, as always: itself, me, all of us.