George Mandis

Mandis, George

Exploring phonics with OpenAI

June 12th, 2025 • ~500 words • 2 minute read

As a child of the 80s and 90s, I have the vaguest memory of Hooked on Phonics commercials on television. More specifically their catchphrase "hooked on phonics worked for me!" Reconciling this message in the D.A.R.E era felt like mixed-messaging. Do you or don't you want me hooked on things?

Today I was reminded of this musical sign-off—for Hooked on Phonics, not D.A.R.E—after an introductory conversation with an ed-tech organization for an engineering leadership position. They described the product and platform and how it teaches kids to read. "Oh, I'm old enough to remember phonics being a thing that was pushed in schools," I said.

It turns out old ideas never stray far.

An interesting thing that came out of that conversation was noting that the text-to-speech models don't do a great job with the phonetic sounds.

This makes some sense to me. I imagine the way the models are built, they're going to do a better job of producing entire words. Phonemes—distinct units of sound that make up a word—probably don't map well to tokens, even if tokens themselves can map to parts of words.

This isn't a researched take—it's just my intuition at the moment.

I'll have to dig deeper if I stay curious, but if the TTS models are using tokens in a way similar to how LLMs do, I could see how they would map more cleanly to words and partial words than the distinct sounds that necessarily comprise them.

After the call I immediately thought of a way to leverage AI shenanigans to try and prototype a phoneme-to-audio pipeline using OpenAI’s APIs:

Identify distinct phonemes within a word, mapping them to International Phonetic Alphabet (IPA) symbols
Provide them back as a list via structured outputs
Iterate through the list and feed those individual phonemes back into OpenAI's TTS offering to generate the sounds
Stitch it all together in a cute little web interface and see how it does

I hacked together a quick demo to test this out. Behold, my mediocre exploration of phonics via OpenAI:

You can view the source code for my phoneme explorer here on GitHub:

https://github.com/georgemandis/openai-phoneme-exploration

The actual structured output of phonemes, to my extremely untrained eye, didn't seem that bad! It seemed to map to the way words were presented in various Wikipedia articles as I compared them. That's an extremely unrigorous test, but it surpassed my expectations.

The audio however met my expectations, which is to say it was fairly... mediocre. It handled my name with aplomb, which seemed promising at first. A a relatively simple but multi-syllabic word like "continental" was a borderline disaster.

Still, it’s a fun proof of concept—and enough to make me curious about what better phoneme-focused datasets or synthesis models might look like.

If I want to keep exploring this idea I think the next step might be to look for a more canonical reference set of audio samples out there that map to the IPA. Cursory searching didn't lend itself to an obvious source, but I’d be surprised if something like that isn’t out there already.

--

Published on Thursday, June 12th 2025. Read this post as Markdown or plain-text.

If you enjoyed reading this consider signing-up for my newsletter, sharing it on Hacker News or hiring me.