Exploring phonics with OpenAI
• ~500 words • 2 minute read
As a child of the 80s and 90s, I have the vaguest memory of Hooked on Phonics commercials on television. More specifically their catchphrase "hooked on phonics worked for me!" Reconciling this message in the D.A.R.E era felt like mixed-messaging. Do you or don't you want me hooked on things?
Today I was reminded of this musical sign-off—for Hooked on Phonics, not D.A.R.E—after an introductory conversation with an ed-tech organization for an engineering leadership position. They described the product and platform and how it teaches kids to read. "Oh, I'm old enough to remember phonics being a thing that was pushed in schools," I said.
It turns out old ideas never stray far.
An interesting thing that came out of that conversation was noting that the text-to-speech models don't do a great job with the phonetic sounds.
This makes some sense to me. I imagine the way the models are built, they're going to do a better job of producing entire words. Phonemes—distinct units of sound that make-up a word—probably don't map well to tokens, even if tokens themselves can map to parts of words.
This isn't a researched take—it's just my intuition at the moment.
I'll have to dig deeper if I'm curious, but if the TTS models are using tokens in a way similar to how LLMs do, I could see how they would map more cleanly to words and partial words than the distinct sounds that necessarily comprise them.
After the call I immediately thought of a way to leverage AI shenanigans to try and create a tool that would:
- Identify distinct phonemes within a word, mapping them to International Phonetic Alphabet (IPA) symbols
- Provide them back as a list via structured outputs
- Iterate through the list and feed those individual phonemes back into OpenAI's TTS offering to generate the sounds
- Stitch it all together in a cute little web interface and see how it does
I hacked together a quick demo to test this out. Behold, my mediocre exploration of phonics via OpenAI:
You can view the source code for my phoneme explorer here on GitHub:
The actual structured output of phonemes, to my extremely untrained eye, didn't seem that bad! It seemed to map to the way words were presented in various Wikipedia articles as I compared them. That's an extremely unrigorous test, but it surpassed my expectations.
The audio however met my expectations, which is to say it was fairly... mediocre. It handled my name with aplomb, which seemed promising at first. A a relatively simple but multi-syllabic word like "continental" was a borderline disaster.
If I want to keep exploring this idea I think the next step might be to look for a more canonical reference set of audio samples out there that map to the IPA. Cursory searching didn't lend itself to an obvious source, but I'll be shocked if something like that doesn't exist.
--Published on Thursday, June 12th 2025. Read this post in Markdown or plain-text.