The thrilling conclusion: Mandarin for Beginners is live

As you may recall from several previous posts, I've been working on creating audio recordings of beginner Mandarin lessons for English speakers using text-to-speech (TTS) technology. I used OpenAI's TTS model, which handles both English and Mandarin well, and also tried ElevenLabs, which offers many voice options but struggles with switching between the two languages.

However, neither of these TTS solutions supports Speech Synthesis Markup Language (SSML) or allows me to control the timing of pauses after important words and phrases.

Testing SSML with Amazon Polly

Still searching for TTS that supports bilingual speech and SSML, I went back to the drawing board with Amazon Polly from Amazon Web Services, one of the oldest and best-established TTS services. I started by reviewing documentation for the API and SDK, hoping to integrate it into my homegrown voiceover recording app. However, both the reference code and the request-signing process I would need to manage API access looked like they would take some time to absorb.
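For readers who do want to try the SDK route, here is a minimal sketch in Python using boto3, the AWS SDK for Python, which signs requests on your behalf once credentials are configured. This is not the code from my app. The helper names (polly_request, synthesize_to_file), the "Ruth" voice, and the us-east-1 region are assumptions for illustration, so check which voices and regions your chosen engine supports.

```python
def polly_request(ssml, voice_id="Ruth", engine="long-form"):
    """Build keyword arguments for Polly's synthesize_speech call."""
    return {
        "Text": ssml,
        "TextType": "ssml",      # tell Polly the text contains SSML tags
        "OutputFormat": "mp3",
        "VoiceId": voice_id,     # assumption: a voice the chosen engine supports
        "Engine": engine,
    }


def synthesize_to_file(ssml, path, region="us-east-1"):
    """Send the request and save the MP3. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so polly_request works without boto3 installed

    client = boto3.client("polly", region_name=region)
    response = client.synthesize_speech(**polly_request(ssml))
    with open(path, "wb") as f:
        f.write(response["AudioStream"].read())


# Building the parameters needs no network access or credentials.
params = polly_request(
    '<speak>Hello <break time="1s"/> <lang xml:lang="cmn-CN">你好</lang></speak>'
)
print(params["TextType"], params["Engine"])
```

Calling synthesize_to_file(ssml, "lesson.mp3") would make the actual request, and the file name here is only a placeholder.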

I decided that using the AWS console to test Polly and make recordings would be the less painful option. This required creating a free AWS account and then setting up an IAM user. Once that was done, I was able to access the Polly interface:

Amazon Polly's TTS console. Only accessible with an IAM user ID.

I chose the long-form option, toggled SSML on, and pasted in my script. After an initial test, I tagged the Chinese-language passages with the SSML language tag, which looks like this: <lang xml:lang="cmn-CN">我 (wǒ)</lang>. Polly’s documentation provides a detailed guide to SSML. After running through the recording a few more times and strategically adding pauses, I was relatively pleased with the result. I went ahead and recorded four simple lessons.
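If you would rather generate the tagged script than type the tags by hand, the language tag and the pauses can be assembled with a few small helpers. This is only a sketch of the idea, not how I built my scripts: the function names (lang, pause, build_ssml), the sample sentence, and the 1500 ms pause length are all made up for illustration.

```python
from xml.sax.saxutils import escape


def lang(text, code="cmn-CN"):
    """Wrap text in an SSML <lang> tag so it is read in the given language."""
    return f'<lang xml:lang="{code}">{escape(text)}</lang>'


def pause(ms):
    """Return an SSML <break> tag for a pause of the given length in milliseconds."""
    return f'<break time="{ms}ms"/>'


def build_ssml(parts):
    """Join already-formatted SSML fragments inside a single <speak> root."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = build_ssml([
    escape("The word for I is "),  # plain English text, XML-escaped
    lang("我 (wǒ)"),               # Mandarin passage, tagged as in the example above
    pause(1500),                   # give the student time to repeat the word
    escape("Now you try."),
])
print(ssml)
```

The resulting string can be pasted into the console with SSML switched on, or passed to the SDK's synthesize_speech call with TextType set to "ssml".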

A rough cut of Lesson Three (Social Phrases)

The recordings weren’t perfect. Each included a few awkward pauses and an instance or two of garbled speech that I would need to replace via old-fashioned manual editing, but they were good enough to use in a test of the Mandarin for Beginners GPT.

The perfidy of Code Interpreter

I updated my GPT by adding the Polly-made audio recordings and their transcripts as resource documents. Then I adjusted the instructions for the GPT, explaining how it should interact with students and instructing it to provide links to the PDF and MP3 resource documents. I had Code Interpreter enabled, which supports file uploads and downloads, so this should work, right?

Alas, it worked only about once every four tries. The rest of the time, the GPT would claim it could not provide downloads or would provide links that went nowhere. It was back to the drawing board…again.

Trouble with Code Interpreter
In this "lazy" response, the GPT pretends it doesn't have Code Interpreter.

ChatGPT’s new Read Aloud feature

Then I had an idea. I’d just started seeing a new button in ChatGPT. After a little research, I learned it was the new Read Aloud feature. I tried it with snippets of my Mandarin lessons and…they sounded pretty darn good! The English and the Mandarin both had reasonable accents.

The speaker icon below the text response is the Read Aloud button.

The pacing was not ideal when it read long chunks of text, but I was able to address that by prompting the GPT to share small bits of the lesson, allowing students to digest them and hear pronunciations as many times as they want.

💡
An excerpt from the prompt

Lesson One will focus on Mandarin pronouns. The lesson should be based on the lesson plan in your knowledge base.

Review pronouns one at a time—do not provide a big long list at the beginning—and remind students to use the "Read aloud" button so they can hear word pronunciations. Ask them to let you know when they are ready to continue.

The experience should go like this: 

Introduce one or two pronouns.

Pause and invite the student to play the audio and practice pronunciation.

Ask the student to let you know when they are ready to continue.

A GPT is born

After testing the GPT several more times, I released it into the wild. While it’s unlikely to see much traffic—discovery in the GPT store is incredibly primitive—I’m happy with the final result. Tackling these lessons could be a fun activity for my kids and an opportunity for me to dust off my very limited Mandarin skills from college.

The experiment continues…

ChatGPT - Mandarin for Beginners
Learn basic Mandarin words and phrases, like greetings, questions, social phrases and more.