Is ElevenLabs Worth Subscribing To? A User’s Review
Looking for an AI-generated audio service with text-to-speech and sound effects? Wondering if ElevenLabs is worth the time and money?
This review is not meant to be an exhaustive listing of every feature and how it compares to every other audio AI out there.
Over the last year I’ve subscribed to ElevenLabs several times for different client projects. Here are my impressions as an actual user who needed the product to work for specific tasks. I am not affiliated with ElevenLabs in any way.
The client projects I’ve used ElevenLabs for include creating video ads for businesses. If you’re a business owner or marketer who could use video ads without learning all these tools yourself, see what I’ve produced or get in touch.
Text to Speech
I have the most experience with the “Instant Speech” generator, used for speech chunks of 3,000 characters or less.
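If a script runs longer than that limit, one option is to split it into pieces before pasting it in. Here is a minimal Python sketch, assuming the 3,000-character figure mentioned above and a simple rule of breaking after sentence-ending punctuation. The name `split_script` is a hypothetical helper for illustration, not something ElevenLabs provides.

```python
import re

# Assumption: 3,000 characters is the limit cited in this review for Instant Speech.
MAX_CHARS = 3000


def split_script(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, breaking after sentences."""
    # Split wherever ., ! or ? is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for i, chunk in enumerate(split_script("One. Two! Three?", max_chars=10), 1):
        print(i, chunk)
```

Breaking on sentence boundaries keeps each chunk readable on its own, which matters because every chunk is generated separately.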
There is a large selection of pre-made voices. While some are better than others, and some sound more obviously AI than others, overall there is a lot to work with here.
For the most part, the voices pronounce words correctly and add emphasis or pauses that sound natural. They are way beyond the horrible text-to-speech of the past.
When you select a voice and generate speech, the system produces two outputs at a time, with a little variation between them (there’s no way to change this number). Occasionally one output is perfect on the first attempt, but I found this to be rare. When I want precise emphasis or cadence, even for short paragraphs, it often takes five or ten tries to get something I’m happy with.
Issues with Text to Speech
1. It often cuts the last word short. It doesn’t actually cut the word off, but the clip ends on the last sound of the last word in a way that sounds terse and unnatural. This is hard to hear while generating outputs, but it usually becomes obvious later when assembling a video and listening to the audio in context. As a workaround, I like to add a filler sentence at the end that can be cut out later; it provides a buffer for the parts I want to make sure are not cut off. A short sentence works better than a single word, because one or two stray words placed after a regular sentence are often not read aloud at all.
2. The voices can vary too much between outputs. The variation is a problem because the same voice can sometimes sound like different people. Here’s “Liam” reading the same two words: First take. Second take. This can cause a lot of extra work trying to match outputs from different prompts.
There’s a slider to help control the variability, which can make the outputs a bit more consistent, with the tradeoff of potentially sounding more robotic. This is better than nothing, but ultimately, having to choose between an inconsistent voice and a robotic one is not a good position to be in.
You can insert words in brackets to better indicate how the text should be read. This is somewhat helpful. For instance, here’s a line where I told it to [laugh] at the end. But the system sometimes just reads the words in the brackets out loud, like in this recording where I put an [anxious] direction at the beginning to set the tone.
3. Occasionally there will be background noises or sound effects that can ruin otherwise good outputs. This isn’t too common, though. If I like the output itself, I isolate the voice later using CapCut, which is a simple editing program. (I’ll have a review of it soon.)
4. The user interface to select voices is convoluted. If you’ve used ElevenLabs and have been confused by this part, it’s not just you.
How it should be: Select a voice and press a button to use that voice.
How it is: Find a voice and press a button next to “Select a voice,” but the button takes you back to the previous page. That’s because you need to add the voice to your “voice library” in order to select it. But if your library is full (10 voices on the Starter plan), it won’t let you add the voice. So you now have to leave the page and navigate to the voice library to delete one of the saved voices and free up room. But this takes you away from the voice you’ve found, so now you have to navigate back to the voice options and find it again to add it to the library.
On top of this, the catalogue of voices is cut up into different sections, and it’s unclear if there’s a way to search all the voices at once.
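The filler-sentence workaround from issue 1 can be done by hand, but when preparing many lines it is easy to script. Here is a minimal Python sketch of the idea. The helper `add_filler` and the default filler text are hypothetical, chosen only for illustration; any short sentence you plan to trim out later should do.

```python
# Hypothetical filler text; it gets cut from the audio afterward.
FILLER = "That is all for now."


def add_filler(text: str, filler: str = FILLER) -> str:
    """Append a throwaway sentence so the clip doesn't end on the last real word."""
    text = text.rstrip()
    # Close the real text with punctuation so the filler reads as its own sentence.
    if text and text[-1] not in ".!?":
        text += "."
    return f"{text} {filler}" if text else filler


if __name__ == "__main__":
    print(add_filler("Call us today"))
```

The filler is a full sentence rather than a single word because, as noted in issue 1, one or two stray words after a regular sentence are often skipped.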
The Studio interface allows generated content to be organized in chapters, and is generally better at handling large amounts of text than Instant Speech. I used Studio when a client asked if I could make some audiobooks.
The same voices are available as in Instant Speech, and from a usability standpoint, Studio is relatively easy and straightforward. There are plenty of options to fine-tune the reading, though my client didn’t want me spending a bunch of time on that; I mainly went with the default output.
Issues with Studio Text to Speech
It can struggle with punctuation and pauses. I spent a while in trial and error, adding and removing paragraph breaks and ellipses, to get it to read chapter headings correctly before the body text.
The default reading is more robotic than with Instant Speech for the same voices. Overall, there are fewer quality-of-life issues here, though.
Sound Effects
I’ve found the sound effects generator very useful for adding background audio to videos: things like far-off bird sounds, breathing, electrical shorts, air-conditioner hum, and suburban backyards. When a prompt is entered, the system automatically generates four outputs, and I haven’t found a way to change this number. It is possible, though, to set the output duration anywhere from 0.5 seconds to 22 seconds.
For short clips of basic sounds, I found the outputs to be quite good. Sometimes I had to tweak my prompts and generate multiple batches, but I rarely went away without something I could use.
Issues with Sound Effects
It’s not uncommon for one of the outputs to be silent or very faint. The system also has a hard time with abstract prompts like “transcended enlightenment.”
Probably the worst thing about Sound Effects is that there’s no clear way to delete the output history. This is a problem because if I need to try five times to get the right bird sounds, the history is now forever clogged up with 20 outputs (five batches of four), and it becomes harder and harder to find old but good outputs as time goes on.
There is, apparently, some method to delete the history using API access, though I haven’t attempted it.
Other Features
ElevenLabs has multiple features beyond text to speech and sound effects: voice changer, voice isolator, voice clone, music generator, dubbing, speech to text, and some others. As of writing this, I haven’t used these enough to review them.
ElevenLabs has multiple pricing tiers, with monthly and yearly subscription options. I have used the “Starter” plan at $5 per month and the “Creator” plan at $22 per month. The higher-priced plan gives more of everything and a few options not included in the lower plans.
When I needed to make short voiceovers and some sound effects, the $5 Starter plan was genuinely useful.
I used the Creator plan when making the audiobooks. A higher plan like this is definitely necessary if you want to narrate many thousands of words. While I did run out of credits pretty quickly making audiobooks, I think the Creator plan is a reasonable value for the price.
One thing about ElevenLabs that’s great compared to some AI subscriptions is that it’s fast and easy to unsubscribe whenever you want.
Bottom Line
Even when I am only using a few of the available features, ElevenLabs gives good value for the money. There are some pretty annoying user interface issues, but they don’t outweigh all the quality features.
Overall, ElevenLabs is a useful service and definitely above average in the AI ecosystem.
ElevenLabs is one of several AI tools I use to create video ads for businesses. If you’d rather have someone handle the tools and deliver the finished product, here’s my work and here’s how to reach me.


