ElevenLabs just got nuked by open source
How To Clone Voices With AI For Free.
In the sea of Clawdbot posts, Qwen quietly dropped something that could sink ElevenLabs’ market share.
Qwen, Alibaba’s AI lab, has consistently been a thorn in the side of North American tech companies. They just dropped Qwen3-TTS, a multilingual text-to-speech suite that clones voices from as little as 3 seconds of reference audio, creates entirely new voices from text descriptions, and runs locally without API keys.
If you've been assuming high-quality voice cloning is "mostly an ElevenLabs thing," that assumption is now outdated.
We’ve crossed into a world where a free(ish), downloadable, run-it-yourself model can generate speech that’s good enough, and “good enough” is the part that should make you sit up straight.
Here’s what “good enough” sounds like:
*This is a synthetically generated AI voice clone of Donald Trump created for educational purposes to demonstrate how accessible voice cloning has become.
It’s This Easy To Set Up.
Here’s the exact process:
Go to https://huggingface.co/spaces/Qwen/Qwen3-TTS in any browser. You’ll see a web interface with three tabs at the top.
Click the one labeled “Voice Clone (Base).”
Step 1: Provide the Reference
First, you need to teach the model what the voice sounds like.
Upload or Record: Provide 10-15 seconds of clear speech.
Quality Control: Ensure there is no background noise (like fans or AC) and no mumbling.
Transcription: In the Reference Text field, type the exact words spoken in that audio clip. This allows the model to map specific sounds (phonemes) to the voice’s unique characteristics.
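If you want to sanity-check a clip before uploading it, here is a minimal Python sketch that measures a WAV file against the 10-15 second guidance above. It assumes an uncompressed WAV file and uses only the standard library. The function names and the hard cutoffs are illustrative assumptions, not something the Qwen demo enforces.

```python
import wave

# Bounds taken from the 10-15 second guidance above.
# Assumption: these are this article's suggestion, not a limit enforced by the model.
MIN_SECONDS = 10.0
MAX_SECONDS = 15.0


def clip_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(path):
    """Return (ok, message) saying whether the clip fits the suggested length."""
    duration = clip_duration_seconds(path)
    if duration < MIN_SECONDS:
        return False, f"too short: {duration:.1f}s"
    if duration > MAX_SECONDS:
        return False, f"too long: {duration:.1f}s"
    return True, f"ok: {duration:.1f}s"
```

For a 12-second clip, check_reference_clip returns (True, "ok: 12.0s"). It only checks length, so you still have to listen for background noise and mumbling yourself.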
Step 2: Input the Target Text
Now, tell the model what you want the voice to say.
The Script: Type your new sentence into the Target Text field.
Flexibility: This can be anything, even phrases the original speaker has never said. The model will apply the reference voice’s DNA to this new text.
Step 3: Configure and Generate
Finalize the technical settings to get the best output.
Select Model: Choose the 1.7B model for the highest quality (unless you’re in a massive rush, in which case 0.6B is faster).
Execute: Click “Clone & Generate.”
Result: Wait roughly 20 seconds for the audio player to populate with your cloned speech.
A Side-by-Side Comparison:
How to Protect Yourself Now
Now that you’ve seen how easy voice cloning has become, the question is: what do you do about it?
You can’t stop people from downloading open-source models, but you can reduce how easily your voice can be used against you.
Start by:
Upgrading your verification norms. If you run a business or manage money, require secondary confirmation for transfers and urgent requests.
That means a text plus a callback to a known number, or internal ticketing that creates a paper trail. Never act on an incoming call alone for anything involving money or credentials. The person on the other end might sound exactly like your business partner, but voice alone isn’t proof anymore.
Establishing a “voice isn’t proof” policy. Make it explicit with your team, family, and clients that voice alone is not authentication.
If it involves money, credentials, or urgency, you verify via a second channel. Period. Train people to expect this verification step and normalize the practice. The goal is to make double-checking the default, not the exception.
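If your team tracks requests in any kind of tool, the rule above is simple enough to write down as logic. Here is a minimal Python sketch of it: anything involving money, credentials, or urgency is blocked until a second channel confirms it. The class, field names, and channel labels are all hypothetical, so treat this as an illustration of the policy rather than a finished security control.

```python
from dataclasses import dataclass

# Hypothetical labels for the second channels mentioned above:
# a call-back to a known number, or an internal ticket with a paper trail.
TRUSTED_SECOND_CHANNELS = {"callback_known_number", "internal_ticket"}


@dataclass
class Request:
    channel: str                       # how the request arrived, e.g. "voice"
    involves_money: bool = False
    involves_credentials: bool = False
    urgent: bool = False
    confirmed_via: tuple = ()          # second channels that confirmed it


def needs_second_channel(req):
    """Money, credentials, or urgency always trigger verification."""
    return req.involves_money or req.involves_credentials or req.urgent


def may_proceed(req):
    """Allow a sensitive request only after a trusted second channel confirms it."""
    if not needs_second_channel(req):
        return True
    return any(c in TRUSTED_SECOND_CHANNELS for c in req.confirmed_via)
```

Note that the incoming voice call never counts as its own confirmation: a request that arrives by voice and asks for a transfer stays blocked until someone calls back on a known number or files the ticket.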
The defensive window is short.
The sooner you implement these norms, the safer you’ll be.
Thank you for reading,
Max






The speed at which voice authentication became unreliable is kinda scary. Qwen's 3-second clone capability basically renders voice-based verification obsolete before most orgs have even updated their security protocols. That defensive window framing hits different when security infrastructure always lags behind capability rollouts.