URL: https://myshell-tts.vercel.app/

MyShell TTS
 * Home
 * Tone Color Cloning
 * Voice Style Control
 * Zero-Shot Cross-Lingual Cloning
 * Comparison with SOTA

OPENVOICE: VERSATILE INSTANT VOICE CLONING

We introduce OpenVoice, a versatile instant voice cloning approach that requires
only a short audio clip from the reference speaker to replicate their voice and
generate speech in multiple languages. OpenVoice enables granular control over
voice styles, including emotion, accent, rhythm, pauses, and intonation, in
addition to replicating the tone color of the reference speaker. It also
achieves zero-shot cross-lingual voice cloning for languages not included in the
massive-speaker training set. OpenVoice is computationally efficient as well,
costing tens of times less than commercially available APIs that deliver
inferior performance. The technical report
(https://arxiv.org/pdf/2312.01479.pdf) and the source code
(https://github.com/myshell-ai/OpenVoice) are publicly available.


ACCURATE TONE COLOR CLONING

OpenVoice can accurately clone the reference tone color and generate speech in
multiple languages and accents.

Audio examples (two sets):
 * Reference
 * Generated
 * Generated


FLEXIBLE VOICE STYLE CONTROL

OpenVoice enables granular control over voice styles, such as emotion and
accent, as well as other style parameters including rhythm, pauses, and
intonation. Here we demonstrate the control over emotion and accent of the
generated voice.

Audio examples:
 * Reference
 * Generated - Sad
 * Generated - Happy
 * Generated - Indian Accent
 * Generated - British Accent
 * Generated - Australian Accent


ZERO-SHOT CROSS-LINGUAL VOICE CLONING

The reference voice and the generated voice can be in languages outside the
massive-speaker multi-lingual dataset. We use “U” to denote the unseen languages
in the following examples.

Audio examples:
 * Reference - English
 * Generated - Mixed Lingual (U)
 * Generated - Japanese
 * Generated - Spanish (U)
 * Generated - German (U)
 * Generated - Russian (U)


COMPARISON WITH THE STATE OF THE ART

Audio examples (two sets):
 * Reference
 * Generated - XTTS-v2
 * Generated - Valle-X
 * Generated - OpenVoice