A new open-source AI called OpenVoice offers voice cloning with unprecedented speed and accuracy.
Developed by researchers at MIT, Tsinghua University, and Canadian startup MyShell, OpenVoice uses just seconds of audio to clone a voice and allows granular control over tone, emotion, accent, rhythm, and more.
MyShell unveiled OpenVoice in a post this week, linking to a research paper that has not yet been peer-reviewed, as well as to demo sites on MyShell and HuggingFace where users can try it.
Dual AI models enable instant voice cloning
OpenVoice comprises two AI models working together for text-to-speech conversion and voice tone cloning.
The first model handles language style, accents, emotion, and other speech patterns. It was trained on 30,000 audio samples with varying emotions from English, Chinese, and Japanese speakers. The second “tone converter” model learned from over 300,000 samples encompassing 20,000 voices.
By combining the universal speech model with a user-provided voice sample, OpenVoice can clone voices with very little data. This helps it generate cloned speech significantly faster than alternatives like Meta’s Voicebox.
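The split between the two models can be pictured with a toy sketch. This is not OpenVoice's code or API: every name below (`Style`, `Speech`, `base_speaker_tts`, `extract_timbre`, `tone_converter`, `clone_voice`) is hypothetical, and plain strings stand in for real audio and voice embeddings. It only illustrates the idea that style is set in the first stage and the voice's tone is swapped in the second.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Style:
    # Hypothetical style controls, standing in for emotion, accent, rhythm, etc.
    emotion: str = "neutral"
    accent: str = "american"
    speed: float = 1.0


@dataclass(frozen=True)
class Speech:
    # Toy stand-in for synthesized audio: what is said, how, and in whose voice.
    text: str
    style: Style
    timbre: str


def base_speaker_tts(text: str, style: Style) -> Speech:
    """Stage 1 (assumed): speak the text in a fixed base voice with the requested style."""
    return Speech(text=text, style=style, timbre="base-speaker")


def extract_timbre(reference_clip: str) -> str:
    """Assumed helper: derive a voice 'tone' label from a short reference clip."""
    return f"timbre-of-{reference_clip}"


def tone_converter(speech: Speech, target_timbre: str) -> Speech:
    """Stage 2 (assumed): replace only the timbre, leaving text and style untouched."""
    return replace(speech, timbre=target_timbre)


def clone_voice(text: str, style: Style, reference_clip: str) -> Speech:
    # Chain the two stages: styled base speech first, then the tone swap.
    base = base_speaker_tts(text, style)
    return tone_converter(base, extract_timbre(reference_clip))


result = clone_voice("Hello there", Style(emotion="cheerful"), "alice.wav")
print(result)
```

Because the second stage touches only the timbre field, the same short reference clip could in principle be reused with any style the first stage supports, which is the kind of decoupling the article describes.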
Canadian startup
OpenVoice comes from Canada-based startup MyShell, founded in 2023. With $5.6 million in early funding and over 400,000 users already, MyShell bills itself as a decentralised platform for creating and discovering AI apps.
In addition to pioneering instant voice cloning, MyShell offers original text-based chatbot personalities, meme generators, user-created text RPGs, and more. Some content is locked behind a subscription fee. The company also charges bot creators to promote their bots on its platform.
By open-sourcing its voice cloning capabilities through HuggingFace while monetising its broader app ecosystem, MyShell stands to grow its user base on both fronts while advancing an open model of AI development.
Source: AI News