Nvidia has announced a groundbreaking AI music editor that can produce sounds never heard before, including a trumpet that meows. The tool, named Fugatto, can generate music, sounds, and speech from text and audio inputs it has not previously encountered.
How does Fugatto demonstrate creative capabilities?
A video demonstration shows Fugatto composing music from imaginative prompts, such as a saxophone that howls and barks over electronic music mixed with barking dogs.
What unique sound effects can be generated?
Nvidia provided further examples of distinctive sound effects generated from text descriptions, including deep, rumbling bass pulses punctuated by intermittent, high-pitched digital chirps, evoking a massive sentient machine awakening.
In what ways can voices be modified?
Fugatto can even modify an individual’s voice, allowing for alterations in accent or emotional tone, such as making a voice sound angry or calm. Moreover, it offers music editing features; for instance, it can isolate vocals in a track, incorporate different instruments, and even replace a piano melody with an opera singer’s voice.
What research and training data were used?
A research paper accompanying the announcement details the extensive datasets that Nvidia claims Fugatto was trained on, including a sound effects library sourced from the BBC.
While several AI audio tools exist from companies like Stability AI, OpenAI, Google DeepMind, ElevenLabs, and Adobe, none have claimed the ability to create entirely new and unprecedented sounds. Some AI startups are currently facing copyright lawsuits over their music generation tools, and a recent report indicated that Nvidia and others trained their AI models using subtitles from thousands of YouTube videos.
How was Fugatto developed and what is its future availability?
To develop Fugatto, Nvidia says its researchers compiled a dataset of millions of audio samples. They also designed instructions that significantly broadened the range of tasks the model could perform, improving its accuracy and enabling new capabilities without additional training data. However, Nvidia has not disclosed when, or whether, the tool will be made widely available.