As a musician, the things I want most from generative AI are:
1. Being able to have the AI fill in a track in the song, but use the whole song as input to figure out what to generate. Ideally for drums this would be a combination of individual drum hits, effects and midi so I'm able to tweak it after generation. If it used the Ableton effects and drum rack then that would be perfect.
2. Take my singing and make it both sound great and like any combination of great singers (e.g. give me a bit of Taylor Swift combined with Cat Power)
I've had a play with the style transfer between singers (bullet point 2 above) but when I last tried it, it was garbage in / garbage out, and my singing is garbage.
What I don't want: To just generate a whole song. Adobe does this style of assistive AI well in the photo editing space but no one seems to have brought it to audio yet.
Logic Pro comes the closest to this with the addition of virtual drummers. You can assign it to follow certain rhythmic timings by connecting it to a main instrument track, and by labeling sections of your song (bridge, chorus, etc) you can regenerate until you find something you're happy with.
It's a far cry from having a real drummer, but it works in a pinch.
Spark amp is now advertising with some interesting AI options. https://au.positivegrid.com/products/spark-2 As far as I understand it can attempt to generate backing drums (and music?) for jamming, which sounds great for practicing.
Fine-tuned on pure rap data to create an AI system specialized in rap generation
Expected capabilities include AI rap battles and narrative expression through rap
Rap has exceptional storytelling and expressive capabilities, offering extraordinary application potential
Using a certain other music generator I got it to accidentally say ***. It said it with a Latino American accent too.
In fact, for whatever reason, this tool couldn’t use a typical AAVE voice. Just Sage Francis / Atmosphere-like dictionary raps and a few Latino American ones.
A big limitation of AI slop is that it tries not to offend anyone.
> Art that can’t even try to offend is barely art.
Good art is secondary to avoiding some journalist writing a hit piece about how they used your company’s AI generator to depict Hitler saying the gamer word.
Music, in any form, is definitely art. Instrumentals can convey emotions and stories just as powerfully as lyrics — sometimes even more. It’s all about what it makes you feel.
The Rite of Spring (Stravinsky) has entered the chat!
And if that's not offensive enough, Music in Similar Motion by Glass, or Metal Machine Music by Lou Reed or any of Glenn Branca's "guitar symphonies" will likely do the job for most people.
Yes, please ruin music. Ruin everything you can. As long as you can build it, you should ruin it. There's really no limit. It's the masses who will actually do the ruining, so those building the technology are totally blameless. And you might even make some money, so it's all worth it.
These music/art AI threads are always gross. Make things for the joy of creating. Prompting some AI model doesn’t make you a musician - it actively robs you of the thing that’s rewarding about actually creating something.
People said this about photography. It ruined painting, and it did in fact put a lot of portrait painters out of business because in that era the reason you hired them was not for art. It was for a photograph made with brushes.
This is a good read on photography and art. Note that the rhetoric sounds almost identical to today's AI rhetoric.
“If photography is allowed to supplement art in some of its functions, it will soon supplant or corrupt it altogether, thanks to the stupidity of the multitude which is its natural ally.” - Charles Baudelaire
I don't think AI is a threat to actual art at all. If I want art, I explicitly do not want slop churned out of a model. I want something created by a human being to communicate something. That's the entire point.
Some around here will argue that there have been double blind tests showing that people sometimes can't tell the difference between AI output and human art. That is missing the point too. Knowing who the artist is is part of the artistic experience. If someone deepfake calls you with a model imitating your friend, is it the same as talking to your friend? The parasociality of art is part of it.
It may -- as photography did with portraiture -- be a threat to some of the ways that artists make a living, and I do understand the pushback from that. Back before photography a lot of painters made a living being cameras, and all that work dried up pretty fast. Today AI is replacing all the "filler" churned out by artists. The only silver lining I guess is that artists generally hate this work and it never paid well.
Another thing I expect to happen is actual AI art. I don't think this has happened yet. There has not yet been an AI equivalent of the Pictorialists.
AI art is art if the AI is used by a human being as a tool to communicate what art communicates, to do what art does. Art, I guess, does many things. It entertains, informs in a way, but also communicates matters of the emotional and spiritual aspect of human existence -- of consciousness -- that can't be communicated well in other ways. If someone uses AI as a tool to do that, it's art.
A lot of what we see today coming out of AI models is I think correctly called "slop" because it is not that. There's no artistic intention or craft behind it.
BTW I'd argue that "slop" exists in the realm of music, literature, painting, and other arts made with traditional methods too. For centuries there's been low-effort smutty pulp fiction, crappy imitative pop music, and gimmicky low-effort painting. Those things are slop made with lower-tech tools. A pretentious gimmicky painting where someone threw some paint at a wall has less artistic merit than a photograph that someone composed with care to communicate something.
I'm not bashing people for being skeptical of AI and worrying about its effects on the arts. There are, like I said, very valid points, especially about the ability of artists to make a living and the effect AI "slop" can have on the population. I just think we have been here before, many times.
A tool existing doesn't ruin anything. Genuinely stop being so dramatic and learn to ignore things you don't like. Our society would be a lot better and calmer if people did that rather than start pointless crusades.
> aggressive, Heavy Riffs, Blast Beats, Satanic Black Metal
Result: A generic pop-rock song without riffs or blast beats. Not even power metal or corset core, let alone anything even slightly resembling Black Metal.
Interesting how there is no mention of how the training data for this was collected. This does sound quite a bit better than Meta's MusicGen, but then again that model was also trained on a small licensed dataset.
I want to play something on my keyboard (the only instrument I am slightly OK at) and then be able to tell it to play it with a saxophone and describe exactly how I want it played. I don’t need an AI to create a song for me; I need 100 session musicians at my disposal to create the song I want. I am very excited about having that type of AI.
Man, this whole topic hits way harder than I expected. AI taking shots at music creation makes me feel a bit hyped but also kinda iffy, especially when I hear people say it plays it way too safe. You think keeping things safe in AI art helps anyone actually level up or just holds us all back?
If the “Guns in butts” / “my wife is a jar of dirt” songs are any indication, I don’t know how “safe” it will be, at least as far as content goes.
I’m similarly hyped and iffy. If you could have a model that listens to a looping segment to contextualize it, and then play with other patterns on top but through a more expressive way (or even humming/singing and allowing the LLM of sorts to compose it together), that could be interesting. Would it be panned for being AI-assisted? I’d hope not, I think?
“I hear you're buying a synthesizer and an arpeggiator and are throwing your computer out the window because you want to make something real. You want to make a Yaz record. I hear that you and your band have sold your guitars and bought turntables. I hear that you and your band have sold your turntables and bought guitars.”
We just add AI in whatever forms it takes to the list I suppose.
Really interesting — we're seeing more efforts now to bring the "foundation model" approach to creative domains like music, but I wonder how well these models can internalize musical structure over long time scales. Has anyone here compared ACE-Step to something like MusicGen or Riffusion in terms of coherence across entire compositions?
The diagram is super vague. How are the lyrics encoded? What does the encoder look like inside? What is the input size, input format, output size, output format? Are the three encoder outputs added? Concatenated? When Mert and m-Hubert are combined, are they added? Multiplied? Subtracted? Concatenated?
Shall it continue in an unholy loop until the end of time?
It can challenge the old standards, it can push genres into new places.
AI music can’t. It’s too safe.
Or anything 4/4 in some areas at a certain time.
Pat Metheny - Zero Tolerance for Silence
https://suno.com/s/5bXmu47Iv1o0xR9U
https://suno.com/s/4pWqmP1zP97LsK8k
https://suno.com/s/ETXPsm6eYwMoiphR
The main genre of music I listen to is electronic.
Many electronic songs are written to evoke a specific feeling, without meaningful lyrics.
When I produce electronic music, I have no particular lyrics or composition in mind.
I just fiddle around with different sound layers going for a particular "vibe" and mash things together until they sound good to me.
I see AI as another tool to expedite this process.
This is a good read on photography and art. Note that the rhetoric sounds almost identical to today's AI rhetoric.
“If photography is allowed to supplement art in some of its functions, it will soon supplant or corrupt it altogether, thanks to the stupidity of the multitude which is its natural ally.” - Charles Baudelaire
https://medium.com/@aaronhertzmann/how-photography-became-an...
Edit: people said this about writing too!
https://www.anthologialitt.com/post/the-god-thoth-and-the-in...
Yup. Still doing what I expect from AI music.
Puh tss puh tss Kshhh ff kshhh ff Bmm tkk bmm tkk Drr-dr-d-d-dt
I really wish people could make better diagrams.
- https://huggingface.co/spaces/ACE-Step/ACE-Step