ACE-Step

ACE-Step is a free AI tool that turns text into full songs fast. Open-source and ready for remixing.

Overview

ACE-Step is an AI model that makes full-length tracks in under half a minute. You type a prompt and, boom, 4 minutes of music comes back in about 20 seconds (if you're lucky enough to be using something like an A100 GPU). No more waiting around or getting half-finished loops. With consumer-grade GPUs you have to be more patient.

Hardware Performance

Device        27 Steps   60 Steps
NVIDIA A100   27.27x     12.27x
RTX 4090      34.48x     15.63x
RTX 3090      12.76x     6.48x
M2 Max        2.27x      1.03x

Values shown are RTF (Real-Time Factor), meaning seconds of audio generated per second of compute. Higher values indicate faster generation.
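To make the table easier to read, here is a rough sketch that turns an RTF value into an estimated wait time. It assumes RTF is simply audio duration divided by generation time, which fits the "higher is faster" note. The RTF numbers are copied from the table, while the function name and the 4-minute track length are only illustrative choices.

```python
# RTF values copied from the Hardware Performance table.
# Keys of the inner dicts are the step counts (27 or 60).
RTF = {
    "NVIDIA A100": {27: 27.27, 60: 12.27},
    "RTX 4090": {27: 34.48, 60: 15.63},
    "RTX 3090": {27: 12.76, 60: 6.48},
    "M2 Max": {27: 2.27, 60: 1.03},
}


def generation_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate wall-clock generation time.

    Assumption: RTF = audio duration / generation time,
    so generation time = audio duration / RTF.
    """
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds / rtf


if __name__ == "__main__":
    track_seconds = 4 * 60  # a 4-minute song
    for device, by_steps in RTF.items():
        for steps, rtf in by_steps.items():
            wait = generation_seconds(track_seconds, rtf)
            print(f"{device:12s} {steps} steps: {wait:6.1f} s")
```

Under that assumption, a 4-minute track on an A100 at 60 steps works out to roughly 19.6 seconds, which lines up with the "20 seconds" figure quoted in the overview. Real timings will vary with drivers, precision settings, and whatever else the machine is doing.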

This thing runs on diffusion tech like what you’d see in image AI and pairs it with a compression autoencoder from Sana and a lightweight transformer. The combo keeps things quick without killing quality.

You can control how long the song is and mess with the melody, rhythm, and harmony.

The model supports various training-free applications:

  • retake: regenerate a variation of the same song.
  • repaint: regenerate a specific part of the song.
  • edit: modify the lyrics of the song.

Got an old track? You can tweak it, remix it or use this thing as a base to build voice cloning tools or music apps. It's all open-source under Apache 2.0 so devs can do whatever they want with it.

It's not perfect. Go too long and the structure can get a bit mushy. Weird instruments or styles like Chinese rap? Might sound off. Output quality also changes based on the seed so it's a bit of a lucky draw.

🔔 Important Notice: The only official website for the ACE-Step project is their GitHub Pages site. They do not currently operate any other websites.

Supported Languages

  • Arabic
  • Chinese
  • Czech
  • Dutch
  • English
  • French
  • German
  • Hindi
  • Hungarian
  • Italian
  • Japanese
  • Korean
  • Polish
  • Portuguese
  • Russian
  • Spanish
  • Turkish

Tags

Freeware, Apache License 2.0, PC-based, #Voice & Audio

Educators and Trainers, Creative Professionals, Content Creators, Media and Film Makers, Marketing and Branding Specialists, Voice and Audio Professionals, Developers and Tech Creators, Nonprofit and Advocacy Creators, Small Business Owners, Entertainment and Performance Artists, Professional Content Creators

This tool is free to use when installed locally and is offered under Apache License 2.0.

Plan Name   Tier Type
Free        free

Most folks reacting to ACE-Step are surprised at how solid this thing is for being open-source and only 3.5B params. It’s quick, fun, and decent for casual use. That said, nobody’s calling it pro-level yet.

What people like

Speed. It’s super fast. Even users on RTX 3090 cards are cranking out full songs in under 30 seconds.
Free and open. No paywall, no hidden terms, just download and go.
Simple pop tracks work best. The model nails basic catchy songs.
Random is fun. That “gacha-style” randomness? Makes it a blast to play with different prompts and seeds.

What’s not quite working yet

Vocals feel robotic. Voices sound flat and sometimes distorted.
Results vary. Some prompts work great, others flop hard.
Genre blind spots. Death metal and fast rap? It struggles.
Lyrics don’t always sync. Especially with languages like German or Japanese, the model taps out early.
Control could be better. Users want more say over intros, outros, volume, and overall flow.

User tips and notes

ComfyUI users report more compression artifacts than the site demo.
Tags like [JP] or [RU] can help guide lyric languages.
Length settings are a bit weird. Some users got longer songs than expected without realizing why.

ACE-Step is a fun, fast model for anyone dabbling in AI music. It’s not polished, but it gets the job done if you're into tinkering, remixing, or just messing with music prompts. [Reddit: 1, 2]

Useful Links
Ace-Step Audio Model Native Support in ComfyUI

Tutorial

ComfyUI now supports Ace-Step natively. This documentation explains how to start running this model.

This page was last updated on May 13, 2025 at 8:36 PM