HeartTranscriptor audio model

Name: HeartTranscriptor
Also Known As: Heart Transcriptor, HeartTranscriptor-oss
Licence: Apache License 2.0
Creator: HeartMuLa Team

HeartTranscriptor was released in January 2026. It is part of the HeartMuLa family and handles audio-to-text transcription.

Give it an audio clip and it returns the words. It works on song lyrics and on plain speech, so calling it only a lyrics transcriber undersells it: it handles regular talking just fine.

You don’t need a powerful rig. It runs with 6–8 GB of VRAM, and it even works on CPU if you don’t mind it being much slower.

You use it in ComfyUI. Drag the HeartTranscriptor node into your workflow, connect an audio input to it, and connect its text output to a display node to show the transcribed words.
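The same three-node chain (load audio, transcribe, show text) can also be queued from a script through ComfyUI's HTTP API. The sketch below builds the graph in ComfyUI's API workflow format and posts it to a local server's /prompt endpoint. Some details are assumptions. Only LoadAudio is a core ComfyUI node. The class names "HeartTranscriptor" and "ShowText", and their "audio" and "text" input names, are hypothetical placeholders. To get the real names, export your own workflow in API format and copy them from there.

```python
import json
import urllib.request

# ASSUMPTION: only "LoadAudio" is a ComfyUI core node. The default class names
# for the transcriber and the text display node, and their input names, are
# hypothetical placeholders; copy the real ones from an API-format export.
def build_transcription_workflow(
    audio_file: str,
    transcriber_class: str = "HeartTranscriptor",  # hypothetical class_type
    display_class: str = "ShowText",                # hypothetical class_type
) -> dict:
    """Build an API-format graph: load audio -> transcribe -> display text."""
    return {
        # LoadAudio reads a file from ComfyUI's input/ folder.
        "1": {"class_type": "LoadAudio", "inputs": {"audio": audio_file}},
        # A link is [source_node_id, output_index]; "audio" is an assumed name.
        "2": {"class_type": transcriber_class, "inputs": {"audio": ["1", 0]}},
        # "text" is an assumed input name on the display node.
        "3": {"class_type": display_class, "inputs": {"text": ["2", 0]}},
    }


def queue_workflow(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the graph to a running ComfyUI server's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{server}/prompt", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With ComfyUI running locally, `queue_workflow(build_transcription_workflow("song.wav"))` would queue the job. The transcription then appears in the display node in the UI.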

It does well with clear audio. Singing can trip it up sometimes: when a singer hits a high note it may mishear words, for example transcribing “VRAM” as “VROM.” It is more reliable on normal speech.

It is easy to plug into an existing setup: just replace your old audio-to-text node with this one. It also has fewer settings and a cleaner layout.

Key Features

Audio-to-Text

No performance evaluations available for this model yet.

No sample outputs available for this model yet.

Where To Find HeartTranscriptor

If you'd like to access this model, you can explore the following possibilities:

Weights (Apache License 2.0)

Other Models by HeartMuLa Team

HeartMuLa

Useful Links

HeartMula Transcript Usage Example

Tutorial: How to use this node in ComfyUI

Added on: January 22, 2026