RSANS Lyric Alignment & Visualization Engine
A headless CLI tool that turns audio and plain-text lyrics into a karaoke-style lyric video, entirely offline. It provides frame-accurate word timing, automatic rhyme detection, and ASS subtitles rendered into the video via FFmpeg.
RSANS tokenizes the lyrics, runs the audio through an on-device Whisper model for word-level timing, and aligns the lyric tokens against the model's tokens with Needleman-Wunsch dynamic programming, so a single dropped word doesn't cascade into misalignment downstream.
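As a rough illustration of that alignment step, the sketch below runs Needleman-Wunsch over two token lists and returns, for each lyric token, the index of the model token it lines up with. The name alignTokens, the scoring constants, and the choice to keep substituted (misheard) tokens as matches are assumptions made for this example, not RSANS's actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scores; the values the real Aligner uses are not documented here.
constexpr int kMatch = 2;
constexpr int kMismatch = -1;
constexpr int kGap = -1;

// For each lyric token, returns the index of the aligned model token,
// or -1 when the lyric token was aligned to a gap (the model dropped it).
std::vector<int> alignTokens(const std::vector<std::string>& lyrics,
                             const std::vector<std::string>& heard) {
    const std::size_t n = lyrics.size();
    const std::size_t m = heard.size();

    // score[i][j] = best score aligning the first i lyric tokens
    // with the first j model tokens.
    std::vector<std::vector<int>> score(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 1; i <= n; ++i) score[i][0] = static_cast<int>(i) * kGap;
    for (std::size_t j = 1; j <= m; ++j) score[0][j] = static_cast<int>(j) * kGap;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int sub = (lyrics[i - 1] == heard[j - 1]) ? kMatch : kMismatch;
            const int diag = score[i - 1][j - 1] + sub;  // pair the two tokens
            const int up = score[i - 1][j] + kGap;       // lyric token has no partner
            const int left = score[i][j - 1] + kGap;     // model token has no partner
            score[i][j] = std::max({diag, up, left});
        }
    }

    // Trace back from the bottom-right corner to recover the pairing.
    std::vector<int> mapping(n, -1);
    std::size_t i = n;
    std::size_t j = m;
    while (i > 0 && j > 0) {
        const int sub = (lyrics[i - 1] == heard[j - 1]) ? kMatch : kMismatch;
        if (score[i][j] == score[i - 1][j - 1] + sub) {
            mapping[i - 1] = static_cast<int>(j - 1);
            --i;
            --j;
        } else if (score[i][j] == score[i - 1][j] + kGap) {
            --i;  // lyric token skipped by the model
        } else {
            --j;  // extra model token, ignored
        }
    }
    return mapping;
}
```

With lyrics {"the", "quick", "brown", "fox"} and model tokens {"the", "brown", "fox"}, this returns {0, -1, 1, 2}: only "quick" is left without timing, and the words after it still pair correctly.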
- Frame-accurate lyric sync via an on-device whisper.cpp model
- Union-find rhyme grouping with automatic shared color assignment
- libass integration for precise glyph measurement and subtitle rendering
- JSON-first composable pipeline: any stage can be re-run independently
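To make the union-find rhyme grouping concrete, here is a minimal sketch. It assumes rhyme detection has already produced pairs of rhyming word indices; how RSANS detects rhymes is not covered here. DisjointSet, assignRhymeColors, and the convention that -1 means "no color" are hypothetical names and choices for this example.

```cpp
#include <cstddef>
#include <numeric>
#include <unordered_map>
#include <utility>
#include <vector>

// Union-find with path halving and union by size.
class DisjointSet {
public:
    explicit DisjointSet(std::size_t n) : parent_(n), size_(n, 1) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }

    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }

    void unite(std::size_t a, std::size_t b) {
        a = find(a);
        b = find(b);
        if (a == b) return;
        if (size_[a] < size_[b]) std::swap(a, b);
        parent_[b] = a;  // attach the smaller tree under the larger one
        size_[a] += size_[b];
    }

    std::size_t groupSize(std::size_t x) { return size_[find(x)]; }

private:
    std::vector<std::size_t> parent_;
    std::vector<std::size_t> size_;
};

// Returns one color index per word. Words in the same rhyme group share an
// index; words that rhyme with nothing get -1.
std::vector<int> assignRhymeColors(
    std::size_t wordCount,
    const std::vector<std::pair<std::size_t, std::size_t>>& rhymingPairs) {
    DisjointSet sets(wordCount);
    for (const auto& p : rhymingPairs) sets.unite(p.first, p.second);

    std::unordered_map<std::size_t, int> colorOfRoot;
    std::vector<int> colors(wordCount, -1);
    for (std::size_t w = 0; w < wordCount; ++w) {
        if (sets.groupSize(w) < 2) continue;  // singleton: leave uncolored
        const std::size_t root = sets.find(w);
        auto it = colorOfRoot.find(root);
        if (it == colorOfRoot.end()) {
            const int next = static_cast<int>(colorOfRoot.size());
            it = colorOfRoot.emplace(root, next).first;
        }
        colors[w] = it->second;
    }
    return colors;
}
```

For six words with rhyming pairs (0,2), (2,4), and (1,5), the result is {0, 1, 0, -1, 0, 1}. Words 0 and 4 were never compared directly but share a color because both rhyme with word 2, which is the property union-find provides here.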
A CLI dispatcher routes four commands (analyze, rhyme, export, and full), each backed by an isolated C++ module. Whisper handles transcription, the Aligner runs Needleman-Wunsch over the lyric tokens and the model tokens, RhymeGrouper applies union-find to group rhyming words, and the ASS renderer initializes libass to measure true glyph height before generating subtitle events. FFmpeg then encodes the final MP4 using a -vf subtitles= filter.
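One way to picture the dispatcher is a table from command name to handler, as in the sketch below. The handler names, the stub bodies, the exit code for unknown commands, and the assumption that full simply runs the other three stages in order are all illustrative guesses rather than RSANS's real code.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

using Args = std::vector<std::string>;
using Handler = std::function<int(const Args&)>;

// Hypothetical exit code for an unrecognized command.
constexpr int kUnknownCommand = 64;

// Placeholder stages; in a real build each would call into its own module.
int runAnalyze(const Args&) { return 0; }
int runRhyme(const Args&) { return 0; }
int runExport(const Args&) { return 0; }

// Assumed behavior: run every stage in order, stopping at the first failure.
int runFull(const Args& args) {
    if (int rc = runAnalyze(args)) return rc;
    if (int rc = runRhyme(args)) return rc;
    return runExport(args);
}

// Looks up the command and forwards the remaining arguments to its handler.
int dispatch(const std::string& command, const Args& args) {
    static const std::map<std::string, Handler> table = {
        {"analyze", runAnalyze},
        {"rhyme", runRhyme},
        {"export", runExport},
        {"full", runFull},
    };
    const auto it = table.find(command);
    if (it == table.end()) return kUnknownCommand;
    return it->second(args);
}
```

Keeping each handler behind the same signature means a stage can be invoked alone or chained by full without the dispatcher knowing anything about its internals.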