I wanted to start a discussion to understand how researchers or app developers are wrapping Whisper to generate Closed Captioning (CC) and Subtitles for the Deaf and Hard of Hearing (SDH), since I imagine that accessibility, as well as transcription, is a common use case.

Whisper generates SRT & WebVTT transcripts by default, producing pop-on subtitles. I see there are active discussions and potential improvements around timing accuracy and timing offset in the repo, and some folks are pulling word-by-word timestamps (#3, reply in thread), which I assume is for paint-on captions.

Obviously, there are limits to how well an AI transcription module can address the accessibility requirements of CC and SDH, but accessibility goes beyond pure transcription. The NIDCD also includes guidelines for both quality and accessibility.

Can anyone share any lessons learned or research projects that could be used to generate broadcast-grade captions with Whisper? By broadcast-grade, I mean not just SRT or ASS transcription or translation for home anime consumption, but captions that aim to adhere to the transcription components of Title 47, which describes the US FCC's guidelines for accuracy, synchronicity, and completeness, and to DCMP-quality captions or subtitles, where the DCMP includes recommendations for markup, presentation rate, and time-on-screen with a focus on educational use.
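To make the "broadcast-grade" idea concrete, here is a minimal sketch of a post-processing lint pass over caption cues, checking two of the DCMP-style properties mentioned above: presentation rate and time-on-screen. The threshold values (`MAX_CHARS_PER_SECOND`, `MIN_DURATION_SECONDS`) are assumptions for illustration only; real limits should come from the DCMP/FCC guidelines themselves, and the `Cue` type is a hypothetical stand-in for whatever structure your Whisper wrapper emits.

```python
from dataclasses import dataclass

# Assumed thresholds for illustration -- substitute the actual
# DCMP/FCC presentation-rate and duration recommendations.
MAX_CHARS_PER_SECOND = 17.0   # presentation-rate ceiling (assumed)
MIN_DURATION_SECONDS = 1.0    # minimum time-on-screen (assumed)

@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str

def check_cue(cue: Cue) -> list[str]:
    """Return human-readable rule violations for a single cue."""
    problems = []
    duration = cue.end - cue.start
    if duration < MIN_DURATION_SECONDS:
        problems.append(f"on screen only {duration:.2f}s")
    # Characters per second over the cue's visible text.
    cps = len(cue.text.replace("\n", " ")) / max(duration, 1e-6)
    if cps > MAX_CHARS_PER_SECOND:
        problems.append(f"presentation rate {cps:.1f} chars/s too high")
    return problems

cues = [
    Cue(0.0, 2.5, "Captions should be easy to read."),
    Cue(2.5, 3.0, "This cue flashes by far too quickly to read comfortably."),
]
for i, cue in enumerate(cues):
    for problem in check_cue(cue):
        print(f"cue {i}: {problem}")
```

A pass like this could run on Whisper's SRT/WebVTT output before delivery, flagging cues that need to be split, merged, or re-timed rather than silently shipping them.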