How to Effectively Deploy Auto Captioning Solutions for Streaming VOD


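Artificial Intelligence (AI) is transforming the video streaming world. While AI as a technology has been around for some time, the digitization of data, coupled with demand for such solutions, has driven the industry to adopt AI faster than expected. AI-based systems are now used for speech recognition, data analytics, and other deep learning platforms. They provide accuracy and scalability, not only complementing human input but also having the ability to exceed human efficiency.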

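An area where AI offers multiple benefits is Automated Speech Recognition (ASR). Speech recognition is a field of AI that recognizes spoken language and converts it into text. ASR is a core component of multiple systems, including the auto captioning systems used in video-on-demand (VOD) streaming environments.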

Why Auto Captioning is Critical for Streaming

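Captions are a crucial component of VOD streaming services. With captions, OTT providers offering VOD services can extend their reach and make streaming content easily accessible to millions of viewers worldwide.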

For many years, captioning was a manual process. However, OTT service providers are now handling massive volumes of streaming content for a growing global audience. It is not humanly possible or cost-effective to caption everything manually. Captioning is specialized work that requires experts who understand the complexities of language. To minimize costs and maximize efficiency, auto captioning has become an important AI task.

Key Components of an Auto-Captioning Solution



Figure 1. Components for auto captioning generation

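The ASR engine is the core component responsible for converting speech to text. If OTT service providers want to ensure effective global coverage and accuracy of content, they need an ASR engine that supports most languages and the major dialects of each language.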

From a technology standpoint, newer ASR technology offers better accuracy: greater than 95% for clean speech content.

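It is also important to choose an ASR solution that can recognize speaker changes. Speaker identification helps position captions correctly so that each caption appears close to its speaker. It can also provide clarity in instances where there are multiple speakers.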

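In addition, the ASR solution should transcribe non-speech utterances such as “hmm” and “oh” so that the transcript stays closely aligned with what was actually said.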

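Natural language processing (NLP) is a critical part of the overall auto captioning solution, ensuring accurate punctuation and intelligent sentence segmentation. With NLP, OTT service providers can punctuate sentences to improve readability. NLP can also help place line breaks at natural points in the captions to further optimize readability.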
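As a rough illustration of breaking captions at natural points, the sketch below wraps text to a maximum line length and prefers to break just after punctuation rather than mid-phrase. The function name break_caption, the 32-character limit, and the punctuation set are assumptions made for this example; a production NLP component would rely on real linguistic analysis rather than a punctuation heuristic.

```python
PUNCTUATION = ",.;:?!"


def break_caption(text: str, max_chars: int = 32) -> list[str]:
    """Wrap text into lines of at most max_chars, preferring breaks after punctuation.

    Hypothetical helper for illustration; max_chars=32 is an assumed limit.
    A single word longer than max_chars is left on its own line.
    """
    lines: list[str] = []
    current: list[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and len(candidate) > max_chars:
            # Default: break right before the new word.
            cut = len(current)
            # Prefer the latest break point that follows punctuation,
            # as long as the carried-over tail still fits on one line.
            for i in range(len(current), 0, -1):
                tail = " ".join(current[i:] + [word])
                if current[i - 1][-1] in PUNCTUATION and len(tail) <= max_chars:
                    cut = i
                    break
            lines.append(" ".join(current[:cut]))
            current = current[cut:] + [word]
        else:
            current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


if __name__ == "__main__":
    for line in break_caption("I told you, we are leaving tomorrow morning"):
        print(line)
    # I told you,
    # we are leaving tomorrow morning
```

A plain greedy wrap would have split this sentence after "leaving"; the punctuation preference moves the break back to the comma, which usually reads more naturally on screen.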

In addition, streaming service providers must comply with regional requirements. An auto captioning system can help service providers manage caption quality, such as words per minute, the maximum number of lines used for caption display, and the sensitive handling of profanity.
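The quality rules mentioned above lend themselves to simple automated checks. The following is a minimal sketch, assuming a hypothetical Caption record and illustrative limits (180 words per minute, two lines); actual thresholds depend on the regional guideline being applied, and the blocked-word list would be supplied by the service provider.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    start: float          # display start time in seconds
    end: float            # display end time in seconds
    lines: list[str]      # caption text, one entry per displayed line


def check_caption(
    caption: Caption,
    max_wpm: float = 180.0,                      # assumed limit, not a standard
    max_lines: int = 2,                          # assumed limit, not a standard
    blocked_words: frozenset[str] = frozenset(), # provider-supplied list
) -> list[str]:
    """Return a list of human-readable issues found in one caption."""
    issues: list[str] = []
    words = " ".join(caption.lines).split()
    duration_min = (caption.end - caption.start) / 60.0

    if duration_min <= 0:
        issues.append("non-positive duration")
    elif len(words) / duration_min > max_wpm:
        issues.append("reading rate too high")

    if len(caption.lines) > max_lines:
        issues.append("too many lines")

    # Compare case-insensitively and ignore trailing punctuation.
    if any(w.strip(".,!?;:").lower() in blocked_words for w in words):
        issues.append("blocked word")

    return issues


if __name__ == "__main__":
    crowded = Caption(
        start=0.0,
        end=2.0,
        lines=[
            "This is a fairly long caption line",
            "shown for only two seconds",
            "on three lines",
        ],
    )
    print(check_caption(crowded))
    # ['reading rate too high', 'too many lines']
```

Returning a list of issues rather than a pass/fail flag lets a downstream review step show exactly which rule a caption violates.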

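A solution that uses a custom dictionary will improve the accuracy of the ASR system by providing context before the ASR is invoked. Suppose a service provider is trying to auto caption a TV series for its streaming service. The names of all the characters are known, and some of them are difficult. The ASR engine can prioritize these names during the recognition stage to ensure the transcription maintains good accuracy.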

Best Practices for Deploying ASR Systems

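An ASR engine that offers flexible deployment strategies is ideal for VOD streaming applications. OTT service providers should look for an ASR system that can be deployed on-premises as well as on different cloud services, such as AWS and Google Cloud. Cloud-based solutions, in particular, can be deployed with a faster time to market.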

Auto-captioning solutions have advanced compared with 20 years ago. They are now widely used in real-world video streaming applications. But there are accuracy limitations. Because of accents and the number of languages, it is not possible to maintain high accuracy all of the time. 

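To overcome accuracy limitations of auto-captioning solutions, a growing number of service providers are adopting a hybrid model in which auto captioning results are manually checked before the video is streamed to global audiences. Manual inspection is needed only in cases where higher compliance is required and clean dialogue is not available (Figure 2).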


Figure 2. Hybrid Model for Auto Captioning

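Performing a full manual inspection of generated captions can be a very tedious task. Review tools are built to help service providers review and correct generated captions in the most efficient way. A review tool should be able to sort utterances by confidence score so that those with low confidence scores can be reviewed first, since they are the most likely to contain errors. The review tool needs to be able to play all utterances and their audio in a loop for quick inspection. Once an error is detected, the tool must provide an easy way to correct the caption's attributes (i.e., text, font style, timecodes, color, etc.). This will ensure faster reviewing of auto-captioning tasks and faster time to delivery.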
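The confidence-based ordering described above can be sketched in a few lines. This example assumes the ASR engine reports a per-utterance confidence between 0.0 and 1.0; the Utterance record, the review_queue function, and the 0.85 threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    start: float        # start time in seconds
    end: float          # end time in seconds
    text: str           # transcribed text
    confidence: float   # assumed to be reported by the ASR engine, 0.0 to 1.0


def review_queue(utterances: list[Utterance], threshold: float = 0.85) -> list[Utterance]:
    """Return utterances below the threshold, least confident first.

    The threshold is an assumed value; a provider would tune it to the
    level of compliance required for the content.
    """
    flagged = [u for u in utterances if u.confidence < threshold]
    return sorted(flagged, key=lambda u: u.confidence)


if __name__ == "__main__":
    transcript = [
        Utterance(0.0, 2.1, "Welcome back to the show.", 0.95),
        Utterance(2.1, 4.8, "Today we meet Siobhan and Joaquin.", 0.40),
        Utterance(4.8, 7.0, "They join us from the studio.", 0.70),
    ]
    for u in review_queue(transcript):
        print(f"{u.confidence:.2f}  {u.text}")
    # 0.40  Today we meet Siobhan and Joaquin.
    # 0.70  They join us from the studio.
```

Raising the threshold sends more utterances to human reviewers, while lowering it trades some accuracy for faster delivery.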


ASR systems solve critical problems in the VOD streaming industry today, enabling service providers to improve the accuracy of captions created with speech-to-text processing. However, ASR systems are not without limitations.

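By adopting a hybrid approach that combines auto captioning with a quick manual review before delivery, OTT service providers can improve accuracy and significantly increase the efficiency of their VOD streaming workflows.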

[Editor’s note: This is a contributed article from Interra Systems. Streaming Media accepts vendor bylines based solely on their value to our readers.]

