Introduction to asapp/sew-tiny-100k
Published: 2026-05-17 01:07:39
Source: www.cxwl.com
SEW-tiny
SEW by ASAPP Research
The base model is pretrained on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz. Note that this model should be fine-tuned on a downstream task, such as automatic speech recognition, speaker identification, intent classification, or emotion recognition.
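One way to guard against a sample-rate mismatch is to check each input file before it reaches the model. The following is a minimal sketch using only Python's standard `wave` module; the helper names `is_16khz` and `write_silence` are hypothetical and are not part of the SEW code base.

```python
import wave

# Sample rate the model was pretrained on, as stated above.
EXPECTED_RATE_HZ = 16_000


def is_16khz(path):
    """Return True if the WAV file at `path` is sampled at 16 kHz."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == EXPECTED_RATE_HZ


def write_silence(path, rate_hz, seconds=1):
    """Write a mono, 16-bit WAV file of silence (illustrative helper only)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(rate_hz)
        wav_file.writeframes(b"\x00\x00" * rate_hz * seconds)
```

A file that fails this check would need to be resampled to 16 kHz with an audio library of your choice before being passed to the model.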
Paper: Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Authors: Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi
Abstract
This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.
The original model can be found under https://github.com/asappresearch/sew#model-checkpoints.
Usage
See this blog for more information on how to fine-tune the model. Note that the class Wav2Vec2ForCTC has to be replaced by SEWForCTC.
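The class swap mentioned above might look like the sketch below. It assumes the Hugging Face `transformers` library (with PyTorch) is installed and that the checkpoint can be downloaded. The function name `load_sew_for_ctc` and the default values for `vocab_size` and `pad_token_id` are illustrative assumptions; in a real fine-tuning run they would come from the tokenizer you build for your own dataset.

```python
MODEL_ID = "asapp/sew-tiny-100k"


def load_sew_for_ctc(vocab_size=32, pad_token_id=0):
    """Load the pretrained SEW encoder with a fresh CTC head for fine-tuning.

    `vocab_size` and `pad_token_id` are placeholder values (assumptions);
    set them from your own tokenizer.
    """
    # Imported lazily so that defining this function does not require
    # transformers to be installed.
    from transformers import SEWForCTC  # used in place of Wav2Vec2ForCTC

    return SEWForCTC.from_pretrained(
        MODEL_ID,
        vocab_size=vocab_size,        # size of the CTC output layer
        pad_token_id=pad_token_id,    # used as the CTC blank token
        ctc_loss_reduction="mean",    # average the loss over the batch
    )
```

Because the checkpoint is a base model without a trained CTC head, the output layer is newly initialized and only becomes useful after fine-tuning on labeled data.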


