Global Patent Index - EP 4047596 A1

EP 4047596 A1 20220824 - TWO-PASS END TO END SPEECH RECOGNITION

Title (en)

TWO-PASS END TO END SPEECH RECOGNITION

Title (de)

ZWEIGÄNGIGE END-TO-END-SPRACHERKENNUNG

Title (fr)

RECONNAISSANCE DE LA PAROLE DE BOUT EN BOUT À DEUX PASSAGES

Publication

EP 4047596 A1 20220824 (EN)

Application

EP 22166641 A 20200603

Priority

  • US 201962856815 P 20190604
  • US 201962943703 P 20191204
  • EP 20747231 A 20200603
  • US 2020035912 W 20200603

Abstract (en)

A method implemented by one or more processors, the method comprising: training an automatic speech recognition (ASR) model, wherein training the ASR model comprises: selecting a training instance which includes training audio data capturing a training utterance and a text representation of the training utterance; processing the training audio data portion of the selected training instance using a shared encoder portion of the ASR model to generate training encoder output; processing the training encoder output using a recurrent neural network transducer (RNN-T) decoder portion of the ASR model to generate training RNN-T output; generating a candidate text representation of the training utterance based on the training RNN-T output; generating a first-pass loss based on comparing the candidate text representation of the training utterance and the text representation portion of the training instance; updating the shared encoder and/or the RNN-T decoder based on the first-pass loss; and using the shared encoder in training a listen, attend, spell (LAS) decoder portion of the ASR model.
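
The order of updates described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: each model portion is reduced to one scalar weight, audio and text are short lists of floats, mean squared error stands in for the first-pass loss, and the shared encoder is held fixed while the second-pass decoder is trained (the abstract only says the shared encoder is used in that training). The names ToyASR, first_pass_step, second_pass_step and train_two_pass are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class ToyASR:
    """Scalar stand-ins for the three model portions (all hypothetical)."""
    encoder: float = 0.5  # shared encoder
    rnnt: float = 0.5     # first-pass decoder (RNN-T stand-in)
    las: float = 0.5      # second-pass decoder (LAS stand-in)


def mse(pred, target):
    # Assumed stand-in for a loss comparing candidate and reference text.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def first_pass_step(model, audio, target, lr=0.1):
    """Update the shared encoder and the first-pass decoder on one instance."""
    enc_out = [model.encoder * x for x in audio]   # training encoder output
    pred = [model.rnnt * h for h in enc_out]       # first-pass candidate
    loss = mse(pred, target)                       # first-pass loss
    n = len(audio)
    # Both gradients are computed before either weight is changed.
    grad_rnnt = sum(2 * (p - t) * h
                    for p, t, h in zip(pred, target, enc_out)) / n
    grad_enc = sum(2 * (p - t) * model.rnnt * x
                   for p, t, x in zip(pred, target, audio)) / n
    model.rnnt -= lr * grad_rnnt
    model.encoder -= lr * grad_enc
    return loss


def second_pass_step(model, audio, target, lr=0.1):
    """Update only the second-pass decoder; the encoder is read, not changed."""
    enc_out = [model.encoder * x for x in audio]
    pred = [model.las * h for h in enc_out]
    loss = mse(pred, target)
    grad_las = sum(2 * (p - t) * h
                   for p, t, h in zip(pred, target, enc_out)) / len(audio)
    model.las -= lr * grad_las
    return loss


def train_two_pass(instances, steps=200):
    """Run the first-pass phase, then the second-pass phase."""
    model = ToyASR()
    first_losses, second_losses = [], []
    for i in range(steps):
        audio, target = instances[i % len(instances)]  # select an instance
        first_losses.append(first_pass_step(model, audio, target))
    encoder_after_first = model.encoder
    for i in range(steps):
        audio, target = instances[i % len(instances)]
        second_losses.append(second_pass_step(model, audio, target))
    return model, encoder_after_first, first_losses, second_losses


# Toy training instances: (audio, text) pairs where text = 2 * audio.
INSTANCES = [
    ([0.5, 1.0, -1.0], [1.0, 2.0, -2.0]),
    ([1.0, -0.5, 0.5], [2.0, -1.0, 1.0]),
]

if __name__ == "__main__":
    model, enc_fixed, first, second = train_two_pass(INSTANCES)
    print("first-pass loss:", first[0], "->", first[-1])
    print("second-pass loss:", second[0], "->", second[-1])
    print("encoder unchanged in second phase:", model.encoder == enc_fixed)
```

In this sketch the second phase reuses the encoder weight learned in the first phase, so the second-pass decoder only has to learn a mapping from encoder output to text. A real system would replace the scalar weights with neural network layers and the squared error with a sequence-level loss.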

IPC 8 full level

G10L 15/06 (2013.01); G06N 3/04 (2006.01); G06N 3/08 (2006.01); G10L 15/16 (2006.01); G10L 15/32 (2013.01)

CPC (source: CN EP KR US)

G06N 3/044 (2023.01 - EP KR); G06N 3/045 (2023.01 - EP KR); G06N 3/08 (2013.01 - EP KR); G10L 15/05 (2013.01 - US); G10L 15/063 (2013.01 - EP); G10L 15/16 (2013.01 - CN EP KR US); G10L 15/22 (2013.01 - CN); G10L 15/26 (2013.01 - CN); G10L 15/32 (2013.01 - EP KR US); G10L 19/167 (2013.01 - KR)

Citation (search report)

  • [A] WO 2018207390 A1 20181115 - MITSUBISHI ELECTRIC CORP [JP]
  • [A] IAN WILLIAMS ET AL: "Contextual Speech Recognition in End-to-end Neural Network Systems Using Beam Search", INTERSPEECH 2018, 1 January 2018 (2018-01-01), ISCA, pages 2227 - 2231, XP055719650, DOI: 10.21437/Interspeech.2018-2416
  • [A] HE YANZHANG ET AL: "Streaming End-to-end Speech Recognition for Mobile Devices", ICASSP 2019 - 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 12 May 2019 (2019-05-12), pages 6381 - 6385, XP033564866, DOI: 10.1109/ICASSP.2019.8682336
  • [A] ZHOU SHIYU ET AL: "A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese", 17 November 2018, ADVANCES IN DATABASES AND INFORMATION SYSTEMS; [LECTURE NOTES IN COMPUTER SCIENCE; LECT.NOTES COMPUTER], SPRINGER INTERNATIONAL PUBLISHING, CHAM, PAGE(S) 210 - 220, ISBN: 978-3-319-10403-4, XP047496514
  • [A] SUNG TZU-WEI ET AL: "Towards End-to-end Speech-to-text Translation with Two-pass Decoding", ICASSP 2019 - 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 12 May 2019 (2019-05-12), pages 7175 - 7179, XP033565334, DOI: 10.1109/ICASSP.2019.8682801
  • [T] TARA N SAINATH ET AL: "Two-Pass End-to-End Speech Recognition", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 29 August 2019 (2019-08-29), XP081489070

Designated contracting state (EPC)

AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DOCDB simple family (publication)

WO 2020247489 A1 20201210; AU 2020288565 A1 20211209; AU 2020288565 B2 20230216; AU 2023202949 A1 20230601; AU 2023202949 B2 20240516; CN 114097025 A 20220225; EP 3776536 A1 20210217; EP 3776536 B1 20220406; EP 4047596 A1 20220824; JP 2022534888 A 20220804; JP 2024019405 A 20240209; KR 20210154849 A 20211221; US 2022310072 A1 20220929

DOCDB simple family (application)

US 2020035912 W 20200603; AU 2020288565 A 20200603; AU 2023202949 A 20230511; CN 202080040823 A 20200603; EP 20747231 A 20200603; EP 22166641 A 20200603; JP 2021569526 A 20200603; JP 2023199215 A 20231124; KR 20217037998 A 20200603; US 202017616129 A 20200603