This paper was accepted at the EMNLP Workshop on Computational Approaches to Linguistic Code Switching (CALCS).
Code switching (CS), that is, mixing different languages within a single sentence, is a common phenomenon in communication and can pose a challenge in many natural language processing (NLP) settings. Previous work on CS speech has shown promising results for end-to-end speech translation (ST), but has been limited to offline scenarios and to translation into one of the languages present in the source (monolingual transcription).
In this paper, we focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings and translation into a third language (i.e., a language not present in the source). To this end, we extend the Fisher and Miami test and validation datasets with new targets in Spanish and German. Using this data, we train a model for both offline and streaming ST and establish baseline results for the two settings mentioned above.