Three questions to the project partners of IICT

The Swiss Innovation Agency (Innosuisse) has approved a major four-year project entitled "Inclusive Information and Communication Technologies" (IICT) as part of its new Flagship Initiative.

Regarding accessibility, this project focuses on five applications: text simplification, sign language translation, sign language assessment, audio description and spoken subtitles.

In this interview series, we asked our main partners three questions about the IICT Innosuisse project.

Sarah Ebling, Head of the "Language technology for accessibility" group at the Department of Computational Linguistics of the University of Zurich (UZH) and Professor of Accessibility Studies at Zurich University of Applied Sciences (ZHAW)

Which topics/areas are you working on within the IICT project?

I’m the principal investigator of the overall project. With my teams at UZH and ZHAW, I am involved in the text simplification, sign language translation, sign language assessment, and audio description subprojects.

Which results do you expect at the end of the project?

For each of the subprojects, we have defined both a more immediate innovation and a more visionary one. For example, for sign language translation, we will soon be able to display alert messages using a digital signer, relying on the paradigm of rule-based machine translation and on work in sign language production. Deep-learning-based machine translation into sign language also falls within the scope of the project; this line of research is far more challenging.
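
To make the rule-based paradigm concrete, here is a minimal, purely illustrative sketch of how fixed alert-message templates could be mapped to gloss sequences for a signing avatar. The templates, the tiny gloss inventory and all function names are invented for this example; they do not reflect the project's actual system.

```python
# Hypothetical sketch of rule-based (template-based) translation of alert
# messages into sign language glosses for an avatar. Templates and glosses
# are invented for illustration; they only mirror the general paradigm.

TEMPLATES = {
    "delay": ["TRAIN", "NUMBER:{line}", "DELAY", "MINUTES:{minutes}"],
    "platform_change": ["TRAIN", "NUMBER:{line}", "PLATFORM", "NEW", "NUMBER:{platform}"],
}

def alert_to_glosses(template_id: str, **slots: str) -> list[str]:
    """Fill a template's slots and return the gloss sequence to animate."""
    glosses = []
    for token in TEMPLATES[template_id]:
        if "{" in token:
            # Slot-bearing token, e.g. "NUMBER:{line}" -> "NUMBER:IC5"
            prefix, slot = token.split(":", 1)
            glosses.append(prefix + ":" + slot.format(**slots))
        else:
            glosses.append(token)
    return glosses

if __name__ == "__main__":
    # "Train IC5 is delayed by 10 minutes" as a gloss sequence:
    print(alert_to_glosses("delay", line="IC5", minutes="10"))
    # -> ['TRAIN', 'NUMBER:IC5', 'DELAY', 'MINUTES:10']
```

The appeal of this paradigm for alert messages is exactly its rigidity: the domain is closed, so a small set of templates can be covered reliably, which is much harder to guarantee with open-ended deep-learning translation.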

Where do you see the biggest challenges?

Given that all of the subprojects involve deep learning techniques to some degree, large amounts of data are required in each of them. Obtaining this data or creating artificial data as part of data augmentation is a challenge.

Julien Torrent,
Head of Innovation at Icare Institute

Which topics/areas is Icare Institute working on within the IICT project?

Within the framework of the IICT Flagship, the Icare research institute is working on sub-project 1, which concerns text simplification, and on sub-project 4, which concerns audio description and L2V.

For text simplification, the aim is to create a set of algorithms that reduce the complexity of a text so that it is comprehensible to as many people as possible, particularly people with cognitive disabilities. To spread this practice more widely, the rule system will also support manual and semi-automatic corrections from a pedagogical point of view, helping users learn to simplify texts themselves until this becomes a standard part of their everyday writing.
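
As a rough illustration of such a rule system, the sketch below contrasts a fully automatic mode with a semi-automatic mode that only proposes corrections, so that users can learn the rules themselves. The toy lexicon, the 15-word sentence-length rule and all function names are assumptions made for this example, not the project's actual rules.

```python
# Toy sketch of a rule-based simplification step with a "semi-automatic"
# mode that proposes corrections instead of silently applying them.
import re

# Toy lexical substitution rules: complex word -> simpler synonym.
LEXICON = {
    "utilize": "use",
    "commence": "start",
    "approximately": "about",
}

MAX_WORDS = 15  # toy readability rule: flag overly long sentences

def suggest_simplifications(text: str) -> list[str]:
    """Semi-automatic, pedagogical mode: return readable suggestions."""
    suggestions = []
    for word, simpler in LEXICON.items():
        if re.search(rf"\b{word}\b", text, re.IGNORECASE):
            suggestions.append(f'Replace "{word}" with "{simpler}".')
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if len(sentence.split()) > MAX_WORDS:
            suggestions.append(f'Split this long sentence: "{sentence[:40]}..."')
    return suggestions

def apply_simplifications(text: str) -> str:
    """Fully automatic mode: apply the lexical rules directly."""
    for word, simpler in LEXICON.items():
        text = re.sub(rf"\b{word}\b", simpler, text, flags=re.IGNORECASE)
    return text

if __name__ == "__main__":
    sample = "We will commence the procedure in approximately ten minutes."
    print(suggest_simplifications(sample))
    print(apply_simplifications(sample))
```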

For audio description and L2V, the idea is to make visual content accessible to visually impaired or blind people. Here we work with advanced artificial intelligence technologies such as transformers, which in this case make it possible to extract visual information from videos and transcribe the content into text, which can then be voiced using speech synthesis. For the audio description process, a dynamic activity flow is set up that allows the operator to select which building blocks ("bricks") to activate according to the desired result (description of the context, the scenes and/or the facial expressions).
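
The following toy sketch illustrates the "bricks" idea: the operator activates only the description modules they need before the resulting text is voiced. The three placeholder functions stand in for the actual transformer-based models; their names and canned outputs are invented purely for illustration.

```python
# Minimal sketch of operator-selectable description "bricks": only the
# activated modules run on a video, and their outputs are joined into the
# text that a TTS voice would read. Placeholder functions stand in for
# the real transformer-based models.
from typing import Callable

def describe_context(video_path: str) -> str:
    return "Interior, a kitchen at night."     # placeholder model output

def describe_scene(video_path: str) -> str:
    return "A woman pours tea into two cups."  # placeholder model output

def describe_faces(video_path: str) -> str:
    return "She smiles warmly."                # placeholder model output

BRICKS: dict[str, Callable[[str], str]] = {
    "context": describe_context,
    "scenes": describe_scene,
    "faces": describe_faces,
}

def audio_description_text(video_path: str, active_bricks: list[str]) -> str:
    """Run only the bricks the operator activated, in the chosen order."""
    return " ".join(BRICKS[name](video_path) for name in active_bricks)

if __name__ == "__main__":
    # Operator chooses context + facial expressions, but skips scene detail.
    print(audio_description_text("clip.mp4", ["context", "faces"]))
```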

Which results do you expect at the end of the project?

The Icare Institute, which specializes in applied research, aims to integrate advanced technologies into its two sub-projects to solve accessibility problems on a national scale. The goal is to create practical solutions for professionals and end users, relevant enough to be put into practice in everyday life.

Where do you see the biggest challenges?

One of the biggest challenges we face is finding ways to integrate advanced AI technologies into practical applications that are accessible to end users. It is essential to strike the right balance between accuracy and usability, which is not always easy. In addition, we need to ensure that the use of AI technologies does not run counter to ethical principles, particularly with regard to data protection and user privacy. In sub-project 4, an in-depth analysis of user needs showed that users expect an emotional description of facial expressions and scenes. To meet this demand, an innovative approach was implemented and has already yielded promising results, with a 15% improvement over state-of-the-art models. These results still need to be improved and consolidated, but the prospects are very encouraging.

Paul Anton Mayer,
Chief Digital Officer at capito

Which topics/areas is capito working on within the IICT project?

capito simplifies information with artificial intelligence so that everybody can understand it. This groundwork takes place in sub-project 1 and forms the basis for many other solutions.

Which results do you expect at the end of the project?

Solutions for an inclusive society.
I do not merely expect perceivable and understandable information in:

  • public broadcasting
  • public administration
  • insurance and banking

I demand it. Our society needs this kind of solution.

Where do you see the biggest challenges?

Data management and integration. We are working with artificial intelligence, so we run on data, and data value chains are often hard to maintain and control. Integration, because large, scalable services are often complex to integrate into legacy infrastructure, and it is a challenge to comply with existing regulations in our fields of action. But luckily, we have strategies to overcome these challenges!

Dr. Mathew Magimai Doss,
Senior Researcher at the Idiap Research Institute

Which topics/areas is Idiap working on within the IICT project?

In the IICT project, Idiap is involved in two sub-projects:

  1. Sub-project 3: Sign language assessment, with HfH, Idiap and the University of Surrey as research partners and the Swiss Deaf Association (SGB-FSS) as implementation partner. This sub-project deals with sign language processing, with the goal of developing a sign language assessment system that provides automatic feedback for sign language learners and of integrating it into SGB-FSS's Signwise platform. The R&D focuses on Swiss German Sign Language (DSGS) and isolated signing.
  2. Sub-project 5: Spoken subtitles, with Idiap as research partner and SWISS TXT as implementation partner. This sub-project deals with speech processing, with the goal of developing an add-on technology that generates natural-sounding, expressive speech for the subtitles produced by SWISS TXT. The R&D focuses on the development of speech synthesis and voice conversion systems for English, German, French and Italian; a toy sketch of such a pipeline follows this list.
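
As a rough illustration of what a spoken-subtitles add-on involves, the hypothetical sketch below parses standard SRT subtitle cues and hands each cue's text and timing to a speech synthesizer. The synthesize() function is a stub invented for this example; the actual Idiap TTS and voice-conversion models and their interfaces are not described in this article.

```python
# Sketch of a spoken-subtitles add-on: parse SRT cues, then synthesize
# speech for each cue's text. The synthesize() call is a stub standing in
# for real TTS / voice-conversion models.
import re

SRT_CUE = re.compile(
    r"(\d+)\s*\n(\d{2}:\d{2}:\d{2}),\d{3} --> (\d{2}:\d{2}:\d{2}),\d{3}\s*\n"
    r"(.*?)(?:\n\n|\Z)",
    re.S,
)

def parse_srt(srt_text: str):
    """Yield (start, end, text) for each subtitle cue."""
    for m in SRT_CUE.finditer(srt_text):
        _, start, end, text = m.groups()
        yield start, end, " ".join(text.splitlines())

def synthesize(text: str, lang: str) -> bytes:
    """Placeholder for a TTS model; returns fake audio bytes."""
    return f"[{lang} audio for: {text}]".encode()

if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,500\nGuten Tag.\n\n"
        "2\n00:00:04,000 --> 00:00:06,000\nWillkommen in Bern.\n"
    )
    for start, end, text in parse_srt(sample):
        print(start, end, synthesize(text, "de"))
```

In a real deployment the cue timings would also constrain the synthesized audio's duration, which is one reason fine-grained control over the synthesis (speaking rate, expressiveness) matters for this sub-project.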

Which results do you expect at the end of the project?

In Sub-project 3, on the Signwise platform, we expect to automate the assessment of sign language productions at the level of individual signs, and to incorporate adaptive testing into a receptive sign language test.

In Sub-project 5, we expect to have an expressive speech synthesis technology integrated into Swissinfo (SWI), www.swissinfo.ch, so that video contributions from SWI, which are often available only in the original language, can be made available in other major languages. A secondary application is the synthesis of news articles on SWI. Both applications aim at the effective dissemination of information about Switzerland and help enhance people's receptiveness to diverse cultures by breaking down language barriers.

Where do you see the biggest challenges?

In Sub-project 3, we are valorizing sign language technology that was developed as part of the Swiss National Science Foundation Sinergia projects SMILE and SMILE-II. The biggest challenge lies in scaling the technology from controlled laboratory settings to real-world settings, where there is less control over the hardware and the environment, and in making the technology acceptable to learners and teachers.

Today, with advances in deep learning, synthesizing human-like speech is no longer a challenging task. The main challenge is to make the output of such speech synthesis systems as expressive as human speech. For that, we need a speech synthesis technology that can be controlled in a fine-grained manner and is accepted by naïve human listeners. In Sub-project 5, the challenge lies in achieving that.

For more information on all 15 ongoing Innosuisse Flagships, please click here.

Robin Ribback
Innovation Manager
+41 58 136 40 32

Florian Maillard
Junior Project Coordinator
+41 58 136 43 05