Estonia collects speech data for Estonian language preservation

The Eastern European country is appealing to speakers of the Estonian language to donate their speech for a database that aims to help businesses, public sector institutions and research institutes create services based on speech technology.

On September 12, the Estonian Ministry of Economic Affairs and Communications (MKM) and the State Information System Agency announced a “Donate Your Speech” campaign. Its aims are to preserve the Estonian language and to develop language technologies that improve the delivery of public services to Estonian speakers.

By creating an open database of 4,000 hours of spoken Estonian, the nation aims to help businesses, public sector institutions and research institutes build services and products based on speech technology. Such technology can be used to transcribe meetings, convert interviews into text and generate automatic captions for media.

Ott Velsberg, Estonia’s Chief Data Officer, told GovInsider that the campaign aims to promote the adoption of language technologies in public sector information systems and in the private sector. These technologies include speech recognition, real-time captioning and text-to-speech software. He said this will improve access to services and give Estonians better ways to interact with public and private sector services.

The campaign will support the Estonian Language Strategy 2021-2035. The strategy aims to ensure that Estonian remains the main language in all spheres of life in the Republic and to strengthen its status.

According to the Language Roadmap, the use of Estonian has declined over the past decade in some fields, such as the service sector, IT and higher education, because of the growth of the international workforce. Developing language technology that accounts for Estonian and its variants is a key objective of the language strategy. Such technology will support citizens’ participation in an increasingly digital society.

Open data

With open data, agencies will no longer need to collect their own datasets for individual projects.

Velsberg says that so far linguistic datasets have been collected primarily to meet the needs of individual projects, and “the workflows for their release have not been firmly established.”

An open data portal with abundant, high-quality language data will enable more agencies to develop services that use language technologies, he explains. This data will include voice data, translation materials and sign language datasets. The portal currently aims to capture spontaneous Estonian speech and dialogue from both native and non-native speakers.

To collect the data the campaign needs, Velsberg says, Estonia plans to run a large-scale advertising campaign across all media channels. The advertising will raise awareness of the importance of language technology and the sustainability of the Estonian language.

According to the official website, voice recordings collected through the project will be transcribed, and all personally identifiable information will be removed. Contributors can also ask for their recordings to be removed from the database.

Estonia’s current voice technologies

Voice recognition software is already commonplace in this Eastern European country.

In early 2022, the country launched Bürokratt, which gives citizens voice-activated access to public services and has been called the “Siri of digital public services”. According to Emerging Europe, Bürokratt lets citizens access all public services in Estonia, from applying for benefits to renewing a passport, by speaking with an AI-based virtual assistant.

Estonian Public Broadcasting (ERR) has also introduced AI-generated real-time captions for live television programs, reaching nearly 20,000 people, says Velsberg.

According to the official e-Estonia website, the Estonian parliament uses voice recognition technology to prepare the minutes of parliamentary sessions, and editors review the minutes before they are published.

Projects elsewhere

Speech recognition technologies have helped agencies around the world improve public services and preserve lesser-used languages.

In Singapore, AI Singapore has developed a speech recognition program that can recognize the colloquial English spoken in the country. This helps the Singapore Civil Defence Force transcribe emergency calls quickly, so that it can dispatch emergency services sooner.

Te Hiku Media, a non-profit media organization in New Zealand, has amassed an extensive audiovisual archive of Māori words, phrases and idioms. It uses an open-source app to collect spoken recordings in indigenous languages, which will be used to train AI models.

According to the United Nations’ International Telecommunication Union, the organization is currently working with local and international data scientists to refine speech technology tools. These range from apps that teach te reo Māori pronunciation to virtual assistants.

UNESCO’s Global Plan of Action for the International Decade of Indigenous Languages (2022-2032) also calls on communications technology companies to help create an enabling environment. In that environment, indigenous institutions working to preserve and revitalize lesser-used languages would be able to strengthen their capacities.

As public services go digital and increasingly adopt language technologies to serve people better, agencies will need to consider the languages people speak. Initiatives such as Estonia’s new campaign will play a crucial role in building the data infrastructure needed to power such technology inclusively.
