Researchers at the University of Helsinki teach computers to understand and speak Finnish dialects

Computers generally understand Finnish only in its normative standard form, known as kirjakieli. Finnish dialects, however, cause plenty of problems when people interact with computers, since it is impossible to speak a language without speaking some dialect of it. A research group has built artificial intelligence (AI) models capable of automatically detecting, normalizing and generating Finnish dialects. The results were published at the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021).

Collecting data to make AI understand dialectal Finnish and Swedish has been in the news recently. The methods developed by the research group of Mika Hämäläinen, Niko Partanen, Khalid Alnajjar and Jack Rueter from the University of Helsinki go further and enable an AI to speak Finnish dialects fluently.

Working within the paradigm of computational creativity, they developed a method for converting standard Finnish into any one of 23 Finnish sub-dialects. Computers should not only be able to understand dialectal Finnish; they should also be able to speak in a dialect.

“With our method, an intelligent system such as a robot can say akku on lopussa (the battery is low) in, for example, the Etelä-Karjala dialect as akku o lopussa, in the Etelä-Satakunta dialect as akku ol lopus, or in the Länsi-Uusimaa dialect as akku o lopus,” says Hämäläinen.
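To make the example concrete, here is a minimal Python sketch of how such a dialect generator might be called from the group’s openly published murre package (linked at the end of this article). The function name, dialect labels and outputs shown here are assumptions for illustration only; the actual interface is documented in the repository.

```python
# Hypothetical usage sketch for the murre package (pip install murre).
# The function name and dialect labels are assumptions; see the README at
# https://github.com/mikahama/murre for the real API and model downloads.
from murre import dialectalize_sentences

standard = ["akku on lopussa"]  # standard Finnish: "the battery is low"

# Convert the standard sentence into each chosen sub-dialect.
for dialect in ["Etelä-Karjala", "Etelä-Satakunta", "Länsi-Uusimaa"]:
    dialectal = dialectalize_sentences(standard, dialect)
    print(dialect, "->", dialectal[0])

# Per the article, the expected outputs are roughly "akku o lopussa",
# "akku ol lopus" and "akku o lopus"; exact strings depend on the model.
```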

For example, the widely used Google Translate fails on the dialectal Finnish sentence Oisko sulla jotai esimerkkei siit (Do you have any examples of this), producing the garbled “English” translation Oisko sulla something like this, simply because Google Translate was designed to work only on standard Finnish. The same phenomenon can be observed with all AI tools that support Finnish, such as Apple’s Siri or dictation in macOS.

Dialects are detected from spoken audio and text

Research shows that dialect detection is a difficult task when relying on plain text alone. Dialect identification is easier when the model also has access to audio, as many dialects are marked by distinctive phonetic properties. The researchers’ latest publication therefore deals with detecting dialects from spoken audio and text.

“Normalizing dialects to standard text has many advantages. It makes it possible to analyze dialect materials with tools built for standard Finnish, and we can also use the standardized version as a search query when we want to find something in dialect materials,” explains Khalid Alnajjar.
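The reverse direction can be sketched in the same hedged way: dialectal Finnish in, standard Finnish (kirjakieli) out, so that existing standard-Finnish tools and search indexes can be applied. Again, the function name and the example output are assumptions; the published murre code defines the real interface.

```python
# Hypothetical normalization sketch; the function name is an assumption,
# see https://github.com/mikahama/murre for the actual interface.
from murre import normalize_sentences

dialectal = ["akku o lopus", "oisko sulla jotai esimerkkei siit"]

# Map dialectal Finnish back to the written standard so that parsers,
# taggers and search tools built for standard Finnish can process it.
standard = normalize_sentences(dialectal)
print(standard)
# e.g. ["akku on lopussa", "olisiko sinulla jotain esimerkkejä siitä"]
```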

The researchers point out that the problem of understanding dialects is complex and that no model can understand natural language the way humans do. But the models they have created open up many interesting research directions, such as measuring how far a dialect deviates from the standard and identifying the syntactic differences between different varieties of the language.

“With this, we can improve the current state of Finnish natural language processing solutions and create AI models tailored to individual people. For example, we have already achieved impressive results in recognizing an individual person’s speech, even in endangered languages,” says Niko Partanen.

The research group has also developed a similar normalization methodology for the dialects of Swedish spoken in Finland (Hämäläinen et al., 2020b) and for historical Finnish (Hämäläinen et al., 2021b).

The dialect generator can be tested online (https://uralicnlp.com/murre), and the code for the dialect normalizer and generator has been published openly on GitHub (https://github.com/mikahama/murre). The dialect identification model is also available on GitHub (https://github.com/Rootroo-ltd/FinnishDialectIdentification).

