The Institute for Language and Folklore (Isof) has collected dialects extensively, primarily between 1935 and 1970, when researchers traveled around the country recording how people spoke.
"The initial purpose was largely to preserve traditional dialects," explains Annette C Torensjö, head of the Department of Archives and Research in Uppsala at Isof.
For many years the recordings were used mainly by researchers, but last year KB-Lab at the National Library of Sweden (KB) got in touch. Leonora Vesterbacka is a senior data scientist there and leads a project to train speech-to-text models, which "translate" speech into written text.
Only a Small Share Is Swedish
American AI models are often trained on massive multilingual datasets, so the Swedish share becomes very small.
"There is a model from OpenAI called Whisper, which is trained on 680,000 hours of audio in various languages, of which around 2,000 hours are Swedish. That's 0.3 percent Swedish," explains Leonora Vesterbacka.
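The 0.3 percent figure follows directly from the two hour counts quoted above. A minimal sketch of that arithmetic, assuming the rounded figures from the quote are accurate; the function and constant names are hypothetical and chosen only for illustration:

```python
# Approximate figures as quoted in the article (not exact dataset statistics).
TOTAL_HOURS = 680_000
SWEDISH_HOURS = 2_000


def share_percent(part_hours: float, total_hours: float) -> float:
    """Return part_hours as a percentage of total_hours."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return 100.0 * part_hours / total_hours


swedish_share = share_percent(SWEDISH_HOURS, TOTAL_HOURS)
print(f"Swedish share: {swedish_share:.1f} percent")
```

With these inputs the share comes to just under 0.3 percent, which matches the rounded figure in the quote.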
The models work well on standard Swedish, but worse on dialects. That is where the people of Halland, and others whose dialects were recorded, come in.
"If someone had told me in the 1980s that the new gold would be a weird old reel of recordings in an archive, I would have thought they were joking."
KB-Lab also uses minutes and recordings from the Riksdag Administration.
The Riksdag Administration is responsible for recording what is said in the Riksdag, making it available, and publishing the minutes. This has been going on for a long time.
"In Sweden, we are so good at preserving everything. It's amazing to see that it can be used in the future as well."
Making Material Accessible
Once the models are fully trained, they can be used, for example, to transcribe medical records and meetings or to write subtitles for TV broadcasts. They can also be used to make spoken material, such as podcasts and TV broadcasts from public agencies, accessible.
At Isof, no one had expected that their old dialect recordings would become an important step into the future.
"I'm very pleased that our dialect recordings have such great relevance now, that they are something that can be put to work as part of societal development," says Annette C Torensjö.
A dialect is a language variant spoken by the inhabitants of a geographically defined area. It differs from the standard language as well as from adjacent dialects. The differences can concern language features at all levels: phonological, morphological, lexical, and syntactic.
A sociolect, or social dialect, is a language variant that is characteristic of a particular social group. The differences are often limited to pronunciation, grammar, and vocabulary. Sociolects are also regionally bounded, so members of the same social class in different parts of a language area speak differently from one another.
Source: ne.se