At Microsoft Bing, our mission is to delight users everywhere with the best search experience. We serve a diverse set of customers all over the planet who issue queries in over 100 languages.

In search we've found about 15% of queries submitted by customers have misspellings. When queries are misspelled, we match the wrong set of documents and trigger incorrect answers, which can produce a suboptimal results page for our customers. Spelling correction is therefore the very first component in the Bing search stack, because searching for the correct spelling of what users mean improves all downstream search components. Our spelling correction technology powers several product experiences across Microsoft. Since it is important to us to provide all customers with access to accurate, state-of-the-art spelling correction, we are improving search so that it is inclusive of more languages from around the world with the help of AI at Scale.

We have had high-quality spelling correction for about two dozen languages for quite some time. However, that left users who issued queries in many other languages dealing with inferior results or manually correcting queries themselves. To make Bing more inclusive, we set out to expand our spelling correction service to 100-plus languages, setting the same high bar for quality that we set for the original two dozen. We've found we need a very large number of data points to train a high-quality spelling correction model for each language, and sourcing data in over 100 languages would be incredibly difficult logistically, not to mention costly in both time and money.

A speller for 100-plus languages in Microsoft

Despite these challenges, we have recently launched our large-scale multilingual spelling correction models worldwide with high precision and high recall in 100-plus languages! These models, technology we collectively call Speller100, are currently helping to improve search results for these languages in Bing. This is a huge step forward, especially considering that spelling correction was available for just a few dozen languages a short time ago. It was made possible by leveraging recent advances in AI, particularly zero-shot learning combined with carefully designed large-scale pretraining tasks, and we also draw on theories from historical linguistics.

[Image: examples of Bing search results after Speller100 implementation]

Speller100 has improved quality in a great many low- and no-resource languages, such as Macedonian, Belarusian, Azerbaijani, Pashto, Slovak, and Romanian, bringing a much better experience to our users.

Traditionally, spelling correction solutions have leveraged noisy channel theory and made great improvements by building better statistical error models and language models. Search engines have long used web documents to build robust language models, and for precise, high-performing error models they have largely leveraged user feedback on autocorrection recourse links. This practice has been very effective, especially for languages where user feedback data has been gathered at large scale. For a language with very little web presence and user feedback, however, it's challenging to gather an adequate amount of training data. To create spelling correction solutions for these latter types of languages, models cannot rely solely on training data to learn the spelling of a language.

The foundation of Speller100 is the concept of language families: for our purposes, larger groups of languages based on similarities that multiple languages share. A second concept, zero-shot learning, allows a model to accurately learn and correct spelling without any additional language-specific labeled training data. Imagine someone had taught you how to spell in English and you automatically learned to also spell in German, Dutch, Afrikaans, Scots, and Luxembourgish. That is what zero-shot learning enables, and it is a key component in Speller100 that allows us to expand to languages with very little to no data.

Unlocking the power of task-driven pretraining
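The noisy channel approach mentioned above scores a candidate correction c for a typed query word q by combining an error model P(q|c) with a language model P(c) and picking the candidate that maximizes their product. Here is a minimal sketch of that idea, assuming a toy word list in place of web-scale counts and a uniform single-edit error model; every name and the corpus below are illustrative assumptions, not Speller100's implementation:

```python
from collections import Counter

# Toy corpus standing in for web-scale language-model data (assumption:
# a real system would use vastly larger counts from web documents).
CORPUS = "the quick brown fox jumps over the lazy dog the dog barks".split()
WORD_COUNTS = Counter(CORPUS)
TOTAL = sum(WORD_COUNTS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def p_language(word):
    # Language model P(c): unigram relative frequency in the corpus.
    return WORD_COUNTS[word] / TOTAL

def correct(word):
    # Candidates: the word itself if known, else known words one edit away,
    # else the word unchanged.
    candidates = ({word} if word in WORD_COUNTS else set()) \
        or {w for w in edits1(word) if w in WORD_COUNTS} \
        or {word}
    # Noisy channel: argmax over c of P(c) * P(word|c); the error model
    # P(word|c) is approximated here as uniform over single-edit candidates,
    # so the language model alone breaks ties.
    return max(candidates, key=p_language)
```

For example, `correct("dgo")` returns `"dog"` (a transposition repaired by the transpose edit), while a correctly spelled known word passes through unchanged. Real systems replace the uniform error model with weights learned from user feedback, which is exactly the data that is scarce for low-resource languages.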
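The language-family idea described above can be pictured as routing every language to one shared correction model per family, so that low-resource members benefit from data-rich siblings. The groupings, class names, and routing function below are hypothetical illustrations of that structure, not Speller100's actual architecture:

```python
# Hypothetical family groupings for illustration only; real family
# assignments would come from historical linguistics.
FAMILY_OF = {
    "en": "germanic", "de": "germanic", "nl": "germanic",
    "af": "germanic", "lb": "germanic",
    "ro": "romance", "it": "romance", "es": "romance",
}

class FamilySpeller:
    """Placeholder for one shared correction model per language family."""
    def __init__(self, family):
        self.family = family

    def correct(self, query):
        # A real model would rewrite the query; this stub returns it as-is.
        return query

# One model per family, shared by every member language.
MODELS = {fam: FamilySpeller(fam) for fam in set(FAMILY_OF.values())}

def route(lang, query):
    # A Luxembourgish ("lb") query, with almost no labeled data, is served
    # by the same Germanic-family model that English training data built:
    # this sharing is what makes zero-shot correction possible.
    return MODELS[FAMILY_OF[lang]].correct(query)
```

The design point is that the model never needs Luxembourgish-specific labeled data; membership in the Germanic family is enough for it to be served by a model trained largely on its high-resource relatives.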