Language codes for Twitter
March 12, 2010
How do you find tweets in a specific language?
This is a problem for many Twitter users, including language learners and native speakers of languages other than English. Because English dominates the web, English tweets are easy to find. Tweets in other languages are much harder to track down.
One solution would be to tag each tweet with a language code. IANA's existing language codes seem ideal for compatibility and ease of recognition. They are already used widely on the web, and in their basic form each language has a two-letter code. For example, the code for English is en and the code for Maori is mi.
Putting a prefix symbol in front of each code would let us search for tweets in that language. However, the prefix must differ from the one already used for topic tags. Ideally it would be a single non-alphabetic character: where # marks topic tags, we could use something like the percent sign (%). A tweet in modern Greek might then be tagged %el.
It might also be useful to flag tweets that are not written in the language's usual script. For example, because of the limitations of some Twitter clients, a Greek tweet might be transliterated into the Latin alphabet used for English. An additional symbol could mark such a tweet, e.g. %el!
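Twitter itself does not recognize this proposed convention, but a script or client could pick the tags out of tweet text locally. The following is a minimal Python sketch, assuming tags take the form "%" plus a two-letter code, optionally followed by "!" for a transliterated tweet. The names LanguageTag, parse_language_tags and filter_by_language are hypothetical, and the sketch does not check codes against the IANA registry.

```python
import re
from typing import List, NamedTuple

# Hypothetical tag format based on the proposal: "%" + two letters,
# optionally followed by "!" to mark a transliterated tweet.
# (?<!\w) stops percentages such as "50%el" from counting as tags;
# (?!\w) rejects longer words such as "%ell".
LANG_TAG = re.compile(r"(?<!\w)%([a-z]{2})(!?)(?!\w)", re.IGNORECASE)


class LanguageTag(NamedTuple):
    code: str             # two-letter code, lower-cased (e.g. "el")
    transliterated: bool  # True if the tag ended with "!"


def parse_language_tags(text: str) -> List[LanguageTag]:
    """Return every %xx or %xx! tag found in a tweet's text."""
    return [
        LanguageTag(m.group(1).lower(), m.group(2) == "!")
        for m in LANG_TAG.finditer(text)
    ]


def filter_by_language(tweets: List[str], code: str) -> List[str]:
    """Keep only tweets carrying a tag for the given language code."""
    code = code.lower()
    return [
        t for t in tweets
        if any(tag.code == code for tag in parse_language_tags(t))
    ]


tweets = [
    "Καλημέρα! %el",
    "Kalimera apo tin Athina %el!",
    "Kia ora koutou %mi",
    "Prices up 50%el nino",  # not a tag: "%" follows a digit
    "Good morning #greetings",
]
print(parse_language_tags(tweets[1]))
# [LanguageTag(code='el', transliterated=True)]
print(filter_by_language(tweets, "el"))
# ['Καλημέρα! %el', 'Kalimera apo tin Athina %el!']
```

Because the transliteration flag is stored separately from the code, a search for "el" returns both Greek-script and transliterated tweets, and a reader can still tell them apart.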
Because there is little documentation on which characters Twitter's search function distinguishes, any new prefix for language tags will need some trial-and-error testing to make sure it works. If Twitter's polyglot community could agree on such a coding system, it would be much easier to find relevant posts in languages other than English.
Image: Brueghel’s Tower of Babel