Exercise: Compare the Wikipedia translations
Given an article on the English Wikipedia, for example about Perl, Python, Ruby, PHP, or JavaScript, create a program that fetches the size of every translated version of this article across all the language editions of Wikipedia.
Depending on how much investigation you'd like to do, you can start implementing right away, or you can read one or more of the hints that explain what you need to fetch.
Hints
Wikipedia provides an API to fetch the content of a page in raw format. It also documents the API in detail, including a page on API:Properties.
The language links are served by Wikidata.
Hints
This URL will return the content of the 'Perl' page of the English Wikipedia in JSON format:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Perl
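A minimal Python sketch of this hint, using only the standard library. The function names (`build_content_url`, `fetch_json`, `page_size`) and the User-Agent string are made up for this illustration. It assumes the response keeps the page text under the `*` key of the first revision, which is how the legacy JSON format of this query is laid out, and it measures "size" as the number of characters in the raw wikitext.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_content_url(title, api_url=API_URL):
    """Build the query URL from the hint for the given page title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
        "titles": title,
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Download a URL and decode the JSON body (needs network access)."""
    # The User-Agent value is a placeholder; replace it with your own.
    request = urllib.request.Request(
        url, headers={"User-Agent": "wikipedia-size-exercise/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def page_size(data):
    """Return the length of the wikitext of the first page in the response.

    Assumes the legacy layout: query -> pages -> <page id> -> revisions[0]["*"].
    """
    pages = data["query"]["pages"]
    page = next(iter(pages.values()))
    return len(page["revisions"][0]["*"])
```

Calling `page_size(fetch_json(build_content_url("Perl")))` should then give the size of the English article, provided the assumptions above hold.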
This URL will return the list of translated versions of the page with Q-id Q42:
https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&props=sitelinks&ids=Q42
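The response to this request can be reduced to a simple mapping from site ID to page title. The sketch below assumes the JSON is laid out as `entities -> <Q-id> -> sitelinks -> <site ID> -> title`; the helper name `sitelink_titles` and the sample data are hand-written for illustration, not real API output.

```python
def sitelink_titles(data, qid):
    """Map each site ID (for example 'enwiki') to the page title on that site."""
    links = data["entities"][qid]["sitelinks"]
    return {site: link["title"] for site, link in links.items()}


# Hand-written, heavily abbreviated sample in the assumed response layout.
sample_response = {
    "entities": {
        "Q42": {
            "id": "Q42",
            "sitelinks": {
                "enwiki": {"site": "enwiki", "title": "Douglas Adams"},
                "enwikiquote": {"site": "enwikiquote", "title": "Douglas Adams"},
            },
        }
    }
}

titles = sitelink_titles(sample_response, "Q42")
```

Each key of the resulting dictionary identifies one site, and each value is the title to request from that site.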
Given a title (in this case PHP), the following URL will return the Q-id of the page:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&format=json&titles=PHP
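A possible way to pull the Q-id out of that response in Python. This is a sketch under the assumption that the Q-id is stored in the `wikibase_item` entry of `pageprops`; the function names and the Q-id in the sample data are invented for the example.

```python
import urllib.parse


def build_pageprops_url(title):
    """Build the pageprops query URL from the hint for the given title."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "format": "json",
        "titles": title,
    }
    return "https://en.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)


def extract_qid(data):
    """Return the Q-id of the first page in the response, or None if absent."""
    pages = data["query"]["pages"]
    page = next(iter(pages.values()))
    return page.get("pageprops", {}).get("wikibase_item")


# Hand-written sample; the page id and Q-id are placeholders, not real values.
sample_response = {
    "query": {
        "pages": {
            "1": {"title": "PHP", "pageprops": {"wikibase_item": "Q12345"}}
        }
    }
}
```

Returning `None` instead of raising an exception makes it easy to skip titles that have no Wikidata item.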
Hints
There seem to be 4 types of language links returned from Wikidata:
Plain Wikipedia links that end in the word 'wiki', such as itwiki, newwiki, or pdcwiki. The language code can be 2 or more characters long; it is the site ID without its last 4 characters, so itwiki maps to https://it.wikipedia.org/.
Wikipedia links with underscores, such as zh_yuewiki, bat_smgwiki, or zh_min_nanwiki, are quite similar, but we need to replace each underscore (_) with a dash (-).
Wikiquote links. For example, enwikiquote, which maps to https://en.wikiquote.org/.
Wikibooks links, such as frwikibooks, which maps to https://fr.wikibooks.org/.
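The four cases above can be sketched as a single Python function. The name `site_base_url` is made up for this example, and the sketch assumes that Wikibooks site IDs end in 'wikibooks'. It covers only the four kinds of links listed here; any other site ID returns `None`, and IDs that end in 'wiki' without being a language edition would need extra handling.

```python
# (suffix of the site ID, domain it maps to); checked in this order.
SUFFIX_TO_DOMAIN = (
    ("wikiquote", "wikiquote.org"),
    ("wikibooks", "wikibooks.org"),
    ("wiki", "wikipedia.org"),
)


def site_base_url(site_id):
    """Turn a site ID such as 'zh_min_nanwiki' into the base URL of the site."""
    for suffix, domain in SUFFIX_TO_DOMAIN:
        if site_id.endswith(suffix):
            # Strip the suffix, then use dashes instead of underscores.
            language = site_id[: -len(suffix)].replace("_", "-")
            return f"https://{language}.{domain}/"
    return None  # not one of the four kinds of links described above


for example in ("itwiki", "zh_min_nanwiki", "enwikiquote", "frwikibooks"):
    print(example, site_base_url(example))
```

Keeping the suffix table separate from the function makes it straightforward to add further kinds of links if Wikidata returns any.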
Tools
Solutions
Published on 2015-11-16