Given an article on the English Wikipedia, for example about Perl, Python, Ruby, PHP, or JavaScript, create a program that will fetch the size of all the translated versions of this article from every language edition of Wikipedia.

Depending on how much investigation you'd like to do yourself, you can start implementing right away, or you can read one or more of the hints that explain what you need to fetch.

Hints

Wikipedia provides an API to fetch the content of a page in raw format. It also provides plenty of documentation about this API, including a page about API:Properties.

The language links are served by Wikidata.

Hints

This URL will return the content of the 'Perl' page of the English Wikipedia in JSON format:

https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Perl
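For example, a minimal Python sketch (it assumes the third-party requests module is installed) that fetches this JSON and prints the size of the wikitext; with the default JSON format the wikitext sits under the '*' key of the first revision:

    import requests

    # Ask the English Wikipedia for the raw wikitext of the 'Perl' page.
    params = {
        'action': 'query',
        'prop': 'revisions',
        'rvprop': 'content',
        'format': 'json',
        'titles': 'Perl',
    }
    resp = requests.get('https://en.wikipedia.org/w/api.php', params=params)
    data = resp.json()

    # The result is keyed by internal page id, so iterate instead of hard-coding it.
    for page in data['query']['pages'].values():
        wikitext = page['revisions'][0]['*']
        print(page['title'], len(wikitext.encode('utf-8')))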

This URL will return the list of translated versions of the page with Q-id Q42:

https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&props=sitelinks&ids=Q42
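Again a minimal Python sketch, assuming requests is installed, that lists the site ids and page titles found in the sitelinks of Q42:

    import requests

    # Fetch the sitelinks of the Wikidata item Q42.
    params = {
        'action': 'wbgetentities',
        'format': 'json',
        'props': 'sitelinks',
        'ids': 'Q42',
    }
    resp = requests.get('https://www.wikidata.org/w/api.php', params=params)
    sitelinks = resp.json()['entities']['Q42']['sitelinks']

    # Each entry maps a site id (e.g. 'enwiki') to the title of the page on that site.
    for site, link in sorted(sitelinks.items()):
        print(site, link['title'])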

Given a title (in this case PHP), the following URL will return the Q-id of the page:

https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&format=json&titles=PHP
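A small Python sketch along the same lines (requests assumed installed; the get_qid helper name is just made up for this example). The Q-id is returned in the 'wikibase_item' field of the page properties:

    import requests

    def get_qid(title):
        # Look up the page properties of the given title on the English Wikipedia.
        params = {
            'action': 'query',
            'prop': 'pageprops',
            'format': 'json',
            'titles': title,
        }
        resp = requests.get('https://en.wikipedia.org/w/api.php', params=params)
        # The 'wikibase_item' property holds the Q-id used by Wikidata.
        for page in resp.json()['query']['pages'].values():
            return page['pageprops']['wikibase_item']

    print(get_qid('PHP'))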

Hints

There seem to be 4 types of language links returned from Wikidata (a small mapping sketch follows this list):

Plain Wikipedia links whose site id ends in the word 'wiki', such as itwiki, newwiki, or pdcwiki. The language code can be 2 or more characters long. The URL is the site id without the trailing 'wiki' (the last 4 characters), followed by .wikipedia.org.

Wikipedia links with underscores, such as zh_yuewiki, bat_smgwiki, or zh_min_nanwiki, are quite similar, but we need to replace the underscore (_) characters with dash (-) characters, e.g. zh-yue.wikipedia.org.

Wikiquote links. For example enwikiquote, which maps to https://en.wikiquote.org/.

Wikibooks links, such as frwikibooks, which maps to https://fr.wikibooks.org/.
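Based on these observations, here is a sketch of the mapping in Python; the site_url function name is made up for this example, and any site id not covered by the four rules above is simply reported as None:

    def site_url(site_id):
        # Wikiquote: enwikiquote -> https://en.wikiquote.org/
        if site_id.endswith('wikiquote'):
            return 'https://{}.wikiquote.org/'.format(site_id[:-len('wikiquote')])
        # Wikibooks: frwikibooks -> https://fr.wikibooks.org/
        if site_id.endswith('wikibooks'):
            return 'https://{}.wikibooks.org/'.format(site_id[:-len('wikibooks')])
        # Plain Wikipedia: itwiki -> https://it.wikipedia.org/
        # Underscores become dashes: zh_yuewiki -> https://zh-yue.wikipedia.org/
        if site_id.endswith('wiki'):
            lang = site_id[:-len('wiki')].replace('_', '-')
            return 'https://{}.wikipedia.org/'.format(lang)
        # Anything else is not covered by the four rules described above.
        return None

    print(site_url('itwiki'))        # https://it.wikipedia.org/
    print(site_url('zh_yuewiki'))    # https://zh-yue.wikipedia.org/
    print(site_url('enwikiquote'))   # https://en.wikiquote.org/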

Tools

Solutions

wikipedia stats on GitHub