=======================

In wikis, **redirects** are special pages that silently take readers
from the page they are visiting to another page in the wiki. In the
`English Wikipedia`__, redirects make up more than half of all article
pages.

__ https://en.wikipedia.org/wiki/Main_Page

.. image:: example_image

Different data sources handle redirects differently. `The MediaWiki
API`__ will automatically "follow" redirects, but the `XML database
dumps`__ treat redirects like normal articles. In both cases, redirects
are often invisible to researchers.

__ https://www.mediawiki.org/wiki/API:Main_page
__ https://meta.wikimedia.org/wiki/Data_dumps

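
To make the dump side of this concrete, the sketch below parses a small
hand-written fragment in the style of an XML dump and separates redirects
from ordinary pages. It assumes the ``<redirect title="..."/>`` element that
recent dump schemas attach to redirect pages. The sample titles and the
``split_pages`` helper are illustrative names, not taken from the software
described here.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written fragment in the style of a MediaWiki XML dump.
# Real dumps declare an XML namespace on <mediawiki>; it is left out
# here to keep the sketch short.
SAMPLE_DUMP = """\
<mediawiki>
  <page>
    <title>Seattle</title>
    <revision><text>Seattle is a city in Washington.</text></revision>
  </page>
  <page>
    <title>Seattle, Washington</title>
    <redirect title="Seattle" />
    <revision><text>#REDIRECT [[Seattle]]</text></revision>
  </page>
</mediawiki>
"""


def split_pages(xml_text):
    """Return (articles, redirects); redirects maps source title -> target."""
    articles, redirects = [], {}
    for page in ET.fromstring(xml_text).iter("page"):
        title = page.findtext("title")
        marker = page.find("redirect")
        if marker is None:
            articles.append(title)
        else:
            redirects[title] = marker.get("title")
    return articles, redirects


articles, redirects = split_pages(SAMPLE_DUMP)
print(articles)
print(redirects)
```

Note that the ``<redirect>`` element only reflects the page's state when the
dump was made, so it says nothing about earlier revisions.
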
Because redirects constitute a majority of all pages and see a large
portion of all traffic, Wikipedia researchers need to take redirects into
account or their findings may be incomplete or incorrect. For example,
the following image shows the distribution of edits across pages in
Wikipedia, first for every page and then for non-redirects only.

.. image:: redirects_whatever.png

Redirects are almost never edited, so the two distributions are very
different. Because redirects are nonetheless viewed, any study of views
over articles should also take redirects into account.

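
As a rough illustration of what this means for view counts, the sketch below
adds the views recorded for each redirect to the article it points to. The
``fold_views`` helper, the titles, and the numbers are all made up for the
example. It assumes a single, fixed redirect mapping, which real data will
not satisfy once redirects change over time.

```python
from collections import Counter


def fold_views(views, redirects):
    """Add each redirect's views to the page it points to.

    views:     dict mapping page title -> view count
    redirects: dict mapping redirect title -> target title (one hop only)
    """
    totals = Counter()
    for title, count in views.items():
        totals[redirects.get(title, title)] += count
    return dict(totals)


# Made-up counts, for illustration only.
views = {"Seattle": 900, "Seattle, Washington": 80, "Seatle": 20, "Tacoma": 300}
redirects = {"Seattle, Washington": "Seattle", "Seatle": "Seattle"}

folded = fold_views(views, redirects)
print(folded)
```

In this toy example, counting only the article's own title would miss the
views that arrived through its two redirects.
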
Because redirects can change over time, the snapshots of redirects
stored and published by the Wikimedia Foundation are incomplete. Taking
redirects fully into account involves looking at the content of every
single revision of every article to determine both when and where pages
redirect.

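
A minimal sketch of the per-revision check, assuming the common
``#REDIRECT [[Target]]`` wikitext form: the regular expression below is an
approximation written for this example, not the pattern used by MediaWiki or
by our software, and it ignores localized redirect keywords.

```python
import re

# Approximate pattern: optional leading whitespace, the keyword in any
# case, an optional colon, then the link target up to "]", "|" or "#".
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)


def redirect_target(text):
    """Return the redirect target named in a revision's text, or None."""
    match = REDIRECT_RE.match(text or "")
    if match is None:
        return None
    # Underscores and spaces are interchangeable in page titles.
    return match.group(1).strip().replace("_", " ")


print(redirect_target("#REDIRECT [[Seattle]]"))
print(redirect_target("Seattle is a city in Washington."))
```

Running a check like this over every revision of a page yields, for each
timestamp, either a target title or ``None``.
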
Much more detail can be found in `Consider the Redirect: A Missing
Dimension of Wikipedia Research`__, a short paper that we have written
to accompany this dataset and these tools. If you use this software or
these data, we would appreciate it if you cited the paper:

*Hill, Benjamin Mako and Aaron Shaw. "Consider the Redirect: A Missing
Dimension of Wikipedia Research." In Proceedings of the 10th
International Symposium on Open Collaboration (OpenSym 2014). ACM*

__ hill_shaw-consider_the_redirect.pdf

Generating Redirect Spells
=============================

Generating redirect spells from a MediaWiki XML dump involves two steps:

1. Searching the full text of every revision of every page in a dump to
   determine if any given revision is a redirect.

2. Using the results of (1) to generate a list of "spells" that describe
   periods of time during which articles in a wiki redirect to other
   articles.

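
The second step can be sketched as follows. This is a simplified
illustration, not the implementation in our software: the ``Spell`` record,
its field names, and ``build_spells`` are hypothetical, and the input is
assumed to be one page's revisions in chronological order, each already
reduced to a timestamp and a redirect target (or ``None``).

```python
from typing import NamedTuple, Optional


class Spell(NamedTuple):
    page: str
    target: str
    start: str            # timestamp of the first revision in the spell
    end: Optional[str]    # timestamp of the revision that ended it, if any


def build_spells(page, revisions):
    """Collapse (timestamp, target_or_None) pairs into redirect spells."""
    spells = []
    current_target = None
    current_start = None
    for timestamp, target in revisions:
        if target == current_target:
            continue  # same state as the previous revision
        if current_target is not None:
            # The page stopped redirecting or changed its target.
            spells.append(Spell(page, current_target, current_start, timestamp))
        current_target, current_start = target, timestamp
    if current_target is not None:
        # Still a redirect at the last revision: the spell is open-ended.
        spells.append(Spell(page, current_target, current_start, None))
    return spells


# Made-up revision history, for illustration only.
history = [
    ("2005-01-01", None),
    ("2006-03-01", "Seattle"),
    ("2006-04-01", "Seattle"),
    ("2007-01-01", None),
    ("2008-01-01", "Seattle, Washington"),
]
spells = build_spells("Emerald City", history)
for spell in spells:
    print(spell)
```

Consecutive revisions with the same target are merged into one spell, and a
page that still redirects at its last revision gets a spell with no end.
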
We have published software in Python and R, licensed under the `GNU GPL
version 3`__, that performs these two steps. The software is designed for
people already comfortable with working with MediaWiki XML dumps and the
tools and software

You can download the software from our git repository like this::

Detailed documentation on how to use the software is available in our
=========================

In *Consider the Redirect*, we present an analysis of redirect data from
English Wikipedia in the dump created on DATE. You can download the dump
files from HERE. Because generating redirect spells from these dumps can
be computationally intensive, we have published the output of the
software above run on this dump. The output includes the 9,277,563
redirect spells that our software identified and is the dataset used in
the paper.

You can download the dataset in the following formats:

- RData (240MB): suitable for use in GNU R
- bzip2 compressed tab separated values: Suitable
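
For readers working in Python, a bzip2 compressed tab separated file can be
read with the standard library alone. The sketch below builds a small sample
in memory so that it runs without the real download. The column names
(``page``, ``target``, ``start``, ``end``) are assumptions made for this
example, so check the header of the actual file before relying on them.

```python
import bz2
import csv
import io

# Hypothetical columns and rows; the real file's header may differ.
SAMPLE_ROWS = (
    "page\ttarget\tstart\tend\n"
    "Seattle, Washington\tSeattle\t2006-03-01\t2007-01-01\n"
    "Seatle\tSeattle\t2008-01-01\t\n"
)


def read_spells(source):
    """Yield one dict per row from a bzip2 compressed TSV path or file object."""
    with bz2.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: page titles may contain quote characters.
        yield from csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


compressed = io.BytesIO(bz2.compress(SAMPLE_ROWS.encode("utf-8")))
rows = list(read_spells(compressed))
print(len(rows), rows[0]["target"])
```

Because ``read_spells`` is a generator, a large file can be processed row by
row instead of being loaded into memory at once.
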

For details about the dataset, why it is important, and examples of
how it can be used to reach better findings in Wikipedia research,

If you notice issues or bugs in the data or script, contact `Benjamin
Mako Hill`__ or `Aaron Shaw`__.

__ http://mako.cc/contact/

Patches and improvements are welcome! Details on how to produce and send
a patch using git are online here.