Page Protection Software and Dataset
==================================================================

.. figure:: biology_screenshot.png

   Example of the English Wikipedia article on Biology, which has been
   protected for long periods of time. Note the "View Source" button
   instead of "Edit" and the small lock signaling that the page is
   protected.

**Page protection** is a `feature of MediaWiki software`__ that allows
administrators to restrict contributions to particular pages. For
example, a page can be “protected” so that only administrators or
logged-in editors with a history of good editing can edit or move it.

__ https://www.mediawiki.org/wiki/Help:Protected_pages

Protection might involve “full protection” where a page can only be
edited by administrators (i.e., “sysops”) or “semi-protection” where a
page can only be edited by accounts with a history of good edits
(i.e., “autoconfirmed” users).

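As a rough illustration of the two levels described above, the
following Python sketch models who may edit a page under each one. The
group names ``sysop`` and ``autoconfirmed`` come from the text; the
``can_edit`` function and the ``REQUIRED_GROUP`` table are hypothetical
names used only for this example, and the sketch assumes that
administrators always pass the semi-protection check. It is not
MediaWiki's actual permission logic.

.. code-block:: python

    # Hypothetical mapping from protection level to the group required to edit.
    # A level of None means the page is unprotected.
    REQUIRED_GROUP = {
        None: None,
        "semi": "autoconfirmed",
        "full": "sysop",
    }


    def can_edit(user_groups, protection_level):
        """Return True if a user in `user_groups` may edit at this level."""
        required = REQUIRED_GROUP[protection_level]
        if required is None:
            return True
        # Assumption: administrators can always edit protected pages.
        return required in user_groups or "sysop" in user_groups

Under these assumptions, an anonymous reader (an empty set of groups)
can edit only unprotected pages, which is the situation in which the
"Edit" button is replaced by "View Source".
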
Although largely hidden, page protection profoundly shapes activity on
the site. For example, page protection is an important tool used to
manage access and participation in situations where vandalism or
interpersonal conflict threatens to undermine content quality.
While protection affects only a small portion of pages in English
Wikipedia, many of the most highly viewed pages are protected. For
example, the “Main Page” in English Wikipedia has been protected since
February 2006, and all Featured Articles are protected at the time
they appear on the site’s main page. Millions of viewers may never
edit Wikipedia because they never see an edit button.

Despite its widespread and influential nature, very little
quantitative research on Wikipedia has taken page protection into
account systematically. This page contains software and data to help
Wikipedia researchers do exactly that in their work.

Because a page's protection status can change over time, the snapshots
of page protection data stored by Wikimedia and `published by the
Wikimedia Foundation as dumps`__ are incomplete. As a result, taking
protection into account involves looking at several different sources.

__ http://dumps.wikimedia.org/

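To illustrate why a single snapshot is not enough, the sketch below
rebuilds protection "spells" (intervals during which a page was
protected) from a list of log events and then asks whether a page was
protected on a given date. The event format, the ``protect`` and
``unprotect`` action labels, and the ``build_spells`` and
``protected_on`` helpers are assumptions made for this example; they do
not describe the released software or the exact layout of the Wikimedia
logs.

.. code-block:: python

    from datetime import date


    def build_spells(events):
        """Turn (day, page, action) events into (page, start, end) spells.

        `action` is "protect" or "unprotect" (hypothetical labels). A spell
        that is still open when the events run out gets an end of None.
        """
        open_since = {}  # page -> start date of the currently open spell
        spells = []
        for day, page, action in sorted(events):
            if action == "protect" and page not in open_since:
                open_since[page] = day
            elif action == "unprotect" and page in open_since:
                spells.append((page, open_since.pop(page), day))
        # Spells never closed by an "unprotect" event remain open-ended.
        spells.extend((page, start, None) for page, start in open_since.items())
        return spells


    def protected_on(spells, page, day):
        """Return True if `page` falls inside any spell on `day`."""
        return any(
            p == page and start <= day and (end is None or day < end)
            for p, start, end in spells
        )


    # Invented example events for a single page.
    events = [
        (date(2014, 1, 10), "Biology", "protect"),
        (date(2014, 3, 1), "Biology", "unprotect"),
        (date(2014, 6, 5), "Biology", "protect"),
    ]
    spells = build_spells(events)

With these invented events, a snapshot taken in January 2015 would show
only that the page is currently protected. The earlier spell and the
unprotected gap between March and June 2014 are recoverable only from
the event history.
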
Much more detail can be found in our paper `Page Protection: Another Missing
Dimension of Wikipedia Research`__. If you use this software or these data,
we would appreciate it if you cited the paper:

*Hill, Benjamin Mako and Aaron Shaw. (2015) “Page Protection: Another Missing
Dimension of Wikipedia Research.” In Proceedings of the 11th International
Symposium on Open Collaboration (OpenSym 2015). ACM Press. DOI:
10.1145/2788993.2789846*

__ http://mako.cc/academic/hill_shaw-protection_opensym2015.pdf

Page Protection Software
=============================

Building page protection data is a multi-step, labor-intensive
process. We have `publicly released software in Python and R to carry
out this process`__ under the `GNU GPL version 3`__. The software is
designed for people who are already comfortable working with MediaWiki
XML dumps and the tools needed to process them.

__ http://projects.mako.cc/source/?p=protection-tools
__ http://www.gnu.org/licenses/gpl-3.0.html

You can download the software from our git repository like this::

    git clone git://projects.mako.cc/protection-tools

Detailed documentation on how to use the software is available in our
README.

Page Protection Dataset
=========================

.. figure:: protections_over_time.png

   Count of pages protected from editing in English Wikipedia over
   time for all pages and for the article namespace only.

In `our paper`__, we present an analysis of page protection data from
English Wikipedia in the dump created in January 2015. You can
download `the dump files we used`__ from `the Wikimedia Foundation
dataset archive`__ and at the URLs detailed in the README. Because
generating the dataset from these dumps can be computationally
intensive, we have published the output of the software above run on
this dump.

You can download the dataset in the following formats:

- `RData`__, suitable for use in `GNU R`__
- `bzip2-compressed tab-separated values`__, suitable for use
  in other languages and statistical packages

__ http://mako.cc/academic/hill_shaw-protection_opensym2015.pdf
__ http://dumps.wikimedia.org/enwiki/20150112/
__ http://dumps.wikimedia.org/
__ enwiki_201501-protection_spells-v1.RData
__ http://www.r-project.org/
__ enwiki_201501-protection_spells-v1.tsv.bz2

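As a starting point for the tab-separated file, the following Python
sketch reads a bzip2-compressed TSV using only the standard library.
The ``read_spells`` helper is a hypothetical name. The sketch assumes
that the first row of the file is a header and relies on the ``csv``
module's default quoting rules, which may need adjusting, so check the
actual column names and contents before relying on it.

.. code-block:: python

    import bz2
    import csv


    def read_spells(path):
        """Yield one dict per row of a bzip2-compressed, tab-separated file.

        Assumes the first row holds the column names.
        """
        # "rt" opens the compressed file in text mode so csv can read it.
        with bz2.open(path, "rt", encoding="utf-8", newline="") as handle:
            yield from csv.DictReader(handle, delimiter="\t")

Calling ``next(read_spells("enwiki_201501-protection_spells-v1.tsv.bz2"))``
on the downloaded file returns the first row as a dictionary keyed by
column name. Because the function is a generator, it does not load the
whole file into memory.
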
For details about the dataset, why it is important, and examples of
how it can be used to reach better findings in Wikipedia research,
please read `the companion paper`__.

__ http://mako.cc/academic/hill_shaw-protection_opensym2015.pdf

If you notice issues or bugs in our data or `code`__, contact `Benjamin
Mako Hill`__ or `Aaron Shaw`__.

__ http://projects.mako.cc/source/?p=project-tools
__ http://mako.cc/contact/
__ http://aaronshaw.org/

Patches and improvements are welcome! Details on `how to produce and send
a patch using git are online`__.

__ http://projects.mako.cc/source/

ⓒ Copyright `Benjamin Mako Hill`__ and `Aaron Shaw`__ :: `Creative Commons BY-SA`__ :: Updated: Thu Jul 3 13:22:29 PDT 2014

__ http://mako.cc/academic/
__ http://aaronshaw.org/
__ http://creativecommons.org/licenses/by-sa/4.0/

.. LocalWords: png figwidth px autoconfirmed