Page Protection Software and Dataset
==================================================================

.. figure:: biology_screenshot.png

   Example of the English Wikipedia article on Biology, which has been
   protected for long periods of time. Note the "View Source" button
   instead of "Edit" and the small lock signaling that the page is
   protected.

**Page protection** is a `feature of MediaWiki software`__ that allows
administrators to restrict contributions to particular pages. For
example, a page can be “protected” so that only administrators or
logged-in editors with a history of good editing can edit, move, or
create it.

__ https://www.mediawiki.org/wiki/Help:Protected_pages

Protection can take the form of “full protection,” where a page can
only be edited by administrators (i.e., “sysops”), or
“semi-protection,” where a page can only be edited by accounts with a
history of good edits (i.e., “autoconfirmed” users).

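As a rough illustration of how the two levels differ, here is a minimal Python sketch. It assumes a simplified model in which each user is described by a set of group names and each page carries at most one edit-restriction level, named after MediaWiki's ``sysop`` and ``autoconfirmed`` levels. The ``can_edit`` function and the ``ALLOWED_GROUPS`` table are hypothetical; MediaWiki itself maps these levels to user rights rather than checking group names directly.

```python
from typing import Optional, Set

# Hypothetical, simplified mapping from a page's edit-restriction level
# to the user groups allowed to edit it. Administrators ("sysop") are
# allowed at both levels in this sketch.
ALLOWED_GROUPS = {
    "sysop": {"sysop"},                           # full protection
    "autoconfirmed": {"autoconfirmed", "sysop"},  # semi-protection
}


def can_edit(user_groups: Set[str], edit_level: Optional[str]) -> bool:
    """Return True if a user in `user_groups` may edit a page whose
    edit-restriction level is `edit_level` (None means unprotected)."""
    if edit_level is None:
        return True
    # Levels missing from the table are treated as unsatisfiable here.
    return bool(user_groups & ALLOWED_GROUPS.get(edit_level, set()))


print(can_edit(set(), "autoconfirmed"))              # False
print(can_edit({"autoconfirmed"}, "autoconfirmed"))  # True
print(can_edit({"autoconfirmed"}, "sysop"))          # False
print(can_edit({"sysop"}, "sysop"))                  # True
```

In this model an established editor can edit semi-protected pages but not fully protected ones, while a reader with no qualifying group can edit neither.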
Although largely hidden, page protection profoundly shapes activity on
the site. It is an important tool for managing access and
participation in situations where vandalism or interpersonal conflict
threatens to undermine content quality. While protection affects only
a small portion of pages in English Wikipedia, many of the most highly
viewed pages are protected. For example, the “Main Page” in English
Wikipedia has been protected since February 2006, and all Featured
Articles are protected while they appear on the site’s main page.
Millions of viewers may never edit Wikipedia because they never see an
edit button.

Despite its widespread and influential nature, very little
quantitative research on Wikipedia has systematically taken page
protection into account. This page contains software and data to help
Wikipedia researchers do exactly this in their work.

Because a page's protection status can change over time, the snapshots
of page protection data that the Wikimedia Foundation `publishes as
dumps`__ are incomplete: each snapshot records only which pages were
protected when that dump was made. As a result, taking protection into
account involves looking at several different sources of data.

__ http://dumps.wikimedia.org/

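One way to think about a page's protection history is as a series of “spells,” intervals during which one protection level applied. The minimal sketch below replays a single page's protection log events in time order to recover such spells. The event format ``(timestamp, action, level)``, the ``events_to_spells`` function, and the sample events are hypothetical illustrations, not the released software; the sketch also ignores protection expiry times, which many real protections carry.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional, Tuple


class Spell(NamedTuple):
    """An interval during which a page had one edit-protection level."""
    level: str
    start: datetime
    end: Optional[datetime]  # None: still protected when the log ends


# Hypothetical event format: (timestamp, action, level).
Event = Tuple[datetime, str, Optional[str]]


def events_to_spells(events: List[Event]) -> List[Spell]:
    """Replay one page's protection log in time order and return its spells.

    "protect" and "modify" set a new level and "unprotect" removes it;
    this sketch assumes no other actions appear. Expiry times are ignored.
    """
    spells: List[Spell] = []
    current_level: Optional[str] = None
    current_start: Optional[datetime] = None
    for ts, action, level in sorted(events, key=lambda e: e[0]):
        new_level = level if action in ("protect", "modify") and level else None
        if new_level == current_level:
            continue  # level unchanged; keep the open spell
        if current_level is not None and current_start is not None:
            spells.append(Spell(current_level, current_start, ts))  # close old spell
        current_level = new_level
        current_start = ts if new_level is not None else None
    if current_level is not None and current_start is not None:
        spells.append(Spell(current_level, current_start, None))  # still open
    return spells


# Made-up events for a single page.
example = [
    (datetime(2008, 3, 1), "protect", "sysop"),
    (datetime(2008, 6, 1), "modify", "autoconfirmed"),
    (datetime(2009, 1, 1), "unprotect", None),
    (datetime(2010, 1, 1), "protect", "autoconfirmed"),
]
for spell in events_to_spells(example):
    print(spell)
```

A snapshot taken in 2008 or 2010 would show only one of these spells, which is why replaying the full history matters.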
Much more detail can be found in our paper, `Page Protection: Another
Missing Dimension of Wikipedia Research`__. If you use this software or
these data, we would appreciate it if you would cite the paper:

*Hill, Benjamin Mako and Aaron Shaw. (2015) “Page Protection: Another Missing
Dimension of Wikipedia Research.” In Proceedings of the 11th International
Symposium on Open Collaboration (OpenSym 2015). ACM Press. DOI:
10.1145/2788993.2789846*

__ http://mako.cc/academic/hill_shaw-protection_opensym2015.pdf

Page Protection Software
=============================

Building page protection data is a multi-step and labor-intensive
process. We have `publicly released software in Python and R to carry
out these steps`__ under the `GNU GPL version 3`__. The software is
designed for people already comfortable working with MediaWiki XML
dumps and with the tools and software needed to process them.

__ http://projects.mako.cc/source/?p=protection-tools
__ http://www.gnu.org/licenses/gpl-3.0.html

You can download the software from our git repository with::

    git clone git://projects.mako.cc/protection-tools

Detailed documentation on how to use the software is available in `our

Page Protection Dataset
=========================

.. figure:: protections_over_time.png

   Count of pages protected from editing in English Wikipedia over
   time, for all pages and for the article namespace only.

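A count like the one plotted above can, in principle, be computed from per-page protection spells (intervals during which a page was protected) by asking, for each date, how many distinct pages have a spell covering that date. The sketch below assumes a hypothetical ``PageSpell`` record holding a page ID, a namespace number (0 is the article namespace in MediaWiki), and start and end times; it is not necessarily how the figure was produced.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class PageSpell(NamedTuple):
    page_id: int
    namespace: int           # 0 is the article namespace in MediaWiki
    start: datetime
    end: Optional[datetime]  # None: still protected when the data end


def count_protected(spells: Iterable[PageSpell], when: datetime,
                    namespace: Optional[int] = None) -> int:
    """Count distinct pages with a spell covering `when`.

    Spells are treated as half-open intervals [start, end). If
    `namespace` is given, only pages in that namespace are counted.
    """
    pages = set()
    for s in spells:
        if namespace is not None and s.namespace != namespace:
            continue
        if s.start <= when and (s.end is None or when < s.end):
            pages.add(s.page_id)
    return len(pages)


# Made-up spells: two articles (namespace 0) and one project page (namespace 4).
spells = [
    PageSpell(1, 0, datetime(2010, 1, 1), None),
    PageSpell(2, 4, datetime(2011, 1, 1), datetime(2012, 1, 1)),
    PageSpell(3, 0, datetime(2009, 1, 1), datetime(2010, 6, 1)),
]
when = datetime(2011, 6, 1)
print(count_protected(spells, when), count_protected(spells, when, namespace=0))  # prints: 2 1
```

Evaluating ``count_protected`` over a sequence of dates, once for all pages and once with ``namespace=0``, yields two time series of the kind shown in the figure.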
In `our paper`__, we present an analysis of page protection data from
the January 2015 dump of English Wikipedia. You can download `the dump
files we used`__ from `the Wikimedia Foundation dataset archive`__ and
at the URLs detailed in the README__. Because processing these dumps is
computationally intensive, we have published the output of the
software above, run on this dump.

You can download the dataset in the following formats:

- `RData`__: suitable for use in `GNU R`__.
- `bzip2-compressed tab-separated values`__: suitable for use in other
  languages and statistical packages.

__ http://mako.cc/academic/hill_shaw-protection_opensym2015.pdf
__ http://dumps.wikimedia.org/enwiki/20150112/
__ http://dumps.wikimedia.org/
__ enwiki_201501-protection_spells-v1.RData
__ http://www.r-project.org/
__ enwiki_201501-protection_spells-v1.tsv.bz2

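For readers working in Python rather than R, here is a minimal sketch for loading the tab-separated file using only the standard library. It assumes, without confirmation from this page, that the file is UTF-8 encoded and that its first row is a header. Because the column names are not listed here, the function simply returns each row as a dictionary keyed by whatever the header contains. The function name ``load_spells_tsv`` is hypothetical.

```python
import bz2
import csv
from typing import Dict, List


def load_spells_tsv(path: str) -> List[Dict[str, str]]:
    """Read a bzip2-compressed, tab-separated file into a list of dicts.

    Assumes (not confirmed by this page) UTF-8 text with a header row.
    All values come back as strings; convert types as needed. If fields
    in the file are never quoted, passing quoting=csv.QUOTE_NONE to
    DictReader may be safer.
    """
    # bz2.open in "rt" mode decompresses and decodes on the fly.
    with bz2.open(path, "rt", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return list(reader)
```

For example, ``rows = load_spells_tsv("enwiki_201501-protection_spells-v1.tsv.bz2")`` followed by ``print(list(rows[0]))`` would show the column names. Reading every row into memory is simplest; for a very large file, iterating over the ``csv.DictReader`` directly avoids holding all rows at once.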
For details about the dataset, why it is important, and examples of
how it can be used to reach better findings in Wikipedia research,
please read `the companion paper`__.

__ http://mako.cc/academic/hill_shaw-protection_opensym2015.pdf

If you notice issues or bugs in our data or `code`__, contact `Benjamin
Mako Hill`__ or `Aaron Shaw`__.

__ http://projects.mako.cc/source/?p=protection-tools
__ http://mako.cc/contact/
__ http://aaronshaw.org/

Patches and improvements are welcome! Details on `how to produce and send
a patch using git are online`__.

__ http://projects.mako.cc/source/

ⓒ Copyright `Benjamin Mako Hill`__ and `Aaron Shaw`__ :: `Creative Commons BY-SA`__ :: Updated: Thu Jul 3 13:22:29 PDT 2014

__ http://mako.cc/academic/
__ http://aaronshaw.org/
__ http://creativecommons.org/licenses/by-sa/4.0/

.. LocalWords: png figwidth px autoconfirmed