X-Git-Url: https://projects.mako.cc/source/redirect-tools/blobdiff_plain/9d33a112f7d061be0f582dfe80b07ffb804952ee..HEAD:/docs/homepage.rst
diff --git a/docs/homepage.rst b/docs/homepage.rst
index 206afa8..ffe7719 100644
--- a/docs/homepage.rst
+++ b/docs/homepage.rst
@@ -14,27 +14,28 @@ __ https://en.wikipedia.org/wiki/Main_Page
 __ https://en.wikipedia.org/wiki/Seattle,_Washington
 __ https://en.wikipedia.org/wiki/Seattle
 
-In wikis, **redirects** are special pages in that silently take readers
-from the page they are visiting to another page in in the wiki. In the
-`English Wikipedia`__, redirects make up more than half of all article
-pages.
+In wikis, **redirects** are special pages that silently take
+readers from the page they are visiting to another page in the
+wiki. In the `English Wikipedia`__, redirects make up more than half
+of all article pages.
 
-Different data sources of handle redirects differently. For example,
-`the MediaWiki API`__ will automatically "follow" redirects but the `XML
-database dumps`__ treat redirects like normal articles. In both cases,
-redirects are often invisible to reseachers.
+Different Wikipedia data sources handle redirects differently. For
+example, `the MediaWiki API`__ will automatically "follow" redirects
+but the `XML database dumps`__ treat redirects like normal
+articles. In both cases, redirects are often invisible to researchers.
 
 __ https://www.mediawiki.org/wiki/API:Main_page
 __ https://meta.wikimedia.org/wiki/Data_dumps
 
 Because redirects constitute a majority of all pages and see a large
-portion of all traffic, Wikipedia reseachers need to take redirects into
-account or their findings may be incomplete or incorrect. For example,
-the histogram on this page shows the distribution fo edits across pages
-in Wikipedia for every page, and for non-redirects only. Because
-redirects are almost never edited, the distributions are very different.
-Similarly, because redirects are viewed but almost never edited, any
-study of views over articles should also take redirects into account.
+portion of all traffic, Wikipedia researchers need to take redirects
+into account or their findings may be incomplete or incorrect. For
+example, the histogram on this page shows the distribution of edits
+across pages in Wikipedia for every page, and for non-redirects only.
+Because redirects are almost never edited, the distributions are
+very different. Similarly, because redirects are viewed but almost
+never edited, any study of views over articles should also take
+redirects into account.
 
 .. figure:: edits_over_pages.png
    :align: right
@@ -54,15 +55,16 @@ when and where pages redirect.
 
 Much more detail can be found in `Consider the Redirect: A Missing
 Dimension of Wikipedia Research`__ — a short paper that we have written
-to acccompany this dataset and these tools. If you use this software or
+to accompany this dataset and these tools. If you use this software or
 these data, we would appreciate if you cite the paper:
 
-   *Hill, Benjamin Mako and Aaron Shaw. "Consider the Redirect: A Missing
-   Dimension of Wikipedia Research." In Proceedings of the 10th
-   International Symposium on Open Collaboration (OpenSym 2014). ACM
-   Press, 2014.*
+   Hill, Benjamin Mako & Shaw, Aaron. (2014) "Consider the Redirect: A
+   Missing Dimension of Wikipedia Research." In *Proceedings of the 10th
+   International Symposium on Open Collaboration (OpenSym 2014)*. ACM
+   Press. `doi: 10.1145/2641580.2641616`__
 
-__ http://mako.cc/academic/hill_shaw-consider_the_redirect.pdf
+__ https://doi.org/10.1145/2641580.2641616
+__ https://doi.org/10.1145/2641580.2641616
 
 Generating Redirect Spells
 =============================
@@ -78,25 +80,25 @@ Generating redirect spells from an MediaWiki XML dump involves two steps:
 We have `publicly released software in Python and R to do these two
 steps`__ under the `GNU GPL version 3`__. The software is designed for
 people already comfortable with working with MediaWiki XML dumps and the
-tools and software necessary to do this. We have provided
-`documentation`__ on how to use these tools.
+tools and software necessary to do this.
 
 __ http://projects.mako.cc/source/?p=redirect-tools
 __ http://www.gnu.org/licenses/gpl-3.0.html
-__ README.html
 
 You can download the software from our git repository like::
 
     git clone git://projects.mako.cc/redirect-tools
 
-Detailed documentation on how to use the software is in available in our
-README file.
+Detailed documentation on how to use the software is available in `our
+README file`__.
+
+__ README.html
 
 Redirect Spell Data
 =========================
 
-In our paper `Consider the Redirect`__, we present an analysis of
-redirect data from English Wikipedia in the dump created in October
+In `our paper`__, we present an analysis of redirect data from English
+Wikipedia in the dump created in October
 2012. You can download `the dump files we used`__ from `the Wikimedia
 Foundation dataset archive`__. Because generating these dumps can be
 computationally intense, we have published the output of the software
@@ -105,16 +107,37 @@ our software identified and is the dataset used in the paper.
 
 You can download the dataset in the following formats:
 
-- `RData (240MB)`__ — Suitable for use in GNU R
-- `bzip2 compressed tab seperated values (178MB)`__ — Suitable for use
+- `RData (240MB)`__ — Suitable for use in `GNU R`__
+- `bzip2 compressed tab separated values (178MB)`__ — Suitable for use
   in other languages and statistical packages.
-__ http://mako.cc/academic/hill_shaw-consider_the_redirect.pdf
+__ https://doi.org/10.1145/2641580.2641616
 __ http://dumps.wikimedia.org/enwiki/20121001/
 __ http://dumps.wikimedia.org/
 __ enwiki_201210-redirect_spells-v1.RData
+__ http://www.r-project.org/
 __ enwiki_201210-redirect_spells-v1.tsv.bz2
 
+Limitations
+===============
+
+Taking redirects into account is one important step that Wikipedia
+researchers should take but it is hardly a panacea. As just one example,
+in conversations after the publication of this paper, we have realized
+that page moves may lead to additional challenges in interpreting view
+data and in some cases to challenges in interpreting redirect data
+itself. This work reflects a step toward increased validity but it is
+incomplete.
+
+Depending on the research question, a complete picture may need to take
+redirects, moves, other administrative actions, changing ways of
+measuring views, bot and bot-assisted editing, along with other
+currently unidentified features, into account. We hope to extend our
+work with redirects and explore these issues and we hope other
+researchers will join us in these efforts to build a better
+understanding, tools, and datasets that can improve Wikipedia research.
+
+
 More Information
 ==================
 
@@ -122,7 +145,7 @@ For details about the dataset, why it is important, and for examples on
 how it can be used to come to better findings in Wikipedia research,
 please read `the companion paper`__.
 
-__ http://mako.cc/academic/hill_shaw-consider_the_redirect.pdf
+__ https://doi.org/10.1145/2641580.2641616
 
 If you notice issues or bugs in our data or `code`__, contact `Benjamin
 Mako Hill`__ or `Aaron Shaw`__.
@@ -137,3 +160,10 @@ a patch using git are online`__.
 
 __ http://projects.mako.cc/source/
 
+----
+
+ⓒ Copyright `Benjamin Mako Hill`__ and `Aaron Shaw`__ :: `Creative Commons BY-SA`__ :: Updated: Sun Dec 11 16:43:57 PST 2016
+
+__ http://mako.cc/academic/
+__ http://aaronshaw.org/
+__ http://creativecommons.org/licenses/by-sa/4.0/
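
The diff notes that the XML database dumps "treat redirects like normal articles". This is one reason redirects stay invisible in research. The sketch below shows one way to split the pages in a dump into redirects and non-redirects while streaming. It uses only the Python standard library and is not the Python or R code in redirect-tools.

It assumes the MediaWiki export format, in which a redirect page carries a `<redirect title="..."/>` child element. The function names `classify_pages` and `local_name` and the two-page `SAMPLE` are hypothetical, written for illustration only.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def classify_pages(xml_source):
    """Split pages in a MediaWiki XML export into redirects and non-redirects.

    Assumes each redirect <page> carries a <redirect title="..."/> child,
    as in the MediaWiki export format. Returns (redirects, articles):
    a list of (title, target) pairs and a list of titles.
    """
    redirects, articles = [], []
    # Stream the file so a large dump never has to fit in memory at once.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, target = None, None
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "redirect":
                target = child.get("title")
        if target is not None:
            redirects.append((title, target))
        else:
            articles.append(title)
        elem.clear()  # release the page's children once it is classified
    return redirects, articles


# Hypothetical two-page sample in the export-0.10 namespace.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Seattle</title>
    <ns>0</ns>
    <revision><text>Seattle is a city in Washington.</text></revision>
  </page>
  <page>
    <title>Seattle, Washington</title>
    <ns>0</ns>
    <redirect title="Seattle" />
    <revision><text>#REDIRECT [[Seattle]]</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    redirects, articles = classify_pages(io.BytesIO(SAMPLE))
    print(redirects)  # [('Seattle, Washington', 'Seattle')]
    print(articles)   # ['Seattle']
```

Streaming with `iterparse` and clearing each `<page>` keeps memory use modest on large dumps. The sketch has some limits:

- It looks only at the state recorded in each `<page>`.
- It ignores page namespaces (`<ns>`).
- It ignores revision history.
- It cannot tell when a page became or stopped being a redirect.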