X-Git-Url: https://projects.mako.cc/source/state_of_wikimedia_research_2013/blobdiff_plain/122dd75e45ccdfd01c795839ede755ffed1a69c2..ba6cb65ce89c04f5222e74a82c93e4f030dd16d3:/20130809-wikimania_research.tex diff --git a/20130809-wikimania_research.tex b/20130809-wikimania_research.tex index 3dddda9..58bb079 100644 --- a/20130809-wikimania_research.tex +++ b/20130809-wikimania_research.tex @@ -55,6 +55,8 @@ } } +% create an empty quotetxt so we can reuse it +\newcommand{\quotetxt}{} % add function to stop numbering appendix slides \newcommand{\backupbegin}{ @@ -163,7 +165,12 @@ \let\olditemize\itemize \renewcommand\itemize{\olditemize\itemsep-1pt} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\section{Introduction} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + %% SLIDE: Title Slide +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{frame}[plain] \begin{tikzpicture} @@ -193,8 +200,530 @@ \input{vc} \tikz[overlay,shift=(current page.south west)]{\node [xshift=5.6em,yshift=0.5em]{\colorbox{makopurple1}{\color{white} \tt \smaller \smaller \smaller revision:\ \VCRevision\ (\VCDateTEX)}};} + \note{I've been doing this for many years. I started in 2008 and + skipped one year, I think. + + This began as an excuse for me to make sure I was up to date on + Wikimedia Research.} + +\end{frame} + +%% SLIDE: Anecdote from Wikimania 2008 +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\renewcommand{\quotetxt}{``This talk will try to [provide] a quick + tour – a literature review in the scholarly parlance – of the last + year's academic landscape around Wikimedia and its projects geared + at non-academic editors and readers. 
It will try to categorize, + distill, and describe, from a birds eye view, the academic landscape + as it is shaping up around + our project.''\\ + \hfill – \e{From my Wikimania 2008 Submission}} +\begin{frame} + + {\smaller \quotetxt} + + \pause + \includegraphics[width=\textwidth]{figures/google_scholar_result.png} + + \pause + \tikz{\draw (current page.center) [xshift=-2.1cm, yshift=0.9cm, color=red] + ellipse (1.5cm and 0.5cm);} + + \note<1>{Back in 2008, I set out to run a session at + Wikimania that would provide a comprehensive literature review of + articles about Wikipedia published in the last year. + + \begin{quote} + \quotetxt + \end{quote} + + Then, about two weeks before Wikimania, I did the Google Scholar search + so I could build the literature review.} + + \note<2->{I tried to import the whole list into Zotero and managed + to get banned for abusing Google Scholar because they thought + that no human being could realistically consume the amount of + material published on Wikipedia that year. + + So anyway, I had a 45 minute talk so it worked out to 3.45 seconds + per paper... + + And believe it or not, this year is even bigger. + + And my talk is even shorter.} + +\end{frame} + +%% SLIDE: Citations Per Year +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame} + + \includegraphics[width=\textwidth]{figures/citations_by_year.pdf} + + \centering + + {\smaller \emph{Number of citations, per year, with the term + “wikipedia” in the title.\\ + (Source: Google Scholar results. Accessed: 2013-08-06)}} + + \note{Academics have written \e{a lot} of papers about + Wikipedia. There are more than 500 papers published about + Wikipedia each year and although we've reached a peak, it's not + really slowing. + + We're on track this year to meet or surpass that.} + +\end{frame} + +% %% SLIDE: breakdown by time? 
+% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +% \begin{frame} + +% \includegraphics[width=\textwidth]{figures/wikipeda_citations_bytime.png} +% \end{frame} + + +%% SLIDE: My Scope Conditions +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame} + + \includegraphics[width=\textwidth]{figures/multiple_issues.png} + + \larger \larger + In selecting papers for this session, the goal is always to choose + examples of work that: + + \begin{itemize} + \larger \larger + \item Represent \e{important themes} from Wikipedia in the last year. + \item Research that is likely to be of \e{interest} to Wikimedians. + \item Research by people who are \e{not at Wikimania}. + \end{itemize} + + Within these goals, the selections are \e{incomplete}, and \e{wrong}. + + \note{This is my disclaimer slide...} +\end{frame} + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\section{Paper Summaries} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\subsection{Wikipedia in Context} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +%% SLIDE: Reagle and Loveland Citation +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame}{Wikipedia in Historical Context} + + \larger \larger Loveland, Jeff, and Joseph Reagle. “Wikipedia and + Encyclopedic Production.” \emph{New Media \& Society} + (2013). DOI:10.1177/1461444812470428. + + \note{Jeff Loveland is a historian of encyclopedias. Joseph Reagle + is a media studies scholar who wrote the first book length + academic treatment of Wikipedia. + + Loveland heard about Reagle's book through an article in the + Signpost but felt it was weak on history. 
So they teamed up + and put together a great piece of work that places Wikipedia into + historical context.} +\end{frame} + +%% SLIDE: Reagle and Loveland Overview +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame}{Wikipedia in Historical Context} + + \larger \larger \larger Loveland and Reagle cite three modes + of encyclopedia production: + + \begin{itemize} + \larger \larger \larger + \item Compulsive collection + \item Stigmergic accumulation + \item Corporate production + \end{itemize} + + In each case, they see a connection between Wikipedia and methods of + the past. + + \note<1>{The authors identify three historical methods through which + encyclopedias were written and they suggest that, in different + ways, each plays a role in Wikipedia: + + \begin{itemize} + \item \e{Compulsive collection} refers to people who were individually + driven to collect information. Think Pliny the Elder. And then + think Wikipediaholics and WikiBreak enforcing software. + \item \e{Stigmergic accumulation} draws on `stigmergy', a + word from zoology that describes how wasps build nests and + refers to accumulation. In the past, this meant piracy and + building off of others. In Wikipedia, it means revision, + incorporation of other sources, and more. + \item \e{Corporate production} means working together with many + other people. Diderot took advantage of at least 140 different + authors. Think the OED collecting information from + others. Wikipedia of course uses a similar model. + \end{itemize} + + In each case, they think that Wikipedia's model is not a total + break from the past in the way many people talk about it.} + + \note<2>{Now my own bias as a researcher is to look to more + quantitative or easy to apply work. 
+ + \e{Takeaway:} But I think this is a great example of how the more humanities + focused work on Wikipedia can do a wonderful job of providing us + context and a better way to think about and talk about what we're + doing.} +\end{frame} + + +\subsection{Wikipedia as Data Source} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +%% SLIDE: Citation +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame}{Wikipedia as Data Source} + + \larger \larger + + Sérasset, Gilles. “Dbnary: Wiktionary as a LMF Based Multilingual RDF + Network.” In \emph{Proceedings of the Eight International Conference on + Language Resources and Evaluation}, 2012. + + \begin{center} + \visible<2>{\url{http://dbnary.forge.imag.fr/}} + \end{center} + + \note<1>{There's a whole genre of paper that is about Wikipedia only + in that it uses WP as its dataset. This might even be a + \e{majority} of all papers published on Wikipedia. + + This paper up here, on a project called ``Dbnary'', is an attempt to + build a \e{lexical network} out of Wiktionary data. Essentially, + they are using Wiktionary as a network of words and their + relationships -- including definitions, translations, synonyms, + antonyms, etc. -- in different languages, often connected through + common etymologies. + + Lexical networks are essential to a whole family of + computerized natural language processing tasks and a variety of + linguistic projects. + + What I like about what Sérasset did was that he not only + used it as a dataset but really did a bunch of work to make + Wiktionary more useful to other resources.} + + \note<2>{The researcher has created an open source tool – available + at the URL above. + + And anybody can use this tool, along with the dumps as published + by WMF, to produce their own, on their own computers, in about 5 + minutes. 
+ + The paper also contains a list of challenges that Wiktionary + contributors might be able to use to extract data more effectively + in the future. + + \e{Takeaway:} I think that this paper suggests, like a lot of + similar work, how Wikipedia's effect is broader than just what + comes through viewership on the web. And that there are important + ways we might be able to work with researchers like this to become + more effective.} + +\end{frame} + +\subsection{Wikipedia and Quality} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +%% SLIDE: Wikipedia and Quality Citation +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame}{Wikipedia and Quality} + + \larger \larger + + Volsky, Peter G., Cristina M. Baldassari, Sirisha Mushti, and Craig + S. Derkay. ``Quality of Internet Information in Pediatric + Otolaryngology: A Comparison of Three Most Referenced Websites.'' + \emph{International Journal of Pediatric Otorhinolaryngology} 76, + no. 9 (September 2012): 1312–1316. DOI:10.1016/j.ijporl.2012.05.026. + + \note{There is a little industry of articles designed to evaluate + Wikipedia's quality. There are literally dozens of these each + year. And one thing that frustrates me is that it's very rare + that the people doing these coordinate with Wikipedia or that + Wikipedians systematically reach out to the people doing these to + learn. + + This is an example of one from pediatric otolaryngology. That is, + the study of diseases of the ear, nose, and throat -- in children.} + +\end{frame} + +%% SLIDE: Results +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame}{Wikipedia and Quality: Evaluation of Otolaryngology Articles} + \smaller \smaller + \begin{columns} + \column{0.53\textwidth} + \centering + + \includegraphics[width=0.6\textwidth]{figures/oto-content_accuracy.png} + + Accuracy as scored for content against a rubric\\ + developed from otolaryngology textbooks. 
+ + \bigskip + + \includegraphics[width=0.6\textwidth]{figures/oto-errors_omissions.png} + + Mean numbers of errors and omissions. + + \column{0.47\textwidth} + \centering + + \includegraphics[width=0.6\textwidth]{figures/oto_reading_level.png} + + Composite score for user interface. + + \bigskip + + \includegraphics[width=0.6\textwidth]{figures/oto-user_interface.png} + + Flesch–Kincaid Reading Level. + + \end{columns} + + \bigskip + + {\centering + {\larger WK=Wikipedia; ML=MedLinePlus; EM=eMedicine.} + + } + + \note{Like many of these studies, this study compares Wikipedia to + other sites. In this case, eMedicine and MedLinePlus. They used + a series of textbooks and experts to evaluate the content + errors and they used some standard systems to evaluate usability + and reading level. + + They find that Wikipedia has the most errors, the least accuracy, + and a medium reading level, but is similar in most cases to MedLinePlus. + + And Wikipedia had a rather good user interface compared to the + others. + + I'm not sure what that says about the others' user interfaces. + + \e{Takeaway:} We need to be better about getting these datasets and + helping integrate these into improving the encyclopedia.} \end{frame} +\subsection{Perception of Quality} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +%% SLIDE: Perception of Quality +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame}{Perception of Quality} + + \larger \larger Towne, W. Ben, Aniket Kittur, Peter Kinnaird, and + James Herbsleb. “Your Process Is Showing: Controversy Management and + Perceived Quality in Wikipedia.” In \emph{Proceedings of the 2013 + Conference on Computer Supported Cooperative Work}, 1059–1068. CSCW + ’13. New York, NY, USA: ACM, 2013. DOI:10.1145/2441776.2441896. + + \note{A group at Carnegie Mellon put together a really nice piece + that tried to surface Wikipedia's talk pages. 
Now, as many of you + will know intuitively, a majority of Wikipedia's work happens on + talk pages that are invisible to many users. What would happen if we + made this more visible?} +\end{frame} + +\begin{frame}{Perception of Quality: Towne et al.} + + \larger \larger + ``Laws, like sausages, cease to inspire respect in proportion + as we know how they are made.''\\ + \hfill -- John G. Saxe + + \begin{itemize} + \larger \larger + \item<2-> Discussion $\Rightarrow$ Lower ratings + \item<3-> Unresolved conflict $\Rightarrow$ Even lower ratings + \item<4-> Discussion $\Rightarrow$ Higher reported perception of + Wikipedia and article! + \end{itemize} + + \note{The goal was to test this theory in Wikipedia. + + They ran an experiment on Mechanical Turk that showed people Wikipedia + articles and also showed them the talk pages. They then asked + people to rate the articles and to report their perception of the article + and of Wikipedia. + + \begin{itemize} + \item When discussion was shown, quality ratings were significantly lower. + \item When discussion involving conflict was displayed, article + quality ratings were even lower. + \item If the editors involved in the conflict resolved it + through a positive collaboration approach, the negative + effects of conflict disappeared. + \item Participants reported that reading the discussion raised + their perceptions of both the article’s quality and Wikipedia + in general. (That is, they were generally not aware of the + rating-lowering effect of the discussion.) + \end{itemize} + + \e{Takeaway:} There's a deep and interesting tradeoff that cuts to + the core of Wikimedia's two missions: to empower folks by getting + them involved in the process, and to display material. 
This kind of work + explores big important questions at the heart of the Foundation's + work.} + +\end{frame} + +\subsection{Tool Building for Wikipedians} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +%% SLIDE: Tool Building for Wikipedians +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame}{Tool Building for Wikipedians} + + \larger \larger Solorio, Thamar, Ragib Hasan, and Mainul Mizan. ``A + Case Study of Sockpuppet Detection in Wikipedia.'' In + \emph{Proceedings of the Workshop on Language in Social Media}, + 59–68. Atlanta, Georgia, USA: Association for Computational + Linguistics, 2013. + + \note<1>{This is a paper from a computational linguistics conference. And + they set out to create a method to identify sockpuppets in + Wikipedia. + + There's a little academic industry designed to detect authorship + across texts and aliases. But one problem that literature has is + that they have almost no data on people \e{trying} to hide their + identity where the identity was later confirmed. + + Wikipedia has no such problem. There were more than 2,700 cases of + suspected sock-puppeting in Wikipedia in 2012 alone.} + + \note<2>{They use a database of confirmed (with checkuser) and rejected + cases of sockpuppeting to train a machine learning based approach + to classify edits. + + The system achieved an accuracy of 68.83\% in the tested cases. + + This is not very good because simply always confirming the + suspected sockpuppet abuse would have achieved 53.24\% accuracy. + After adding features based on the user's edit frequency by time + of day and day of the week, it achieved 84.04\% confidence. + + The authors have ideas of creating a system that could run in the + background and detect sockpuppets. But even if that never happens, + community members have done similar work in the past. And this + represents a set of tools and techniques from which the community + could directly benefit. 
+ + \e{Takeaway:} We need to get better about working with all the + people, like these authors, who are building tools for our communities.} + + +\end{frame} + + +\subsection{Effects of Feedback} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +%% SLIDE: Effects of Feedback Citation +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame}{Effects of Feedback} + + \larger \larger Zhu, Haiyi, Amy Zhang, Jiping He, Robert Kraut, and Aniket + Kittur. ``Effects of Peer Feedback on Contribution: A Field + Experiment in Wikipedia.'' In \emph{Proceedings of the SIGCHI + Conference on Human Factors in Computing Systems}. Paris, France: + ACM, 2013. + + \note{There have been a whole bunch of studies which have looked at + the effects of feedback on contribution to Wikipedia. Reverts, + welcome messages, etc. And they have shown a series of effects. + + But one concern with this work is that it is not causal. People + who receive negative messages are often behaving differently than + people who do not. + + This paper reports a real experiment, done in Wikipedia, where + different types of feedback were randomly assigned. + + In August-November 2011, they left feedback for 703 creators of + new articles in Wikipedia after waiting at least two days and making sure + the article had a certain amount of content and had not been + tagged for speedy deletion.} + +\end{frame} + +%% SLIDE: Effects of Feedback Figures +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame}{Effects of Feedback: Zhu et al.} + \centering + + \includegraphics[height=0.85\textheight]{figures/shared_leadership-figures.pdf} + + \note{They left four kinds of feedback: positive, negative, + directive, and social. + + And they were interested in both the effect on editing in the new + article they mention and on general editing on Wikipedia. + + Feedback had no effect at all on experienced contributors. At + all. 
This was surprising to the folks running the study but maybe + not to the folks in this room. + + Among newbies, they found that negative feedback and directive + feedback had a positive effect on editing in the focal article and + positive feedback had an effect on general editing (but not the + article in question). And they found no other effects. + + \e{Takeaway:} We should learn from and improve our processes based + on studies like these. We should work with researchers to do more + experiments. There are important ethical implications. There was a + long section of the paper about talking to the research committee and + others.} + +\end{frame} + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\section{Conclusion} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +%% SLIDE: Other Resources +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\begin{frame}{More Resources} + + \begin{itemize} + \larger \larger \larger + \item \e{Wikimedia Research Newsletter} [[:meta:Research:Newsletter]] + \item \e{WikiSym} (Last week in Hong Kong!) + \item \e{WikiPapers Repository} [http://wikipapers.referata.com] + \item \e{Much More} + \end{itemize} + + \note{Those are my six postcards. + + There has been just tons and tons of work in this area. Trying to + talk about this in 20 minutes strikes me as increasingly crazy + every year I try to do it. + + The most important source, now going for a couple years, is the + Wikimedia Research Newsletter which is published monthly in the + Signpost. + + But there are other resources as well. And I encourage you to get + involved.} + +\end{frame} \end{document}