+\begin{frame}
+ \centertext{6em}{Wikipedia as a Source of Data}
+
+ \note{Mako}
+\end{frame}
+
+\begin{frame}
+
+ \frametitle{Wikipedia as a source of data}
+
+ \larger \larger Ronen, S., Gonçalves, B., Hu, K. Z., Vespignani, A.,
+ Pinker, S., \& Hidalgo, C. A. (2014). \e{Links that speak: The
+ global language network and its association with global
+ fame}. Proceedings of the National Academy of Sciences, 111(52),
+ E5616--E5622. \href{http://doi.org/10.1073/pnas.1410931111}{doi:10.1073/pnas.1410931111}
+
+\end{frame}
+
+\begin{frame}
+ \frametitle{How to measure the global influence of languages?}
+
+ \larger \larger
+
+ \e{Traditional} methods rely on:
+
+ \begin{itemize}
+ \larger \larger
+ \item \e{Population} of speakers
+ \item \e{Income} or political power of speakers
+ \end{itemize}
+
+ The paper presents a \e{new network method} based on measuring
+ \e{co-speakers} of languages in several data sources, including
+ Wikipedia.
+
+\end{frame}
+
+\begin{frame}
+ \frametitle{Wikipedia as a source of data: Ronen et al.}
+
+ \includegraphics[width=\textwidth]{figures/ronen_fig1.png}
+
+ \note{Two languages are connected when users that edit an article in
+ one Wikipedia language edition are significantly more likely to
+ also edit an article in the edition of the other language.
+
+ If an editor of Spanish is also likely to edit Galician, we'll
+ call those languages connected.}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Wikipedia as a source of data: Ronen et al.}
+
+ \includegraphics[width=\textwidth]{figures/ronen_people.png}
+
+ \note{\begin{itemize}
+ \item The top row shows the number of people per language (born 1800–1950) with
+ articles in at least 26 Wikipedia language editions as a
+ function of their language’s eigenvector centrality.
+ \item The bottom row shows the number of people per language (born
+ 1800–1950) listed in \emph{Human Accomplishment} (a book by
+ Charles Murray) as a function of their language’s eigenvector
+ centrality.
+ \end{itemize}}
+\end{frame}
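+
+\begin{frame}
+ \frametitle{Wikipedia as a source of data: eigenvector centrality}
+
+ A reminder of the standard textbook definition used in the previous
+ plots. The symbols are illustrative assumptions, not notation taken
+ from Ronen et al. Informally, a language is central when it is
+ connected to other central languages:
+
+ \[
+ x_i = \frac{1}{\lambda} \sum_{j} A_{ij} \, x_j
+ \]
+
+ Here $x_i$ is the centrality of language $i$, $A_{ij}$ is the weight
+ of the link between languages $i$ and $j$ (zero if they are not
+ connected), and $\lambda$ is the largest eigenvalue of $A$.
+\end{frame}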
+
+
+\subsection{Community and Organization}
+
+\begin{frame}
+ \centertext{6em}{Community and Organization}
+
+ \note{Mako}
+\end{frame}
+
+\begin{frame}
+
+ \frametitle{Community and organization}
+
+ \larger \larger Warncke-Wang, M., Ranjan, V., Terveen, L., \& Hecht,
+ B. (2015). \e{Misalignment Between Supply and Demand of Quality Content
+ in Peer Production Communities}. In Ninth International AAAI
+ Conference on Web and Social Media (ICWSM).
+
+ % Retrieved from \href{http://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10591}{http://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10591}
+
+\end{frame}
+
+\begin{frame}
+ \frametitle{Community and organization: Warncke-Wang et al.}
+
+ \larger \larger
+ \e{Perfect Alignment Hypothesis (PAH)}: There is an exact match
+ between the supply of high-quality content and the demand for it.
+
+ \bigskip
+
+ \includegraphics[width=\textwidth]{figures/warncke-english_confusion.pdf}
+
+ \note{\e{Quality}: Stub, Start, C, B, Good Article, A, Featured Article
+
+ \e{Popularity}: equivalently sized buckets}
+\end{frame}
+
+\begin{frame}
+ \frametitle{Community and organization: Warncke-Wang et al.}
+
+ A measure of the degree of misalignment can be used to build lists of
+ categories that are relatively \e{``overproduced''} and
+ \e{``underproduced''}:
+
+ \bigskip
+
+ \includegraphics[width=\textwidth]{figures/warncke-english_overunder.pdf}
+\end{frame}
+
+\subsection{Content Quality}
+
+\begin{frame}
+ \centertext{6em}{Content Quality}
+
+ \note{Tilman
+
+ A decade after the landmark ``Nature'' study, there are still few
+ systematic evaluations of the accuracy of Wikipedia's content.
+ Health articles continue to receive scrutiny, though, and with good
+ reason: Wikipedia is ``the most frequently consulted online health
+ care resource globally'' [NEJM article].}
+\end{frame}
+
+\begin{frame}
+
+\frametitle{Quality of drug articles}
+
+ \larger \larger
+ Hwang et al., ``\e{Drug Safety in the Digital Age}.''
+ N Engl J Med 2014; 370:2460--2462, June 26, 2014
+ \href{http://dx.doi.org/10.1056/NEJMp1401767}{doi: 10.1056/NEJMp1401767}.
+ \bigskip
+
+ Kräenbring et al., \e{Accuracy and completeness of drug
+ information in Wikipedia: a comparison with standard textbooks of
+ pharmacology}. PLoS One 9 (9): e106930.
+ \href{http://dx.doi.org/10.1371/journal.pone.0106930}
+ {doi:10.1371/journal.pone.0106930}
+
+
+ \note{Tilman
+
+ We selected two papers that evaluated drug articles, with
+ different approaches. The first one is a short article in the
+ extremely prestigious NEJM.}
+\end{frame}
+
+\begin{frame}
+
+\frametitle{Quality of drug articles: NEJM}
+
+ \includegraphics[width=0.49\textwidth]{figures/Pradaxa_tweet_FDAMedWach.png}
+ % from https://twitter.com/FDAMedWatch/status/281547908095041536
+ % = first one in the list at http://www.nejm.org/doi/suppl/10.1056/NEJMp1401767/suppl_file/nejmp1401767_appendix.pdf
+ \includegraphics[width=0.49\textwidth]{figures/Dabitragan_Contraindications_WP_FDA_warning}
+
+ \tikz{\node [yshift=1.5cm,xshift=-0.4cm] at (current page.center) {\includegraphics[width=1.5cm]{figures/long-arrow-right.png}};}
+ \begin{itemize}
+ \larger \larger
+ \item The US Food and Drug Administration (\e{FDA}) frequently
+ issues safety warnings about prescription drugs. How long does it
+ take until these are reflected on English Wikipedia?
+ \item 41\% updated within two weeks (58\% for high-prevalence
+ diseases), but 36\% still unchanged after more than a year.
+ \end{itemize}
+
+ \note{Tilman
+
+ Articles about drugs used to treat high-prevalence diseases (affecting
+ more than 1 million Americans per year) were updated faster.\\
+ But the result still caused concern.\\
+ The authors find that ``there may be a benefit to enabling the FDA to update or
+ automatically feed new safety communications to Wikipedia pages, as
+ it does with WebMD''. The paper raised awareness among WikiProject
+ Medicine editors, but there's no systematic updating mechanism yet.}
+
+
+\end{frame}
+
+\begin{frame}
+
+\frametitle{Quality of drug articles: PLoS One}
+
+ \begin{itemize}
+ \larger \larger \larger
+ \item Selected 100 drugs from German undergrad curriculum in pharmacology
+ \item Extracted information from two standard textbooks
+ \item ``Accuracy of drug information in [German] Wikipedia was 99.7\%±0.2\% when compared to the textbook data.'' Similar results for English Wikipedia.
+ \end{itemize}
+
+\end{frame}
+
+
+\begin{frame}
+
+\frametitle{Quality of drug articles: PLoS One}
+
+ \begin{itemize}
+ \larger \larger \larger
+ \item Completeness (as compared to the textbooks):
+ \begin{itemize} \larger \larger
+ \item 83.8\% (of 224 statements) for German WP
+ \item 87.2\% for English WP
+ \end{itemize}
+ \item Completeness of contraindications information was 100\% in the English WP sample.
+ \item English WP cited academic publications more often than German WP.
+ \item Quality ``significantly improved'' in drug articles previously
+ assessed in a 2010 study.
+ \end{itemize}
+
+ \note{Tilman
+
+ The majority of the missing information (62.5\%) on German WP
+ was judged non-relevant for undergrad students.
+
+ The result on completeness of contraindications information is
+ somewhat in contrast with the NEJM study. Then again, the
+ textbooks were probably not perfectly up-to-date either.}
+\end{frame}
+
+
+
+\begin{frame}
+ \centertext{6em}{Automation in Wikipedia}
+
+ \note{Tilman
+
+ We are starting to see more practical applications of AI methods to editing.
+
+ Bots have been writing Wikipedia articles since 2002, when
+ User:Rambot covered US municipalities using US census data.
+
+ Picked these two related papers for their somewhat unusual approach.}
+\end{frame}
+
+
+\begin{frame}
+ \frametitle{Automation in Wikipedia}
+
+ \larger \larger
+ Banerjee et al., \e{Playscript Classification and Automatic Wikipedia
+ Play Articles Generation}.
+ 2014 22nd International Conference on Pattern Recognition (ICPR).
+ pp. 3630–3635.
+ \href{http://dx.doi.org/10.1109/ICPR.2014.624}
+ {DOI:10.1109/ICPR.2014.624}
+ \href{http://www.cse.unt.edu/~ccaragea/papers/icpr14.pdf}{Author's copy}
+
+\end{frame}
+
+
+\begin{frame}
+
+\frametitle{Automation in Wikipedia: Bot-written theatre play articles}
+
+ \begin{itemize}
+ \larger \larger \larger
+ \item Bot searches for playscripts and related documents on the web
+ \bigskip
+ \item Extracts key information from them, e.g.
+ \begin{itemize} \larger
+ \item The play's main characters
+ \item Relevant sentences from online synopses of the play
+ \item Mentions in Google Books and Google News (as evidence that
+ the play satisfies Wikipedia's notability criteria)
+
+ \end{itemize}
+
+ \item Applies heuristics to exclude non-encyclopedic sentences, e.g.
+ first-person statements
+
+ \end{itemize}
+
+ \note{Tilman
+
+ NB: Most article creation bots work from well-defined databases
+ (e.g. species, census data, geographical databases).
+
+ This bot finds article topics and online references itself,
+ using an elaborate classifier algorithm to distinguish scripts
+ from non-scripts.}
+\end{frame}