--- /dev/null
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
+
+<article id="paper-11194">
+ <articleinfo>
+ <title>To Fork or Not To Fork</title>
+ <subtitle>Lessons From Ubuntu and Debian</subtitle>
+ <author>
+ <firstname>Benjamin</firstname>
+ <othername>Mako</othername>
+ <surname>Hill</surname>
+ <affiliation>
+ <orgname>Canonical Limited</orgname>
+ </affiliation>
+ <affiliation>
+ <orgname>The Debian GNU/Linux Project</orgname>
+ </affiliation>
+ <affiliation>
+ <orgname>Software in the Public Interest, Inc.</orgname>
+ </affiliation>
+
+ <authorblurb>
+ <para>Benjamin Mako Hill is an intellectual property
+ researcher and activist and a professional Free/Open Source
+ Software (FOSS) advocate, developer, and consultant. He is
+ an active participant in the Debian Project in both technical
+ and non-technical roles and a founder of Debian-Nonprofit
+ and other Free Software projects. He is the author of the
+ Free Software Project Management HOWTO and many published
+ works on Free and Open Source Software. He currently is
+ working full time for Canonical Ltd. on Ubuntu, a new
+ Debian-based distribution.</para>
+ </authorblurb>
+
+ </author>
+
+ <copyright>
+ <year>2005</year>
+ <holder>Benjamin Mako Hill</holder>
+ </copyright>
+ </articleinfo>
+
+ <abstract>
+ <para>This is where my abstract should go.</para>
+ </abstract>
+
+ <section>
+ <title>Introduction</title>
+
+ <para>The explosive growth of free and open source software over
+ the last decade has been mirrored by an equally explosive growth
+ in the ambitiousness of free software projects in choosing and
+ tackling problems. The free software movement approaches these
+ large problems with more code and with more expansive
+ communities than was even thinkable a decade ago. Examples of
+ these massive projects include desktop environments, like
+ GNOME and KDE, and distributions like Debian.</para>
+
+ <para>These projects are leveraging the work of thousands of
+ programmers — both volunteer and paid — and are producing
+ millions of lines of code. Their software is being used by
+ millions of users with a diverse set of needs. This paper
+ focuses on two major effects of this situation:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>The communities that free software projects — and in
+ particular large projects — serve are increasingly diverse.
+ It is becoming increasingly difficult for a single large
+ project to release any single product that can cater to all
+ of its potential users.</para>
+ </listitem>
+ <listitem>
+ <para>It's becoming increasingly difficult to reproduce these
+ large projects. While reproducing an entire project is
+ impossible for small groups of hackers, it is often not
+ substantially easier for small groups to even track and
+ maintain a fork of a large project over time.</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>Taken together, these facts imply an increasingly realized
+ free software community in which programmers frequently derive
+ but where traditional forking is often untenable. "Forks," as
+ they are traditionally defined, will be improved upon.
+ Communities around large free software projects will be smarter
+ about the process of derivation than they have been in the
+ past.</para>
+
+ <para>We are already seeing this with GNU/Linux distributions. New
+ distributions are rarely built from scratch today. Instead, they
+ are adapted from and built on top of the work of existing projects.
+ As projects and userbases grow, these derived distributions are
+ increasingly common. Most of what I describe in this essay are
+ tools and experiences of derived distributions.</para>
+
+ <para>Software makers must pursue the idea of an
+ <emphasis>ecosystem</emphasis> of free software projects and
+ products that have forked but that maintain a close relationship
+ as they develop in parallel and symbiotically. To do this,
+ developers should:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>Break down the process of derivation into a set of
+ different types of customization and derivation and
+ prioritize methods of derivation.</para>
+ </listitem>
+ <listitem>
+ <para>Create and foster social solutions to the social aspects
+ of the derivation problem.</para>
+ </listitem>
+ <listitem>
+ <para>Build and use new tools specifically designed to
+ coordinate development of software in the context of an
+ ecosystem of projects.</para>
+ </listitem>
+ <listitem>
+ <para>Distribute and utilize distributed version control tools
+ with an emphasis on maintaining differences over
+ time.</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>This paper is an early analysis of this set of problems. As
+ such, it is highly focused on the experience of the Ubuntu
+ project and its existence as a derived Debian distribution. It
+ also pulls from my experience with Debian-NP and the Custom
+ Debian Distribution (CDD) community. Since I am an active member of
+ both the Ubuntu and Debian-NP projects, these are areas that I
+ can discuss with some degree of knowledge and experience.</para>
+ </section>
+
+ <section>
+ <title>"Fork" Is A Four Letter Word</title>
+
+ <para>The act of taking the code for a free software project and
+ bifurcating it to create a new project is called "forking."
+ There have been a number of famous forks in free software
+ history. One of the most famous was the schism that led to the
+ parallel development of two versions of the Emacs text editor:
+ GNU Emacs and XEmacs. This schism persists to this day.</para>
+
+ <para>Some forks, like Emacs and XEmacs, are permanent. Others are
+ relatively short-lived. An example of this is the GCC project
+ which saw two forks — EGCS and PGCC — that both eventually
+ merged back into GCC. Forking can happen for any number of
+ reasons. Often developers on a project develop political or
+ personal differences that keep them from continuing to work
+ together. In some cases, maintainers become unresponsive and
+ other developers on the project fork the project to keep the
+ project alive in some form.</para>
+
+ <para>Ultimately though, most forks occur because people do not
+ agree on the features, the mechanisms, or the technology at the
+ core of a project. People have different goals, different
+ problems, and want different tools. Often, these goals, problems
+ and tools are similar up until a certain point before the need
+ to part ways becomes essential.</para>
+
+ <para>A fork occurs on the level of code but a fork is not merely
+ (or even primarily) technical. Many projects create
+ "branches." Branches are alternative versions of a piece of
+ software used to experiment with intrusive or unstable features
+ and bug fixes. Forks are distinguished from branches both in
+ that they are often more significant departures from a technical
+ perspective (i.e., more lines of code have been changed and/or
+ the changes are more invasive or represent a more fundamental
+ rethinking of the problem) and in that they are bifurcations
+ defined in social terms. Branches involve a
+ <emphasis>single</emphasis> developer or community of developers
+ — even if it does boil down to distinct subgroups within a
+ community — whereas forks are separate projects.</para>
+
+ <para>Forking has historically been viewed as a bad thing in free
+ software communities: forks are seen to stem from people's
+ inability to work together and to result in duplication of
+ work. When I published the first version of the <ulink
+ url="http://mako.cc/projects/howto/">Free Software Project
+ Management HOWTO</ulink> more than four years ago, I included
+ a small subsection on forking which described forking to
+ prospective free software project leaders with this text:</para>
+
+ <blockquote>
+ <para>The short version of the fork section is, don't do them.
+ Forks force developers to choose one project to work with,
+ cause nasty political divisions, and redundancy of
+ work.</para>
+ </blockquote>
+
+ <para>In the <emphasis>best</emphasis> situations, a fork means
+ that two groups of people need to go on developing features and
+ doing work they would ordinarily do <emphasis>in addition
+ to</emphasis> tracking the forked project and having to
+ hand-select and apply features and fixes to their own code-base.
+ This level of monitoring and constant comparison can be
+ extremely difficult and time-consuming. The situation is not
+ helped substantially by traditional source control tools like
+ diff, patch, CVS and Subversion which are not optimized for this
+ task. The worse (and much more common) situation occurs when two
+ groups go about their work ignorant or partially ignorant of the
+ work done on the other side of the fork. Important features and
+ fixes are implemented twice — differently and
+ incompatibly.</para>
+
+ <para>The most substantial bright side to these drawbacks is that
+ the problems associated with forking are so severe and notorious
+ that, in most cases, the threat of a fork is enough to force
+ maintainers to work out solutions that keep the fork from
+ happening in the first place.</para>
+
+ <para>Before moving on, it is worth pointing out that fork is
+ something of a contested term. Because definitions of forks
+ involve, to one degree or another, statements about the
+ political, organizational, and technical distinctions between
+ projects, bifurcations that many people call branches or
+ parallel trees are described by others as forks. Recently,
+ fueled by the advent of distributed version control systems, the
+ definition of what is and is not a fork has become
+ increasingly unclear. In part due to the same systems, the
+ benefits and drawbacks of what is increasingly problematically
+ called forking are equally debatable.</para>
+
+ </section>
+
+ <section>
+ <title>Case Study</title>
+
+ <para>In my introduction, I described how the growing scope of
+ free software projects and the rapidly increasing size and
+ diversity of projects' user communities are driving the need
+ for a new type of derivation that avoids, as best as possible, the
+ drawbacks of forking. Nowhere is this more evident than in the
+ largest projects with the broadest scope: a small group of
+ projects that includes operating system distributions.</para>
+
+
+ <section>
+ <title>The Debian Project</title>
+
+ <para>The Debian project is the largest free software distribution
+ in terms of both code and volunteers. It is also,
+ arguably, the largest free software project in terms of the
+ number of volunteers. Debian includes more than 15,000
+ packages and the work of well over 1,000 official volunteers
+ and many more contributors without official membership status.
+ Projects without Debian's massive volunteer base cannot
+ replicate what Debian has accomplished; they can rarely hope
+ to even maintain what Debian currently has separately.</para>
+
+ <para>At the time that this paper was written, Distrowatch listed
+ 129 distributions based on Debian<footnote>
+ <para>Information is listed on the distrowatch homepage
+ here: <ulink
+ url="http://distrowatch.com/dwres.php?resource=independence">http://distrowatch.com/dwres.php?resource=independence</ulink></para>
+
+ </footnote> — most of them currently active to one degree or
+ another. Each distribution represents at least one person —
+ and in most cases a community of people — who disagreed with
+ Debian's vision or direction strongly enough to want to create
+ a new distribution <emphasis>and</emphasis> who had the
+ technical capacity to follow through with this goal. Despite
+ Debian's long-standing slogan — "the universal operating
+ system" — the fact that the Debian project has become the
+ fastest growing operating system while spawning so many
+ derivatives is testament to the fact that, as far as software
+ is concerned, one size does <emphasis>not</emphasis> fit
+ all.<footnote>
+ <para>Netcraft posts yearly updates on the speed at which
+ Linux distributions are growing. The one in question can
+ be found at <ulink
+ url="http://news.netcraft.com/archives/2004/01/28/debian_fastest_growing_linux_distribution.html">http://news.netcraft.com/archives/2004/01/28/debian_fastest_growing_linux_distribution.html</ulink></para>
+ </footnote>
+ </para>
+
+
+ <para>Organizationally, Debian derivers are located both inside
+ and outside of the Debian project. A group of derivers working
+ within the Debian project has labeled themselves "Custom
+ Debian Distributions" and has created nearly a dozen projects
+ customizing and deriving from Debian for specific groups of
+ users including non-profit organizations, the medical
+ community, lawyers, children and many others.<footnote>
+ <para>I have spearheaded and built a derivation of Debian
+ called Debian-Nonprofit (Debian-NP) geared for non-profit
+ organizations working within the Debian project.</para>
+ </footnote> These projects build on the core distribution and
+ the canonical archive from <emphasis>within</emphasis> the
+ organizational and political limits of the Debian project and
+ constantly seek to minimize the delta by focusing on less
+ invasive changes and by advancing creative ways of building
+ the <emphasis>ability</emphasis> to make changes in the core
+ Debian code base through established and policy compliant
+ procedures.</para>
+
+<!-- http://linktocddinformation -->
+
+ <para>A second group of Debian customizers includes those
+ working outside of the Debian project organizationally.
+ Notable among this list are (in alphabetical order) Knoppix,
+ Libranet, Linspire (formerly Lindows), Progeny, MEPIS, Ubuntu,
+ Userlinux, and Xandros. With its strong technological base,
+ excellent package management, wide selection of packages to
+ choose from, and strong commitment to software freedom which
+ ensures derivability, Debian provides an ideal point from
+ which to create a GNU/Linux distribution.</para>
+
+ </section>
+
+
+ <section>
+ <title>Ubuntu</title>
+
+ <para>The Ubuntu project was started by Mark Shuttleworth in
+ April 2004 and the first version was executed almost entirely
+ by a small group of Debian developers employed by Shuttleworth's
+ company Canonical Limited.<footnote>
+ <para>Information on Ubuntu can be found on the <ulink
+ url="http://www.ubuntu.com">Ubuntu homepage</ulink>.
+ Information on Canonical Limited can be found at <ulink
+ url="http://www.canonical.com">Canonical's
+ homepage</ulink>.</para>
+ </footnote> It was released to the world in the fall of 2004.
+ The second version was released six months later in April
+ 2005. The goals of Ubuntu are to provide a distribution based
+ on a subset of Debian with:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>Regular and predictable releases — every six months
+ with support for eighteen months.</para>
+ </listitem>
+ <listitem>
+ <para>An emphasis on free software that will maintain the
+ derivability of the distribution.</para>
+ </listitem>
+ <listitem>
+ <para>An emphasis on usability and a consistent desktop
+ vision. As an example, this has translated into fewer
+ questions in the installer and a default selection and
+ configuration of packages that is usable for most desktop
+ users "out of the box."</para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>The Ubuntu project provides an interesting example of a
+ project that aims to derive from Debian to an extensive
+ degree. Ubuntu had made code-level changes to nearly 1300 packages
+ in Debian at the time that this paper was written and the
+ pace of change will only accelerate with time; the total
+ number of changes and the total size of the delta will
+ grow.<footnote>
+ <para>Scott James Remnant maintains a list of these patches
+ online here: <ulink
+ url="http://people.ubuntu.com/~scott/patches/">http://people.ubuntu.com/~scott/patches/</ulink></para>
+ </footnote> The changes that Ubuntu makes are primarily of the
+ most intrusive kind — changes to the code itself.</para>
+
+ <para>That said, the Ubuntu project is explicit about the fact
+ that it could not exist without the work done by the Debian
+ project before Ubuntu was created.<footnote>
+ <para>You can see that explicit statement on Ubuntu's
+ website here: <ulink
+ url="http://www.ubuntulinux.org/ubuntu/relationship/">http://www.ubuntulinux.org/ubuntu/relationship/</ulink></para>
+ </footnote> More importantly, Ubuntu explains that it cannot
+ continue to provide the complete set of packages that its
+ users depend on without the ongoing work by the Debian
+ project. Even though Ubuntu has made changes to nearly
+ 1300 packages, this is less than ten percent of the total
+ packages shipped in Ubuntu and pulled from Debian.</para>
+
+ <para>Scott James Remnant, a prominent Debian developer and a
+ hacker on Ubuntu who works for Canonical Ltd., described the
+ situation this way on his web log to introduce the Ubuntu
+ development methodology in the week after the first public
+ announcement of Canonical and Ubuntu:<footnote>
+ <para>The entire post can be read here: <ulink
+ url="http://www.netsplit.com/blog/work/canonical/ubuntu_and_debian.html">http://www.netsplit.com/blog/work/canonical/ubuntu_and_debian.html</ulink></para>
+ </footnote>
+ </para>
+
+ <blockquote>
+
+ <para>I don't think Ubuntu is a "fork" of Debian, at least not
+ in the traditional sense. A fork suggests that at some
+ point we go our separate way from Debian and then
+ occasionally merge in changes as we carry on down our own
+ path.</para>
+
+ <para>Our model is quite different; every six months we take a
+ snapshot of Debian's unstable distribution, apply any
+ outstanding patches from our last release to it and spend a
+ couple of months testing and bug-fixing it.</para>
+
+
+ <para>
+ <inlinemediaobject>
+ <imageobject>
+ <imagedata fileref="picture-11194-01.png" format="PNG"/>
+ </imageobject>
+ </inlinemediaobject>
+ </para>
+
+ <para>One thing that should be obvious from this is our job is
+ a lot easier if Debian take all of our changes, the model
+ actually encourages us to give back to Debian.</para>
+
+ <para>That's why from the very first day we started fixing
+ bugs we began sending <ulink
+ url="http://www.no-name-yet.com/patches/">the
+ patches</ulink> back to Debian through the BTS. Not only
+ will it make our job so much easier when we come to freeze
+ for "hoary", our next release, but it's exactly what every
+ derivative should do in the first place.</para>
+
+ </blockquote>
+
+ <para>There is some debate on the degree to which Ubuntu
+ developers have succeeded in accomplishing the goals laid out
+ by Remnant. Ubuntu has filed hundreds of patches in the bug
+ tracking system although it has often run into problems in
+ deciding <emphasis>what</emphasis> constitutes something that
+ should be fed back to Debian. Many changes are simply not
+ relevant to upstream Debian developers. For example, they may
+ include changes to a package in response to another change
+ made in another package in Ubuntu that will not or has not
+ been taken by Debian.</para>
+
+ <para>The Ubuntu project's track record in working
+ constructively with Debian is, at the moment, decidedly mixed.
+ While an increasingly large number of Debian developers are
+ maintaining their packages actively within both projects, many
+ in both Debian and Ubuntu feel that Ubuntu has work left to do
+ in living up to its own goals of a smooth productive
+ relationship with Debian.</para>
+
+ </section>
+
+ <section>
+ <title>Applicability</title>
+
+ <para>Ubuntu and Debian are distributions and — as such —
+ operate on a different scale than the vast majority of free
+ software projects. Using a very simple metric, they include
+ more code and more people. As a result, there are questions as
+ to whether the experiences and lessons learned from these
+ projects are particularly applicable to the experience of
+ smaller free software projects.</para>
+
+ <para>Clearly, because of the difficulties associated with
+ forking massive amounts of code and the problems associated
+ with duplicating the work of large volunteer bases,
+ distributions are forced into finding a way to balance the
+ benefits and drawbacks of forking. However, while the need is
+ stronger and more immediate in larger projects, the benefits
+ of their solutions will often be fully transferable.</para>
+
+ <para>Clearly, modifiability of free software to better fit the
+ needs of its users lies at the heart of the free software
+ movement's success. However, while modification usually comes
+ in the form of collaboration on a single code-base, this is
+ a function of limitations in software development methodologies
+ and tools rather than the best response to the needs or
+ desires of users or developers.</para>
+
+ <para>I believe that the fundamental advantage of free software
+ in the next decade will be in the growing ability of any
+ single free software project to be multiple things to multiple
+ users simultaneously. This will translate into the fact that,
+ in the next ten years, technology and social processes will
+ evolve so that forking is increasingly less of a bad thing.
+ Free software development methodology will become less
+ dependent on a single project and begin to emphasize parallel
+ development within an ecosystem of software development
+ working on related projects. The result is that free software
+ projects will gain a competitive advantage over proprietary
+ software projects through their ability to better serve the
+ increasingly diverse needs of increasingly large and
+ increasingly diverse user-bases. More projects will derive and
+ less redundant code will be written.</para>
+
+ <para>Projects more limited in code and scope may use the tools
+ and methods in different combinations, in different ways, and
+ to different degrees than the examples around distributions
+ introduced here. Different projects with different needs will
+ find that certain solutions work better than others. Because
+ communities of the size of Debian are difficult to fork in a
+ way that is beneficial to any party, it is in these
+ communities that the technology and development methodologies
+ are first emerging. With time, these strategies and tools
+ will find themselves employed productively in a wide variety
+ of projects with a broad spectrum of sizes, needs, scopes and
+ descriptions.</para>
+
+ </section>
+
+ </section>
+
+ <section>
+ <title>Balancing Forking With Collaboration</title>
+
+ <section>
+ <title>Derivation and Problem Analysis</title>
+
+ <para>The easiest step in creating a productive derivative
+ software project is to break down the problem of derivation
+ into a series of different classes of modification. Certain
+ types of modification are more easily done and are
+ intrinsically more maintainable.</para>
+
+ <para>In the context of distributions, the problem of derivation
+ can be broken down into the following types of changes (sorted
+ roughly according to the intrusiveness inherent in solving the
+ problem and the severity of the long-term maintainability
+ problems that they introduce):</para>
+
+ <orderedlist>
+ <listitem>
+ <para>Selection of individual pieces of software;</para>
+ </listitem>
+ <listitem>
+ <para>Changes to the way that packages are installed or run
+ (e.g., in a Live CD type environment or using a different
+ installer);</para>
+ </listitem>
+ <listitem>
+ <para>Configuration of different pieces of software;</para>
+ </listitem>
+ <listitem>
+ <para>Changes made to the actual software package (made on
+ the level of changes to the package's code).</para>
+ </listitem>
+ </orderedlist>
+
+ <para>By breaking down the problem in this way, Debian derivers
+ have been able to approach derivation in ways that focus
+ energy on the less intrusive problems first.</para>
+
+ <para>The first area that Ubuntu focused on was selecting a
+ subset of packages that Ubuntu would support. Ubuntu selected
+ and supports approximately 2,000 packages. These became the
+ <command>main</command> component in Ubuntu. Other packages in
+ Debian were included in a separate section of the Ubuntu
+ archive called <command>universe</command> but were not
+ guaranteed to be supported with bug or security fixes. By
+ focusing on a small subset of packages, the Ubuntu team was
+ able to select a maintainable subsection of the Debian archive
+ that they could maintain over time.</para>
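+
+ <para>As a rough illustration of how these components surface to a
+ user, an APT <filename>sources.list</filename> entry names the
+ components it wants after the release name. The mirror URL and the
+ release name (<literal>hoary</literal>) below are illustrative
+ assumptions rather than a recommended configuration; listing only
+ <literal>main</literal> would restrict a system to the supported
+ subset:</para>
+
+ <programlisting># Illustrative sketch: the mirror URL and release name are assumptions.
+# Supported packages only:
+deb http://archive.ubuntu.com/ubuntu hoary main
+# Alternatively, also enable the unsupported packages drawn from Debian:
+# deb http://archive.ubuntu.com/ubuntu hoary main universe
+</programlisting>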
+
+ <para>The simplest derived distributions (often working
+ within the Debian project as CDDs but also including projects
+ like Userlinux) are merely lists of packages and do nothing
+ outside of package selection. The installation of lists of
+ packages and the maintenance of those lists over time can be
+ aided through the creation of what are called "metapackages":
+ empty packages that ship no files of their own but depend on a
+ list of other packages, and that are maintained over time.</para>
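+
+ <para>A minimal sketch of what such a metapackage might look like
+ is the abridged binary stanza of a
+ <filename>debian/control</filename> file below. The package name and
+ the dependency list are hypothetical and chosen only for
+ illustration; the point is that the package carries dependencies
+ rather than files, so the selection lives in a single list:</para>
+
+ <programlisting>Package: example-nonprofit-desktop
+Architecture: all
+Depends: gnucash, evolution, gimp
+Description: hypothetical metapackage for an example derived distribution
+ This package ships no files of its own. Installing it pulls in the
+ packages listed in Depends, and a later version with a longer Depends
+ list pulls the additions in on upgrade.
+</programlisting>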
+
+ <para>Configuration changes are also
+ relatively low-impact. Focusing on moving as many changes as
+ possible into the realm of configuration changes is a
+ relatively low-impact strategy that derivers working within
+ the Debian project intent on a single code-base have pursued
+ actively. Their idea is that rather than forking a piece of
+ code due to disagreement in how the program should work, they
+ can leave the code intact but add the
+ <emphasis>ability</emphasis> to work in a different way. This
+ alternate functionality is made toggleable through a
+ configuration change of the distribution in much the same way that
+ applications can be configured differently or shipped with
+ different configuration files. Since the Debian project has a
+ unified package configuration framework called Debconf,
+ derivers are able to configure an entire system in a highly
+ centralized manner.<footnote>
+ <para>More information on Debconf can be found online at:
+ <ulink
+ url="http://www.kitenet.net/programs/debconf/">http://www.kitenet.net/programs/debconf/</ulink></para>
+ </footnote> This is not unlike Red Hat's Kickstart although the
+ emphasis is on maintenance of those configuration changes over
+ the life and evolution of the package; Kickstart is focused
+ merely on installation of the package.</para>
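+
+ <para>As a hedged sketch of what centralized configuration through
+ Debconf can look like, answers to package questions can be loaded
+ into the Debconf database ahead of time with
+ <command>debconf-set-selections</command>. The package and question
+ names below are hypothetical placeholders, not real templates; each
+ line gives the owning package, the question name, the question type,
+ and the value:</para>
+
+ <programlisting># Hypothetical package and question names, for illustration only.
+# Format: owner question-name type value
+example-server example-server/listen-port string 8080
+example-server example-server/enable-remote-access boolean false
+</programlisting>
+
+ <para>Assuming these lines are saved in a file, they would be loaded
+ by running <command>debconf-set-selections</command> with the file
+ name as its argument. A deriver can keep such a file under version
+ control and adjust it as the packages it configures evolve.</para>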
+
+ <para>A third type of change is limited to changes in the
+ environment through which a system is run or installed. One
+ example is Progeny's Anaconda-based Debian installer, which provides
+ an alternate installer but installs an identical system.
+ Another example is the Knoppix project, which is famous for its Live
+ CD environments.<footnote>
+ <para>In reality, Knoppix makes a wide range of changes to a
+ default Debian installation that span all items in my list
+ above.</para>
+ </footnote> Other Live CD projects, including Ubuntu's
+ Casper project, are purely a different
+ way of running the exact same code.</para>
+
+ <para>Because these three methods are relatively non-invasive,
+ they are reasonable strategies for small teams and individuals
+ working on creating a derived distribution. However, many
+ desirable changes — and in the case of some derived
+ distributions, most desirable changes — require more
+ invasive changes. The final and most invasive type of change
+ — changes to code — is the most difficult but also the most
+ promising and powerful if solved. Changes of this type involve
+ bifurcations of the code-base and will be the topic of the
+ remainder of this paper.</para>
+
+ </section>
+
+ <section>
+ <title>Distributed Source Control</title>
+
+ <para>One promising method of maintaining changes in forked or
+ branched projects lies in distributed version control systems
+ (VCSs). Traditional VCSs work in a highly centralized
+ fashion. CVS, the archetypal free software VCS and the basis
+ for many others, is based around the model of a single
+ centralized server. Anyone who wishes to commit to a project
+ must commit to the centralized repository. While CVS allows
+ users to create branches, anyone with commit rights has access
+ to the entire repository. The tools for branching and merging
+ over time are not particularly good.</para>
+
+ <para>The branching model is primarily geared toward a system
+ where development is bifurcated and then the branch is merged
+ completely back into the main tree. Normal use of a branch
+ might include creating a development branch, making a series
+ of development releases while maintaining and fixing important
+ bugs in the stable primary branch, and then ultimately
+ replacing the stable release with the development release. The
+ CVS model is <emphasis>not</emphasis> geared toward a system
+ where an arbitrary delta, or sets of deltas, is maintained
+ over time.</para>
+
+ <para>Distributed version control aims to solve a number of
+ problems introduced by CVS and alluded to above by:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>Allowing people to work disconnected from each other
+ and to sync with each other, in whole or in part, in an
+ arbitrary and ad-hoc fashion.</para>
+ </listitem>
+ <listitem>
+ <para>Allowing deltas to be maintained over time.</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>Ultimately, this requires tools that are better at merging
+ changes and in <emphasis>not</emphasis> merging certain
+ changes when that is desirable. It also leads to tools capable
+ of history-sensitive merging.</para>
+
+ <para>The most famous switch to a distributed VCS model from a
+ centralized VCS model was the move by the Linux kernel
+ development community to the proprietary distributed version
+ control system BitKeeper. In his recent announcement of the
+ decision to part ways with BitKeeper, Linus Torvalds
+ said:</para>
+
+ <blockquote>
+ <para>In fact, one impact BK has had is to very fundamentally
+ make us (and me in particular) change how we do things. That
+ ranges from the fine-grained changeset tracking to just how
+ I ended up trusting sub-maintainers with much bigger things,
+ and not having to work on a patch-by-patch basis any
+ more.<footnote> <para>The full message can be read online
+ at: <ulink
+ url="http://kerneltrap.org/mailarchive/1/message/48393/thread">http://kerneltrap.org/mailarchive/1/message/48393/thread</ulink></para>
+ </footnote>
+ </para>
+ </blockquote>
+
+ <para>At the time of the switch, free distributed version
+ control tools were less advanced than they are today. At the
+ moment, an incomplete list of free software VCS tools includes
+ GNU Arch, Bazaar, Bazaar-NG, Darcs, Monotone, SVK (based on
+ Subversion), GIT (a system developed by Linus Torvalds as a
+ temporary replacement for BitKeeper) and others.</para>
+
+ <para>Each of these tools, at least after it reaches a certain
+ level of maturity, allows or will allow its users to develop
+ software in a distributed fashion and to, over time, compare
+ their software and pull changes from others significantly more
+ easily than they could otherwise. The idea of parallel
+ development that lies at the heart of the model, the tools for
+ merging and resolving conflicts over time, and the ability to
+ "cherry pick" certain patches or changes from a parallel
+ developer each make this type of development significantly
+ more useful than it has been in the past.</para>
+
+ <para>VCSs work entirely on the level of code. Due to the nature
+ of the types of changes that the Ubuntu project is making to
+ Debian's code, Ubuntu has focused primarily on this model and
+ Canonical currently funds two major distributed version control
+ projects: Bazaar and Bazaar-NG.</para>
+
+ <para>In many ways, employing distributed version control
+ effectively is a much easier problem to solve for small, more
+ traditional, free software development projects than it is for
+ GNU/Linux distributions. Because maintaining
+ parallel development of a single piece of software in a set of
+ related distributed repositories is the primary use case for
+ distributed version control systems, distributed VCS alone can
+ be a technical solution for certain types of parallel
+ development. As the tools and social processes for distributed
+ VCS evolve, they will become increasingly important tools in
+ the way that free software is developed.</para>
+
+ <para>Because the problems of scale associated with building an
+ entire derivative distribution are more complicated than those
+ associated with working with a single project, distributed
+ version control has not yet been widely deployed in the Ubuntu
+ project. Instead, the project is focusing on problem-specific
+ tools built on top of distributed version control.</para>
+
+ </section>
+
+ <section>
+ <title>Problem Specific Tools</title>
+
+ <para>Another technique that Canonical Ltd. is experimenting
+ with is the creation of high-level tools, built on top of
+ distributed version control tools, that are specifically
+ designed for maintaining differences between packages. Because
+ packages are usually distributed as a source file with a
+ collection of one or more patches, it is possible to create a
+ limited, high-level VCS based on this fact.</para>
+
+ <para>In the case of Ubuntu and Debian, the tool creates one
+ branch per patch or feature and uses heuristics to analyze
+ patch files and create these branches intelligently. The
+ package build system section of the total patch can also be
+ kept as a separate branch. Canonical's tool, called the
+ Hypothetical Changeset Tool (HCT) (although no longer
+ hypothetical), is one experimental way of creating a very
+ simple, very streamlined interface for dealing with a
+ particular type of source that is created and distributed in a
+ particular type of way with a particular type of
+ change.</para>
+
+ <para>While HCT promises to be very useful for people making
+ derived distributions based on Debian, its wider application
+ may be limited. That said, this provides an example of the way
+ that problem and context specific tools may play an essential
+ role in the maintenance of derived code more generally.</para>
+
+ </section>
+
+
+ <section>
+ <title>Social Solutions</title>
+
+ <para>It has been said that a common folly among technophiles is
+ the temptation to employ technical solutions to social
+ problems. The problem of deriving software is
+ both a technical <emphasis>and</emphasis> a social problem and
+ adequately addressing the issue will require approaches that
+ take into consideration both types of solution.</para>
+
+ <para>Scott James Remnant describes the relationship between
+ distributions and derived distributions as not unlike the
+ relationship between distributions and upstream
+ maintainers:</para>
+ <blockquote>
+
+ <para>I don't think this is much different from how Debian
+ maintainers interact with their upstreams. As Debian
+ maintainers we take and package upstream software and then
+ act as a gateway for bugs and problems. Quite often we fix
+ bugs ourselves and apply the patch to the package and send
+ it upstream. Sometimes the upstream don't incorporate that
+ patch and we have to make sure we don't accidentally drop it
+ each subsequent release, we much prefer it if they take
+ them, but we don't get angry if they don't.</para>
+
+ <para>This is how I see the relationship between Ubuntu and
+ Debian, we're no more a fork of Debian than a Debian package
+ is a fork of its upstream.</para>
+ </blockquote>
+
+ <para>Scott alludes to the fact that, at least in the world of
+ distributions, parallel development is already one way to view
+ the <emphasis>modus operandi</emphasis> of existing GNU/Linux
+ distributions. The relationship between a deriver and derivee
+ on the distribution level mirrors the relationship between the
+ distribution and the "upstream" authors of the packages that
+ make up the distribution. These relationships are rarely based
+ around technological tools but are entirely in the realm of
+ social solutions.</para>
+
+ <para>Ubuntu has pursued a number of different initiatives along
+ these lines. The first of these has been to regularly file
+ bugs in the Debian bug tracking system when bugs that exist
+ in Debian are fixed in Ubuntu. While this can be
+ partially automated, the choice to automate this is a purely
+ social one.</para>
+
+ <para>Ubuntu is still left with questions about changes
+ that are made to packages that do not necessarily fix bugs or
+ that fix bugs that do not exist in Debian but may in the
+ future. Some Debian developers want to hear about the full
+ extent of changes made to their software in Ubuntu while
+ others do not want to be bothered. Ubuntu should continue to
+ work with Debian to find ways to allow developers to stay in
+ sync.</para>
+
+ <para>There is a recent initiative by some developers in Debian,
+ largely led by myself, to create a stronger relationship
+ between the Debian project and its ecosystem of derivers.
+ While the form that this will ultimately take is unclear,
+ projects existing within an ecosystem should explore the realm
+ of appropriate social relationships that will ensure that they
+ can work together and be informed of each other's work without
+ resorting to "spamming" each other with irrelevant or
+ unnecessary information.</para>
+
+ <para>Another issue that has recently played an important role
+ in the Debian/Ubuntu relationship is the importance of
+ giving adequate credit to the authors or upstream maintainers
+ of software without implying a closer relationship than is the
+ case. Derivers must walk a fine line where they credit others'
+ work on a project without implying that the others work for,
+ support, or are connected to the deriver's project which, for
+ any number of reasons, the original author might not want to
+ be associated with.</para>
+
+ <para>In the case of Debian and Ubuntu, this has resulted in an
+ emphasis on keeping or importing changelog entries when
+ changes are imported and in noting the pedigree of changes
+ more generally. It has recently also been discussed in terms
+ of the "maintainer" field in each package in Ubuntu. Ubuntu
+ wants to avoid making changes to every unmodified source
+ package (and introducing an unnecessary delta) but does not
+ want to give the impression that the maintainer of the package
+ is someone unassociated with Ubuntu. While no solution has
+ been decided at the time of writing, one idea involved marking
+ the maintainer of the package explicitly as a Debian
+ maintainer at the time that the binary packages are built on
+ the Ubuntu build machines.</para>
+
+ <para>The emphasis on social solutions is also essential when
+ using distributed VCS technology. As Linus Torvalds alluded to
+ in the quote above, the importance of technological changes to
+ distributed VCS technology is only felt when people begin to
+ work in a different way, when they begin to employ different
+ social models of developer interaction.</para>
+
+ <para>While Ubuntu's experience can provide a good model for
+ tackling some of these source control issues, it can only
+ serve as a model and not as a fixed answer. Social solutions
+ must be appropriate for a given social relationship. Even in
+ situations where a package is branched because of social
+ incompatibility, a certain level of collaboration on a social
+ level will be essential to the long term viability of the
+ derivation.</para>
+
+ </section>
+
+ </section>
+
+ <section>
+ <title>Conclusions</title>
+
+ <para>As the techniques described in this paper evolve, the role
+ that they play in free software development becomes increasingly
+ prominent and increasingly important. Joining them will be other
+ techniques and models that I have not seen and cannot predict.
+ Because of the size and usefulness of their code and the size of
+ their development communities, large projects like Debian and
+ Ubuntu have been forced into confronting and attempting to
+ mediate the problems inherent in forking and deriving. However,
+ as these problems are negotiated and tools and processes are
+ advanced toward solutions, free software projects of all sizes
+ will be able to offer users exactly what they want with minimal
+ redundancy and little duplication of work. In doing this, free
+ software will harness a power that proprietary models cannot
+ compete with. It will increase its capacity to produce
+ better products and better processes. Ultimately, this will help
+ free software capture more users, bring in more developers, and
+ produce more free software of a higher quality.</para>
+
+ </section>
+
+</article>
+
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:t
+sgml-shorttag:t
+sgml-namecase-general:t
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+sgml-indent-step: 2
+sgml-indent-data: 2
+sgml-set-face: t
+End:
+-->
\ No newline at end of file