+++ /dev/null
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
- "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-
-<article id="paper-11194">
- <articleinfo>
- <title>To Fork or Not To Fork</title>
- <subtitle>Lessons From Ubuntu and Debian</subtitle>
- <author>
- <firstname>Benjamin</firstname>
- <othername>Mako</othername>
- <surname>Hill</surname>
- <affiliation>
- <orgname>Canonical Limited</orgname>
- </affiliation>
- <affiliation>
- <orgname>The Debian GNU/Linux Project</orgname>
- </affiliation>
- <affiliation>
- <orgname>Software in the Public Interest, Inc.</orgname>
- </affiliation>
-
- <authorblurb>
- <para>Benjamin Mako Hill is an intellectual property
- researcher and activist and a professional Free/Open Source
- Software (FOSS) advocate and developer. He is an active
- participant in the Debian Project in both technical and
- non-technical roles. He is the author of the Free Software
- Project Management HOWTO and many published works on Free
- and Open Source Software. He currently is working full time
- for Canonical Ltd. on Ubuntu, a new Debian-based
- distribution.</para>
- </authorblurb>
-
- </author>
-
- <copyright>
- <year>2005</year>
- <holder>Benjamin Mako Hill</holder>
- </copyright>
-
-
- <legalnotice>
- <para>This material is licensed under the <ulink
- url="http://creativecommons.org/licenses/by-sa/2.0/">Creative
- Commons Attribution-Sharealike 2.0 License</ulink>.</para>
-
- <para>The canonical location for the most recent version of this
- document is <ulink url="http://mako.cc/">at the author's
- website</ulink>.</para>
-
- </legalnotice>
-
- <revhistory>
- <revision>
- <revnumber>0.2</revnumber>
- <date>August 7, 2005</date>
- <revremark>Corrections and improvements.</revremark>
- </revision>
- <revision>
- <revnumber>0.1</revnumber>
- <date>May 15, 2005</date>
-
- <revdescription>
- <para>The first version of this paper was written to
- accompany an accepted talk given at LinuxTag 2005 in
- Karlsruhe, Germany.</para>
- </revdescription>
-
- </revision>
- </revhistory>
-
-
- </articleinfo>
-
- <section>
- <title>Introduction</title>
-
- <para>The explosive growth of free and open source software over
- the last decade has been mirrored by an equally explosive growth
- in the ambitiousness of free software projects in choosing and
- tackling problems. The free software movement approaches these
- large problems with more code and with more expansive
- communities than was thinkable a decade ago. Examples of these
- massive projects include desktop environments — like GNOME
- and KDE — and distributions like Debian, RedHat, and
- Gentoo.</para>
-
- <para>These projects are leveraging the work of thousands of
- programmers — both volunteer and paid — and are
- producing millions of lines of code. Their software is being
- used by millions of users with diverse sets of needs. This
- paper focuses on two major effects of this situation:</para>
-
- <itemizedlist>
- <listitem>
-
- <para>The communities that free software projects — and
- in particular large projects — serve are increasingly
- diverse. It is becoming increasingly difficult for a single
- large project to release any single product that can cater
- to all of its potential users.</para>
-
- </listitem>
- <listitem>
-
- <para>It's becoming increasingly difficult to reproduce these
- large projects. While reproducing an entire project is
- impossible for small groups of hackers, it is often not
- possible for small groups to even track and maintain a fork
- of a large project over time.</para>
-
- </listitem>
- </itemizedlist>
-
- <para>Taken together, these facts imply an increasingly realized
- free software community in which programmers frequently derive
- but where traditional forking is often untenable. "Forks," as
- they are traditionally defined, must be improved upon.
- Communities around large free software projects must be smarter
- about the process of derivation than they have been in the
- past.</para>
-
- <para>We are already seeing this with GNU/Linux distributions. New
- distributions are rarely built from scratch today. Instead, they
- are adapted from and built on top of the work of existing projects.
- As projects and user-bases grow, these derived distributions are
- increasingly common. Most of what I describe in this essay are
- tools and experiences of derived distributions.</para>
-
- <para>Software makers must pursue the idea of an
- <emphasis>ecosystem</emphasis> of free software projects and
- products that have forked but that maintain a close relationship
- as they develop in parallel and symbiotically. To do this,
- developers should:</para>
-
- <itemizedlist>
- <listitem>
- <para>Break down the process of derivation into a set of
- different types of customization and derivation and
- prioritize methods of derivation.</para>
- </listitem>
- <listitem>
- <para>Create and foster social solutions to the social aspects
- of the derivation problem.</para>
- </listitem>
- <listitem>
- <para>Build and use new tools specifically designed to
- coordinate development of software in the context of an
- ecosystem of projects.</para>
- </listitem>
- <listitem>
- <para>Distribute and utilize distributed version control tools
- with an emphasis on maintaining differences over
- time.</para>
- </listitem>
- </itemizedlist>
-
- <para>This paper is an early analysis of this set of problems. As
- such, it is highly focused on the experience of the Ubuntu
- project and its existence as a derived Debian distribution. It
- also pulls from my experience with Debian-NP and the Custom
- Debian Distribution (CDD) community. Since I participate in both
- the Ubuntu and CDD projects, these are areas that I can discuss
- with some degree of knowledge and experience.</para>
- </section>
-
- <section>
- <title>"Fork" Is A Four Letter Word</title>
-
- <para>The act of taking the code for a free software project and
- bifurcating it to create a new project is called "forking."
- There have been a number of famous forks in free software
- history. One of the most famous was the schism that led to the
- parallel development of two versions of the Emacs text editor:
- GNU Emacs and XEmacs. This schism persists to this day.</para>
-
- <para>Some forks, like Emacs and XEmacs, are permanent. Others are
- relatively short lived. An example of this is the GCC project
- which saw two forks — EGCS and PGCC — that both
- eventually merged back into GCC. Forking can happen for any
- number of reasons. Often developers on a project develop
- political or personal differences that keep them from continuing
- to work together. In some cases, maintainers become unresponsive
- and other developers fork to keep the software alive.</para>
-
- <para>Ultimately though, most forks occur because people do not
- agree on the features, the mechanisms, or the technology at the
- core of a project. People have different goals, different
- problems, and want different tools. Often, these goals, problems
- and tools are similar up until a certain point before the need
- to part ways becomes essential.</para>
-
- <para>A fork occurs on the level of code but a fork is not merely
- — or even primarily — technical. Many projects create
- "branches." Branches are alternative versions of a piece of
- software used to experiment with intrusive or unstable features
- and fixes. Forks are distinguished from branches both in
- that they are often more significant departures from a technical
- perspective (i.e., more lines of code have been changed and/or
- the changes are more invasive or represent a more fundamental
- rethinking of the problem) and in that they are bifurcations
- defined in social and political terms. Branches involve a
- <emphasis>single</emphasis> developer or community of developers
- — even if it does boil down to distinct subgroups within a
- community — whereas forks are separate projects.</para>
-
- <para>Forking has historically been viewed as a bad thing in free
- software communities: forks are seen to stem from people's
- inability to work together and to end in duplication of
- work. When I published the first version of the <ulink
- url="http://mako.cc/projects/howto/">Free Software Project
- Management HOWTO</ulink> more than four years ago, I included
- a small subsection on forking which described the concept to
- future free software project leaders with this text:</para>
-
- <blockquote>
- <para>The short version of the fork section is, don't do them.
- Forks force developers to choose one project to work with,
- cause nasty political divisions, and redundancy of
- work.</para>
- </blockquote>
-
- <para>In the <emphasis>best</emphasis> situations, a fork means
- that two groups of people need to go on developing features and
- doing work they would ordinarily do <emphasis>in addition
- to</emphasis> tracking the forked project and having to
- hand-select and apply features and fixes to their own code-base.
- This level of monitoring and constant comparison can be
- extremely difficult and time-consuming. The situation is not
- helped substantially by traditional source control tools like
- diff, patch, CVS and Subversion which are not optimized for this
- task. The worse (and much more common) situation occurs when two
- groups go about their work ignorant or partially ignorant of the
- code being cut on the other side of the fork. Important features
- and fixes are implemented twice — differently and
- incompatibly.</para>
-
- <para>The most substantial bright side to these drawbacks is that
- the problems associated with forking are so severe and notorious
- that, in most cases, the threat of a fork is enough to force
- maintainers to work out solutions that keep the fork from
- happening in the first place.</para>
-
- <para>Finally, it is worth pointing out that fork is something of
- a contested term. Because definitions of forks involve, to one
- degree or another, statements about the political, organizational,
- and technical distinctions between projects, bifurcations that
- many people call branches or parallel trees are described by
- others as forks. Recently, fueled by the advent of distributed
- version control systems, the definition of what is and is not a
- fork has become increasingly unclear. In part due to the same
- systems, the benefits and drawbacks of what is increasingly
- problematically called forking are equally debatable.</para>
-
- </section>
-
- <section>
- <title>Case Study</title>
-
- <para>In my introduction, I described how the growing scope of
- free software projects and the rapidly increasing size and
- diversity of user communities are driving the need for a new
- type of derivation that avoids, as best as possible, the
- drawbacks of forking. Nowhere is this more evident than in the
- largest projects with the broadest scope: a small group of
- projects that includes operating system distributions.</para>
-
-
- <section>
- <title>The Debian Project</title>
-
- <para>The Debian project is by many counts the largest free
- software distribution in terms of code. It is also,
- arguably, the largest free software project in terms of the
- number of volunteers. Debian includes more than 15,000
- packages and the work of well over 1,000 official volunteers
- and many more contributors without official membership.
- Projects without Debian's massive volunteer base cannot
- replicate what Debian has accomplished; they can rarely hope
- to even maintain what Debian has produced.</para>
-
- <para>At the time that this paper was written, Distrowatch listed
- 129 distributions based on Debian<footnote>
- <para>Information is listed on the distrowatch homepage
- here: <ulink
- url="http://distrowatch.com/dwres.php?resource=independence">http://distrowatch.com/dwres.php?resource=independence</ulink></para>
-
- </footnote> — most of them
- are currently active to varying degrees. Each distribution
- represents at least one person — and in most cases a
- community of people — who disagreed with Debian's vision
- or direction strongly enough to want to create a new
- distribution <emphasis>and</emphasis> who had the technical
- capacity to follow through with this goal. Despite Debian's
- long-standing slogan — "the universal operating system"
- — the fact
- that the Debian project has become the fastest growing
- operating system while spawning so many derivatives is
- testament to the fact that, as far as software is concerned,
- one size <emphasis>can not</emphasis> fit all.<footnote>
- <para>Netcraft posts yearly updates on the speed at which
- Linux distributions are growing. The one in question can be
- found at: <ulink
- url="http://news.netcraft.com/archives/2004/01/28/debian_fastest_growing_linux_distribution.html">http://news.netcraft.com/archives/2004/01/28/debian_fastest_growing_linux_distribution.html</ulink></para>
- </footnote>
- </para>
-
-
- <para>Organizationally, Debian derivers are located both inside
- and outside of the Debian project. A group of derivers working
- within the Debian project has labeled themselves "Custom
- Debian Distributions" and has created nearly a dozen projects
- customizing and deriving from Debian for specific groups of
- users including non-profit organizations, the medical
- community, lawyers, children and many others.<footnote>
- <para>I spearheaded and helped build a now mostly defunct
- derivation of Debian called Debian-Nonprofit (Debian-NP)
- geared for non-profit organizations by working within the
- Debian project.</para>
- </footnote> These projects build on the core Debian distribution and
- the canonical archive from <emphasis>within</emphasis> the
- organizational and political limits of the Debian project and
- constantly seek to minimize the delta by focusing on less
- invasive changes and by advancing creative ways of building
- the <emphasis>ability</emphasis> to alter the core
- Debian code base through established and policy compliant
- procedures.</para>
-
-<!-- http://linktocddinformation -->
-
- <para>A second group of Debian customizers includes those
- working outside of the Debian project organizationally.
- Notable among this list are (in alphabetical order) Knoppix,
- Libranet, Linspire (formerly Lindows), Progeny, MEPIS, Ubuntu,
- Userlinux, and Xandros. With its strong technological base,
- excellent package management, wide selection of packages to
- choose from, and strong commitment to software freedom which
- ensures derivability, Debian provides an ideal point from
- which to create a GNU/Linux distribution.</para>
-
- </section>
-
-
- <section>
- <title>Ubuntu</title>
-
- <para>The Ubuntu project was started by Mark Shuttleworth in
- April 2004 and the first version was built almost entirely
- by a small group of Debian developers employed by Shuttleworth's
- company Canonical Limited.<footnote>
- <para>Information on Ubuntu can be found on the <ulink
- url="http://www.ubuntu.com">Ubuntu homepage</ulink>.
- Information on Canonical Limited can be found at <ulink
- url="http://www.canonical.com">Canonical's
- homepage</ulink>.</para>
- </footnote> It was released to the world in late 2004.
- The second version was released six months later in April
- 2005. The goals of Ubuntu are to provide a distribution based
- on a subset of Debian with:</para>
-
- <itemizedlist>
- <listitem>
- <para>Regular and predictable releases — every six months
- with support for eighteen months.</para>
- </listitem>
- <listitem>
- <para>An emphasis on free software that will maintain the
- derivability of the distribution.</para>
- </listitem>
- <listitem>
- <para>An emphasis on usability and a consistent desktop
- vision. As an example, this has translated into fewer
- questions in the installer and a default selection and
- configuration of packages that is usable for most desktop
- users "out of the box."</para>
- </listitem>
-
- </itemizedlist>
-
- <para>The Ubuntu project provides an interesting example of a
- project that aims to derive from Debian to an extensive
- degree. Ubuntu made code-level changes to nearly 1300 packages
- in Debian at the time that this paper was written and the
- speed of changes will not decelerate with time; the total
- number of changes and the total size of the delta will
- grow.<footnote>
- <para>Scott James Remnant maintains a list of these patches
- online here: <ulink
- url="http://people.ubuntu.com/~scott/patches/">http://people.ubuntu.com/~scott/patches/</ulink></para>
- </footnote> The changes that Ubuntu makes are primarily of the
- most intrusive kind — changes to the code itself.</para>
-
- <para>That said, the Ubuntu project is explicit about the fact
- that it could not exist without the work done by the Debian
- project.<footnote>
- <para>You can see that explicit statement on Ubuntu's
- website here: <ulink
- url="http://www.ubuntulinux.org/ubuntu/relationship/">http://www.ubuntulinux.org/ubuntu/relationship/</ulink></para>
- </footnote> More importantly, Ubuntu explains that it cannot
- continue to provide the complete set of packages that its
- users depend on without the ongoing work by the Debian
- project. Even though Ubuntu has made changes to nearly
- 1300 packages, this is less than ten percent of the total
- packages shipped in Ubuntu and pulled from Debian.</para>
-
- <para>Scott James Remnant, a prominent Debian developer and a
- hacker on Ubuntu who works for Canonical Ltd., described the
- situation this way on his web log to introduce the Ubuntu
- development methodology in the week after the first public
- announcement of Canonical and Ubuntu:<footnote> <para>The
- entire post can be read here: <ulink
- url="http://www.netsplit.com/blog/work/canonical/ubuntu_and_debian.html">http://www.netsplit.com/blog/work/canonical/ubuntu_and_debian.html</ulink></para>
- </footnote>
- </para>
-
- <blockquote>
-
- <para>I don't think Ubuntu is a "fork" of Debian, at least not
- in the traditional sense. A fork suggests that at some
- point we go our separate way from Debian and then
- occasionally merge in changes as we carry on down our own
- path.</para>
-
- <para>Our model is quite different; every six months we take a
- snapshot of Debian's unstable distribution, apply any
- outstanding patches from our last release to it and spend a
- couple of months testing and bug-fixing it.</para>
-
-
- <para>
- <inlinemediaobject>
- <imageobject>
- <imagedata fileref="tfontf-picture-01.png" format="PNG"/>
- </imageobject>
- </inlinemediaobject>
- </para>
-
- <para>One thing that should be obvious from this is that our
- job is a lot easier if Debian takes all of our changes. The
- model actually encourages us to give back to
- Debian.</para>
-
- <para>That's why from the very first day we started fixing
- bugs we began sending <ulink
- url="http://www.no-name-yet.com/patches/">the
- patches</ulink> back to Debian through the BTS. Not only
- will it make our job so much easier when we come to freeze
- for "hoary", our next release, but it's exactly what every
- derivative should do in the first place.</para>
-
- </blockquote>
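-
- <para>The cycle Remnant describes can be sketched in a few lines of
- Python. This is a toy model under stated assumptions, not Ubuntu's
- actual tooling: the function names and patch identifiers are
- hypothetical, and a patch is reduced to a label.</para>
-
```python
def outstanding_patches(carried, accepted_upstream):
    """Patches the derivative must re-apply on top of the next snapshot."""
    return sorted(set(carried) - set(accepted_upstream))


def build_release(snapshot, carried, accepted_upstream):
    """Model one cycle: take a snapshot, then re-apply what upstream has not taken.

    `snapshot` lists the changes already present upstream. The result is the
    full list of changes in the derived release, plus the delta carried on top.
    """
    delta = outstanding_patches(carried, accepted_upstream)
    return list(snapshot) + delta, delta


# Hypothetical patch identifiers, for illustration only.
carried = ["fix-installer", "new-artwork", "fix-locale-bug"]
accepted = ["fix-locale-bug"]
snapshot = ["upstream-1", "upstream-2", "fix-locale-bug"]

release, delta = build_release(snapshot, carried, accepted)
print("carried on top of the snapshot:", delta)
```
-
- <para>In this model the outstanding delta shrinks every time Debian
- accepts a patch, which is the incentive to give back that Remnant
- points to.</para>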
-
- <para>There is some debate on the degree to which Ubuntu
- developers have succeeded in accomplishing the goals laid out
- by Remnant. Ubuntu has filed hundreds of patches in the bug
- tracking system but it has also run into problems in deciding
- <emphasis>what</emphasis> constitutes something that should be
- fed back to Debian. Many changes are simply not relevant to
- Debian developers. For example, they may include changes to a
- package in response to another change made in another package
- in Ubuntu that will not or has not been taken by Debian. In
- many other cases, the best action with regard to a particular
- change, a particular package, and a particular upstream Debian
- developer is simply unclear.</para>
-
- <para>The Ubuntu project's track record in working
- constructively with Debian is, at the moment, a mixed one.
- While an increasingly large number of Debian developers are
- maintaining their packages actively within both projects, many
- in both Debian and Ubuntu feel that Ubuntu has work left to do
- in living up to its own goal of a completely smooth productive
- relationship with Debian.</para>
-
- <para>That said, the importance of the goals described by
- Remnant in the context of the Ubuntu development model
- cannot be overstated. Every line of delta between Debian and
- Ubuntu has a cost for Ubuntu developers. Technology, social
- practices, and wise choices may reduce that cost but they cannot
- eliminate it. The resources that Ubuntu can bring to bear upon
- the problem of building a distribution are limited — far
- more limited than Debian's. As a result, there is a limit to
- how far Ubuntu can diverge; it is always to Ubuntu's advantage
- to minimize the delta where possible.</para>
-
- </section>
-
- <section>
- <title>Applicability</title>
-
- <para>Ubuntu and Debian are distributions and — as such
- — operate on a different scale than the vast majority of
- free software projects. They include more code and more
- people. As a result, there are questions as to whether the
- experiences and lessons learned from these projects are
- particularly applicable to the experience of smaller free
- software projects.</para>
-
- <para>Clearly, because of the difficulties associated with
- forking massive amounts of code and the problems associated
- with duplicating the work of large volunteer bases,
- distributions are forced into finding a way to balance the
- benefits and drawbacks of forking. However, while the need is
- stronger and more immediate in larger projects, the benefits
- of their solutions will often be fully transferable.</para>
-
- <para>Clearly, modifiability of free software to better fit the
- needs of its users lies at the heart of the free software
- movement's success. However, while modification usually comes
- in the form of collaboration on a single code-base, this is
- a function of limitations in software development methodologies
- and tools rather than the best response to the needs or
- desires of users or developers.</para>
-
- <para>I believe that the fundamental advantage of free software
- in the next decade will be in the growing ability of any
- single free software project to be multiple things to multiple
- users simultaneously. This will translate into the fact that,
- in the next ten years, technology and social processes will
- evolve, so that forking is increasingly less of a bad thing.
- Free software development methodology will become less
- dependent on a single project and begin to emphasize parallel
- development within an ecosystem of related projects. The
- result is that free software projects will gain a competitive
- advantage over proprietary software projects through their
- ability to better serve the increasingly diverse needs of
- increasingly large and increasingly diverse user-bases.
- Although it sounds paradoxical today, more projects will
- derive and less redundant code will be written.</para>
-
- <para>Projects more limited in code and scope may use the tools
- and methods described in the remainder of this paper in
- different combinations, in different ways, and to different
- degrees than the examples around distributions introduced
- here. Different projects with different needs will find that
- certain solutions work better than others. Because communities
- of the size of Debian are difficult to fork in a way that is
- beneficial to any party, it is in these communities that the
- technology and development methodologies are first
- emerging. With time, these strategies and tools will find
- themselves employed productively in a wide variety of projects
- with a broad spectrum of sizes, needs, scopes and
- descriptions.</para>
-
- </section>
-
- </section>
-
- <section>
- <title>Balancing Forking With Collaboration</title>
-
- <section>
- <title>Derivation and Problem Analysis</title>
-
- <para>The easiest step in creating a productive derivative
- software project is to break down the problem of derivation
- into a series of different classes of modification. Certain
- types of modification are more easily done and are
- intrinsically more maintainable.</para>
-
- <para>In the context of distributions, the problem of derivation
- can be broken down into the following types of changes (sorted
- roughly according to the intrusiveness inherent in solving the
- problem and the severity of the long-term maintainability
- problems that they introduce):</para>
-
- <orderedlist>
- <listitem>
- <para>Selection of individual pieces of software;</para>
- </listitem>
- <listitem>
- <para>Configuration of different pieces of software;</para>
- </listitem>
- <listitem>
- <para>Changes to the way that packages are installed or run
- (e.g., in a Live CD type environment or using a different
- installer);</para>
- </listitem>
- <listitem>
- <para>Changes made to the actual software package (made on
- the level of changes to the package's code).</para>
- </listitem>
- </orderedlist>
-
- <para>By breaking down the problem in this way, Debian derivers
- have been able to approach derivation in ways that focus
- energy on the less intrusive problems first.</para>
-
- <para>The first area that Ubuntu focused on was selecting a
- subset of packages that Ubuntu would support. Ubuntu selected
- and supports approximately 2,000 packages. These became the
- <command>main</command> component in Ubuntu. Other packages in
- Debian were included in a separate section of the Ubuntu
- archive called <command>universe</command> but were not
- guaranteed to be supported with bug or security fixes. By
- focusing on a small subset of packages, the Ubuntu team was
- able to select a maintainable subsection of the Debian archive
- that they could maintain over time.</para>
-
- <para>The simplest derived distributions, often
- working within the Debian project as CDDs but also including
- projects like Userlinux, are merely lists of packages
- and do nothing outside of package selection. The installation
- of lists of packages and the maintenance of those lists over
- time can be aided through the creation of what are called
- <emphasis>metapackages</emphasis>: empty packages with long
- lists of "dependencies."</para>
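-
- <para>As a rough illustration, a metapackage amounts to little more
- than a control stanza whose dependency field names the selected
- packages. The Python sketch below renders such a stanza. It is
- simplified: a real package needs further fields such as a version
- and a maintainer, and the function name, package name, and
- dependency list are hypothetical.</para>
-
```python
def metapackage_control(name, depends, description):
    """Render a simplified control stanza for an empty package.

    The package ships no files of its own; installing it pulls in
    everything listed in `depends`.
    """
    lines = [
        "Package: " + name,
        "Architecture: all",
        "Depends: " + ", ".join(sorted(depends)),
        "Description: " + description,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical selection for a derived desktop distribution.
stanza = metapackage_control(
    "example-desktop",
    ["firefox", "gimp", "evolution"],
    "Package selection for an example derived desktop",
)
print(stanza)
```
-
- <para>Maintaining the derived distribution then reduces to editing
- the dependency list and publishing a new version of the
- metapackage.</para>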
-
- <para>The second item, configuration changes, is also
- relatively low-impact. Focusing on moving as many changes as
- possible into the realm of configuration changes is a
- sustainable strategy that derivers working within the Debian
- project intent on a single code-base have pursued actively.
- Their idea is that rather than forking a piece of code due to
- disagreement in how the program should work, they can leave
- the code intact but add the <emphasis>ability</emphasis> to
- work in a different way to the software. This alternate
- functionality is made toggleable through a configuration
- change in the same manner that applications are configured
- through questions asked at install time. Since the Debian
- project has a unified package configuration framework called
- Debconf, derivers are able to configure an entire system in a
- highly centralized manner.<footnote> <para>More information on
- Debconf can be
- found online at: <ulink
- url="http://www.kitenet.net/programs/debconf/">http://www.kitenet.net/programs/debconf/</ulink></para>
- </footnote> This is not unlike Red Hat's Kickstart, although the
- emphasis is on maintenance of those configuration changes over
- the life and evolution of the package; Kickstart is focused
- merely on installation of the package.</para>
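-
- <para>One way to picture this centralized configuration is as a
- list of pre-answered questions, one line per answer. The Python
- sketch below renders such a list in the four-field form (owner,
- question, type, value) commonly used for Debconf preseeding, for
- example by debconf-set-selections. The package and question names
- are hypothetical and stand in for whatever toggle a deriver adds
- to a package.</para>
-
```python
def preseed_lines(answers):
    """Render answers as 'owner question type value' lines.

    `answers` is a list of (owner, question, qtype, value) tuples.
    """
    return [" ".join((owner, question, qtype, value))
            for owner, question, qtype, value in answers]


# Hypothetical package and question names, for illustration only.
answers = [
    ("example-pkg", "example-pkg/enable-alternate-mode", "boolean", "true"),
    ("example-pkg", "example-pkg/default-language", "string", "en"),
]

for line in preseed_lines(answers):
    print(line)
```
-
- <para>The point of the sketch is that the derivative's behavior
- lives in data that can be kept and revised across package
- upgrades, while the package code itself stays identical to
- Debian's.</para>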
-
- <para>A third type of change is limited to changes in the
- environment through which a system is run or installed. One
- example is Progeny's Anaconda-based Debian installer which
- provides an alternate installer but results in an identical
- system. Another example is the Knoppix project which is famous
- for its "Live CD" environments. While Knoppix makes a wide
- range of invasive changes that span all items in my list
- above, other Live CD projects, including Ubuntu's "Casper"
- project, are much closer to an alternate shell through which
- the same code is run.</para>
-
- <para>Because these three methods are relatively non-invasive,
- they are reasonable strategies for small teams and individuals
- working on creating a derived distribution. However, many
- desirable changes — and in the case of some derived
- distributions, <emphasis>most</emphasis> desirable changes
- — require more invasive techniques. The final and most
- invasive type of change — changes to code — is the
- most difficult but also the most promising and powerful if it
- can be done sustainably. Changes of this type involve
- bifurcations of the code-base and will be the topic of the
- remainder of this paper.</para>
-
- </section>
-
- <section>
- <title>Distributed Source Control</title>
-
- <para>One promising method of maintaining deltas in forked or
- branched projects lies in distributed version control systems
- (VCS). Traditional VCS systems work in a highly centralized
- fashion. CVS, the archetypal free software VCS and the basis
- for many others, is based around the model of a single
- centralized server. Anyone who wishes to commit to a project
- must commit to the centralized repository. While CVS allows
- users to create branches, anyone with commit rights has access
- to the entire repository. The tools for branching and merging
- over time are not particularly good.</para>
-
- <para>The branching model is primarily geared toward a system
- where development is bifurcated and then the branch is merged
- completely back into the main tree. Normal use of a branch
- might include creating a development branch, making a series
- of development releases while maintaining and fixing important
- bugs in the stable primary branch, and then ultimately
- replacing the stable release with the development release. The
- CVS model is <emphasis>not</emphasis> geared toward a system
- where an arbitrary delta, or sets of deltas, are maintained
- over time.</para>
-
- <para>Distributed version control aims to solve a number of
- problems introduced by CVS and alluded to above by:</para>
-
- <itemizedlist>
- <listitem>
- <para>Allowing people to work disconnected from each other
- and to sync with each other, in whole or in part, in an
- arbitrary and ad-hoc fashion.</para>
- </listitem>
- <listitem>
- <para>Allowing deltas to be maintained over time.</para>
- </listitem>
- </itemizedlist>
-
- <para>Ultimately, this requires tools that are better at merging
- changes and at <emphasis>not</emphasis> merging certain
- changes when that is the desired behavior. It also leads to tools capable
- of history-sensitive merging.</para>
-
- <para>The most famous switch to a distributed VCS model from a
- centralized VCS model was the move by the Linux kernel
- development community to the proprietary distributed version
- control system BitKeeper. In his recent announcement of the
- decision to part ways with BitKeeper, Linus Torvalds
- said:</para>
-
- <blockquote>
- <para>In fact, one impact BK has had is to very fundamentally
- make us (and me in particular) change how we do things. That
- ranges from the fine-grained changeset tracking to just how
- I ended up trusting sub-maintainers with much bigger things,
- and not having to work on a patch-by-patch basis any
- more.<footnote> <para>The full message can be read online
- at: <ulink
- url="http://kerneltrap.org/mailarchive/1/message/48393/thread">http://kerneltrap.org/mailarchive/1/message/48393/thread</ulink></para>
- </footnote>
- </para>
- </blockquote>
-
- <para>At the time of the switch, free distributed version
- control tools were less advanced than they are today. At the
- moment, an incomplete list of free software VCS tools includes
- GNU Arch, Bazaar, Bazaar-NG, Darcs, Monotone, SVK (based on
- Subversion), GIT (a system developed by Linus Torvalds as a
- replacement for BitKeeper) and others.</para>
-
- <para>Each of these tools, at least once it reaches a certain
- level of maturity, allows or will allow users to develop
- software in a distributed fashion and to, over time, compare
- their software and pull changes from others significantly more
- easily than they could otherwise. The idea of parallel
- development lies at the heart of the model. The tools for
- merging and resolving conflicts over time, and the ability to
- "cherry pick" certain patches or changes from a parallel
- developer each make this type of development significantly
- more useful than it has been in the past.</para>
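-
- <para>As a rough illustration of what "cherry picking" from a
- parallel developer involves, the following Python sketch models
- a source tree as a list of lines and pulls across only the
- changesets that a deriver asks for. The
- <literal>Changeset</literal> class and the
- <literal>apply_change</literal> and
- <literal>cherry_pick</literal> functions are hypothetical names
- invented for this example. They are not the interface of any of
- the tools listed above, and real tools also detect and resolve
- conflicts, which this sketch does not attempt.</para>
-
- <programlisting><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    """A named change replacing one line with another (hypothetical model)."""
    name: str
    old: str
    new: str


def apply_change(tree, change):
    """Return a copy of tree (a list of lines) with the change applied."""
    if change.old not in tree:
        raise ValueError(f"{change.name} does not apply cleanly")
    return [change.new if line == change.old else line for line in tree]


def cherry_pick(tree, applied, theirs, wanted):
    """Apply only the wanted changesets from a parallel branch.

    Changesets whose names are already in `applied` are skipped, which
    is a very small form of history-sensitive merging.
    """
    applied = set(applied)
    for change in theirs:
        if change.name in wanted and change.name not in applied:
            tree = apply_change(tree, change)
            applied.add(change.name)
    return tree, applied


if __name__ == "__main__":
    base = ["greeting = 'hello'", "colour = 'brown'"]
    upstream = [
        Changeset("fix-greeting", "greeting = 'hello'", "greeting = 'Hello'"),
        Changeset("rebrand", "colour = 'brown'", "colour = 'red'"),
    ]
    # Take the bug fix but deliberately leave the rebranding behind.
    tree, applied = cherry_pick(base, set(), upstream, {"fix-greeting"})
    print(tree, applied)
```
- ]]></programlisting>
-
- <para>Keeping the set of applied changeset names alongside the
- tree is what allows a later pull from the same developer to skip
- work that has already been merged while still leaving unwanted
- changes out.</para>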
-
- <para>VCSs work entirely on the level of code. Due to the nature
- of the types of changes that the Ubuntu project is making to
- Debian's code, Ubuntu has focused primarily on this model and
- Canonical currently funds two major distributed version control
- projects: Bazaar and Bazaar-NG.</para>
-
- <para>In many ways, employing distributed version control
- effectively is a much easier problem to solve for small, more
- traditional, free software development projects than it is for
- GNU/Linux distributions. Because maintaining parallel
- development of a single piece of software in a set of related
- distributed repositories is the primary use case for
- distributed version control systems, distributed VCS alone can
- be a technical solution for certain types of parallel
- development. As the tools and social processes for distributed
- VCS evolve, they will become increasingly important to the way
- that free software is developed.</para>
-
- <para>Because the problems of scale associated with building an
- entire derivative distribution are more complicated than those
- associated with working with a single "upstream" project,
- distributed version control is only now being actively
- deployed in the Ubuntu project. In doing so, the project is
- focusing on problem-specific tools built on top of
- distributed version control.</para>
-
- </section>
-
- <section>
- <title>Problem Specific Tools</title>
-
- <para>Another technique that Canonical Ltd. is experimenting
- with is the creation of high-level tools, built on top of
- distributed version control tools, that are specifically designed for
- maintaining differences between packages. Because packages are
- usually distributed as a source file with a collection of one
- or more patches, it becomes possible to
- create a high-level VCS built around this fact.</para>
-
- <para>In the case of Ubuntu and Debian, the ideal tool creates
- one branch per patch or feature and uses heuristics to
- analyze patch files and create these branches
- intelligently. The package build system section of the total
- patch can also be kept as a separate branch. Canonical's tool,
- called the Hypothetical Changeset Tool (HCT) (although no
- longer hypothetical), is one experimental way of creating a
- very simple, very streamlined interface for dealing with a
- particular type of source that is created and distributed in a
- particular type of way with a particular type of
- change.</para>
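-
- <para>The idea of one branch per patch can be sketched in a few
- lines of Python. The sketch below assumes that the delta arrives
- as a single unified diff containing only file headers and hunks,
- and it uses one deliberately naive heuristic: changes to files
- under <literal>debian/</literal> go to a packaging branch and
- every other file gets a branch of its own. The function name
- <literal>split_into_branches</literal> and the branch naming
- scheme are invented for illustration. This is not a description
- of how HCT is implemented.</para>
-
- <programlisting><![CDATA[
```python
from collections import defaultdict


def split_into_branches(diff_text):
    """Group the per-file sections of a unified diff into named branches.

    Assumes every file section starts with a '--- ' line immediately
    followed by a '+++ ' line (hypothetical, simplified input format).
    A removed line that itself begins with '-- ' directly before an
    added line beginning with '++ ' would fool this heuristic.
    """
    branches = defaultdict(list)
    current = None   # name of the branch receiving lines
    pending = None   # a '--- ' line held until the next line is seen
    for line in diff_text.splitlines():
        if pending is not None:
            if line.startswith("+++ "):
                path = line[4:].split("\t")[0]
                if path.startswith("b/"):
                    path = path[2:]
                if path.startswith("debian/"):
                    current = "packaging"
                else:
                    current = "feature-" + path.replace("/", "-")
                branches[current].extend([pending, line])
                pending = None
                continue
            # Not a file header after all: keep it as ordinary hunk content.
            if current is not None:
                branches[current].append(pending)
            pending = None
        if line.startswith("--- "):
            pending = line
        elif current is not None:
            branches[current].append(line)
    if pending is not None and current is not None:
        branches[current].append(pending)
    return dict(branches)
```
- ]]></programlisting>
-
- <para>A real tool would need much better heuristics, for example
- grouping several files that implement one feature, but even this
- crude split keeps packaging changes separate from changes to the
- upstream source.</para>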
-
- <para>While HCT promises to be very useful for people making
- derived distributions based on Debian, its application outside
- distribution makers will, in all likelihood, be limited. That
- said, it provides an example of the way that problem and
- context specific tools may play an essential role in the
- maintenance of derived code more generally.</para>
-
- </section>
-
-
- <section>
- <title>Social Solutions</title>
-
- <para>It has been said that it is a common folly of a
- technophile to attempt to employ technical solutions toward
- solving social problems. The problem of deriving software is
- both a technical <emphasis>and</emphasis> social problem and
- adequately addressing the larger problems requires approaches that
- take into consideration both types of solution.</para>
-
- <para>Scott James Remnant compares the relationship between
- distributions and derived distributions to the
- relationship between distributions and upstream
- maintainers:</para>
- <blockquote>
-
- <para>I don't think this is much different from how Debian
- maintainers interact with their upstreams. As Debian
- maintainers we take and package upstream software and then
- act as a gateway for bugs and problems. Quite often we fix
- bugs ourselves and apply the patch to the package and send
- it upstream. Sometimes the upstream don't incorporate that
- patch and we have to make sure we don't accidentally drop it
- each subsequent release, we much prefer it if they take
- them, but we don't get angry if they don't.</para>
-
- <para>This is how I see the relationship between Ubuntu and
- Debian, we're no more a fork of Debian than a Debian package
- is a fork of its upstream.</para>
- </blockquote>
-
- <para>Scott alludes to the fact that, at least in the world of
- distributions, parallel development is already one way to view
- the <emphasis>modus operandi</emphasis> of existing GNU/Linux
- distributions. The relationship between a deriver and derivee
- on the distribution level mirrors the relationship between the
- distribution and the "upstream" authors of the packages that
- make up the distribution. These relationships are rarely based
- around technological tools but are entirely in the realm of
- social solutions.</para>
-
- <para>Ubuntu has pursued a number of different initiatives along
- these lines. The first of these has been to regularly file
- bugs in the Debian bug tracking system when bugs that exist in
- Debian are fixed in Ubuntu. While this can be partially
- automated, the choice to automate it and the manner in which
- the automation is set up are purely social decisions.</para>
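-
- <para>To make the idea of partial automation concrete, here is a
- minimal Python sketch that composes, but does not send, a bug
- report in the form the Debian bug tracking system accepts by
- email: a message to <literal>submit@bugs.debian.org</literal>
- whose body begins with pseudo-headers such as
- <literal>Package</literal>, <literal>Version</literal> and
- <literal>Tags</literal>. The function name
- <literal>build_bug_report</literal>, the sender address and the
- attachment name are placeholders chosen for this example, and
- nothing here describes the mechanism Ubuntu actually uses.</para>
-
- <programlisting><![CDATA[
```python
from email.message import EmailMessage


def build_bug_report(package, version, summary, description, patch_text,
                     sender="ubuntu-developer@example.org"):
    """Compose (but do not send) a bug report that carries a patch.

    The sender default is a placeholder address for illustration only.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "submit@bugs.debian.org"
    msg["Subject"] = f"{package}: {summary}"
    # The Debian BTS reads pseudo-headers from the top of the message body.
    body = (
        f"Package: {package}\n"
        f"Version: {version}\n"
        "Severity: normal\n"
        "Tags: patch\n"
        "\n"
        f"{description}\n"
    )
    msg.set_content(body)
    # Attach the fix so the Debian maintainer can review and apply it.
    msg.add_attachment(patch_text, subtype="x-diff",
                       filename=f"{package}.diff")
    return msg
```
- ]]></programlisting>
-
- <para>Whether such a message is sent automatically, queued for a
- person to review, or not sent at all remains a social decision,
- which is why the sketch stops short of sending anything.</para>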
-
- <para>However, as I alluded to above, Ubuntu is still left with
- questions about changes made to packages that do not
- necessarily fix bugs, or that fix bugs that do not yet exist
- in Debian but may in the future. Some Debian developers want
- to hear about the full extent of changes made to their
- software in Ubuntu while others do not want to be
- bothered. Ubuntu should continue to work with Debian to find
- ways to allow developers to stay in sync.</para>
-
- <para>There are also several initiatives by developers in
- Debian, Ubuntu, and other derived distributions to create a
- stronger relationship between the Debian project and its
- ecosystem of derivers and between Ubuntu and Debian in
- particular. While the form that this will ultimately take is
- unclear, projects existing within an ecosystem should explore
- the realm of appropriate social relationships that will ensure
- that they can work together and be informed of each other's
- work without resorting to "spamming" each other with
- irrelevant or unnecessary information.</para>
-
- <para>Another issue that has recently played an important role
- in the Debian/Ubuntu relationship is the importance of
- giving adequate credit to the authors or upstream maintainers
- of software without implying a closer relationship than is the
- case. Derivers must walk a fine line where they credit others'
- work on a project without implying that the others work for,
- support, or are connected to the deriver's project, with which, for
- any number of reasons, the "upstream" author might not want to
- be associated.</para>
-
- <para>In the case of Debian and Ubuntu, this has resulted in an
- emphasis on keeping or importing changelog entries when
- changes are imported and in noting the pedigree of changes
- more generally. It has recently also been discussed in terms
- of the "maintainer" field in each package in Ubuntu. Ubuntu
- wants to avoid making changes to every unmodified source
- package (and introducing an unnecessary delta) but does not
- want to give the impression that its packages are maintained
- by someone unassociated with Ubuntu. While no solution had
- been decided on at the time of writing, one idea involves marking
- the maintainer of the package explicitly as a Debian
- maintainer at the time that the binary packages are built on
- the Ubuntu build machines.</para>
-
- <para>The emphasis on social solutions is also essential when
- using distributed VCS technology. As Linus Torvalds alluded to
- in the quote above, the importance of technological changes to
- distributed VCS technology is only felt when people begin to
- work in a different way — when they begin to employ
- different social models of developer interaction.</para>
-
- <para>While Ubuntu's experience can provide a good model for
- tackling some of these source control issues, it can only
- serve as a model and not as a fixed answer. Social solutions
- must be appropriate for a given social relationship. Even in
- situations where a package is branched because of social
- disagreements, a certain level of collaboration on a social
- level will be essential to the long term viability of the
- derivative.</para>
-
- </section>
-
- </section>
-
- <section>
- <title>Conclusions</title>
-
- <para>As the techniques described in this paper evolve, the role
- that they play in free software development will become increasingly
- prominent and increasingly important. Joining them will be other
- techniques and models that I have not described and cannot
- predict. Because of the size and usefulness of their code and
- the size of their development communities, large projects like
- Debian and Ubuntu have been forced into confronting and
- attempting to mediate the problems inherent in forking and
- deriving. However, as these problems are negotiated and tools
- and processes are advanced toward solutions, free software
- projects of all sizes will be able to offer users exactly what
- they want with minimal redundancy and little duplication of
- work. In doing this, free software will harness a power that
- proprietary models cannot compete with. Free software projects will increase their
- capacity to produce better products and better processes.
- Ultimately, this will help free software capture more users, bring
- in more developers, and produce more free software of a higher
- quality.</para>
-
- </section>
-
-</article>
-
-
-<!-- Keep this comment at the end of the file
-Local variables:
-mode: xml
-sgml-omittag:t
-sgml-shorttag:t
-sgml-namecase-general:t
-sgml-general-insert-case:lower
-sgml-minimize-attributes:nil
-sgml-always-quote-attributes:t
-sgml-parent-document:nil
-sgml-exposed-tags:nil
-sgml-local-catalogs:nil
-sgml-local-ecat-files:nil
-sgml-indent-step: 2
-sgml-indent-data: 2
-sgml-set-face: t
-End:
--->