1 <?xml version="1.0" encoding="UTF-8"?>
2 <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
5 <article id="paper-11194">
7 <title>To Fork or Not To Fork</title>
8 <subtitle>Lessons From Ubuntu and Debian</subtitle>
10 <firstname>Benjamin</firstname>
11 <othername>Mako</othername>
12 <surname>Hill</surname>
14 <orgname>Canonical Limited</orgname>
17 <orgname>The Debian GNU/Linux Project</orgname>
20 <orgname>Software in the Public Interest, Inc.</orgname>
24 <para>Benjamin Mako Hill is an intellectual property
25 researcher and activist and a professional Free/Open Source
26 Software (FOSS) advocate and developer. He is an active
27 participant in the Debian Project in both technical and
28 non-technical roles. He is the author of the Free Software
29 Project Management HOWTO and many published works on Free
30 and Open Source Software. He currently is working full time
31 for Canonical Ltd. on Ubuntu, a new Debian-based
39 <holder>Benjamin Mako Hill</holder>
44 <para>This material is licensed under the <ulink
45 url="http://creativecommons.org/licenses/by-sa/2.0/">Creative
46 Commons Attribution-Sharealike 2.0 License</ulink>.</para>
48 <para>The canonical location for the most recent version of this
49 document is <ulink url="http://mako.cc/">at the author's
50 website</ulink>.</para>
56 <revnumber>0.2</revnumber>
57 <date>August 7, 2005</date>
58 <revremark>Corrections and improvements.</revremark>
61 <revnumber>0.1</revnumber>
62 <date>May 15, 2005</date>
65 <para>The first version of this paper was written to accompany an
66 accepted talk given at Linuxtag 2005 in Karlsruhe,
77 <title>Introduction</title>
79 <para>The explosive growth of free and open source software over
80 the last decade has been mirrored by an equally explosive growth
81 in the ambitiousness of free software projects in choosing and
82 tackling problems. The free software movement approaches these
83 large problems with more code and with more expansive
84 communities than was thinkable a decade ago. Examples of these
85 massive projects include desktop environments — like GNOME
86 and KDE — and distributions like Debian, RedHat, and
89 <para>These projects are leveraging the work of thousands of
90 programmers — both volunteer and paid — and are
91 producing millions of lines of code. Their software is being
92 used by millions of users with diverse sets of needs. This
93 paper focuses on two major effects of this situation:</para>
98 <para>The communities that free software projects — and
99 in particular large projects — serve are increasingly
100 diverse. It is becoming increasingly difficult for a single
101 large project to release any single product that can cater
102 to all of its potential users.</para>
107 <para>It's becoming increasingly difficult to reproduce these
108 large projects. While reproducing an entire project is
109 impossible for small groups of hackers, it is often not even
110 possible for small groups to track and maintain a fork
111 of a large project over time.</para>
116 <para>Taken together, these facts imply an increasingly realized
117 free software community in which programmers frequently derive
118 but where traditional forking is often untenable. "Forks," as
119 they are traditionally defined, must be improved upon.
120 Communities around large free software projects must be smarter
121 about the process of derivation than they have been in the
124 <para>We are already seeing this with GNU/Linux distributions. New
125 distributions are rarely built from scratch today. Instead, they
126 are adapted from and built on top of the work of existing projects.
127 As projects and user-bases grow, these derived distributions are
128 increasingly common. Most of what I describe in this essay are
129 tools and experiences of derived distributions.</para>
131 <para>Software makers must pursue the idea of an
132 <emphasis>ecosystem</emphasis> of free software projects and
133 products that have forked but that maintain a close relationship
134 as they develop in parallel and symbiotically. To do this,
135 developers should:</para>
139 <para>Break down the process of derivation into a set of
140 different types of customization and derivation and
141 prioritize methods of derivation.</para>
144 <para>Create and foster social solutions to the social aspects
145 of the derivation problem.</para>
148 <para>Build and use new tools specifically designed to
149 coordinate development of software in the context of an
150 ecosystem of projects.</para>
153 <para>Distribute and utilize distributed version control tools
154 with an emphasis on maintaining differences over
159 <para>This paper is an early analysis of this set of problems. As
160 such, it is highly focused on the experience of the Ubuntu
161 project and its existence as a derived Debian distribution. It
162 also pulls from my experience with Debian-NP and the Custom
163 Debian Distribution (CDD) community. Since I participate in both
164 the Ubuntu and CDD projects, these are areas that I can discuss
165 with some degree of knowledge and experience.</para>
169 <title>"Fork" Is A Four Letter Word</title>
171 <para>The act of taking the code for a free software project and
172 bifurcating it to create a new project is called "forking."
173 There have been a number of famous forks in free software
174 history. One of the most famous was the schism that led to the
175 parallel development of two versions of the Emacs text editor:
176 GNU Emacs and XEmacs. This schism persists to this day.</para>
178 <para>Some forks, like Emacs and XEmacs, are permanent. Others are
179 relatively short lived. An example of this is the GCC project
180 which saw two forks — EGCS and PGCC — that both
181 eventually merged back into GCC. Forking can happen for any
182 number of reasons. Often developers on a project develop
183 political or personal differences that keep them from continuing
184 to work together. In some cases, maintainers become unresponsive
185 and other developers fork to keep the software alive.</para>
187 <para>Ultimately though, most forks occur because people do not
188 agree on the features, the mechanisms, or the technology at the
189 core of a project. People have different goals, different
190 problems, and want different tools. Often, these goals, problems
191 and tools are similar up until a certain point before the need
192 to part ways becomes essential.</para>
194 <para>A fork occurs on the level of code but a fork is not merely
195 — or even primarily — technical. Many projects create
196 "branches." Branches are alternative versions of a piece of
197 software used to experiment with intrusive or unstable features
198 and fixes. Forks are distinguished from branches both in
199 that they are often more significant departures from a technical
200 perspective (i.e., more lines of code have been changed and/or
201 the changes are more invasive or represent a more fundamental
202 rethinking of the problem) and in that they are bifurcations
203 defined in social and political terms. Branches involve a
204 <emphasis>single</emphasis> developer or community of developers
205 — even if it does boil down to distinct subgroups within a
206 community — whereas forks are separate projects.</para>
208 <para>Forking has historically been viewed as a bad thing in free
209 software communities: forks are seen to stem from people's
210 inability to work together and to end in duplication of
211 work. When I published the first version of the <ulink
212 url="http://mako.cc/projects/howto/">Free Software Project
213 Management HOWTO</ulink> more than four years ago, I included
214 a small subsection on forking which described the concept to
215 future free software project leaders with this text:</para>
218 <para>The short version of the fork section is, don't do them.
219 Forks force developers to choose one project to work with,
220 cause nasty political divisions, and redundancy of
224 <para>In the <emphasis>best</emphasis> situations, a fork means
225 that two groups of people need to go on developing features and
226 doing work they would ordinarily do <emphasis>in addition
227 to</emphasis> tracking the forked project and having to
228 hand-select and apply features and fixes to their own code-base.
229 This level of monitoring and constant comparison can be
230 extremely difficult and time-consuming. The situation is not
231 helped substantially by traditional source control tools like
232 diff, patch, CVS and Subversion which are not optimized for this
233 task. The worse (and much more common) situation occurs when two
234 groups go about their work ignorant or partially ignorant of the
235 code being cut on the other side of the fork. Important features
236 and fixes are implemented twice — differently and
239 <para>The most substantial bright side to these drawbacks is that
240 the problems associated with forking are so severe and notorious
241 that, in most cases, the threat of a fork is enough to force
242 maintainers to work out solutions that keep the fork from
243 happening in the first place.</para>
245 <para>Finally, it is worth pointing out that fork is something of
246 a contested term. Because definitions of forks involve, to one
247 degree or another, statements about the political, organizational,
248 and technical distinctions between projects, bifurcations that
249 many people call branches or parallel trees are described by
250 others as forks. Recently, fueled by the advent of distributed
251 version control systems, the definition of what is and is not a
252 fork has become increasingly unclear. In part due to the same
253 systems, the benefits and drawbacks of what is increasingly
254 problematically called forking are equally debatable.</para>
259 <title>Case Study</title>
261 <para>In my introduction, I described how the growing scope of
262 free software projects and the rapidly increasing size and
263 diversity of user communities are driving the need for a new
264 type of derivation that avoids, as far as possible, the
265 drawbacks of forking. Nowhere is this more evident than in the
266 largest projects with the broadest scope: a small group of
267 projects that includes operating system distributions.</para>
271 <title>The Debian Project</title>
273 <para>The Debian project is by many counts the largest free
274 software distribution in terms of code. It is also,
275 arguably, the largest free software project in terms of the
276 number of volunteers. Debian includes more than 15,000
277 packages and the work of well over 1,000 official volunteers
278 and many more contributors without official membership.
279 Projects without Debian's massive volunteer base cannot
280 replicate what Debian has accomplished; they can rarely hope
281 to even maintain what Debian has produced.</para>
283 <para>At the time that this paper was written, Distrowatch listed
284 129 distributions based on Debian<footnote>
285 <para>Information is listed on the distrowatch homepage
287 url="http://distrowatch.com/dwres.php?resource=independence">http://distrowatch.com/dwres.php?resource=independence</ulink></para>
289 </footnote> — most of them currently active to varying
290 degrees. Each distribution represents at least one person —
291 and in most cases a community of people — who disagreed with
292 Debian's vision or direction strongly enough to want to create
293 a new distribution <emphasis>and</emphasis> who had the
294 technical capacity to follow through with this goal. Despite
295 Debian's long-standing slogan — "the universal operating
296 system" — the fact that the Debian project has become the
297 fastest growing operating system while spawning so many
298 derivatives is testament to the fact that, as far as software
299 is concerned, one size <emphasis>can not</emphasis> fit
301 <para>Netcraft posts yearly updates on the speed at which
302 Linux distributions are growing. The one in question can
304 url="http://news.netcraft.com/archives/2004/01/28/debian_fastest_growing_linux_distribution.html">http://news.netcraft.com/archives/2004/01/28/debian_fastest_growing_linux_distribution.html</ulink></para>
309 <para>Organizationally, Debian derivers are located both inside
310 and outside of the Debian project. A group of derivers working
311 within the Debian project has labeled themselves "Custom
312 Debian Distributions" and has created nearly a dozen projects
313 customizing and deriving from Debian for specific groups of
314 users including non-profit organizations, the medical
315 community, lawyers, children and many others.<footnote>
316 <para>I spearheaded and helped build a now mostly defunct
317 derivation of Debian called Debian-Nonprofit (Debian-NP)
318 geared for non-profit organizations by working within the
319 Debian project.</para>
320 </footnote> These projects build on the core Debian distribution and
321 the canonical archive from <emphasis>within</emphasis> the
322 organizational and political limits of the Debian project and
323 constantly seek to minimize the delta by focusing on less
324 invasive changes and by advancing creative ways of building
325 the <emphasis>ability</emphasis> to alter the core
326 Debian code base through established and policy compliant
329 <!-- http://linktocddinformation -->
331 <para>A second group of Debian customizers includes those
332 working outside of the Debian project organizationally.
333 Notable among this list are (in alphabetical order) Knoppix,
334 Libranet, Linspire (formerly Lindows), Progeny, MEPIS, Ubuntu,
335 Userlinux, and Xandros. With its strong technological base,
336 excellent package management, wide selection of packages to
337 choose from, and strong commitment to software freedom which
338 ensures derivability, Debian provides an ideal point from
339 which to create a GNU/Linux distribution.</para>
345 <title>Ubuntu</title>
347 <para>The Ubuntu project was started by Mark Shuttleworth in
348 April 2004 and the first version was built almost entirely
349 by a small group of Debian developers employed by Shuttleworth's
350 company Canonical Limited.<footnote>
351 <para>Information on Ubuntu can be found on the <ulink
352 url="http://www.ubuntu.com">Ubuntu homepage</ulink>.
353 Information on Canonical Limited can be found at <ulink
354 url="http://www.canonical.com">Canonical's
355 homepage</ulink>.</para>
356 </footnote> It was released to the world in late 2004.
357 The second version was released six months later in April
358 2005. The goals of Ubuntu are to provide a distribution based
359 on a subset of Debian with:</para>
363 <para>Regular and predictable releases — every six months
364 with support for eighteen months.</para>
367 <para>An emphasis on free software that will maintain the
368 derivability of the distribution.</para>
371 <para>An emphasis on usability and a consistent desktop
372 vision. As an example, this has translated into fewer
373 questions in the installer and a default selection and
374 configuration of packages that is usable for most desktop
375 users "out of the box."</para>
380 <para>The Ubuntu project provides an interesting example of a
381 project that aims to derive from Debian to an extensive
382 degree. Ubuntu had made code-level changes to nearly 1300 packages
383 in Debian at the time that this paper was written and the
384 speed of changes will not decelerate with time; the total
385 number of changes and the total size of the delta will
387 <para>Scott James Remnant maintains a list of these patches
389 url="http://people.ubuntu.com/~scott/patches/">http://people.ubuntu.com/~scott/patches/</ulink></para>
390 </footnote> The changes that Ubuntu makes are primarily of the
391 most intrusive kind — changes to the code itself.</para>
393 <para>That said, the Ubuntu project is explicit about the fact
394 that it could not exist without the work done by the Debian
396 <para>You can see that explicit statement on Ubuntu's
398 url="http://www.ubuntulinux.org/ubuntu/relationship/">http://www.ubuntulinux.org/ubuntu/relationship/</ulink></para>
399 </footnote> More importantly, Ubuntu explains that it cannot
400 continue to provide the complete set of packages that its
401 users depend on without the ongoing work by the Debian
402 project. Even though Ubuntu has made changes to nearly
403 1300 packages, this is less than ten percent of the total
404 packages shipped in Ubuntu and pulled from Debian.</para>
406 <para>Scott James Remnant, a prominent Debian developer and a
407 hacker on Ubuntu who works for Canonical Ltd., described the
408 situation this way on his web log to introduce the Ubuntu
409 development methodology in the week after the first public
410 announcement of Canonical and Ubuntu:<footnote>
411 <para>The entire post can be read here: <ulink
412 url="http://www.netsplit.com/blog/work/canonical/ubuntu_and_debian.html">http://www.netsplit.com/blog/work/canonical/ubuntu_and_debian.html</ulink></para>
418 <para>I don't think Ubuntu is a "fork" of Debian, at least not
419 in the traditional sense. A fork suggests that at some
420 point we go our separate way from Debian and then
421 occasionally merge in changes as we carry on down our own
424 <para>Our model is quite different; every six months we take a
425 snapshot of Debian's unstable distribution, apply any
426 outstanding patches from our last release to it and spend a
427 couple of months testing and bug-fixing it.</para>
433 <imagedata fileref="tfontf-picture-01.png" format="PNG"/>
438 <para>One thing that should be obvious from this is our job is
439 a lot easier if Debian take all of our changes, the model
440 actually encourages us to give back to Debian.</para>
442 <para>That's why from the very first day we started fixing
443 bugs we began sending <ulink
444 url="http://www.no-name-yet.com/patches/">the
445 patches</ulink> back to Debian through the BTS. Not only
446 will it make our job so much easier when we come to freeze
447 for "hoary", our next release, but it's exactly what every
448 derivative should do in the first place.</para>
452 <para>There is some debate on the degree to which Ubuntu
453 developers have succeeded in accomplishing the goals laid out
454 by Remnant. Ubuntu has filed hundreds of patches in the bug
455 tracking system but it has also run into problems in deciding
456 <emphasis>what</emphasis> constitutes something that should be
457 fed back to Debian. Many changes are simply not relevant to
458 Debian developers. For example, they may include changes to a
459 package in response to another change made in another package
460 in Ubuntu that will not or has not been taken by Debian. In
461 many other cases, the best action with regard to a particular
462 change, a particular package, and a particular upstream Debian
463 developer is simply unclear.</para>
465 <para>The Ubuntu project's track record in working
466 constructively with Debian is, at the moment, a mixed one.
467 While an increasingly large number of Debian developers are
468 maintaining their packages actively within both projects, many
469 in both Debian and Ubuntu feel that Ubuntu has work left to do
470 in living up to its own goal of a completely smooth, productive
471 relationship with Debian.</para>
473 <para>That said, the importance of the goals described by
474 Remnant in the context of the Ubuntu development model
475 cannot be overstated. Every line of delta between Debian and
476 Ubuntu has a cost for Ubuntu developers. Technology, social
477 practices, and wise choices may reduce that cost but they cannot
478 eliminate it. The resources that Ubuntu can bring to bear upon
479 the problem of building a distribution are limited — far
480 more limited than Debian's. As a result, there is a limit to
481 how far Ubuntu can diverge; it is always to Ubuntu's advantage
482 to minimize the delta where possible.</para>
487 <title>Applicability</title>
489 <para>Ubuntu and Debian are distributions and — as such
490 — operate on a different scale than the vast majority of
491 free software projects. They include more code and more
492 people. As a result, there are questions as to whether the
493 experiences and lessons learned from these projects are
494 particularly applicable to the experience of smaller free
495 software projects.</para>
497 <para>Clearly, because of the difficulties associated with
498 forking massive amounts of code and the problems associated
499 with duplicating the work of large volunteer bases,
500 distributions are forced into finding a way to balance the
501 benefits and drawbacks of forking. However, while the need is
502 stronger and more immediate in larger projects, the benefits
503 of their solutions will often be fully transferable.</para>
505 <para>Clearly, modifiability of free software to better fit the
506 needs of its users lies at the heart of the free software
507 movement's success. However, while modification usually comes
508 in the form of collaboration on a single code-base, this is
509 a function of limitations in software development methodologies
510 and tools rather than the best response to the needs or
511 desires of users or developers.</para>
513 <para>I believe that the fundamental advantage of free software
514 in the next decade will be in the growing ability of any
515 single free software project to be multiple things to multiple
516 users simultaneously. This will translate into the fact that,
517 in the next ten years, technology and social processes will
518 evolve so that forking is increasingly less of a bad thing.
519 Free software development methodology will become less
520 dependent on a single project and begin to emphasize parallel
521 development within an ecosystem of related projects. The
522 result is that free software projects will gain a competitive
523 advantage over proprietary software projects through their
524 ability to better serve the increasingly diverse needs of
525 increasingly large and increasingly diverse user-bases.
526 Although it sounds paradoxical today, more projects will
527 derive and less redundant code will be written.</para>
529 <para>Projects more limited in code and scope may use the tools
530 and methods described in the remainder of this paper in
531 different combinations, in different ways, and to different
532 degrees than the examples around distributions introduced
533 here. Different projects with different needs will find that
534 certain solutions work better than others. Because communities
535 of the size of Debian are difficult to fork in a way that is
536 beneficial to any party, it is in these communities that the
537 technology and development methodologies are first
538 emerging. With time, these strategies and tools will find
539 themselves employed productively in a wide variety of projects
540 with a broad spectrum of sizes, needs, scopes and
548 <title>Balancing Forking With Collaboration</title>
551 <title>Derivation and Problem Analysis</title>
553 <para>The easiest step in creating a productive derivative
554 software project is to break down the problems of derivations
555 into a series of different classes of modification. Certain
556 types of modification are more easily done and are
557 intrinsically more maintainable.</para>
559 <para>In the context of distributions, the problem of derivation
560 can be broken down into the following types of changes (sorted
561 roughly according to the intrusiveness inherent in solving the
562 problem and the severity of the long-term maintainability
563 problems that they introduce):</para>
567 <para>Selection of individual pieces of software;</para>
570 <para>Changes to the way that packages are installed or run
571 (e.g., in a Live CD type environment or using a different
575 <para>Configuration of different pieces of software;</para>
578 <para>Changes made to the actual software package (made on
579 the level of changes to the packages code);</para>
583 <para>By breaking down the problem in this way, Debian derivers
584 have been able to approach derivation in ways that focus
585 energy on the less intrusive problems first.</para>
587 <para>The first area that Ubuntu focused on was selecting a
588 subset of packages that Ubuntu would support. Ubuntu selected
589 and supports approximately 2,000 packages. These became the
590 <command>main</command> component in Ubuntu. Other packages in
591 Debian were included in a separate section of the Ubuntu
592 archive called <command>universe</command> but were not
593 guaranteed to be supported with bug or security fixes. By
594 focusing on a small subset of packages, the Ubuntu team was
595 able to select a maintainable subsection of the Debian archive
596 that they could maintain over time.</para>
598 <para>The most simple derived distributions — often
599 working within the Debian project as CDDs but also including
600 projects like Userlinux — are merely lists of packages
601 and do nothing outside of package selection. The installation
602 of lists of packages and the maintenance of those lists over
603 time can be aided through the creation of what are called
604 <emphasis>metapackages</emphasis>: empty packages with long
605 lists of "dependencies."</para>
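<para>The idea of a metapackage can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
function name, the package name, and the dependency list are
hypothetical, and the stanza omits fields (such as the maintainer)
that a real Debian package would also need.</para>

```python
def render_metapackage_control(name, version, depends, description):
    """Return a control stanza for an empty package that only pulls in others."""
    if not depends:
        raise ValueError("a metapackage needs at least one dependency")
    fields = [
        ("Package", name),
        ("Version", version),
        ("Architecture", "all"),          # no compiled code, so not tied to a CPU
        ("Depends", ", ".join(depends)),  # the package list is the whole payload
        ("Description", description),
    ]
    return "\n".join(f"{key}: {value}" for key, value in fields) + "\n"


if __name__ == "__main__":
    # Hypothetical selection; the names are illustrative only.
    print(render_metapackage_control(
        name="example-nonprofit-desktop",
        version="0.1",
        depends=["openoffice.org", "mozilla-firefox", "gnucash"],
        description="hypothetical package selection for a non-profit desktop",
    ))
```

<para>Because the derivative consists of nothing but this list,
maintaining it over time amounts to editing the dependency list and
releasing a new version of the metapackage.</para>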
607 <para>The second item, configuration changes, is also
608 relatively low-impact. Focusing on moving as many changes as
609 possible into the realm of configuration changes is a
610 sustainable strategy that derivers working within the Debian
611 project intent on a single code-base have pursued actively.
612 Their idea is that rather than forking a piece of code due to
613 disagreement in how the program should work, they can leave
614 the code intact but add the <emphasis>ability</emphasis> to
615 work in a different way to the software. This alternate
616 functionality is made toggleable through a configuration
617 change in the same manner that applications are configured
618 through questions asked at install time. Since the Debian
619 project has a unified package configuration framework called
620 Debconf, derivers are able to configure an entire system in a
621 highly centralized manner.<footnote> <para>More information on
623 found online at: <ulink
624 url="http://www.kitenet.net/programs/debconf/">http://www.kitenet.net/programs/debconf/</ulink></para>
625 </footnote> This is not unlike RedHat's Kickstart although the
626 emphasis is on maintenance of those configuration changes over
627 the life and evolution of the package; Kickstart is focused
628 merely on installation of the package.</para>
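<para>One way to picture centralized configuration is as a single
table of answers to package configuration questions, kept by the
deriver and replayed on every system. The sketch below writes such a
table in the "owner question type value" line format that
<command>debconf-set-selections</command> reads. The package and
question names are hypothetical, and this is an illustration rather
than a description of how any particular derivative works.</para>

```python
def preseed_lines(owner, answers):
    """Format answers as 'owner question type value' lines, one per question."""
    lines = []
    for question, (qtype, value) in sorted(answers.items()):
        if isinstance(value, bool):
            value = "true" if value else "false"  # booleans are written lowercase
        lines.append(f"{owner} {question} {qtype} {value}")
    return lines


# Hypothetical questions for a hypothetical package called "examplepkg".
ANSWERS = {
    "examplepkg/enable-alternate-mode": ("boolean", True),
    "examplepkg/default-language": ("select", "en"),
}

if __name__ == "__main__":
    for line in preseed_lines("examplepkg", ANSWERS):
        print(line)
```

<para>Keeping the answers in one place, separate from the package
code, is what lets a deriver change behavior without carrying a patch
against the package itself.</para>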
630 <para>A third type of customization is limited to changes in the
631 environment through which a system is run or installed. One
632 example is Progeny's Anaconda-based Debian installer which
633 provides an alternate installer but results in an identical
634 system. Another example is the Knoppix project which is famous
635 for its "Live CD" environments. While Knoppix makes a wide
636 range of invasive changes that span all items in my list
637 above, other Live CD projects, including Ubuntu's "Casper"
638 project, are much closer to an alternate shell through which
639 the same code is run.</para>
641 <para>Because these three methods are relatively non-invasive,
642 they are reasonable strategies for small teams and individuals
643 working on creating a derived distribution. However, many
644 desirable changes — and in the case of some derived
645 distributions, <emphasis>most</emphasis> desirable changes
646 — require more invasive techniques. The final and most
647 invasive type of change — changes to code — is the
648 most difficult but also the most promising and powerful if it
649 can be done sustainably. Changes of this type involve
650 bifurcations of the code-base and will be the topic of the
651 remainder of this paper.</para>
656 <title>Distributed Source Control</title>
658 <para>One promising method of maintaining deltas in forked or
659 branched projects lies in distributed version control systems
660 (VCS). Traditional VCSs work in a highly centralized
661 fashion. CVS, the archetypal free software VCS and the basis
662 for many others, is based around the model of a single
663 centralized server. Anyone who wishes to commit to a project
664 must commit to the centralized repository. While CVS allows
665 users to create branches, anyone with commit rights has access
666 to the entire repository. The tools for branching and merging
667 over time are not particularly good.</para>
669 <para>The branching model is primarily geared toward a system
670 where development is bifurcated and then the branch is merged
671 completely back into the main tree. Normal use of a branch
672 might include creating a development branch, making a series
673 of development releases while maintaining and fixing important
674 bugs in the stable primary branch, and then ultimately
675 replacing the stable release with the development release. The
676 CVS model is <emphasis>not</emphasis> geared toward a system
677 where an arbitrary delta, or sets of deltas, are maintained
680 <para>Distributed version control aims to solve a number of
681 problems introduced by CVS and alluded to above by:</para>
685 <para>Allowing people to work disconnected from each other
686 and to sync with each other, in whole or in part, in an
687 arbitrary and ad-hoc fashion.</para>
690 <para>Allowing deltas to be maintained over time.</para>
694 <para>Ultimately, this requires tools that are better at merging
695 changes and in <emphasis>not</emphasis> merging certain
696 changes when that is the desired behavior. It also leads to tools capable
697 of history-sensitive merging.</para>
699 <para>The most famous switch to a distributed VCS model from a
700 centralized VCS model was the move by the Linux kernel
701 development community to the proprietary distributed version
702 control system BitKeeper. In his recent announcement of the
703 decision to part ways with BitKeeper, Linus Torvalds
707 <para>In fact, one impact BK has had is to very fundamentally
708 make us (and me in particular) change how we do things. That
709 ranges from the fine-grained changeset tracking to just how
710 I ended up trusting sub-maintainers with much bigger things,
711 and not having to work on a patch-by-patch basis any
712 more.<footnote> <para>The full message can be read online
714 url="http://kerneltrap.org/mailarchive/1/message/48393/thread">http://kerneltrap.org/mailarchive/1/message/48393/thread</ulink></para>
719 <para>At the time of the switch, free distributed version
720 control tools were less advanced than they are today. At the
721 moment, an incomplete list of free software VCS tools includes
722 GNU Arch, Bazaar, Bazaar-NG, Darcs, Monotone, SVK (based on
723 Subversion), GIT (a system developed by Linus Torvalds as a
724 temporary replacement for BitKeeper) and others.</para>
726 <para>Each of these tools, at least after they reach a certain
727 level of maturity, allows or will allow users to develop
728 software in a distributed fashion and to, over time, compare
729 their software and pull changes from others significantly more
730 easily than they could otherwise. The idea of parallel
731 development that lies at the heart of the model, the tools for
732 merging and resolving conflicts over time, and the ability to
733 "cherry pick" certain patches or changes from a parallel
734 developer each make this type of development significantly
735 more useful than it has been in the past.</para>
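<para>The two operations that matter most for a derivative,
carrying a delta forward and cherry picking from a parallel
developer, can be modelled very roughly as operations on lists of
named changes. The sketch below is a toy model under that assumption.
It is not how any of the tools listed above store or merge history,
and the function and change names are hypothetical.</para>

```python
def outstanding_delta(local_patches, upstream_changes):
    """Patches still to carry: those upstream has not taken, in original order."""
    taken = set(upstream_changes)
    return [patch for patch in local_patches if patch not in taken]


def cherry_pick(branch, other, wanted):
    """Copy only the wanted changes from other onto branch, skipping duplicates."""
    picked = [change for change in other if change in wanted and change not in branch]
    return branch + picked


if __name__ == "__main__":
    # Hypothetical change names for a derived project and its upstream.
    ours = ["fix-installer", "new-artwork", "fix-locale"]
    upstream = ["fix-locale", "new-feature", "security-fix"]

    # Upstream accepted "fix-locale", so the delta we maintain shrinks.
    print(outstanding_delta(ours, upstream))

    # Take only the security fix from upstream, not the new feature.
    print(cherry_pick(ours, upstream, {"security-fix"}))
```

<para>Even in this toy form, the model shows why feeding changes
back matters: every change that the other side accepts is one fewer
change to carry forward.</para>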
737 <para>VCSs work entirely on the level of code. Due to the nature
738 of the types of changes that the Ubuntu project is making to
739 Debian's code, Ubuntu has focused primarily on this model and
740 Canonical currently funds two major distributed version control
741 products — the Bazaar and Bazaar-NG projects.</para>
743 <para>In many ways, employing distributed version control
744 effectively is a much easier problem to solve for small, more
745 traditional, free software development projects than it is for
746 GNU/Linux distributions. Because the problem of maintaining
747 parallel development of a single piece of software in a set of
748 related distributed repositories is the primary use case for
749 distributed version control systems, distributed VCS alone can
750 be a technical solution for certain types of parallel
751 development. As the tools and social processes for distributed
752 VCS evolve, they will become increasingly important to
753 the way that free software is developed.</para>
755 <para>Because the problems of scale associated with building an
756 entire derivative distribution are more complicated than those
757 associated with working with a single "upstream" project,
758 distributed version control is only now being actively
759 deployed in the Ubuntu project. In doing so, the project is
760 focusing on integrating these systems into problem-specific tools
761 built on top of distributed version control.</para>
766 <title>Problem Specific Tools</title>
768 <para>Another technique that Canonical Ltd. is experimenting
769 with is the creation of high-level tools, built on top of
770 distributed version control tools, that are specifically designed for
771 maintaining differences between packages. Because packages are
772 usually distributed as a source file with a collection of one
773 or more patches, this introduces the unique possibility of
774 creating a high-level VCS based around this fact.</para>
776 <para>In the case of Ubuntu and Debian, the ideal tool creates
777 one branch per patch or feature and uses heuristics to
778 analyze patch files and create these branches
779 intelligently. The package build system section of the total
780 patch can also be kept as a separate branch. Canonical's tool,
781 called the Hypothetical Changeset Tool (HCT) (although no
782 longer hypothetical), is one experimental way of creating a
783 very simple, very streamlined interface for dealing with a
784 particular type of source that is created and distributed in a
785 particular type of way with a particular type of
788 <para>While HCT promises to be very useful for people making
789 derived distributions based on Debian, its application outside
790 distribution makers will, in all likelihood, be limited. That
791 said, it provides an example of the way that problem- and
792 context-specific tools may play an essential role in the
793 maintenance of derived code more generally.</para>
799 <title>Social Solutions</title>
801 <para>It has been said that it is a common folly of a
802 technophile to attempt to employ technical solutions toward
803 solving social problems. The problem of deriving software is
804 both a technical <emphasis>and</emphasis> a social problem and
805 adequately addressing the larger problems requires approaches that
806 take into consideration both types of solution.</para>
808 <para>Scott James Remnant describes the relationship between
809 distributions and derived distributions as not unlike the
810 relationship between distributions and upstream
814 <para>I don't think this is much different from how Debian
815 maintainers interact with their upstreams. As Debian
816 maintainers we take and package upstream software and then
817 act as a gateway for bugs and problems. Quite often we fix
818 bugs ourselves and apply the patch to the package and send
819 it upstream. Sometimes the upstream don't incorporate that
820 patch and we have to make sure we don't accidentally drop it
821 each subsequent release, we much prefer it if they take
822 them, but we don't get angry if they don't.</para>
824 <para>This is how I see the relationship between Ubuntu and
825 Debian, we're no more a fork of Debian than a Debian package
826 is a fork of its upstream.</para>
829 <para>Scott alludes to the fact that, at least in the world of
830 distributions, parallel development is already one way to view
831 the <emphasis>modus operandi</emphasis> of existing GNU/Linux
832 distributions. The relationship between a deriver and derivee
833 on the distribution level mirrors the relationship between the
834 distribution and the "upstream" authors of the packages that
835 make up the distribution. These relationships are rarely based
836 around technological tools but are entirely in the realm of
837 social solutions.</para>
839 <para>Ubuntu has pursued a number of different initiatives along
840 these lines. The first of these has been to regularly file
841 bugs in the Debian bug tracking system when bugs that exist in
842 Debian are fixed in Ubuntu. While this can be partially
843 automated, the choice to automate this and the manner in which
844 it is set up are purely social decisions.</para>
846 <para>However, as I alluded to above, Ubuntu is still left with
847 questions with regard to changes that are made to packages that
848 do not necessarily fix bugs or that fix bugs that do not exist
849 in Debian but may in the future. Some Debian developers want
850 to hear about the full extent of changes made to their
851 software in Ubuntu while others do not want to be
852 bothered. Ubuntu should continue to work with Debian to find
853 ways to allow developers to stay in sync.</para>
855 <para>There are also several initiatives by developers in
856 Debian, Ubuntu, and other derived distributions to create a
857 stronger relationship between the Debian project and its
858 ecosystem of derivers and between Ubuntu and Debian in
859 particular. While the form that this will ultimately take is
860 unclear, projects existing within an ecosystem should explore
861 the realm of appropriate social relationships that will ensure
862 that they can work together and be informed of each other's
863 work without resorting to "spamming" each other with
864 irrelevant or unnecessary information.</para>
866 <para>Another issue that has recently played an important role
867 in the Debian/Ubuntu relationship is the importance of
868 giving adequate credit to the authors or upstream maintainers
869 of software without implying a closer relationship than is the
870 case. Derivers must walk a fine line where they credit others'
871 work on a project without implying that the others work for,
872 support, or are connected to the deriver's project, with which, for
873 any number of reasons, the "upstream" author might not want to
874 be associated.</para>
876 <para>In the case of Debian and Ubuntu, this has resulted in an
877 emphasis on keeping or importing changelog entries when
878 changes are imported and in noting the pedigree of changes
879 more generally. It has recently also been discussed in terms
880 of the "maintainer" field in each package in Ubuntu. Ubuntu
881 wants to avoid making changes to every unmodified source
882 package (and introducing an unnecessary delta) but does not
883 want to give the impression that the Ubuntu package is maintained
884 by someone unassociated with Ubuntu. While no solution has
885 been decided on at the time of writing, one idea involves marking
886 the maintainer of the package explicitly as a Debian
887 maintainer at the time that the binary packages are built on
888 the Ubuntu build machines.</para>
890 <para>The emphasis on social solutions is also essential when
891 using distributed VCS technology. As Linus Torvalds alluded to
892 in the quote above, the importance of a technological change to
893 distributed VCS is only felt when people begin to
894 work in a different way, that is, when they begin to employ
895 different social models of developer interaction.</para>
897 <para>While Ubuntu's experience can provide a good model for
898 tackling some of these source control issues, it can only
899 serve as a model and not as a fixed answer. Social solutions
900 must be appropriate for a given social relationship. Even in
901 situations where a package is branched because of social
902 disagreements, a certain level of collaboration on a social
903 level will be essential to the long term viability of the
911 <title>Conclusions</title>
913 <para>As the techniques described in this paper evolve, the role
914 that they play in free software development will become increasingly
915 prominent and increasingly important. Joining them will be other
916 techniques and models that I have not described and cannot
917 predict. Because of the size and usefulness of their code and
918 the size of their development communities, large projects like
919 Debian and Ubuntu have been forced into confronting and
920 attempting to mediate the problems inherent in forking and
921 deriving. However, as these problems are negotiated and tools
922 and processes are advanced toward solutions, free software
923 projects of all sizes will be able to offer users exactly what
924 they want with minimal redundancy and little duplication of
925 work. In doing this, free software will harness a power that
926 proprietary models cannot compete with. Projects will increase their
927 capacity to produce better products and better processes.
928 Ultimately, this will help free software capture more users, bring
929 in more developers, and produce more free software of a higher
937 <!-- Keep this comment at the end of the file
942 sgml-namecase-general:t
943 sgml-general-insert-case:lower
944 sgml-minimize-attributes:nil
945 sgml-always-quote-attributes:t
946 sgml-parent-document:nil
947 sgml-exposed-tags:nil
948 sgml-local-catalogs:nil
949 sgml-local-ecat-files:nil