1 <?xml version="1.0" encoding="UTF-8"?>
2 <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
5 <article id="paper-11194">
7 <title>To Fork or Not To Fork</title>
8 <subtitle>Lessons From Ubuntu and Debian</subtitle>
10 <firstname>Benjamin</firstname>
11 <othername>Mako</othername>
12 <surname>Hill</surname>
14 <orgname>Canonical Limited</orgname>
17 <orgname>The Debian GNU/Linux Project</orgname>
20 <orgname>Software in the Public Interest, Inc.</orgname>
24 <para>Benjamin Mako Hill is an intellectual property
25 researcher and activist and a professional Free/Open Source
26 Software (FOSS) advocate and developer. He is an active
27 participant in the Debian Project in both technical and
28 non-technical roles. He is the author of the Free Software
29 Project Management HOWTO and many published works on Free
30 and Open Source Software. He currently is working full time
31 for Canonical Ltd. on Ubuntu, a new Debian-based
39 <holder>Benjamin Mako Hill</holder>
44 <para>This material is licensed under the <ulink
45 url="http://creativecommons.org/licenses/by-sa/2.0/">Creative
46 Commons Attribution-Sharealike 2.0 License</ulink>.</para>
48 <para>The canonical location for the most recent version of this
49 document is <ulink url="http://mako.cc/">at the author's
50 website</ulink>.</para>
56 <revnumber>0.2</revnumber>
57 <date>August 7, 2005</date>
58 <revremark>Corrections and improvements.</revremark>
61 <revnumber>0.1</revnumber>
62 <date>May 15, 2005</date>
65 <para>The first version of this paper was written to accompany an
66 accepted talk given at Linuxtag 2005 in Karlsruhe,
77 <title>Introduction</title>
79 <para>The explosive growth of free and open source software over
80 the last decade has been mirrored by an equally explosive growth
81 in the ambitiousness of free software projects in choosing and
82 tackling problems. The free software movement approaches these
83 large problems with more code and with more expansive
84 communities than was thinkable a decade ago. Examples of these
85 massive projects include desktop environments — like GNOME
86 and KDE — and distributions like Debian, RedHat, and
89 <para>These projects are leveraging the work of thousands of
90 programmers — both volunteer and paid — and are
91 producing millions of lines of code. Their software is being
92 used by millions of users with diverse sets of needs. This
93 paper focuses on two major effects of this situation:</para>
98 <para>The communities that free software projects — and
99 in particular large projects — serve are increasingly
100 diverse. It is becoming increasingly difficult for a single
101 large project to release any single product that can cater
102 to all of its potential users.</para>
107 <para>It's becoming increasingly difficult to reproduce these
108 large projects. While reproducing an entire project is
109 impossible for small groups of hackers, it is often not
110 possible for small groups even to track and maintain a fork
111 of a large project over time.</para>
116 <para>Taken together, these facts imply an increasingly realized
117 free software community in which programmers frequently derive
118 but where traditional forking is often untenable. "Forks," as
119 they are traditionally defined, must be improved upon.
120 Communities around large free software projects must be smarter
121 about the process of derivation than they have been in the
124 <para>We are already seeing this with GNU/Linux distributions. New
125 distributions are rarely built from scratch today. Instead, they are
126 adapted from and built on top of the work of existing projects.
127 As projects and user-bases grow, these derived distributions are
128 increasingly common. Most of what I describe in this essay are
129 tools and experiences of derived distributions.</para>
131 <para>Software makers must pursue the idea of an
132 <emphasis>ecosystem</emphasis> of free software projects and
133 products that have forked but that maintain a close relationship
134 as they develop in parallel and symbiotically. To do this,
135 developers should:</para>
139 <para>Break down the process of derivation into a set of
140 different types of customization and derivation and
141 prioritize methods of derivation.</para>
144 <para>Create and foster social solutions to the social aspects
145 of the derivation problem.</para>
148 <para>Build and use new tools specifically designed to
149 coordinate development of software in the context of an
150 ecosystem of projects.</para>
153 <para>Distribute and utilize distributed version control tools
154 with an emphasis on maintaining differences over
159 <para>This paper is an early analysis of this set of problems. As
160 such, it is highly focused on the experience of the Ubuntu
161 project and its existence as a derived Debian distribution. It
162 also pulls from my experience with Debian-NP and the Custom
163 Debian Distribution (CDD) community. Since I participate in both
164 the Ubuntu and CDD projects, these are areas that I can discuss
165 with some degree of knowledge and experience.</para>
169 <title>"Fork" Is A Four Letter Word</title>
171 <para>The act of taking the code for a free software project and
172 bifurcating it to create a new project is called "forking."
173 There have been a number of famous forks in free software
174 history. One of the most famous was the schism that led to the
175 parallel development of two versions of the Emacs text editor:
176 GNU Emacs and XEmacs. This schism persists to this day.</para>
178 <para>Some forks, like Emacs and XEmacs, are permanent. Others are
179 relatively short lived. An example of this is the GCC project
180 which saw two forks — EGCS and PGCC — that both
181 eventually merged back into GCC. Forking can happen for any
182 number of reasons. Often developers on a project develop
183 political or personal differences that keep them from continuing
184 to work together. In some cases, maintainers become unresponsive
185 and other developers fork to keep the software alive.</para>
187 <para>Ultimately though, most forks occur because people do not
188 agree on the features, the mechanisms, or the technology at the
189 core of a project. People have different goals, different
190 problems, and want different tools. Often, these goals, problems
191 and tools are similar up until a certain point before the need
192 to part ways becomes essential.</para>
194 <para>A fork occurs on the level of code but a fork is not merely
195 — or even primarily — technical. Many projects create
196 "branches." Branches are alternative versions of a piece of
197 software used to experiment with intrusive or unstable features
198 and fixes. Forks are distinguished from branches both in
199 that they are often more significant departures from a technical
200 perspective (i.e., more lines of code have been changed and/or
201 the changes are more invasive or represent a more fundamental
202 rethinking of the problem) and in that they are bifurcations
203 defined in social and political terms. Branches involve a
204 <emphasis>single</emphasis> developer or community of developers
205 — even if it does boil down to distinct subgroups within a
206 community — whereas forks are separate projects.</para>
208 <para>Forking has historically been viewed as a bad thing in free
209 software communities: forks are seen to stem from people's
210 inability to work together and to end in duplication of
211 work. When I published the first version of the <ulink
212 url="http://mako.cc/projects/howto/">Free Software Project
213 Management HOWTO</ulink> more than four years ago, I included
214 a small subsection on forking which described the concept to
215 future free software project leaders with this text:</para>
218 <para>The short version of the fork section is, don't do them.
219 Forks force developers to choose one project to work with,
220 cause nasty political divisions, and redundancy of
224 <para>In the <emphasis>best</emphasis> situations, a fork means
225 that two groups of people need to go on developing features and
226 doing work they would ordinarily do <emphasis>in addition
227 to</emphasis> tracking the forked project and having to
228 hand-select and apply features and fixes to their own code-base.
229 This level of monitoring and constant comparison can be
230 extremely difficult and time-consuming. The situation is not
231 helped substantially by traditional source control tools like
232 diff, patch, CVS and Subversion which are not optimized for this
233 task. The worse (and much more common) situation occurs when two
234 groups go about their work ignorant or partially ignorant of the
235 code being cut on the other side of the fork. Important features
236 and fixes are implemented twice — differently and
239 <para>The most substantial bright side to these drawbacks is that
240 the problems associated with forking are so severe and notorious
241 that, in most cases, the threat of a fork is enough to force
242 maintainers to work out solutions that keep the fork from
243 happening in the first place.</para>
245 <para>Finally, it is worth pointing out that fork is something of
246 a contested term. Because definitions of forks involve, to one
247 degree or another, statements about the political, organizational,
248 and technical distinctions between projects, bifurcations that
249 many people call branches or parallel trees are described by
250 others as forks. Recently, fueled by the advent of distributed
251 version control systems, the definition of what is and is not a
252 fork has become increasingly unclear. In part due to the same
253 systems, the benefits and drawbacks of what is increasingly
254 problematically called forking are equally debatable.</para>
259 <title>Case Study</title>
261 <para>In my introduction, I described how the growing scope of
262 free software projects and the rapidly increasing size and
263 diversity of user communities are driving the need for a new
264 type of derivation that avoids, as best as possible, the
265 drawbacks of forking. Nowhere is this more evident than in the
266 largest projects with the broadest scope: a small group of
267 projects that includes operating system distributions.</para>
271 <title>The Debian Project</title>
273 <para>The Debian project is by many counts the largest free
274 software distribution in terms of code. It is also,
275 arguably, the largest free software project in terms of the
276 number of volunteers. Debian includes more than 15,000
277 packages and the work of well over 1,000 official volunteers
278 and many more contributors without official membership.
279 Projects without Debian's massive volunteer base cannot
280 replicate what Debian has accomplished; they can rarely hope
281 to even maintain what Debian has produced.</para>
283 <para>At the time that this paper was written, Distrowatch listed
284 129 distributions based on Debian<footnote>
285 <para>Information is listed on the distrowatch homepage
287 url="http://distrowatch.com/dwres.php?resource=independence">http://distrowatch.com/dwres.php?resource=independence</ulink></para>
289 </footnote> — most of them
290 are currently active to varying degrees. Each distribution
291 represents at least one person — and in most cases a
292 community of people — who disagreed with Debian's vision
293 or direction strongly enough to want to create a new
294 distribution <emphasis>and</emphasis> who had the technical
295 capacity to follow through with this goal. Despite Debian's
296 long-standing slogan — "the universal operating system"
298 that the Debian project has become the fastest growing
299 operating system while spawning so many derivatives is
300 testament to the fact that, as far as software is concerned,
301 one size <emphasis>can not</emphasis> fit all.<footnote>
302 <para>Netcraft posts yearly updates on the speed at which
303 Linux distributions are growing. The one in question can be
305 url="http://news.netcraft.com/archives/2004/01/28/debian_fastest_growing_linux_distribution.html">http://news.netcraft.com/archives/2004/01/28/debian_fastest_growing_linux_distribution.html</ulink></para>
310 <para>Organizationally, Debian derivers are located both inside
311 and outside of the Debian project. A group of derivers working
312 within the Debian project has labeled themselves "Custom
313 Debian Distributions" and has created nearly a dozen projects
314 customizing and deriving from Debian for specific groups of
315 users including non-profit organizations, the medical
316 community, lawyers, children and many others.<footnote>
317 <para>I spearheaded and helped build a now mostly defunct
318 derivation of Debian called Debian-Nonprofit (Debian-NP)
319 geared for non-profit organizations by working within the
320 Debian project.</para>
321 </footnote> These projects build on the core Debian distribution and
322 the canonical archive from <emphasis>within</emphasis> the
323 organizational and political limits of the Debian project and
324 constantly seek to minimize the delta by focusing on less
325 invasive changes and by advancing creative ways of building
326 the <emphasis>ability</emphasis> to alter the core
327 Debian code base through established and policy compliant
330 <!-- http://linktocddinformation -->
332 <para>A second group of Debian customizers includes those
333 working outside of the Debian project organizationally.
334 Notable among this list are (in alphabetical order) Knoppix,
335 Libranet, Linspire (formerly Lindows), Progeny, MEPIS, Ubuntu,
336 Userlinux, and Xandros. With its strong technological base,
337 excellent package management, wide selection of packages to
338 choose from, and strong commitment to software freedom which
339 ensures derivability, Debian provides an ideal point from
340 which to create a GNU/Linux distribution.</para>
346 <title>Ubuntu</title>
348 <para>The Ubuntu project was started by Mark Shuttleworth in
349 April 2004 and the first version was built almost entirely
350 by a small group of Debian developers employed by Shuttleworth's
351 company Canonical Limited.<footnote>
352 <para>Information on Ubuntu can be found on the <ulink
353 url="http://www.ubuntu.com">Ubuntu homepage.</ulink>
354 Information on Canonical Limited can be found at <ulink
355 url="http://www.canonical.com">Canonical's
356 homepage</ulink>.</para>
357 </footnote> It was released to the world in late 2004.
358 The second version was released six months later in April
359 2005. The goals of Ubuntu are to provide a distribution based
360 on a subset of Debian with:</para>
364 <para>Regular and predictable releases — every six months
365 with support for eighteen months.</para>
368 <para>An emphasis on free software that will maintain the
369 derivability of the distribution.</para>
372 <para>An emphasis on usability and a consistent desktop
373 vision. As an example, this has translated into fewer
374 questions in the installer and a default selection and
375 configuration of packages that is usable for most desktop
376 users "out of the box."</para>
381 <para>The Ubuntu project provides an interesting example of a
382 project that aims to derive from Debian to an extensive
383 degree. Ubuntu made code-level changes to nearly 1300 packages
384 in Debian at the time that this paper was written and the
385 speed of changes will not decelerate with time; the total
386 number of changes and the total size of the delta will
388 <para>Scott James Remnant maintains a list of these patches
390 url="http://people.ubuntu.com/~scott/patches/">http://people.ubuntu.com/~scott/patches/</ulink></para>
391 </footnote> The changes that Ubuntu makes are primarily of the
392 most intrusive kind — changes to the code itself.</para>
394 <para>That said, the Ubuntu project is explicit about the fact
395 that it could not exist without the work done by the Debian
397 <para>You can see that explicit statement on Ubuntu's
399 url="http://www.ubuntulinux.org/ubuntu/relationship/">http://www.ubuntulinux.org/ubuntu/relationship/</ulink></para>
400 </footnote> More importantly, Ubuntu explains that it cannot
401 continue to provide the complete set of packages that its
402 users depend on without the ongoing work by the Debian
403 project. Even though Ubuntu has made changes to nearly
404 1300 packages, this is less than ten percent of the total
405 packages shipped in Ubuntu and pulled from Debian.</para>
407 <para>Scott James Remnant, a prominent Debian developer and a
408 hacker on Ubuntu who works for Canonical Ltd., described the
409 situation this way on his web log to introduce the Ubuntu
410 development methodology in the week after the first public
411 announcement of Canonical and Ubuntu:<footnote> <para>The
412 entire post can be read here: <ulink
413 url="http://www.netsplit.com/blog/work/canonical/ubuntu_and_debian.html">http://www.netsplit.com/blog/work/canonical/ubuntu_and_debian.html</ulink></para>
419 <para>I don't think Ubuntu is a "fork" of Debian, at least not
420 in the traditional sense. A fork suggests that at some
421 point we go our separate way from Debian and then
422 occasionally merge in changes as we carry on down our own
425 <para>Our model is quite different; every six months we take a
426 snapshot of Debian's unstable distribution, apply any
427 outstanding patches from our last release to it and spend a
428 couple of months testing and bug-fixing it.</para>
434 <imagedata fileref="tfontf-picture-01.png" format="PNG"/>
439 <para>One thing that should be obvious from this is that our
440 job is a lot easier if Debian takes all of our changes. The
441 model actually encourages us to give back to
444 <para>That's why from the very first day we started fixing
445 bugs we began sending <ulink
446 url="http://www.no-name-yet.com/patches/">the
447 patches</ulink> back to Debian through the BTS. Not only
448 will it make our job so much easier when we come to freeze
449 for "hoary", our next release, but it's exactly what every
450 derivative should do in the first place.</para>
454 <para>There is some debate on the degree to which Ubuntu
455 developers have succeeded in accomplishing the goals laid out
456 by Remnant. Ubuntu has filed hundreds of patches in the bug
457 tracking system but it has also run into problems in deciding
458 <emphasis>what</emphasis> constitutes something that should be
459 fed back to Debian. Many changes are simply not relevant to
460 Debian developers. For example, they may include changes to a
461 package in response to another change made in another package
462 in Ubuntu that will not or has not been taken by Debian. In
463 many other cases, the best action with regard to a particular
464 change, a particular package, and a particular upstream Debian
465 developer is simply unclear.</para>
467 <para>The Ubuntu project's track record in working
468 constructively with Debian is, at the moment, a mixed one.
469 While an increasingly large number of Debian developers are
470 maintaining their packages actively within both projects, many
471 in both Debian and Ubuntu feel that Ubuntu has work left to do
472 in living up to its own goal of a completely smooth, productive
473 relationship with Debian.</para>
475 <para>That said, the importance of the goals described by
476 Remnant in the context of the Ubuntu development model
477 cannot be overstated. Every line of delta between Debian and
478 Ubuntu has a cost for Ubuntu developers. Technology, social
479 practices, and wise choices may reduce that cost but they cannot
480 eliminate it. The resources that Ubuntu can bring to bear upon
481 the problem of building a distribution are limited — far
482 more limited than Debian's. As a result, there is a limit to
483 how far Ubuntu can diverge; it is always to Ubuntu's advantage
484 to minimize the delta where possible.</para>
489 <title>Applicability</title>
491 <para>Ubuntu and Debian are distributions and — as such
492 — operate on a different scale than the vast majority of
493 free software projects. They include more code and more
494 people. As a result, there are questions as to whether the
495 experiences and lessons learned from these projects are
496 particularly applicable to the experience of smaller free
497 software projects.</para>
499 <para>Clearly, because of the difficulties associated with
500 forking massive amounts of code and the problems associated
501 with duplicating the work of large volunteer bases,
502 distributions are forced into finding a way to balance the
503 benefits and drawbacks of forking. However, while the need is
504 stronger and more immediate in larger projects, the benefits
505 of their solutions will often be fully transferable.</para>
507 <para>Clearly, modifiability of free software to better fit the
508 needs of its users lies at the heart of the free software
509 movement's success. However, while modification usually comes
510 in the form of collaboration on a single code-base, this is
511 a function of limitations in software development methodologies
512 and tools rather than the best response to the needs or
513 desires of users or developers.</para>
515 <para>I believe that the fundamental advantage of free software
516 in the next decade will be in the growing ability of any
517 single free software project to be multiple things to multiple
518 users simultaneously. This will translate into the fact that,
519 in the next ten years, technology and social processes will
520 evolve, so that forking is increasingly less of a bad thing.
521 Free software development methodology will become less
522 dependent on a single project and begin to emphasize parallel
523 development within an ecosystem of related projects. The
524 result is that free software projects will gain a competitive
525 advantage over proprietary software projects through their
526 ability to better serve the increasingly diverse needs of
527 increasingly large and increasingly diverse user-bases.
528 Although it sounds paradoxical today, more projects will
529 derive and less redundant code will be written.</para>
531 <para>Projects more limited in code and scope may use the tools
532 and methods described in the remainder of this paper in
533 different combinations, in different ways, and to different
534 degrees than the examples around distributions introduced
535 here. Different projects with different needs will find that
536 certain solutions work better than others. Because communities
537 of the size of Debian are difficult to fork in a way that is
538 beneficial to any party, it is in these communities that the
539 technology and development methodologies are first
540 emerging. With time, these strategies and tools will find
541 themselves employed productively in a wide variety of projects
542 with a broad spectrum of sizes, needs, scopes and
550 <title>Balancing Forking With Collaboration</title>
553 <title>Derivation and Problem Analysis</title>
555 <para>The easiest step in creating a productive derivative
556 software project is to break down the problem of derivation
557 into a series of different classes of modification. Certain
558 types of modification are more easily done and are
559 intrinsically more maintainable.</para>
561 <para>In the context of distributions, the problem of derivation
562 can be broken down into the following types of changes (sorted
563 roughly according to the intrusiveness inherent in solving the
564 problem and the severity of the long-term maintainability
565 problems that they introduce):</para>
569 <para>Selection of individual pieces of software;</para>
572 <para>Changes to the way that packages are installed or run
573 (e.g., in a Live CD type environment or using a different
577 <para>Configuration of different pieces of software;</para>
580 <para>Changes made to the actual software package (made on
581 the level of changes to the package's code);</para>
585 <para>By breaking down the problem in this way, Debian derivers
586 have been able to approach derivation in ways that focus
587 energy on the less intrusive problems first.</para>
589 <para>The first area that Ubuntu focused on was selecting a
590 subset of packages that Ubuntu would support. Ubuntu selected
591 and supports approximately 2,000 packages. These became the
592 <command>main</command> component in Ubuntu. Other packages in
593 Debian were included in a separate section of the Ubuntu
594 archive called <command>universe</command> but were not
595 guaranteed to be supported with bug or security fixes. By
596 focusing on a small subset of packages, the Ubuntu team was
597 able to select a subsection of the Debian archive
598 that they could maintain over time.</para>
600 <para>The most simple derived distributions — often
601 working within the Debian project as CDDs but also including
602 projects like Userlinux — are merely lists of packages
603 and do nothing outside of package selection. The installation
604 of lists of packages and the maintenance of those lists over
605 time can be aided through the creation of what are called
606 <emphasis>metapackages</emphasis>: empty packages with long
607 lists of "dependencies."</para>
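<para>As a rough illustration of the metapackage idea, the sketch
below renders a binary package control stanza for an empty package
that only depends on a list of other packages. The field names
(Package, Architecture, Depends, Description) are standard Debian
control fields; the function name and every package name in the
example are hypothetical and chosen only for illustration.</para>

```python
def metapackage_control(name, depends, description):
    """Render a control stanza for an empty package that only pulls in others.

    All package names passed in by the example below are hypothetical.
    """
    if not depends:
        raise ValueError("a metapackage needs at least one dependency")
    lines = [
        f"Package: {name}",
        # Metapackages ship no compiled code, so they are architecture independent.
        "Architecture: all",
        # Sorting keeps the list stable, which makes changes easy to review.
        "Depends: " + ", ".join(sorted(depends)),
        f"Description: {description}",
    ]
    return "\n".join(lines) + "\n"


stanza = metapackage_control(
    "example-nonprofit-desktop",
    ["example-mail-client", "example-donor-database"],
    "hypothetical package selection for non-profit desktops",
)
print(stanza)
```

<para>Maintaining the derived distribution then largely amounts to
editing the dependency list and releasing a new version of the
metapackage.</para>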
609 <para>Configuration changes are also
610 relatively low-impact. Focusing on moving as many changes as
611 possible into the realm of configuration changes is a
612 sustainable strategy that derivers working within the Debian
613 project intent on a single code-base have pursued actively.
614 Their idea is that rather than forking a piece of code due to
615 disagreement in how the program should work, they can leave
616 the code intact but add the <emphasis>ability</emphasis> to
617 work in a different way to the software. This alternate
618 functionality is made toggleable through a configuration
619 change in the same manner that applications are configured
620 through questions asked at install time. Since the Debian
621 project has a unified package configuration framework called
622 Debconf, derivers are able to configure an entire system in a
623 highly centralized manner.<footnote> <para>More information on
625 found online at: <ulink
626 url="http://www.kitenet.net/programs/debconf/">http://www.kitenet.net/programs/debconf/</ulink></para>
627 </footnote> This is not unlike RedHat's Kickstart although the
628 emphasis is on maintenance of those configuration changes over
629 the life and evolution of the package; Kickstart is focused
630 merely on installation of the package.</para>
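<para>One way to picture this kind of centralized configuration is
a file of pre-answered Debconf questions. The sketch below writes
lines in the four-column form read by the
<command>debconf-set-selections</command> utility (owner, question
name, question type, value). The package and question names are
hypothetical, and the helper functions are assumptions made for this
illustration rather than part of Debconf itself.</para>

```python
# Question types understood by Debconf templates.
VALID_TYPES = {"boolean", "multiselect", "note", "password",
               "select", "string", "text"}


def preseed_line(owner, question, qtype, value):
    """Format one pre-answered question: owner, question, type, value."""
    if qtype not in VALID_TYPES:
        raise ValueError(f"unknown debconf question type: {qtype}")
    return f"{owner} {question} {qtype} {value}"


def preseed_file(answers):
    """Join several answers into the text of a preseed file."""
    return "\n".join(preseed_line(*answer) for answer in answers) + "\n"


# Hypothetical package and question names, used only for illustration.
answers = [
    ("example-greeter", "example-greeter/theme", "select", "derived"),
    ("example-greeter", "example-greeter/autologin", "boolean", "false"),
]
print(preseed_file(answers))
```

<para>Because the answers live in one place, a deriver can keep them
under version control and reapply them as packages are upgraded,
without touching the packages' code.</para>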
632 <para>A third type of change is limited to changes in the
633 environment through which a system is run or installed. One
634 example is Progeny's Anaconda-based Debian installer which
635 provides an alternate installer but results in an identical
636 system. Another example is the Knoppix project which is famous
637 for its "Live CD" environments. While Knoppix makes a wide
638 range of invasive changes that span all items in my list
639 above, other Live CD projects, including Ubuntu's "Casper"
640 project, are much closer to an alternate shell through which
641 the same code is run.</para>
643 <para>Because these three methods are relatively non-invasive,
644 they are reasonable strategies for small teams and individuals
645 working on creating a derived distribution. However, many
646 desirable changes — and in the case of some derived
647 distributions, <emphasis>most</emphasis> desirable changes
648 — require more invasive techniques. The final and most
649 invasive type of change — changes to code — is the
650 most difficult but also the most promising and powerful if it
651 can be done sustainably. Changes of this type involve
652 bifurcations of the code-base and will be the topic of the
653 remainder of this paper.</para>
658 <title>Distributed Source Control</title>
660 <para>One promising method of maintaining deltas in forked or
661 branched projects lies in distributed version control systems
662 (VCS). Traditional VCSs work in a highly centralized
663 fashion. CVS, the archetypal free software VCS and the basis
664 for many others, is based around the model of a single
665 centralized server. Anyone who wishes to commit to a project
666 must commit to the centralized repository. While CVS allows
667 users to create branches, anyone with commit rights has access
668 to the entire repository. The tools for branching and merging
669 over time are not particularly good.</para>
671 <para>The branching model is primarily geared toward a system
672 where development is bifurcated and then the branch is merged
673 completely back into the main tree. Normal use of a branch
674 might include creating a development branch, making a series
675 of development releases while maintaining and fixing important
676 bugs in the stable primary branch, and then ultimately
677 replacing the stable release with the development release. The
678 CVS model is <emphasis>not</emphasis> geared toward a system
679 where an arbitrary delta, or sets of deltas, are maintained
682 <para>Distributed version control aims to solve a number of
683 problems introduced by CVS and alluded to above by:</para>
687 <para>Allowing people to work disconnected from each other
688 and to sync with each other, in whole or in part, in an
689 arbitrary and ad-hoc fashion.</para>
692 <para>Allowing deltas to be maintained over time.</para>
696 <para>Ultimately, this requires tools that are better at merging
697 changes and at <emphasis>not</emphasis> merging certain
698 changes when that is the desired behavior. It also leads to tools capable
699 of history-sensitive merging.</para>
701 <para>The most famous switch to a distributed VCS model from a
702 centralized VCS model was the move by the Linux kernel
703 development community to the proprietary distributed version
704 control system BitKeeper. In his recent announcement of the
705 decision to part ways with BitKeeper, Linus Torvalds
709 <para>In fact, one impact BK has had is to very fundamentally
710 make us (and me in particular) change how we do things. That
711 ranges from the fine-grained changeset tracking to just how
712 I ended up trusting sub-maintainers with much bigger things,
713 and not having to work on a patch-by-patch basis any
714 more.<footnote> <para>The full message can be read online
716 url="http://kerneltrap.org/mailarchive/1/message/48393/thread">http://kerneltrap.org/mailarchive/1/message/48393/thread</ulink></para>
721 <para>At the time of the switch, free distributed version
722 control tools were less advanced than they are today. At the
723 moment, an incomplete list of free software VCS tools includes
724 GNU Arch, Bazaar, Bazaar-NG, Darcs, Monotone, SVK (based on
725 Subversion), GIT (a system developed by Linus Torvalds as a
726 replacement for BitKeeper) and others.</para>
728 <para>Each of these tools, at least after they reach a certain
729 level of maturity, allows or will allow users to develop
730 software in a distributed fashion and to, over time, compare
731 their software and pull changes from others significantly more
732 easily than they could otherwise. The idea of parallel
733 development lies at the heart of the model. The tools for
734 merging and resolving conflicts over time, and the ability to
735 "cherry pick" certain patches or changes from a parallel
736 developer each make this type of development significantly
737 more useful than it has been in the past.</para>
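<para>The idea of carrying a delta forward and cherry picking from
it can be sketched with a deliberately simplified model. Here a
source tree is a dictionary of file contents and each patch is a
named text substitution; real tools track history and merge far more
carefully. All names, files, and patches below are hypothetical and
do not reflect how any particular VCS is implemented.</para>

```python
def cherry_pick(delta, wanted):
    """Keep only the named patches from a delta, preserving their order."""
    return [patch for patch in delta if patch[0] in wanted]


def carry_forward(delta, upstream_tree):
    """Re-apply a derivative's patches to a fresh upstream snapshot.

    delta is a list of (name, path, old_text, new_text) tuples and
    upstream_tree maps each path to its file contents.
    Returns (tree, applied, merged_upstream, conflicts).
    """
    tree = dict(upstream_tree)  # never modify the caller's snapshot
    applied, merged_upstream, conflicts = [], [], []
    for name, path, old, new in delta:
        content = tree.get(path, "")
        if new in content:
            # Upstream already carries this change: the delta shrinks.
            merged_upstream.append(name)
        elif old in content:
            tree[path] = content.replace(old, new, 1)
            applied.append(name)
        else:
            # The surrounding code moved on; a person has to resolve this.
            conflicts.append(name)
    return tree, applied, merged_upstream, conflicts


# Hypothetical delta maintained by a derived project.
delta = [
    ("branding", "greeter.conf", "theme=debian", "theme=derived"),
    ("timeout-fix", "greeter.conf", "timeout=30", "timeout=10"),
    ("docs", "README", "Run make.", "Run make install."),
]
# Hypothetical new upstream snapshot: it took the timeout fix and rewrote README.
new_upstream = {
    "greeter.conf": "theme=debian\ntimeout=10\n",
    "README": "See INSTALL.\n",
}
tree, applied, merged_upstream, conflicts = carry_forward(delta, new_upstream)
print(applied, merged_upstream, conflicts)
```

<para>In this model, every patch that upstream accepts disappears
from the list that must be re-applied, while patches that no longer
apply surface as explicit conflicts instead of being lost.</para>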
739 <para>VCSs work entirely on the level of code. Due to the nature
740 of the types of changes that the Ubuntu project is making to
741 Debian's code, Ubuntu has focused primarily on this model and
742 Canonical currently funds two major distributed version control
743 products — the Bazaar and Bazaar-NG projects.</para>
745 <para>In many ways, employing distributed version control
746 effectively is a much easier problem to solve for small, more
747 traditional, free software development projects than it is for
GNU/Linux distributions. Because maintaining parallel
development of a single piece of software
in a set of related distributed repositories is the primary
751 use case for distributed version control systems, distributed
752 VCS alone can be a technical solution for certain types of
753 parallel development. As the tools and social processes for
754 distributed VCS evolve, they will become increasingly
755 important tools in the way that free software is
758 <para>Because the problems of scale associated with building an
759 entire derivative distribution are more complicated than those
760 associated with working with a single "upstream" project,
761 distributed version control is only now being actively
762 deployed in the Ubuntu project. In doing so, the project is
focusing on integrating these systems into problem-specific tools
764 built on top of distributed version control.</para>
769 <title>Problem Specific Tools</title>
<para>Another technique that Canonical Ltd. is experimenting
with is the creation of high-level tools built on top of
distributed version control tools and designed specifically for
maintaining differences between packages. Because packages are
usually distributed as a source file with a collection of one
or more patches, it becomes possible to create
a high-level VCS based around this structure.</para>
779 <para>In the case of Ubuntu and Debian, the ideal tool creates
780 one branch per patch or feature and uses heuristics to
781 analyze patch files and create these branches
782 intelligently. The package build system section of the total
783 patch can also be kept as a separate branch. Canonical's tool,
784 called the Hypothetical Changeset Tool (HCT) (although no
785 longer hypothetical), is one experimental way of creating a
786 very simple, very streamlined interface for dealing with a
787 particular type of source that is created and distributed in a
788 particular type of way with a particular type of
791 <para>While HCT promises to be very useful for people making
792 derived distributions based on Debian, its application outside
793 distribution makers will, in all likelihood, be limited. That
said, it provides an example of the way that problem- and
context-specific tools may play an essential role in the
796 maintenance of derived code more generally.</para>
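<para>The branch-per-patch idea described above can be illustrated
with a minimal sketch. This is not HCT's actual algorithm or
interface. The function names and the grouping heuristic (packaging
changes under <filename>debian/</filename> go to one branch and
everything else is grouped by top-level directory) are assumptions
made purely for illustration.</para>

<programlisting><![CDATA[
```python
from collections import defaultdict


def split_patch_by_file(diff_text):
    """Split a unified diff into {path: [diff lines]} keyed by target path.

    A file header is recognised as a '--- ' line immediately followed
    by a '+++ ' line. The first path component (for example 'b/' or
    'hello-1.0/') is dropped so paths are relative to the source tree.
    """
    lines = diff_text.splitlines()
    per_file = {}
    current = None
    i = 0
    while i < len(lines):
        line = lines[i]
        is_header = (
            line.startswith("--- ")
            and i + 1 < len(lines)
            and lines[i + 1].startswith("+++ ")
        )
        if is_header:
            path = lines[i + 1][4:].split("\t")[0]
            if "/" in path:
                path = path.split("/", 1)[1]
            current = per_file.setdefault(path, [])
            current.extend([line, lines[i + 1]])
            i += 2
            continue
        # 'diff ...' command lines sit between files and belong to no hunk.
        if current is not None and not line.startswith("diff "):
            current.append(line)
        i += 1
    return per_file


def assign_branches(per_file):
    """Hypothetical heuristic: map each changed path to a branch name."""
    branches = defaultdict(list)
    for path in sorted(per_file):
        if path.startswith("debian/"):
            # Package build system changes are kept together.
            branches["packaging"].append(path)
        else:
            top_level = path.split("/", 1)[0]
            branches["feature-" + top_level].append(path)
    return dict(branches)
```
]]></programlisting>

<para>A real tool would need far better heuristics than a directory
name, for example reading patch descriptions or changelog entries,
but the sketch shows how a single distribution patch can be
decomposed into units that are tracked and merged
independently.</para>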
802 <title>Social Solutions</title>
804 <para>It has been said that it is a common folly of a
805 technophile to attempt to employ technical solutions toward
806 solving social problems. The problem of deriving software is
807 both a technical <emphasis>and</emphasis> social problem and
808 adequately addressing the larger problems requires approaches that
809 take into consideration both types of solution.</para>
811 <para>Scott James Remnant compares the relationship between
distributions and derived distributions to the
813 relationship between distributions and upstream
817 <para>I don't think this is much different from how Debian
818 maintainers interact with their upstreams. As Debian
819 maintainers we take and package upstream software and then
820 act as a gateway for bugs and problems. Quite often we fix
821 bugs ourselves and apply the patch to the package and send
822 it upstream. Sometimes the upstream don't incorporate that
823 patch and we have to make sure we don't accidentally drop it
824 each subsequent release, we much prefer it if they take
825 them, but we don't get angry if they don't.</para>
827 <para>This is how I see the relationship between Ubuntu and
828 Debian, we're no more a fork of Debian than a Debian package
829 is a fork of its upstream.</para>
<para>Scott alludes to the fact that, at least in the world of
833 distributions, parallel development is already one way to view
834 the <emphasis>modus operandi</emphasis> of existing GNU/Linux
835 distributions. The relationship between a deriver and derivee
836 on the distribution level mirrors the relationship between the
837 distribution and the "upstream" authors of the packages that
make up the distribution. These relationships are rarely based
around technological tools; they lie almost entirely in the realm of
social solutions.</para>
842 <para>Ubuntu has pursued a number of different initiatives along
843 these lines. The first of these has been to regularly file
844 bugs in the Debian bug tracking system when bugs that exist in
Debian are fixed in Ubuntu. While this can be partially
automated, the choice to automate it and the manner in which
it is set up are purely social decisions.</para>
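<para>As a rough illustration of what partial automation could look
like, the following sketch composes (but does not send) a report for
the Debian bug tracking system, which accepts reports by e-mail with
pseudo-headers such as <literal>Package</literal>,
<literal>Version</literal> and <literal>Tags</literal> at the top of
the message body. The helper name, its parameters and the example
addresses are hypothetical, and this is not a description of the
tooling Ubuntu actually uses.</para>

<programlisting><![CDATA[
```python
from email.message import EmailMessage

BTS_SUBMIT_ADDRESS = "submit@bugs.debian.org"


def build_bug_report(package, version, summary, description, patch_text, sender):
    """Return an unsent EmailMessage forwarding a fix to the Debian BTS.

    Hypothetical helper: a person still decides whether the report
    should be sent at all, which is the social part of the process.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = BTS_SUBMIT_ADDRESS
    msg["Subject"] = f"{package}: {summary}"
    body = (
        # Pseudo-headers read by the bug tracking system.
        f"Package: {package}\n"
        f"Version: {version}\n"
        "Tags: patch\n"
        "\n"
        f"{description}\n"
        "\n"
        f"{patch_text}\n"
    )
    msg.set_content(body)
    return msg
```
]]></programlisting>

<para>Keeping the final decision to send with a developer, rather
than mailing every difference automatically, is one way to respect
maintainers who do not want to be flooded with reports.</para>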
849 <para>However, as I alluded to above, Ubuntu is still left with
questions with regard to changes that are made to packages that
851 do not necessarily fix bugs or that fix bugs that do not exist
852 in Debian but may in the future. Some Debian developers want
853 to hear about the full extent of changes made to their
854 software in Ubuntu while others do not want to be
855 bothered. Ubuntu should continue to work with Debian to find
856 ways to allow developers to stay in sync.</para>
858 <para>There are also several initiatives by developers in
Debian, Ubuntu, and other derived distributions to create a
860 stronger relationship between the Debian project and its
861 ecosystem of derivers and between Ubuntu and Debian in
862 particular. While the form that this will ultimately take is
863 unclear, projects existing within an ecosystem should explore
864 the realm of appropriate social relationships that will ensure
that they can work together and be informed of each other's
866 work without resorting to "spamming" each other with
867 irrelevant or unnecessary information.</para>
<para>Another issue that has recently played an important role
in the Debian/Ubuntu relationship is how to give
adequate credit to the authors or upstream maintainers
of software without implying a closer relationship than is the
case. Derivers must walk a fine line: they credit others'
work on a project without implying that those others work for,
support, or are connected to the deriver's project, with which, for
any number of reasons, the "upstream" author might not want to
be associated.</para>
879 <para>In the case of Debian and Ubuntu, this has resulted in an
880 emphasis on keeping or importing changelog entries when
881 changes are imported and in noting the pedigree of changes
882 more generally. It has recently also been discussed in terms
883 of the "maintainer" field in each package in Ubuntu. Ubuntu
884 wants to avoid making changes to every unmodified source
885 package (and introducing an unnecessary delta) but does not
886 want to give the impression that the maintainer of the package
887 is someone unassociated with Ubuntu. While no solution has
been decided at the time of writing, one idea involves marking
889 the maintainer of the package explicitly as a Debian
890 maintainer at the time that the binary packages are built on
891 the Ubuntu build machines.</para>
893 <para>The emphasis on social solutions is also essential when
894 using distributed VCS technology. As Linus Torvalds alluded to
in the quote above, the impact of a move to
distributed VCS technology is only felt when people begin to
work in a different way, that is, when they begin to employ
different social models of developer interaction.</para>
900 <para>While Ubuntu's experience can provide a good model for
901 tackling some of these source control issues, it can only
902 serve as a model and not as a fixed answer. Social solutions
903 must be appropriate for a given social relationship. Even in
904 situations where a package is branched because of social
905 disagreements, a certain level of collaboration on a social
906 level will be essential to the long term viability of the
914 <title>Conclusions</title>
916 <para>As the techniques described in this paper evolve, the role
that they play in free software development will become increasingly
918 prominent and increasingly important. Joining them will be other
919 techniques and models that I have not described and cannot
920 predict. Because of the size and usefulness of their code and
921 the size of their development communities, large projects like
922 Debian and Ubuntu have been forced into confronting and
923 attempting to mediate the problems inherent in forking and
924 deriving. However, as these problems are negotiated and tools
925 and processes are advanced toward solutions, free software
926 projects of all sizes will be able to offer users exactly what
927 they want with minimal redundancy and little duplication of
work. In doing this, free software will harness a power that
proprietary models cannot compete with. Free software projects will
increase their capacity to produce better products and better processes.
Ultimately, this will help free software capture more users, bring
in more developers, and produce more free software of a higher
940 <!-- Keep this comment at the end of the file
945 sgml-namecase-general:t
946 sgml-general-insert-case:lower
947 sgml-minimize-attributes:nil
948 sgml-always-quote-attributes:t
949 sgml-parent-document:nil
950 sgml-exposed-tags:nil
951 sgml-local-catalogs:nil
952 sgml-local-ecat-files:nil