1 <?xml version="1.0" encoding="UTF-8"?>
2 <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
5 <article id="paper-11194">
7 <title>To Fork or Not To Fork</title>
8 <subtitle>Lessons From Ubuntu and Debian</subtitle>
10 <firstname>Benjamin</firstname>
11 <othername>Mako</othername>
12 <surname>Hill</surname>
14 <orgname>Canonical Limited</orgname>
17 <orgname>The Debian GNU/Linux Project</orgname>
20 <orgname>Software in the Public Interest, Inc.</orgname>
24 <para>Benjamin Mako Hill is an intellectual property
25 researcher and activist and a professional Free/Open Source
26 Software (FOSS) advocate and developer. He is an active
27 participant in the Debian Project in both technical and
28 non-technical roles. He is the author of the Free Software
29 Project Management HOWTO and many published works on Free
30 and Open Source Software. He currently is working full time
31 for Canonical Ltd. on Ubuntu, a new Debian-based
39 <holder>Benjamin Mako Hill</holder>
44 <title>Introduction</title>
46 <para>The explosive growth of free and open source software over
47 the last decade has been mirrored by an equally explosive growth
48 in the ambitiousness of free software projects in choosing and
49 tackling problems. The free software movement approaches these
50 large problems with more code and with more expansive
51 communities than was even thinkable a decade ago. Examples of
52 these massive projects include desktop environments — like
53 GNOME and KDE — and distributions like Debian.</para>
55 <para>These projects are leveraging the work of thousands of
56 programmers — both volunteer and paid — and are
57 producing millions of lines of code. Their software is being
58 used by millions of users with a diverse set of needs. This
59 paper focuses on two major effects of this situation:</para>
64 <para>The communities that free software projects — and
65 in particular large projects — serve are increasingly
66 diverse. It is becoming increasingly difficult for a single
67 large project to release any single product that can cater
68 to all of its potential users.</para>
73 <para>It's becoming increasingly difficult to reproduce these
74 large projects. While reproducing an entire project is
75 impossible for small groups of hackers, it is often not
76 substantially easier for small groups to even track and
77 maintain a fork of a large project over time.</para>
82 <para>Taken together, these facts imply an increasingly realized
83 free software community in which programmers frequently derive
84 but where traditional forking is often untenable. "Forks," as
85 they are traditionally defined, must be improved upon.
86 Communities around large free software projects must be smarter
87 about the process of derivation than they have been in the past.</para>
90 <para>We are already seeing this with GNU/Linux distributions. New
91 distributions are rarely built from scratch today. Instead, they are
92 adapted from and built on top of the work of existing projects.
93 As projects and user-bases grow, these derived distributions are
94 increasingly common. Most of what I describe in this essay are
95 tools and experiences of derived distributions.</para>
97 <para>Software makers must pursue the idea of an
98 <emphasis>ecosystem</emphasis> of free software projects and
99 products that have forked but that maintain a close relationship
100 as they develop in parallel and symbiotically. To do this,
101 developers should:</para>
105 <para>Break down the process of derivation into a set of
106 different types of customization and derivation and
107 prioritize methods of derivation.</para>
110 <para>Create and foster social solutions to the social aspects
111 of the derivation problem.</para>
114 <para>Build and use new tools specifically designed to
115 coordinate development of software in the context of an
116 ecosystem of projects.</para>
119 <para>Distribute and utilize distributed version control tools
120 with an emphasis on maintaining differences over time.</para>
125 <para>This paper is an early analysis of this set of problems. As
126 such, it is highly focused on the experience of the Ubuntu
127 project and its existence as a derived Debian distribution. It
128 also pulls from my experience with Debian-NP and the Custom
129 Debian Distribution (CDD) community. Since I participate in both
130 the Ubuntu and CDD projects, these are areas that I can discuss
131 with some degree of knowledge and experience.</para>
135 <title>"Fork" Is A Four Letter Word</title>
137 <para>The act of taking the code for a free software project and
138 bifurcating it to create a new project is called "forking."
139 There have been a number of famous forks in free software
140 history. One of the most famous was the schism that led to the
141 parallel development of two versions of the Emacs text editor:
142 GNU Emacs and XEmacs. This schism persists to this day.</para>
144 <para>Some forks, like Emacs and XEmacs, are permanent. Others are
145 relatively short lived. An example of this is the GCC project
146 which saw two forks — EGCS and PGCC — that both eventually
147 merged back into GCC. Forking can happen for any number of
148 reasons. Often developers on a project develop political or
149 personal differences that keep them from continuing to work
150 together. In some cases, maintainers become unresponsive and
151 other developers on the project fork the project to keep the
152 project alive.</para>
154 <para>Ultimately though, most forks occur because people do not
155 agree on the features, the mechanisms, or the technology at the
156 core of a project. People have different goals, different
157 problems, and want different tools. Often, these goals, problems
158 and tools are similar up until a certain point before the need
159 to part ways becomes essential.</para>
161 <para>A fork occurs on the level of code but a fork is not merely
162 — or even primarily — technical. Many projects create
163 "branches." Branches are alternative versions of a piece of
164 software used to experiment with intrusive or unstable features
165 and fixes. Forks are distinguished from branches both in
166 that they are often more significant departures from a technical
167 perspective (i.e., more lines of code have been changed and/or
168 the changes are more invasive or represent a more fundamental
169 rethinking of the problem) and in that they are bifurcations
170 defined in social and political terms. Branches involve a
171 <emphasis>single</emphasis> developer or community of developers
172 — even if it does boil down to distinct subgroups within a
173 community — whereas forks are separate projects.</para>
175 <para>Forking has historically been viewed as a bad thing in free
176 software communities: forks are seen to stem from people's
177 inability to work together and to end in duplication of
178 work. When I published the first version of the <ulink
179 url="http://mako.cc/projects/howto/">Free Software Project
180 Management HOWTO</ulink> more than four years ago, I included
181 a small subsection on forking which described forking to
182 prospective free software project leaders with this text:</para>
185 <para>The short version of the fork section is, don't do them.
186 Forks force developers to choose one project to work with,
187 cause nasty political divisions, and redundancy of work.</para>
191 <para>In the <emphasis>best</emphasis> situations, a fork means
192 that two groups of people need to go on developing features and
193 doing work they would ordinarily do <emphasis>in addition
194 to</emphasis> tracking the forked project and having to
195 hand-select and apply features and fixes to their own code-base.
196 This level of monitoring and constant comparison can be
197 extremely difficult and time-consuming. The situation is not
198 helped substantially by traditional source control tools like
199 diff, patch, CVS and Subversion which are not optimized for this
200 task. The worse (and much more common) situation occurs when two
201 groups go about their work ignorant or partially ignorant of the
202 work done on the other side of the fork. Important features and
203 fixes are implemented twice — differently and
206 <para>The most substantial bright side to these drawbacks is that
207 the problems associated with forking are so severe and notorious
208 that, in most cases, the threat of a fork is enough to force
209 maintainers to work out solutions that keep the fork from
210 happening in the first place.</para>
212 <para>Finally, it is worth pointing out that fork is something of
213 a contested term. Because definitions of forks involve, to one
214 degree or another, statements about the political, organizational,
215 and technical distinctions between projects, bifurcations that
216 many people call branches or parallel trees are described by
217 others as forks. Recently, fueled by the advent of distributed
218 version control systems, the definition of what is and is not a
219 fork has become increasingly unclear. In part due to the same
220 systems, the benefits and drawbacks of what is increasingly
221 problematically called forking are equally debatable.</para>
226 <title>Case Study</title>
228 <para>In my introduction, I described how the growing scope of
229 free software projects and the rapidly increasing size and
230 diversity of projects' user communities are driving the need
231 for a new type of derivation that avoids, as best as possible, the
232 drawbacks of forking. Nowhere is this more evident than in the
233 largest projects with the broadest scope: a small group of
234 projects that includes operating system distributions.</para>
238 <title>The Debian Project</title>
240 <para>The Debian project is by many counts the largest free
241 software distribution in terms of both code and volunteers. It is
242 also, arguably, the largest free software project in terms
243 of the number of volunteers. Debian includes more than 15,000
244 packages and the work of well over 1,000 official volunteers
245 and many more contributors without official membership.
246 Projects without Debian's massive volunteer base cannot
247 replicate what Debian has accomplished; they can rarely hope
248 to even maintain what Debian currently has.</para>
250 <para>At the time that this paper was written, Distrowatch listed
251 129 distributions based on Debian<footnote>
252 <para>Information is listed on the distrowatch homepage
254 url="http://distrowatch.com/dwres.php?resource=independence">http://distrowatch.com/dwres.php?resource=independence</ulink></para>
256 </footnote> — most of them currently active to varying
257 degrees. Each distribution represents at least one person —
258 and in most cases a community of people — who disagreed with
259 Debian's vision or direction strongly enough to want to create
260 a new distribution <emphasis>and</emphasis> who had the
261 technical capacity to follow through with this goal. Despite
262 Debian's long-standing slogan — "the universal operating
263 system" — the fact that the Debian project has become the
264 fastest growing operating system while spawning so many
265 derivatives is testament to the fact that, as far as software
266 is concerned, one size <emphasis>can not</emphasis> fit
268 <para>Netcraft posts yearly updates on the speed at which
269 Linux distributions are growing. The one in question can
271 url="http://news.netcraft.com/archives/2004/01/28/debian_fastest_growing_linux_distribution.html">http://news.netcraft.com/archives/2004/01/28/debian_fastest_growing_linux_distribution.html</ulink></para>
276 <para>Organizationally, Debian derivers are located both inside
277 and outside of the Debian project. A group of derivers working
278 within the Debian project has labeled themselves "Custom
279 Debian Distributions" and has created nearly a dozen projects
280 customizing and deriving from Debian for specific groups of
281 users including non-profit organizations, the medical
282 community, lawyers, children and many others.<footnote>
283 <para>I spearheaded and helped build a now mostly defunct
284 derivation of Debian called Debian-Nonprofit (Debian-NP)
285 geared for non-profit organizations by working within the
286 Debian project.</para>
287 </footnote> These projects build on the core Debian distribution and
288 the canonical archive from <emphasis>within</emphasis> the
289 organizational and political limits of the Debian project and
290 constantly seek to minimize the delta by focusing on less
291 invasive changes and by advancing creative ways of building
292 the <emphasis>ability</emphasis> to make changes in the core
293 Debian code base through established and policy compliant
296 <!-- http://linktocddinformation -->
298 <para>A second group of Debian customizers includes those
299 working outside of the Debian project organizationally.
300 Notable among this list are (in alphabetical order) Knoppix,
301 Libranet, Linspire (formerly Lindows), Progeny, MEPIS, Ubuntu,
302 Userlinux, and Xandros. With its strong technological base,
303 excellent package management, wide selection of packages to
304 choose from, and strong commitment to software freedom which
305 ensures derivability, Debian provides an ideal point from
306 which to create a GNU/Linux distribution.</para>
312 <title>Ubuntu</title>
314 <para>The Ubuntu project was started by Mark Shuttleworth in
315 April 2004 and the first version was built almost entirely
316 by a small group of Debian developers employed by Shuttleworth's
317 company Canonical Limited.<footnote>
318 <para>Information on Ubuntu can be found on the <ulink
319 url="http://www.ubuntu.com">Ubuntu homepage</ulink>.
320 Information on Canonical Limited can be found at <ulink
321 url="http://www.canonical.com">Canonical's
322 homepage</ulink>.</para>
323 </footnote> It was released to the world in late 2004.
324 The second version was released six months later in April
325 2005. The goals of Ubuntu are to provide a distribution based
326 on a subset of Debian with:</para>
330 <para>Regular and predictable releases — every six months
331 with support for eighteen months.</para>
334 <para>An emphasis on free software that will maintain the
335 derivability of the distribution.</para>
338 <para>An emphasis on usability and a consistent desktop
339 vision. As an example, this has translated into fewer
340 questions in the installer and a default selection and
341 configuration of packages that is usable for most desktop
342 users "out of the box."</para>
347 <para>The Ubuntu project provides an interesting example of a
348 project that aims to derive from Debian to an extensive
349 degree. Ubuntu made code-level changes to nearly 1300 packages
350 in Debian at the time that this paper was written and the
351 speed of changes will not decelerate with time; the total
352 number of changes and the total size of the delta will
354 <para>Scott James Remnant maintains a list of these patches
356 url="http://people.ubuntu.com/~scott/patches/">http://people.ubuntu.com/~scott/patches/</ulink></para>
357 </footnote> The changes that Ubuntu makes are primarily of the
358 most intrusive kind — changes to the code itself.</para>
360 <para>That said, the Ubuntu project is explicit about the fact
361 that it could not exist without the work done by the Debian
362 project before Ubuntu was created.<footnote>
363 <para>You can see that explicit statement on Ubuntu's
365 url="http://www.ubuntulinux.org/ubuntu/relationship/">http://www.ubuntulinux.org/ubuntu/relationship/</ulink></para>
366 </footnote> More importantly, Ubuntu explains that it cannot
367 continue to provide the complete set of packages that its
368 users depend on without the ongoing work by the Debian
369 project. Even though Ubuntu has made changes to nearly
370 1300 packages, this is less than ten percent of the total
371 packages shipped in Ubuntu and pulled from Debian.</para>
373 <para>Scott James Remnant, a prominent Debian developer and a
374 hacker on Ubuntu who works for Canonical Ltd., described the
375 situation this way on his web log to introduce the Ubuntu
376 development methodology in the week after the first public
377 announcement of Canonical and Ubuntu:<footnote>
378 <para>The entire post can be read here: <ulink
379 url="http://www.netsplit.com/blog/work/canonical/ubuntu_and_debian.html">http://www.netsplit.com/blog/work/canonical/ubuntu_and_debian.html</ulink></para>
385 <para>I don't think Ubuntu is a "fork" of Debian, at least not
386 in the traditional sense. A fork suggests that at some
387 point we go our separate way from Debian and then
388 occasionally merge in changes as we carry on down our own path.</para>
391 <para>Our model is quite different; every six months we take a
392 snapshot of Debian's unstable distribution, apply any
393 outstanding patches from our last release to it and spend a
394 couple of months testing and bug-fixing it.</para>
400 <imagedata fileref="tfontf-picture-01.png" format="PNG"/>
405 <para>One thing that should be obvious from this is our job is
406 a lot easier if Debian take all of our changes, the model
407 actually encourages us to give back to Debian.</para>
409 <para>That's why from the very first day we started fixing
410 bugs we began sending <ulink
411 url="http://www.no-name-yet.com/patches/">the
412 patches</ulink> back to Debian through the BTS. Not only
413 will it make our job so much easier when we come to freeze
414 for "hoary", our next release, but it's exactly what every
415 derivative should do in the first place.</para>
419 <para>There is some debate on the degree to which Ubuntu
420 developers have succeeded in accomplishing the goals laid out
421 by Remnant. Ubuntu has filed hundreds of patches in the bug
422 tracking system but it has also run into problems in deciding
423 <emphasis>what</emphasis> constitutes something that should be
424 fed back to Debian. Many changes are simply not relevant to
425 Debian developers. For example, they may include changes to a
426 package in response to another change made in another package
427 in Ubuntu that will not or has not been taken by Debian. In
428 many other cases, the best action with regard to a particular
429 change, a particular package, and a particular upstream Debian
430 developer is simply unclear.</para>
432 <para>The Ubuntu project's track record in working
433 constructively with Debian is, at the moment, a mixed one.
434 While an increasingly large number of Debian developers are
435 maintaining their packages actively within both projects, many
436 in both Debian and Ubuntu feel that Ubuntu has work left to do
437 in living up to its own goal of a completely smooth, productive
438 relationship with Debian.</para>
440 <para>That said, the importance of the goals described by
441 Remnant in the context of the Ubuntu development model
442 cannot be overstated. Every line of delta between Debian and
443 Ubuntu has a cost for Ubuntu developers. Technology, social
444 practices, and wise choices may reduce the cost but they cannot
445 eliminate it. The resources that Ubuntu can bring to bear upon
446 the problem of building a distribution are limited — far
447 more limited than Debian's. As a result, there is a limit to
448 how far Ubuntu can diverge; it is always to Ubuntu's advantage
449 to minimize the delta where possible.</para>
454 <title>Applicability</title>
456 <para>Ubuntu and Debian are distributions and — as such —
457 operate on a different scale than the vast majority of free
458 software projects. Using a very simple metric, they include
459 more code and more people. As a result, there are questions as
460 to whether the experiences and lessons learned from these
461 projects are particularly applicable to the experience of
462 smaller free software projects.</para>
464 <para>Clearly, because of the difficulties associated with
465 forking massive amounts of code and the problems associated
466 with duplicating the work of large volunteer bases,
467 distributions are forced into finding a way to balance the
468 benefits and drawbacks of forking. However, while the need is
469 stronger and more immediate in larger projects, the benefits
470 of their solutions will often be fully transferable.</para>
472 <para>Clearly, modifiability of free software to better fit the
473 needs of its users lies at the heart of the free software
474 movement's success. However, while modification usually comes
475 in the form of collaboration on a single code-base, this is a
476 function of limitations in software development methodologies
477 and tools rather than the best response to the needs or
478 desires of users or developers.</para>
480 <para>I believe that the fundamental advantage of free software
481 in the next decade will be in the growing ability of any
482 single free software project to be multiple things to multiple
483 users simultaneously. This will translate into the fact that,
484 in the next ten years, technology and social processes will
485 evolve so that forking is increasingly less of a bad thing.
486 Free software development methodology will become less
487 dependent on a single project and begin to emphasize parallel
488 development within an ecosystem of software development
489 working on related projects. The result is that free software
490 projects will gain a competitive advantage over proprietary
491 software projects through their ability to better serve the
492 increasingly diverse needs of increasingly large and
493 increasingly diverse user-bases. Although it sounds
494 paradoxical today, more projects will derive and less
495 redundant code will be written.</para>
497 <para>Projects more limited in code and scope may use the tools
498 and methods described in the remainder of this paper in
499 different combinations, in different ways, and to different
500 degrees than the examples around distributions introduced
501 here. Different projects with different needs will find that
502 certain solutions work better than others. Because communities
503 of the size of Debian are difficult to fork in a way that is
504 beneficial to any party, it is in these communities that the
505 technology and development methodologies are first
506 emerging. With time, these strategies and tools will find
507 themselves employed productively in a wide variety of projects
508 with a broad spectrum of sizes, needs, scopes and
516 <title>Balancing Forking With Collaboration</title>
519 <title>Derivation and Problem Analysis</title>
521 <para>The easiest step in creating a productive derivative
522 software project is to break down the problem of derivation
523 into a series of different classes of modification. Certain
524 types of modification are more easily done and are
525 intrinsically more maintainable.</para>
527 <para>In the context of distributions, the problem of derivation
528 can be broken down into the following types of changes (sorted
529 roughly according to the intrusiveness inherent in solving the
530 problem and the severity of the long-term maintainability
531 problems that they introduce):</para>
535 <para>Selection of individual pieces of software;</para>
538 <para>Changes to the way that packages are installed or run
539 (e.g., in a Live CD type environment or using a different
543 <para>Configuration of different pieces of software;</para>
546 <para>Changes made to the actual software package (made on
547 the level of changes to the package's code);</para>
551 <para>By breaking down the problem in this way, Debian derivers
552 have been able to approach derivation in ways that focus
553 energy on the less intrusive problems first.</para>
555 <para>The first area that Ubuntu focused on was selecting a
556 subset of packages that Ubuntu would support. Ubuntu selected
557 and supports approximately 2,000 packages. These became the
558 <command>main</command> component in Ubuntu. Other packages in
559 Debian were included in a separate section of the Ubuntu
560 archive called <command>universe</command> but were not
561 guaranteed to be supported with bug or security fixes. By
562 focusing on a small subset of packages, the Ubuntu team was
563 able to select a maintainable subsection of the Debian archive
564 that they could maintain over time.</para>
566 <para>The simplest derived distributions — often
567 working within the Debian project as CDDs but also including
568 projects like Userlinux — are merely lists of packages
569 and do nothing outside of package selection. The installation
570 of lists of packages and the maintenance of those lists over
571 time can be aided through the creation of what are called
572 <emphasis>metapackages</emphasis>: empty packages with long
573 lists of "dependencies" that are maintained over time.</para>
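<para>As a rough illustration of the metapackage idea, the following
Python sketch renders a simplified control stanza for an empty package
whose only job is to depend on a list of other packages. The package
name, the dependency list, and the function name are hypothetical, and a
real package would carry additional fields (maintainer, version, and so
on) that are omitted here.</para>

```python
def render_metapackage_control(name, depends, description):
    """Return a simplified Debian control stanza for an empty metapackage."""
    if not depends:
        raise ValueError("a metapackage needs at least one dependency")
    # Sort and de-duplicate so the list stays stable as it is maintained.
    depends_field = ", ".join(sorted(set(depends)))
    lines = [
        f"Package: {name}",
        "Architecture: all",  # no compiled content, so not tied to one CPU
        f"Depends: {depends_field}",
        f"Description: {description}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical package selection for a derived distribution.
stanza = render_metapackage_control(
    "example-nonprofit-desktop",
    ["gnucash", "openoffice.org", "mozilla-firefox", "gnucash"],
    "Hypothetical package selection for a small non-profit office",
)
print(stanza)
```

<para>Installing the one metapackage pulls in the whole selection, and
changing the selection later means editing a single dependency
list.</para>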
575 <para>Configuration changes, the third item in the list, are also
576 relatively low-impact. Focusing on moving as many changes as
577 possible into the realm of configuration changes is a
578 relatively low-impact strategy that derivers working within
579 the Debian project intent on a single code-base have pursued
580 actively. Their idea is that rather than forking a piece of
581 code due to disagreement in how the program should work, they
582 can leave the code intact but add the
583 <emphasis>ability</emphasis> to work in a different way to the
584 software. This alternate functionality is made toggleable
585 through a configuration change in the same manner that
586 applications are configured through questions asked at install
587 time. Since the Debian project has a unified package
588 configuration framework called Debconf, derivers are able to
589 configure an entire system in a highly centralized
590 manner.<footnote> <para>More information on Debconf can be
591 found online at: <ulink
592 url="http://www.kitenet.net/programs/debconf/">http://www.kitenet.net/programs/debconf/</ulink></para>
593 </footnote> This is not unlike Red Hat's Kickstart although the
594 emphasis is on maintenance of those configuration changes over
595 the life and evolution of the package; Kickstart is focused
596 merely on installation of the package.</para>
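<para>One way to picture centralized configuration is as a list of
pre-answered questions. The sketch below, in Python, formats answers in
the "owner, question, type, value" layout that the
<command>debconf-set-selections</command> tool reads. The package and
question names are hypothetical, and the set of types checked is only a
common subset; this is an illustration of the idea, not a description of
how any particular deriver does it.</para>

```python
# A common subset of debconf template types (not exhaustive).
COMMON_TYPES = {"boolean", "string", "select", "multiselect", "password"}


def preseed_line(owner, question, qtype, value):
    """Format one pre-answered question as 'owner question type value'."""
    if qtype not in COMMON_TYPES:
        raise ValueError(f"unexpected debconf type: {qtype}")
    if isinstance(value, bool):
        # Boolean answers are written as lowercase true/false.
        value = "true" if value else "false"
    return f"{owner} {question} {qtype} {value}"


# Hypothetical answers a derived distribution might ship.
answers = [
    ("examplepkg", "examplepkg/enable-feature", "boolean", True),
    ("examplepkg", "examplepkg/server-name", "string", "intranet.example.org"),
]
preseed = "\n".join(preseed_line(*a) for a in answers) + "\n"
print(preseed)
```

<para>Because the answers live in one file rather than in patched
packages, the underlying code stays identical to Debian's and only the
answers need to be maintained.</para>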
598 <para>Another type of change is limited to changes in the
599 environment through which a system is run or installed. One
600 example is Progeny's Anaconda-based Debian installer which
601 provides an alternate installer but results in an identical
602 system. Another example is the Knoppix project which is famous
603 for its "Live CD" environments. While Knoppix makes a wide
604 range of invasive changes that span all items in my list
605 above, other Live CD projects, including Ubuntu's "Casper"
606 project, are much closer to alternative environments through
607 which the same code is run.</para>
609 <para>Because these three methods are relatively non-invasive,
610 they are reasonable strategies for small teams and individuals
611 working on creating a derived distribution. However, many
612 desirable changes — and in the case of some derived
613 distributions, most desirable changes — require more
614 invasive changes. The final and most invasive type of change
615 — changes to code — is the most difficult but also
616 the most promising and powerful if it can be done sustainably.
617 Changes of this type involve bifurcations of the code-base and
618 will be the topic of the remainder of this paper.</para>
623 <title>Distributed Source Control</title>
625 <para>One promising method of maintaining changes in forked or
626 branched projects lies in distributed version control systems
627 (VCS). Traditional version control systems work in a highly centralized
628 fashion. CVS, the archetypal free software VCS and the basis
629 for many others, is based around the model of a single
630 centralized server. Anyone who wishes to commit to a project
631 must commit to the centralized repository. While CVS allows
632 users to create branches, anyone with commit rights has access
633 to the entire repository. The tools for branching and merging
634 over time are not particularly good.</para>
636 <para>The branching model is primarily geared toward a system
637 where development is bifurcated and then the branch is merged
638 completely back into the main tree. Normal use of a branch
639 might include creating a development branch, making a series
640 of development releases while maintaining and fixing important
641 bugs in the stable primary branch, and then ultimately
642 replacing the stable release with the development release. The
643 CVS model is <emphasis>not</emphasis> geared toward a system
644 where an arbitrary delta, or sets of deltas, are maintained over time.</para>
647 <para>Distributed version control aims to solve a number of
648 problems introduced by CVS and alluded to above by:</para>
652 <para>Allowing people to work disconnected from each other
653 and to sync with each other, in whole or in part, in an
654 arbitrary and ad-hoc fashion.</para>
657 <para>Allowing deltas to be maintained over time.</para>
661 <para>Ultimately, this requires tools that are better at merging
662 changes and in <emphasis>not</emphasis> merging certain
663 changes when that is the desired behavior. It also leads to tools capable
664 of history-sensitive merging.</para>
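<para>A minimal way to see why history matters when merging is the
three-way merge, which compares both sides against their common
ancestor rather than against each other. The Python sketch below applies
it to simple key/value records; the field values are made up for
illustration, and real tools work on file contents and full revision
histories rather than dictionaries.</para>

```python
def three_way_merge(base, ours, theirs):
    """Merge two descendants of a common ancestor, key by key.

    Returns (merged, conflicts). A key conflicts when both sides changed
    it in different ways relative to the ancestor.
    """
    missing = object()  # sentinel meaning "key absent in this version"
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, missing)
        o = ours.get(key, missing)
        t = theirs.get(key, missing)
        if o == t:        # both sides agree (including both deleting)
            result = o
        elif o == b:      # only their side changed
            result = t
        elif t == b:      # only our side changed
            result = o
        else:             # both changed differently: needs a human
            conflicts.append(key)
            result = o    # keep ours provisionally
        if result is not missing:
            merged[key] = result
    return merged, conflicts


# Hypothetical package metadata: ancestor, derived tree, parent tree.
base = {"Depends": "libc6", "Priority": "optional", "Section": "net"}
ours = {"Depends": "libc6, example-artwork", "Priority": "optional", "Section": "net"}
theirs = {"Depends": "libc6", "Priority": "extra", "Section": "net"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged, conflicts)
```

<para>Without the ancestor, the two sides simply differ on two fields
and a tool cannot tell which side made which change; with it, each
difference is attributed to one side and both changes merge
cleanly.</para>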
666 <para>The most famous switch to a distributed VCS model from a
667 centralized VCS model was the move by the Linux kernel
668 development community to the proprietary distributed version
669 control system BitKeeper. In his recent announcement of the
670 decision to part ways with BitKeeper, Linus Torvalds
674 <para>In fact, one impact BK has had is to very fundamentally
675 make us (and me in particular) change how we do things. That
676 ranges from the fine-grained changeset tracking to just how
677 I ended up trusting sub-maintainers with much bigger things,
678 and not having to work on a patch-by-patch basis any
679 more.<footnote> <para>The full message can be read online
681 url="http://kerneltrap.org/mailarchive/1/message/48393/thread">http://kerneltrap.org/mailarchive/1/message/48393/thread</ulink></para>
686 <para>At the time of the switch, free distributed version
687 control tools were less advanced than they are today. At the
688 moment, an incomplete list of free software VCS tools includes
689 GNU Arch, Bazaar, Bazaar-NG, Darcs, Monotone, SVK (based on
690 Subversion), GIT (a system developed by Linus Torvalds as a
691 temporary replacement for BitKeeper) and others.</para>
693 <para>Each of these tools, at least after it reaches a certain
694 level of maturity, allows or will allow users to develop
695 software in a distributed fashion and to, over time, compare
696 their software and pull changes from others significantly more
697 easily than they could otherwise. Parallel development lies at
698 the heart of the model: the tools for
699 merging and resolving conflicts over time and the ability to
700 "cherry pick" certain patches or changes from a parallel
701 developer together make this type of development significantly
702 more useful than it has been in the past.</para>
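<para>The "cherry pick" idea can be sketched in a few lines of Python
by treating each tree's history as an ordered list of named changes. The
change names below are hypothetical, and identifying changes by name is
a simplifying assumption; real distributed version control tools
identify changes by content or by recorded history.</para>

```python
def pending_changes(their_history, our_history):
    """Changes in the parallel tree that we have not taken, in their order."""
    taken = set(our_history)
    return [change for change in their_history if change not in taken]


def cherry_pick(our_history, their_history, wanted):
    """Return our history extended with only the wanted pending changes."""
    wanted = set(wanted)
    picked = [
        change
        for change in pending_changes(their_history, our_history)
        if change in wanted
    ]
    return list(our_history) + picked


# Hypothetical histories for one package in two related projects.
parent = ["fix-manpage-typo", "security-fix-overflow", "switch-default-editor"]
derived = ["fix-manpage-typo", "add-desktop-artwork"]

print(pending_changes(parent, derived))
print(cherry_pick(derived, parent, ["security-fix-overflow"]))
```

<para>The derived tree takes the security fix, declines the change of
default editor, and keeps its own artwork change: the delta between the
two trees is maintained deliberately rather than erased.</para>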
704 <para>VCSs work entirely on the level of code. Due to the nature
705 of the types of changes that the Ubuntu project is making to
706 Debian's code, Ubuntu has focused primarily on this model and
707 Canonical currently funds two major distributed version control
708 products — the Bazaar and Bazaar-NG projects.</para>
710 <para>In many ways, employing distributed version control
711 effectively is a much easier problem to solve for small, more
712 traditional, free software development projects than it is for
713 GNU/Linux distributions. Because maintaining
714 parallel development of a single piece of software in a set of
715 related distributed repositories is the primary use case for
716 distributed version control systems, distributed VCS alone can
717 be a technical solution for certain types of parallel
718 development. As the tools and social processes for distributed
719 VCS evolve, they will become increasingly important tools in
720 the way that free software is developed.</para>
<para>Because the problems of scale associated with building an
entire derivative distribution are more complicated than those
associated with working on a single project, distributed
version control has not yet been widely deployed in the Ubuntu
project. Instead, the project is focusing on problem-specific
tools built on top of distributed version control.</para>
<title>Problem Specific Tools</title>
<para>Another technique that Canonical Ltd. is experimenting
with is the creation of high-level tools, built on top of
distributed version control tools, that are specifically
designed for maintaining differences between packages. Because
packages are usually distributed as a source archive with a
collection of one or more patches, it is possible to build a
high-level VCS around this structure.</para>
<para>In the case of Ubuntu and Debian, the ideal tool creates
one branch per patch or feature, using heuristics to analyze
patch files and create these branches intelligently. The part
of the total patch that covers the package build system can
also be kept as a separate branch. Canonical's tool, called
the Hypothetical Changeset Tool (HCT), although it is no
longer hypothetical, is one experimental way of creating a
very simple, very streamlined interface for dealing with a
particular type of source that is created and distributed in a
particular type of way with a particular type of
<para>While HCT promises to be very useful for people making
derived distributions based on Debian, its usefulness to
anyone other than distribution makers will, in all likelihood,
be limited. That said, it provides an example of the way that
problem- and context-specific tools may play an essential role
in the maintenance of derived code more generally.</para>
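<para>The branch-per-patch idea described above can be
illustrated with a short sketch. The following Python is not
HCT's implementation, whose heuristics are not described in this
paper. It only assumes a Debian-style unified diff in which
packaging changes live under a <literal>debian/</literal>
directory. The function names, the branch naming scheme, and the
sample diff are all hypothetical.</para>

```python
from collections import OrderedDict

# Hypothetical sample: a tiny Debian-style diff touching one upstream
# file and two packaging files.
SAMPLE_DIFF = """\
--- hello-2.1.1.orig/src/hello.c
+++ hello-2.1.1/src/hello.c
@@ -1,1 +1,1 @@
-old greeting
+new greeting
--- hello-2.1.1.orig/debian/control
+++ hello-2.1.1/debian/control
@@ -0,0 +1,1 @@
+Source: hello
--- hello-2.1.1.orig/debian/rules
+++ hello-2.1.1/debian/rules
@@ -0,0 +1,1 @@
+#!/usr/bin/make -f
"""


def branch_name_for(path):
    """Hypothetical heuristic mapping a changed file to a branch name."""
    if path.startswith("debian/"):
        return "packaging"
    return "change/" + path.replace("/", "-")


def split_diff_into_branches(diff_text):
    """Group the per-file sections of a unified diff by branch name."""
    branches = OrderedDict()
    lines = diff_text.splitlines()
    current = None
    # Pair each line with the one after it so file headers can be detected.
    for line, following in zip(lines, lines[1:] + [""]):
        if line.startswith("--- ") and following.startswith("+++ "):
            target = following[4:].split("\t")[0].strip()
            # Drop the leading directory, much as "patch -p1" would.
            path = target.split("/", 1)[1] if "/" in target else target
            current = branches.setdefault(branch_name_for(path), [])
        if current is not None:
            current.append(line)
    return branches


if __name__ == "__main__":
    for name, section in split_diff_into_branches(SAMPLE_DIFF).items():
        print(name, len(section), "lines")
```

<para>On the sample diff, the two files under
<literal>debian/</literal> end up together in a single packaging
branch, while the upstream change gets a branch of its own. A
real tool would need far richer heuristics, for example to group
hunks in several files that belong to one feature.</para>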
<title>Social Solutions</title>
<para>It has been said that a common folly of the technophile
is to attempt to solve social problems with technical
solutions. Deriving software is both a technical
<emphasis>and</emphasis> a social problem, and adequately
addressing the larger problem requires approaches that take
both types of solution into consideration.</para>
<para>Scott James Remnant describes the relationship between
distributions and derived distributions as not unlike the
relationship between distributions and upstream
<para>I don't think this is much different from how Debian
maintainers interact with their upstreams. As Debian
maintainers we take and package upstream software and then
act as a gateway for bugs and problems. Quite often we fix
bugs ourselves and apply the patch to the package and send
it upstream. Sometimes the upstream don't incorporate that
patch and we have to make sure we don't accidentally drop it
each subsequent release, we much prefer it if they take
them, but we don't get angry if they don't.</para>
<para>This is how I see the relationship between Ubuntu and
Debian, we're no more a fork of Debian than a Debian package
is a fork of its upstream.</para>
<para>Scott alludes to the fact that, at least in the world of
distributions, parallel development is already one way to view
the <emphasis>modus operandi</emphasis> of existing GNU/Linux
distributions. The relationship between a deriver and derivee
on the distribution level mirrors the relationship between the
distribution and the "upstream" authors of the packages that
make up the distribution. These relationships are rarely based
on technological tools; they sit in the realm of social
solutions.</para>
<para>Ubuntu has pursued a number of different initiatives along
these lines. The first of these has been to regularly file
bugs in the Debian bug tracking system when bugs that exist in
Debian are fixed in Ubuntu. While this can be
partially automated, the choice to automate this is a purely
<para>However, as I alluded to above, Ubuntu is still left with
questions regarding changes to packages that do not
necessarily fix bugs, or that fix bugs that do not exist in
Debian now but may in the future. Some Debian developers want
to hear about the full extent of changes made to their
software in Ubuntu, while others do not want to be
bothered. Ubuntu should continue to work with Debian to find
ways to allow developers to stay in sync.</para>
<para>There are also several initiatives by developers in
Debian to create a stronger relationship between the Debian
project and its ecosystem of derivers, and between Ubuntu and
Debian in particular. While the form that this will ultimately
take is unclear, projects existing within an ecosystem should
explore the range of appropriate social relationships that
will ensure that they can work together and stay informed of
each other's work without resorting to "spamming" each other
with irrelevant or unnecessary information.</para>
<para>Another issue that has recently played an important role
in the Debian/Ubuntu relationship is the importance of giving
adequate credit to the authors or upstream maintainers of
software without implying a closer relationship than actually
exists. Derivers must walk a fine line: they must credit
others' work on a project without implying that those others
work for, support, or are connected to the deriver's project,
with which, for any number of reasons, the original author
might not want to be associated.</para>
<para>In the case of Debian and Ubuntu, this has resulted in an
emphasis on keeping or importing changelog entries when
changes are imported and on noting the pedigree of changes
more generally. It has recently also been discussed in terms
of the "maintainer" field in each package in Ubuntu. Ubuntu
wants to avoid making changes to every unmodified source
package (and introducing an unnecessary delta) but does not
want to give the impression that a package in Ubuntu is
maintained by someone who has no association with Ubuntu.
While no solution has been decided on at the time of writing,
one idea involves marking the maintainer of the package
explicitly as a Debian maintainer at the time that the binary
packages are built on the Ubuntu build machines.</para>
<para>The emphasis on social solutions is also essential when
using distributed VCS technology. As Linus Torvalds alluded to
in the quote above, the impact of a technological change such
as the move to distributed VCS is only felt when people begin
to work in a different way: when they begin to employ
different social models of developer interaction.</para>
<para>While Ubuntu's experience can provide a good model for
tackling some of these source control issues, it can only
serve as a model and not as a fixed answer. Social solutions
must be appropriate for a given social relationship. Even in
situations where a package is branched because of social
incompatibility, a certain degree of social collaboration
will be essential to the long-term viability of the
<title>Conclusions</title>
<para>As the techniques described in this paper evolve, the role
that they play in free software development will become
increasingly prominent and increasingly important. Joining them
will be other techniques and models that I have not seen and
cannot predict. Because of the size and usefulness of their code
and the size of their development communities, large projects
like Debian and Ubuntu have been forced to confront and attempt
to mediate the problems inherent in forking and deriving.
However, as these problems are negotiated and tools and
processes advance toward solutions, free software projects of
all sizes will be able to offer users exactly what they want
with minimal redundancy and little duplication of work. In doing
this, free software will harness a power that proprietary models
cannot compete with. Free software projects will increase their
capacity to produce better products and better processes.
Ultimately, this will help free software capture more users,
bring in more developers, and produce more free software of a
higher quality.</para>
<!-- Keep this comment at the end of the file
sgml-namecase-general:t
sgml-general-insert-case:lower
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document:nil
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil