1 <?xml version="1.0" encoding="UTF-8"?>
2 <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
5 <article id="paper-11194">
7 <title>To Fork or Not To Fork</title>
8 <subtitle>Lessons From Ubuntu and Debian</subtitle>
10 <firstname>Benjamin</firstname>
11 <othername>Mako</othername>
12 <surname>Hill</surname>
14 <orgname>Canonical Limited</orgname>
17 <orgname>The Debian GNU/Linux Project</orgname>
20 <orgname>Software in the Public Interest, Inc.</orgname>
<para>Benjamin Mako Hill is an intellectual property
researcher and activist and a professional Free/Open Source
Software (FOSS) advocate, developer, and consultant. He is
an active participant in the Debian Project in both technical
and non-technical roles and a founder of Debian-Nonprofit
and other Free Software projects. He is the author of the
Free Software Project Management HOWTO and many published
works on Free and Open Source Software. He currently works
full time for Canonical Ltd. on Ubuntu, a new
Debian-based distribution.</para>
40 <holder>Benjamin Mako Hill</holder>
45 <para>This is where my abstract should go.</para>
49 <title>Introduction</title>
<para>The explosive growth of free and open source software over
the last decade has been mirrored by an equally explosive growth
in the ambitiousness of free software projects in choosing and
tackling problems. The free software movement approaches these
large problems with more code and with more expansive
communities than were even thinkable a decade ago. Examples of
these massive projects include desktop environments, like
GNOME and KDE, and distributions like Debian.</para>
60 <para>These projects are leveraging the work of thousands of
61 programmers — both volunteer and paid — and are producing
62 millions of lines of code. Their software is being used by
63 millions of users with a diverse set of needs. This paper
64 focuses on two major effects of this situation:</para>
68 <para>The communities that free software projects — and in
69 particular large projects — serve are increasingly diverse.
70 It is becoming increasingly difficult for a single large
71 project to release any single product that can cater to all
72 of its potential users.</para>
<para>It is becoming increasingly difficult to reproduce these
large projects. While reproducing an entire project is
impossible for small groups of hackers, it is often not
substantially easier for small groups to even track and
maintain a fork of a large project over time.</para>
<para>Taken together, these facts point toward a free software
community, already taking shape, in which programmers frequently
derive but where traditional forking is often untenable. "Forks," as
they are traditionally defined, will be improved upon.
Communities around large free software projects will be smarter
about the process of derivation than they have been in the
past.</para>
<para>We are already seeing this with GNU/Linux distributions. New
distributions are rarely built from scratch today. Instead, they
are adapted from and built on top of the work of existing projects.
As projects and userbases grow, these derived distributions are
increasingly common. Most of what I describe in this essay is
drawn from the tools and experiences of derived distributions.</para>
<para>Software makers must pursue the idea of an
<emphasis>ecosystem</emphasis> of free software projects and
products that have forked but that maintain a close relationship
as they develop in parallel and symbiotically. To do this,
developers should:</para>
106 <para>Break down the process of derivation into a set of
107 different types of customization and derivation and
108 prioritize methods of derivation.</para>
111 <para>Create and foster social solutions to the social aspects
112 of the derivation problem.</para>
115 <para>Build and use new tools specifically designed to
116 coordinate development of software in the context of an
117 ecosystem of projects.</para>
120 <para>Distribute and utilize distributed version control tools
121 with an emphasis on maintaining differences over
<para>This paper is an early analysis of this set of problems. As
such, it is highly focused on the experience of the Ubuntu
project and its existence as a derived Debian distribution. It
also pulls from my experience with Debian-NP and the Custom
Debian Distribution (CDD) community. Since I am an active member of
both the Ubuntu and Debian-NP projects, these are areas that I
can discuss with some degree of knowledge and experience.</para>
136 <title>"Fork" Is A Four Letter Word</title>
138 <para>The act of taking the code for a free software project and
139 bifurcating it to create a new project is called "forking."
140 There have been a number of famous forks in free software
141 history. One of the most famous was the schism that led to the
142 parallel development of two versions of the Emacs text editor:
143 GNU Emacs and XEmacs. This schism persists to this day.</para>
<para>Some forks, like Emacs and XEmacs, are permanent. Others are
relatively short-lived. An example of this is the GCC project,
which saw two forks, EGCS and PGCC, that both eventually
merged back into GCC. Forking can happen for any number of
reasons. Often developers on a project develop political or
personal differences that keep them from continuing to work
together. In some cases, maintainers become unresponsive and
other developers fork the project to keep it
alive in some form.</para>
155 <para>Ultimately though, most forks occur because people do not
156 agree on the features, the mechanisms, or the technology at the
157 core of a project. People have different goals, different
158 problems, and want different tools. Often, these goals, problems
159 and tools are similar up until a certain point before the need
160 to part ways becomes essential.</para>
<para>A fork occurs on the level of code, but a fork is not merely
(or even primarily) technical. Many projects create
"branches." Branches are alternative versions of a piece of
software used to experiment with intrusive or unstable features
and bug fixes. Forks are distinguished from branches both in
that they are often more significant departures from a technical
perspective (i.e., more lines of code have been changed and/or
the changes are more invasive or represent a more fundamental
rethinking of the problem) and in that they are bifurcations
defined in social terms. Branches involve a
<emphasis>single</emphasis> developer or community of developers,
even if that community breaks down into distinct subgroups,
whereas forks are separate projects.</para>
<para>Forking has historically been viewed as a bad thing in free
software communities: forks are seen to stem from people's
inability to work together and to end in duplication of
work. When I published the first version of the <ulink
url="http://mako.cc/projects/howto/">Free Software Project
Management HOWTO</ulink> more than four years ago, I included
a small subsection that described forking to
prospective free software project leaders with this text:</para>
186 <para>The short version of the fork section is, don't do them.
187 Forks force developers to choose one project to work with,
188 cause nasty political divisions, and redundancy of
192 <para>In the <emphasis>best</emphasis> situations, a fork means
193 that two groups of people need to go on developing features and
194 doing work they would ordinarily do <emphasis>in addition
195 to</emphasis> tracking the forked project and having to
196 hand-select and apply features and fixes to their own code-base.
197 This level of monitoring and constant comparison can be
198 extremely difficult and time-consuming. The situation is not
199 helped substantially by traditional source control tools like
200 diff, patch, CVS and Subversion which are not optimized for this
201 task. The worse (and much more common) situation occurs when two
202 groups go about their work ignorant or partially ignorant of the
203 work done on the other side of the fork. Important features and
204 fixes are implemented twice — differently and
207 <para>The most substantial bright side to these drawbacks is that
208 the problems associated with forking are so severe and notorious
209 that, in most cases, the threat of a fork is enough to force
210 maintainers to work out solutions that keep the fork from
211 happening in the first place.</para>
<para>Before moving on, it is worth pointing out that "fork" is
something of a contested term. Because definitions of forks
involve, to one degree or another, statements about the
political, organizational, and technical distinctions between
projects, bifurcations that many people call branches or
parallel trees are described by others as forks. Recently,
fueled by the advent of distributed version control systems, the
definition of what is and is not a fork has become
increasingly unclear. In part due to the same systems, the
benefits and drawbacks of what is increasingly problematically
called forking are equally debatable.</para>
228 <title>Case Study</title>
<para>In my introduction, I described how the growing scope of
free software projects and the rapidly increasing size and
diversity of projects' user communities are driving the need
for a new type of derivation that avoids, as best as possible, the
drawbacks of forking. Nowhere is this more evident than in the
largest projects with the broadest scope: a small group of
projects that includes operating system distributions.</para>
240 <title>The Debian Project</title>
<para>The Debian project is the largest free software
distribution in terms of both code and volunteers. It is also,
arguably, the largest free software project of any kind in terms
of the number of volunteers. Debian includes more than 15,000
packages and the work of well over 1,000 official volunteers
and many more contributors without official membership status.
Projects without Debian's massive volunteer base cannot
replicate what Debian has accomplished; they can rarely hope
even to maintain separately what Debian currently has.</para>
<para>At the time that this paper was written, DistroWatch listed
129 distributions based on Debian<footnote>
<para>Information is listed on the DistroWatch homepage at
<ulink
url="http://distrowatch.com/dwres.php?resource=independence">http://distrowatch.com/dwres.php?resource=independence</ulink></para>
</footnote>, most of them currently active to one degree or
another. Each distribution represents at least one person
(and in most cases a community of people) who disagreed with
Debian's vision or direction strongly enough to want to create
a new distribution <emphasis>and</emphasis> who had the
technical capacity to follow through with this goal. Despite
Debian's long-standing slogan, "the universal operating
system," the fact that the Debian project has become the
fastest growing Linux distribution while spawning so many
derivatives is a testament to the fact that, as far as software
is concerned, one size does <emphasis>not</emphasis> fit
<para>Netcraft posts yearly updates on the speed at which
Linux distributions are growing. The one in question can
be found at <ulink
url="http://news.netcraft.com/archives/2004/01/28/debian_fastest_growing_linux_distribution.html">http://news.netcraft.com/archives/2004/01/28/debian_fastest_growing_linux_distribution.html</ulink></para>
<para>Organizationally, Debian derivers are located both inside
and outside of the Debian project. A group of derivers working
within the Debian project has labeled itself "Custom
Debian Distributions" and has created nearly a dozen projects
customizing and deriving from Debian for specific groups of
users, including non-profit organizations, the medical
community, lawyers, children and many others.<footnote>
285 <para>I have spearheaded and built a derivation of Debian
286 called Debian-Nonprofit (Debian-NP) geared for non-profit
287 organizations working within the Debian project.</para>
288 </footnote> These projects build on the core distribution and
289 the canonical archive from <emphasis>within</emphasis> the
290 organizational and political limits of the Debian project and
291 constantly seek to minimize the delta by focusing on less
292 invasive changes and by advancing creative ways of building
293 the <emphasis>ability</emphasis> to make changes in the core
294 Debian code base through established and policy compliant
297 <!-- http://linktocddinformation -->
299 <para>A second group of Debian customizers includes those
300 working outside of the Debian project organizationally.
301 Notable among this list are (in alphabetical order) Knoppix,
302 Libranet, Linspire (formerly Lindows), Progeny, MEPIS, Ubuntu,
303 Userlinux, and Xandros. With its strong technological base,
304 excellent package management, wide selection of packages to
305 choose from, and strong commitment to software freedom which
306 ensures derivability, Debian provides an ideal point from
307 which to create a GNU/Linux distribution.</para>
313 <title>Ubuntu</title>
<para>The Ubuntu project was started by Mark Shuttleworth in
April 2004, and the first version was executed almost entirely
by a small group of Debian developers employed by Shuttleworth's
company, Canonical Limited.<footnote>
<para>Information about Ubuntu can be found on the <ulink
url="http://www.ubuntu.com">Ubuntu homepage</ulink>.
Information about Canonical Limited can be found at <ulink
url="http://www.canonical.com">Canonical's
homepage</ulink>.</para>
324 </footnote> It was released to the world in the fall of 2004.
325 The second version was released six months later in April
326 2005. The goals of Ubuntu are to provide a distribution based
327 on a subset of Debian with:</para>
331 <para>Regular and predictable releases — every six months
332 with support for eighteen months.</para>
335 <para>An emphasis on free software that will maintain the
336 derivability of the distribution.</para>
<para>An emphasis on usability and a consistent desktop
vision. As an example, this has translated into fewer
questions in the installer and a default selection and
configuration of packages that is usable for most desktop
users "out of the box."</para>
348 <para>The Ubuntu project provides an interesting example of a
349 project that aims to derive from Debian to an extensive
350 degree. Ubuntu made code-level changes to nearly 1300 packages
351 in Debian at the time that this paper was written and the
352 speed of changes will only accelerate with time; the total
353 number of changes and the total size of the delta will
<para>Scott James Remnant maintains a list of these patches
at <ulink
url="http://people.ubuntu.com/~scott/patches/">http://people.ubuntu.com/~scott/patches/</ulink></para>
358 </footnote> The changes that Ubuntu makes are primarily of the
359 most intrusive kind — changes to the code itself.</para>
<para>That said, the Ubuntu project is explicit about the fact
that it could not exist without the work done by the Debian
project before Ubuntu was created.<footnote>
<para>You can see that explicit statement on Ubuntu's
website at <ulink
url="http://www.ubuntulinux.org/ubuntu/relationship/">http://www.ubuntulinux.org/ubuntu/relationship/</ulink></para>
</footnote> More importantly, Ubuntu explains that it cannot
continue to provide the complete set of packages that its
users depend on without the ongoing work of the Debian
project. Even though Ubuntu has made changes to nearly
1300 packages, this is less than ten percent of the total
packages shipped in Ubuntu and pulled from Debian.</para>
<para>Scott James Remnant, a prominent Debian developer and a
hacker on Ubuntu who works for Canonical Ltd., described the
situation this way on his web log, introducing the Ubuntu
development methodology in the week after the first public
announcement of Canonical and Ubuntu:<footnote>
379 <para>The entire post can be read here: <ulink
380 url="http://www.netsplit.com/blog/work/canonical/ubuntu_and_debian.html">http://www.netsplit.com/blog/work/canonical/ubuntu_and_debian.html</ulink></para>
386 <para>I don't think Ubuntu is a "fork" of Debian, at least not
387 in the traditional sense. A fork suggests that at some
388 point we go our separate way from Debian and then
389 occasionally merge in changes as we carry on down our own
392 <para>Our model is quite different; every six months we take a
393 snapshot of Debian's unstable distribution, apply any
394 outstanding patches from our last release to it and spend a
395 couple of months testing and bug-fixing it.</para>
401 <imagedata fileref="picture-11194-01.png" format="PNG"/>
406 <para>One thing that should be obvious from this is our job is
407 a lot easier if Debian take all of our changes, the model
408 actually encourages us to give back to Debian.</para>
410 <para>That's why from the very first day we started fixing
411 bugs we began sending <ulink
412 url="http://www.no-name-yet.com/patches/">the
patches</ulink> back to Debian through the BTS. Not only
414 will it make our job so much easier when we come to freeze
415 for "hoary", our next release, but it's exactly what every
416 derivative should do in the first place.</para>
<para>There is some debate on the degree to which Ubuntu
developers have succeeded in accomplishing the goals laid out
by Remnant. Ubuntu has filed hundreds of patches in the bug
tracking system, although it has often run into problems in
deciding <emphasis>what</emphasis> constitutes something that
should be fed back to Debian. Many changes are simply not
relevant to upstream Debian developers. For example, they may
include changes to a package made in response to a change
in another Ubuntu package that has not been, or will not be,
taken by Debian.</para>
<para>The Ubuntu project's track record in working
constructively with Debian is, at the moment, decidedly mixed.
While an increasingly large number of Debian developers are
maintaining their packages actively within both projects, many
in both Debian and Ubuntu feel that Ubuntu has work left to do
in living up to its own goal of a smooth, productive
relationship with Debian.</para>
442 <title>Applicability</title>
444 <para>Ubuntu and Debian are distributions and — as such —
445 operate on a different scale than the vast majority of free
446 software projects. Using a very simple metric, they include
447 more code and more people. As a result, there are questions as
448 to whether the experiences and lessons learned from these
449 projects are particularly applicable to the experience of
450 smaller free software projects.</para>
<para>Clearly, because of the difficulties associated with
forking massive amounts of code and the problems associated
with duplicating the work of large volunteer bases,
distributions are forced into finding a way to balance the
benefits and drawbacks of forking. However, while the need is
stronger and more immediate in larger projects, the benefits
of their solutions will often be fully transferable.</para>
<para>Clearly, modifiability of free software to better fit the
needs of its users lies at the heart of the free software
movement's success. However, while modification usually comes
in the form of collaboration on a single code-base, this is
a function of limitations in software development methodologies
and tools rather than the best response to the needs or
desires of users or developers.</para>
<para>I believe that the fundamental advantage of free software
in the next decade will be in the growing ability of any
single free software project to be multiple things to multiple
users simultaneously. In practice, this means that,
in the next ten years, technology and social processes will
evolve so that forking is increasingly less of a bad thing.
Free software development methodology will become less
dependent on a single project and begin to emphasize parallel
development within an ecosystem of developers
working on related projects. The result is that free software
projects will gain a competitive advantage over proprietary
software projects through their ability to better serve the
increasingly diverse needs of increasingly large
user-bases. More projects will derive and
less redundant code will be written.</para>
<para>Projects more limited in code and scope may use the tools
and methods in different combinations, in different ways, and
to different degrees than the examples around distributions
introduced here. Different projects with different needs will
find that certain solutions work better than others. Because
communities the size of Debian's are difficult to fork in a
way that is beneficial to any party, it is in these
communities that the technology and development methodologies
are first emerging. With time, these strategies and tools
will find themselves employed productively in a wide variety
of projects with a broad spectrum of sizes, needs, scopes and
502 <title>Balancing Forking With Collaboration</title>
505 <title>Derivation and Problem Analysis</title>
<para>The easiest step in creating a productive derivative
software project is to break down the problem of derivation
into a series of different classes of modification. Certain
types of modification are more easily done and are
intrinsically more maintainable.</para>
513 <para>In the context of distributions, the problem of derivation
514 can be broken down into the following types of changes (sorted
515 roughly according to the intrusiveness inherent in solving the
516 problem and the severity of the long-term maintainability
517 problems that they introduce):</para>
<para>Selection of individual pieces of software;</para>
524 <para>Changes to the way that packages are installed or run
525 (e.g., in a Live CD type environment or using a different
529 <para>Configuration of different pieces of software;</para>
<para>Changes made to the actual software package (made on
the level of changes to the package's code).</para>
<para>By breaking down the problem in this way, Debian derivers
have been able to approach derivation in ways that focus
energy on the less intrusive problems first.</para>
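<para>As a rough illustration of this prioritization, the
following Python sketch orders a set of planned changes from least
to most intrusive, using the four classes listed above. The enum
name, the <command>prioritize</command> helper, and the example
changes are all hypothetical and are introduced here only for
illustration.</para>

```python
from enum import IntEnum


class Derivation(IntEnum):
    """Classes of change, ordered from least to most intrusive."""
    PACKAGE_SELECTION = 1
    INSTALL_OR_RUN_ENVIRONMENT = 2
    CONFIGURATION = 3
    CODE_CHANGE = 4


def prioritize(planned):
    """Return planned (kind, description) pairs, least intrusive first."""
    # sorted() is stable, so changes of the same kind keep their order.
    return sorted(planned, key=lambda item: item[0])


# Hypothetical work list for a derived distribution.
planned_changes = [
    (Derivation.CODE_CHANGE, "patch the example installer"),
    (Derivation.PACKAGE_SELECTION, "drop unused example games"),
    (Derivation.CONFIGURATION, "change the example default theme"),
]

ordered = prioritize(planned_changes)
for kind, description in ordered:
    print(kind.name, "-", description)
```

<para>A deriver could use an ordering like this to ask, for each
desired change, whether it can be expressed as one of the cheaper
classes before resorting to a change in the code itself.</para>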
<para>The first area that Ubuntu focused on was selecting a
subset of packages that Ubuntu would support. Ubuntu selected
and supports approximately 2,000 packages. These became the
<command>main</command> component in Ubuntu. Other packages in
Debian were included in a separate section of the Ubuntu
archive called <command>universe</command> but were not
guaranteed to be supported with bug or security fixes. By
focusing on a small subset of packages, the Ubuntu team was
able to select a subsection of the Debian archive
that they could maintain over time.</para>
<para>The simplest derived distributions, which often work
within the Debian project as CDDs but also include projects
like Userlinux, are merely lists of packages and do nothing
outside of package selection. The installation of lists of
packages and the maintenance of those lists over time can be
aided through the creation of what are called "metapackages":
empty packages that depend on the selected packages and that
are maintained over time.</para>
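<para>To make the metapackage idea concrete, here is a minimal
Python sketch that renders a control stanza for an empty package
whose only job is to depend on a selected list of packages. The
field names (Package, Architecture, Depends, Description) are the
standard Debian control fields; the
<command>render_metapackage</command> helper and every package
name below are hypothetical.</para>

```python
def render_metapackage(name, depends, summary):
    """Render a Debian control stanza for an empty package.

    The package ships no files of its own; installing it simply
    pulls in everything named in Depends.
    """
    if not depends:
        raise ValueError("a metapackage needs at least one dependency")
    # Sort and de-duplicate so the stanza stays stable as the list changes.
    dep_list = ", ".join(sorted(set(depends)))
    lines = [
        f"Package: {name}",
        "Architecture: all",
        f"Depends: {dep_list}",
        f"Description: {summary}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical package selection for a derived distribution.
stanza = render_metapackage(
    "example-nonprofit-desktop",
    ["example-office-suite", "example-fundraising-db", "example-office-suite"],
    "Package selection for an illustrative derived distribution",
)
print(stanza)
```

<para>Maintaining the derived distribution's package selection
then amounts to maintaining the dependency list of one small
package over time.</para>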
<para>Configuration changes are also
relatively low-impact. Moving as many changes as
possible into the realm of configuration is a
strategy that derivers working within
the Debian project, intent on a single code-base, have pursued
actively. Their idea is that rather than forking a piece of
code due to disagreement over how the program should work, they
can leave the code intact but add the
<emphasis>ability</emphasis> to work in a different way. This
alternate functionality is made toggleable through a
configuration change of the distribution in much the same way that
applications can be configured differently or shipped with
different configuration files. Since the Debian project has a
unified package configuration framework called Debconf,
derivers are able to configure an entire system in a highly
centralized manner.<footnote>
<para>More information on Debconf can be found online at
<ulink
url="http://www.kitenet.net/programs/debconf/">http://www.kitenet.net/programs/debconf/</ulink></para>
</footnote> This is not unlike Red Hat's Kickstart, although the
emphasis is on maintenance of those configuration changes over
the life and evolution of the package; Kickstart is focused
merely on installation of the package.</para>
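<para>One way to picture this centralized configuration is as a
single list of answers to package configuration questions. The
Python sketch below renders such a list in the four-column form
(owner, question name, question type, value) that the
<command>debconf-set-selections</command> tool reads. The
<command>render_preseed</command> helper and the package and
question names are hypothetical; a real deriver would use the
questions actually defined by the packages being configured.</para>

```python
def render_preseed(answers):
    """Render answers as lines of: owner question type value."""
    lines = []
    for owner, question, qtype, value in answers:
        # Debconf booleans are written as the words true and false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{owner} {question} {qtype} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical defaults chosen by a derived distribution.
derived_defaults = [
    ("example-mailer", "example-mailer/enable-relay", "boolean", False),
    ("example-desktop", "example-desktop/default-theme", "select", "nonprofit"),
]

preseed = render_preseed(derived_defaults)
print(preseed)
```

<para>Because the answers live in one place, the derived
distribution's differences from Debian stay visible and can be
revised as the underlying packages evolve, without touching their
code.</para>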
<para>A third type of change is limited to the
environment through which a system is run or installed. One
example is Progeny's Anaconda-based Debian installer, which
provides an alternate installer but installs an identical system.
Another example is the Knoppix project, which is famous for its Live
CD environments.<footnote>
<para>In reality, Knoppix makes a wide range of changes to a
default Debian installation that span all items in my list
593 </footnote> Other Live CD projects, including Ubuntu's
594 Casper project, are purely a different
595 way of running the exact same code.</para>
597 <para>Because these three methods are relatively non-invasive,
598 they are reasonable strategies for small teams and individuals
599 working on creating a derived distribution. However, many
600 desirable changes — and in the case of some derived
601 distributions, most desirable changes — require more
602 invasive changes. The final and most invasive type of change
603 — changes to code — is the most difficult but also the most
604 promising and powerful if solved. Changes of this type involve
605 bifurcations of the code-base and will be the topic of the
606 remainder of this paper.</para>
611 <title>Distributed Source Control</title>
<para>One promising method of maintaining changes in forked or
branched projects lies in distributed version control systems
(VCSs). Traditional VCSs work in a highly centralized
fashion. CVS, the archetypal free software VCS and the basis
for many others, is based around the model of a single
centralized server. Anyone who wishes to commit to a project
must commit to the centralized repository. While CVS allows
users to create branches, anyone with commit rights has access
to the entire repository. The tools for branching and merging
over time are not particularly good.</para>
624 <para>The branching model is primarily geared toward a system
625 where development is bifurcated and then the branch is merged
626 completely back into the main tree. Normal use of a branch
627 might include creating a development branch, making a series
628 of development releases while maintaining and fixing important
629 bugs in the stable primary branch, and then ultimately
630 replacing the stable release with the development release. The
631 CVS model is <emphasis>not</emphasis> geared toward a system
632 where an arbitrary delta, or sets of deltas, is maintained
635 <para>Distributed version control aims to solve a number of
636 problems introduced by CVS and alluded to above by:</para>
640 <para>Allowing people to work disconnected from each other
641 and to sync with each other, in whole or in part, in an
642 arbitrary and ad-hoc fashion.</para>
645 <para>Allowing deltas to be maintained over time.</para>
<para>Ultimately, this requires tools that are better at merging
changes and at <emphasis>not</emphasis> merging certain
changes when that is desirable. It also leads to tools capable
of history-sensitive merging.</para>
654 <para>The most famous switch to a distributed VCS model from a
655 centralized VCS model was the move by the Linux kernel
656 development community to the proprietary distributed version
657 control system BitKeeper. In his recent announcement of the
658 decision to part ways with BitKeeper, Linus Torvalds
662 <para>In fact, one impact BK has had is to very fundamentally
663 make us (and me in particular) change how we do things. That
664 ranges from the fine-grained changeset tracking to just how
665 I ended up trusting sub-maintainers with much bigger things,
666 and not having to work on a patch-by-patch basis any
more.<footnote> <para>The full message can be read online
at <ulink
url="http://kerneltrap.org/mailarchive/1/message/48393/thread">http://kerneltrap.org/mailarchive/1/message/48393/thread</ulink></para>
<para>At the time of the switch, free distributed version
control tools were less advanced than they are today. At the
moment, an incomplete list of free software VCS tools includes
GNU Arch, Bazaar, Bazaar-NG, Darcs, Monotone, SVK (based on
Subversion), Git (a system developed by Linus Torvalds as a
temporary replacement for BitKeeper) and others.</para>
<para>Each of these tools, at least after it reaches a certain
level of maturity, allows or will allow its users to develop
software in a distributed fashion and, over time, to compare
their software and pull changes from others significantly more
easily than they could otherwise. The idea of parallel
development that lies at the heart of the model, the tools for
merging and resolving conflicts over time, and the ability to
"cherry pick" certain patches or changes from a parallel
developer each make this type of development significantly
more useful than it has been in the past.</para>
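<para>The following Python sketch is a toy model of these ideas,
not a description of how any of the tools named above work
internally. It treats each branch as an ordered list of named
changes and shows cherry picking a single change, merging only
what is missing, and deliberately declining one change so that a
delta is maintained. All class names, method names, and change
identifiers are hypothetical, and real systems track content and
ancestry rather than bare identifiers.</para>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    change_id: str
    description: str


@dataclass
class Branch:
    name: str
    changes: list = field(default_factory=list)

    def has(self, change_id):
        return any(c.change_id == change_id for c in self.changes)

    def missing_from(self, other):
        """Changes present in other but not yet here, in other's order."""
        return [c for c in other.changes if not self.has(c.change_id)]

    def cherry_pick(self, other, change_id):
        """Take one named change from a parallel branch and leave the rest."""
        for change in other.changes:
            if change.change_id == change_id:
                if not self.has(change_id):
                    self.changes.append(change)
                return change
        raise KeyError(change_id)

    def merge(self, other, skip=()):
        """Apply only changes not already present, except ids listed in skip."""
        picked = [c for c in self.missing_from(other) if c.change_id not in skip]
        self.changes.extend(picked)
        return picked


upstream = Branch("upstream", [
    Change("a1", "fix crash"),
    Change("a2", "new installer screen"),
    Change("a3", "security fix"),
])
derived = Branch("derived", [
    Change("a1", "fix crash"),
    Change("d1", "local branding"),
])

derived.cherry_pick(upstream, "a3")            # take only the security fix
merged = derived.merge(upstream, skip={"a2"})  # a1 and a3 are already known
print([c.change_id for c in derived.changes])
```

<para>Because the derived branch remembers which changes it
already has, a later merge does not reapply them, and the one
change it declines remains an explicit, inspectable
difference.</para>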
<para>VCSs work entirely on the level of code. Due to the nature
of the types of changes that the Ubuntu project is making to
Debian's code, Ubuntu has focused primarily on this model, and
Canonical currently funds two major distributed version control
projects: Bazaar and Bazaar-NG.</para>
<para>In many ways, employing distributed version control
effectively is a much easier problem to solve for small, more
traditional, free software development projects than it is for
GNU/Linux distributions. Because maintaining
parallel development of a single piece of software in a set of
related distributed repositories is the primary use case for
distributed version control systems, distributed VCS alone can
be a technical solution for certain types of parallel
development. As the tools and social processes for distributed
VCS evolve, they will become increasingly important
in the way that free software is developed.</para>
<para>Because the problems of scale associated with building an
entire derivative distribution are more complicated than those
associated with working with a single project, distributed
version control has not yet been widely deployed in the Ubuntu
project. Instead, the project is focusing on integrating it
into problem-specific tools built on top of distributed
version control.</para>
721 <title>Problem Specific Tools</title>
<para>Another technique that Canonical Ltd. is experimenting
with is the creation of high-level tools built on top of
distributed version control tools and specifically designed for
maintaining differences between packages. Because packages are
usually distributed as a source file with a collection of one
or more patches, this introduces the unique possibility of
creating a limited high-level VCS based on this
<para>In the case of Ubuntu and Debian, the tool creates one
branch per patch or feature and uses heuristics to analyze
patch files and create these branches intelligently. The
portion of the total patch that covers the package build system
can also be kept as a separate branch. Canonical's tool, called the
Hypothetical Changeset Tool (HCT) (although no longer
hypothetical), is one experimental way of creating a very
simple, very streamlined interface for dealing with a
particular type of source that is created and distributed in a
particular type of way with a particular type of
<para>While HCT promises to be very useful for people making
derived distributions based on Debian, its wider application
may be limited. That said, it provides an example of the way
that problem- and context-specific tools may play an essential
role in the maintenance of derived code more generally.</para>
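<para>The branch-per-patch idea can be illustrated with a minimal
sketch. This is not HCT's actual implementation, whose heuristics
are not described here. The function names, branch names, and
sample diff below are all hypothetical. The sketch assumes a
Debian-style combined diff and simply sorts the touched files into
packaging changes, individual patch files, and direct source
changes.</para>

<programlisting><![CDATA[
```python
from collections import defaultdict


def split_diff_by_file(diff_text):
    """Split a unified diff into {path: [diff lines]}.

    Simplification: a removed line whose text begins with '-- ' would be
    misread as a file header; this sketch ignores that case.
    """
    chunks = defaultdict(list)
    current = None
    pending = []  # holds the '--- ' header until the matching '+++ ' arrives
    for line in diff_text.splitlines():
        if line.startswith("--- "):
            pending = [line]
            current = None
        elif line.startswith("+++ ") and pending:
            # Drop any timestamp after a tab, then the leading directory.
            path = line[4:].split("\t")[0].split("/", 1)[-1]
            current = path
            chunks[current].extend(pending + [line])
            pending = []
        elif current is not None:
            chunks[current].append(line)
    return dict(chunks)


def propose_branches(diff_text):
    """Group touched files under hypothetical branch names."""
    branches = defaultdict(list)
    for path in sorted(split_diff_by_file(diff_text)):
        if path.startswith("debian/patches/"):
            # Treat each patch file as its own feature branch.
            name = "patch/" + path.rsplit("/", 1)[-1]
        elif path.startswith("debian/"):
            # Package build system changes share one branch.
            name = "packaging"
        else:
            name = "direct-changes"
        branches[name].append(path)
    return dict(branches)


# Hypothetical sample input, invented for illustration.
SAMPLE = """\
--- hello-1.0.orig/debian/rules
+++ hello-1.0/debian/rules
@@ -0,0 +1,2 @@
+#!/usr/bin/make -f
+# build rules
--- hello-1.0.orig/debian/patches/01_fix_typo.patch
+++ hello-1.0/debian/patches/01_fix_typo.patch
@@ -0,0 +1,1 @@
+placeholder patch body
--- hello-1.0.orig/src/hello.c
+++ hello-1.0/src/hello.c
@@ -1,1 +1,1 @@
-int main() { return 1; }
+int main() { return 0; }
"""

if __name__ == "__main__":
    for branch, paths in propose_branches(SAMPLE).items():
        print(branch, paths)
```
]]></programlisting>

<para>In this sketch, the file under
<filename>debian/patches/</filename> gets its own group,
<filename>debian/rules</filename> lands in a packaging group, and
<filename>src/hello.c</filename> is left as a direct change. A real
tool would need far more careful heuristics than path
prefixes.</para>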
<title>Social Solutions</title>
<para>It has been said that a common folly among technophiles is
the temptation to employ technical solutions to social
problems. The problem of deriving software is
both a technical <emphasis>and</emphasis> a social problem, and
adequately addressing the issue will require approaches that
take into consideration both types of solution.</para>
<para>Scott James Remnant describes the relationship between
distributions and derived distributions as not unlike the
relationship between distributions and upstream
<para>I don't think this is much different from how Debian
maintainers interact with their upstreams. As Debian
maintainers we take and package upstream software and then
act as a gateway for bugs and problems. Quite often we fix
bugs ourselves and apply the patch to the package and send
it upstream. Sometimes the upstream don't incorporate that
patch and we have to make sure we don't accidentally drop it
each subsequent release, we much prefer it if they take
them, but we don't get angry if they don't.</para>
<para>This is how I see the relationship between Ubuntu and
Debian, we're no more a fork of Debian than a Debian package
is a fork of its upstream.</para>
<para>Scott alludes to the fact that, at least in the world of
distributions, parallel development is already one way to view
the <emphasis>modus operandi</emphasis> of existing GNU/Linux
distributions. The relationship between a deriver and derivee
on the distribution level mirrors the relationship between the
distribution and the "upstream" authors of the packages that
make up the distribution. These relationships are rarely based
on technological tools and instead lie in the realm of
social solutions.</para>
<para>Ubuntu has pursued a number of different initiatives along
these lines. The first of these has been to regularly file
bugs in the Debian bug tracking system when bugs
that exist in Debian are fixed in Ubuntu. While this can be
partially automated, the choice to automate this is a purely
<para>Ubuntu is still left with questions regarding changes
to packages that do not necessarily fix bugs, or
that fix bugs that do not exist in Debian but may in the
future. Some Debian developers want to hear about the full
extent of changes made to their software in Ubuntu, while
others do not want to be bothered. Ubuntu should continue to
work with Debian to find ways to allow developers to stay in
<para>There is a recent initiative by some developers in Debian,
led largely by me, to create a stronger relationship
between the Debian project and its ecosystem of derivers.
While the form that this will ultimately take is unclear,
projects existing within an ecosystem should explore the realm
of appropriate social relationships that will ensure that they
can work together and be informed of each other's work without
resorting to "spamming" each other with irrelevant or
unnecessary information.</para>
<para>Another issue that has recently played an important role
in the Debian/Ubuntu relationship is the importance of
giving adequate credit to the authors or upstream maintainers
of software without implying a closer relationship than is the
case. Derivers must walk a fine line where they credit others'
work on a project without implying that those others work for,
support, or are connected to the deriver's project, with which, for
any number of reasons, the original author might not want to
be associated.</para>
<para>In the case of Debian and Ubuntu, this has resulted in an
emphasis on keeping or importing changelog entries when
changes are imported and on noting the pedigree of changes
more generally. It has recently also been discussed in terms
of the "maintainer" field in each package in Ubuntu. Ubuntu
wants to avoid making changes to every unmodified source
package (and introducing an unnecessary delta) but does not
want to imply that the Ubuntu package is maintained by
someone who has no association with Ubuntu. While no solution has
been decided upon at the time of writing, one idea involves marking
the maintainer of the package explicitly as a Debian
maintainer at the time that the binary packages are built on
the Ubuntu build machines.</para>
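<para>As a rough illustration of that idea, the sketch below
annotates a binary package's control stanza at build time while
leaving the original "Maintainer" line untouched. The
<literal>Maintainer-Origin</literal> field name, the function
name, and the sample stanza are hypothetical and invented for
this example. No such convention had been agreed upon at the time
of writing.</para>

<programlisting><![CDATA[
```python
def mark_maintainer_origin(stanza, origin="Debian"):
    """Add a hypothetical 'Maintainer-Origin' field after 'Maintainer'.

    The source package is never modified; only the stanza emitted at
    build time gains the extra field. Already-marked stanzas are
    returned unchanged.
    """
    lines = stanza.splitlines()
    if any(l.lower().startswith("maintainer-origin:") for l in lines):
        return stanza
    out = []
    for line in lines:
        out.append(line)
        # Continuation lines start with a space, so their "field" part
        # can never equal 'maintainer'.
        field, sep, _ = line.partition(":")
        if sep and field.lower() == "maintainer":
            out.append(f"Maintainer-Origin: {origin}")
    return "\n".join(out)


# Hypothetical control stanza, invented for illustration.
CONTROL = "\n".join([
    "Package: hello",
    "Version: 1.0-1",
    "Maintainer: Jane Doe <jane@example.org>",
    "Description: example package",
    " This text mentions a colon: nothing changes.",
])

if __name__ == "__main__":
    print(mark_maintainer_origin(CONTROL))
```
]]></programlisting>

<para>Because the annotation is added only when binary packages
are built, unmodified source packages carry no additional delta
against Debian.</para>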
<para>The emphasis on social solutions is also essential when
using distributed VCS technology. As Linus Torvalds alluded to
in the quote above, the importance of technological changes to
distributed VCS technology is only felt when people begin to
work in a different way, that is, when they begin to employ different
social models of developer interaction.</para>
<para>While Ubuntu's experience can provide a good model for
tackling some of these source control issues, it can only
serve as a model and not as a fixed answer. Social solutions
must be appropriate for a given social relationship. Even in
situations where a package is branched because of social
incompatibility, a certain level of collaboration on a social
level will be essential to the long-term viability of the
<title>Conclusions</title>
<para>As the techniques described in this paper evolve, the role
that they play in free software development will become increasingly
prominent and increasingly important. Joining them will be other
techniques and models that I have not seen and cannot predict.
Because of the size and usefulness of their code and the size of
their development communities, large projects like Debian and
Ubuntu have been forced into confronting and attempting to
mediate the problems inherent in forking and deriving. However,
as these problems are negotiated and tools and processes are
advanced toward solutions, free software projects of all sizes
will be able to offer users exactly what they want with minimal
redundancy and little duplication of work. In doing this, free
software will harness a power that proprietary models cannot
compete with. Free software projects will increase their capacity
to produce better products and better processes. Ultimately, this
will help free software capture more users, bring in more
developers, and produce more free software of a higher quality.</para>
890 <!-- Keep this comment at the end of the file
895 sgml-namecase-general:t
896 sgml-general-insert-case:lower
897 sgml-minimize-attributes:nil
898 sgml-always-quote-attributes:t
899 sgml-parent-document:nil
900 sgml-exposed-tags:nil
901 sgml-local-catalogs:nil
902 sgml-local-ecat-files:nil