From: mako@atdot.cc <> Date: Sun, 7 Aug 2005 17:41:49 +0000 (-0400) Subject: Final read-through. This is ready to go. X-Git-Url: https://projects.mako.cc/source/to_fork_or_not/commitdiff_plain/d901603b808a946ed51c24c167c02124c0941ff7 Final read-through. This is ready to go. --- diff --git a/to_fork_or_not_to_fork.xml b/to_fork_or_not_to_fork.xml index 8f4fb0b..e43618b 100644 --- a/to_fork_or_not_to_fork.xml +++ b/to_fork_or_not_to_fork.xml @@ -48,14 +48,15 @@ in the ambitiousness of free software projects in choosing and tackling problems. The free software movement approaches these large problems with more code and with more expansive - communities than was even thinkable a decade ago. Example of - these massive projects include desktop environments — like - GNOME and KDE — and distributions like Debian. + communities than was thinkable a decade ago. Examples of these + massive projects include desktop environments — like GNOME + and KDE — and distributions like Debian, RedHat, and + Gentoo. These projects are leveraging the work of thousands of programmers — both volunteer and paid — and are producing millions of lines of code. Their software is being - used by millions of users with a diverse set of needs. This + used by millions of users with diverse sets of needs. This paper focuses on two major effects of this situation: @@ -72,9 +73,9 @@ It's becoming increasingly difficult to reproduce these large projects. While reproducing entire project is - impossible for small groups of hackers, it is often not - substantially easier for small groups to even track and - maintain a fork of a large project over time. + impossible for small groups of hackers, it is often not even + possible for small groups to track and maintain a fork + of a large project over time. @@ -124,7 +125,7 @@ This paper is an early analysis of this set of problems. As such, it is highly focused on the experience of the Ubuntu - project and it's existence as a derived Debian distribution.
It + project and its existence as a derived Debian distribution. It also pulls from my experience with Debian-NP and the Custom Debian Distribution (CDD) community. Since I participate in both the Ubuntu and CDD projects, these are areas that I can discuss @@ -143,13 +144,12 @@ Some forks, like Emacs and XEmacs, are permanent. Others are relatively short lived. An example of this is the GCC project - which saw two forks — EGCS and PGCC — that both eventually - merged back into GCC. Forking can happen for any number of - reasons. Often developers on a project develop political or - personal differences that keep them from continuing to work - together. In some cases, maintainers become unresponsive and - other developers on the project fork the project to keep the - project alive. + which saw two forks — EGCS and PGCC — that both + eventually merged back into GCC. Forking can happen for any + number of reasons. Often developers on a project develop + political or personal differences that keep them from continuing + to work together. In some cases, maintainers become unresponsive + and other developers fork to keep the software alive. Ultimately though, most forks occur because people do not agree on the features, the mechanisms, or the technology at the @@ -160,7 +160,7 @@ A fork occurs on the level of code but a fork is not merely — or even primarily — technical. Many projects create - "branches." Branches are alternative version of a piece of + "branches." Branches are alternative versions of a piece of software used to experiment with intrusive or unstable features and fixes. Forks are distinguished from branches both in that they are often more significant departures from a technical @@ -178,8 +178,8 @@ work. 
When I published the first version of the Free Software Project Management HOWTO more than four years ago, I included - a small subsection on forking which described forking to - prospective free software project leaders with this text: + a small subsection on forking which described the concept to + future free software project leaders with this text:
The short version of the fork section is, don't do them. @@ -199,8 +199,8 @@ diff, patch, CVS and Subversion which are not optimized for this task. The worse (and much more common) situation occurs when two groups go about their work ignorant or partially ignorant of the - work done on the other side of the fork. Important features and - fixes are implemented twice — differently and + code being cut on the other side of the fork. Important features + and fixes are implemented twice — differently and incompatibly. The most substantial bright side to these drawbacks is that @@ -213,7 +213,7 @@ a contested term. Because definitions of forks involve, to one degree or another, statements about the political, organization, and technical distinctions between projects, bifurcations that - many people call branches or parallel trees are described as + many people call branches or parallel trees are described by others as forks. Recently, fueled by the advent of distributed version control systems, the definition of what is and is not a fork has become increasingly unclear. In part due to the same @@ -227,8 +227,8 @@ In my introduction, I described how the growing scope of free software projects and the rapidly increasingly size and - diversity of project's user communities is spearheading the need - for new type of derivation that avoids, as best as possible, the + diversity of user communities is spearheading the need for a new + type of derivation that avoids, as best as possible, the drawbacks of forking. Nowhere is this more evident than in the largest projects with the broadest scope: a small group of projects that includes operating system distributions. @@ -237,15 +237,15 @@
The Debian Project - The Debian project is by many counts the largest, in terms - of both code and volunteers, free software distribution. It is - the also, arguably, the largest free software project in terms - of the number of volunteers. Debian includes more than 15,000 + The Debian project is by many counts the largest free + software distribution in terms of code. It is also, + arguably, the largest free software project in terms of the + number of volunteers. Debian includes more than 15,000 packages and the work of well over 1,000 official volunteers and many more contributors without official membership. Projects without Debian's massive volunteer base cannot replicate what Debian has accomplished; they can rarely hope - to even maintain what Debian currently has. + to even maintain what Debian has produced. At the time that this paper was written, Distrowatch lists 129 distributions based on Debian @@ -289,7 +289,7 @@ organizational and political limits of the Debian project and constantly seek to minimize the delta by focusing on less invasive changes and by advancing creative ways of building - the ability to make changes in the core + the ability to alter the core Debian code base through established and policy compliant procedures. @@ -359,7 +359,7 @@ That said, the Ubuntu project is explicit about the fact that it could not exist with the work done by the Debian - project before Ubuntu was created. + project. You can see that explicit statement on Ubuntu's website here: http://www.ubuntulinux.org/ubuntu/relationship/ @@ -441,7 +441,7 @@ Remnant in the context of of the Ubuntu development model cannot be overstated. Ever line of delta between Debian and Ubuntu has a cost for Ubuntu developers. Technology, social - practices, and wise choices may reduce the cost but it cannot + practices, and wise choices may reduce that cost but it cannot eliminate it.
The resources that Ubuntu can bring to bear upon the problem of building a distribution are limited — far more limited than Debian's. As a result, there is a limit to @@ -453,13 +453,13 @@
Applicability - Ubuntu and Debian are distributions and — as such — - operate on a different scale than the vast majority of free - software projects. Using a very simple metric, they include - more code and more people. As a result, there are questions as - to whether the experiences and lessons learned from these - projects are particularly applicable to the experience of - smaller free software projects. + Ubuntu and Debian are distributions and — as such + — operate on a different scale than the vast majority of + free software projects. They include more code and more + people. As a result, there are questions as to whether the + experiences and lessons learned from these projects are + particularly applicable to the experience of smaller free + software projects. Clearly, because of the difficulties associated with forking massive amount of code and the problems associated @@ -485,14 +485,13 @@ evolve so that forking is increasingly less of a bad thing. Free software development methodology will become less dependent on a single project and begin to emphasize parallel - development within an ecosystem of software development - working on related projects. The result is that free software - projects will gain a competitive advantage over propriety - software projects through their ability to better serve the - increasingly diverse needs of increasingly large and - increasingly diverse user-bases. Although it sounds - paradoxical today, more projects will derive and less - redundant code will be written. + development within an ecosystem of related projects. The + result is that free software projects will gain a competitive + advantage over proprietary software projects through their + ability to better serve the increasingly diverse needs of + increasingly large and increasingly diverse user-bases. + Although it sounds paradoxical today, more projects will + derive and less redundant code will be written.
Projects more limited in code and scope may use the tools and methods described in the remainder of this paper in @@ -570,26 +569,26 @@ of lists of packages and the maintenance of those lists over time can be aided through the creation of what are called metapackages: empty packages with long - lists of "dependencies" that are maintained over time. + lists of "dependencies." The second item, configuration changes, are also relatively low-impact. Focusing on moving as many changes as possible into the realm of configuration changes is a - relatively low-impact strategy that derivers working within - the Debian project intent on a single code-base have pursued - actively. Their idea is that rather than forking a piece of - code due to disagreement in how the program should work, they - can leave the code intact but add the - ability to work in a different way to the - software. This alternate functionality is made toggleable - through a configuration change in the same manner that - applications are configured through questions asked at install - time. Since the Debian project has a unified package - configuration framework called Debconf, derivers are able to - configure an entire system in a highly centralized - manner. More information on Debconf can be - found online at: http://www.kitenet.net/programs/debconf/ + sustainable strategy that derivers working within the Debian + project intent on a single code-base have pursued actively. + Their idea is that rather than forking a piece of code due to + disagreement in how the program should work, they can leave + the code intact but add the ability to + work in a different way to the software. This alternate + functionality is made toggleable through a configuration + change in the same manner that applications are configured + through questions asked at install time. 
Since the Debian + project has a unified package configuration framework called + Debconf, derivers are able to configure an entire system in a + highly centralized manner. More information on + Debconf can be + found online at: http://www.kitenet.net/programs/debconf/ This is not unlike RedHat's Kickstart although the emphasis is on maintenance of those configuration changes over the life and evolution of the package; Kickstart is focused @@ -603,27 +602,28 @@ for its "Live CD" environments. While, Knoppix makes a wide range of invasive changes that span all items in my list above, other Live CD projects, including Ubuntu's "Casper" - project, are much closer to alternative environments through - which the same code is run. + project, are much closer to an alternate shell through which + the same code is run. Because these three methods are relatively non-invasive, they are reasonable strategies for small teams and individuals working on creating a derived distribution. However, many desirable changes — and in the case of some derived - distributions, most desirable changes — require more - invasive changes. The final and most invasive type of change - — changes to code — is the most difficult but also - the most promising and powerful if it can be done sustainably. - Changes of this type involve bifurcations of the code-base and - will be the topic of the remainder of this paper. + distributions, most desirable changes + — require more invasive techniques. The final and most + invasive type of change — changes to code — is the + most difficult but also the most promising and powerful if it + can be done sustainably. Changes of this type involve + bifurcations of the code-base and will be the topic of the + remainder of this paper.
Distributed Source Control - One promising method of maintaining changes in forked or - branched problems lies in distributed version control systems + One promising method of maintaining deltas in forked or + branched projects lies in distributed version control systems (VCS). Traditional VCS systems work in a highly centralized fashion. CVS, the archetypal free software VCS and the basis for many others, is based around the model of a single @@ -712,8 +712,8 @@ traditional, free software development projects than it is for GNU/Linux distributions. Because the problems with maintaining parallel development of a single piece of software in a set of - related distributed repositories is primary use case for - distributed version control system, distributed VCS alone can + related distributed repositories is the primary use case for + distributed version control systems, distributed VCS alone can be a technical solution for certain types of parallel development. As the tools and social processes for distributed VCS evolve, they will become increasingly important tools in @@ -721,11 +721,11 @@ Because the problems of scale associated with building an entire derivative distribution are more complicated than those - associated with working with a single project, distributed - version control has not yet been widely deployed in the Ubuntu - project. Instead, the project is focusing on integrating these - into problem specific tools built on top of distributed - version control. + associated with working with a single "upstream" project, + distributed version control is only now being actively + deployed in the Ubuntu project. In doing so, the project is + focusing on integrating these into problem-specific tools + built on top of distributed version control.
@@ -738,10 +738,10 @@ maintaining difference between packages. Because packages are usually distributed as a source file with a collection of one or more patches, this introduces the unique possibility of - creating a high-level VCS system based on this fact.
+ creating a high-level VCS system based around this fact. In the case of Ubuntu and Debian, the ideal tool creates - one branch per patch or feature and using heuristics to + one branch per patch or feature and uses heuristics to analyze patch files and create these branches intelligently. The package build system section of the total patch can also be kept as a separate branch. Canonical's tool, @@ -805,10 +805,10 @@ Ubuntu has pursued a number of different initiatives along these lines. The first of these has been to regularly file - bugs in the Debian bug tracking system when bugs are fixed - that exist in Debian are fixed in Ubuntu. While this can be - partially automated, the choice to automate this is a purely - social one. + bugs in the Debian bug tracking system when bugs that exist in + Debian are fixed in Ubuntu. While this can be partially + automated, the choice to automate this and the manner in which + it it is set up is a purely social one. However, as I alluded to above, Ubuntu is still left with questions in regards to changes that are made to packages that @@ -819,15 +819,16 @@ bothered. Ubuntu should continue to work with Debian to find ways to allow developers to stay in sync. - There is are also several initiatives by developers in - Debian, to create a stronger relationship between the Debian - project and its ecosystem of derivers and between Ubuntu and - Debian in particular. While the form that this will ultimately - take is unclear, projects existing within an ecosystem should - explore the realm of appropriate social relationships that - will ensure that they can work together and be informed of - each others' work without resorting to "spamming" each other - with irrelevant or unnecessary information. + There are also several initiatives by developers in + Debian, Ubuntu, and in other derivations to create a + stronger relationship between the Debian project and its + ecosystem of derivers and between Ubuntu and Debian in + particular. 
While the form that this will ultimately take is + unclear, projects existing within an ecosystem should explore + the realm of appropriate social relationships that will ensure + that they can work together and be informed of each other's + work without resorting to "spamming" each other with + irrelevant or unnecessary information. Another issue that has recently played an important role in the Debian/Ubuntu relationship is the importance of both @@ -835,9 +836,9 @@ of software without implying a closer relationship than is the case. Derivers must walk a file line where they credit others' work on a project without implying that the others work for, - support, or are connected to the derivers project which, for - any number of reasons, the original author might not want to - be associated with. + support, or are connected to the derivers' project to which, for + any number of reasons, the "upstream" author might not want to + be associated. In the case of Debian and Ubuntu, this has resulted in an emphasis on keeping or importing changelog entries when @@ -865,9 +866,9 @@ serve as a model and not as a fixed answer. Social solutions must be appropriate for a given social relationship. Even in situations where a package is branched because of social - incompatibility, a certain level of collaboration on a social + disagreements, a certain level of collaboration on a social level will be essential to the long term viability of the - derivation. + derivative.
@@ -879,20 +880,21 @@ As the techniques described in this paper evolve, the role that they play in free software development becomes increasingly prominent and increasingly important. Joining them will be other - techniques and models that I have not seen and cannot predict. - Because of the size and usefulness of their code and the size of - their development communities, large projects like Debian and - Ubuntu have been forced into confronting and attempting to - mediate the problems inherent in forking and deriving. However, - as these problems are negotiated and tools and processes are - advanced toward solutions, free software projects of all sizes - will be able to offer users exactly what they want with minimal - redundancy and little duplication of work. In doing this, free - software will harness a power that proprietary models cannot - compete with. They will increase their capacity to produce - better products and better processes. Ultimately, it will help - free software capture more users, bring in more developers, and - produce more free software of a higher quality. + techniques and models that I have not described and cannot + predict. Because of the size and usefulness of their code and + the size of their development communities, large projects like + Debian and Ubuntu have been forced into confronting and + attempting to mediate the problems inherent in forking and + deriving. However, as these problems are negotiated and tools + and processes are advanced toward solutions, free software + projects of all sizes will be able to offer users exactly what + they want with minimal redundancy and little duplication of + work. In doing this, free software will harness a power that + proprietary models cannot compete with. They will increase their + capacity to produce better products and better processes. + Ultimately, it will help free software capture more users, bring + in more developers, and produce more free software of a higher + quality.