Use of WebDAV in Subversion
This document details how WebDAV is used within the
Subversion
product. Specifically, it covers how the client side interfaces with
Neon to generate
WebDAV requests over the wire, and what the
server must do to map incoming WebDAV requests into operations
against the Subversion repository. Note that the server side is
implemented as an
Apache 2.0 module,
operating as a back-end for its mod_dav functionality.
This document heavily refers to the
Subversion
design document and the
latest Delta-V protocol
draft. Details of those documents will not be
replicated here.
NOTE: Subversion uses DeltaV for its
communication, but the Subversion client is
not a general-purpose DeltaV client. In
fact, it expects some custom features from the
server. Further, the Subversion server is
not a general-purpose DeltaV server. It
implements a strict subset of the
DeltaV specification. A WebDAV or DeltaV client may very
well be able to interoperate with it, but only if that
client operates within the narrow confines of those
features the server has implemented.
Version 2.0 of Subversion will address WebDAV
interoperability (Class 1, Class 2, and DeltaV
features). Each of the custom features expected by the
client actually has an alternate mechanism available in
DeltaV, but in a much less efficient form.
It is expected that Version 1.0 will support read-only,
Class 1 WebDAV clients. Any "low-hanging fruit" to
increase DeltaV interoperability will be considered.
Basic Concepts
Subversion uses a tree-based format to describe a change set
against the repository. This tree is constructed on the client
side (by "walking" over the working copy) to describe the
change. The tree is marshalled to the server as a linear
sequence of changes to be applied to the repository. The
repository can accept changes in a random-access manner, so the
mapping from a tree to a set of changes works very well for the
repository.
Subversion provides properties on files, directories, and even
the abstract concept of a revision. Each of the operations
involving properties is mapped directly to WebDAV properties,
which are manipulated with the PROPFIND and PROPPATCH HTTP methods. Revisions are
modeled as DeltaV baselines, so
revision properties are available through a PROPFIND on the baseline.
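As a minimal sketch of what such a property request could look like, the Python snippet below builds a PROPFIND body for the three history properties described later in this document. The helper name build_propfind_body is hypothetical, and the body follows the generic WebDAV PROPFIND form rather than anything specific to the Subversion client.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"  # namespace used by WebDAV and DeltaV elements


def build_propfind_body(prop_names):
    """Return a PROPFIND request body asking for the named DAV: properties."""
    ET.register_namespace("D", DAV_NS)
    propfind = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(propfind, f"{{{DAV_NS}}}prop")
    for name in prop_names:
        ET.SubElement(prop, f"{{{DAV_NS}}}{name}")
    return ET.tostring(propfind, encoding="unicode")


# Revision properties would be requested from the baseline resource.
body = build_propfind_body(["comment", "creator-displayname", "creationdate"])
print(body)
```

The body would be sent with a PROPFIND whose Request-URL is the baseline; how the client locates that URL is not covered by this sketch.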
The Subversion server can efficiently compute deltas between two
revisions (these deltas are complete tree
deltas, not simple text deltas). DeltaV does not have a
direct analogue for the tree delta concept. A client could
discover changes by issuing a sequence of PROPFIND requests on the various WebDAV
resources, but this would be a time-consuming operation,
involving many requests. Instead, Subversion marshals this
concept as a custom WebDAV report. Using this report, the client learns
which items in the working copy are out of date and can issue
GET and PROPFIND methods to fetch the new data.
Tags and branches are simple copies in Subversion, which are
handled with the WebDAV COPY.
DeltaV Concepts Used by Subversion
Subversion uses many of the DeltaV concepts, as listed
below. Note that many of these concepts are not fully
implemented by Subversion; we have implemented enough to meet
our needs, but little more.
- Baseline
-
- Activity
-
- Version Resource
-
- Version-Controlled Configuration
-
- Baseline Collection
-
- Version-Controlled Resource
-
- Working Resource (Feature)
-
- Merge Feature
-
- Label Feature
-
- Version-Controlled-Collection Feature
-
Subversion Projects as URLs
The very first concept to define is how a project is exposed to
the client. Subversion will expose all projects as URLs on a
server. The files and subdirectories under this project will be
exposed through the URL namespace.
For example, let us assume that we have a project named
"example". And let us say that this project will be exposed at
the URL:
http://subversion.tigris.org/repos/example/.
This mapping is set up through a set of configuration
parameters for the Apache HTTP Server (which is hosting the
Subversion code and the particular project in question). The
configuration could look like:
<Location /repos/example>
DAV svn
SVNPath /home/svn-projects/example
</Location>
Files and directories within the project will be directly mapped
into the URL namespace. For example, if the project contains a
file "file.c" in a subdirectory "sub", then the URL for that
file will be
http://subversion.tigris.org/repos/example/sub/file.c.
Initial Checkout
When the user performs the initial checkout of a Subversion
project, the client will issue a series of PROPFIND and GET requests. These requests will traverse
the repository, pick up some necessary metadata, and then fetch
the latest revision.
Committing a Change
Subversion commits are modeled using the "activity" concept from
DeltaV. An activity can be viewed as a transaction for a set of
resources.
Creating the activity
At commit time, the Subversion client will retrieve the stored
DAV:activity-collection-set value to
know where it should create the activity. Next,
the client will generate a UUID (a unique value) to use for the
activity's location. Finally, the client will issue a
MKACTIVITY method request, where the Request-URL is
composed from the activity location and the UUID. This request
will construct an activity to hold all of the changes for the
commit.
Abbreviated summary:
- At checkout time:
-
Request: OPTIONS for
DAV:activity-collection-set
Response: http://www.example.com/repos/foo/$svn/act/
- At commit time:
-
Request: MKACTIVITY
http://www.example.com/repos/foo/$svn/act/01234567-89ab-cdef-0123-456789abcdef
Response: 201 (Created)
The CHECKOUT method can specify an
activity to use upon checkout. This feature is used to associate
all items with the newly-created activity.
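The activity-creation step can be sketched in Python as follows. The function name mkactivity_request is hypothetical; the sketch only composes the method and Request-URL from the activity collection location and a fresh UUID, and does not send anything over the wire.

```python
import uuid


def mkactivity_request(activity_collection_url):
    """Return (method, url) for creating a fresh activity."""
    base = activity_collection_url.rstrip("/")
    activity_id = str(uuid.uuid4())  # unique name chosen by the client
    return ("MKACTIVITY", f"{base}/{activity_id}")


# The collection URL is the value learned from DAV:activity-collection-set.
method, url = mkactivity_request("http://www.example.com/repos/foo/$svn/act/")
print(method, url)
```

A 201 (Created) response would indicate that the activity exists and can be named in later CHECKOUT requests.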
Storing the commit message
Mapping changes to WebDAV
A change set in Subversion is specified with a "tree delta" (see
the SVN design for more details on the changes that can be
placed into a tree delta). The tree delta will be unravelled
into a set of requests. These requests will be one of the
following forms:
- Delete file or directory
-
These changes are mapped onto a DELETE operation. The version resource
of the target's parent collection is checked out using the
CHECKOUT method (into the current
activity). The target (name) is then deleted from the
resulting working collection using the DELETE method.
- Add file
-
This is modeled by performing a CHECKOUT of the version resource of the
target's parent collection. The new file is created within the
resulting working collection using a PUT request. Properties are applied
using PROPPATCH.
- Add directory
-
This is modeled by performing a CHECKOUT on the version resource of the
target's parent collection. The new directory is created
within the resulting working collection with a MKCOL request. Properties are applied
using PROPPATCH.
- Add file or directory, with previous ancestry (a copy)
-
A tree delta can specify that a file/directory originates as
a copy of another file/dir. This copy may be further
modified by additional elements in the tree delta.
This change will be modeled by performing a
CHECKOUT on the version resource of the parent
collection which will contain the new resource. The
VERSION-CONTROL method will create a new
version-controlled resource (VCR) within the working
collection, with the VCR's DAV:checked-in
property referring to the ancestor's version resource.
Note: it appears that we will use
COPY to copy the appropriate resource into the
working collection. This will create a new version history
which is then placed into the working collection. The
version history will use the DAV:precursor-set
property to specify the version resource of the ancestor.
Because a version resource does not specify the revision,
it will not be possible to COPY a version
resource into the working collection -- it will not tell
us what revision was copied. Instead, we will most likely
copy a version resource out of the appropriate
baseline. This implies the client must be able to map from
a URL/revision pair to a baselined version resource URL.
The second issue is whether/how we set the
DAV:precursor-set property of the version
history. Or, more precisely, how we synthesize the value
from information stored in the repository. This is still
under investigation.
- Replace file/dir by another file/dir
-
This change does not have a WebDAV modeling because tree
deltas model it as two, sequential operations: a
delete, followed by an add.
- Moving a file or directory
-
This change does not have a WebDAV modeling because tree
deltas model it as two, distinct operations: a
delete, and an add with previous ancestry.
- Replace file
-
This is modeled with a CHECKOUT on
the target's version resource, followed by a PUT to the resulting working resource.
- Replace directory
-
In Subversion terms, "replace directory" means that additions,
deletions, and other changes will occur within the
directory. Each of these changes is modeled individually, and
the change to the directory is performed
implicitly. Therefore, this "change" has no particular mapping
into WebDAV.
- Property delta
-
A property delta (against a file or directory) maps directly
to a PROPPATCH in WebDAV
terms. The target's version resource will be checked out using
CHECKOUT and the PROPPATCH will be applied to the
resulting working resource.
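The list above can be summarized as a small lookup table. The Python sketch below is illustrative only: the change-kind labels and the methods_for helper are hypothetical names, PROPPATCH is listed for additions even though it only applies when properties are set, and the copy case uses COPY as suggested by the note above rather than VERSION-CONTROL.

```python
# WebDAV methods issued for each primitive change, in order.
PRIMITIVE_PLANS = {
    "delete": ["CHECKOUT", "DELETE"],               # check out parent, delete target
    "add-file": ["CHECKOUT", "PUT", "PROPPATCH"],    # check out parent, create file
    "add-directory": ["CHECKOUT", "MKCOL", "PROPPATCH"],
    "add-with-ancestry": ["CHECKOUT", "COPY"],       # check out parent, copy ancestor
    "replace-file": ["CHECKOUT", "PUT"],             # check out target, store new text
    "replace-directory": [],                         # implicit, no request of its own
    "property-delta": ["CHECKOUT", "PROPPATCH"],     # check out target, patch props
}

# Changes that tree deltas express as a sequence of primitive changes.
COMPOSITE_CHANGES = {
    "move": ["delete", "add-with-ancestry"],
}


def methods_for(kind):
    """Return the ordered list of WebDAV methods for one change kind."""
    if kind in COMPOSITE_CHANGES:
        methods = []
        for part in COMPOSITE_CHANGES[kind]:
            methods.extend(PRIMITIVE_PLANS[part])
        return methods
    return list(PRIMITIVE_PLANS[kind])


print(methods_for("move"))
```

A real client would skip the CHECKOUT for a collection that is already checked out into the current activity.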
Final Commit
The final action of the commit process is to issue a MERGE request to the Subversion server,
specifying that the activity (created earlier) be checked in and
the corresponding version-controlled resources be updated to
refer to the new version resources.
The version-controlled resources are also baseline-controlled,
which means that updates to them will automatically create a new
baseline. In essence, the commit will create a new baseline
corresponding to the new Subversion revision.
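A minimal sketch of a MERGE request body, assuming the generic DeltaV form in which the activity is named as the merge source. The helper name build_merge_body is hypothetical, the Request-URL is deliberately left out, and the body Subversion actually sends may carry additional elements.

```python
import xml.etree.ElementTree as ET


def build_merge_body(activity_url):
    """Return a DAV:merge body naming the activity as the merge source."""
    ET.register_namespace("D", "DAV:")
    merge = ET.Element("{DAV:}merge")
    source = ET.SubElement(merge, "{DAV:}source")
    href = ET.SubElement(source, "{DAV:}href")
    href.text = activity_url
    return ET.tostring(merge, encoding="unicode")


merge_body = build_merge_body(
    "http://www.example.com/repos/foo/$svn/act/01234567-89ab-cdef-0123-456789abcdef"
)
print(merge_body)
```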
Example
Consider the following set of operations and its corresponding
tree delta (taken from the SVN design document):
- rename
/dir1/dir2 to /dir1/dir4,
- rename
/dir1/dir3 to /dir1/dir2, and
- move
file3 from /dir1/dir4 to /dir1/dir2.
<tree-delta>
<replace name='dir1'>
<directory>
<tree-delta>
<replace name='dir2'>
<directory ancestor='/dir1/dir3'> (1)
<tree-delta>
<new name='file3'> (2)
<file ancestor='/dir1/dir2/file3'/>
</new>
</tree-delta>
</directory>
</replace>
<delete name='dir3'/> (3)
<new name='dir4'> (4)
<directory ancestor='/dir1/dir2'>
<tree-delta>
<delete name='file3'/> (5)
</tree-delta>
</directory>
</new>
</tree-delta>
</directory>
</replace>
</tree-delta>
Walking through this delta, we map out the WebDAV requests
listed below. The numbers in the above delta roughly correspond
to the numbered entries below. The correspondence is not exact
because a specific, resulting behavior is typically based on a
combination of a few elements in the delta.
-
The
<directory ancestor="/dir1/dir3">
specifies that we are overwriting /dir1/dir2 with
/dir1/dir3.
CHECKOUT /dir1/dir2/
(returns a working resource URL for the directory)
COPY /dir1/dir3/
Destination: http://www.example.com/$svn/wrk/.../
Overwrite: T
-
/dir1/dir2/file3 is new (since we just overwrote
the original dir2 directory), and originates from
/dir1/dir2/file3. Thus, we simply
COPY the file into the target directory's working
resource:
COPY /dir1/dir2/file3
Destination: http://www.example.com/$svn/wrk/.../file3
-
CHECKOUT /dir1/dir3/
(returns a working resource URL for the directory)
DELETE /$svn/wrk/.../
-
We are going to create a new subdirectory (dir4) in the
/dir1 directory. Since we don't have
/dir1 checked out yet, we do so:
CHECKOUT /dir1/
(returns a working resource URL for the directory)
And now we copy the right directory into the new working
resource:
COPY /dir1/dir2/
Destination: http://www.example.com/$svn/wrk/.../dir4/
-
The
COPY created a complete set of working
resources on the server, so we simply delete the part that we
don't want:
DELETE /$svn/wrk/.../dir4/file3
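The walk over the delta can be sketched in Python. The snippet below parses the example tree delta (without the numbered annotations) and flattens it into the five operations discussed above. The flatten helper and the tuple layout are hypothetical; a real client would turn each entry into CHECKOUT, COPY, and DELETE requests rather than a list.

```python
import xml.etree.ElementTree as ET

TREE_DELTA = """<tree-delta>
  <replace name='dir1'>
    <directory>
      <tree-delta>
        <replace name='dir2'>
          <directory ancestor='/dir1/dir3'>
            <tree-delta>
              <new name='file3'>
                <file ancestor='/dir1/dir2/file3'/>
              </new>
            </tree-delta>
          </directory>
        </replace>
        <delete name='dir3'/>
        <new name='dir4'>
          <directory ancestor='/dir1/dir2'>
            <tree-delta>
              <delete name='file3'/>
            </tree-delta>
          </directory>
        </new>
      </tree-delta>
    </directory>
  </replace>
</tree-delta>"""


def flatten(delta, parent=""):
    """Return (change, path, ancestor) tuples in document order."""
    ops = []
    for change in delta:  # each child is <new>, <replace>, or <delete>
        path = f"{parent}/{change.get('name')}"
        if change.tag == "delete":
            ops.append(("delete", path, None))
            continue
        content = change[0]  # the <file> or <directory> element
        ancestor = content.get("ancestor")
        # A plain directory replace only scopes the nested changes.
        if change.tag == "new" or ancestor is not None or content.tag == "file":
            ops.append((change.tag, path, ancestor))
        nested = content.find("tree-delta")
        if nested is not None:
            ops.extend(flatten(nested, path))
    return ops


operations = flatten(ET.fromstring(TREE_DELTA))
for number, op in enumerate(operations, start=1):
    print(number, op)
```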
URL Layout
The Subversion server exposes repositories at user-defined
URLs. For example, the "foo" repository might be located at
http://www.example.com/repos/foo/. However,
the server also requires a number of other resources to be
exposed for proper operation. These additional resources will be
associated with each repository in a location under the main
repository URL. By default, this location is "$svn". It may be changed by using the
SVNSpecialURI directive:
<Location /repos/foo>
DAV svn
SVNPath /home/svn-projects/foo
SVNSpecialURI .special
</Location>
Underneath the location specified by SVNSpecialURI,
we will expose several collections. Assuming we use the default
of "$svn", the collections are:
- $svn/act/
-
This area is where activity resources are created. The client
will pick a unique name within this collection and issue a
MKACTIVITY for that URL. The client will then use
the activity in further interactions.
No methods are allowed on the $svn/act/
resource.
Note: actually, we may want to allow a PROPFIND
with a Depth: 1 header to allow clients to
enumerate the current activities.
Only a subset of methods are allowed on the activities
within the collection. They are: PROPFIND,
MERGE (commit the activity), and
DELETE (abort the activity).
Per the Delta-V specification, all activity resources will
have a DAV:resourcetype of
DAV:activity.
- $svn/his/
-
This collection contains the version history resources for
files and directories in a project. Its internal layout is
completely server-defined. Clients will receive URLs into
this collection (or a subcollection) from various responses.
No methods are allowed on the $svn/his/
resource.
Internally, the URL namespace is laid out with URLs of the
following form:
$svn/his/node-id
The node-id is an internal value
that Subversion uses to reference individual files and
directories. This node-id is a single integer
defined by the Subversion repository. Note that this is an
undotted node id, which is the base for the entire history
of a given node in the repository.
The DAV:resourcetype of the node-id
collection is DAV:version-history.
- $svn/ver/
-
This collection contains the version resources for the
project.
No methods are allowed on the $svn/ver/
resource.
The layout of this collection is internal to the server. For
reference purposes here (and to describe the
implementation), it is laid out as:
$svn/ver/node-id/path
Only read-only methods are allowed against these resources
(e.g. GET, PROPFIND,
REPORT); all other methods are illegal.
The DAV:resourcetype of a version resource is
simply the value of the resource at checkin time
(e.g. <D:resourcetype/> or
<D:resourcetype><D:collection/></D:resourcetype>).
- $svn/wrk/
-
This collection contains working resources for the resources
that have been checked out with the CHECKOUT
method. The form and construction of this collection is
server-defined, but is also well-defined so that clients may
interact properly with collection versions that have been
checked out.
No methods are allowed on the $svn/wrk/
resource.
For reference purposes, the working resource URLs are
constructed as:
$svn/wrk/activity/path
Any method is allowed on the working resources, but no
methods are allowed on any of its parents.
The DAV:resourcetype of the working resources
follows normal resource typing:
<D:resourcetype/> for regular working
resources, and
<D:resourcetype><D:collection/></D:resourcetype>
for working collections.
- $svn/vcc/
-
version-controlled configuration...
$svn/vcc/root as a singleton.
- $svn/bln/
-
baselines...
$svn/bln/rev/
- $svn/wbl/
-
working baseline...
- $svn/bc/
-
baseline collection...
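To make the layout concrete, here is a small Python sketch that composes URLs under the special area. It illustrates what the server-side layout described above would produce; apart from the activity URL, clients are expected to receive these URLs from server responses rather than build them. The special_url helper and the node-id value 42 are hypothetical.

```python
from urllib.parse import quote


def special_url(repo_url, kind, *segments, special_uri="$svn"):
    """Build a URL of the form <repo>/<special_uri>/<kind>/<segments...>."""
    parts = [repo_url.rstrip("/"), special_uri, kind]
    for segment in segments:
        text = str(segment).strip("/")
        if text:
            parts.append(quote(text))  # quote() leaves "/" unescaped
    return "/".join(parts)


repo = "http://www.example.com/repos/foo/"
activity = "01234567-89ab-cdef-0123-456789abcdef"

print(special_url(repo, "act", activity))                # activity resource
print(special_url(repo, "wrk", activity, "sub/file.c"))  # working resource
# With "SVNSpecialURI .special", the same layout moves under ".special".
print(special_url(repo, "ver", 42, "sub/file.c", special_uri=".special"))
```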
Property Management (and History/Log Reporting)
As mentioned before, Subversion properties map onto WebDAV
properties. For history/log reporting, the following WebDAV
properties will be applied to each baseline (a Subversion
revision) and to each version resource created by the
revision. Since these resources are all version resources, the
properties below are read-only.
- DAV:comment
-
This is the standard (dead) property for specifying a checkin
comment.
- DAV:creator-displayname
-
This is a (dead) property that is generated from Subversion's
concept of the "user" who made a particular change.
- DAV:creationdate
-
This is a read-only live property created by the server at
commit time.
The history for a specified file will be generated using the
REPORT method and a
DAV:property-report report. A typical history will
fetch the three properties mentioned above for each version of
the file/directory.
Based on the client design, it may be important to specify other
read-only live properties for information about versions. For
example, how many lines were added/removed in a particular
checkin for a file? Creating these live properties will be quite
straightforward, and driven by the client design over time.
Note: if we do this, however, then we'd end up tying the client
to the server. Of course, if the client were run against another
DeltaV server which didn't report these properties, then we'd
simply not display them in the UI. (e.g. graceful degradation of
functionality)
Fetching Status and Updates
After the initial checkout, the client can request a status
report (what has been changed on the client, pending a commit;
what has been changed on the server, pending an update). The
update process is similar, except that we also fetch the changes
from the server.
The local changes can be handled entirely on the client
side. The Working Copy library can easily handle the detection
and reporting of these changes. We're concerned with efficiently
detecting what has changed on the server.
While it would be possible to traverse the repository, fetching
the current state, and comparing that to the client state, it
would not be efficient. The Subversion design enables the server
to easily compute what has changed (relative to the client), if
it is given a description of the client state.
The core of the status and update commands is
based on a custom Subversion-specific WebDAV report. This custom
report will transmit the state of the working copy to the
server, and the server response will specify which resources
will need to be updated (fetched).
The request is a standard REPORT request, with a
custom XML body. The body will use the standard Subversion
technique of reporting a top-level revision number, and then
only reporting children that have different revisions. The
result of the report will use the same technique of reporting
only the resources where a change is found. If a change is
found, the server will provide a URL to the version resource to
fetch for the changed resource. The server will also report the
current revision number.
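The reporting technique (a top-level revision, then only the children whose revision differs) can be sketched in Python. This does not reproduce the XML body of the custom report, which is not specified here; the compact_report helper and the dictionary that stands in for the working copy are hypothetical.

```python
import posixpath


def compact_report(revisions):
    """Return (path, revision) entries: the root, then only items that differ.

    `revisions` maps working-copy paths to revisions; "" is the top level.
    """
    effective = {"": revisions[""]}
    report = [("", revisions[""])]
    # Sorting puts every directory before the items it contains.
    for path in sorted(p for p in revisions if p):
        parent = posixpath.dirname(path)
        while parent not in effective:  # walk up to the nearest known ancestor
            parent = posixpath.dirname(parent)
        if revisions[path] != effective[parent]:
            report.append((path, revisions[path]))
        effective[path] = revisions[path]
    return report


working_copy = {
    "": 10,
    "README": 10,
    "doc": 8,
    "doc/a.txt": 8,
    "sub": 10,
    "sub/file.c": 12,
}
report = compact_report(working_copy)
print(report)
```

In this example only the root, doc, and sub/file.c are reported; doc/a.txt is omitted because it shares the revision already reported for doc.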
The custom report will tie the client to only those servers
which support the report, but a future version of the software
will contain a fallback codepath, a graceful degradation, to
support other DeltaV servers.
When an update is performed, the client will fetch each of the
URLs (using GET requests) provided in the server
response.
GET (and PUT) operations will transfer
content in a "diff" format when possible. The mechanics of this
will follow the Internet Draft, titled
Delta Encoding in HTTP.
Entity Tags (etags)
Etags are required to be unique across all versions of a
resource. Luckily, this
is very easy for a version control system. Each etag will
simply be the repository's node-id for the resource.
Etags are used to generate diffs, following the guidelines in
the aforementioned draft:
Delta Encoding in HTTP.
The problem then becomes how to get the etag for each file
stored on the client (we don't need etags for directories since
we never fetch them). During a checkout or
update process, this is easy: the etag is provided in
the HTTP response headers for each file retrieved.
The other part of the problem is getting the etag after a
commit has occurred. The MERGE response
provides a way to request properties from the version resources
which are created as part of the checkin of the activity. The
etag (and other properties) can be fetched using that mechanism.
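As a rough sketch of how a stored etag could drive a delta-encoded fetch, the Python snippet below builds the request headers and recognizes a delta response. The header names (If-None-Match, A-IM, IM) and the 226 status are taken from RFC 3229, the published form of the Delta Encoding in HTTP draft; the helper names, the "vcdiff" format choice, and the etag value are assumptions, not a description of what the Subversion client sends.

```python
def delta_get_headers(stored_etag, delta_format="vcdiff"):
    """Headers for a GET that asks for a delta against a known instance."""
    return {
        "If-None-Match": f'"{stored_etag}"',  # entity tag of the copy we hold
        "A-IM": delta_format,                 # instance manipulation we accept
    }


def is_delta_response(status, headers):
    """True when the server answered with a delta (226 IM Used)."""
    return status == 226 and "IM" in headers


# "1234" stands in for a node-id recorded during checkout or update.
headers = delta_get_headers("1234")
print(headers)
```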
Tags and Branches
Tags and branches within Subversion are performed by copying
from one area to another. For example:
[.../src/my-project]$ svn cp trunk tags/1.0.3-rc4
[.../src/my-project]$ svn commit
In the above example, tags/1.0.3-rc4 should now be
considered readonly and will always reflect the state of
trunk at the time of the copy.
These copies are handled just like a regular commit. An activity
is created with MKACTIVITY, a
working resource is created via CHECKOUT (for the target directory;
tags/ in our example above), and then a COPY is performed. The activity is then
merged back into the repository with a MERGE request.
Server Requirements
DAV Methods
The server will need to implement the following WebDAV methods
for proper operation:
- OPTIONS
- GET
- PUT
- DELETE
- MKCOL
- COPY
- PROPPATCH
- PROPFIND
- MKACTIVITY
- CHECKOUT
- MERGE
- REPORT
The following methods are not required by Subversion at this
time:
- CHECKIN
- UNCHECKOUT
- UPDATE
- LABEL
- VERSION-CONTROL
- BASELINE-CONTROL
- MKWORKSPACE
DAV Properties
The following DeltaV properties will be implemented:
- DAV:comment
- DAV:creator-displayname
- DAV:supported-method-set
- DAV:supported-live-property-set
- DAV:supported-report-set
- DAV:version-controlled-configuration
- DAV:checked-in
-
DAV:auto-version is a readonly,
empty element (auto versioning not supported).
- DAV:checked-out
-
DAV:predecessor-set
Note: the Subversion design document is not clear on the
mechanics of how multiple predecessors are merged to create
a single, new revision. When this is clarified,
DAV:predecessor-set may end up containing more
than zero or one predecessor URLs.
-
DAV:version-name is simply the
(global) revision number.
- DAV:checkout-fork
- DAV:checkin-fork
- DAV:auto-update
-
DAV:subbaseline-set is a readonly,
empty property (sub-baselines not supported).
-
DAV:unreserved is set to
F.
- DAV:baseline-controlled-collection
- DAV:baseline-collection
-
DAV:subactivity-set is a readonly,
empty property (sub-activities not supported).
-
DAV:eclipsed-set is always empty
(internal members can never be eclipsed).
Contrary to the DeltaV specification, the following required
properties will not be implemented:
OPTIONS
The OPTIONS request will signal that
it supports the following DAV features:
1
2
version-control
checkout
working-resource
merge
baseline
activity
version-controlled-collection
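A client could check these features before relying on them. The Python sketch below assumes the features are advertised as a comma-separated DAV response header, as is usual for WebDAV; the helper names are hypothetical.

```python
# Features listed above, as they would appear in the DAV response header.
REQUIRED_FEATURES = {
    "1",
    "2",
    "version-control",
    "checkout",
    "working-resource",
    "merge",
    "baseline",
    "activity",
    "version-controlled-collection",
}


def parse_dav_header(value):
    """Split a DAV header value into a set of feature tokens."""
    return {token.strip() for token in value.split(",") if token.strip()}


def missing_features(dav_header_value):
    """Return the required features absent from the header, sorted."""
    return sorted(REQUIRED_FEATURES - parse_dav_header(dav_header_value))


print(missing_features("1,2, version-control,checkout"))
```

An empty result means the server advertises everything listed in this section.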
Reports
The DAV:supported-report-set property
will signal support for the following reports:
- svn:update-report
- svn:log-report
These reports are available only on the "public" resources (the
VCRs). They are not available on the resources within the
$svn/ area.
Notes, reminders
Discuss timeouts and auto-purge of activities (and the related
working resources).
Discuss the activity database maintained by mod_dav_svn.
Discuss other implementation details of ra_dav and mod_dav_svn.
Several times, people have asked, "Why choose
HTTP/WebDAV/DeltaV? That seems awfully bloated and
ill-suited. Why didn't you design a custom, well-tuned protocol?
Or maybe use the CVS protocol?" Listed below are a number of
reasons for our choice of WebDAV as our network protocol.
While this list could certainly be expanded with more reasons
(and to be fair, with a list of reasons why WebDAV was a poor
choice), it certainly demonstrates the basic reasons for our
choice.
Note: this list came from an email note, so the tone and point
of view might be a bit off. Further word-smithing is
welcome...
Builtin web browsing of the repository
For example, take a
look at:
http://svn.collab.net/repos/svn/trunk/README
(that's the HEAD right there; we also have URLs for every
previous revision of every file)
DAV-based browsing
Use Web Folders or WebDrive or somesuch on
your Windows box (or Windows XP's native DAV mounts) to browse
the SVN repository with Windows Explorer. Mac OS X has builtin
DAV server mounting. Nautilus has DAV capabilities. Then you
have your Open Source tools such as cadaver, Goliath, etc.
People can use existing libraries
I couldn't even begin to count the number of HTTP tools and
libraries available. If we had designed our own protocol, then
we would have /none/ of those benefits. Heck, two HTTP library
implementors (Joe Orton of Neon, and Daniel Stenberg of CURL)
are regulars here. We wouldn't get that benefit. I've used
Python's httplib (and a davlib of my own) to do a lot of testing
of our server. No need to go and roll new protocol libraries.
Existing tools
One word: Ethereal :-) When we capture network traces, Ethereal
already knows about HTTP. It's quite nice, but I know there are
even better ones out there. But we also have other tools like
squid and other (caching) proxies (see the next item).
Caching proxies
Subversion will work great with caching proxies. There is no
longer a need for specialized tools like "cvsup". Just drop in a
caching proxy, and you've already got your distributed read-only
repository. That European dev team can just drop in the cache
between them and the US server and their checkouts/updates will
get cached for the benefit of the other team members. Commits
will flow through, back to the US-based server.
Sophisticated and broad-choice authentication
We don't have to reimplement an authentication scheme for a new
protocol. We can use all of the various schemes that have been
defined for HTTP. Ever look at the CVS protocol? Ever see the "I
Love You" or "I Hate You" lines? :-) That is all part of
creating a new authentication scheme. But we get to use SSL and
certificate-based auth if we want. Kerberos. NTLM. Or even just
simple Basic or Digest. And our users can come from text files,
database, LDAP, or PAM. We don't have to reinvent the wheel cuz
it is all available for Apache already.
Awesome network server
We don't have to worry about how to portably set TCP_CORK for
optimal network packets. We don't have to worry about when
sendfile() makes sense, or if it is available. We don't have to
worry about dropped client connections, how to best use threads
and processes to scale, request management, monitoring, logging,
etc. Apache gives us all of that and a ton more. I *really*
would not want to do that through xinetd. I mean... setting
TCP_CORK on stdout? freaky :-)
Well-defined on-wire compression
We already have on-wire compression, similar to CVS's "-z#"
switch. And we didn't do anything. The client library and server
that we use just support it automatically for us, according to
RFC 2616.
Future interoperability
In the future, we'll be able to interoperate with a multitude of
IDEs and other WebDAV/DeltaV clients. As DeltaV becomes more
prevalent, IDEs could very well use it for source code
management, and we'll be right there without needing to write
some MS/SCC library to interface to the tool.
Greg Stein
Last modified: Fri Jan 25 12:54:20 PST 2002