Issue 525

Issue #: 525   Platform: Other   Reporter: sussman (Ben Collins-Sussman)
Component: subversion   OS: other  
Subcomponent: src   Version: all   CC:
Status: NEW   Priority: P4  
Resolution:   Issue type: FEATURE  
Target milestone: unscheduled
Assigned to: issues@subversion
URL:
Summary: allow working copies without text-base/
Status whiteboard:
Keywords:
Attachments:
Issue 525 depends on: 2539, 3357
Issue 525 blocks:

Description: Opened: Thu Oct 11 10:25:00 -0800 2001

Someday, it should theoretically be possible to improve Subversion so that it does
*not* rely on the existence of cached, pristine files in .svn/text-base/.  It's
a big change, though.  There are comments throughout libsvn_wc that mention the
option.

------- Additional comments from Ben Collins-Sussman Thu Oct 11 10:26:20 -0800 2001 -------

post-1.0

------- Additional comments from Julien Mercay Tue Nov 18 11:52:23 -0800 2003 -------

This issue makes it hard to use subversion to store anything but small files.
For example, one might want to store mp3 or digital photo files, where the
overhead of storing the text-base is too high.

A property could tell svn the maximum size of files to keep in the text-base.

------- Additional comments from Karl Fogel Tue Nov 18 12:08:39 -0800 2003 -------

Many people use Subversion to store large files right now.  This issue makes it
hard for people with small (or slow) disks to version large files.  For now, our
answer is "buy a bigger disk", not because we're coldhearted techno-elitists
:-), but because making the text-base optional is a non-trivial task.  It's not
just a matter of defining a new property, it's a rewrite of much of libsvn_wc.

------- Additional comments from Oswald Buddenhagen Sat Jul 30 00:05:22 -0800 2005 -------

now that max has this as his SoC project, there is hope. :-)  
 
one idea that comes to mind is hard-linking the user-visible files to  
text-base. an editor that does "safe saving" (create a new file and then  
rename it onto the old one) would handle edits the right way automatically. 
and if the text-base files get screwed up - hey, that's what hashes are for. 
just re-fetch the text-base in this case.  
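A minimal Python sketch of why "safe saving" leaves a hard-linked text-base intact, assuming a filesystem with hard-link support. The file names and helper functions are hypothetical illustrations, not part of libsvn_wc:

```python
from __future__ import annotations

import hashlib
import os
import tempfile


def md5_of(path: str) -> str:
    """Return the hex MD5 digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def safe_save(path: str, data: bytes) -> None:
    """Editor-style safe save: write a new file, then rename it over the old one.

    The rename replaces only the directory entry, so any other hard link to
    the old inode (here, the text-base) keeps the old contents.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)


def demo(workdir: str) -> tuple[bytes, bytes, bool]:
    working = os.path.join(workdir, "hello.txt")
    text_base = os.path.join(workdir, "hello.txt.svn-base")  # hypothetical name
    with open(working, "wb") as f:
        f.write(b"original\n")
    os.link(working, text_base)      # one inode, two names: no extra data blocks
    recorded = md5_of(text_base)     # checksum recorded at checkout time

    safe_save(working, b"edited\n")  # the editor replaces the working file

    with open(working, "rb") as f:
        work_data = f.read()
    with open(text_base, "rb") as f:
        base_data = f.read()
    # The text-base still matches its recorded checksum, so it remains usable.
    return work_data, base_data, md5_of(text_base) == recorded
```

Run against a temporary directory, `demo(d)` returns `(b'edited\n', b'original\n', True)`. An editor that rewrites the file in place would change both names at once; the checksum comparison would then fail, which is exactly the case where the text-base has to be re-fetched.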
 
another *major* problem with the wc is its hunger for inodes and single 
blocks. this might be no problem for reiserfs, but you know, there are still 
other fs around and often the users are not in the position to change this. 
oh, well, i'm sure you heard this a few thousand times already. 
 

------- Additional comments from Branko Cibej Sat Jul 30 02:44:41 -0800 2005 -------

All the world is not Unix. We need a portable solution, and hard links aren't.

------- Additional comments from Peter N. Lundblad Mon Sep 19 00:48:22 -0800 2005 -------

*** Issue 2407 has been marked as a duplicate of this issue. ***

------- Additional comments from Peter N. Lundblad Fri Mar 10 12:50:23 -0800 2006 -------

*** Issue 2520 has been marked as a duplicate of this issue. ***

------- Additional comments from Oswald Buddenhagen Fri Jun 9 12:32:56 -0800 2006 -------

some more thoughts:
it is no big loss if the optimal solution is not achievable on inferior
platforms - after all, it is just an optimization.
possible solutions ordered by decreasing usefulness:
- hardlinks with os-based COW semantics. this is perfect, because the user
cannot screw up. not sure any system supports this today; at least there was a
lot of talk on the linux kernel list.
- hardlinks with manual COW semantics. if the user (or rather, his editor)
screws up (we know this, as we have an md5 of text-base), discard the text-base
and fall back to the next method. as the fall-back is foreseen anyway, this
entire handling can be simply skipped on inferior file systems (and #ifdef'd on
inferior os).
- if no valid text-base exists, re-fetch it. this will make the first diff after
a modification slow, but subsequent diffs will be fast. as usually only a few
files are modified, the bandwidth and storage impact should be reasonable. this
is also where compressed text-bases come into play. for those with really small
disks but too much bandwidth, the text-base could be thrown away again, of
course. and for the opposite side, permanent detached (not hardlinked) wcs
should still be possible as well; this should be adjustable at any time per
file, i guess.
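The fallback in the last item can be sketched as follows, assuming an MD5 recorded per file and a hypothetical `fetch` callback standing in for a repository request. A damaged text-base is unlinked rather than overwritten, since it may still be a hard link to the user's edited file:

```python
from __future__ import annotations

import hashlib
import os
from typing import Callable


def ensure_text_base(text_base: str, expected_md5: str,
                     fetch: Callable[[], bytes], keep: bool = True) -> bytes:
    """Return the pristine contents, re-fetching them if the local copy is bad.

    `fetch` is a hypothetical stand-in for retrieving the pristine file from
    the repository; `keep=False` models throwing the text-base away again.
    """
    if os.path.exists(text_base):
        with open(text_base, "rb") as f:
            data = f.read()
        if hashlib.md5(data).hexdigest() == expected_md5:
            return data                  # valid local text-base: fast path
        # Checksum mismatch (e.g. a hard link edited in place): discard it.
        # Unlinking, rather than overwriting, leaves the user's file untouched.
        os.remove(text_base)
    data = fetch()                       # slow path: ask the repository
    if hashlib.md5(data).hexdigest() != expected_md5:
        raise ValueError("fetched pristine does not match the recorded checksum")
    if keep:
        with open(text_base, "wb") as f:  # cache it so later diffs are fast
            f.write(data)
    return data
```

Only the first diff after a bad or missing text-base pays for the fetch; later calls take the fast path.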

please make sure this year's designated soc dude gets to read this. :)

------- Additional comments from Christian Westgaard Mon Sep 18 05:14:21 -0800 2006 -------

I store build environments, rootstraps, toolchains, SDKs, libraries, etc. in
svn. Often these are several GB, so keeping pristine copies fills up the
disk holding the working copy quite fast, and makes check-in and checkout
large and slow operations. I rarely, if ever, change the binary files.

It would be great if I could at least disable pristine copies of binary files.

The repository is on a LAN anyway, so fetching a missing pristine file on
check-in is a small loss compared to the massive disk cost of keeping pristine
copies...

And oh, /dev files/nodes cannot be stored in svn (I have to tarball them).
It would be nice to be able to check out fully functional NFS rootstraps...

------- Additional comments from Lieven Govaerts Thu Sep 21 01:53:20 -0800 2006 -------

*** Issue 2610 has been marked as a duplicate of this issue. ***

------- Additional comments from Alessio Fri Sep 22 02:35:23 -0800 2006 -------

Just a further use-case to stress how serious this issue is:

Very often I need to recursively search/replace several files of the same type.
If I'm not extremely careful with the file selection pattern, the search/replace
will corrupt the .svn directories. This scenario plays out over and over
again. It's an everyday occurrence in mid-to-large projects, and the current .svn
directory mechanism is very vulnerable to it.

------- Additional comments from Alessio Fri Sep 22 02:39:59 -0800 2006 -------

Apart from the solutions identified above by Oswald Buddenhagen, you might want
to consider making Berkeley DB a mandatory component of svn and using a local
"cache database" that the svn client would query before going to the central
remote repository.
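A rough sketch of such a read-through cache, using Python's bundled `sqlite3` module in place of Berkeley DB; the table layout, the `(path, revision)` key, and the `fetch_remote` callback are assumptions made for illustration:

```python
import sqlite3
from typing import Callable


class PristineCache:
    """Read-through cache of pristine file contents, keyed by (path, revision).

    SQLite stands in for the Berkeley DB suggested above.
    """

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pristine ("
            " path TEXT NOT NULL, rev INTEGER NOT NULL, data BLOB NOT NULL,"
            " PRIMARY KEY (path, rev))"
        )

    def get(self, path: str, rev: int,
            fetch_remote: Callable[[str, int], bytes]) -> bytes:
        row = self.conn.execute(
            "SELECT data FROM pristine WHERE path = ? AND rev = ?", (path, rev)
        ).fetchone()
        if row is not None:
            return row[0]                   # local hit: no network round trip
        data = fetch_remote(path, rev)      # miss: ask the central repository
        with self.conn:                     # commits the insert on success
            self.conn.execute(
                "INSERT INTO pristine (path, rev, data) VALUES (?, ?, ?)",
                (path, rev, data),
            )
        return data
```

Pristine contents for a given path and revision never change, which is what makes caching them locally safe.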

------- Additional comments from Talden Mon May 14 15:17:09 -0800 2007 -------

I wonder if one step towards reducing the text-bases problem might be to
recognise that a checkout often contains files that are not intended for
modification and allow us to mark those files/folders as such.

If we added an "svn:readonly" property for the client to consume, it could be
interpreted as:

1. Don't store a pristine copy for this file (files if the property is on a folder)
2. Don't report (or possibly even avoid looking for) modifications here.
3. Error during add and remove if these will be ignored.
*. If we're told to override this property during a checkout, update, status,
add or remove then the working copy should behave as it does now.
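One possible reading of these rules as code, treating `svn:readonly` as the hypothetical property proposed here (it is not an existing Subversion property) and letting a property on a folder apply to everything below it:

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath

READONLY_PROP = "svn:readonly"  # proposed in this comment, not a real property


@dataclass(frozen=True)
class Policy:
    store_pristine: bool
    check_modifications: bool
    allow_add_remove: bool


def effective_readonly(path: str, props: dict[str, dict[str, str]]) -> bool:
    """True if `path` or any ancestor folder carries the property."""
    p = PurePosixPath(path)
    for node in (p, *p.parents):
        if READONLY_PROP in props.get(str(node), {}):
            return True
    return False


def policy_for(path: str, props: dict[str, dict[str, str]],
               override: bool = False) -> Policy:
    if override or not effective_readonly(path, props):
        return Policy(True, True, True)        # *. behave as the wc does now
    return Policy(store_pristine=False,         # 1. no pristine copy
                  check_modifications=False,    # 2. don't look for modifications
                  allow_add_remove=False)       # 3. add/remove is an error
```

For example, with `props = {"vendor": {"svn:readonly": "*"}}`, every file under `vendor/` gets no pristine copy unless the override is given.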

This would enable all of those repository contents that are not expected to
change to be checked out without their pristine copy and possibly even avoid
some processing as a potential benefit.

We already have this concept with svn:ignore (including the ability to override it).

This wouldn't get in the way of future solutions that compress or avoid pristine
copies for other use-cases and could still be a useful construct in the event
that the approach to building a working copy moves all book-keeping up and out
of the individual folders.

------- Additional comments from Chris Frost Thu Oct 25 21:59:19 -0800 2007 -------

As an interim solution for situations where storing both the pristine and
working copies is an issue and where the two copies are identical for most files
most of the time: the scord overlay file system may be useful. (Note of bias: I
am the scord author.) scord allows programs to treat a working copy as they do
now, but, behind the scenes, detects when only one of the two copies is
necessary to avoid storing two of each file on disk. Website:
http://scord.sourceforge.net/

------- Additional comments from Christian Westgaard Wed Jan 16 06:06:48 -0800 2008 -------

This MIGHT be another workaround.

Keep all pristine copies as 0-byte files, in order to work around an error message:
find . -type f -path '*/.svn/text-base/*' -exec sh -c 'for f; do : > "$f"; done' sh {} +

Remove checksums to allow checkin:
find . -type f -path '*/.svn/entries' -exec sed -i '/^.*checksum=".*$/d' {} +

I'm unsure how this affects delta on the repository side.

------- Additional comments from David Weintraub Tue Apr 28 09:01:25 -0800 2009 -------

While discussing theory here...

If Subversion is modified to make the .svn/text-base optional, can the rest of
the .svn directory be removed too? I know that this contains the properties, and
server information, but the .svn directories do cause all sorts of issues with
tools such as "find" or using a working directory as a web directory.

------- Additional comments from Bob Thule Thu Aug 20 08:02:45 -0800 2009 -------

Even though hard drives are generally cheap, this issue is still
important.  The lack of this feature is the one major obstacle to using Subversion
for anything other than source code versioning.

Besides keeping a lightweight copy and sending full files to the server, another
option is client-side locking.  The process of unlocking a file on the
client would copy the file to text-base.
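A hypothetical sketch of that locking model (not Subversion's existing `svn lock` command), assuming a locked file is kept read-only with no text-base and that unlocking snapshots it first; the function names and the refusal to lock a modified file are illustrative choices:

```python
import filecmp
import os
import shutil
import stat

_WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def lock(path: str, text_base: str) -> None:
    """Lock a file: make it read-only and drop its pristine copy."""
    if os.path.exists(text_base):
        if not filecmp.cmp(path, text_base, shallow=False):
            # Locally modified: dropping the pristine would lose the diff base.
            raise RuntimeError(f"{path} has local modifications; commit or revert first")
        os.remove(text_base)
    os.chmod(path, os.stat(path).st_mode & ~_WRITE_BITS)


def unlock(path: str, text_base: str) -> None:
    """Unlock a file for editing: snapshot it into the text-base first."""
    shutil.copyfile(path, text_base)   # a pristine exists only while unlocked
    os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)
```

Files that are never unlocked, such as an archive of photos, never cost a second copy on disk.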

I personally would love to use subversion for my photos, home movies, music, and
so on.  Currently, subversion is just really bad at handling files that do not
change often or at all.  I hope that now that subversion has matured, the
developers will revisit this and open subversion up to new markets:  home backup
and multiple-computer synchronization.

------- Additional comments from Hyrum K. Wright Thu Aug 20 08:11:32 -0800 2009 -------

One of the goals of the current work on WC-NG is to enable this feature.  I
won't get into all the technical details here, but suffice it to say that as
designed, the current working copy library cannot handle this.  So, we're
writing a new one, with the expectation and goal that it will make implementing
this feature possible.

It might be a while yet, but we're working on it!

------- Additional comments from Stefan Sperling Thu Aug 20 08:26:20 -0800 2009 -------

While using Subversion as a file distribution and synchronization tool
works for many people in practice (and not so much for some) I'd like to
point out this paragraph at the beginning of the subversion book:
http://svnbook.red-bean.com/nightly/en/svn.intro.whatis.html#svn.intro.righttool

Keep in mind that we're here primarily to write a version control system for
use by software developers. If people use Subversion for other things,
that's fine, but it should be clear that any such use is not a strict priority
for us and should not guide the design and development of Subversion.

That said, I don't think that the option of omitting text-bases hurts the
software development use case, and may even be needed in some software
development scenarios, so this is a valid enhancement. But not because it
will make it easier for people to use Subversion to copy photos to their
webserver or use Subversion to make backups.

------- Additional comments from Vincent Lefevre Thu Aug 20 11:19:29 -0800 2009 -------

The book says:

  If you need to archive old versions of files and directories, possibly
  resurrect them, or examine logs of how they've changed over time, then
  Subversion is exactly the right tool for you.

The fact that a file doesn't change often doesn't mean that it doesn't
change at all. For instance, one may want to modify the metadata of photos, so
it may be a good idea to use Subversion for photos too, and it is the right tool
according to the book.

I also use Subversion for web sites, where some files often change, but some
other files, which may be huge, don't usually change (e.g., for a web site
associated with software development, this can be tarballs). Using Subversion
for all files is more practical, and even almost necessary to track the history
of the web site as a whole.

------- Additional comments from Stefan Sperling Thu Aug 20 11:51:45 -0800 2009 -------

Vincent, that's a fine use case. I was referring to the comment saying
Subversion should expand into the realm of "home backup and multiple-computer
synchronization." Versioning the code/data/html etc. for a website should
be considered part of what I called "software development" in my comment above.

Just how the stuff gets deployed onto the webserver is a different story.
There is "svn export" for this, and then any synchronization tool can be used
to upload the data. I know some people keep working copies on their web servers
and use 'svn update' to sync their websites, and that's OK as long as it
works, but if there happen to be local changes and they get conflicts, well,
that's their problem.

------- Additional comments from Greg Stein Tue May 18 10:01:04 -0800 2010 -------

As Hyrum noted, the working copy library (as a whole) expects to always find these "pristine" copies of 
the files. We'll eventually get that fixed, but for right now we're rejiggering how all of the data is stored 
to make it easier/possible for us to add these new features.

We are scheduling this work for the 1.7 release of Subversion. It will still keep the pristines, but there 
will be only one .svn subdirectory at the root of your working copy (which mostly addresses the concern 
about having those .svn subdirs in every directory).

Subversion 1.8 (or later?) may be able to omit the pristines. We aren't scheduling the features that far 
out right now, but I can say "it won't be in 1.7, so it must therefore be 1.8 or later".

------- Additional comments from H. Stein Tue Jan 18 01:23:18 -0800 2011 -------

Hi,

I just read this and I would like to contribute the following idea; maybe you can
make use of it.

1) I fully agree that the .svn workarea metadata (wa) sometimes breaks apps that
are not dm-aware, so making it hidden is a good goal.
2) Reducing this to one directory is also good, but it will lead to upward
searching for the root .svn dir of the current directory, *I THINK*.
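The upward search in point 2 costs one directory lookup per ancestor. A minimal sketch, assuming the admin directory is named `.svn` and nothing else marks the working-copy root:

```python
from __future__ import annotations

from pathlib import Path


def find_wc_root(start: str, admin_dir: str = ".svn") -> Path | None:
    """Return the nearest directory at or above `start` containing `admin_dir`."""
    current = Path(start).resolve()
    for candidate in (current, *current.parents):  # start dir, then each ancestor
        if (candidate / admin_dir).is_dir():
            return candidate
    return None
```

From `prj1/src/lib`, this returns `prj1` after checking three directories, which is why a single root-level .svn directory is cheap to locate.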

=> Conclusion for me:

1) remove .svn completely from the workarea
2) keep it as is (OK, features like getting rid of the text-base copy are still open)
3) store it as configuration (I explain below)

Imagine the svn-wa implementation looked like this:

# config/var dirs
/home/<user>/.subversion/config.wa      file with the wa-configuration
/home/<user>/.subversion/wa-metadata    default directory for wa-metadata
# wa-dirs
/home/<user>/projects/prj1              root of a svn wa, without any .svn

How do we connect to the wa-metadata?

In config.wa there would be:

[/home/<user>/projects/prj1]
wa-metadataroot=/home/<user>/.subversion/wa-metadata/<rep-uuid>

and below /home/<user>/.subversion/wa-metadata/<rep-uuid> would be
all the .svn metadata, using the same directory layout that is currently
used inside the wa.

OK, looking further into the future...
I can also imagine that config.wa could be very useful, e.g.:

[/home/<user>/projects/prj1]
wa-metadataroot=/home/<user>/.subversion/wa-metadata/<rep-uuid>
persistant-copy=[yes|no]
keep-persistant-obj-size-limit=100k
fetch-persistant-copy-if-change-detected-from-server-with-next-update=[yes|no]

and so on... 
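A sketch of how a client might consult such a config.wa, assuming INI syntax as written above, concrete values in place of the `[yes|no]` placeholders, and binary units for sizes like `100k`; all of this is hypothetical. The longest matching section wins, so no `.svn` directory is needed to locate the metadata:

```python
from __future__ import annotations

import configparser
from pathlib import PurePosixPath

# Assumed meaning of size suffixes: binary multiples (100k = 100 * 1024 bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    text = text.strip().lower()
    unit = text[-1] if text[-1:] in ("k", "m", "g") else ""
    return int(text[: len(text) - len(unit)]) * _UNITS[unit]


def lookup(config_text: str, path: str) -> configparser.SectionProxy | None:
    """Find the config.wa section for the working copy containing `path`."""
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    target = PurePosixPath(path)
    best: str | None = None
    for name in cfg.sections():
        root = PurePosixPath(name)
        if root == target or root in target.parents:
            # Prefer the deepest root, so nested working copies resolve correctly.
            if best is None or len(root.parts) > len(PurePosixPath(best).parts):
                best = name
    return cfg[best] if best is not None else None


def keep_pristine(section: configparser.SectionProxy, file_size: int) -> bool:
    """Decide whether a file of `file_size` bytes gets a pristine copy."""
    if not section.getboolean("persistant-copy", fallback=True):
        return False
    limit = section.get("keep-persistant-obj-size-limit")
    return limit is None or file_size <= parse_size(limit)
```

With `keep-persistant-obj-size-limit=100k`, a 50 KiB file keeps its pristine copy and a 200 KiB file does not.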

I am certainly available for further discussion.
Kind regards,
H.Stein ;)