Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author gstein
Full name Greg Stein
Date 2009-11-12 09:02:47 PST
Message On Thu, Nov 12, 2009 at 09:57, Julian Foad <julian.foad@wandisco.com> wrote:
>...
>> The intent is that a relpath is always attached to some root. For
>> Windows that could be "\". For a URL, that could be
>> "http://hostname/". Whatever. But never ever a leading slash.
>
> I hope we're in agreement that a relpath can not only be relative to
> that kind of minimal "root", but can also be relative to a repository
> root such as "http://hostname/dir1/dir2/" or to a WC absdir such as
> "/home/julian/wc" or to another relpath such as "subversion/libsvn_wc",
> in fact relative to any URI or dirent or relpath.

Yup. A relpath can be joined onto any of the three.
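
A minimal sketch of that joining rule, for illustration only. The helper names is_relpath() and join_relpath() are hypothetical and are not the svn_dirent_uri.h functions; rejecting a trailing slash is an extra assumption that the thread does not state.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if PATH looks like a relpath in the sense
 * used in this thread, i.e. it has no leading slash.  Rejecting a trailing
 * slash is an additional assumption.  The empty string is accepted. */
static int
is_relpath(const char *path)
{
  size_t len = strlen(path);

  if (len == 0)
    return 1;
  return path[0] != '/' && path[len - 1] != '/';
}

/* Hypothetical helper: join RELPATH onto BASE, which may be a URI, a dirent
 * or another relpath, writing the result into OUT.  Returns 0 on success
 * and -1 if OUT is too small. */
static int
join_relpath(char *out, size_t out_size, const char *base, const char *relpath)
{
  size_t base_len = strlen(base);
  int n;

  assert(is_relpath(relpath));

  if (base_len == 0)
    n = snprintf(out, out_size, "%s", relpath);
  else if (relpath[0] == '\0')
    n = snprintf(out, out_size, "%s", base);
  else if (base[base_len - 1] == '/')
    n = snprintf(out, out_size, "%s%s", base, relpath);
  else
    n = snprintf(out, out_size, "%s/%s", base, relpath);

  return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The same call works whether BASE is "http://hostname/dir1/dir2/", "/home/julian/wc" or "subversion", which is the point of keeping relpaths free of leading slashes.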

>...
>>  As I noted above, we also want them to *always* be
>> absolute. The codebase is pretty darned close to allowing for that.
>> Also note that the svn_uri_* functions are new in 1.7, so we can
>> define them with this restriction.
>
> You explained in follow-up emails that it's useful to be sure that we're
> always using abs URIs in these APIs. That's an OK position to take, and

Another thing that I just thought of: a relative URI is not useful.
You always have to join it with something. By taking the position of
"always absolute", then it is "always useful" and you don't have to
search your context for something to join it with.
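
As a rough illustration of what an "always absolute" precondition could look like, here is a sketch of a check that a string starts with "scheme://". The function name uri_is_absolute() is hypothetical, and requiring "//" after the colon is an assumption that fits the URLs discussed here rather than the full RFC 3986 grammar.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical check: returns 1 if URI begins with "scheme://".
 * The scheme syntax follows RFC 3986: a letter followed by letters,
 * digits, '+', '-' or '.'.  Requiring "//" is an assumption made for
 * the URL-like cases in this thread. */
static int
uri_is_absolute(const char *uri)
{
  const char *p = uri;

  assert(uri != NULL);

  if (!isalpha((unsigned char)*p))
    return 0;

  for (p++; *p != '\0' && *p != ':'; p++)
    if (!isalnum((unsigned char)*p) && *p != '+' && *p != '-' && *p != '.')
      return 0;

  return strncmp(p, "://", 3) == 0;
}
```

A function that only accepts absolute URIs could assert this on entry, so callers never need to look for something to join the argument with.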

> I'm not against it, but if we do that then I'm sure we will also need a
> few API functions to handle relative URIs. That's fine, and may be the

Yup.

>...

Cheers,
-g

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author Julian Foad <julian dot foad at wandisco dot com>
Full name Julian Foad <julian dot foad at wandisco dot com>
Date 2009-11-12 06:57:33 PST
Message On Wed, 2009-11-11 at 15:36 -0500, Greg Stein wrote:
> On Wed, Nov 11, 2009 at 07:25, Julian Foad <julian.foad@wandisco.com> wrote:
> >...
> > RELPATH:
> >
> > A "relpath" represents "an unrooted path that can be joined to any other
> > relative path, uri or dirent". Good, but let's specify it more
> > precisely. The terms "absolute" and "relative" are not clearly defined
> > when applied to partially-relative paths such as a Windows
> > "\rel-to-current-drive" or a URL "/rel-to-server".
>
> No. The intent is to NOT have any leading slashes. A relpath should be
> "path/to/some/place".

Totally agree. That was and is my intent.

> Change the docstring accordingly.
>
> The intent is that a relpath is always attached to some root. For
> Windows that could be "\". For a URL, that could be
> "http://hostname/". Whatever. But never ever a leading slash.

I hope we're in agreement that a relpath can not only be relative to
that kind of minimal "root", but can also be relative to a repository
root such as "http://hostname/dir1/dir2/" or to a WC absdir such as
"/home/julian/wc" or to another relpath such as "subversion/libsvn_wc",
in fact relative to any URI or dirent or relpath.

> Bert has been making a bunch of changes throughout the codebase in
> order to make the URI APIs take *only* absolute URIs. Never a relative
> URI.

Sure - see below.

> > URLS:
> >
> > * Define and name functions for URLs instead of URIs.
>
> Nope.
>
> I spoke at length, last week, with Roy Fielding [...]

OK, those arguments are good enough for me.

> > * The representation of a URL should be always URI-encoded.
>
> Yah. That's how we treat them, in general, but having it declared that
> way would be good.

Great.

> As I noted above, we also want them to *always* be
> absolute. The codebase is pretty darned close to allowing for that.
> Also note that the svn_uri_* functions are new in 1.7, so we can
> define them with this restriction.

You explained in follow-up emails that it's useful to be sure that we're
always using abs URIs in these APIs. That's an OK position to take, and
I'm not against it, but if we do that then I'm sure we will also need a
few API functions to handle relative URIs. That's fine, and may be the
best way to go. The relative-URI API need only consist of a very few
functions, and could be aimed only at converting relative URIs to
absolute URIs (and maybe vice versa), or maybe only at converting
relpaths to and from relative URIs, whereas the abs-URI API consists of
very many functions.


> Outside of the above... *shrug*. Seems fine.

Thanks.
- Julian

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author rdonch
Full name Roman Donchenko
Date 2009-11-11 16:44:20 PST
Message Branko Cibej <brane at xbc dot nu> wrote in his message of Thu, 12 Nov 2009
01:26:22 +0300:

> Why not just drop the must-be-absolute requirement? Those are perfectly
> valid URIs, as per spec; they simply lack the scheme and server part.

Well, if we're talking about the spec (which, I assume, is RFC 3986), then
they are in fact not URIs, but relative URI references.

> (BTW, URI-encoding them internally is going to cause no end of screaming
> horrors. My crystal ball has spoken.)

Well, if there are separate sets of functions for handling relative OS
paths and relative URI references and there are conversion functions which
do the necessary escaping/unescaping, then nothing can go wrong. ;=]

It's possible that there will be problems with different representations
of the same URI reference, so a canonical representation might be needed.
I'm not familiar enough with the internals to judge whether it's actually
a problem.
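
To make the idea of a conversion pair concrete, here is a sketch of escaping and unescaping between a plain relative path and a percent-encoded relative URI reference. Both function names are hypothetical. Always emitting uppercase hex digits is one way to get a single canonical spelling; RFC 3986 recommends uppercase for consistency.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>   /* size_t */

/* RFC 3986 unreserved characters, plus '/' which is kept as a separator.
 * Assumes the default "C" locale for isalnum(). */
static int
is_literal(unsigned char c)
{
  return isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~' || c == '/';
}

static int
hex_value(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

/* Hypothetical: percent-encode PATH into OUT.
 * Returns the encoded length, or -1 if OUT is too small. */
static int
path_to_uri_ref(char *out, size_t out_size, const char *path)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t n = 0;

  assert(out != NULL && path != NULL);
  for (; *path != '\0'; path++)
    {
      unsigned char c = (unsigned char)*path;

      if (is_literal(c))
        {
          if (n + 1 >= out_size) return -1;
          out[n++] = (char)c;
        }
      else
        {
          if (n + 3 >= out_size) return -1;
          out[n++] = '%';
          out[n++] = hex[c >> 4];
          out[n++] = hex[c & 0x0F];
        }
    }
  if (n >= out_size) return -1;
  out[n] = '\0';
  return (int)n;
}

/* Hypothetical: decode percent-escapes in URI into OUT.
 * Returns the decoded length, or -1 on a malformed escape or overflow. */
static int
uri_ref_to_path(char *out, size_t out_size, const char *uri)
{
  size_t n = 0;

  assert(out != NULL && uri != NULL);
  for (; *uri != '\0'; uri++)
    {
      char c = *uri;

      if (c == '%')
        {
          int hi = hex_value(uri[1]);
          int lo = (hi < 0) ? -1 : hex_value(uri[2]);

          if (lo < 0) return -1;
          c = (char)(hi * 16 + lo);
          uri += 2;
        }
      if (n + 1 >= out_size) return -1;
      out[n++] = c;
    }
  if (n >= out_size) return -1;
  out[n] = '\0';
  return (int)n;
}
```

Keeping the two representations in separate functions, with explicit conversion at the boundary, is what prevents a path from being escaped twice or not at all.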

Roman.

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author gstein
Full name Greg Stein
Date 2009-11-11 14:32:46 PST
Message On Wed, Nov 11, 2009 at 17:26, Branko Čibej <brane at xbc dot nu> wrote:
> Greg Stein wrote:
>> On Wed, Nov 11, 2009 at 17:07, Greg Stein <gstein at gmail dot com> wrote:
>>
>>> On Wed, Nov 11, 2009 at 16:26, Branko Čibej <brane at xbc dot nu> wrote:
>>>
>>>> Greg Stein wrote:
>>>>
>>>>>> * The representation of a URL should be always URI-encoded.
>>>>>>
>>>>>>
>>>>> Yah. That's how we treat them, in general, but having it declared that
>>>>> way would be good. As I noted above, we also want them to *always* be
>>>>> absolute. The codebase is pretty darned close to allowing for that.
>>>>> Also note that the svn_uri_* functions are new in 1.7, so we can
>>>>> define them with this restriction.
>>>>>
>>>>>
>>>> Oh hum. That reminds me of my recent changes in svndumpfilter on this
>>>> very topic. Svndumpfilter uses "repository-absolute" paths, that is,
>>>> paths within the versionable filesystem that always have a leading /.
>>>> Clearly those are not dirents; nor are they relpaths; nor, by your
>>>> definition above, are they URIs to the intent of the svn_uri API.
>>>> They're not URI-encoded, either.
>>>>
>>>> Which leaves me scratching my head, wondering which of the three
>>>> inapplicable families of functions svndumpfilter should be using.
>>>>
>>> FS is the odd man out. The leading-slash paths don't fit well with
>>> much of anything.
>>>
>>> It would be nice if it used a relpath [from the root].
>>>
>>
>> To expand a bit more...
>>
>> Bert and I discussed this a few times. Because it isn't a relpath, and
>> it isn't a dirent, that is why we use the URI functions for FS paths.
>> But once Bert switches on the "must be absolute" bit, then everything
>> will fall over. I dunno what his plan was for the FS (he's been
>> updating stuff throughout the client, wc, and RA layers).
>>
>> We never came up with a good solution. Sigh.
>>
>
> Why not just drop the must-be-absolute requirement? Those are perfectly
> valid URIs, as per spec; they simply lack the scheme and server part.

That's what we have today. But I think it would be good to have less
"oh, but wait. it is tuesday, so that is only relative. not absolute."
... less variability can be quite handy.

That leading slash in the FS is superfluous. It's a constant. rm it, I say.
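
A small sketch of what dropping that constant slash amounts to. The names fspath_to_relpath() and relpath_to_fspath() are hypothetical helpers for illustration, not existing Subversion functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>   /* size_t */

/* Hypothetical: view a repository-absolute FS path such as "/trunk/README"
 * as a relpath from the repository root ("trunk/README").  The root "/"
 * maps to the empty relpath.  Returns a pointer into FSPATH. */
static const char *
fspath_to_relpath(const char *fspath)
{
  assert(fspath[0] == '/');
  return fspath + 1;
}

/* Hypothetical inverse: prepend the constant slash again.
 * Returns 0 on success and -1 if OUT is too small. */
static int
relpath_to_fspath(char *out, size_t out_size, const char *relpath)
{
  int n;

  assert(relpath[0] != '/');
  n = snprintf(out, out_size, "/%s", relpath);
  return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Because the slash carries no information, the conversion is lossless in both directions, and the relpath form can be joined onto a repository root URI directly.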

> (BTW, URI-encoding them internally is going to cause no end of screaming
> horrors. My crystal ball has spoken.)

Bah. Sucker's bet. You can come back and say "I told you so", or
nobody will ever remember you made this statement.

Cheers,
-g

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author brane
Full name Branko Cibej
Date 2009-11-11 14:26:27 PST
Message Greg Stein wrote:
> On Wed, Nov 11, 2009 at 17:07, Greg Stein <gstein at gmail dot com> wrote:
>
>> On Wed, Nov 11, 2009 at 16:26, Branko Čibej <brane at xbc dot nu> wrote:
>>
>>> Greg Stein wrote:
>>>
>>>>> * The representation of a URL should be always URI-encoded.
>>>>>
>>>>>
>>>> Yah. That's how we treat them, in general, but having it declared that
>>>> way would be good. As I noted above, we also want them to *always* be
>>>> absolute. The codebase is pretty darned close to allowing for that.
>>>> Also note that the svn_uri_* functions are new in 1.7, so we can
>>>> define them with this restriction.
>>>>
>>>>
>>> Oh hum. That reminds me of my recent changes in svndumpfilter on this
>>> very topic. Svndumpfilter uses "repository-absolute" paths, that is,
>>> paths within the versionable filesystem that always have a leading /.
>>> Clearly those are not dirents; nor are they relpaths; nor, by your
>>> definition above, are they URIs to the intent of the svn_uri API.
>>> They're not URI-encoded, either.
>>>
>>> Which leaves me scratching my head, wondering which of the three
>>> inapplicable families of functions svndumpfilter should be using.
>>>
>> FS is the odd man out. The leading-slash paths don't fit well with
>> much of anything.
>>
>> It would be nice if it used a relpath [from the root].
>>
>
> To expand a bit more...
>
> Bert and I discussed this a few times. Because it isn't a relpath, and
> it isn't a dirent, that is why we use the URI functions for FS paths.
> But once Bert switches on the "must be absolute" bit, then everything
> will fall over. I dunno what his plan was for the FS (he's been
> updating stuff throughout the client, wc, and RA layers).
>
> We never came up with a good solution. Sigh.
>

Why not just drop the must-be-absolute requirement? Those are perfectly
valid URIs, as per spec; they simply lack the scheme and server part.

(BTW, URI-encoding them internally is going to cause no end of screaming
horrors. My crystal ball has spoken.)

-- Brane

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author gstein
Full name Greg Stein
Date 2009-11-11 14:09:31 PST
Message On Wed, Nov 11, 2009 at 17:07, Greg Stein <gstein at gmail dot com> wrote:
> On Wed, Nov 11, 2009 at 16:26, Branko Čibej <brane at xbc dot nu> wrote:
>> Greg Stein wrote:
>>>> * The representation of a URL should be always URI-encoded.
>>>>
>>>
>>> Yah. That's how we treat them, in general, but having it declared that
>>> way would be good. As I noted above, we also want them to *always* be
>>> absolute. The codebase is pretty darned close to allowing for that.
>>> Also note that the svn_uri_* functions are new in 1.7, so we can
>>> define them with this restriction.
>>>
>>
>> Oh hum. That reminds me of my recent changes in svndumpfilter on this
>> very topic. Svndumpfilter uses "repository-absolute" paths, that is,
>> paths within the versionable filesystem that always have a leading /.
>> Clearly those are not dirents; nor are they relpaths; nor, by your
>> definition above, are they URIs to the intent of the svn_uri API.
>> They're not URI-encoded, either.
>>
>> Which leaves me scratching my head, wondering which of the three
>> inapplicable families of functions svndumpfilter should be using.
>
> FS is the odd man out. The leading-slash paths don't fit well with
> much of anything.
>
> It would be nice if it used a relpath [from the root].

To expand a bit more...

Bert and I discussed this a few times. Because it isn't a relpath, and
it isn't a dirent, that is why we use the URI functions for FS paths.
But once Bert switches on the "must be absolute" bit, then everything
will fall over. I dunno what his plan was for the FS (he's been
updating stuff throughout the client, wc, and RA layers).

We never came up with a good solution. Sigh.

Cheers,
-g

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author gstein
Full name Greg Stein
Date 2009-11-11 14:07:54 PST
Message On Wed, Nov 11, 2009 at 16:26, Branko Čibej <brane at xbc dot nu> wrote:
> Greg Stein wrote:
>>> * The representation of a URL should be always URI-encoded.
>>>
>>
>> Yah. That's how we treat them, in general, but having it declared that
>> way would be good. As I noted above, we also want them to *always* be
>> absolute. The codebase is pretty darned close to allowing for that.
>> Also note that the svn_uri_* functions are new in 1.7, so we can
>> define them with this restriction.
>>
>
> Oh hum. That reminds me of my recent changes in svndumpfilter on this
> very topic. Svndumpfilter uses "repository-absolute" paths, that is,
> paths within the versionable filesystem that always have a leading /.
> Clearly those are not dirents; nor are they relpaths; nor, by your
> definition above, are they URIs to the intent of the svn_uri API.
> They're not URI-encoded, either.
>
> Which leaves me scratching my head, wondering which of the three
> inapplicable families of functions svndumpfilter should be using.

FS is the odd man out. The leading-slash paths don't fit well with
much of anything.

It would be nice if it used a relpath [from the root].

Cheers,
-g

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author brane
Full name Branko Cibej
Date 2009-11-11 13:26:52 PST
Message Greg Stein wrote:
>> * The representation of a URL should be always URI-encoded.
>>
>
> Yah. That's how we treat them, in general, but having it declared that
> way would be good. As I noted above, we also want them to *always* be
> absolute. The codebase is pretty darned close to allowing for that.
> Also note that the svn_uri_* functions are new in 1.7, so we can
> define them with this restriction.
>

Oh hum. That reminds me of my recent changes in svndumpfilter on this
very topic. Svndumpfilter uses "repository-absolute" paths, that is,
paths within the versionable filesystem that always have a leading /.
Clearly those are not dirents; nor are they relpaths; nor, by your
definition above, are they URIs to the intent of the svn_uri API.
They're not URI-encoded, either.

Which leaves me scratching my head, wondering which of the three
inapplicable families of functions svndumpfilter should be using.

-- Brane

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author stsp
Full name Stefan Sperling
Date 2009-11-11 13:10:47 PST
Message On Wed, Nov 11, 2009 at 11:58:59PM +0300, Roman Donchenko wrote:
> Stefan Sperling <stsp at elego dot de> wrote in his message of Wed, 11 Nov 2009
> 20:02:01 +0300:
> > You get a patch someone created on Windows, so it contains paths using
> > backslashes as separators, and you want 'svn patch' to apply it to
> > a working copy you have on a unix machine.
>
> Even on Windows, how do you get a patch with backslashes? I just checked,
> and both svn diff and GNU diff use forward slashes.

I have no idea, the patch is here:
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2409838

Stefan

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author rdonch
Full name Roman Donchenko
Date 2009-11-11 12:59:36 PST
Message Stefan Sperling <stsp at elego dot de> wrote in his message of Wed, 11 Nov 2009
20:02:01 +0300:

> On Wed, Nov 11, 2009 at 05:46:21PM +0100, Branko Cibej wrote:
>> Julian Foad wrote:
>> > Stefan Sperling wrote:
>> >
>> >> Yes, but there's no function that deals with the case where you get
>> >> a Windows path on unix and need to normalise it to internal style.
>> >>
>> >> I don't know if dirent_uri should provide that. But I had to cook
>> >> a custom hack in libsvn_client/patch.c to deal with this case.
>> >>
>> >
>> > Good point: it sounds like it would be useful for the lib to provide
>> > functions for handling Windows paths even when running on Unix.
>> >
>>
>> No wait -- when do you ever have to deal with Windows paths on Unix?
>> where do they come from? Those are not valid filesystem paths on the OS
>> where your client is running -- so what's the use case?
>
> I've briefly hinted at it above:
> You get a patch someone created on Windows, so it contains paths using
> backslashes as separators, and you want 'svn patch' to apply it to
> a working copy you have on a unix machine.

Even on Windows, how do you get a patch with backslashes? I just checked,
and both svn diff and GNU diff use forward slashes.

Roman.

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author gstein
Full name Greg Stein
Date 2009-11-11 12:37:01 PST
Message On Wed, Nov 11, 2009 at 07:25, Julian Foad <julian.foad@wandisco.com> wrote:
>...
> RELPATH:
>
> A "relpath" represents "an unrooted path that can be joined to any other
> relative path, uri or dirent". Good, but let's specify it more
> precisely. The terms "absolute" and "relative" are not clearly defined
> when applied to partially-relative paths such as a Windows
> "\rel-to-current-drive" or a URL "/rel-to-server".

No. The intent is to NOT have any leading slashes. A relpath should be
"path/to/some/place". Change the docstring accordingly.

The intent is that a relpath is always attached to some root. For
Windows that could be "\". For a URL, that could be
"http://hostname/". Whatever. But never ever a leading slash.

>...
> URI:
>
> A "uri" in this API represents an "absolute path that starts with a '/'
> or a schema definition"... which is gratuitously specialized, compared
> with the official definition of a URI.
>
> We use URLs a lot and rarely need to use more general URIs, so I think
> the API should be geared specifically to URLs.
>
> It is not clear whether the representation of a URI is URI-encoded. The
> API should make a clear promise. I think it should be, both because
> that's a valuable part of the utility of a URL API, and because it seems
> unlikely to be possible to fully support URL manipulations without them
> being URI-encoded. (The sort of thing that springs to mind that simply
> would not work is if my password contains a "/" and I try to represent
> "http://username:password@my.org/" without URI-encoding.)

Bert has been making a bunch of changes throughout the codebase in
order to make the URI APIs take *only* absolute URIs. Never a relative
URI.

>...
> URLS:
>
> * Define and name functions for URLs instead of URIs.

Nope.

I spoke at length, last week, with Roy Fielding about this specific
topic. I explained our scenarios, and he recommended that we *stick*
to the URI name.

Sure, the APIs only deal with a subset of the URI space, but that
isn't a problem. Just document it as "we only handle <these> schemes".
And people aren't really using URNs anyway, so it doesn't matter much.

Over time, "the world" will converge on just the URI naming. We should
stick to the URI name in our APIs, and let the world catch up. Our
APIs are going to be around for a *long* time. Let's use the
correct/future names for it.

(and if you don't know who Roy is... he's one of the *primary* authors
of the URI specification; as far as I'm concerned, what he says goes;
I've found him to be rarely wrong, and you can ask Sander and Justin
to verify that :-P )

> * The representation of a URL should be always URI-encoded.

Yah. That's how we treat them, in general, but having it declared that
way would be good. As I noted above, we also want them to *always* be
absolute. The codebase is pretty darned close to allowing for that.
Also note that the svn_uri_* functions are new in 1.7, so we can
define them with this restriction.

>...

Outside of the above... *shrug*. Seems fine.

Cheers,
-g

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author stsp
Full name Stefan Sperling
Date 2009-11-11 11:48:19 PST
Message On Wed, Nov 11, 2009 at 08:37:37PM +0100, Branko Cibej wrote:
> Stefan Sperling wrote:
> > You get a patch someone created on Windows, so it contains paths using
> > backslashes as separators, and you want 'svn patch' to apply it to
> > a working copy you have on a unix machine.
> >
>
> Does it work with plain "patch"? No, it does not.

Good point. I forgot to check whether unix patch supports this
before committing r40399.

> You have to manually
> tweak the patch before you can apply it. Exempli gratia:

> Not saying that this is an argument against "svn patch" accepting such
> files, but the purpose of the svn_path_(internal|local)_style functions
> is to deal with local user input, not with cross-OS data migration
> issues. Moreover, just blindly converting backslashes to slashes is
> *ambiguous* and therefore wrong on Unix. (Going the other way is not
> ambiguous on Windows.)

Yeah, a comment I added to the code says:

  /* Contrary to what one might expect, svn_dirent_internal_style() does not
   * replace backslashes with slashes on UNIX. But it's quite possible that
   * a patch generated on Windows uses backslashes as path separators.
   * To apply such patches on UNIX, we need to normalise separators to '/'.
   * Do a global search-replace if the path from the patch file contains
   * only backslashes but no forward slashes. This may not be suitable in all
   * situations, e.g. backslashes might be part of a filename with no leading
   * directory components. But let's optimise for seamless interoperability
   * between platforms rather than for people using weird filenames. */
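
A sketch of the heuristic that comment describes, written as a standalone function for illustration. It is not the code from libsvn_client/patch.c, and the name normalize_patch_path() is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Sketch of the heuristic described in the comment: if PATH contains
 * backslashes but no forward slashes, assume the backslashes are
 * Windows-style separators and rewrite them to '/' in place.
 * Returns 1 if PATH was modified and 0 otherwise. */
static int
normalize_patch_path(char *path)
{
  char *p;

  assert(path != NULL);

  /* Leave the path alone if it already uses '/' or has no '\\' at all. */
  if (strchr(path, '/') != NULL || strchr(path, '\\') == NULL)
    return 0;

  for (p = path; *p != '\0'; p++)
    if (*p == '\\')
      *p = '/';

  return 1;
}
```

With this rule "test\Dispatcher.java" becomes "test/Dispatcher.java", while "dir/my\file" is left untouched. A single Unix filename such as "my\file" would still be rewritten, which is the ambiguity raised earlier in the thread.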

So I did misunderstand the dirent API.
The question now is whether we want to keep this hack or get rid of it.
Given that patch does not support this, it's not part of my initial
set of 'svn patch' requirements, which is to implement a big enough
subset of UNIX patch functionality to be generally useful, but not add
any unnecessary bells and whistles yet. (By the way, Julian, since you
asked for a spec, this last sentence is probably the first 'svn patch'
spec I've ever written ;)

Stefan

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author brane
Full name Branko Cibej
Date 2009-11-11 11:37:42 PST
Message Stefan Sperling wrote:
> On Wed, Nov 11, 2009 at 05:46:21PM +0100, Branko Cibej wrote:
>
>> Julian Foad wrote:
>>
>>> Stefan Sperling wrote:
>>>
>>>
>>>> Yes, but there's no function that deals with the case where you get
>>>> a Windows path on unix and need to normalise it to internal style.
>>>>
>>>> I don't know if dirent_uri should provide that. But I had to cook
>>>> a custom hack in libsvn_client/patch.c to deal with this case.
>>>>
>>>>
>>> Good point: it sounds like it would be useful for the lib to provide
>>> functions for handling Windows paths even when running on Unix.
>>>
>>>
>> No wait -- when do you ever have to deal with Windows paths on Unix?
>> where do they come from? Those are not valid filesystem paths on the OS
>> where your client is running -- so what's the use case?
>>
>
> I've briefly hinted at it above:
> You get a patch someone created on Windows, so it contains paths using
> backslashes as separators, and you want 'svn patch' to apply it to
> a working copy you have on a unix machine.
>

Does it work with plain "patch"? No, it does not. You have to manually
tweak the patch before you can apply it. Exempli gratia:

$ patch -p0 < test.patch
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: test\Dispatcher.java
|===================================================================
|--- test\Dispatcher.java (revision 23)
|+++ test\Dispatcher.java (revision 22)
--------------------------
File to patch: ^C
$ ls -l test/Dispatcher.java
-rw-r--r-- 1 brane brane 1872 2009-11-11 14:46 test/Dispatcher.java


(I produced the patch file with "svn diff" on Linux and just replaced /
with \)

Not saying that this is an argument against "svn patch" accepting such
files, but the purpose of the svn_path_(internal|local)_style functions
is to deal with local user input, not with cross-OS data migration
issues. Moreover, just blindly converting backslashes to slashes is
*ambiguous* and therefore wrong on Unix. (Going the other way is not
ambiguous on Windows.)

-- Brane

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author stsp
Full name Stefan Sperling
Date 2009-11-11 09:02:07 PST
Message On Wed, Nov 11, 2009 at 05:46:21PM +0100, Branko Cibej wrote:
> Julian Foad wrote:
> > Stefan Sperling wrote:
> >
> >> Yes, but there's no function that deals with the case where you get
> >> a Windows path on unix and need to normalise it to internal style.
> >>
> >> I don't know if dirent_uri should provide that. But I had to cook
> >> a custom hack in libsvn_client/patch.c to deal with this case.
> >>
> >
> > Good point: it sounds like it would be useful for the lib to provide
> > functions for handling Windows paths even when running on Unix.
> >
>
> No wait -- when do you ever have to deal with Windows paths on Unix?
> where do they come from? Those are not valid filesystem paths on the OS
> where your client is running -- so what's the use case?

I've briefly hinted at it above:
You get a patch someone created on Windows, so it contains paths using
backslashes as separators, and you want 'svn patch' to apply it to
a working copy you have on a unix machine.

Stefan

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author brane
Full name Branko Cibej
Date 2009-11-11 08:46:25 PST
Message Julian Foad wrote:
> Stefan Sperling wrote:
>
>> Yes, but there's no function that deals with the case where you get
>> a Windows path on unix and need to normalise it to internal style.
>>
>> I don't know if dirent_uri should provide that. But I had to cook
>> a custom hack in libsvn_client/patch.c to deal with this case.
>>
>
> Good point: it sounds like it would be useful for the lib to provide
> functions for handling Windows paths even when running on Unix.
>

No wait -- when do you ever have to deal with Windows paths on Unix?
where do they come from? Those are not valid filesystem paths on the OS
where your client is running -- so what's the use case?

-- Brane

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author Julian Foad <julian dot foad at wandisco dot com>
Full name Julian Foad <julian dot foad at wandisco dot com>
Date 2009-11-11 06:34:19 PST
Message Stefan Sperling wrote:
> Yes, but there's no function that deals with the case where you get
> a Windows path on unix and need to normalise it to internal style.
>
> I don't know if dirent_uri should provide that. But I had to cook
> a custom hack in libsvn_client/patch.c to deal with this case.

Good point: it sounds like it would be useful for the lib to provide
functions for handling Windows paths even when running on Unix.

- Julian

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author stsp
Full name Stefan Sperling
Date 2009-11-11 06:02:48 PST
Message On Wed, Nov 11, 2009 at 01:15:06PM +0000, Julian Foad wrote:
> On Wed, 2009-11-11 at 14:08 +0100, Stefan Sperling wrote:
> > On Wed, Nov 11, 2009 at 12:25:49PM +0000, Julian Foad wrote:
> > > DIRENT:
> > >
> > > A "dirent" represents a native operating-system path... but let's be
> > > clear exactly what kinds of absolute and relative path this includes.
> > >
> > > The representation seems a bit odd, using Subversion's "canonical path"
> > > rules ("/" separator, etc.), rather than the native form, and so
> > > requiring "to_internal_style" and "to_native_style" conversions.
> >
> > One observation I've made recently: On UNIX, passing a path containing
> > backslash-separators (e.g. a path parsed from a patch file) to
> > svn_dirent_internal_style() does absolutely nothing.
> > It just returns the path unmodified.
>
> Isn't that working as designed? A Unix path is allowed to contain
> backslash characters, and they are not treated as path separators, so
> "my\file" is a valid Unix filename, consisting of one component.

Yes, but there's no function that deals with the case where you get
a Windows path on unix and need to normalise it to internal style.

I don't know if dirent_uri should provide that. But I had to cook
a custom hack in libsvn_client/patch.c to deal with this case.

Stefan

RE: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author rhuijben
Full name Bert Huijben
Date 2009-11-11 05:51:49 PST
Message > -----Original Message-----
> From: Julian Foad [mailto:julian.foad@wandisco.com]
> Sent: Wednesday 11 November 2009 14:34
> To: Branko Cibej
> Cc: dev at subversion dot tigris dot org
> Subject: Re: [RFC] Paths API (svn_dirent_uri.h) - improvements
>
> Branko Cibej wrote:
> > > * (Advanced.) An OSPATH object should know whether it is case-sensitive.
>
> > I don't believe any such intrinsic knowledge is going to work.
>
> You may well be right. I'm not prepared to give it the necessary
> thought :-(

On Windows it can be that
C:\Folder is case insensitive while C:\Folder\SubFolder is not, e.g. when
handling junctions.

And there is no api to retrieve if it is or isn't case sensitive or which
locale it uses for case insensitivity (which can differ over drives too).

    Bert

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author Julian Foad <julian dot foad at wandisco dot com>
Full name Julian Foad <julian dot foad at wandisco dot com>
Date 2009-11-11 05:33:54 PST
Message Branko Cibej wrote:
> > * (Advanced.) An OSPATH object should know whether it is case-sensitive.

> I don't believe any such intrinsic knowledge is going to work.

You may well be right. I'm not prepared to give it the necessary
thought :-(

- Julian

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author brane
Full name Branko Cibej
Date 2009-11-11 05:29:37 PST
Message Stefan Sperling wrote:
> On Wed, Nov 11, 2009 at 12:25:49PM +0000, Julian Foad wrote:
>
>> DIRENT:
>>
>> A "dirent" represents a native operating-system path... but let's be
>> clear exactly what kinds of absolute and relative path this includes.
>>
>> The representation seems a bit odd, using Subversion's "canonical path"
>> rules ("/" separator, etc.), rather than the native form, and so
>> requiring "to_internal_style" and "to_native_style" conversions.
>>
>
> One observation I've made recently: On UNIX, passing a path containing
> backslash-separators (e.g. a path parsed from a patch file) to
> svn_dirent_internal_style() does absolutely nothing.
> It just returns the path unmodified.
>
> On Windows, passing a path containing backslash-separators
> to svn_dirent_internal_style() does convert any backslashes
> to forward slashes...
>
> So the to_internal_style and to_native_style don't even work
> as one might expect -- they don't always convert to the internal
> style, it depends on the platform.
>

They work *exactly* as expected. Those functions are meant to convert
valid user input to the format Subversion uses internally; by
implication, they have to behave in a system-dependent way.

On Windows it's very likely that your editors or IDEs or compilers will
understand backslash-separated paths. That's "somewhat less" likely on
Unix. In other words, if you tell Subversion on Unix to commit
this\mess, it'll fail with a similar error as if you try to "vi this\mess".
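
The system-dependent behaviour can be sketched as follows. This is a simplified illustration only: the real svn_dirent_internal_style() decides by build platform and does more than swap separators, whereas this hypothetical to_internal_style() takes the platform as a run-time parameter so that both cases can be shown side by side.

```c
#include <assert.h>
#include <string.h>

/* Which native path convention the input is assumed to follow. */
enum native_style { STYLE_UNIX, STYLE_WINDOWS };

/* Hypothetical, simplified conversion to the internal '/'-separated form.
 * On Windows a backslash is a separator and is rewritten in place.
 * On Unix a backslash is an ordinary filename character and is kept. */
static void
to_internal_style(char *path, enum native_style style)
{
  assert(path != NULL);

  if (style != STYLE_WINDOWS)
    return;

  while ((path = strchr(path, '\\')) != NULL)
    *path++ = '/';
}
```

Given "this\mess", the Windows branch yields "this/mess" (two components), and the Unix branch leaves a single component named "this\mess", matching what the native tools on each platform would do.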

-- Brane

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author brane
Full name Branko Cibej
Date 2009-11-11 05:25:23 PST
Message Julian Foad wrote:
> I'm considering some improvements to the "dirent/uri" API.
>

In general, +1; I find myself to be somewhat confused by the new API,
not because I don't understand the concepts, but because there's still
some overlap.

[...]
> One meta-comment is that I feel the low-level path and URL functions
> should be coming straight from APR (or other such support library)
> whereas we seem to have written many of our own.
>

I think you'll find that APR's functionality is either less or more than
we need, depending on circumstances. But again, in general, +1.

[...]

> * (Advanced.) An OSPATH object should know whether it is case-sensitive.
> The default would be according to the platform it's running on, but
> different file systems have different case-sensitivity so eventually if
> we want to get better at handling such issues we'll need this. I'm not
> planning to do this. However it is an example of how we may need to
> encapsulate the path in an object rather than always represent it as a
> plain string.
>

I don't believe any such intrinsic knowledge is going to work. We have
to solve the case-folding problems (such as a case-change-only rename on
Windows) inside the WC library, not the path library. Detecting that
"this name is the same as that other different one on disk" is more
complex than answering the question, "is this filesystem
case-insensitive". You have to let the filesystem itself tell you that,
you can't rely on OS locale-specific APIs to give the right answer.

-- Brane

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author Julian Foad <julian dot foad at wandisco dot com>
Full name Julian Foad <julian dot foad at wandisco dot com>
Date 2009-11-11 05:15:17 PST
Message On Wed, 2009-11-11 at 14:08 +0100, Stefan Sperling wrote:
> On Wed, Nov 11, 2009 at 12:25:49PM +0000, Julian Foad wrote:
> > DIRENT:
> >
> > A "dirent" represents a native operating-system path... but let's be
> > clear exactly what kinds of absolute and relative path this includes.
> >
> > The representation seems a bit odd, using Subversion's "canonical path"
> > rules ("/" separator, etc.), rather than the native form, and so
> > requiring "to_internal_style" and "to_native_style" conversions.
>
> One observation I've made recently: On UNIX, passing a path containing
> backslash-separators (e.g. a path parsed from a patch file) to
> svn_dirent_internal_style() does absolutely nothing.
> It just returns the path unmodified.

Isn't that working as designed? A Unix path is allowed to contain
backslash characters, and they are not treated as path separators, so
"my\file" is a valid Unix filename, consisting of one component.

- Julian


> On Windows, passing a path containing backslash-separators
> to svn_dirent_internal_style() does convert any backslashes
> to forward slashes...
>
> So the to_internal_style and to_native_style don't even work
> as one might expect -- they don't always convert to the internal
> style, it depends on the platform.

Re: [RFC] Paths API (svn_dirent_uri.h) - improvements

Author stsp
Full name Stefan Sperling
Date 2009-11-11 05:08:20 PST
Message On Wed, Nov 11, 2009 at 12:25:49PM +0000, Julian Foad wrote:
> DIRENT:
>
> A "dirent" represents a native operating-system path... but let's be
> clear exactly what kinds of absolute and relative path this includes.
>
> The representation seems a bit odd, using Subversion's "canonical path"
> rules ("/" separator, etc.), rather than the native form, and so
> requiring "to_internal_style" and "to_native_style" conversions.

One observation I've made recently: On UNIX, passing a path containing
backslash-separators (e.g. a path parsed from a patch file) to
svn_dirent_internal_style() does absolutely nothing.
It just returns the path unmodified.

On Windows, passing a path containing backslash-separators
to svn_dirent_internal_style() does convert any backslashes
to forward slashes...

So the to_internal_style and to_native_style don't even work
as one might expect -- they don't always convert to the internal
style, it depends on the platform.

Stefan

[RFC] Paths API (svn_dirent_uri.h) - improvements

Author Julian Foad <julian dot foad at wandisco dot com>
Full name Julian Foad <julian dot foad at wandisco dot com>
Date 2009-11-11 04:25:58 PST
Message I'm considering some improvements to the "dirent/uri" API.


THE GOOD
========

Lieven and Bert made some good moves this year towards untangling the
old "svn_path" APIs that tried to support all kinds of paths the same way.
We needed to separate out the handling of local disk paths from the
handling of URLs, because even with our "Subversion internal form" they
still need to follow different rules. So came the "dirent/URI" API.

A good part of the concept is that URIs and native paths have to be
treated separately, but both kinds end with a series of path
"components" that can be added on, taken off, or copied from a native
path to a URL or from a URL to a native path. The functions for adding
and subtracting relpaths to and from each kind of path are (in a sense)
the core of the API.
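
As a rough illustration of that core idea, the sketch below (in Python,
for brevity) takes a relpath off a native path and adds it onto a URL.
The names join_relpath() and skip_ancestor() are made up for this
example and are not the svn_dirent_uri.h functions; the sketch also
assumes both inputs are already in internal style and that the
components need no URI-escaping.

```python
from typing import Optional


def join_relpath(base: str, relpath: str) -> str:
    """Append a forward relpath to a base path or URL (hypothetical helper)."""
    if relpath == "":
        return base
    if base.endswith("/"):
        return base + relpath
    return base + "/" + relpath


def skip_ancestor(ancestor: str, path: str) -> Optional[str]:
    """Return path relative to ancestor, or None if it is not below it."""
    if path == ancestor:
        return ""
    prefix = ancestor if ancestor.endswith("/") else ancestor + "/"
    if path.startswith(prefix):
        return path[len(prefix):]
    return None


# Take the relpath off a working-copy path, then add it onto a URL.
rel = skip_ancestor("/home/julian/wc", "/home/julian/wc/subversion/libsvn_wc")
url = join_relpath("http://hostname/dir1/dir2", rel)
not_a_child = skip_ancestor("/home/julian/wc", "/home/julian/wcX/a")
```

The prefix check appends "/" before comparing so that a sibling such as
"/home/julian/wcX" is not mistaken for a child of "/home/julian/wc".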

I did some thinking on the way home from SubConf. If we were writing the
code in a high-level language with a good cross-platform library of
support for URLs/URIs and operating-system native paths, how would we
expect it to behave? Let's try to define and provide the high-level
behaviour that we would like.


THE PROBLEMS
============

There still seems to be a lack of crispness about the new path kinds in
the new "svn_dirent_uri" API.

One meta-comment is that I feel the low-level path and URL functions
should be coming straight from APR (or other such support library)
whereas we seem to have written many of our own.

RELPATH:

A "relpath" represents "an unrooted path that can be joined to any other
relative path, uri or dirent". Good, but let's specify it more
precisely. The terms "absolute" and "relative" are not clearly defined
when applied to partially-relative paths such as a Windows
"\rel-to-current-drive" or a URL "/rel-to-server".

DIRENT:

A "dirent" represents a native operating-system path... but let's be
clear exactly what kinds of absolute and relative path this includes.

The representation seems a bit odd, using Subversion's "canonical path"
rules ("/" separator, etc.), rather than the native form, and so
requiring "to_internal_style" and "to_native_style" conversions. That is
a legacy from trying to use a single set of path functions for all kinds
of paths. There are certain benefits, mostly to do with being able to
construct paths manually by writing "foo/bar" in tests and so on. I don't
think we necessarily need to change this but it seems like we might be
making problems by trying to store native paths in a non-native form.

And I don't much care for the name "dirent" :-) To me, "directory
entry" implies a single path component, and also implies status info
about the directory entry.

URI:

A "uri" in this API represents an "absolute path that starts with a '/'
or a schema definition"... which is gratuitously specialized, compared
with the official definition of a URI.

We use URLs a lot and rarely need to use more general URIs, so I think
the API should be geared specifically to URLs.

It is not clear whether the representation of a URI is URI-encoded. The
API should make a clear promise. I think it should be, both because
that's a valuable part of the utility of a URL API, and because it seems
unlikely to be possible to fully support URL manipulations without them
being URI-encoded. (The sort of thing that springs to mind that simply
would not work is if my password contains a "/" and I try to represent
"http://username:password@my.org/" without URI-encoding.)


THE CHANGES
===========

These are some changes I'd like to make. Comments solicited.

RELATIVE PATHS:

* A RELPATH should represent a generic "path", not tied to being
interpreted as a URL path or an OS path, but freely able to be
interpreted as either. To convert a RELPATH to a relative URL or a
relative native path (dirent), we should always call an appropriate API
function, even if that doesn't change its in-memory representation. This
will allow us to decouple the representation from the API. (Actually
making such a change would in some places require extra function calls,
and so if there are more than a very few such places we may not want to
do it until we anticipate a particular benefit.)

* A RELPATH should always represent a forward path, with no
back-segments (".."), because a forward path is a nice clean concept and
is all we need to convert between Subversion local disk paths and
Subversion URLs. There will of course be some API available for
interpreting a relative path that might contain "..", but I think such a
path is nearly always user input, and we always know whether it is a URL
or a native OS path, and its interpretation always involves high-level
decisions about how to handle the "going too far back" case and the
ambiguity of whether "../foo" == ".". Therefore that interpretation
should be outside the scope of the defined "RELPATH" concept.
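
One way to pin down the "forward path" rule is a small predicate. This
is only a sketch of the proposal in Python: is_forward_relpath() is a
hypothetical name, and treating the empty string as a valid "no
components" relpath is an assumption of this example rather than
something the proposal states.

```python
def is_forward_relpath(path: str) -> bool:
    """Check the proposed RELPATH shape (hypothetical helper).

    Accepts "" or components joined by single "/" characters, with no
    leading or trailing slash and no "." or ".." components.
    """
    if path == "":
        # Assumption: the empty relpath means "no components".
        return True
    # A leading, trailing or doubled "/" shows up as an empty component.
    return all(part not in ("", ".", "..") for part in path.split("/"))


accepted = is_forward_relpath("subversion/libsvn_wc")
rejected_back_segment = is_forward_relpath("../foo")
rejected_leading_slash = is_forward_relpath("/rel-to-server")
```

Anything the predicate rejects would go through the separate,
higher-level interpretation step described above.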

URLS:

* Define and name functions for URLs instead of URIs.

* The representation of a URL should always be URI-encoded.

* A URL shall be defined either as a full URL starting with a scheme, or
as an RFC-defined relative URL. Either definition would be better than
the current specification that it must start with a scheme or with "/".
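
To show why the URI-encoded representation matters, here is a short
Python sketch using the standard urllib.parse module. The host
"example.org" and the password "pa/ss" are invented for the example.
Without encoding, the "/" inside the password ends the authority part
early; with encoding, the URL splits as intended.

```python
from urllib.parse import quote, urlsplit

password = "pa/ss"  # made-up password containing a "/"

# Unencoded: the authority stops at the first "/" after "//".
raw = urlsplit("http://username:" + password + "@example.org/")
raw_netloc = raw.netloc
raw_path = raw.path

# Encoded: safe="" makes quote() escape "/" as well.
encoded_password = quote(password, safe="")
good = urlsplit("http://username:" + encoded_password + "@example.org/")
good_host = good.hostname
good_path = good.path
```

In the unencoded case the host name ends up in the path component,
which is the sort of silent misparse a clear encoding promise avoids.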

NATIVE OPERATING-SYSTEM PATHS:

* Specify that an OSPATH can be absolute or relative or partially
relative, and that "relative" doesn't necessarily mean relative to the
process's CWD/current drive but relative to whatever its user wants it
to be relative to. Therefore, in convert-to-abs functions, the caller
should be able to specify (or the function doc string should state) what
it's relative to.

* (Advanced.) An OSPATH object should know whether it is case-sensitive.
The default would be according to the platform it's running on, but
different file systems have different case-sensitivity so eventually if
we want to get better at handling such issues we'll need this. I'm not
planning to do this. However it is an example of how we may need to
encapsulate the path in an object rather than always represent it as a
plain string.

* (Trivial.) Rename DIRENT to OSPATH. Alternative: FILEPATH, as used in
APR. But such a rename is the least of my concerns, and only makes sense
as a companion to changing the semantics.
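
The first bullet in this list (the caller says what a relative path is
relative to) could look roughly like the Python sketch below. The
function to_absolute() is hypothetical, and the sketch uses Unix-style
rules via the standard posixpath module so that it behaves the same on
every platform; partially relative Windows paths are out of its scope.

```python
import posixpath


def to_absolute(path: str, base: str) -> str:
    """Resolve path against an explicit base (hypothetical helper).

    The base is supplied by the caller instead of being taken from the
    process's current working directory.
    """
    if posixpath.isabs(path):
        return path
    if not posixpath.isabs(base):
        raise ValueError("base must be an absolute path")
    return posixpath.join(base, path)


resolved = to_absolute("subversion/libsvn_wc", "/home/julian/wc")
already_absolute = to_absolute("/etc", "/home/julian/wc")
```

Requiring an absolute base keeps the result independent of process
state, which is the property the bullet asks for.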


I think the changes would not negate the work currently being done to
move to the current new APIs. Even if the same calls need to be changed
again, the current work in discovering and distinguishing what kind of
paths are being handled will make that next step easier.

This all sounds like a lot, but I hope we can do something towards it.
Does it make sense?

- Julian