Login | Register
My pages Projects Community openCollabNet

Discussions > dev [DISABLED] > Re: svn commit: r38394 - in trunk/subversion: include libsvn_delta

subversion
Discussion topic

Hide all messages in topic

All messages in topic

Re: Defining atomic "replace"

Author Bill Tutt <bill dot tutt at gmail dot com>
Full name Bill Tutt <bill dot tutt at gmail dot com>
Date 2009-08-14 11:12:04 PDT
Message On Thu, Aug 13, 2009 at 7:24 PM, Julian Foad <julianfoad@btope​nworld.com>wrote:​


> On Thu, 2009-08-13, Bill Tutt wrote: > Move, you mentioned move. You're
> giving me a headache flashback.
>
>
> Hi Bill. Thanks for the thought input...
> > i.e. The directory heirarchy changing dependant case:
> > A\a.cs A\B\C\c.cs
> >
> > mv A\B B
> > mv A B\A
> >
> > This ends up with B\A\a.cs and B\C\c.cs.
> >
> > Regardless of whether not you can submit such a thing in one
> > changeset, a non-operational merge would need to deal with the
> > problem. (i.e. not allowing it in the general changeset case doesn't
> > prevent it from the non-operational merge case)
>
>
> Yes, OK...
> > or the substantially less annoying and fairly simple dependant rename
> > case:
> >
> > mv a.cs temp.cs
> > mv b.cs a.cs
> > mv temp.cs b.cs
>
>
> which swaps a.cs with b.cs, and is equivalent to (and let's say the current
> revision is 3)
>
> del a.cs
> del b.cs
> copy <URL of a.cs>@3 b.cs
> copy <URL of b.cs>@3 a.cs
>
>
> > Both of the above cases also suffer from the problem of needing to
> > disallow not including all of the pending moves in a submission.
>
>
> <parsing double-negatives>... <done>
>

To rephrase and clarify:

Dependant rename set: A set of rename operations that require the operations
to execute in a specific order in the WC so that all file contents end up
with the correct path and contents. There are three possible cases here:

File dependant rename example:
mv a.cs -> b.cs
mv b.cs -> a.cs

Directory dependant rename example:
mv A -> B
mv B -> A

Directory hierarchy changing rename example:
A\B\C are directories that contain files:
mv A\B -> B
mv A -> B\A

If Subversion did not allow the replace mechanism with these types of moves:
All of the pending moves in a dependent rename set that was NOT "Directory
hierarchy changing" would need to be submitted at the same time.

"Directory hierarchy changing" renames do not have to be submitted
simultaneously, however allowing them seems like a very bad idea when
balanced with a typical version control end user concept of: A given
changeset should always be buildalbe. You would be submitting a tree
structure that never actually occurred in the working copy. *shivers*

The code necessary to unwind only part of a "directory hierarchy changing"
rename when you have other pending renames in the same submission that
interact with the "directory hirarchy changing" rename isn't particulary
pretty or cheap. (It's that fun two pass rename algorthim dreamt up by the
author of arch.) I'd recommend only implementing that code for 'svn
revert'/''svn merge' if you ever write that code at all.


> In Subversion, currently, as you know, you are allowed to commit just
> one half of the moves. Let's examine what happens if you do. Note that
> the selection of what to include in the commit is done by path, not by
> "object identity". If in your latter case we choose to commit just
> "a.cs", the result would be:
>
> r4:
> a.cs is replaced with a copy of b.cs@3
> b.cs is unaffected
>
> I agree that there is a usability benefit in disallowing this partial
> commit, but I don't see that it causes any particularly bad or
> surprising result. You can still commit the other half (b.cs) later and
> get the intended end result:
>
 r5:
> a.cs is unaffected
> b.cs is replaced with a copy of a.cs@3
>
> The WC remembered that b.cs should be a copy of a.cs@3. If instead it
> had remembered it as "a copy of head revision of a.cs", this second
> commit would be effectively a no-op because a.cs@HEAD was at this time
> already the same as b.cs, and that would not be the intended end result.
>
>

You're correct with replace only doing a paritial commit of the file
dependant rename set is fairly harmless. Odd and wrong, but fairly harmless.


> As part of "true renames" support, we will probably want to change the
> semantics of moves and copies to say "a copy of head revision of ..."
> rather than a specific revision number. Then we will have to watch out
> for this split commit, and probably disallow it.
>

Why would you ever want to say "a copy of head revision of ..."? You want to
generate conflicts appropriately when you make your submission. If
the file/directory was renamed behind your back you need to reverify the
user's desire to maintain the same destination path.


>
> > (Esp. of directories, but disallowing it entirely always seemed like a
> > good idea to me.) If you allow partial pending move submissions then
> > you're submitting a tree state you haven't built yet.
>
>
> Yes, but that's true of any kind of partial commit where you make WC modifications
> in one order and then commit the pieces in another order.
> You feel this is particularly bad in the case of splitting a move. I
> think we all agree it's worse and not a good idea, but I don't
> understand whether you have some special viewpoint that makes you feel
> it's so terrible or such a headache.
>
>

My big fear is users accidentally submitting a tree structure change that
wasn't the shape of their WC (for each dependant rename set) at time of
submission. That seems like a black hole of badness and its not particulary
hard to check for. i.e. If something is always a bad idea why not prevent
the user from accidently hanging themselves?

Not submitting all renames in a file dependant rename set is certainly odd,
but substantially less scary to me than not submitting all renames in a
directory dependant rename set.

Bill
Attachments

Re: Defining atomic "replace"

Author julianfoad
Full name Julian Foad
Date 2009-08-13 16:24:48 PDT
Message On Thu, 2009-08-13, Bill Tutt wrote:
> Move, you mentioned move. You're giving me a headache flashback.

Hi Bill. Thanks for the thought input...
 
> i.e. The directory heirarchy changing dependant case:
> A\a.cs A\B\C\c.cs
>
> mv A\B B
> mv A B\A
>
> This ends up with B\A\a.cs and B\C\c.cs.
>
> Regardless of whether not you can submit such a thing in one
> changeset, a non-operational merge would need to deal with the
> problem. (i.e. not allowing it in the general changeset case doesn't
> prevent it from the non-operational merge case)

Yes, OK...
 
> or the substantially less annoying and fairly simple dependant rename
> case:
>
> mv a.cs temp.cs
> mv b.cs a.cs
> mv temp.cs b.cs

which swaps a.cs with b.cs, and is equivalent to (and let's say the
current revision is 3)

del a.cs
del b.cs
copy <URL of a.cs>@3 b.cs
copy <URL of b.cs>@3 a.cs

 
> Both of the above cases also suffer from the problem of needing to
> disallow not including all of the pending moves in a submission.

<parsing double-negatives>... <done>

In Subversion, currently, as you know, you are allowed to commit just
one half of the moves. Let's examine what happens if you do. Note that
the selection of what to include in the commit is done by path, not by
"object identity". If in your latter case we choose to commit just
"a.cs", the result would be:

r4:
  a.cs is replaced with a copy of b.cs@3
  b.cs is unaffected

I agree that there is a usability benefit in disallowing this partial
commit, but I don't see that it causes any particularly bad or
surprising result. You can still commit the other half (b.cs) later and
get the intended end result:

r5:
  a.cs is unaffected
  b.cs is replaced with a copy of a.cs@3

The WC remembered that b.cs should be a copy of a.cs@3. If instead it
had remembered it as "a copy of head revision of a.cs", this second
commit would be effectively a no-op because a.cs@HEAD was at this time
already the same as b.cs, and that would not be the intended end result.

As part of "true renames" support, we will probably want to change the
semantics of moves and copies to say "a copy of head revision of ..."
rather than a specific revision number. Then we will have to watch out
for this split commit, and probably disallow it.

> (Esp. of directories, but disallowing it entirely always seemed like a
> good idea to me.) If you allow partial pending move submissions then
> you're submitting a tree state you haven't built yet.

Yes, but that's true of any kind of partial commit where you make WC
modifications in one order and then commit the pieces in another order.
You feel this is particularly bad in the case of splitting a move. I
think we all agree it's worse and not a good idea, but I don't
understand whether you have some special viewpoint that makes you feel
it's so terrible or such a headache.
 
- Julian

Re: Defining atomic "replace"

Author Bill Tutt <bill dot tutt at gmail dot com>
Full name Bill Tutt <bill dot tutt at gmail dot com>
Date 2009-08-13 07:41:32 PDT
Message On Tue, Aug 11, 2009 at 4:04 PM, Greg Stein <gstein at gmail dot com> wrote:

> > (*) Atomic move > In the new editor callbacks, there is a separate
> move() call to represent
> > atomic moves. It is thus possible to issue a delete() followed by a
> move()
> > to the same path that was deleted. Which is, essentially, a replace!
> >
> > --> We need to also add a REPLACES_REV argument to the move() callback,
> and
> > we need to disallow issuing a delete followed by a move to the same path.
>
> Agreed.
> > Actually, has this case ever been considered? Currently, a move()
> becomes an
> > "add-with-history" plus a delete(). Our API doesn't indicate a move
> > replacing a path. We could need another update/status letter for this.
>
>
> There are LOTS of APIs in our system that do not understand moves. The new
> wc_db can record moves, but (in 1.7) nothing is going to *tell* it
> to do that. This new editor API can describe moves, but nothing will
> drive it that way. Or maybe it will, and the receiver will break it
> down to a delete/add to deal with legacy APIs that do not have a move
> operation.
>
> I expect that, over time, we will start propagating a "move" concept
> further and further through our codebase. Only at legacy points (eg.
> speaking to an old server) will they be broken into a delete/add pair.
>

Apologies for the slightly off topic direction shift:
Move, you mentioned move. You're giving me a headache flashback.

i.e. The directory heirarchy changing dependant case:
A\a.cs A\B\C\c.cs

mv A\B B
mv A B\A

This ends up with B\A\a.cs and B\C\c.cs.

Regardless of whether not you can submit such a thing in one changeset, a
non-operational merge would need to deal with the problem. (i.e. not
allowing it in the general changeset case doesn't prevent it from the
non-operational merge case)

or the substantially less annoying and fairly simple dependant rename case:

mv a.cs temp.cs
mv b.cs a.cs
mv temp.cs b.cs

Both of the above cases also suffer from the problem of needing to disallow
not including all of the pending moves in a submission.
(Esp. of directories, but disallowing it entirely always seemed like a good
idea to me.) If you allow partial pending move submissions then you're
submitting a tree state you haven't built yet.

Ugh. Ok, the headache flashback is over now, back to your normal subversion
work. :)

Bill
Attachments

Re: no-op-copy -- was: Re: Defining atomic "replace"

Author neels
Full name Neels Janosch Hofmeyr
Date 2009-08-12 17:53:07 PDT
Message Julian Foad wrote:
> At the moment I don't mind that it works this way. Not sure what you
> mean by "worth testing for".

Yes, that's exactly what I meant ;)
"it's not worth the trouble to test for this case and change this behavior
at the moment".

~Neels
Attachments

Re: no-op-copy -- was: Re: Defining atomic "replace"

Author julianfoad
Full name Julian Foad
Date 2009-08-12 17:03:11 PDT
Message Neels J Hofmeyr wrote:
> + svn rm file
> D file
> + svn cp '^/file@1' file
> A file
> + svn st
> R + file
> + svn ci -m noop
> Replacing file
>
> Committed revision 2.
> + svn log file
> --------------------​--------------------​--------------------​------------
> r2 | neels | 2009-08-12 20:43:07 +0200 (Wed, 12 Aug 2009) | 1 line
>
> noop
> --------------------​--------------------​--------------------​------------
> r1 | neels | 2009-08-12 20:43:05 +0200 (Wed, 12 Aug 2009) | 1 line
>
> add
> --------------------​--------------------​--------------------​------------
> + svn diff -c 2
> [no diff output]
> ]]]
>
> So yes, svn currently permits this no-op.
> But I don't really understand the question. What do you mean by "can svn
> tell the difference"?

To answer my question you need to run "svn log -v". What I mean is: does
Subversion treat "replaced with SELF@(REV-1)" differently from a no-op?

Answer:
[[[
$ svn16 log -v --stop-on-copy wc/file
--------------------​--------------------​--------------------​------------
r2 | julianfoad | 2009-08-13 00:37:37 +0100 (Thu, 13 Aug 2009) | 1 line
Changed paths:
   R /file (from /file:1)

noop
--------------------​--------------------​--------------------​------------
]]]

and the --stop-on-copy stops there, which is very different from how a
completely no-op change would appear (or rather not appear) in history.

So Subversion CAN tell the difference between that and a no-op, even
though the diff is a no-op.


> Does anyone think this case is worth the trouble testing for? It would make
> an awful lot of sense to cancel this to void, but in practice ... the manual
> says "revert" anyway, right?

At the moment I don't mind that it works this way. Not sure what you
mean by "worth testing for".

- Julian

no-op-copy -- was: Re: Defining atomic "replace"

Author neels
Full name Neels Janosch Hofmeyr
Date 2009-08-12 12:00:55 PDT
Message Julian Foad wrote:
> So the result is: file@10 is copied from file@9. I don't know if a
> Subversion repository can tell the difference between that and a no-op
> (or a simple content modification). I would be interested to know
> whether it can.

With Subversion 1.6.4, the attached script shows that this no-op is allowed.
I get this output:

[[[
[...]
+ cd wc
+ echo one
+ svn add file
A file
+ svn ci -m add
Adding file
Transmitting file data .
Committed revision 1.
+ svn rm file
D file
+ svn cp '^/file@1' file
A file
+ svn st
R + file
+ svn ci -m noop
Replacing file

Committed revision 2.
+ svn log file
--------------------​--------------------​--------------------​------------
r2 | neels | 2009-08-12 20:43:07 +0200 (Wed, 12 Aug 2009) | 1 line

noop
--------------------​--------------------​--------------------​------------
r1 | neels | 2009-08-12 20:43:05 +0200 (Wed, 12 Aug 2009) | 1 line

add
--------------------​--------------------​--------------------​------------
+ svn diff -c 2
[no diff output]
]]]

So yes, svn currently permits this no-op.
But I don't really understand the question. What do you mean by "can svn
tell the difference"?

Does anyone think this case is worth the trouble testing for? It would make
an awful lot of sense to cancel this to void, but in practice ... the manual
says "revert" anyway, right?

~Neels
Attachments

Re: Defining atomic "replace"

Author neels
Full name Neels Janosch Hofmeyr
Date 2009-08-12 10:51:43 PDT
Message Julian Foad wrote:
> On Tue, 2009-08-11 at 20:51 +0200, Neels J Hofmeyr wrote:
>> Hi Julian and everyone else,
>>
>> I'd like to summarize a little about atomic replace operations in the new
>> editor interface (subversion/include/​svn_editor.h,
>> subversion/libsvn_de​lta/editor.c) and have a conclusion.
>>
>> My intent is to further tweak svn_editor.h shortly, so feedback is welcome.
>>
>>
>>
>> The argument started off like this:
>>
>> If a file or folder at a certain path is deleted, and some other file or
>> folder is added at the same path, that is a replace.
>>
>> Sometimes it is nice and/or necessary to be able to relate the delete() and
>> the add(), for example to print
>> R foo
>> instead of
>> D foo
>> M other_stuff
>> A foo
>> , and for detecting tree-conflicts.
>>
>> Currently, wherever that happens, the *receiving* side of the editor
>> callbacks has to figure out whether a delete coincides with an add at the
>> same path. That involves caching deleted nodes == bloat and complication.
>>
>> Fact is, there is at least one *driver* of the editor callbacks that has
>> already made the correlation of a delete followed by an add, but splits it
>> up into a delete followed by an add, following callback design.
>>
>> My crusade is to move the work for combining delete() + add() into a single
>> "replace" into the *driver* of the new callbacks, considering that that
>> information may already be lying around for free in most cases.
>>
>> Even if it isn't, I think it is better to make the *driver* do the
>> combination work "once", and not burden "all those receivers" with the work
>> of correlating deletes with adds.
>>
>> The problem I ran into is this:
>>
>> To effectively take the burden off the receivers, we need to *enforce* that
>> the driver actually combines delete + add = replace. If the driver had a
>> choice to combine or not combine "replace" calls, all those receivers have
>> to be able to handle *both* cases, and the whole plan backfires, adding more
>> code instead of simplifying.
>>
>> I will now try to theoretically home in on a Good API definition around
>> atomic replace.
>>
>>
>> Aspects:
>>
>>
>> (*) Receivers should not have to combine "replace" operations.
>>
>> --> We need to *disallow* issuing a delete() followed by an add() operation
>> on the same path. The drivers all need to use the atomic-replace channel
>> instead, which is to issue an add() operation with a REPLACES_REV arg.
>>
>>
>> (*) Two-sided vs. detailed history
>> In current merge (and all other operations AFAIK), if a certain path was
>> deleted and added multiple times in a given range, all the intermediate
>> steps are dropped and the only things that matter are
>> - the initial state and
>> - the final state.
>> e.g. (D, A, M, D, A) becomes (D, A), the three in the center are dropped.
>> However, there are rumors that at some point all the intermediate steps will
>> be necessary (keyword "operational merge").
>
> (Greg: Neels is talking about a hypothetical new "operation-based" merge
> editor that is a consumer of the editor API. It would want to be driven
> not simply with the difference between merge-left and merge-right
> sources, but rather with the full sequence of changes that were
> originally made to get from merge-left to merge-right.)
(yep.)

>> --> We need to allow multiple atomic replaces in a given range, so that each
>> sub-sequence of (D, A) becomes a (R):
>> (D, A, M, D, A) becomes (R, M, R) in an operational merge.
>
> Even if we decide to go towards "operation-based" merging, I think we
> have no idea yet what it would look like, and we don't know whether we
> would want to represent that (R, M, R) sequence. So I don't think there
> is any point in considering that now.

Well, I was just hypothetically testing the constraint of "must combine" for
the "operational merge" concept. I didn't want to introduce a constraint
that would cause design pains soon after. So, in the docs, I'll say
something like "you must combine D,A to R, but you could have multiple Rs in
a row to represent multiple steps". Legal: A, R, R, M, R, D

> For now, it can recreate the existing indication(s) on multiple lines.
> Or rather
>
> D move-source-path
> R+ replaced-path (copied from move-source-path)

Yea, I was thinking the same.

>
> D move-source-path
> D replaced-path
> A+ replaced-path (copied from move-source-path)
>
>
>> (*) Copy
>> There also is a new copy() call instead of the "add(copyfrom_revision)". I
>> guess it's a bikeshed whether the copy() callback should be separate instead
>> of adding a "copyfrom_rev" to the add() callback. In short, copy == add+.
>> Like with moves, it is possible to send a delete() followed by a copy() to
>> the same path, which amounts to yet another "replace" variant.
>>
>> --> We need to also add a REPLACES_REV argument to the copy() callback, and
>> disallow issuing a delete followed by a copy to the same path.
>
> Sounds good.
>
>
>> (*) Different meanings
>> It is conceivable that one single "replace" call may have a different
>> meaning from separate delete() and add() calls. That would be in the way
>> that the one type represents a closer relation than the other.
>>
>> --> My conclusion is that this is not the case / does not matter. An atomic
>> replace is simply a new node taking the path of another node, the two of
>> them being unrelated. There *are* no less-related cases than a "replace"
>> operation (on the same path).
>
> This all sounds good to me.
>
>> There are more-related cases than a "replace",
>
> You talk about copies and moves that replace an existing node. I don't
> think these operations involve a "more related" kind of replacement,
> they just involve the same kind of replacement within a different
> operation.
>
> I can't think of any more or less related kinds of replacement. In other
> words, I can't think of any reason why a user might consider the
> "replace" operation to mean something different from the combination of
> "delete" and "add".
>
>
>> which then become a move() or a copy() (a.k.a. add-with-history). (Thus I
>> suggest adding the REPLACES_REV arg to move() and copy() as well, above. If
>> the added node is related to the node that is being replaced, heavens
>
> "related to"? Do you mean "a copy of"?

I was thinking specifically of a resurrection:

rev op
1 A foo.c
2 M foo.c
3 D foo.c
4 A foo.c <-- an unrelated node
5 R+ foo.c (from foo.c@2)

hm, what was I thinking again?

>> forbid, it should become a copy() operation with SRC_RELPATH == DST_RELPATH
>> and REPLACES_REV == SRC_REVISION.)

Ah, ok, that thinking was flawed, actually REPLACES_REV > SRC_REVISION (4>2)
-- sorry.

> So the result is: file@10 is copied from file@9. I don't know if a
> Subversion repository can tell the difference between that and a no-op
> (or a simple content modification).

But that case wouldn't modify any content. It could result in a log message
with maybe props modified, but no code modified at all. Heh, we should
probably forbid/skip that.

> So: +1 to the plan.

Greg, Julian, t'was a great pleasure to read your responses. Nice to see I'm
on the right track.

~Neels
Attachments

Re: Defining atomic "replace"

Author julianfoad
Full name Julian Foad
Date 2009-08-11 16:31:17 PDT
Message On Wed, 2009-08-12, Julian Foad wrote:
> > (*) Atomic move
> > In the new editor callbacks, there is a separate move() call to represent
> > atomic moves. It is thus possible to issue a delete() followed by a move()
> > to the same path that was deleted. Which is, essentially, a replace!
> >
> > --> We need to also add a REPLACES_REV argument to the move() callback, and
> > we need to disallow issuing a delete followed by a move to the same path.
> > Actually, has this case ever been considered? Currently, a move() becomes an
> > "add-with-history" plus a delete(). Our API doesn't indicate a move
> > replacing a path.
>
> I think the current representation of this would be two deletes (delete
> the move-source and delete the node being replaced) and then an
> add-with-history.
>
> > We could need another update/status letter for this.
>
> For now, it can recreate the existing indication(s) on multiple lines.
>
> D move-source-path
> D replaced-path
> A+ replaced-path (copied from move-source-path)

Or rather

D move-source-path
R+ replaced-path (copied from move-source-path)

- Julian

Re: Defining atomic "replace"

Author julianfoad
Full name Julian Foad
Date 2009-08-11 16:27:08 PDT
Message On Tue, 2009-08-11 at 20:51 +0200, Neels J Hofmeyr wrote:
> Hi Julian and everyone else,
>
> I'd like to summarize a little about atomic replace operations in the new
> editor interface (subversion/include/​svn_editor.h,
> subversion/libsvn_de​lta/editor.c) and have a conclusion.
>
> My intent is to further tweak svn_editor.h shortly, so feedback is welcome.
>
>
>
> The argument started off like this:
>
> If a file or folder at a certain path is deleted, and some other file or
> folder is added at the same path, that is a replace.
>
> Sometimes it is nice and/or necessary to be able to relate the delete() and
> the add(), for example to print
> R foo
> instead of
> D foo
> M other_stuff
> A foo
> , and for detecting tree-conflicts.
>
> Currently, wherever that happens, the *receiving* side of the editor
> callbacks has to figure out whether a delete coincides with an add at the
> same path. That involves caching deleted nodes == bloat and complication.
>
> Fact is, there is at least one *driver* of the editor callbacks that has
> already made the correlation of a delete followed by an add, but splits it
> up into a delete followed by an add, following callback design.
>
> My crusade is to move the work for combining delete() + add() into a single
> "replace" into the *driver* of the new callbacks, considering that that
> information may already be lying around for free in most cases.
>
> Even if it isn't, I think it is better to make the *driver* do the
> combination work "once", and not burden "all those receivers" with the work
> of correlating deletes with adds.
>
> The problem I ran into is this:
>
> To effectively take the burden off the receivers, we need to *enforce* that
> the driver actually combines delete + add = replace. If the driver had a
> choice to combine or not combine "replace" calls, all those receivers have
> to be able to handle *both* cases, and the whole plan backfires, adding more
> code instead of simplifying.
>
> I will now try to theoretically home in on a Good API definition around
> atomic replace.
>
>
> Aspects:
>
>
> (*) Receivers should not have to combine "replace" operations.
>
> --> We need to *disallow* issuing a delete() followed by an add() operation
> on the same path. The drivers all need to use the atomic-replace channel
> instead, which is to issue an add() operation with a REPLACES_REV arg.
>
>
> (*) Two-sided vs. detailed history
> In current merge (and all other operations AFAIK), if a certain path was
> deleted and added multiple times in a given range, all the intermediate
> steps are dropped and the only things that matter are
> - the initial state and
> - the final state.
> e.g. (D, A, M, D, A) becomes (D, A), the three in the center are dropped.
> However, there are rumors that at some point all the intermediate steps will
> be necessary (keyword "operational merge").

(Greg: Neels is talking about a hypothetical new "operation-based" merge
editor that is a consumer of the editor API. It would want to be driven
not simply with the difference between merge-left and merge-right
sources, but rather with the full sequence of changes that were
originally made to get from merge-left to merge-right.)

> --> We need to allow multiple atomic replaces in a given range, so that each
> sub-sequence of (D, A) becomes a (R):
> (D, A, M, D, A) becomes (R, M, R) in an operational merge.

Even if we decide to go towards "operation-based" merging, I think we
have no idea yet what it would look like, and we don't know whether we
would want to represent that (R, M, R) sequence. So I don't think there
is any point in considering that now.


> (*) Atomic move
> In the new editor callbacks, there is a separate move() call to represent
> atomic moves. It is thus possible to issue a delete() followed by a move()
> to the same path that was deleted. Which is, essentially, a replace!
>
> --> We need to also add a REPLACES_REV argument to the move() callback, and
> we need to disallow issuing a delete followed by a move to the same path.
> Actually, has this case ever been considered? Currently, a move() becomes an
> "add-with-history" plus a delete(). Our API doesn't indicate a move
> replacing a path.

I think the current representation of this would be two deletes (delete
the move-source and delete the node being replaced) and then an
add-with-history.

> We could need another update/status letter for this.

For now, it can recreate the existing indication(s) on multiple lines.

D move-source-path
D replaced-path
A+ replaced-path (copied from move-source-path)


> (*) Copy
> There also is a new copy() call instead of the "add(copyfrom_revision)". I
> guess it's a bikeshed whether the copy() callback should be separate instead
> of adding a "copyfrom_rev" to the add() callback. In short, copy == add+.
> Like with moves, it is possible to send a delete() followed by a copy() to
> the same path, which amounts to yet another "replace" variant.
>
> --> We need to also add a REPLACES_REV argument to the copy() callback, and
> disallow issuing a delete followed by a copy to the same path.

Sounds good.


> (*) Different meanings
> It is conceivable that one single "replace" call may have a different
> meaning from separate delete() and add() calls. That would be in the way
> that the one type represents a closer relation than the other.
>
> --> My conclusion is that this is not the case / does not matter. An atomic
> replace is simply a new node taking the path of another node, the two of
> them being unrelated. There *are* no less-related cases than a "replace"
> operation (on the same path).

This all sounds good to me.

> There are more-related cases than a "replace",

You talk about copies and moves that replace an existing node. I don't
think these operations involve a "more related" kind of replacement,
they just involve the same kind of replacement within a different
operation.

I can't think of any more or less related kinds of replacement. In other
words, I can't think of any reason why a user might consider the
"replace" operation to mean something different from the combination of
"delete" and "add".


> which then become a move() or a copy() (a.k.a. add-with-history). (Thus I
> suggest adding the REPLACES_REV arg to move() and copy() as well, above. If
> the added node is related to the node that is being replaced, heavens

"related to"? Do you mean "a copy of"?

> forbid, it should become a copy() operation with SRC_RELPATH == DST_RELPATH
> and REPLACES_REV == SRC_REVISION.)

So the result is: file@10 is copied from file@9. I don't know if a
Subversion repository can tell the difference between that and a no-op
(or a simple content modification). I would be interested to know
whether it can. Either way, I don't think it affects this proposal.

So: +1 to the plan.

- Julian


> Thanks for feedback,
>
> ~Neels
>
>
> Julian Foad wrote:
> > On Tue, 2009-07-21, Neels J Hofmeyr wrote:
> >> Julian Foad wrote:
> >>>> The driver is required to use add() with REPLACES_REV instead of a separate
> >>>> delete() call before this add(). Receivers of this callback in general are
> >>>> not required to support a separate delete() call for a 'replace'.
> >>> I think it would be clearer to state the editor requirements separately
> >>> from the usage recommendations. (This last paragraph and the following
> >>> one seem to mix the two.)
> >>>
> >>>> If you intend to implement a separate delete() call when driving this
> >>>> editor, make sure you really have to and put up "big red signs" in the
> >>>> documentation of your driver. Be aware that, in general, the driver should
> >>>> do all of the (possibly hard) work of combining a delete() with an add() to
> >>>> form an atomic replace, taking that burden off the various receivers. So if
> >>>> you write a driver that doesn't do that, you have to make sure that all your
> >>>> receivers (preferably very few) can handle a non-atomic replace.
> >>>> Furthermore, a delete() call should occur always before, never after an
> >>>> add() call for the same node.
> >>>> ]]]
> >> Yes, you're right.
> >> Where is the line between editor requirements and usage recommendations?
> >>
> >> Maybe you can help to choose a side... Or are these sides chosen by
> >> individual implementations anyway?
> >
> > It is important to decide and clearly draw the line somewhere. The whole
> > point of having a standard "interface" is so that multiple producers and
> > consumers can be written independently and someone else can plug them
> > together and know that they will work together.
> >
> >> I am more for forcing an atomic replace. I would rather remove the loophole
> >> that says "well, but if you must, just issue separate calls". I'd personally
> >> not mind if some parts of the code send separate calls, but I want e.g. the
> >> merge callback receivers to be certain that they will *not* receive separate
> >> calls *ever*.
> >
> > OK, we can make that a requirement. We must ensure that all of the
> > current producers obey it.
> >
> >
> >> How about we choose the stricter comment, but we let people use it
> >> differently if they must without saying so now?
> >
> > No, let's not invite trouble.
> >
> >
> > [...]
> >> - is it ok to disallow multiple add() and delete() calls on the same path,
> >> in principle?
> >> e.g. during merge, there are only the two sides, merge-left
> >> and merge-right. A replace means that both exist but are "unrelated". So
> >> a merge will never need to send more than one (delete, add) tuple.
> >> Usually it's either *one* delete or *one* add. The (add, delete) tuple
> >> simply cancels out, nothing being sent.
> >
> > Yes, it's absolutely fine to specify that the editor must be driven with
> > a single change for each path.
> >
> > Let's imagine that the user does "svn rm X; svn add X; svn rm X; svn add
> > X" in his/her WC. Let's imagine that the WC code has an "undo history"
> > that recorded all the separate actions in this sequence. When the user
> > makes a commit (or a diff or an update or whatever), then it's the
> > calling code's job to ensure that the editor is driven with a single
> > change (replace X(old) with X(new)).
> >
> >> But are there cases where we also send all the steps in-between? Then we
> >> could allow this: add(), add(REPLACES_REV), add(REPLACES_REV), delete().
> >> But *are there*?
> >
> > I don't know whether there are currently cases where we do this, other
> > than a single delete followed by a single add. But we are defining the
> > interface. We must define what we think is best and then the callers
> > must do what we say.
> >
> >> - is it necessary to disallow multiple add() and delete() calls on the same
> >> path, to make the atomic replace useful?
> >> e.g. during merge, the receiver needs to cache all delete()s and match
> >> with add()s, carrying out the leftovers during dir_close(). And merge
> >> is not the only one in need of this quirky code construct.
> >> I want to eradicate this madness -- I'd love to force atomic replaces
> >> on everyone, so the receivers are entirely relieved of this quirkiness.
> >> Will allowing both separate and atomic replace calls defy my plan by
> >> actually adding more code to the receivers?
> >
> > Basically, yes, I think it is necessary.
> >
> > If a receiver wants to handle a replacement as a single operation and
> > the editor API does not guarantee that it will get a single call, then
> > it can achieve the same end result by batching up the delete requests
> > and executing them just before closing the directory. But if it
> > implements that, then there would be no benefit in having the single
> > "replace" call available.
> >
> >> - is there a case where separate calls have a different meaning than an
> >> atomic replace?
> >> e.g. for merge, it doesn't matter whether the nodes are "related".
> >> I can't think of any case where we'd want to highlight that the node was
> >> deleted and later re-added with the same history, or something. Right?
> >
> > This is the question of critical importance: Is there any semantic
> > difference (difference in meaning) between
> >
> > replace X with (something)
> >
> > and
> >
> > delete X; add (something) at path "X"
> >
> > ?
> >
> > Before I try to answer that, first let's review the history of the
> > semantics of "svn move" (also know as "svn rename"). Initially we
> > decided there was no semantic difference between
> >
> > move X to Y
> >
> > and
> >
> > delete X; add Y as a copy of X
> >
> > That is true as far as it goes, but unfortunately in the implementation
> > we lost the implicit requirement that the "X" being deleted in the first
> > half is the very same "X" that's copied in the second half. It got
> > translated to
> >
> > delete X
> > add Y as a copy of (X's absolute path at a specific recent-ish
> > revision)
> >
> > That works just about well enough within a single branch, but when
> > merging such a change to another branch the connection between the two
> > X's is completely broken and the result is not what we meant at all.
> >
> > We usually want a move in the merge source branch, such as
> >
> > delete source/X
> > add source/Y as a copy of (source/X at its latest revision)
> >
> > to be merged into the target branch as
> >
> > delete target/X
> > add target/Y as a copy of (target/X at its latest revision)
> >
> > So, working back to how we could better have defined the semantics of
> > "move", we could say "move X to Y" means
> >
> > remove entry X from X's directory
> > (but don't destroy its content, because it's just being moved)
> > add entry Y into Y's directory,
> > taking its content from the X that's being removed
> >
> > Note that the two parts are intimately connected together, and that the
> > "delete" half is semantically different from a plain "delete".
> >
> > I think that's where we need to be going with the "true renames" issue
> > <http://subversion.ti​gris.org/issues/show​_bug.cgi?id=898>.​
> >
> > That's not quite the end of the story with "move". One side issue is
> > that I think the semantics of "copy" should be updated in the same way,
> > so when "copy source/X to source/Y" is merged it should add target/Y as
> > a copy of (target/X at latest revision), not of (source/X at some
> > specific old revision). Another is that I've hand-waved on how the path
> > "source/X" is mapped to the path "target/X"; I think it involves
> > treating them both as relative to the respective merge root paths, and
> > adjusting for any directories moved within the merge. The merge should
> > throw an error if the move source is outside the merged tree.
> >
> > Now, what does that teach us about how we want "replace" to behave? Are
> > we making some assumption when we "replace X with a copy of Y" that is
> > subtly stronger than "delete X; add a new X that's a copy of Y"? Does it
> > make a difference if instead of "Y" we say "a previous revision of X"?
> >
> > - Julian
> >
> > --------------------​--------------------​--------------
> > http://subversion.ti​gris.org/ds/viewMess​age.do?dsForumId=462​&dsMessageId=237​6254
>

Re: Defining atomic "replace"

Author gstein
Full name Greg Stein
Date 2009-08-11 13:04:12 PDT
Message In short: sounds great.

On Tue, Aug 11, 2009 at 20:51, Neels J Hofmeyr<neels at elego dot de> wrote:
>...
> Currently, wherever that happens, the *receiving* side of the editor
> callbacks has to figure out whether a delete coincides with an add at the
> same path. That involves caching deleted nodes == bloat and complication.

Agreed.

And I also agree that we can define a requirement upon drivers'
pattern of operation on the API.

I've also speculated on having the editor.c code implement a debug
mode that verifies all constraints. Since *all* calls on the editor v2
must pass through those functions, we can implement all the semantic
checks. For example, we can put a hash table into the editor structure
and record all the deletes, then verify that an add/copy/move never
arrives for the same path.

> Fact is, there is at least one *driver* of the editor callbacks that has
> already made the correlation of a delete followed by an add, but splits it
> up into a delete followed by an add, following callback design.
>
> My crusade is to move the work for combining delete() + add() into a single
> "replace" into the *driver* of the new callbacks, considering that that
> information may already be lying around for free in most cases.

Right. This is also why I changed the model to "present all children
at directory creation time". This is always known, as far as I'm
aware. And specifying it up front has a number of advantages (as
stated in editor-v2.txt).

> Even if it isn't, I think it is better to make the *driver* do the
> combination work "once", and not burden "all those receivers" with the work
> of correlating deletes with adds.
>
> The problem I ran into is this:
>
> To effectively take the burden off the receivers, we need to *enforce* that
> the driver actually combines delete + add = replace. If the driver had a
> choice to combine or not combine "replace" calls, all those receivers have
> to be able to handle *both* cases, and the whole plan backfires, adding more
> code instead of simplifying.

Yup. And as I said above: we can have a debug mode to validate proper
behaviors from all drivers.

> I will now try to theoretically home in on a Good API definition around
> atomic replace.
>
>
> Aspects:
>
>
> (*) Receivers should not have to combine "replace" operations.
>
> --> We need to *disallow* issuing a delete() followed by an add() operation
> on the same path. The drivers all need to use the atomic-replace channel
> instead, which is to issue an add() operation with a REPLACES_REV arg.

Agreed.

> (*) Two-sided vs. detailed history
> In current merge (and all other operations AFAIK), if a certain path was
> deleted and added multiple times in a given range, all the intermediate
> steps are dropped and the only things that matter are
>  - the initial state and
>  - the final state.
> e.g. (D, A, M, D, A) becomes (D, A), the three in the center are dropped.
> However, there are rumors that at some point all the intermediate steps will
> be necessary (keyword "operational merge").
>
> --> We need to allow multiple atomic replaces in a given range, so that each
> sub-sequence of (D, A) becomes a (R):
>  (D, A, M, D, A) becomes (R, M, R) in an operational merge.

I'm not sure that I understand this, but agree with the restriction of
"must combine".

But this does seem to raise a new question: you're talking about an
extra operation here. Some kind of "merge" operation. Does that need
to be present in the new editor API?

Note: if the answer is "unsure; we'll know later", then I consider
that fine, too. This v2 API is much more extensible, so I believe that
we can add more "callbacks" without a much problem.

> (*) Atomic move
> In the new editor callbacks, there is a separate move() call to represent
> atomic moves. It is thus possible to issue a delete() followed by a move()
> to the same path that was deleted. Which is, essentially, a replace!
>
> --> We need to also add a REPLACES_REV argument to the move() callback, and
> we need to disallow issuing a delete followed by a move to the same path.

Agreed.

> Actually, has this case ever been considered? Currently, a move() becomes an
> "add-with-history" plus a delete(). Our API doesn't indicate a move
> replacing a path. We could need another update/status letter for this.

There are LOTS of APIs in our system that do not understand moves. The
new wc_db can record moves, but (in 1.7) nothing is going to *tell* it
to do that. This new editor API can describe moves, but nothing will
drive it that way. Or maybe it will, and the receiver will break it
down to a delete/add to deal with legacy APIs that do not have a move
operation.

I expect that, over time, we will start propagating a "move" concept
further and further through our codebase. Only at legacy points (eg.
speaking to an old server) will they be broken into a delete/add pair.

> (*) Copy
> There also is a new copy() call instead of the "add(copyfrom_revision)". I
> guess it's a bikeshed whether the copy() callback should be separate instead
> of adding a "copyfrom_rev" to the add() callback. In short, copy == add+.

It isn't a bikeshed. There are *four* add variants, based on the kind
of the node you're adding. There is only *one* copy callback because
the node's kind does not magically change during the copy process.
Adding a copy revision/path pair of arguments to all four "add"
variants increases complexity, I think. I also believe that newcomers
have a harder time dealing with the "copy == add with history"
concept.

> Like with moves, it is possible to send a delete() followed by a copy() to
> the same path, which amounts to yet another "replace" variant.
>
> --> We need to also add a REPLACES_REV argument to the copy() callback, and
> disallow issuing a delete followed by a copy to the same path.

Agreed.

> (*) Different meanings
> It is conceivable that one single "replace" call may have a different
> meaning from separate delete() and add() calls. That would be in the way
> that the one type represents a closer relation than the other.
>
> --> My conclusion is that this is not the case / does not matter. An atomic
> replace is simply a new node taking the path of another node, the two of
> them being unrelated. There *are* no less-related cases than a "replace"
> operation (on the same path). There are more-related cases than a "replace",
> which then become a move() or a copy() (a.k.a. add-with-history). (Thus I

Agreed.

> suggest adding the REPLACES_REV arg to move() and copy() as well, above. If
> the added node is related to the node that is being replaced, heavens
> forbid, it should become a copy() operation with SRC_RELPATH == DST_RELPATH
> and REPLACES_REV == SRC_REVISION.)

Eh? Wouldn't this just become a no-op?

>...

Cheers,
-g

Re: Defining atomic "replace"

Author neels
Full name Neels Janosch Hofmeyr
Date 2009-08-11 11:53:36 PDT
Message Hi Julian and everyone else,

I'd like to summarize a little about atomic replace operations in the new
editor interface (subversion/include/​svn_editor.h,
subversion/libsvn_de​lta/editor.c) and have a conclusion.

My intent is to further tweak svn_editor.h shortly, so feedback is welcome.



The argument started off like this:

If a file or folder at a certain path is deleted, and some other file or
folder is added at the same path, that is a replace.

Sometimes it is nice and/or necessary to be able to relate the delete() and
the add(), for example to print
 R foo
instead of
 D foo
 M other_stuff
 A foo
, and for detecting tree-conflicts.

Currently, wherever that happens, the *receiving* side of the editor
callbacks has to figure out whether a delete coincides with an add at the
same path. That involves caching deleted nodes == bloat and complication.

Fact is, there is at least one *driver* of the editor callbacks that has
already made the correlation of a delete followed by an add, but splits it
up into a delete followed by an add, following callback design.

My crusade is to move the work for combining delete() + add() into a single
"replace" into the *driver* of the new callbacks, considering that that
information may already be lying around for free in most cases.

Even if it isn't, I think it is better to make the *driver* do the
combination work "once", and not burden "all those receivers" with the work
of correlating deletes with adds.

The problem I ran into is this:

To effectively take the burden off the receivers, we need to *enforce* that
the driver actually combines delete + add = replace. If the driver had a
choice to combine or not combine "replace" calls, all those receivers have
to be able to handle *both* cases, and the whole plan backfires, adding more
code instead of simplifying.

I will now try to theoretically home in on a Good API definition around
atomic replace.


Aspects:


(*) Receivers should not have to combine "replace" operations.

--> We need to *disallow* issuing a delete() followed by an add() operation
on the same path. The drivers all need to use the atomic-replace channel
instead, which is to issue an add() operation with a REPLACES_REV arg.


(*) Two-sided vs. detailed history
In current merge (and all other operations AFAIK), if a certain path was
deleted and added multiple times in a given range, all the intermediate
steps are dropped and the only things that matter are
 - the initial state and
 - the final state.
e.g. (D, A, M, D, A) becomes (D, A), the three in the center are dropped.
However, there are rumors that at some point all the intermediate steps will
be necessary (keyword "operational merge").

--> We need to allow multiple atomic replaces in a given range, so that each
sub-sequence of (D, A) becomes a (R):
  (D, A, M, D, A) becomes (R, M, R) in an operational merge.


(*) Atomic move
In the new editor callbacks, there is a separate move() call to represent
atomic moves. It is thus possible to issue a delete() followed by a move()
to the same path that was deleted. Which is, essentially, a replace!

--> We need to also add a REPLACES_REV argument to the move() callback, and
we need to disallow issuing a delete followed by a move to the same path.
Actually, has this case ever been considered? Currently, a move() becomes an
"add-with-history" plus a delete(). Our API doesn't indicate a move
replacing a path. We could need another update/status letter for this.


(*) Copy
There also is a new copy() call instead of the "add(copyfrom_revision)". I
guess it's a bikeshed whether the copy() callback should be separate instead
of adding a "copyfrom_rev" to the add() callback. In short, copy == add+.
Like with moves, it is possible to send a delete() followed by a copy() to
the same path, which amounts to yet another "replace" variant.

--> We need to also add a REPLACES_REV argument to the copy() callback, and
disallow issuing a delete followed by a copy to the same path.


(*) Different meanings
It is conceivable that one single "replace" call may have a different
meaning from separate delete() and add() calls. That would be in the way
that the one type represents a closer relation than the other.

--> My conclusion is that this is not the case / does not matter. An atomic
replace is simply a new node taking the path of another node, the two of
them being unrelated. There *are* no less-related cases than a "replace"
operation (on the same path). There are more-related cases than a "replace",
which then become a move() or a copy() (a.k.a. add-with-history). (Thus I
suggest adding the REPLACES_REV arg to move() and copy() as well, above. If
the added node is related to the node that is being replaced, heavens
forbid, it should become a copy() operation with SRC_RELPATH == DST_RELPATH
and REPLACES_REV == SRC_REVISION.)


Thanks for feedback,

~Neels


Julian Foad wrote:
> On Tue, 2009-07-21, Neels J Hofmeyr wrote:
>> Julian Foad wrote:
>>>> The driver is required to use add() with REPLACES_REV instead of a separate
>>>> delete() call before this add(). Receivers of this callback in general are
>>>> not required to support a separate delete() call for a 'replace'.
>>> I think it would be clearer to state the editor requirements separately
>>> from the usage recommendations. (This last paragraph and the following
>>> one seem to mix the two.)
>>>
>>>> If you intend to implement a separate delete() call when driving this
>>>> editor, make sure you really have to and put up "big red signs" in the
>>>> documentation of your driver. Be aware that, in general, the driver should
>>>> do all of the (possibly hard) work of combining a delete() with an add() to
>>>> form an atomic replace, taking that burden off the various receivers. So if
>>>> you write a driver that doesn't do that, you have to make sure that all your
>>>> receivers (preferably very few) can handle a non-atomic replace.
>>>> Furthermore, a delete() call should occur always before, never after an
>>>> add() call for the same node.
>>>> ]]]
>> Yes, you're right.
>> Where is the line between editor requirements and usage recommendations?
>>
>> Maybe you can help to choose a side... Or are these sides chosen by
>> individual implementations anyway?
>
> It is important to decide and clearly draw the line somewhere. The whole
> point of having a standard "interface" is so that multiple producers and
> consumers can be written independently and someone else can plug them
> together and know that they will work together.
>
>> I am more for forcing an atomic replace. I would rather remove the loophole
>> that says "well, but if you must, just issue separate calls". I'd personally
>> not mind if some parts of the code send separate calls, but I want e.g. the
>> merge callback receivers to be certain that they will *not* receive separate
>> calls *ever*.
>
> OK, we can make that a requirement. We must ensure that all of the
> current producers obey it.
>
>
>> How about we choose the stricter comment, but we let people use it
>> differently if they must without saying so now?
>
> No, let's not invite trouble.
>
>
> [...]
>> - is it ok to disallow multiple add() and delete() calls on the same path,
>> in principle?
>> e.g. during merge, there are only the two sides, merge-left
>> and merge-right. A replace means that both exist but are "unrelated". So
>> a merge will never need to send more than one (delete, add) tuple.
>> Usually it's either *one* delete or *one* add. The (add, delete) tuple
>> simply cancels out, nothing being sent.
>
> Yes, it's absolutely fine to specify that the editor must be driven with
> a single change for each path.
>
> Let's imagine that the user does "svn rm X; svn add X; svn rm X; svn add
> X" in his/her WC. Let's imagine that the WC code has an "undo history"
> that recorded all the separate actions in this sequence. When the user
> makes a commit (or a diff or an update or whatever), then it's the
> calling code's job to ensure that the editor is driven with a single
> change (replace X(old) with X(new)).
>
>> But are there cases where we also send all the steps in-between? Then we
>> could allow this: add(), add(REPLACES_REV), add(REPLACES_REV), delete().
>> But *are there*?
>
> I don't know whether there are currently cases where we do this, other
> than a single delete followed by a single add. But we are defining the
> interface. We must define what we think is best and then the callers
> must do what we say.
>
>> - is it necessary to disallow multiple add() and delete() calls on the same
>> path, to make the atomic replace useful?
>> e.g. during merge, the receiver needs to cache all delete()s and match
>> with add()s, carrying out the leftovers during dir_close(). And merge
>> is not the only one in need of this quirky code construct.
>> I want to eradicate this madness -- I'd love to force atomic replaces
>> on everyone, so the receivers are entirely relieved of this quirkiness.
>> Will allowing both separate and atomic replace calls defy my plan by
>> actually adding more code to the receivers?
>
> Basically, yes, I think it is necessary.
>
> If a receiver wants to handle a replacement as a single operation and
> the editor API does not guarantee that it will get a single call, then
> it can achieve the same end result by batching up the delete requests
> and executing them just before closing the directory. But if it
> implements that, then there would be no benefit in having the single
> "replace" call available.
>
>> - is there a case where separate calls have a different meaning than an
>> atomic replace?
>> e.g. for merge, it doesn't matter whether the nodes are "related".
>> I can't think of any case where we'd want to highlight that the node was
>> deleted and later re-added with the same history, or something. Right?
>
> This is the question of critical importance: Is there any semantic
> difference (difference in meaning) between
>
> replace X with (something)
>
> and
>
> delete X; add (something) at path "X"
>
> ?
>
> Before I try to answer that, first let's review the history of the
> semantics of "svn move" (also know as "svn rename"). Initially we
> decided there was no semantic difference between
>
> move X to Y
>
> and
>
> delete X; add Y as a copy of X
>
> That is true as far as it goes, but unfortunately in the implementation
> we lost the implicit requirement that the "X" being deleted in the first
> half is the very same "X" that's copied in the second half. It got
> translated to
>
> delete X
> add Y as a copy of (X's absolute path at a specific recent-ish
> revision)
>
> That works just about well enough within a single branch, but when
> merging such a change to another branch the connection between the two
> X's is completely broken and the result is not what we meant at all.
>
> We usually want a move in the merge source branch, such as
>
> delete source/X
> add source/Y as a copy of (source/X at its latest revision)
>
> to be merged into the target branch as
>
> delete target/X
> add target/Y as a copy of (target/X at its latest revision)
>
> So, working back to how we could better have defined the semantics of
> "move", we could say "move X to Y" means
>
> remove entry X from X's directory
> (but don't destroy its content, because it's just being moved)
> add entry Y into Y's directory,
> taking its content from the X that's being removed
>
> Note that the two parts are intimately connected together, and that the
> "delete" half is semantically different from a plain "delete".
>
> I think that's where we need to be going with the "true renames" issue
> <http://subversion.ti​gris.org/issues/show​_bug.cgi?id=898>.​
>
> That's not quite the end of the story with "move". One side issue is
> that I think the semantics of "copy" should be updated in the same way,
> so when "copy source/X to source/Y" is merged it should add target/Y as
> a copy of (target/X at latest revision), not of (source/X at some
> specific old revision). Another is that I've hand-waved on how the path
> "source/X" is mapped to the path "target/X"; I think it involves
> treating them both as relative to the respective merge root paths, and
> adjusting for any directories moved within the merge. The merge should
> throw an error if the move source is outside the merged tree.
>
> Now, what does that teach us about how we want "replace" to behave? Are
> we making some assumption when we "replace X with a copy of Y" that is
> subtly stronger than "delete X; add a new X that's a copy of Y"? Does it
> make a difference if instead of "Y" we say "a previous revision of X"?
>
> - Julian
>
> --------------------​--------------------​--------------
> http://subversion.ti​gris.org/ds/viewMess​age.do?dsForumId=462​&dsMessageId=237​6254
Attachments

Defining atomic "replace" [was: svn commit: r38394 - in trunk/subversion: include libsvn_delta]

Author julianfoad
Full name Julian Foad
Date 2009-07-28 03:28:37 PDT
Message On Tue, 2009-07-21, Neels J Hofmeyr wrote:
>
> Julian Foad wrote:
> >> The driver is required to use add() with REPLACES_REV instead of a separate
> >> delete() call before this add(). Receivers of this callback in general are
> >> not required to support a separate delete() call for a 'replace'.
> >
> > I think it would be clearer to state the editor requirements separately
> > from the usage recommendations. (This last paragraph and the following
> > one seem to mix the two.)
> >
> >> If you intend to implement a separate delete() call when driving this
> >> editor, make sure you really have to and put up "big red signs" in the
> >> documentation of your driver. Be aware that, in general, the driver should
> >> do all of the (possibly hard) work of combining a delete() with an add() to
> >> form an atomic replace, taking that burden off the various receivers. So if
> >> you write a driver that doesn't do that, you have to make sure that all your
> >> receivers (preferably very few) can handle a non-atomic replace.
> >> Furthermore, a delete() call should occur always before, never after an
> >> add() call for the same node.
> >> ]]]
>
> Yes, you're right.
> Where is the line between editor requirements and usage recommendations?
>
> Maybe you can help to choose a side... Or are these sides chosen by
> individual implementations anyway?

It is important to decide and clearly draw the line somewhere. The whole
point of having a standard "interface" is so that multiple producers and
consumers can be written independently and someone else can plug them
together and know that they will work together.

> I am more for forcing an atomic replace. I would rather remove the loophole
> that says "well, but if you must, just issue separate calls". I'd personally
> not mind if some parts of the code send separate calls, but I want e.g. the
> merge callback receivers to be certain that they will *not* receive separate
> calls *ever*.

OK, we can make that a requirement. We must ensure that all of the
current producers obey it.


> How about we choose the stricter comment, but we let people use it
> differently if they must without saying so now?

No, let's not invite trouble.


[...]
> - is it ok to disallow multiple add() and delete() calls on the same path,
> in principle?
> e.g. during merge, there are only the two sides, merge-left
> and merge-right. A replace means that both exist but are "unrelated". So
> a merge will never need to send more than one (delete, add) tuple.
> Usually it's either *one* delete or *one* add. The (add, delete) tuple
> simply cancels out, nothing being sent.

Yes, it's absolutely fine to specify that the editor must be driven with
a single change for each path.

Let's imagine that the user does "svn rm X; svn add X; svn rm X; svn add
X" in his/her WC. Let's imagine that the WC code has an "undo history"
that recorded all the separate actions in this sequence. When the user
makes a commit (or a diff or an update or whatever), then it's the
calling code's job to ensure that the editor is driven with a single
change (replace X(old) with X(new)).

> But are there cases where we also send all the steps in-between? Then we
> could allow this: add(), add(REPLACES_REV), add(REPLACES_REV), delete().
> But *are there*?

I don't know whether there are currently cases where we do this, other
than a single delete followed by a single add. But we are defining the
interface. We must define what we think is best and then the callers
must do what we say.

> - is it necessary to disallow multiple add() and delete() calls on the same
> path, to make the atomic replace useful?
> e.g. during merge, the receiver needs to cache all delete()s and match
> with add()s, carrying out the leftovers during dir_close(). And merge
> is not the only one in need of this quirky code construct.
> I want to eradicate this madness -- I'd love to force atomic replaces
> on everyone, so the receivers are entirely relieved of this quirkiness.
> Will allowing both separate and atomic replace calls defy my plan by
> actually adding more code to the receivers?

Basically, yes, I think it is necessary.

If a receiver wants to handle a replacement as a single operation and
the editor API does not guarantee that it will get a single call, then
it can achieve the same end result by batching up the delete requests
and executing them just before closing the directory. But if it
implements that, then there would be no benefit in having the single
"replace" call available.

> - is there a case where separate calls have a different meaning than an
> atomic replace?
> e.g. for merge, it doesn't matter whether the nodes are "related".
> I can't think of any case where we'd want to highlight that the node was
> deleted and later re-added with the same history, or something. Right?

This is the question of critical importance: Is there any semantic
difference (difference in meaning) between

  replace X with (something)

and

  delete X; add (something) at path "X"

?

Before I try to answer that, first let's review the history of the
semantics of "svn move" (also know as "svn rename"). Initially we
decided there was no semantic difference between

  move X to Y

and

  delete X; add Y as a copy of X

That is true as far as it goes, but unfortunately in the implementation
we lost the implicit requirement that the "X" being deleted in the first
half is the very same "X" that's copied in the second half. It got
translated to

  delete X
  add Y as a copy of (X's absolute path at a specific recent-ish
revision)

That works just about well enough within a single branch, but when
merging such a change to another branch the connection between the two
X's is completely broken and the result is not what we meant at all.

We usually want a move in the merge source branch, such as

  delete source/X
  add source/Y as a copy of (source/X at its latest revision)

to be merged into the target branch as

  delete target/X
  add target/Y as a copy of (target/X at its latest revision)

So, working back to how we could better have defined the semantics of
"move", we could say "move X to Y" means

  remove entry X from X's directory
    (but don't destroy its content, because it's just being moved)
  add entry Y into Y's directory,
    taking its content from the X that's being removed

Note that the two parts are intimately connected together, and that the
"delete" half is semantically different from a plain "delete".

I think that's where we need to be going with the "true renames" issue
<http://subversion.ti​gris.org/issues/show​_bug.cgi?id=898>.​

That's not quite the end of the story with "move". One side issue is
that I think the semantics of "copy" should be updated in the same way,
so when "copy source/X to source/Y" is merged it should add target/Y as
a copy of (target/X at latest revision), not of (source/X at some
specific old revision). Another is that I've hand-waved on how the path
"source/X" is mapped to the path "target/X"; I think it involves
treating them both as relative to the respective merge root paths, and
adjusting for any directories moved within the merge. The merge should
throw an error if the move source is outside the merged tree.

Now, what does that teach us about how we want "replace" to behave? Are
we making some assumption when we "replace X with a copy of Y" that is
subtly stronger than "delete X; add a new X that's a copy of Y"? Does it
make a difference if instead of "Y" we say "a previous revision of X"?

- Julian

Re: operation based merging -- was: Re: svn commit: r38394 - in trunk/subversion: include libsvn_delta

Author gavinbaumanis
Full name Gavin Baumanis
Date 2009-07-21 16:33:05 PDT
Message On 22/07/2009, at 00:44 , Stefan Sperling wrote:

> On Tue, Jul 21, 2009 at 03:34:36PM +0200, Neels Janosch Hofmeyr wrote:
>>> [*] We end up merging "svn copy /branch/A /branch/B" from branch
>>> to trunk as "svn copy /branch/B /trunk/B" while it should be
>>> "svn copy /trunk/A /trunk/B" plus merging any additional textual
>>> edits
>>> made in /branch/B since its inception into /trunk/B.
>>> Right now, edits made to /trunk/A after branching do not make their
>>> way into /trunk/B in the merge result, which is wrong since B is
>>> supposed to be a copy of A on trunk, too, but ends up not being a
>>> copy
>>> of /trunk/A. And it's even worse with moves because changes made
>>> to A
>>> after branching just get dropped in <1.6. In 1.6 and beyond, we're
>>> flagging a tree conflict on A, (local edit, incoming delete upon
>>> merge)
>>> but we should be handling this automatically. And IIRC detecting
>>> this
>>> case doesn't even work for directories right now.
>>
>> whoa, stsp, you sometimes have a way of writing a really
>> complicated example
>> in one huge paragraph rant.
>
> Your diagram sums it up nicely though :)
>
>> I admittedly do not understand every detail in
>> this one... but I do have a question:
>>
>>
>> A /trunk/A
>> |
>> |\_________svn copy ^/trunk ^/branch___________
>> | \
>> M /trunk/A (1) |
>> | svn copy A B
>> | A+ /branch/B
>> | |
>> | M /branch/B (2)
>> | |
>> | <--- merge ---{ |
>> A+ /trunk/B (3) |
>>
>>
>> So you argue that (3) should contain *both* the edits (1) and (2),
>> instead
>> of just (2)? It seems to me that it is impossible to know whether a
>> user
>> wants that or not!
>
> True, there is ambiguity. There is more than one way to merge this.
>
>> I can imagine both cases:
>> - "What!? Why did it not merge the edits on /trunk/A (1)?"
>> - "What!? It included all the edits on /trunk/A (1) as well?"
>
> Does it need to be configurable? I'd imagine that merging the
> edits by default is a safe thing to do. Note that e.g. Mercurial
> does this by default, so there is precedence.
>
>> I would probably have been asking the latter, *not* expecting trunk/
>> B to
>> contain (1). So operation based merges would throw me off at first.
>
> In a huge merge, it's easier to revert a textual edit you don't want,
> and which you can see in svn diff, than to find an edit which should
> have happened but didn't. And which you can't tell from svn diff so
> it's easy to overlook -- you may see it as part of a file being
> deleted
> but if it's a huge file who is going to bother checking every line
> for things they want to keep in the copy?
>
>> If I got it right and you're sure about it, it seems to me like this
>> situation needs some kind of conflict resolu
>> * neels stops short.
>
> We could flag a tree conflict and have the edits be applied
> to the victim by default. Just like we currently schedule some
> victims A+ by default to make it easy for people who want to keep
> the victim. People who do not want to keep the A+ victim can just run
> 'svn revert victim' to unschedule it. People who do not want the
> edits in the copy can just revert them. At least the edits have
> been brought to their attention this way. And that's what Subversion
> is supposed to do -- helping people with keeping track of changes.
>

+1

Not knowing that something didn't happen is a lot harder to diagnose -
especially if it is out of sight.

And since I am not a developer I am loathed to add comments that
require extra work.... but if it could be flagged as being abnormal,
via some sort of conflict property, then that would also assist the
end-user with making an informed decision about the resultant operation.

Gavin.
Attachments

Re: operation based merging -- was: Re: svn commit: r38394 - in trunk/subversion: include libsvn_delta

Author stsp
Full name Stefan Sperling
Date 2009-07-21 07:44:55 PDT
Message
Attachments

operation based merging -- was: Re: svn commit: r38394 - in trunk/subversion: include libsvn_delta

Author neels
Full name Neels Janosch Hofmeyr
Date 2009-07-21 06:36:04 PDT
Message Stefan Sperling wrote:
> On Tue, Jul 21, 2009 at 04:30:53AM +0200, Neels Janosch Hofmeyr wrote:
>> - is it ok to disallow multiple add() and delete() calls on the same path,
>> in principle?
>> e.g. during merge, there are only the two sides, merge-left
>> and merge-right. A replace means that both exist but are "unrelated". So
>> a merge will never need to send more than one (delete, add) tuple.
>> Usually it's either *one* delete or *one* add. The (add, delete) tuple
>> simply cancels out, nothing being sent.
>>
>> But are there cases where we also send all the steps in-between? Then we
>> could allow this: add(), add(REPLACES_REV), add(REPLACES_REV), delete().
>> But *are there*?
>>
>> - is it necessary to disallow multiple add() and delete() calls on the same
>> path, to make the atomic replace useful?
>> e.g. during merge, the receiver needs to cache all delete()s and match
>> with add()s, carrying out the leftovers during dir_close(). And merge
>> is not the only one in need of this quirky code construct.
>> I want to eradicate this madness -- I'd love to force atomic replaces
>> on everyone, so the receivers are entirely relieved of this quirkiness.
>> Will allowing both separate and atomic replace calls defy my plan by
>> actually adding more code to the receivers?
>>
>> - is there a case where separate calls have a different meaning than an
>> atomic replace?
>> e.g. for merge, it doesn't matter whether the nodes are "related".
>> I can't think of any case where we'd want to highlight that the node was
>> deleted and later re-added with the same history, or something. Right?
>>
>> ~Neels
>
> Note that if we ever switch to operation-based merging, a sender
> might send mulitple replace calls for the same item.
>
> Right now, Subversion just looks at the static difference between
> the source and target branches, and tries to eliminate this difference.
> Operation-based merging adds a notion of time to this difference.
> It means that tree operations such as copy, delete, add, replace
> would be replayed in order on the target branch just like they
> happened on the source branch in the revision range being merged.
>
> This would make handling tree conflicts much easier, and would also
> deal with other wrinkles, such as the fact that copies aren't being
> merged from one branch to another in a useful way.[*]
>
> The editor interface should allow for this kind of operation.
> In the long term, I hope to see Subversion do operation-based merging.

That's quite an interesting perspective.
Thanks!


>
> Stefan
>
> [*] We end up merging "svn copy /branch/A /branch/B" from branch
> to trunk as "svn copy /branch/B /trunk/B" while it should be
> "svn copy /trunk/A /trunk/B" plus merging any additional textual edits
> made in /branch/B since its inception into /trunk/B.
> Right now, edits made to /trunk/A after branching do not make their
> way into /trunk/B in the merge result, which is wrong since B is
> supposed to be a copy of A on trunk, too, but ends up not being a copy
> of /trunk/A. And it's even worse with moves because changes made to A
> after branching just get dropped in <1.6. In 1.6 and beyond, we're
> flagging a tree conflict on A, (local edit, incoming delete upon merge)
> but we should be handling this automatically. And IIRC detecting this
> case doesn't even work for directories right now.

whoa, stsp, you sometimes have a way of writing a really complicated example
in one huge paragraph rant. I admittedly do not understand every detail in
this one... but I do have a question:


A /trunk/A
    |
    |\_________svn copy ^/trunk ^/branch___________
    | \
M /trunk/A (1) |
    | svn copy A B
    | A+ /branch/B
    | |
    | M /branch/B (2)
    | |
    | <--- merge ---{ |
A+ /trunk/B (3) |


So you argue that (3) should contain *both* the edits (1) and (2), instead
of just (2)? It seems to me that it is impossible to know whether a user
wants that or not!

I can imagine both cases:
 - "What!? Why did it not merge the edits on /trunk/A (1)?"
 - "What!? It included all the edits on /trunk/A (1) as well?"

I would probably have been asking the latter, *not* expecting trunk/B to
contain (1). So operation based merges would throw me off at first.

If I got it right and you're sure about it, it seems to me like this
situation needs some kind of conflict resolu
 * neels stops short.

I'll rather keep quiet before we add another column for "merge conflicts"
;)

~Neels
Attachments

Re: svn commit: r38394 - in trunk/subversion: include libsvn_delta

Author stsp
Full name Stefan Sperling
Date 2009-07-21 04:44:44 PDT
Message
Attachments

Re: svn commit: r38394 - in trunk/subversion: include libsvn_delta

Author neels
Full name Neels Janosch Hofmeyr
Date 2009-07-20 19:31:53 PDT
Message Julian Foad wrote:
>> The driver is required to use add() with REPLACES_REV instead of a separate
>> delete() call before this add(). Receivers of this callback in general are
>> not required to support a separate delete() call for a 'replace'.
>
> I think it would be clearer to state the editor requirements separately
> from the usage recommendations. (This last paragraph and the following
> one seem to mix the two.)
>
>> If you intend to implement a separate delete() call when driving this
>> editor, make sure you really have to and put up "big red signs" in the
>> documentation of your driver. Be aware that, in general, the driver should
>> do all of the (possibly hard) work of combining a delete() with an add() to
>> form an atomic replace, taking that burden off the various receivers. So if
>> you write a driver that doesn't do that, you have to make sure that all your
>> receivers (preferably very few) can handle a non-atomic replace.
>> Furthermore, a delete() call should occur always before, never after an
>> add() call for the same node.
>> ]]]

Yes, you're right.
Where is the line between editor requirements and usage recommendations?

Maybe you can help to choose a side... Or are these sides chosen by
individual implementations anyway?

I am more for forcing an atomic replace. I would rather remove the loophole
that says "well, but if you must, just issue separate calls". I'd personally
not mind if some parts of the code send separate calls, but I want e.g. the
merge callback receivers to be certain that they will *not* receive separate
calls *ever*.

How about we choose the stricter comment, but we let people use it
differently if they must without saying so now?

  "The receiver is not required to handle a separate delete() and add() call
   on the same path.
   The driver must always combine any delete() that is followed by an add()
   to a single add()-with-REPLACES_REV."


Some Qs:

- is it ok to disallow multiple add() and delete() calls on the same path,
  in principle?
   e.g. during merge, there are only the two sides, merge-left
   and merge-right. A replace means that both exist but are "unrelated". So
   a merge will never need to send more than one (delete, add) tuple.
   Usually it's either *one* delete or *one* add. The (add, delete) tuple
   simply cancels out, nothing being sent.
  But are there cases where we also send all the steps in-between? Then we
  could allow this: add(), add(REPLACES_REV), add(REPLACES_REV), delete().
  But *are there*?

- is it necessary to disallow multiple add() and delete() calls on the same
  path, to make the atomic replace useful?
   e.g. during merge, the receiver needs to cache all delete()s and match
   with add()s, carrying out the leftovers during dir_close(). And merge
   is not the only one in need of this quirky code construct.
  I want to eradicate this madness -- I'd love to force atomic replaces
  on everyone, so the receivers are entirely relieved of this quirkiness.
  Will allowing both separate and atomic replace calls defy my plan by
  actually adding more code to the receivers?

- is there a case where separate calls have a different meaning than an
  atomic replace?
   e.g. for merge, it doesn't matter whether the nodes are "related".
  I can't think of any case where we'd want to highlight that the node was
  deleted and later re-added with the same history, or something. Right?

~Neels
Attachments

Re: svn commit: r38394 - in trunk/subversion: include libsvn_delta

Author julianfoad
Full name Julian Foad
Date 2009-07-19 10:53:47 PDT
Message Neels J Hofmeyr wrote:
> Sorry Julian, I'm late with this reply...

No problem; it's good to hear your reply.

Julian Foad wrote:
> > Neels J Hofmeyr wrote:
> >> Julian Foad wrote:
> >>> On Thu, 2009-07-09 at 15:53 -0700, Neels Janosch Hofmeyr wrote:
> >>>> Enable an atomic replace in editor v2.
> >>>>
> >>>> * subversion/include/svn_editor.h
> >>>> (svn_editor_add_directory_t, svn_editor_add_directory,
> >>>> svn_editor_add_file_t, svn_editor_add_file,
> >>>> svn_editor_add_symlink_t, svn_editor_add_symlink,
> >>>> svn_editor_add_absent_t, svn_editor_add_absent):
> >>>> Add parameter REPLACES_REV.
> >>> Where is this parameter documented?
> >
> > [...]
> >> It felt kind of silly to only document that parameter in the midst of void.
> >> And I am too lazy right now to document all of them.
> >
> > But it's not silly. I can't review the change without knowing what it
> > means. I'm sure it seems totally obvious to you, but I have questions
> > about it.
>
> Yes I fully understand that. We could, too, ask gstein to add proper
> documentation for the rest of the parameters for the same reasons. I had the
> same kind of "hm" moment when I saw the docs missing pretty much completely.

We should document all of the parameters, yes. The reason I picked up on
this one in particular is that the others are (I think) pretty much
copied from the old equivalent API, so their semantics are (I hope) well
established and documented and we can transfer the documentation
relatively easily; but the new parameter doesn't have this route
available.

> My intention is to follow up on this once the time has come. I haven't added
> the parameter just so, I do have a specific application of it in mind (in
> the diff callbacks), so probably I won't forget about it.
>
> >
> >> However, I added a rough description in notes/editor-v2.txt in r38396.
> >
> > [[[
> > add_directory(path, children, props, replaces_rev)
> > -- name all children. receiver can put on "pending" list
> > -- MUST be followed (at some point) by add_*() calls for each child
> > -- if replaces_rev isn't SVN_INVALID_REVNUM, this is an atomic
> > replace.
> > Use a revnum because it implies all other info like node kind etc.
> > ]]]
> >
> > That doesn't actually say what the parameter means. Is it something like
> > this:
>
> Yeah, in fact, I wrote that stuff because it answers a question stsp had.
>
> > [[[
> > If replaces_rev is not SVN_INVALID_REVNUM:
> > (a) This 'add' atomically replaces the node (of kind 'file' or
> > 'directory') that would have existed (if this call and any associated
> > 'delete' call were ignored) at 'path' in the target revision of this
> > editor drive.
>
> There are different callbacks for file, dir, symlink and absent ... it
> doesn't make sense to speak of "'file' or 'directory'", right?

OK. What I really meant when I wrote "'file' or 'directory'" was "a node
that exists".

> > (b) The replaced node-rev must have unbroken history back to revision
> > 'replaces_rev', with no modifications of any kind.
>
> I think it doesn't *have to* must. It depends on the application of the
> editor: A diff may handle it one way, a merge another. Also, passing the
> revision is not because the revision was that important for a replace. The
> really important info is whether the replaced node is a directory, file, or
> whatever. The revision number just is the minimal bit of information that
> enables the receiver to find out *anything* it wants to know about the
> replaced node.
>
> > (c) A separate 'delete' of the node that previously existed should not
> > be issued by the driver, but must be accepted by the receiver (before or
> > after this call) for backward compatibility.
>
> I think it can't be after. The current code works around the non-atomic
> replace by caching the deletes and looking up each add. It wouldn't work if
> the order was reversed, AFAIR.
>
> > ]]]
> >
> > ?
>
> My version:
> [[[
> During a simple 'add', REPLACES_REV is SVN_INVALID_REVNUM.

Great. Instead of "a simple 'add'" we could spell out "adding a node
where there was none before this edit, or where a 'delete' has been
issued earlier in this edit".

> If REPLACES_REV is *not* SVN_INVALID_REVNUM, this is an atomic replace: The
> driver of the callback intends this add to atomically replace the node of
> the same name at revision REPLACES_REV. The receiver of the callback may
> trigger a tree-conflict if the receiving side has a different node kind,
> revision, etc., for a node with the same name (e.g. during merge), but that
> depends on what makes sense in the specific application.
>
> The REPLACES_REV argument is an svn_revnum_t, because by the node's path and
> its revision, the receiving side can find out *anything* it wants to know
> about the replaced node; first and foremost, its node kind.
>
> The driver is required to use add() with REPLACES_REV instead of a separate
> delete() call before this add(). Receivers of this callback in general are
> not required to support a separate delete() call for a 'replace'.

I think it would be clearer to state the editor requirements separately
from the usage recommendations. (This last paragraph and the following
one seem to mix the two.)

> If you intend to implement a separate delete() call when driving this
> editor, make sure you really have to and put up "big red signs" in the
> documentation of your driver. Be aware that, in general, the driver should
> do all of the (possibly hard) work of combining a delete() with an add() to
> form an atomic replace, taking that burden off the various receivers. So if
> you write a driver that doesn't do that, you have to make sure that all your
> receivers (preferably very few) can handle a non-atomic replace.
> Furthermore, a delete() call should occur always before, never after an
> add() call for the same node.
> ]]]
>
> What do you tink about this text?

Much better!

Thanks.
- Julian

Re: svn commit: r38394 - in trunk/subversion: include libsvn_delta

Author neels
Full name Neels Janosch Hofmeyr
Date 2009-07-18 14:20:07 PDT
Message Sorry Julian, I'm late with this reply...

Julian Foad wrote:
> Neels J Hofmeyr wrote:
>> Julian Foad wrote:
>>> On Thu, 2009-07-09 at 15:53 -0700, Neels Janosch Hofmeyr wrote:
>>>> Enable an atomic replace in editor v2.
>>>>
>>>> * subversion/include/svn_editor.h
>>>> (svn_editor_add_directory_t, svn_editor_add_directory,
>>>> svn_editor_add_file_t, svn_editor_add_file,
>>>> svn_editor_add_symlink_t, svn_editor_add_symlink,
>>>> svn_editor_add_absent_t, svn_editor_add_absent):
>>>> Add parameter REPLACES_REV.
>>> Where is this parameter documented?
>
> [...]
>> It felt kind of silly to only document that parameter in the midst of void.
>> And I am too lazy right now to document all of them.
>
> But it's not silly. I can't review the change without knowing what it
> means. I'm sure it seems totally obvious to you, but I have questions
> about it.

Yes I fully understand that. We could, too, ask gstein to add proper
documentation for the rest of the parameters for the same reasons. I had the
same kind of "hm" moment when I saw the docs missing pretty much completely.

My intention is to follow up on this once the time has come. I haven't added
the parameter just so, I do have a specific application of it in mind (in
the diff callbacks), so probably I won't forget about it.

>
>> However, I added a rough description in notes/editor-v2.txt in r38396.
>
> [[[
> add_directory(path, children, props, replaces_rev)
> -- name all children. receiver can put on "pending" list
> -- MUST be followed (at some point) by add_*() calls for each child
> -- if replaces_rev isn't SVN_INVALID_REVNUM, this is an atomic
> replace.
> Use a revnum because it implies all other info like node kind etc.
> ]]]
>
> That doesn't actually say what the parameter means. Is it something like
> this:

Yeah, in fact, I wrote that stuff because it answers a question stsp had.

> [[[
> If replaces_rev is not SVN_INVALID_REVNUM:
> (a) This 'add' atomically replaces the node (of kind 'file' or
> 'directory') that would have existed (if this call and any associated
> 'delete' call were ignored) at 'path' in the target revision of this
> editor drive.

There are different callbacks for file, dir, symlink and absent ... it
doesn't make sense to speak of "'file' or 'directory'", right?

> (b) The replaced node-rev must have unbroken history back to revision
> 'replaces_rev', with no modifications of any kind.

I think it doesn't *have to* must. It depends on the application of the
editor: A diff may handle it one way, a merge another. Also, passing the
revision is not because the revision was that important for a replace. The
really important info is whether the replaced node is a directory, file, or
whatever. The revision number just is the minimal bit of information that
enables the receiver to find out *anything* it wants to know about the
replaced node.

> (c) A separate 'delete' of the node that previously existed should not
> be issued by the driver, but must be accepted by the receiver (before or
> after this call) for backward compatibility.

I think it can't be after. The current code works around the non-atomic
replace by caching the deletes and looking up each add. It wouldn't work if
the order was reversed, AFAIR.

> ]]]
>
> ?

My version:
[[[
During a simple 'add', REPLACES_REV is SVN_INVALID_REVNUM.

If REPLACES_REV is *not* SVN_INVALID_REVNUM, this is an atomic replace: The
driver of the callback intends this add to atomically replace the node of
the same name at revision REPLACES_REV. The receiver of the callback may
trigger a tree-conflict if the receiving side has a different node kind,
revision, etc., for a node with the same name (e.g. during merge), but that
depends on what makes sense in the specific application.

The REPLACES_REV argument is an svn_revnum_t, because by the node's path and
its revision, the receiving side can find out *anything* it wants to know
about the replaced node; first and foremost, its node kind.

The driver is required to use add() with REPLACES_REV instead of a separate
delete() call before this add(). Receivers of this callback in general are
not required to support a separate delete() call for a 'replace'.

If you intend to implement a separate delete() call when driving this
editor, make sure you really have to and put up "big red signs" in the
documentation of your driver. Be aware that, in general, the driver should
do all of the (possibly hard) work of combining a delete() with an add() to
form an atomic replace, taking that burden off the various receivers. So if
you write a driver that doesn't do that, you have to make sure that all your
receivers (preferably very few) can handle a non-atomic replace.
Furthermore, a delete() call should occur always before, never after an
add() call for the same node.
]]]

What do you tink about this text?
~Neels
Attachments

Re: svn commit: r38394 - in trunk/subversion: include libsvn_delta

Author julianfoad
Full name Julian Foad
Date 2009-07-11 17:04:55 PDT
Message Neels J Hofmeyr wrote:
> Julian Foad wrote:
> > On Thu, 2009-07-09 at 15:53 -0700, Neels Janosch Hofmeyr wrote:
> >> Enable an atomic replace in editor v2.
> >>
> >> * subversion/include/svn_editor.h
> >> (svn_editor_add_directory_t, svn_editor_add_directory,
> >> svn_editor_add_file_t, svn_editor_add_file,
> >> svn_editor_add_symlink_t, svn_editor_add_symlink,
> >> svn_editor_add_absent_t, svn_editor_add_absent):
> >> Add parameter REPLACES_REV.
> >
> > Where is this parameter documented?

[...]
> It felt kind of silly to only document that parameter in the midst of void.
> And I am too lazy right now to document all of them.

But it's not silly. I can't review the change without knowing what it
means. I'm sure it seems totally obvious to you, but I have questions
about it.

> However, I added a rough description in notes/editor-v2.txt in r38396.

[[[
add_directory(path, children, props, replaces_rev)
  -- name all children. receiver can put on "pending" list
  -- MUST be followed (at some point) by add_*() calls for each child
  -- if replaces_rev isn't SVN_INVALID_REVNUM, this is an atomic
replace.
     Use a revnum because it implies all other info like node kind etc.
]]]

That doesn't actually say what the parameter means. Is it something like
this:

[[[
If replaces_rev is not SVN_INVALID_REVNUM:
(a) This 'add' atomically replaces the node (of kind 'file' or
'directory') that would have existed (if this call and any associated
'delete' call were ignored) at 'path' in the target revision of this
editor drive.
(b) The replaced node-rev must have unbroken history back to revision
'replaces_rev', with no modifications of any kind.
(c) A separate 'delete' of the node that previously existed should not
be issued by the driver, but must be accepted by the receiver (before or
after this call) for backward compatibility.
]]]

?

- Julian

Re: svn commit: r38394 - in trunk/subversion: include libsvn_delta

Author neels
Full name Neels Janosch Hofmeyr
Date 2009-07-11 15:34:49 PDT
Message Julian Foad wrote:
> On Thu, 2009-07-09 at 15:53 -0700, Neels Janosch Hofmeyr wrote:
>> Author: neels
>> Date: Thu Jul 9 15:53:33 2009
>> New Revision: 38394
>>
>> Log:
>> Enable an atomic replace in editor v2.
>>
>> * subversion/include/svn_editor.h
>> (svn_editor_add_directory_t, svn_editor_add_directory,
>> svn_editor_add_file_t, svn_editor_add_file,
>> svn_editor_add_symlink_t, svn_editor_add_symlink,
>> svn_editor_add_absent_t, svn_editor_add_absent):
>> Add parameter REPLACES_REV.
>
> Where is this parameter documented?

That's a very reasonable question.

So far, svn_editor.h's *only* comment is:

[[[
/** An abstract object that edits a target tree.
 *
 * ### more docco
 */
]]]

It felt kind of silly to only document that parameter in the midst of void.
And I am too lazy right now to document all of them.
However, I added a rough description in notes/editor-v2.txt in r38396.

~Neels

>
> - Julian
>
>> * subversion/libsvn_de​lta/editor.c
>> (svn_editor_add_directory,
>> svn_editor_add_file,
>> svn_editor_add_symlink,
>> svn_editor_add_absent):
>> Add parameter REPLACES_REV and pass to callbacks.
>
> --------------------​--------------------​--------------
> http://subversion.ti​gris.org/ds/viewMess​age.do?dsForumId=462​&dsMessageId=236​9635
Attachments

Re: svn commit: r38394 - in trunk/subversion: include libsvn_delta

Author julianfoad
Full name Julian Foad
Date 2009-07-10 03:35:24 PDT
Message On Thu, 2009-07-09 at 15:53 -0700, Neels Janosch Hofmeyr wrote:
> Author: neels
> Date: Thu Jul 9 15:53:33 2009
> New Revision: 38394
>
> Log:
> Enable an atomic replace in editor v2.
>
> * subversion/include/svn_editor.h
> (svn_editor_add_directory_t, svn_editor_add_directory,
> svn_editor_add_file_t, svn_editor_add_file,
> svn_editor_add_symlink_t, svn_editor_add_symlink,
> svn_editor_add_absent_t, svn_editor_add_absent):
> Add parameter REPLACES_REV.

Where is this parameter documented?

- Julian

> * subversion/libsvn_de​lta/editor.c
> (svn_editor_add_directory,
> svn_editor_add_file,
> svn_editor_add_symlink,
> svn_editor_add_absent):
> Add parameter REPLACES_REV and pass to callbacks.
Messages per page: