subversion
Discussion topic

Re: design document

Author hwright
Full name Hyrum K. Wright
Date 2009-05-30 07:00:38 PDT
Message On May 30, 2009, at 5:45 AM, HuiHuang wrote:

>
> >Could this be as simple as adding a pointer to a working copy access baton
> >to the committable structure, plus a master array of top-level working copy
> >access batons (for post-commit releasing)?
>
> This is what I think, too. But Hyrum said that in the future we will just
> use a single svn_wc_context_t instead of carrying around a collection of
> access batons. So I had better get to know svn_wc_context_t first and then
> think about how to use it in my work.

I'd stay with the status quo for right now, with the expectation that
things will change over the course of the next few months. If you
introduce *new* wc APIs, which I don't think will be needed, then they
should use svn_wc_context_t.

>
> To Hyrum,
>
> Thank you very much. I have a question. I looked up the definitions of
> svn_wc_adm_access_t and svn_wc_context_t; obviously there is more
> information in svn_wc_adm_access_t than in svn_wc_context_t.
> So how can svn_wc_context_t replace svn_wc_adm_access_t?

svn_wc_context_t is an opaque structure which contains a pointer to an
svn_wc__db_t. It's this database handle which contains the real
information used to interface with the working copy metadata. The
database will also handle locking, concurrency, and opening multiple
directories; instead of keeping track of a collection of access batons,
we need only one database handle. This is easier, and it will also help
as we move toward a one-db-per-working-copy paradigm.
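To illustrate the difference, here is a minimal sketch in plain C. The types `toy_db_t` and `toy_context_t` and the functions `toy_db_lock` / `toy_db_is_locked` are hypothetical stand-ins, not the real `svn_wc__db_t` API. The only point shown is that when the lock bookkeeping lives inside one database handle, a single context can cover several directories instead of one access baton per directory.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_LOCKS 16

/* Hypothetical stand-in for svn_wc__db_t: the database handle itself
   records which directories are write-locked. */
typedef struct toy_db_t {
  const char *locked[TOY_MAX_LOCKS];
  int nlocked;
} toy_db_t;

/* Hypothetical stand-in for svn_wc_context_t: just a pointer to the db. */
typedef struct toy_context_t {
  toy_db_t *db;
} toy_context_t;

/* Returns 1 if DIR is already locked through this db handle, else 0. */
static int toy_db_is_locked(const toy_db_t *db, const char *dir)
{
  int i;
  for (i = 0; i < db->nlocked; i++)
    if (strcmp(db->locked[i], dir) == 0)
      return 1;
  return 0;
}

/* Locks DIR; returns 0 on success, -1 if the lock table is full.
   Locking an already-locked directory is a no-op. */
static int toy_db_lock(toy_db_t *db, const char *dir)
{
  if (toy_db_is_locked(db, dir))
    return 0;
  if (db->nlocked == TOY_MAX_LOCKS)
    return -1;
  db->locked[db->nlocked++] = dir;
  return 0;
}
```

A caller holding one `toy_context_t` can lock `/wc1/trunk` and `/wc2/trunk` through `ctx.db` without creating a second handle; in the real library the database would also deal with concurrency, which this toy ignores.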

> By the way, would you mind telling me where I can find the public API
> changes made to libsvn_wc? Thanks~

'svn log subversion/include/svn_wc.h' :)

By the way, I'm happy to answer questions, as are a number of other
people, but sometimes you can get the information you are looking for
using a judicious combination of 'svn log' and 'svn praise'.
'praise' (or 'blame', if you prefer) gives line-by-line change
history, and our log messages usually contain a good rationale for the
accompanying change. Using these tools, you can mine the repository,
which is often much better than mining developers' heads.

Re: design document

Author yellowflying
Full name HuiHuang
Date 2009-05-30 03:44:58 PDT
Message >Could this be as simple as adding a pointer to a working copy access baton
>to the committable structure, plus a master array of top-level working copy
>access batons (for post-commit releasing)?

This is what I think, too. But Hyrum said that in the future we will just
use a single svn_wc_context_t instead of carrying around a collection of
access batons. So I had better get to know svn_wc_context_t first and then
think about how to use it in my work.

To Hyrum,

Thank you very much. I have a question. I found the definitions of
svn_wc_adm_access_t and svn_wc_context_t, shown below; obviously there
is more information in svn_wc_adm_access_t than in svn_wc_context_t.
So how can svn_wc_context_t replace svn_wc_adm_access_t?

By the way, would you mind telling me where I can find the public API
changes made to libsvn_wc? Thanks~

//In libsvn_wc/lock.c
struct svn_wc_adm_access_t
{
  /* PATH to directory which contains the administrative area */
  const char *path;
  /* And the absolute form of the path. */
  const char *abspath;
  enum svn_wc__adm_access_type {
    /* SVN_WC__ADM_ACCESS_UNLOCKED indicates no lock is held allowing
       read-only access */
    svn_wc__adm_access_unlocked,
    /* SVN_WC__ADM_ACCESS_WRITE_LOCK indicates that a write lock is held
       allowing read-write access */
    svn_wc__adm_access_write_lock,
    /* SVN_WC__ADM_ACCESS_CLOSED indicates that the baton has been
       closed. */
    svn_wc__adm_access_closed
  } type;
  /* LOCK_EXISTS is set TRUE when the write lock exists */
  svn_boolean_t lock_exists;
  /* Handle to the administrative database. */
  svn_wc__db_t *db;
  /* Was the DB provided to us? If so, then we'll never close it. */
  svn_boolean_t db_provided;
  /* ENTRIES_HIDDEN is all cached entries including those in
     state deleted or state absent. It may be NULL. */
  apr_hash_t *entries_all;
  /* POOL is used to allocate cached items, they need to persist for the
     lifetime of this access baton */
  apr_pool_t *pool;
};


//In libsvn_wc/wc.h
/*** Context handling ***/
struct svn_wc_context_t
{
  /* The wc_db handle for this working copy. */
  svn_wc__db_t *db;
  /* The state pool for this context. */
  apr_pool_t *state_pool;
};

Thanks all!

Huihuang


2009-05-30



yellow.flying

Re: design document

Author stsp
Full name Stefan Sperling
Date 2009-05-29 07:40:33 PDT
Message

Re: design document

Author hwright
Full name Hyrum K. Wright
Date 2009-05-29 06:40:56 PDT
Message On May 29, 2009, at 8:31 AM, C. Michael Pilato wrote:

> yellow.flying wrote:
>>> I wasn't anticipating the change you seem to be proposing, where the
>>> committables are grouped by working copy.
>>> My redesign of the commit process long ago assumes no need to group
>>> committables by working copy, only by repository. Commits are driven
>>> based on the committable's URL today -- *not* based on its working
>>> copy path.
>>
>> I see that the committing files are still grouped by working copy in
>> the native implementation of commit now, but it can be extended to
>> group by repository. So if the "committables" you designed can deal
>> with the latter situation, I can reuse it.
>>
>> You say "no need to group committables by working copy, only by
>> repository". Do you mean that "base_dir_access" in the following
>> function is not necessarily a working copy access baton, but an access
>> baton to the base directory of several working copies from the same
>> repository?
>
> I had forgotten about the access baton situation. (Actually, I think my
> code was written before we had access batons ... it was the addition of
> the access baton paradigm that made this all stop working, if I recall
> correctly.)
>
> Could this be as simple as adding a pointer to a working copy access
> baton to the committable structure, plus a master array of top-level
> working copy access batons (for post-commit releasing)?

<sidebar>
Access batons are slowly disappearing from internal use in the working
copy library, and will hopefully go extinct in the client library as
well. In the future, we will just use a single svn_wc_context_t,
instead of carrying around a collection of access batons. I don't
know how this will impact the problem at hand, but it's good to keep
in mind.

Because this work is pretty tightly coupled to the working copy, I'd
encourage whoever is coding and reviewing it to closely follow any
public API changes made to libsvn_wc.
</sidebar>

-Hyrum

Re: design document

Author cmpilato
Full name C. Michael Pilato
Date 2009-05-29 06:32:00 PDT
Message yellow.flying wrote:
>> I wasn't anticipating the change you seem to be proposing, where the
>> committables are grouped by working copy.
>> My redesign of the commit process long ago assumes no need to group
>> committables by working copy, only by repository. Commits are driven based
>> on the committable's URL today -- *not* based on its working copy path.
>
> I see that the committing files are still grouped by working copy in the
> native implementation of commit now, but it can be extended to group by
> repository. So if the "committables" you designed can deal with the latter
> situation, I can reuse it.
>
> You say "no need to group committables by working copy, only by repository".
> Do you mean that "base_dir_access" in the following function is not
> necessarily a working copy access baton, but an access baton to the base
> directory of several working copies from the same repository?

I had forgotten about the access baton situation. (Actually, I think my
code was written before we had access batons ... it was the addition of the
access baton paradigm that made this all stop working, if I recall correctly.)

Could this be as simple as adding a pointer to a working copy access baton
to the committable structure, plus a master array of top-level working copy
access batons (for post-commit releasing)?
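A minimal sketch of that proposal, using hypothetical types: `toy_baton_t` stands in for `svn_wc_adm_access_t` and `toy_committable_t` for the committable item structure (the real commit item fields are not reproduced here). Each committable carries a borrowed pointer to its working copy's access baton, and a separate master array owns the top-level batons so they can be released exactly once after the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for svn_wc_adm_access_t. */
typedef struct toy_baton_t {
  const char *wc_root;
  int open;
} toy_baton_t;

/* Hypothetical committable: the URL drives the commit, while ADM_ACCESS
   (borrowed, not owned) records which working copy to update afterwards. */
typedef struct toy_committable_t {
  const char *url;
  toy_baton_t *adm_access;
} toy_committable_t;

/* Post-commit cleanup: close every still-open top-level baton in the
   master array and return how many were closed.  Committables only
   borrow these pointers, so nothing is closed per item. */
static int release_top_level_batons(toy_baton_t **batons, size_t n)
{
  size_t i;
  int closed = 0;
  for (i = 0; i < n; i++)
    if (batons[i]->open)
      {
        batons[i]->open = 0;
        closed++;
      }
  return closed;
}
```

Keeping ownership in the master array means several committables from the same working copy can share one baton without any risk of closing it twice.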

--
C. Michael Pilato <cmpilato at collab dot net>
CollabNet <> www.collab.net <> Distributed Development On Demand

Re: design document

Author yellowflying
Full name HuiHuang
Date 2009-05-28 20:36:34 PDT
Message >I wasn't anticipating the change you seem to be proposing, where the
>committables are grouped by working copy.
>My redesign of the commit process long ago assumes no need to group
>committables by working copy, only by repository. Commits are driven based
>on the committable's URL today -- *not* based on its working copy path.

I see that the committing files are still grouped by working copy in the
native implementation of commit now, but it can be extended to group by
repository. So if the "committables" you designed can deal with the latter
situation, I can reuse it.

You say "no need to group committables by working copy, only by repository".
Do you mean that "base_dir_access" in the following function is not
necessarily a working copy access baton, but an access baton to the base
directory of several working copies from the same repository?

cmt_err = svn_client__harvest_committables(&committables,
                                           &lock_tokens,
                                           base_dir_access,
                                           rel_targets,
                                           depth,
                                           ! keep_locks,
                                           changelists,
                                           ctx,
                                           pool);

>That's what allows us to theoretically get atomicity in a commit that spans
>multiple working copies which point to the same repository.
>Again, the code may be so stale and so tweaked by now that the design I had
>in mind is now useless. And I'm certainly not tied to those old ideas. I
>just don't want to see unnecessary effort invested if we can avoid it.

Yes, I see there is plenty of code I can reuse.

Thanks


2009-05-29



yellow.flying

Re: design document

Author stsp
Full name Stefan Sperling
Date 2009-05-28 10:27:20 PDT
Message

Re: design document

Author cmpilato
Full name C. Michael Pilato
Date 2009-05-28 07:23:21 PDT
Message Stefan Sperling wrote:
> On Thu, May 28, 2009 at 08:58:23AM -0400, C. Michael Pilato wrote:
>> When I rewrote the commit driving code a long time ago, I anticipated the
>> need to handle commits to multiple repositories. The code may have gone a
>> bit stale, but the logic that harvests "committables" stores those items in
>> a hash that was designed to be primarily keyed on some unique repository
>> attribute (UUID, repos URL, or something). Of course, I think this was back
>> before we stored such things in our working copy, so I used a single static
>> key for that hash for the time being. But I still hold some hope that that
>> code can be revived and massaged into doing what is expected.
>
> So you're saying that there already are provisions in the existing
> code for the "commit to multiple repositories" problem which are
> orthogonal to what Hui Huang is doing?
>
> I mean, according to your description, the current data structure
> hierarchy looks somewhat like this:
>
> +-----------------+
> | repo hash table |  <-- keyed statically right now,
> +-----------------+      so it only has one entry
>          |
>          v
>   (committables)  <-- list of committables (or a hash table or whatever)
>
> When we start storing committables for multiple working copies,
> we will still have committables grouped per repository anyway.
> So we'll change the above to something like:
>
> +-----------------+
> | repo hash table |  <-- still keyed statically
> +-----------------+
>          |
>          v
>   (committables WC1, committables WC2, ..., committables WCn)
>
> Extending the commit mechanism to use multiple keys into the
> per-repository hash instead of a static one is something which
> can still be done later. It could even be done right now, before
> Hui Huang works on the code, because it's "one level above"
> what he is doing.
>
> Did I understand you correctly?

I wasn't anticipating the change you seem to be proposing, where the
committables are grouped by working copy.

My redesign of the commit process long ago assumes no need to group
committables by working copy, only by repository. Commits are driven based
on the committable's URL today -- *not* based on its working copy path.
That's what allows us to theoretically get atomicity in a commit that spans
multiple working copies which point to the same repository.
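To make the repository-keyed grouping concrete, here is a minimal sketch. The `item_t` fields, the use of the repository UUID as the key (the text also mentions the repository URL as a candidate), and the fixed-size bucket table are all assumptions for illustration; the real harvesting code uses an APR hash. Items from two different working copies that share a repository end up in the same bucket, i.e. the same transaction.

```c
#include <assert.h>
#include <string.h>

#define MAX_REPOS 8

typedef struct item_t {
  const char *wc_root;     /* informational only; NOT used for grouping */
  const char *repos_uuid;  /* the grouping key */
} item_t;

typedef struct bucket_t {
  const char *repos_uuid;
  int count;
} bucket_t;

/* Groups N items into BUCKETS by repository UUID only and returns the
   number of distinct repositories (one transaction each), or -1 if
   there are more than MAX_REPOS of them. */
static int group_by_repository(const item_t *items, int n, bucket_t *buckets)
{
  int nbuckets = 0, i, j;
  for (i = 0; i < n; i++)
    {
      for (j = 0; j < nbuckets; j++)
        if (strcmp(buckets[j].repos_uuid, items[i].repos_uuid) == 0)
          break;
      if (j == nbuckets)          /* first item from this repository */
        {
          if (nbuckets == MAX_REPOS)
            return -1;
          buckets[nbuckets].repos_uuid = items[i].repos_uuid;
          buckets[nbuckets].count = 0;
          nbuckets++;
        }
      buckets[j].count++;
    }
  return nbuckets;
}
```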

Again, the code may be so stale and so tweaked by now that the design I had
in mind is now useless. And I'm certainly not tied to those old ideas. I
just don't want to see unnecessary effort invested if we can avoid it.

--
C. Michael Pilato <cmpilato at collab dot net>
CollabNet <> www.collab.net <> Distributed Development On Demand

Re: design document

Author stsp
Full name Stefan Sperling
Date 2009-05-28 06:25:37 PDT
Message

Re: design document

Author cmpilato
Full name C. Michael Pilato
Date 2009-05-28 05:58:27 PDT
Message Stefan Sperling wrote:
> On Thu, May 28, 2009 at 09:46:14AM +0200, Branko Cibej wrote:
>> Stefan Sperling wrote:
>>> On Wed, May 27, 2009 at 03:21:40PM +0200, Branko Cibej wrote:
>>>> But why restrict to a single repository? I agree that one transaction per
>>>> repository makes sense; however, I see no reason to not launch several
>>>> commit transactions within one svn_client_commit. By the way, this would
>>>> be an elegant solution for
>>>> http://subversion.tigris.org/issues/show_bug.cgi?id=1167
>>>>
>>> Let's just go one step at a time, and focus on the "multiple working
>>> copies" issue first. Once that is solved, we can easily extend it
>>> to "multiple repositories".
>>>
>> I agree on the one-step-at-a-time approach, but I'm not sure about the
>> "easily" unless the current design at least takes the next step into
>> account.
>
> Then please invest the time to suggest how the design can be made to
> take the next step into account without major effort.
>
> The focus right now is #2381, and not #1167. Making #1167 an extra
> requirement creates extra work for Hui Huang. Having to solve an extra
> problem that isn't on the charter of his gsoc project is not what gsoc
> is about.
>
> I am sure that it is possible to solve both issues eventually,
> no matter what we do now. We can still bend the existing design later
> if we find that it is insufficient to also satisfy #1167. But if you
> have a clever idea that would already help #1167 a little, then please
> tell it to us so we can discuss whether it's worth doing it.
> But it must not create another huge pile of work, because there already
> is a huge pile of work in front of Hui Huang.

When I rewrote the commit driving code a long time ago, I anticipated the
need to handle commits to multiple repositories. The code may have gone a
bit stale, but the logic that harvests "committables" stores those items in
a hash that was designed to be primarily keyed on some unique repository
attribute (UUID, repos URL, or something). Of course, I think this was back
before we stored such things in our working copy, so I used a single static
key for that hash for the time being. But I still hold some hope that that
code can be revived and massaged into doing what is expected.

--
C. Michael Pilato <cmpilato at collab dot net>
CollabNet <> www.collab.net <> Distributed Development On Demand

Re: design document

Author stsp
Full name Stefan Sperling
Date 2009-05-28 04:50:58 PDT
Message

Re: design document

Author brane
Full name Branko Cibej
Date 2009-05-28 00:46:18 PDT
Message Stefan Sperling wrote:
> On Wed, May 27, 2009 at 03:21:40PM +0200, Branko Cibej wrote:
>
>> HuiHuang wrote:
>>
>>> 3b) Suggested change for Subversion
>>>
>>> I think that “One Commit, One Transaction” is the best (and this is
>>> also compatible with the original system). So committing files from
>>> different repositories at once and breaking them into several
>>> transactions may not be a good idea. And I suggest that we should
>>> restrict commits to files in the same repository.
>>>
>> But why restrict to a single repository? I agree that one transaction per
>> repository makes sense; however, I see no reason to not launch several
>> commit transactions within one svn_client_commit. By the way, this would
>> be an elegant solution for
>> http://subversion.tigris.org/issues/show_bug.cgi?id=1167
>>
>
> Let's just go one step at a time, and focus on the "multiple working
> copies" issue first. Once that is solved, we can easily extend it
> to "multiple repositories".
>

I agree on the one-step-at-a-time approach, but I'm not sure about the
"easily" unless the current design at least takes the next step into
account.

-- Brane

Re: design document

Author stsp
Full name Stefan Sperling
Date 2009-05-27 07:32:39 PDT
Message

Re: design document

Author stsp
Full name Stefan Sperling
Date 2009-05-27 07:10:50 PDT
Message

Re: design document

Author brane
Full name Branko Cibej
Date 2009-05-27 06:21:44 PDT
Message HuiHuang wrote:
>
> 3b) Suggested change for Subversion
>
> I think that “One Commit, One Transaction” is the best (and this is
> also compatible with the original system). So committing files from
> different repositories at once and breaking them into several
> transactions may not be a good idea. And I suggest that we should
> restrict commits to files in the same repository.
>

But why restrict to a single repository? I agree that one transaction per
repository makes sense; however, I see no reason to not launch several
commit transactions within one svn_client_commit. By the way, this would
be an elegant solution for
http://subversion.tigris.org/issues/show_bug.cgi?id=1167

-- Brane

design document

Author yellowflying
Full name HuiHuang
Date 2009-05-26 17:32:59 PDT
Message Hi,

The following is the design document for "commit from multiple working copies".
If there are any problems, please tell me and I will revise it.

Thanks all~

Name: Commit from Multiple Working Copies.
Author: HuiHuang.
Date: 2009-5-26.
Version: 1.0.
 
1) Expected behavior
When committing files by listing their paths, if they all live in the same repository, they should be
committed successfully in one transaction, whether or not they belong to the same working copy.
 
2) Actual behavior
2a) If the files being committed belong to the same working copy, they are committed successfully in
one transaction.
2b) Otherwise, if they belong to more than one working copy, svn outputs an error which indicates
that their common ancestor is not a working copy, and the commit fails.
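As the error in 2b suggests, the client first reduces all targets to their common ancestor directory and expects that directory to belong to a working copy. The sketch below shows only that reduction for two absolute, '/'-separated paths; `common_ancestor` is a hypothetical helper, not Subversion's own path API. For two sibling working copies the result is their parent directory, which is not itself a working copy.

```c
#include <assert.h>
#include <string.h>

/* Writes into OUT (of size OUTSZ, which must be at least 1) the longest
   common ancestor of the absolute '/'-separated paths A and B, comparing
   whole path components, so "/a/wc1" and "/a/wc10" share only "/a". */
static void common_ancestor(const char *a, const char *b,
                            char *out, size_t outsz)
{
  size_t i = 0, last_sep = 0;
  while (a[i] && b[i] && a[i] == b[i])
    {
      if (a[i] == '/')
        last_sep = i;
      i++;
    }
  /* If one path ends exactly at a component boundary of the other,
     the whole shorter path is the ancestor. */
  if ((a[i] == '\0' && (b[i] == '/' || b[i] == '\0'))
      || (b[i] == '\0' && a[i] == '/'))
    last_sep = i;
  if (last_sep == 0)
    last_sep = 1;                 /* keep the root "/" */
  if (last_sep >= outsz)
    last_sep = outsz - 1;         /* truncate rather than overflow */
  memcpy(out, a, last_sep);
  out[last_sep] = '\0';
}
```

For example, targets `/home/u/wc1/foo.c` and `/home/u/wc2/bar.c` reduce to `/home/u`, which is where the "not a working copy" error comes from.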
 
3) Suggested change
This section has two parts. The first part introduces how this issue is solved in SVNKit; the second
part gives my suggestions on how we should solve it in Subversion.
 
3a) SVNKit's solution
 
1. SVNKit receives a list of paths to commit.

2. All paths are grouped by wc root path, so we get a map of wc_root:paths pairs, one for each
working copy.

3. For each pair in the map we create an SVNWCAccess object, wc_access, which is actually a
collection of directories opened for commit (the same as a set of svn_wc_adm_access_t in native SVN).

4. For each wc_access we collect items to commit, the same way as for a "normal" commit. Each item
refers to its wc_access. Then we group all items by their repos_url and repos_uuid (fetching them
from the repository if not available).

5. So we have commit items grouped by repository root URL, and each item may refer to its own
wc_access (working copy). We call such a group a "commit packet".

6. Now we have a list of commit packets, one for each repository. Then we commit each
commit packet as a transaction.

7. During commit and in post-commit code we use the wc_access reference stored in each commit
item to update the corresponding working copy, write and execute log files, and finally close
all open directories.
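Step 2 above can be sketched as follows, assuming the working-copy roots have already been discovered (how SVNKit finds them is not described here) and are given without a trailing slash. `group_by_wc_root` is a hypothetical helper: it assigns each path to the deepest root that contains it, so a path inside a nested working copy goes to the inner one, and paths outside every root get -1.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if ROOT is PATH itself or an ancestor directory of PATH,
   checking component boundaries ("/wc1" does not contain "/wc10"). */
static int is_under(const char *root, const char *path)
{
  size_t n = strlen(root);
  return strncmp(root, path, n) == 0 && (path[n] == '\0' || path[n] == '/');
}

/* For each of the N paths, stores in GROUP[i] the index of the deepest
   working-copy root in ROOTS that contains it, or -1 if none does.
   Paths sharing a GROUP value form one wc_root:paths pair. */
static void group_by_wc_root(const char *const *roots, int nroots,
                             const char *const *paths, int n, int *group)
{
  int i, r;
  for (i = 0; i < n; i++)
    {
      size_t best_len = 0;
      group[i] = -1;
      for (r = 0; r < nroots; r++)
        if (is_under(roots[r], paths[i]) && strlen(roots[r]) >= best_len)
          {
            best_len = strlen(roots[r]);
            group[i] = r;
          }
    }
}
```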
 
3b) Suggested change for Subversion
 
I think that “One Commit, One Transaction” is the best (and this is also compatible with the
original system). So committing files from different repositories at once and breaking them into
several transactions may not be a good idea. I suggest that we restrict commits to files in the
same repository.

1. We receive a list of paths to commit.

2. All paths are grouped by wc root path, so we get a map of wc_root:paths pairs, one for
each working copy.

3. For each pair in the map we create an svn_wc_adm_access_t, wc_access.

4. For each wc_access we collect items to commit. Each item refers to its wc_access.

5. If there is more than one working copy, we check all items by their repos_url and repos_uuid
(fetching them from the repository if not available). If they are not all from the same repository,
we return an error; otherwise, we combine them into one group.

6. Commit these items as one transaction.

7. During commit and in post-commit code we use the wc_access reference stored in each commit
item to update the corresponding working copy, write and execute log files, and finally close
all open directories.
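Step 5 of this proposal amounts to a single-repository check before the items are merged into one group. A minimal sketch follows; `commit_item_t`, `check_single_repository` and the `CHECK_*` codes are hypothetical names, and a real implementation would return an `svn_error_t`. Following the text, both the root URL and the UUID must match; note that this is conservative, since the same repository reached through a different URL would be rejected.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum { CHECK_OK = 0, CHECK_ERR_MULTIPLE_REPOS = 1 };

typedef struct commit_item_t {
  const char *repos_root_url;
  const char *repos_uuid;
} commit_item_t;

/* Returns CHECK_OK if all N items share the first item's repository root
   URL and UUID (trivially true for N <= 1), else CHECK_ERR_MULTIPLE_REPOS. */
static int check_single_repository(const commit_item_t *items, int n)
{
  int i;
  for (i = 1; i < n; i++)
    if (strcmp(items[i].repos_uuid, items[0].repos_uuid) != 0
        || strcmp(items[i].repos_root_url, items[0].repos_root_url) != 0)
      return CHECK_ERR_MULTIPLE_REPOS;
  return CHECK_OK;
}
```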

HuiHuang
2009-05-27



yellow.flying