When you use a folder in your daily work, you naturally find yourself moving things around. For example, shortly after I started blogging, I realized that my work documents folder had become cluttered with half-written article ideas, so I changed the names of these files to start with "do-" and moved them into my blog and solution subfolders, both of which had previously been only for finished articles.
We all make these sorts of changes, right? However, since work documents is the root of a Groove file sharing workspace, I had to keep in mind how the changes might affect performance. Here are a few things to be aware of when you start reorganizing a GFS folder.
Factors affecting workspace performance
Consider what happens when you rename a file. Workspace members who are logged into Groove need to process the new file name, but unless you have a huge number of files, this isn't a big deal. However, if a member is not logged in at the time of this operation, their Groove installation will only see the results of the change -- an apparent file deletion and a new file. In this case, the entire file is resent (or, if it exceeds automatic download limits, it must be fetched again).
If you rename a subfolder, the same conditions apply, but now if a member is offline when you perform the rename, they must re-fetch everything in that folder.
But what if you are not running Groove when you make these changes? After all, you can use a Groove-synchronized folder when Groove isn't active. The same problems apply. Since Groove has no information about operating system events that occurred when it was not an active process, each renaming operation becomes deleting a file followed by adding a file, and those changes must be sent to every workspace member.
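To see why a rename made while nothing is watching looks like a delete plus an add, here is a minimal Python sketch. It assumes a tool that can only compare a "before" and an "after" listing of the folder, keyed by path. The `diff_snapshots` function and the sample file names are illustrative assumptions, not Groove's actual implementation.

```python
def diff_snapshots(before, after):
    """Compare two {relative_path: content_digest} snapshots by path only.

    Hypothetical helper: with no record of the rename event itself,
    a renamed file shows up as one deleted path and one added path.
    """
    deleted = sorted(p for p in before if p not in after)
    added = sorted(p for p in after if p not in before)
    changed = sorted(p for p in before if p in after and before[p] != after[p])
    return deleted, added, changed


# The same content ("abc123") exists before and after, but under a new path.
before = {"ideas.txt": "abc123", "blog/finished.txt": "def456"}
after = {"blog/do-ideas.txt": "abc123", "blog/finished.txt": "def456"}

deleted, added, changed = diff_snapshots(before, after)
print(deleted)  # ['ideas.txt']
print(added)    # ['blog/do-ideas.txt']
print(changed)  # []
```

Nothing in the path-only comparison ties `ideas.txt` to `blog/do-ideas.txt`, which is why the change has to be treated as a removal and a brand-new file.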
In some rare cases, Groove can miss a renaming anyway. This usually happens if there is a lot of disk activity going on. If Groove doesn't get the file system events close enough together, it may not succeed in matching them and detecting that an existing file has been renamed.
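The idea of matching file system events that arrive close together can be sketched in Python as follows. This is only an illustration under stated assumptions: the `FsEvent` type, the `digest` field used to recognize the same file, and the one-second `window` are all hypothetical, and Groove's real matching logic is not documented here.

```python
from dataclasses import dataclass


@dataclass
class FsEvent:
    kind: str     # "deleted" or "created"
    path: str
    digest: str   # assumed stand-in for whatever identifies the file's content
    time: float   # seconds


def match_renames(events, window=1.0):
    """Pair each deletion with a later creation of the same content.

    A pair counts as a rename only if the creation arrives within
    `window` seconds of the deletion (hypothetical threshold).
    Returns (renames, leftover_events).
    """
    renames = []
    pending = []    # deletions still waiting for a matching creation
    unmatched = []  # creations with no matching deletion
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "deleted":
            pending.append(ev)
            continue
        match = next(
            (d for d in pending
             if d.digest == ev.digest and ev.time - d.time <= window),
            None,
        )
        if match is not None:
            pending.remove(match)
            renames.append((match.path, ev.path))
        else:
            unmatched.append(ev)
    return renames, pending + unmatched


# Events 0.2 s apart are matched; events 5 s apart are not.
quick = [FsEvent("deleted", "ideas.txt", "abc", 10.0),
         FsEvent("created", "do-ideas.txt", "abc", 10.2)]
slow = [FsEvent("deleted", "ideas.txt", "abc", 10.0),
        FsEvent("created", "do-ideas.txt", "abc", 15.0)]
print(match_renames(quick)[0])  # [('ideas.txt', 'do-ideas.txt')]
print(match_renames(slow)[0])   # []
```

In the second case the delayed event falls outside the window, so the rename degrades into a separate deletion and addition, which is the situation heavy disk activity can produce.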
Moving a file is similar to renaming it, as long as you move the file within a directory. However, if you have a file several folders deep and move it to the top level, or to a parallel folder with a high divergence point, Groove has to make more changes to the workspace record. If you move a large volume of data this way, Groove may not be able to send all the changes effectively. In this case, the entire workspace will be resent to all members.
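One rough way to judge how "far" a planned move travels is to count the folder levels between the source and destination, measured through the point where the two paths diverge. The Python sketch below does that. The `move_distance` function is a hypothetical planning aid, and the number it returns is only a proxy, not a measure of the work Groove actually performs.

```python
from pathlib import PurePosixPath


def move_distance(src, dst):
    """Count folder levels up from src's folder to the divergence point,
    plus levels down from there to dst's folder (hypothetical metric)."""
    src_parts = PurePosixPath(src).parent.parts
    dst_parts = PurePosixPath(dst).parent.parts
    common = 0
    for a, b in zip(src_parts, dst_parts):
        if a != b:
            break
        common += 1
    return (len(src_parts) - common) + (len(dst_parts) - common)


print(move_distance("blog/a.txt", "blog/do-a.txt"))    # 0 (same folder)
print(move_distance("a/b/c/file.txt", "file.txt"))     # 3 (up to the top level)
print(move_distance("a/b/c/file.txt", "a/x/y/file.txt"))  # 4 (parallel folder)
```

A distance of zero corresponds to a rename within one directory. Larger values correspond to the deep-to-top-level and parallel-folder moves described above.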
In my case, these changes all had to be sent to my other computer, but because I am the only member of the workspace, I didn't need to worry about offline members. I made sure both computers were active in Groove, and that I wasn't running anything disk intensive on either one, and everything went smoothly.
Recommendations
Plan:
Communicate:
Execute:
With these guidelines, you should be able to rearrange your GFS folders without synchronization problems.
Thanks
something all users should know
we who have been messing with groove for 8 years now are aware that the same behaviour applies to the files tool - i.e. renaming or changing folders will cause groove to resync.
something files tool users also need to be made aware of is how space is used. (Does not apply to Shared Folders, as that is at the OS level and deleted / moved files are handled as normal.)
In the files tool Groove never reclaims space when a file is deleted and duplicates space used even if a file or whole folder is rearranged. So we always advise users to plan their folder structure etc well before they start adding / deleting files all over - a space with just 100 MB of real files could easily show usage of 300-500 MB if there is major restructuring within a tool.
In the case of folder share it just affects the bandwidth used but freaks out users who "think" they have changed nothing and suddenly see MBs of deltas out/in.
Should be better info right up front when such spaces are created or files tool used, to alert new users to this behaviour.
rds
I believe you are misunderstanding your observations. Groove does recover file space. However, the mechanisms by which it does so are complex, because they are designed to help prevent data loss rather than to optimize disk usage. In some situations, you may never see that space recovered, because you always have more deletions and unsynchronized changes queued.
In a two-member space, I tried creating three folders, moving files among them, moving two folders into another folder, renaming the top-level folder, moving that folder into a new folder -- as expected, I saw minor amounts of data queued to transmit, but the size of the space (from Properties) remained the same. I went offline, added a 2+ MB file, and saw a little under 6 MB queued to transmit -- that seems about right as my account is on one other computer, and the other member has his account on at least three computers. I came back on line and that was processed.
Now, after deleting some data, I did not see a drop in workspace size. There are two reasons for this:
* The other user is only logged on from one of his active computers. The changes cannot be deleted from the workspace record until his other computers either receive them or stay inactive long enough for the workspace to stop synchronizing to them.
* I may not have enough deletions queued for them to be purged from workspace.
The second is the tricky one, because you don't know at exactly what point you have queued enough changes or when the queue will be evaluated, and if you keep making more, you may go over while the first condition is a factor, and start filling that queue again.
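The two conditions above can be expressed as a small Python check. This is a sketch under assumptions: the `Endpoint` type, the `dropped` flag for a computer that has been inactive long enough to stop being synchronized, and the `threshold` value are all hypothetical, since the actual limits are not something a user can see.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    acknowledged: bool     # has received the queued deletions
    dropped: bool = False  # inactive long enough that the workspace stopped syncing to it


def can_purge(endpoints, queued_deletions, threshold):
    """Both conditions must hold before deleted data could be purged:
    every endpoint is settled, and enough deletions are queued."""
    settled = all(e.acknowledged or e.dropped for e in endpoints)
    return settled and queued_deletions >= threshold


endpoints = [
    Endpoint("my-desktop", acknowledged=True),
    Endpoint("my-laptop", acknowledged=True),
    Endpoint("his-office-pc", acknowledged=True),
    Endpoint("his-home-pc", acknowledged=False),  # not logged on yet
]

# Hypothetical threshold of 50 queued deletions.
print(can_purge(endpoints, queued_deletions=80, threshold=50))  # False

endpoints[3].acknowledged = True
print(can_purge(endpoints, queued_deletions=80, threshold=50))  # True
print(can_purge(endpoints, queued_deletions=10, threshold=50))  # False
```

The first result shows how a single computer that has not caught up holds back the purge even when plenty of deletions are queued.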
Because of these factors, constantly active workspaces are often larger than ones with a slower pace of activity. It will be a while before I can come back to this workspace assured that the first factor is not in effect, but I will check it again. I do know, from looking at older, fairly stable spaces, that the purge system works. For example, one of my team's spaces is mostly files. The files are updated occasionally -- added, deleted, or changed -- and some of it was restructured about a year ago when we added a new Files tool -- and the space is approximately the size of the file contents. Ideally, it would be smaller, since files are compressed, but we have two members who have recently been inactive, which probably explains the discrepancy.
The general process is documented in "How to recover disk space that is used by a Groove workspace" at http://support.microsoft.com/kb/922142.
sorry
talking about files tool not the folder sync.
To my understanding groove never recovers space from deletion of any data in any tool else the checksum of the tool would change and groove would have to verify every delta every time
Also every time a new member is invited groove needs the atomic history of the space to maintain sync
so groove hides data that is deleted but never removes it unless you create a duplicate of the space - then it removes invitees and crunches the space because there is no history. but everyone has to be invited afresh.
does not apply to folder sync as that space is handled at the OS level
rgds
Yes, I am talking about the Files tool in a standard workspace.
The article that I linked to was extensively reviewed by the software architect responsible for that capability. Groove does maintain a history, but that does not contain all data that was ever in the workspace indefinitely. The data sent at invitation goes at least back to the point where all endpoints have identical data, and includes unpurged deletions, but not deletions before the last purge.
I will recheck these details when I am in the office next week.
you may be right - i cannot speak openly because of NDA but we do know that the groove engine went / is going through a revision after the MS acquisition - my comment is based on the original groove engine, which i knew and tested very well.
i am aware that optimisation of the sync process is proposed but cannot comment on whether it is already there or not.
NDA is a peculiar thing and we can never be sure of what is in the open elsewhere.
best regards
ashok
Hi Ashok,
This mechanism (batch purge of data deletions that have been acknowledged by all workspace endpoints) was already loosely documented when I started at Groove, around the time of the 1.3 release. (I was MoonlightGroove on the forums.)
The article has been through some revisions since then, as it was fact-checked when we were reviewing web content for a major Groove Networks release (I think it was 3.0) and again for the Microsoft Office Groove 2007 release, but the basics of how deletions work have been fairly stable during that time. I can't comment on any future plans, of course.
Hi Moonlight
great to hear from you :-)
I will of course defer to your insider's knowledge.
Must say however that i have not "seen" this batch delete happen - but then i am just looking at the space bloat - quite possible that some deleted matter is removed but (some) excess space usage still remains.
the principle that i find safe to pass on to users is however that we can't expect normal space reclamation so be careful
cheers