Login | Register
My pages Projects Community openCollabNet

Discussions > commits > svn commit: r5 - trunk/fsvs/doc

fsvs
Discussion topic

Back to topic list

svn commit: r5 - trunk/fsvs/doc

Author pmarek
Full name P.Marek
Date 2005-09-27 21:49:37 PDT
Message Author: pmarek
Date: Tue Sep 27 21:49:36 2005
New Revision: 5

Added:
   trunk/fsvs/doc/DESIGN (contents, props changed)
Modified:
   trunk/fsvs/doc/TODO
Log:
TODO update
DESIGN document started

Added: trunk/fsvs/doc/DESIGN
Url: http://fsvs.tigris.o​rg/source/browse/fsv​s/trunk/fsvs/doc/DES​IGN?view=auto&re​v=5
====================​====================​====================​==================
--- (empty file)
+++ trunk/fsvs/doc/DESIGN Tue Sep 27 21:49:36 2005
@@ -0,0 +1,88 @@
+The internal design
+-------------------
+
+0) Terms used in this document
+
+entry In subversion speak an entry is either a directory, a symlink or a file. In fsvs it can additionally be a block or character device. Sockets an pipes are currently ignored, as they're typically re-created by the various applications.
+waa This means "working copy administrative area"; a directory, where local data is stored.
+
+
+1) The waa storage area
+
+1a) Format
+The waa area is the local storage area for fsvs.
+Here the local data is stored, ie. filelists, ignorelists, repository URLs, and more.
+
+Generally a path can be of (nearly) arbitrary length, and have every character (saving NUL, "\0") in it.
+Therefore the paths get hashed into a MD5-sum, and all information regarding this path uses the MD5-sum as specifier.
+
+So whereever pathnames or similar things are stored (eg. patterns), they are NUL-terminated.
+Furthermore the data sets have a linefeed ("\n") after them, to ease inspection in an editor.
+
+
+1aI) Filelists
+The filelists remember the last committed state of entries. That includes the ctime, mtime, unix-mode (with flags for directory/device/symlink/file), MD5 sum, size in bytes, inode, tree relation, number of child nodes, user and group, and filename.
+The path can be recreated from the tree-structure and the filenames.
+
+The header includes fields such as header version, header length, number of entries, needed space for the filenames, and the length of the longest path.
+
+
+1aII) Ignore lists
+They consist of a header with the number of ignore entries, followed by the ignore patterns; NUL-terminated, LF-separated.
+
+
+1aIII) URL lists
+They consist of a header with the number of URLs, followed by the URLs themselves; NUL-terminated, LF-separated.
+
+
+1aVI) MD5
+To speed up comparing and committing large files, these files should hold some parameters for manber hashing and a list of MD5 hashes for the manber blocks.
+This way big files don't have to be hashed in full to check whether they've changed.
+And the manber blocks can be used for the delta algorithm.
+
+
+1aV) Prop
+These files should store properties, which are not understood or not used by fsvs.
+
+
+1aIV) New
+I don't know if these are really needed.
+
+
+
+1b) Location
+The default location is /var/spool/fsvs, but can be specified with the enviroment variable "WAA".
+
+
+1c) Requirements
+The waa doesn't need that much space. The ignore and url lists are typically a few kB.
+And as soon as the filelist is stored compressed, the needed space should not exceed a few MB.
+
+
+2) In memory layout
+
+In memory fsvs builds a complete tree of the needed entries (struct estat).
+They are referenced with the parent pointers to the root, and the by_inode and by_name downwards.
+
+The by_inode and by_name members are pointers to arrays of pointers to entries (:-); their values must be the same, apart from the sorting order.
+While running through a directory (or the whole tree in waa__input_tree) we use the by_inode-order, as this is faster; when checking a directory for new entries, we compare by name and use by_name.
+
+
+2a) Storage and allocation
+Every node can have string space allocated, ie. space needed for the filenames in this directory (and possibly sub-directories, too.)
+On loading of the list in waa__input_tree() two memory ranges are allocated - on for the "struct estat"'s read, and one for the filenames.
+
+Because of this free()ing of entries is currently not possible - a freelist should be implemented.
+
+
+3) Algorithms
+Generally I tried to use fast and simple algorithms better than O(n).
+Though it's entirely possible that I forgot something.
+
+
+.) Searching for an entry in a directory (in memory) is O(log2(n)), as I use bsearch().
+.) Determining the correct order for writing the entries (in waa__output_tree()) is optimized by having all lists sorted; about half the time a single compare is enough to determine the next written entry.
+
+
+
+# vi: sw=8 ts=8

Modified: trunk/fsvs/doc/TODO
Url: http://fsvs.tigris.o​rg/source/browse/fsv​s/trunk/fsvs/doc/TOD​O?view=diff&rev=​5&p1=trunk/fsvs/​doc/TODO&r1=4​&p2=trunk/fsvs/doc/​TODO&r2=5
====================​====================​====================​==================
--- trunk/fsvs/doc/TODO (original)
+++ trunk/fsvs/doc/TODO Tue Sep 27 21:49:36 2005
@@ -47,6 +47,13 @@
 - performance: 1 thread for reading in input_tree, some threads for lstat()
   and/or verifying? (some simultaneous requests to harddisk)
 
+- make a directory for the files .waa-dir, etc.
+ 66/66/xxxxx/dir.waa
+ 66/66/xxxxx/ign.waa
+ This lets us support symlinks as directory-names
+
+- fsvs symlink - to ease support and administration in the waa-area (make symlink "root" -> 66/66/dc.../)
+
 
 - saving of md5_is necessary?
   to store that file is changed? -> no, better with mtime

« Previous message in topic | 1 of 1 | Next message in topic »

Messages

Show all messages in topic

svn commit: r5 - trunk/fsvs/doc pmarek P.Marek 2005-09-27 21:49:37 PDT
Messages per page: