svn commit: r282 - trunk: . fsvs fsvs/doc/develop fsvs/src fsvs/src/test

Author pmarek
Full name P.Marek
Date 2006-05-08 00:13:59 PDT
New Revision: 282

Added:
   trunk/fsvs/doc/develop/UTF8
   trunk/fsvs/src/test/017_locale_iconv (contents, props changed)
Modified:
   trunk/ (props changed)
   trunk/fsvs/CHANGES
   trunk/fsvs/src/commit.c
   trunk/fsvs/src/fsvs.c
   trunk/fsvs/src/global.h
   trunk/fsvs/src/helper.c
   trunk/fsvs/src/helper.h
   trunk/fsvs/src/sync.c
   trunk/fsvs/src/update.c
   trunk/fsvs/src/update.h
   trunk/fsvs/src/warnings.c
   trunk/fsvs/src/warnings.h

Log:
Finally, the long awaited UTF-8 compliance revision.

|=
|= *MANY THANKS* to Gunter Ohrner, who did most of this.
|= Thank you; it would have taken much longer without your help!
|=

diffstat says -90 +512 ... a lot of work.
For future work please see doc/develop/UTF8.





Modified: trunk/fsvs/CHANGES
Url: http://fsvs.tigris.org/source/browse/fsvs/trunk/fsvs/CHANGES?view=diff&rev=282&p1=trunk/fsvs/CHANGES&p2=trunk/fsvs/CHANGES&r1=281&r2=282
==============================================================================
--- trunk/fsvs/CHANGES (original)
+++ trunk/fsvs/CHANGES 2006-05-08 00:13:59-0700
@@ -1,4 +1,7 @@
 Changes since 1.0.6
+- Support for non-UTF8-encodings. Filenames, symlink targets, and commit
+ messages with non-ASCII-characters are now correctly en/decoded on
+ commit/update.
 - On commit just before adding a new file to the repository we check if it
   still exists. Previously we'd abort if we found a temporary file on the
     initial scan, which was deleted when we wanted to send it.

Added: trunk/fsvs/doc/develop/UTF8
Url: http://fsvs.tigris.org/source/browse/fsvs/trunk/fsvs/doc/develop/UTF8?view=auto&rev=282
==============================================================================
--- (empty file)
+++ trunk/fsvs/doc/develop/UTF8 2006-05-08 00:13:59-0700
@@ -0,0 +1,31 @@
+
+UTF8 in FSVS
+------------
+
+
+Some points which trouble me a bit, and some random thoughts; everything
+connected with UTF-8:
+
+- Properties we get from the repository might most easily be stored locally
+ as UTF8, if we don't do anything with them (eg. svn:entry).
+
+- In which properties can be non-ASCII-characters? Does someone define
+ user/group names in UTF-8? Can eg. xattr have unicode characters in them?
+ Does that happen in practical usage?
+
+- The currently used properties should be safe. I've never heard of
+ non-ASCII groups or users, and the mtime should always be in the same
+ format.
+
+- I considered whether I should just do *everything* in UTF-8.
+ But that is a performance tradeoff; on a simple "fsvs status" we'd
+ have to convert all filenames from the waa-directory. It may not be much work,
+ but if it's not necessary ...
+
+- I'd like the subversion headers to define a utf8_char *, which
+ would (with gcc) be handled distinctly from a normal char * ...
+ (see linux kernel, include/linux/types.h: #define __bitwise ...)
+ But that won't happen, as there's already too much software which relies
+ on the current definitions.
+
+
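
The wish for a distinct utf8_char * type can be approximated inside a project's own code without changing Subversion's headers. In the Linux kernel, __bitwise is an annotation checked by the sparse tool rather than by gcc itself; a plain C compiler can be made to tell two kinds of string apart by wrapping each in its own one-member struct. A minimal sketch of that approach; utf8_str, local_str, as_utf8, as_local and utf8_bytes are hypothetical names, not part of fsvs or Subversion:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper types.  Each struct is a distinct type, so the
 * compiler rejects a local_str where a utf8_str is expected; a plain
 * typedef of char * would be accepted silently. */
typedef struct { const char *s; } utf8_str;
typedef struct { const char *s; } local_str;

/* Explicit constructors mark the spots where an encoding is claimed. */
utf8_str as_utf8(const char *s)
{
    utf8_str u = { s };
    return u;
}

local_str as_local(const char *s)
{
    local_str l = { s };
    return l;
}

/* Example consumer that accepts only UTF-8 data (as the Subversion
 * editor callbacks expect); returns the length in bytes, not characters. */
size_t utf8_bytes(utf8_str u)
{
    assert(u.s != NULL);
    return strlen(u.s);
}

/* utf8_bytes(as_local("x")) is rejected by the compiler
 * (incompatible argument type). */
```

The price is an explicit .s at every call into an API that still takes a plain char *, which is exactly where the encoding has to be thought about anyway.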

Modified: trunk/fsvs/src/commit.c
Url: http://fsvs.tigris.org/source/browse/fsvs/trunk/fsvs/src/commit.c?view=diff&rev=282&p1=trunk/fsvs/src/commit.c&p2=trunk/fsvs/src/commit.c&r1=281&r2=282
==============================================================================
--- trunk/fsvs/src/commit.c (original)
+++ trunk/fsvs/src/commit.c 2006-05-08 00:13:59-0700
@@ -16,7 +16,6 @@
  * We have to traverse the complete tree, store what we want to do,
  * and do that in a second run.
  * */
-/* TODO: convert to UTF-8 before sending */
 #include <apr-0/apr_md5.h>
 #include <apr-0/apr_pools.h>
 #include <apr-0/apr_user.h>
@@ -39,7 +38,7 @@
 #include "waa.h"
 #include "est_ops.h"
 #include "racallback.h"
-
+#include "helper.h"
 
 
 typedef svn_error_t *(*change_any_prop_t) (void *baton,
@@ -67,26 +66,32 @@
 }
 
 
+/* subversion-1/svn_ra.h is not clear whether these strings
+ * are really utf8. I think they must even be ASCII, except if someone
+ * uses non-ASCII-usernames ... */
 svn_error_t * ac__commit_callback (
         svn_revnum_t new_revision,
- const char *date,
- const char *author,
+ const char *utf8_date,
+ const char *utf8_author,
         void *baton)
 {
- struct estat *root=baton;
- int status;
+ struct estat *root=baton;
+ int status;
 
 
     printf("committed revision\t%ld on %s as %s\n",
- new_revision, date, author);
+ new_revision, utf8_date, utf8_author);
 
     /* recursively set the new revision */
     STOPIF( ac___ci_setrev(root, new_revision), NULL);
+
 ex:
     RETURN_SVNERR(status);
 }
 
 
+/* The callback called by input_tree and build_tree, to mark changed
+ * entries that should be committed. */
 int ac__commit(struct estat *sts,
         char *path)
 {
@@ -103,6 +108,9 @@
 }
 
 
+/* We hope that group/user names are ASCII;
+ * the names of "our" properties are known, and contain no characters
+ * above \x80. */
 svn_error_t *ac__ci_set_props(void *baton,
         struct estat *sts,
         change_any_prop_t function,
@@ -174,10 +182,16 @@
     {
         case FT_SYMLINK:
             STOPIF( ops__link_to_string(sts, filename, &cp), NULL);
+ STOPIF( hlp__local2utf8(cp, &cp), NULL);
+ /* It is not defined whether svn_stringbuf_create copies the string,
+ * takes the character pointer into the pool, or whatever.
+ * Input from anyone who knows is welcome. */
             str=svn_stringbuf_create(cp, pool);
             break;
         case FT_BDEV:
         case FT_CDEV:
+ /* See above */
+ /* We only put ASCII in this string */
             str=svn_stringbuf_create(
                     ops__dev_to_filedata(sts), pool);
             break;
@@ -247,6 +261,7 @@
     apr_pool_t *subpool;
     int i, exists_now;
     char *filename;
+ char* utf8_filename;
     svn_error_t *status_svn;
     struct stat64 dummy_stat64;
 
@@ -270,6 +285,7 @@
 
         STOPIF( ops__build_path(&filename, sts), NULL);
         /* as the path needs to be canonical we strip the ./ in front */
+ STOPIF( hlp__local2utf8(filename+2, &utf8_filename), NULL );
 
         STOPIF( ac__status(sts, filename), NULL);
 
@@ -287,7 +303,7 @@
             DEBUGP("deleting %s", sts->name);
             /* that's easy :-) */
             STOPIF_SVNERR( editor->delete_entry,
- (filename+2, SVN_INVALID_REVNUM, dir_baton, subpool) );
+ (utf8_filename, SVN_INVALID_REVNUM, dir_baton, subpool) );
 
             if (!exists_now)
             {
@@ -303,7 +319,7 @@
         }
 
 
- /* Is there something to do - get a baton.
+ /* If there is something to do - get a baton.
          * Else we're finished with this one. */
         if (!exists_now && !(sts->entry_status & FS_CHILD_CHANGED))
             continue;
@@ -345,7 +361,7 @@
                     (sts->entry_type == FT_DIR ?
                      editor->add_directory : editor->add_file),
                     sts->entry_type == FT_DIR ? "add_directory" : "add_file",
- (filename+2, dir_baton,
+ (utf8_filename, dir_baton,
                      NULL, SVN_INVALID_REVNUM,
                      subpool, &baton)
                     );
@@ -361,7 +377,7 @@
                     (sts->entry_type == FT_DIR ?
                      editor->open_directory : editor->open_file),
                     sts->entry_type == FT_DIR ? "open_directory" : "open_file",
- (filename+2, dir_baton,
+ (utf8_filename, dir_baton,
                      sts->repos_rev,
                      subpool, &baton)
                     );
@@ -473,6 +489,7 @@
     struct stat64 st;
     int commitmsg_fh,
             commitmsg_is_temp;
+ char *utf8_commit_msg;
 
 
     status=0;
@@ -527,12 +544,14 @@
         close(commitmsg_fh);
     }
 
+ STOPIF( hlp__local2utf8(opt_commitmsg, &utf8_commit_msg),
+ "Conversion of the commit message to utf8 failed");
 
     STOPIF_SVNERR( svn_ra_get_commit_editor,
             (session,
              &editor,
              &edit_baton,
- opt_commitmsg,
+ utf8_commit_msg,
              ac__commit_callback,
              root,
              NULL, // apr_hash_t *lock_tokens,
@@ -567,7 +586,7 @@
 ex2:
     if (status && edit_baton)
     {
- /* If there has already something bad happenend, it probably
+ /* If there has already something bad happened, it probably
          * makes no sense checking the error code. */
         editor->abort_edit(edit_baton, pool);
     }

Modified: trunk/fsvs/src/fsvs.c
Url: http://fsvs.tigris.org/source/browse/fsvs/trunk/fsvs/src/fsvs.c?view=diff&rev=282&p1=trunk/fsvs/src/fsvs.c&p2=trunk/fsvs/src/fsvs.c&r1=281&r2=282
==============================================================================
--- trunk/fsvs/src/fsvs.c (original)
+++ trunk/fsvs/src/fsvs.c 2006-05-08 00:13:59-0700
@@ -13,6 +13,8 @@
 #include <sys/time.h>
 #include <time.h>
 #include <stdarg.h>
+#include <langinfo.h>
+#include <locale.h>
 
 #include <apr_pools.h>
 
@@ -68,6 +70,8 @@
 
 static char *program_name;
 
+char *local_codeset;
+
 apr_pool_t *pool;
 
 
@@ -360,6 +364,39 @@
         }
     }
 
+
+ /* Set the locale from the environment variables, so that
+ * nl_langinfo() reports the correct codeset */
+ cmd=setlocale(LC_ALL, "");
+ DEBUGP("LC_ALL gives %s", cmd);
+ /* The second call is in case the above fails.
+ * Sometimes eg. LC_PAPER is set to an invalid value; then the first
+ * call fails, but the second succeeds.
+ * See also the fsvs dev@ mailing list (april 2006), where a post
+ * to dev@subversion is referenced */
+ cmd=setlocale(LC_CTYPE, "");
+ DEBUGP("LC_CTYPE gives %s", cmd);
+
+ local_codeset=nl_langinfo(CODESET);
+ if (!local_codeset)
+ {
+ STOPIF( wa__warn(WRN__CHARSET_INVALID, EINVAL,
+ "Could not retrieve the current character set - assuming UTF-8."),
+ "nl_langinfo(CODESET) failed - check locale configuration.");
+ }
+ else
+ {
+ DEBUGP("codeset found to be %s", local_codeset);
+ if (strcmp(local_codeset, "UTF-8")==0)
+ /* man page says "This pointer MAY point to a static buffer ..."
+ * so no freeing. */
+ local_codeset=NULL;
+ }
+
+ if (!local_codeset)
+ DEBUGP("codeset: using identity");
+
+
     /* first non-argument is action */
     if (args[optind])
     {
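
The startup code above boils down to: initialise the locale from the environment, ask nl_langinfo(CODESET) for the character set, and treat UTF-8 (or an unknown codeset) as "no conversion needed". A standalone sketch of the same steps; detect_local_codeset and codeset_is_utf8 are hypothetical names. Unlike fsvs.c, it only falls back to setlocale(LC_CTYPE, "") when the LC_ALL call fails, and it also accepts spellings such as "utf8" besides the exact string "UTF-8" that fsvs compares against; whether any C library actually reports such spellings is an assumption, the normalisation is just a cheap safeguard.

```c
#define _XOPEN_SOURCE 700   /* for nl_langinfo() under strict C modes */
#include <assert.h>
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if cs names UTF-8, ignoring case and
 * '-'/'_' so that "UTF-8", "utf8" and "UTF_8" all match; 0 otherwise. */
int codeset_is_utf8(const char *cs)
{
    const char *want = "utf8";

    assert(cs != NULL);
    for (; *cs; cs++) {
        if (*cs == '-' || *cs == '_')
            continue;
        if (*want == '\0' || tolower((unsigned char)*cs) != *want)
            return 0;
        want++;
    }
    return *want == '\0';
}

/* Initialise the locale from the environment and return the local
 * codeset, or NULL when no conversion is needed (UTF-8, or unknown, in
 * which case UTF-8 is assumed, as fsvs.c does). */
const char *detect_local_codeset(void)
{
    const char *cs;

    if (!setlocale(LC_ALL, ""))   /* e.g. LC_PAPER set to something bogus */
        setlocale(LC_CTYPE, "");  /* the codeset is all we need */
    cs = nl_langinfo(CODESET);
    if (!cs || !*cs || codeset_is_utf8(cs))
        return NULL;
    return cs;
}
```

As in fsvs.c, the returned pointer is never freed: it may refer to static storage that a later nl_langinfo() or setlocale() call overwrites.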

Modified: trunk/fsvs/src/global.h
Url: http://fsvs.tigris.org/source/browse/fsvs/trunk/fsvs/src/global.h?view=diff&rev=282&p1=trunk/fsvs/src/global.h&p2=trunk/fsvs/src/global.h&r1=281&r2=282
==============================================================================
--- trunk/fsvs/src/global.h (original)
+++ trunk/fsvs/src/global.h 2006-05-08 00:13:59-0700
@@ -286,6 +286,8 @@
     propname_special[],
     propval_special[];
 
+extern char *local_codeset;
+
 extern svn_ra_session_t *session;
 
 extern apr_pool_t *pool;

Modified: trunk/fsvs/src/helper.c
Url: http://fsvs.tigris.org/source/browse/fsvs/trunk/fsvs/src/helper.c?view=diff&rev=282&p1=trunk/fsvs/src/helper.c&p2=trunk/fsvs/src/helper.c&r1=281&r2=282
==============================================================================
--- trunk/fsvs/src/helper.c (original)
+++ trunk/fsvs/src/helper.c 2006-05-08 00:13:59-0700
@@ -11,6 +11,7 @@
 #include <fcntl.h>
 #include <unistd.h>
 
+#include <iconv.h>
 
 #include "global.h"
 #include "helper.h"
@@ -64,3 +65,173 @@
     return i;
 }
 
+
+int hlp___get_conv_handle(const char* from_charset,
+ const char* to_charset,
+ iconv_t* cd)
+{
+ int status;
+
+ status=0;
+ *cd = iconv_open(to_charset, from_charset);
+ STOPIF_CODE_ERR( *cd == (iconv_t)-1, errno,
+ "Conversion from %s to %s is not supported",
+ from_charset, to_charset);
+
+ex:
+ return status;
+}
+
+
+/* This function dynamically allocates some buffer space, and returns
+ * the converted data in it.
+ * A few buffers are used round-robin, so that the caller need not free
+ * anything and the maximum memory usage is limited.
+ * Normally only 1 or 2 buffers are "active", eg. filename for a symlink
+ * and its destination, or source and destination for apply_textdelta. */
+inline int hlp___do_convert(iconv_t cd, const char* from, char** to)
+{
+ static int cur_cache=0;
+ static struct {
+ int size;
+ char *buffer;
+ } cache[5] = { { 0 } };
+ int status;
+ char* to_buf;
+ const char* from_buf;
+ size_t srclen_rem, dstlen_rem;
+ int iconv_ret, i, done;
+
+
+ status=0;
+ /* Input = NULL ==> Output = NULL */
+ if (!from)
+ {
+ *to = NULL;
+ goto ex;
+ }
+
+ cur_cache++;
+ if (cur_cache >= sizeof(cache)/sizeof(cache[0]))
+ cur_cache=0;
+
+
+ srclen_rem = strlen(from)+1;
+ from_buf=from;
+ to_buf=cache[cur_cache].buffer;
+
+ /* Do the conversion. */
+ while(srclen_rem)
+ {
+ done=to_buf-cache[cur_cache].buffer;
+
+ /* Check for buffer space; reallocate, if necessary. */
+ if (done+cache[cur_cache].size < srclen_rem)
+ {
+ /* Due to the reallocate, the buffer address may change.
+ * Remember where we were, and continue from there. */
+
+ i=cache[cur_cache].size+2*srclen_rem+16;
+
+ /* If we'd need less than 256 bytes, take 256.
+ * There's a good chance that we'll get longer filenames;
+ * we'll avoid a re-allocate, and it isn't that much memory. */
+ cache[cur_cache].buffer = realloc(cache[cur_cache].buffer, i);
+ STOPIF_ENOMEM(!cache[cur_cache].buffer);
+ cache[cur_cache].size = i;
+
+ to_buf=cache[cur_cache].buffer+done;
+ }
+
+ /* How much space is left? */
+ dstlen_rem=cache[cur_cache].size - done;
+
+ /* iconv should have a const in it! */
+ iconv_ret = iconv(cd,
+ (char**)&from_buf, &srclen_rem,
+ &to_buf, &dstlen_rem);
+
+ /* Only allowed error is E2BIG. */
+ if (iconv_ret == -1)
+ {
+ /* We don't know which pointer has the local codeset, and even if we
+ * did, we don't know if it's safe to print it.
+ * After all, we got a conversion error - there may be invalid
+ * characters in it.
+ * "Here be dragons" :-] */
+ STOPIF_CODE_ERR( errno != E2BIG, errno,
+ "Conversion of string failed. "
+ "Next bytes are \\x%02X\\x%02X\\x%02X\\x%02X",
+ srclen_rem>=1 ? from_buf[0] : 0,
+ srclen_rem>=2 ? from_buf[1] : 0,
+ srclen_rem>=3 ? from_buf[2] : 0,
+ srclen_rem>=4 ? from_buf[3] : 0
+ );
+
+ /* We got E2big, so get more space. That should happen automatically
+ * in the next round. */
+ }
+ }
+
+ /* reset the conversion. */
+ iconv(cd, NULL, NULL, NULL, NULL);
+
+ *to=cache[cur_cache].buffer;
+ DEBUGP("input: %s, output: %s", from, *to);
+
+ex:
+ return status;
+}
+
+
+int hlp__local2utf8(const char *local_string, char** utf8_string)
+{
+ static iconv_t iconv_cd = NULL;
+ int status;
+
+ status=0;
+ if (!local_codeset)
+ {
+ *utf8_string=(char*)local_string;
+ goto ex;
+ }
+
+ if (!iconv_cd)
+ {
+ STOPIF( hlp___get_conv_handle( local_codeset, "UTF-8", &iconv_cd),
+ NULL);
+ }
+
+
+ STOPIF( hlp___do_convert(iconv_cd, local_string, utf8_string),
+ NULL);
+ex:
+ return status;
+}
+
+
+int hlp__utf82local(const char *utf8_string, char** local_string)
+{
+ static iconv_t iconv_cd = NULL;
+ int status;
+
+ status=0;
+ if (!local_codeset)
+ {
+ *local_string=(char*)utf8_string;
+ goto ex;
+ }
+
+ /* Get a conversion handle, if not already done. */
+ if (!iconv_cd)
+ {
+ STOPIF( hlp___get_conv_handle( "UTF-8", local_codeset, &iconv_cd),
+ NULL);
+ }
+
+ STOPIF( hlp___do_convert(iconv_cd, utf8_string, local_string),
+ NULL);
+ex:
+ return status;
+}
+
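
hlp___do_convert above drives iconv() in a loop and enlarges its buffer when the output does not fit. The same technique as a self-contained sketch, with two differences chosen for clarity: the result is a malloc()ed string that the caller frees (instead of a round-robin cache), and the buffer is doubled whenever iconv() reports E2BIG, so progress does not depend on estimating the output size up front. (The up-front test `done+cache[cur_cache].size < srclen_rem` above does not appear to compare against the space actually left in the buffer, so reacting to E2BIG directly may also be the more robust choice.) convert_alloc is a hypothetical name; like the original, it converts the terminating NUL along with the data, which assumes an ASCII-compatible target codeset.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in` from codeset
 * from_cs to to_cs.  On success *out receives a malloc()ed, NUL-terminated
 * result that the caller must free(), and 0 is returned.  On failure -1 is
 * returned with errno set, e.g. EINVAL (unsupported pair) or EILSEQ. */
int convert_alloc(const char *from_cs, const char *to_cs,
                  const char *in, char **out)
{
    iconv_t cd;
    char *buf, *src = (char *)in;    /* POSIX iconv() wants char ** */
    size_t in_left, size, done = 0;
    int ret = -1, saved_errno = 0;

    assert(in != NULL && out != NULL);
    cd = iconv_open(to_cs, from_cs);  /* note: target codeset first */
    if (cd == (iconv_t)-1)
        return -1;

    in_left = strlen(in) + 1;         /* convert the terminating NUL too */
    size = in_left;                   /* enough for pure ASCII */
    buf = malloc(size);
    if (!buf)
        saved_errno = ENOMEM;

    while (buf) {
        char *dst = buf + done;
        size_t out_left = size - done;
        size_t rc = iconv(cd, &src, &in_left, &dst, &out_left);

        done = (size_t)(dst - buf);   /* an offset survives realloc() */
        if (rc != (size_t)-1) {       /* all input converted */
            ret = 0;
            break;
        }
        if (errno != E2BIG) {         /* EILSEQ or EINVAL: give up */
            saved_errno = errno;
            break;
        }
        /* Output buffer full: double it and continue where iconv stopped. */
        char *bigger = realloc(buf, size * 2);
        if (!bigger) {
            saved_errno = ENOMEM;
            break;
        }
        buf = bigger;
        size *= 2;
    }

    iconv_close(cd);
    if (ret == 0) {
        *out = buf;
    } else {
        free(buf);
        errno = saved_errno;
    }
    return ret;
}
```

Converting the Latin-1 byte \xa9 to UTF-8 this way yields the two bytes \xc2\xa9 (the copyright sign), and the caller frees the result.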

Modified: trunk/fsvs/src/helper.h
Url: http://fsvs.tigris.org/source/browse/fsvs/trunk/fsvs/src/helper.h?view=diff&rev=282&p1=trunk/fsvs/src/helper.h&p2=trunk/fsvs/src/helper.h&r1=281&r2=282
==============================================================================
--- trunk/fsvs/src/helper.h (original)
+++ trunk/fsvs/src/helper.h 2006-05-08 00:13:59-0700
@@ -47,4 +47,8 @@
 int hex2bin(char *hex, char *bin, int maxlen);
 
 
+int hlp__local2utf8(const char* local_string, char** utf8_string);
+int hlp__utf82local(const char* utf8_string, char** local_string);
+
+
 #endif
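
These two prototypes hand back a pointer that the caller never frees; in helper.c that works because hlp___do_convert rotates through a small static array of buffers. The pattern in isolation, as a sketch: ring_buffer and ring_strdup are hypothetical names, and the five-slot ring mirrors cache[5]. As with the original, a result stays valid only until the same slot comes round again, and the static state makes it unsafe to call from several threads at once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RING_SLOTS 5   /* mirrors cache[5] in hlp___do_convert */

/* Hypothetical helper: return a buffer of at least len bytes taken from a
 * ring of reusable slots.  The caller must not free() it; it stays valid
 * only until RING_SLOTS further calls have been made.  Not thread-safe. */
char *ring_buffer(size_t len)
{
    static struct { size_t size; char *buf; } ring[RING_SLOTS];
    static unsigned cur;

    cur = (cur + 1) % RING_SLOTS;
    if (ring[cur].size < len) {
        char *bigger = realloc(ring[cur].buf, len);
        if (!bigger)
            return NULL;          /* old slot contents stay allocated */
        ring[cur].buf = bigger;
        ring[cur].size = len;
    }
    return ring[cur].buf;
}

/* Copy s into the next ring slot, e.g. to hold a converted string. */
char *ring_strdup(const char *s)
{
    size_t len;
    char *buf;

    assert(s != NULL);
    len = strlen(s) + 1;
    buf = ring_buffer(len);
    if (buf)
        memcpy(buf, s, len);
    return buf;
}
```

Five slots are enough for the cases the helper.c comment lists (a symlink's name and target, or the source and destination of apply_textdelta), with some headroom.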

Modified: trunk/fsvs/src/sync.c
Url: http://fsvs.tigris.org/source/browse/fsvs/trunk/fsvs/src/sync.c?view=diff&rev=282&p1=trunk/fsvs/src/sync.c&p2=trunk/fsvs/src/sync.c&r1=281&r2=282
==============================================================================
--- trunk/fsvs/src/sync.c (original)
+++ trunk/fsvs/src/sync.c 2006-05-08 00:13:59-0700
@@ -14,7 +14,6 @@
  * we fetch the new values from the repository.
  *
  * */
-/* TODO: convert from UTF-8 before writing */
 #include <apr-0/apr_md5.h>
 #include <apr-0/apr_pools.h>
 #include <apr-0/apr_user.h>
@@ -40,6 +39,7 @@
 #include "waa.h"
 #include "update.h"
 #include "racallback.h"
+#include "helper.h"
 
 
 static char *filename;
@@ -89,7 +89,7 @@
 }
 
 
-svn_error_t *ac__sync_delete_entry(const char *path,
+svn_error_t *ac__sync_delete_entry(const char *utf8_path,
         svn_revnum_t revision UNUSED,
         void *parent_baton,
         apr_pool_t *pool)
@@ -106,9 +106,9 @@
 }
 
 
-svn_error_t *ac__sync_add_directory(const char *path,
+svn_error_t *ac__sync_add_directory(const char *utf8_path,
         void *parent_baton,
- const char *copy_path,
+ const char *utf8_copy_path,
         svn_revnum_t copy_rev,
         apr_pool_t *dir_pool UNUSED,
         void **child_baton)
@@ -116,6 +116,12 @@
     struct estat *dir=parent_baton;
     struct estat *sts;
     int status;
+ char* path = NULL;
+ char* copy_path = NULL;
+
+ STOPIF( hlp__utf82local(utf8_path, &path), NULL );
+ STOPIF( hlp__utf82local(utf8_copy_path, &copy_path), NULL );
+
 
     DEBUGP("in %s", __PRETTY_FUNCTION__);
 
@@ -135,7 +141,7 @@
 
 
 
-svn_error_t *ac__sync_open_directory(const char *path,
+svn_error_t *ac__sync_open_directory(const char *utf8_path,
         void *parent_baton,
         svn_revnum_t base_revision UNUSED,
         apr_pool_t *dir_pool UNUSED,
@@ -144,6 +150,9 @@
     struct estat *dir=parent_baton;
     struct estat *sts;
     int status;
+ char* path = NULL;
+
+ STOPIF( hlp__utf82local(utf8_path, &path), NULL );
 
     DEBUGP("in %s", __PRETTY_FUNCTION__);
     BUG("changing in an empty state is impossible");
@@ -161,15 +170,16 @@
 
 
 svn_error_t *ac__sync_change_dir_prop(void *dir_baton,
- const char *name,
+ const char *utf8_name,
         const svn_string_t *value,
         apr_pool_t *pool UNUSED)
 {
     struct estat *sts=dir_baton;
     int status;
 
- status=ac__up_parse_prop(sts, name, value);
+ STOPIF( ac__up_parse_prop(sts, utf8_name, value), NULL);
 
+ex:
     RETURN_SVNERR(status);
 }
 
@@ -189,7 +199,7 @@
 }
 
 
-svn_error_t *ac__sync_absent_directory(const char *path,
+svn_error_t *ac__sync_absent_directory(const char *utf8_path,
         void *parent_baton,
         apr_pool_t *pool)
 {
@@ -201,9 +211,9 @@
 }
 
 
-svn_error_t *ac__sync_add_file(const char *path,
+svn_error_t *ac__sync_add_file(const char *utf8_path,
         void *parent_baton,
- const char *copy_path,
+ const char *utf8_copy_path,
         svn_revnum_t copy_rev,
         apr_pool_t *file_pool,
         void **file_baton)
@@ -211,6 +221,11 @@
     struct estat *dir=parent_baton;
     struct estat *sts;
     int status;
+ char* path = NULL;
+ char* copy_path = NULL;
+
+ STOPIF( hlp__utf82local(utf8_path, &path), NULL );
+ STOPIF( hlp__utf82local(utf8_copy_path, &copy_path), NULL );
 
     DEBUGP("in %s", __PRETTY_FUNCTION__);
 
@@ -230,7 +245,7 @@
 }
 
 
-svn_error_t *ac__sync_open_file(const char *path,
+svn_error_t *ac__sync_open_file(const char *utf8_path,
         void *parent_baton,
         svn_revnum_t base_revision,
         apr_pool_t *file_pool,
@@ -239,6 +254,9 @@
     int status;
     struct estat *dir UNUSED=parent_baton;
     struct estat *sts;
+ char* path;
+
+ STOPIF( hlp__utf82local(utf8_path, &path), NULL);
 
     DEBUGP("in %s", __PRETTY_FUNCTION__);
     BUG("changing in an empty state is impossible");
@@ -285,16 +303,16 @@
 
 
 svn_error_t *ac__sync_change_file_prop(void *file_baton,
- const char *name,
+ const char *utf8_name,
         const svn_string_t *value,
- apr_pool_t *pool)
+ apr_pool_t *pool UNUSED)
 {
     struct estat *sts=file_baton;
     int status;
 
- DEBUGP("in %s", __PRETTY_FUNCTION__);
- status=ac__up_parse_prop(sts, name, value);
+ STOPIF( ac__up_parse_prop(sts, utf8_name, value), NULL);
 
+ex:
     RETURN_SVNERR(status);
 }
 
@@ -322,7 +340,7 @@
 }
 
 
-svn_error_t *ac__sync_absent_file(const char *path,
+svn_error_t *ac__sync_absent_file(const char *utf8_path,
         void *parent_baton,
         apr_pool_t *pool)
 {

Added: trunk/fsvs/src/test/017_locale_iconv
Url: http://fsvs.tigris.org/source/browse/fsvs/trunk/fsvs/src/test/017_locale_iconv?view=auto&rev=282
==============================================================================
--- (empty file)
+++ trunk/fsvs/src/test/017_locale_iconv 2006-05-08 00:13:59-0700
@@ -0,0 +1,100 @@
+#!/bin/bash
+
+set -e
+
+$PREPARE_CLEAN > /dev/null
+cd $WC
+
+######################################################################
+#
+# Preferably we should not only do such a mini-test for locales,
+# but do the whole test-suite with both language settings.
+#
+# - Define an environment variable for the unicode-characters
+# (eg FSVSTEST_CH)
+# - Skip this script if FSVSTEST_CH is set
+# - Call the testsuite with FSVSTEST_CH and locale set to UTF8
+# - Call the testsuite with FSVSTEST_CH with local encodings
+# - Change the testsuite to use $FSVSTEST_CH in the filenames
+# (and commit messages)
+#
+# Contributions?? Thank you!
+#
+######################################################################
+
+
+function testfunc
+{
+ filename=$1
+
+ touch file-$filename
+ ln -s file-$filename link-$filename
+ ln -s bad-$filename badlink-$filename
+ $BIN ci -m "locale ci $filename"
+ $WC2_UP_ST_COMPARE
+
+ if [[ `svn ls $REPURL/ | grep -F "$filename" | wc -l` -eq 3 ]]
+ then
+ echo "Ok, found all 3 entries."
+ else
+ echo "En/Decode problem - entries not found."
+ exit 1
+ fi
+
+ # TODO: test whether the entries are correct in the other locale.
+
+ rm *
+ $BIN ci -m "locale ci $filename cleanup"
+ $WC2_UP_ST_COMPARE
+}
+
+
+
+# look for UTF8
+utf8_locale=`locale -a | grep .utf8 | head -1`
+if [[ "$utf8_locale" != "" ]]
+then
+ echo "Found UTF8-locale '$utf8_locale', using that for testing."
+else
+ echo "Found no utf8-locale, cannot test"
+fi
+
+
+# look for non-utf8
+loc_locale=`locale -a | egrep -v "(POSIX|C|utf8$)" | head -1`
+if [[ "$loc_locale" != "" ]]
+then
+ echo "Found non-UTF8-locale '$loc_locale', using that for testing, too."
+else
+ echo "Found no non-utf8-locale, cannot test"
+fi
+
+
+# Trivial test with current settings
+# We must use only ASCII as we don't know in which locale
+# this script is parsed.
+testfunc test12
+
+# Clear environment
+unset LC_ALL LC_CTYPE
+
+# Test UTF8
+if [[ "$utf8_locale" != "" ]]
+then
+ export LC_ALL=$utf8_locale
+ # The bytes here must be \xc2\xa9; in UTF-8 that is U+00A9, the copyright sign.
+ # Use a hex editor.
+ testfunc ©
+fi
+
+# Test non-UTF8
+if [[ "$loc_locale" != "" ]]
+then
+ export LC_ALL=$loc_locale
+ # The bytes here must be \xc2\x61, that is an invalid UTF8-sequence.
+ # Use a hex editor.
+ testfunc Âa
+fi
+
+
+# vi: binary

Modified: trunk/fsvs/src/update.c
Url: http://fsvs.tigris.org/source/browse/fsvs/trunk/fsvs/src/update.c?view=diff&rev=282&p1=trunk/fsvs/src/update.c&p2=trunk/fsvs/src/update.c&r1=281&r2=282
==============================================================================
--- trunk/fsvs/src/update.c (original)
+++ trunk/fsvs/src/update.c 2006-05-08 00:13:59-0700
@@ -14,7 +14,6 @@
  * we fetch the new values from the repository.
  *
  * */
-/* TODO: convert from UTF-8 before writing */
 #include <apr-0/apr_md5.h>
 #include <apr-0/apr_pools.h>
 #include <apr-0/apr_user.h>
@@ -33,6 +32,7 @@
 
 
 #include "global.h"
+#include "helper.h"
 #include "status.h"
 #include "checksum.h"
 #include "warnings.h"
@@ -51,38 +51,45 @@
 
 
 int ac__up_parse_prop(struct estat *sts,
- const char *name,
- const svn_string_t *value)
+ const char *utf8_name,
+ const svn_string_t *utf8_value)
 {
- char *cp;
+ char *cp, *loc_name, *loc_value;
     int i,status;
     apr_uid_t uid;
     apr_gid_t gid;
     apr_time_t at;
     svn_error_t *status_svn;
 
+ /* We get the name and value in UTF8.
+ * For the currently used properties it makes no difference;
+ * but see doc/develop/UTF8. */
+ /* We need the localized name only for debug and error messages;
+ * we still compare the utf8-name, and work with the utf8-data. */
+ STOPIF( hlp__utf82local(utf8_name, &loc_name), NULL);
+ STOPIF( hlp__utf82local(utf8_value->data, &loc_value), NULL);
 
     status=0;
- if (!value)
+ if (!utf8_value)
     {
         DEBUGP("got NULL property for %s: %s",
- sts->name, name);
+ sts->name, loc_name);
         goto ex;
     }
 
     DEBUGP("got property for %s: %s=%s",
- sts->name, name, value->data);
+ sts->name, loc_name, loc_value);
 
- /* if an invalid value is detected, we'd better ignore it.
+ /* if an invalid utf8_value is detected, we'd better ignore it.
      * who knows which pandora's box we'd open ... */
- if (0 == strcmp(name, propname_owner))
+ if (0 == strcmp(utf8_name, propname_owner))
     {
         /* for user and group we try to find the username, and fallback
          * to the uid. */
- i=strtoul(value->data, &cp, 0);
- if (cp == value->data)
+ i=strtoul(utf8_value->data, &cp, 0);
+ if (cp == utf8_value->data)
             STOPIF( wa__warn(WRN__META_USER_INVALID, EINVAL,
- "cannot read uid in %s", value->data),
+ "cannot read uid in %s", loc_value),
                     NULL);
         else
         {
@@ -99,15 +106,15 @@
             sts->st.st_uid = i;
             sts->entry_status |= FS_META_OWNER;
             DEBUGP("marking owner %s to %d",
- value->data, sts->st.st_uid);
+ loc_value, sts->st.st_uid);
         }
     }
- else if (0 == strcmp(name, propname_group))
+ else if (0 == strcmp(utf8_name, propname_group))
     {
- i=strtoul(value->data, &cp, 0);
- if (cp == value->data)
+ i=strtoul(utf8_value->data, &cp, 0);
+ if (cp == utf8_value->data)
             STOPIF( wa__warn(WRN__META_USER_INVALID, EINVAL,
- "cannot read gid in %s", value->data),
+ "cannot read gid in %s", loc_value),
                     NULL);
         else
         {
@@ -124,15 +131,15 @@
             sts->st.st_gid = i;
             sts->entry_status |= FS_META_GROUP;
             DEBUGP("marking group %s to %d",
- value->data, sts->st.st_gid);
+ loc_value, sts->st.st_gid);
         }
     }
- else if (0 == strcmp(name, propname_mtime))
+ else if (0 == strcmp(utf8_name, propname_mtime))
     {
- status_svn=svn_time_from_cstring(&at, value->data, pool);
+ status_svn=svn_time_from_cstring(&at, utf8_value->data, pool);
         if (status_svn)
             STOPIF( wa__warn(WRN__META_MTIME_INVALID, EINVAL,
- "modification time string invalid: %s", value->data),
+ "modification time string invalid: %s", loc_value),
                     NULL);
         else
         {
@@ -140,29 +147,30 @@
             sts->st.st_mtim.tv_nsec=apr_time_usec(at) * 1000;
             sts->entry_status |= FS_META_MTIME;
             DEBUGP("marking mtime \"%s\" to %24.24s",
- value->data,
+ loc_value,
                     ctime(& (sts->st.st_mtim.tv_sec) ));
         }
     }
- else if (0 == strcmp(name, propname_umode))
+ else if (0 == strcmp(utf8_name, propname_umode))
     {
- i=strtoul(value->data, &cp, 0);
+ i=strtoul(utf8_value->data, &cp, 0);
         if (*cp || i>07777)
             STOPIF( wa__warn(WRN__META_UMASK_INVALID, EINVAL,
- "no valid permissions found in %s", value->data),
+ "no valid permissions found in %s", loc_value),
                     NULL);
         else
         {
             sts->st.st_mode = (sts->st.st_mode & ~07777) | i;
             sts->entry_status |= FS_META_UMODE;
             DEBUGP("marking mode \"%s\" to 0%o",
- value->data, sts->st.st_mode & 07777);
+ loc_value, sts->st.st_mode & 07777);
         }
     }
- else if (0 == strcmp(name, propname_special) &&
- 0 == strcmp(value->data, propval_special))
+ else if (0 == strcmp(utf8_name, propname_special) &&
+ 0 == strcmp(utf8_value->data, propval_special))
     {
         sts->entry_type = FT_ANYSPECIAL;
+ DEBUGP("this is a special node");
     }
     else
     {
@@ -170,10 +178,10 @@
          * for an update we store them?? */
         /* ignore svn:entry:* properties */
         /* check for is_import_export */
- if (strncmp(name, "svn:entry", 9) != 0)
+ if (strncmp(utf8_name, "svn:entry", 9) != 0)
             DEBUGP("unhandled property in %s for %s: %s=%s", __PRETTY_FUNCTION__,
                     sts->name,
- name, value->data);
+ loc_name, loc_value);
         goto ex;
     }
 
@@ -187,11 +195,11 @@
      * values, so that an emergency can be 99% handled */
     /* print path in case of errors? */
     status=ops__build_path(&cp, sts);
- STOPIF(EINVAL, "incorrect value for property %s ignored: "
+ STOPIF(EINVAL, "incorrect utf8_value for property %s ignored: "
             "file %s: \"%s\"",
             name,
             status == 0 ? cp : sts->name,
- value->data);
+ utf8_value->data);
 #endif
 }
 
@@ -401,6 +409,7 @@
 
     sts->stringbuf_tgt->data[ sts->stringbuf_tgt->len ]=0;
     STOPIF( ops__string_to_dev(sts, sts->stringbuf_tgt->data, &cp), NULL);
+ STOPIF( hlp__utf82local(cp, &cp), NULL);
 
     sts->stringbuf_tgt=NULL;
     sts->stringbuf_src=NULL;
@@ -463,7 +472,7 @@
 }
 
 
-svn_error_t *ac__up_delete_entry(const char *path,
+svn_error_t *ac__up_delete_entry(const char *utf8_path,
         svn_revnum_t revision UNUSED,
         void *parent_baton,
         apr_pool_t *pool)
@@ -471,7 +480,9 @@
     int status, change;
     struct estat *dir=parent_baton;
     struct estat *sts;
+ char* path;
 
+ STOPIF( hlp__utf82local(utf8_path, &path), NULL );
 
     DEBUGP("deleting entry %s", path);
     STOPIF( ops__find_entry_byname(dir, path, &sts, 0), NULL);
@@ -507,16 +518,21 @@
 }
 
 
-svn_error_t *ac__up_add_directory(const char *path,
+svn_error_t *ac__up_add_directory(const char *utf8_path,
         void *parent_baton,
- const char *copy_path,
+ const char *utf8_copy_path,
         svn_revnum_t copy_rev,
- apr_pool_t *dir_pool UNUSED,
+ apr_pool_t *dir_pool,
         void **child_baton)
 {
     struct estat *dir=parent_baton;
     struct estat *sts;
     int status;
+ char* path;
+ char* copy_path;
+
+ STOPIF( hlp__utf82local(utf8_path, &path), NULL );
+ STOPIF( hlp__utf82local(utf8_copy_path, &copy_path), NULL );
 
     STOPIF( ac__up_add_entry(dir, path, copy_path, copy_rev, &sts), NULL );
 
@@ -540,16 +556,18 @@
 
 
 
-svn_error_t *ac__up_open_directory(const char *path,
+svn_error_t *ac__up_open_directory(const char *utf8_path,
         void *parent_baton,
         svn_revnum_t base_revision UNUSED,
- apr_pool_t *dir_pool UNUSED,
+ apr_pool_t *dir_pool,
         void **child_baton)
 {
     struct estat *dir=parent_baton;
     struct estat *sts;
     int status;
+ char* path;
 
+ STOPIF( hlp__utf82local(utf8_path, &path), NULL );
 
     status=0;
     STOPIF( ops__find_entry_byname(dir, path, &sts, 0),
@@ -565,15 +583,14 @@
 
 
 svn_error_t *ac__up_change_dir_prop(void *dir_baton,
- const char *name,
+ const char *utf8_name,
         const svn_string_t *value,
- apr_pool_t *pool UNUSED)
+ apr_pool_t *pool)
 {
     struct estat *sts=dir_baton;
     int status;
 
- STOPIF( ac__up_parse_prop(sts, name, value),
- "parsing property %s=%s failed", name, value->data);
+ STOPIF( ac__up_parse_prop(sts, utf8_name, value), NULL);
 
 ex:
     RETURN_SVNERR(status);
@@ -599,7 +616,7 @@
 }
 
 
-svn_error_t *ac__up_absent_directory(const char *path,
+svn_error_t *ac__up_absent_directory(const char *utf8_path,
         void *parent_baton,
         apr_pool_t *pool)
 {
@@ -611,9 +628,9 @@
 }
 
 
-svn_error_t *ac__up_add_file(const char *path,
+svn_error_t *ac__up_add_file(const char *utf8_path,
         void *parent_baton,
- const char *copy_path,
+ const char *utf8_copy_path,
         svn_revnum_t copy_rev,
         apr_pool_t *file_pool,
         void **file_baton)
@@ -621,6 +638,11 @@
     struct estat *dir=parent_baton;
     struct estat *sts;
     int status;
+    char* path;
+    char* copy_path;
+
+    STOPIF( hlp__utf82local(utf8_path, &path), NULL );
+    STOPIF( hlp__utf82local(utf8_copy_path, &copy_path), NULL );
 
     STOPIF( ac__up_add_entry(dir, path, copy_path, copy_rev, &sts),
             NULL);
@@ -633,7 +655,7 @@
 }
 
 
-svn_error_t *ac__up_open_file(const char *path,
+svn_error_t *ac__up_open_file(const char *utf8_path,
         void *parent_baton,
         svn_revnum_t base_revision,
         apr_pool_t *file_pool,
@@ -642,7 +664,9 @@
     int status;
     struct estat *dir UNUSED=parent_baton;
     struct estat *sts;
+    char* path;
 
+    STOPIF( hlp__utf82local(utf8_path, &path), NULL );
 
     STOPIF( ops__find_entry_byname(dir, path, &sts, 0), NULL);
 
@@ -665,6 +689,7 @@
     svn_stream_t *svn_s_src, *svn_s_tgt;
     int status;
     char *cp;
+    char* fn_utf8;
 
 
     STOPIF( ops__build_path(&filename, sts), NULL);
@@ -699,8 +724,11 @@
     {
         /* special entries are taken into a svn_stringbuf_t */
         if (S_ISLNK(sts->st.st_mode))
+        {
             STOPIF( ops__link_to_string(sts, filename, &cp),
                     NULL);
+            STOPIF( hlp__local2utf8(cp, &cp), NULL);
+        }
         else
             cp=ops__dev_to_filedata(sts);
 
@@ -738,8 +766,9 @@
                     NULL);
     }
 
+    STOPIF( hlp__local2utf8(filename, &fn_utf8), NULL );
     svn_txdelta_apply(svn_s_src, svn_s_tgt,
-            sts->md5_is, filename, pool,
+            sts->md5_is, fn_utf8, pool,
             handler, handler_baton);
 
 ex:
@@ -748,14 +777,14 @@
 
 
 svn_error_t *ac__up_change_file_prop(void *file_baton,
-        const char *name,
+        const char *utf8_name,
         const svn_string_t *value,
         apr_pool_t *pool)
 {
     struct estat *sts=file_baton;
     int status;
 
-    STOPIF( ac__up_parse_prop(sts, name, value), NULL);
+    STOPIF( ac__up_parse_prop(sts, utf8_name, value), NULL);
 
 ex:
     RETURN_SVNERR(status);
@@ -824,7 +853,7 @@
 }
 
 
-svn_error_t *ac__up_absent_file(const char *path,
+svn_error_t *ac__up_absent_file(const char *utf8_path,
         void *parent_baton,
         apr_pool_t *pool)
 {
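
The update callbacks above now receive their path arguments from Subversion in UTF-8 and convert them with `hlp__utf82local()` before touching the filesystem; the commit side goes the other way with `hlp__local2utf8()` (symlink targets, the filename passed to `svn_txdelta_apply()`). The bodies of those helpers are not part of this excerpt, so what follows is only a sketch, assuming an iconv(3)-based conversion (the new test is named `017_locale_iconv`, but that is all this diff shows). `convert_charset()` is a hypothetical name, not an FSVS function: it returns a malloc()ed string and reports undecodable input as `EILSEQ`, the kind of case the new `WRN__CHARSET_INVALID` warning presumably covers.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not FSVS's API): convert the NUL-terminated string
 * `in` from charset `from` to charset `to`.
 * On success, stores a malloc()ed, NUL-terminated result in *out and
 * returns 0. On failure returns -1 with errno set, e.g. EILSEQ for an
 * invalid or unmappable byte sequence, EINVAL for an unsupported
 * conversion or truncated input, ENOMEM if allocation fails. */
int convert_charset(const char *from, const char *to,
                    const char *in, char **out)
{
    iconv_t cd = iconv_open(to, from);   /* note the (tocode, fromcode) order */
    if (cd == (iconv_t)-1)
        return -1;

    size_t in_left = strlen(in);
    size_t cap = in_left * 4 + 4;        /* first guess; grown on E2BIG */
    char *buf = malloc(cap);
    char *src = (char *)in;              /* iconv() takes a non-const char ** */
    char *dst = buf;
    size_t out_left = cap - 1;           /* keep one byte for the final NUL */
    int saved_errno;

    if (!buf) {
        errno = ENOMEM;
        goto fail;
    }

    while (in_left > 0) {
        if (iconv(cd, &src, &in_left, &dst, &out_left) != (size_t)-1)
            break;                       /* all input consumed */
        if (errno != E2BIG)
            goto fail;                   /* EILSEQ / EINVAL: give up */

        /* Output buffer full: double it and resume where iconv() stopped. */
        size_t used = (size_t)(dst - buf);
        char *nbuf = realloc(buf, cap * 2);
        if (!nbuf) {
            errno = ENOMEM;
            goto fail;
        }
        buf = nbuf;
        cap *= 2;
        dst = buf + used;
        out_left = cap - 1 - used;
    }

    /* Emit any closing shift sequence; stateless charsets write nothing. */
    if (iconv(cd, NULL, NULL, &dst, &out_left) == (size_t)-1)
        goto fail;

    *dst = '\0';
    iconv_close(cd);
    *out = buf;
    return 0;

fail:
    saved_errno = errno;
    free(buf);                           /* free(NULL) is a no-op */
    iconv_close(cd);
    errno = saved_errno;
    return -1;
}
```

For the local-to-UTF-8 direction one would typically call `setlocale(LC_ALL, "")` once at startup and pass `nl_langinfo(CODESET)` as `from`. Growing the buffer on `E2BIG` rather than assuming a fixed ratio keeps the sketch correct for target encodings whose output is longer than the input.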

Modified: trunk/fsvs/src/update.h
Url: http://fsvs.tigris.org/source/browse/fsvs/trunk/fsvs/src/update.h?view=diff&rev=282&p1=trunk/fsvs/src/update.h&p2=trunk/fsvs/src/update.h&r1=281&r2=282
==============================================================================
--- trunk/fsvs/src/update.h (original)
+++ trunk/fsvs/src/update.h 2006-05-08 00:13:59-0700
@@ -18,8 +18,8 @@
 inline const char *ac___up_get_filename(const char *path);
 
 int ac__up_parse_prop(struct estat *sts,
-        const char *name,
-        const svn_string_t *value);
+        const char *utf8_name,
+        const svn_string_t *utf8_value);
 
 int ac__up_set_meta_data(struct estat *sts,
         const char *filename);

Modified: trunk/fsvs/src/warnings.c
Url: http://fsvs.tigris.org/source/browse/fsvs/trunk/fsvs/src/warnings.c?view=diff&rev=282&p1=trunk/fsvs/src/warnings.c&p2=trunk/fsvs/src/warnings.c&r1=281&r2=282
==============================================================================
--- trunk/fsvs/src/warnings.c (original)
+++ trunk/fsvs/src/warnings.c 2006-05-08 00:13:59-0700
@@ -32,6 +32,8 @@
   [WRN__META_UMASK_INVALID] = { "meta-umask" },
 
     [WRN__ENTRY_NOT_FOUND] = { "entry-not-found" },
+
+    [WRN__CHARSET_INVALID] = { "charset-invalid" },
 };
 
 

Modified: trunk/fsvs/src/warnings.h
Url: http://fsvs.tigris.org/source/browse/fsvs/trunk/fsvs/src/warnings.h?view=diff&rev=282&p1=trunk/fsvs/src/warnings.h&p2=trunk/fsvs/src/warnings.h&r1=281&r2=282
==============================================================================
--- trunk/fsvs/src/warnings.h (original)
+++ trunk/fsvs/src/warnings.h 2006-05-08 00:13:59-0700
@@ -35,6 +35,8 @@
     WRN__META_UMASK_INVALID,
 
     WRN__ENTRY_NOT_FOUND,
+
+    WRN__CHARSET_INVALID,
     // Keep this at end!
     _WRN__LAST_INDEX
 } warning_e;
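
The two warning hunks follow a common C pattern: an enum whose last member (`_WRN__LAST_INDEX`) counts the entries, plus a designated-initializer array indexed by that enum, so `WRN__CHARSET_INVALID` is tied to `"charset-invalid"` no matter where it sits in the list. Below is a simplified, hypothetical mirror of that pattern; the array `warning_names` and the lookup `wrn_index_by_name()` are invented for illustration, and the real struct type used in `warnings.c` is not shown in this diff.

```c
#include <stddef.h>
#include <string.h>

/* Simplified mirror of warnings.h: the final member counts the entries. */
typedef enum {
    WRN__META_UMASK_INVALID,
    WRN__ENTRY_NOT_FOUND,
    WRN__CHARSET_INVALID,
    /* Keep this at end! */
    _WRN__LAST_INDEX
} warning_e;

/* Hypothetical table: designated initializers bind each name to its enum
 * value, so reordering the enum cannot mismatch names and indices. */
static const char *const warning_names[_WRN__LAST_INDEX] = {
    [WRN__META_UMASK_INVALID] = "meta-umask",
    [WRN__ENTRY_NOT_FOUND]    = "entry-not-found",
    [WRN__CHARSET_INVALID]    = "charset-invalid",
};

/* Hypothetical lookup from a user-supplied warning name to its index.
 * Returns -1 if the name is unknown. Slots that were never initialized
 * are NULL and are skipped. */
int wrn_index_by_name(const char *name)
{
    for (int i = 0; i < _WRN__LAST_INDEX; i++)
        if (warning_names[i] && strcmp(warning_names[i], name) == 0)
            return i;
    return -1;
}
```

Because the array is sized by `_WRN__LAST_INDEX`, adding an enum member without a matching initializer leaves a NULL slot instead of shifting every later name, which is why the lookup checks for NULL.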
