Re: ignore globbing patterns are not anchored

Author Gunter Ohrner <G dot Ohrner at post dot rwth-aachen dot de>
Full name Gunter Ohrner <G dot Ohrner at post dot rwth-aachen dot de>
Date 2006-10-27 12:41:41 PDT
Message On Friday, 27 October 2006 14:13, Ph. Marek wrote:
> > is that it's still not possible to write a single pattern which matches
> > the file "tmp" in the top-level directory and any subdirectory...

> Currently we substitute "**" with ".*".

> That's a special-case -- because /**/ is a special case of possibly
> (near-) 0-byte length ...
> Is that correct?

Sounds plausible and should work.

Another idea would be to introduce another prefix for globbing patterns...

I'm not sure which approach would be best, but I think I like
the '**/' -> '(.*/)?' idea better.

Greetings,

  Gunter

--
+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+
- "If you're going to suggest I try dropping twenty feet down a pitch
dark tower in the hope of hitting a couple of greasy little steps which
might not even still be there, you can forget it," said Rincewind
sharply.
- "There is an alternative, then."
- "Out with it, man."
- "You could drop five hundred feet down a pitch black tower and hit
stones which certainly are there," said Twoflower.
Dead silence from below him. Then Rincewind said, accusingly, "That was
sarcasm." -- (Terry Pratchett, The Light Fantastic)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ PGP-encrypted mail preferred! +
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Re: ignore globbing patterns are not anchored

Author Gunter Ohrner <G dot Ohrner at post dot rwth-aachen dot de>
Full name Gunter Ohrner <G dot Ohrner at post dot rwth-aachen dot de>
Date 2006-10-27 12:38:18 PDT
Message On Friday, 27 October 2006 13:46, Ph. Marek wrote:
> > dir_ignore
> I believe that dir_ignore is just the name of the patch?

Yes. I'm not saying that it's a particularly clever or fitting one... ;)

> > My first try was to simply anchor all patterns except patterns ending
> > in '/', but that caused all directories I wanted to ignore to be
> > included. (However, without their contents.) It would have been
> > necessary to explicitly exclude the directory as well, so I changed
> > the behaviour to the one explained above.
> >
> > This feature has one drawback: ./**/tmp/ will also ignore all FILES
> > which are exactly called "tmp", not only the dirs. :-/

> That's not so nice.

Right.

> How about changing the path generation to include a "/" at the end for
> directories? Then this would work, too -- and the pattern would not
> have to be "(/|$)".

That was my first "solution", as stated above. It was the simplistic
solution I originally had in mind.

However, in this case './**/tmp/' did not match the empty directory
itself, so to exclude the directory itself a second pattern './**/tmp'
had to be added - and this would also ignore all files named "tmp", not
only all directories, so it would be no improvement over my patch's
current behaviour, in my eyes. The problem here is that the glob patterns
are purely name-based and do not regard the file's type. (Additionally,
pattern matching behaviour depending on the type of files would also not
be what the user expects and thus would not be exactly "intuitive"...)

To ignore a directory's contents, but not the directory itself, you can
write './**/tmp/**' instead of './**/tmp'. Ignoring only directories with
a specific name, but not files with exactly the same name, is currently
not possible, neither with either version of the patch nor with the
current implementation.

> > You can now write stuff like
> > ./**/\[is[_.-]this[_.-]an_intereres*ting\*filename\?[]!]?
> > and it should work as expected.
> Note that I didn't know what to expect from that for some time ;-)

Hey, that's why fsvs does it for you. ;)

Ok, it's a rather extreme example... ;)

> I think that they are a big step forward. I'll give them a try ASAP.
> Thank you for this work!

No problem, I'm happy if I can help to improve fsvs.

Greetings,

  Gunter

--
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> On the subject of virgins ("Jungfrauen"), I recommend rwth.informatik.* !
I think by "Jungfrauen" he didn't mean women who look like boys
;-)
        -- <news:c7j6gq$6rr$04$1@news.t-online.com>
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ PGP-encrypted mail preferred! +
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Re: ignore globbing patterns are not anchored

Author pmarek
Full name P.Marek
Date 2006-10-27 05:13:19 PDT
Message Sorry, I forgot ...

> PS: The one major headache still left in regard of globbing patterns is
> that it's still not possible to write a single pattern which matches the
> file "tmp" in the top-level directory and any subdirectory...
> './**/tmp' won't match './tmp' while './**tmp' will match much more...
> (Any file ending with "tmp".) However, as fsvs relies on "./" as the
> start of a pattern, I had no good idea of how to fix it...

I believe that's something that has to be changed, too ...
Currently we substitute "**" with ".*".
If we checked for "/**/" and replaced that with ".*", it wouldn't work ...
"a/**/b" would match "ab", too.

So we'd have to special-case "**/" to ".*"? Hmmm, that doesn't work either:
"a/**/b" would match "a/XXXb", too.
So we'd need "/**/" => "/(.*/)?" ...
That's a special-case -- because /**/ is a special case of possibly
(near-) 0-byte length ...
Is that correct?
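
The three candidate substitutions can be checked side by side. What follows
is only a sketch in Python, whose re module stands in for PCRE here; the
regexes are written out by hand, and the '^' and '$' anchors as well as the
sample paths are assumptions made for illustration, not taken from the fsvs
sources:

```python
import re

# Candidate regex translations of the glob "a/**/b", one per substitution rule.
candidates = {
    '"/**/" => ".*"': r"^a.*b$",
    '"**/" => ".*"': r"^a/.*b$",
    '"/**/" => "/(.*/)?"': r"^a/(.*/)?b$",
}
paths = ["ab", "a/XXXb", "a/b", "a/x/b", "a/x/y/b"]

# For every rule, collect the sample paths the resulting regex accepts.
results = {
    rule: [p for p in paths if re.match(rx, p)]
    for rule, rx in candidates.items()
}

# The same idea applied to "./**/tmp": the optional group lets the pattern
# match "tmp" in the top-level directory as well as in any subdirectory.
tmp_rx = r"^\./(.*/)?tmp$"
tmp_paths = ["./tmp", "./a/tmp", "./a/b/tmp", "./xtmp", "./a/xtmp"]
tmp_hits = [p for p in tmp_paths if re.match(tmp_rx, p)]
```

Only the last rule rejects both "ab" and "a/XXXb" while still accepting
"a/b", because the optional group either consumes whole "dir/" components
or nothing at all.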


Regards,

Phil

Re: ignore globbing patterns are not anchored

Author pmarek
Full name P.Marek
Date 2006-10-27 04:46:45 PDT
Message > On Sunday, 15 October 2006 08:24, Philipp Marek wrote:
>> Do you have a patch that does all this, maybe :-?
> Again it took significantly longer than I had anticipated (partly caused
> by the fact that I forgot to take my notebook power supply with me last
> weekend and could not work while being on the train...), but here it is.
Having no power tends to be disturbing, yes :-)

> I split my changes into two small patches actually, as I added two
> features which are pretty much unrelated.
>
> Consider both patches to be a request for comments. :-)
Ok, you wanted that ...

> dir_ignore
> **********
> changes the matching behaviour of fsvs glob-like filename patterns. With
> dir_ignore,
I believe that dir_ignore is just the name of the patch?

> a glob-like pattern matches the full directory-/filename
> instead of just a prefix as it currently does.
> An exception are patterns which end with a slash, which will match the
> exact full directory-/filename without the slash as well as everything
> the pattern is a prefix of. This is used to exclude directories and their
> contents.
>
> Examples:
>
> ./**/tmp
> will match all files in any subdirectory which are exactly called "tmp".
> ./**/tmp**
> mimics the above pattern's current semantics: match any file or
> directory whose name starts with "tmp".
> ./**/tmp/
> will match all files in all directories which are called "tmp" and the
> directory itself.
> ./**/tmp/**
> will match all files in all directories which are called "tmp" but NOT
> the directory itself, the empty directory "tmp" won't be ignored but
> will be included in the directory
Ok, that's fine.

> This patch works by anchoring all globbing patterns at the end of the
> line, except if they end with a slash. In this case, the PCRE is closed
> with '($|/)' which causes an exact match of the directory name to be
> ignored and everything below the directory as well.
>
> My first try was to simply anchor all patterns except patterns ending
> in '/', but that caused all directories I wanted to ignore to be
> included. (However, without their contents.) It would have been necessary
> to explicitly exclude the directory as well, so I changed the behaviour
> to the one explained above.
>
> This feature has one drawback: ./**/tmp/ will also ignore all FILES which
> are exactly called "tmp", not only the dirs. :-/
That's not so nice.
How about changing the path generation to include a "/" at the end for
directories? Then this would work, too -- and the pattern would not have
to be "(/|$)".

> However, I consider the
> overall matching behaviour with this patch to be a huge improvement over
> the current situation.
That's right.


> escape_mode
> ***********
> adds support for escaping characters with a backslash '\' and for bracket
> expressions (character classes). This implementation requires the RE to
> be interpreted as a PCRE, it's not correct if the resulting RE is
> interpreted as a POSIX RE.
>
> You can now write stuff like
>
>
> ./**/\[is[_.-]this[_.-]an_intereres*ting\*filename\?[]!]?
>
> and it should work as expected.
Note that I didn't know what to expect from that for some time ;-)

> I implemented this because, although any pattern
> can of course be written directly as a PCRE, a globbing pattern is
> simpler to read if you e.g. just want to use straight character classes.
> Additionally, many more people know how to use globbing patterns than
> PCREs. While the basics of PCREs are also simple and straightforward,
> most people do not seem to know that and appear to be frightened by them.
That's correct.

> I'd love to hear your opinion about and your experiences with these small
> patches! :-)
I think that they are a big step forward. I'll give them a try ASAP.
Thank you for this work!


Regards,

Phil

Re: ignore globbing patterns are not anchored

Author Gunter Ohrner <G dot Ohrner at post dot rwth-aachen dot de>
Full name Gunter Ohrner <G dot Ohrner at post dot rwth-aachen dot de>
Date 2006-10-26 09:56:29 PDT
Message On Sunday, 15 October 2006 08:24, Philipp Marek wrote:
> Do you have a patch that does all this, maybe :-?

Again it took significantly longer than I had anticipated (partly caused
by the fact that I forgot to take my notebook power supply with me last
weekend and could not work while being on the train...), but here it is.

I split my changes into two small patches actually, as I added two
features which are pretty much unrelated.

Consider both patches to be a request for comments. :-)

dir_ignore
**********
changes the matching behaviour of fsvs glob-like filename patterns. With
dir_ignore, a glob-like pattern matches the full directory-/filename
instead of just a prefix as it currently does.
An exception are patterns which end with a slash, which will match the
exact full directory-/filename without the slash as well as everything
the pattern is a prefix of. This is used to exclude directories and their
contents.

Examples:

./**/tmp
  will match all files in any subdirectory which are exactly called "tmp".
./**/tmp**
  mimics the above pattern's current semantics: match any file or
  directory whose name starts with "tmp".
./**/tmp/
  will match all files in all directories which are called "tmp" and the
  directory itself.
./**/tmp/**
  will match all files in all directories which are called "tmp" but NOT
  the directory itself, the empty directory "tmp" won't be ignored but
  will be included in the directory

This patch works by anchoring all globbing patterns at the end of the
line, except if they end with a slash. In this case, the PCRE is closed
with '($|/)' which causes an exact match of the directory name to be
ignored and everything below the directory as well.

My first try was to simply anchor all patterns except patterns ending
in '/', but that caused all directories I wanted to ignore to be
included. (However, without their contents.) It would have been necessary
to explicitly exclude the directory as well, so I changed the behaviour
to the one explained above.

This feature has one drawback: ./**/tmp/ will also ignore all FILES which
are exactly called "tmp", not only the dirs. :-/ However, I consider the
overall matching behaviour with this patch to be a huge improvement over
the current situation.
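
The closing rules described above can be sketched as follows. This is a
Python illustration only, not the patch itself: translate_body, its handling
of '*' and '?', the leading '^' and the sample paths are assumptions, and
Python's re module stands in for PCRE. Paths are assumed to carry no
trailing slash, so "./a/tmp" may be either a file or a directory.

```python
import re

def translate_body(glob):
    """Simplified glob translation; only '**', '*' and '?' are special."""
    special = {"**": ".*", "*": "[^/]*", "?": "[^/]"}
    parts = re.split(r"(\*\*|\*|\?)", glob)
    return "".join(special.get(p, re.escape(p)) for p in parts)

def dir_ignore_regex(glob):
    """Closing rule as described for dir_ignore."""
    if glob.endswith("/"):
        # Trailing slash: the exact name, or anything below it.
        return "^" + translate_body(glob[:-1]) + "($|/)"
    return "^" + translate_body(glob) + "$"

def first_try_regex(glob):
    """The earlier attempt: patterns ending in '/' stay unanchored."""
    if glob.endswith("/"):
        return "^" + translate_body(glob)
    return "^" + translate_body(glob) + "$"

def ignored(regex, paths):
    rx = re.compile(regex)
    return [p for p in paths if rx.match(p)]

paths = ["./a/tmp", "./a/tmp/file", "./a/tmpfile", "./tmp"]

final = {
    g: ignored(dir_ignore_regex(g), paths)
    for g in ("./**/tmp", "./**/tmp**", "./**/tmp/", "./**/tmp/**")
}
first = ignored(first_try_regex("./**/tmp/"), paths)
```

With the first attempt, "./**/tmp/" only catches "./a/tmp/file", so the
directory entry itself stays included. With the '($|/)' closing it also
catches "./a/tmp", whether that entry is a directory or a plain file, which
is exactly the drawback mentioned above.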


escape_mode
***********
adds support for escaping characters with a backslash '\' and for bracket
expressions (character classes). This implementation requires the RE to
be interpreted as a PCRE, it's not correct if the resulting RE is
interpreted as a POSIX RE.

You can now write stuff like


  ./**/\[is[_.-]this[_.-]an_intereres*ting\*filename\?[]!]?

and it should work as expected. I implemented this because, although any
pattern can of course be written directly as a PCRE, a globbing pattern is
simpler to read if you e.g. just want to use straight character classes.
Additionally, many more people know how to use globbing patterns than
PCREs. While the basics of PCREs are also simple and straightforward,
most people do not seem to know that and appear to be frightened by them.
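
How such a translation with backslash escapes and bracket expressions might
look is sketched below. This is a Python illustration under assumptions,
not the escape_mode patch: the function name glob_to_regex, the rules for
'*' and '?', the treatment of a backslash inside brackets as a literal
member, the '^'/'$' anchors and the sample paths are all made up for the
example, and Python's re module stands in for PCRE.

```python
import re

def glob_to_regex(glob):
    """Translate a glob with backslash escapes and bracket expressions."""
    out = ["^"]
    i, n = 0, len(glob)
    while i < n:
        c = glob[i]
        if c == "\\" and i + 1 < n:
            out.append(re.escape(glob[i + 1]))  # escaped char is literal
            i += 2
        elif glob.startswith("**", i):
            out.append(".*")                    # may cross directories
            i += 2
        elif c == "*":
            out.append("[^/]*")                 # stays in one component
            i += 1
        elif c == "?":
            out.append("[^/]")
            i += 1
        elif c == "[":
            j = i + 1
            if j < n and glob[j] in "!^":       # negation marker
                j += 1
            if j < n and glob[j] == "]":        # leading ']' is a member
                j += 1
            while j < n and glob[j] != "]":
                j += 1
            if j >= n:                          # unterminated: literal '['
                out.append(re.escape(c))
                i += 1
            else:
                members = glob[i + 1:j]
                negate = members[:1] in ("!", "^")
                if negate:
                    members = members[1:]
                escaped = "".join(
                    "\\" + m if m in "\\]^[" else m for m in members
                )
                out.append("[" + ("^" if negate else "") + escaped + "]")
                i = j + 1
        else:
            out.append(re.escape(c))
            i += 1
    out.append("$")
    return "".join(out)

pattern = r"./**/\[is[_.-]this[_.-]an_intereres*ting\*filename\?[]!]?"
regex = glob_to_regex(pattern)

def matches(path):
    return re.match(regex, path) is not None
```

For instance, matches("./some/dir/[is_this-an_intereresXYZting*filename?]x")
holds, while a name lacking the literal '[' or the literal '?' is rejected.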



I'd love to hear your opinion about and your experiences with these small
patches! :-)

Greetings,

  Gunter

PS: The one major headache still left in regard of globbing patterns is
that it's still not possible to write a single pattern which matches the
file "tmp" in the top-level directory and any subdirectory...
'./**/tmp' won't match './tmp' while './**tmp' will match much more...
(Any file ending with "tmp".) However, as fsvs relies on "./" as the
start of a pattern, I had no good idea of how to fix it...

--
+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+
The person on the other side was a young woman. Very obviously a young
woman. There was no possible way that she could have been mistaken for a
young man in any language, especially Braille. -- The goddess
with the nice earrings (Terry Pratchett, Maskerade)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ PGP-encrypted mail preferred! +
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Re: ignore globbing patterns are not anchored

Author Gunter Ohrner <G dot Ohrner at post dot rwth-aachen dot de>
Full name Gunter Ohrner <G dot Ohrner at post dot rwth-aachen dot de>
Date 2006-10-15 08:03:09 PDT
Message On Sunday, 15 October 2006 08:24, Philipp Marek wrote:
> I wholeheartedly agree with you.
> Do you have a patch that does all this, maybe :-?

I'll give it a shot...

Greetings,

  Gunter

--
+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+
The trouble is that things *never* get better, they just stay the same,
only more so. -- (Terry Pratchett, Eric)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ PGP-encrypted mail preferred! +
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Re: ignore globbing patterns are not anchored

Author pmarek
Full name P.Marek
Date 2006-10-14 23:24:03 PDT
Message On Friday 13 October 2006 21:15 Gunter Ohrner wrote:
> On Wednesday, 11 October 2006 17:19, Philipp Marek wrote:
> > > As the manual does not mention this at all and there is no provision
> > > to anchor a globbing pattern explicitly, I think this is a bug.
> >
> > I know that this makes sense in a way - but using a pattern
> > like "./directory/" ignores currently everything below this directory,
> > which makes sense, too.
> >
> > How about a flag or allowing $ at the end?
>
> Allowing a "$" at the end of the pattern would be ok, but not intuitive -
> usually globbing patterns in UNIX do not know about a "$" but are
> anchored automatically. A flag in contrast would be fine, and/or
> unanchored behaviour if the string ends with a "/" - that'd also be fine
> in my eyes, as it would not unexpectedly match substrings in filenames and
> still allow easily ignoring whole subtrees in a natural way. There's also
> still the possibility to write "./dir/**" although that looks awkward and
> isn't intuitive, either.
>
> I think I'd opt for anchoring iff the last character is not a slash.
> Excluding directories but not their contents doesn't make sense anyway.
> And if one really wants to match a substring of a filename, one can
> write "./**/*bla*" or stuff like this. If more complex patterns are
> desired, PCREs are the way to go, anyway, but globbing patterns are more
> common and more easily readable for the day-to-day filename matching
> tasks, IMHO.
>
> However, independent of the solution which will be implemented, it's
> necessary to be able to anchor the patterns (if that won't become the
> default behaviour) and to extend the IGNORING document with a sentence or
> two explaining the chosen behaviour. :-)
I wholeheartedly agree with you.

Do you have a patch that does all this, maybe :-?

--
Versioning your /etc, /home or even your whole installation?
             Try fsvs (fsvs.tigris.org)!

Re: ignore globbing patterns are not anchored

Author Gunter Ohrner <G dot Ohrner at post dot rwth-aachen dot de>
Full name Gunter Ohrner <G dot Ohrner at post dot rwth-aachen dot de>
Date 2006-10-13 12:15:41 PDT
Message On Wednesday, 11 October 2006 17:19, Philipp Marek wrote:
> > As the manual does not mention this at all and there is no provision
> > to anchor a globbing pattern explicitly, I think this is a bug.
> I know that this makes sense in a way - but using a pattern
> like "./directory/" ignores currently everything below this directory,
> which makes sense, too.

> How about a flag or allowing $ at the end?

Allowing a "$" at the end of the pattern would be ok, but not intuitive -
usually globbing patterns in UNIX do not know about a "$" but are
anchored automatically. A flag in contrast would be fine, and/or
unanchored behaviour if the string ends with a "/" - that'd also be fine
in my eyes, as it would not unexpectedly match substrings in filenames and
still allow easily ignoring whole subtrees in a natural way. There's also
still the possibility to write "./dir/**" although that looks awkward and
isn't intuitive, either.

I think I'd opt for anchoring iff the last character is not a slash.
Excluding directories but not their contents doesn't make sense anyway.
And if one really wants to match a substring of a filename, one can
write "./**/*bla*" or stuff like this. If more complex patterns are
desired, PCREs are the way to go, anyway, but globbing patterns are more
common and more easily readable for the day-to-day filename matching
tasks, IMHO.

However, independent of the solution which will be implemented, it's
necessary to be able to anchor the patterns (if that won't become the
default behaviour) and to extend the IGNORING document with a sentence or
two explaining the chosen behaviour. :-)

Greetings,

  Gunter

--
+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+
"There must be a hundred silver dollars in here," moaned Boggis, waving
a purse. "I mean, that's not my league. That's not my class. I can't
handle that sort of money. You've got to be in the Guild of Lawyers or
something to steal that much." -- (Terry Pratchett, Wyrd Sisters)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ PGP-encrypted mail preferred! +
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Re: ignore globbing patterns are not anchored

Author pmarek
Full name P.Marek
Date 2006-10-11 08:19:37 PDT
Message On Tuesday 10 October 2006 23:48 Gunter Ohrner wrote:
> The pseudo-globbing-patterns fsvs uses to ignore files are not anchored to
> the end of the string when they are compiled into PCREs.
...
> As the manual does not mention this at all and there is no provision to
> anchor a globbing pattern explicitly, I think this is a bug.
I know that this makes sense in a way - but using a pattern
like "./directory/" ignores currently everything below this directory, which
makes sense, too.

How about a flag or allowing $ at the end?


Regards,

Phil


PS: The update-problem is much harder than I expected, I'm still working on
that one.

--
Versioning your /etc, /home or even your whole installation?
             Try fsvs (fsvs.tigris.org)!

ignore globbing patterns are not anchored

Author Gunter Ohrner <G dot Ohrner at post dot rwth-aachen dot de>
Full name Gunter Ohrner <G dot Ohrner at post dot rwth-aachen dot de>
Date 2006-10-10 14:48:18 PDT
Message Hi!

The pseudo-globbing-patterns fsvs uses to ignore files are not anchored to
the end of the string when they are compiled into PCREs.

This causes a pattern like "./**.o" not only to ignore all object files,
as at least I had expected, but any file which contains the
substring ".o" somewhere in its full file or directory name, i.e. it will
exclude all OpenDocument files and everything inside the
directory "my.o.my", for example.
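
The effect can be reproduced with a small sketch. This is Python for
illustration only, not the fsvs C code: the helper names, the translation
rules for '*' and '?', the leading '^' and the sample paths are assumptions,
and Python's re module stands in for PCRE.

```python
import re

def translate_body(glob):
    """Simplified glob translation; only '**', '*' and '?' are special."""
    special = {"**": ".*", "*": "[^/]*", "?": "[^/]"}
    parts = re.split(r"(\*\*|\*|\?)", glob)
    return "".join(special.get(p, re.escape(p)) for p in parts)

def to_regex(glob, anchor_end):
    """Build the regex, optionally anchoring it at the end of the path."""
    return "^" + translate_body(glob) + ("$" if anchor_end else "")

unanchored = re.compile(to_regex("./**.o", anchor_end=False))
anchored = re.compile(to_regex("./**.o", anchor_end=True))

paths = ["./src/main.o", "./docs/report.odt", "./my.o.my/readme"]

# Without the trailing '$', any path containing ".o" somewhere is ignored.
unanchored_hits = [p for p in paths if unanchored.match(p)]
# With the trailing '$', only paths that really end in ".o" are ignored.
anchored_hits = [p for p in paths if anchored.match(p)]
```

Here unanchored_hits contains all three paths, while anchored_hits contains
only "./src/main.o".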

As the manual does not mention this at all and there is no provision to
anchor a globbing pattern explicitly, I think this is a bug.

I think something like the attached patch could fix that, but
unfortunately I could not even compile-test it, as for some reason fsvs
does not build on my system at the moment. Running Debian sid, I
probably messed something up recently... :-/

Greetings,

  Gunter


--- fsvs-1.0.13/src/ignore.c 2006-08-22 19:30:52.000000000 +0200
+++ fsvs-1.0.13_mod/src/ignore.c 2006-10-10 23:45:58.000000000 +0200
@@ -159,9 +159,10 @@
                                        "not enough space in buffer");
                } while (*src);

+               *dest++='$';
                *dest=0;
                /* return unused space */
-               buffer=realloc(buffer, dest-buffer+2);
+               buffer=realloc(buffer, dest-buffer+3);
                STOPIF_ENOMEM(!buffer);
                ignore->compare_string=buffer;
                dest=buffer;

--
+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+-+-+-+-​+-+-+-+-+-+-+
"The thing is that Mr. Dibbler can even sell sausages to people who have
bought them off him *before*." -- (Terry Pratchett, Moving
Pictures)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ PGP-encrypted mail preferred! +
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+