Skip to content

skip walking irrelevant directories when matching globs#31

Open
robinfriedli wants to merge 2 commits intoGilnaa:masterfrom
ds1-fs21-gruppe19:master
Open

skip walking irrelevant directories when matching globs#31
robinfriedli wants to merge 2 commits intoGilnaa:masterfrom
ds1-fs21-gruppe19:master

Conversation

@robinfriedli
Copy link
Copy Markdown

  • for example, when given the glob "src/resources/templates/*.html", it
    is unnecessary to walk through any directories in the working
    directory other than src/
    • to skip irrelevant directories the iterator now checks whether the
      match result is Match::None when matching a directory and decides
      to skip the directory if it doesn't match any parents of any globs
    • a second Override stores all globs for all parent paths for all
      globs provided by the user
    • refs issue globwalk looks at all files even when it doesn't need to #29

 - for example, when given the glob "src/resources/templates/*.html", it
   is unnecessary to walk through any directories in the working
   directory other than src/
   - to skip irrelevant directories the iterator now checks whether the
     match result is Match::None when matching a directory and decides
     to skip the directory if it doesn't match any parents of any globs
   - a second Override stores all globs for all parent paths for all
     globs provided by the user
@Gilnaa
Copy link
Copy Markdown
Owner

Gilnaa commented May 13, 2021

Overall looks good.
Note that thos does break patterns that have a slash inside an or-group:

a/{b/c,b/g}/*.gif

Not 100% sure what's the right solution, probably to normalize it into several paths (i.e. one path for each group member).

Not sure how multiple or nested groups act in ignore.

Another option is that if you detect a non trivial case, replace it with ** (it's fine since it's an optimization, we can also make it better later)

@Gilnaa
Copy link
Copy Markdown
Owner

Gilnaa commented May 13, 2021

FWIW that's the test case:

#[test]
    fn test_florp() {
        let dir = TempDir::new("globset_walkdir").expect("Failed to create temporary folder");
        let dir_path = dir.path();
        create_dir_all(dir_path.join("a/b/c")).expect("");
        create_dir_all(dir_path.join("a/b/g")).expect("");
        create_dir_all(dir_path.join("a/e/f")).expect("");

        touch(
            &dir,
            &[
                "a[/]b[/]c[/]lib.rs",
                "a[/]b[/]g[/]lib.rs",
                "a[/]e[/]f[/]lib.rs",
            ][..],
        );

        let expected: Vec<_> = [
            "a[/]b[/]c[/]lib.rs",
            "a[/]b[/]g[/]lib.rs",
            "a[/]e[/]f[/]lib.rs",
        ]
        .iter()
        .map(normalize_path_sep)
        .collect();

        let patterns = [
            "a/{b/c,b/g}/*rs",
            "a/e/**/*rs",
        ];
        let glob = GlobWalkerBuilder::from_patterns(dir_path, &patterns)
            .build()
            .unwrap();

        equate_to_expected(glob, expected, dir_path);
    }

@robinfriedli
Copy link
Copy Markdown
Author

Yeah any iterator provided by std::fs is probably not gonna to cut it, at least not without some hacky business. We might want to parse the glob ourselves and ignore path separators that appear in any kind of glob expression ("[]" as well maybe though I'm not sure where path separators are even allowed to appear, I don't presume escaping path separators is valid for instance and slashes probably don't get matched in [] anyway). Might be sufficient to just leave components containing glob expressions as is and ignoring path separators until the expression ends. Windows file paths always give me a headache though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants