Optimize IncludesNormalize#2725
Conversation
|
This has been pulled out from #2706 |
| /// Make \a `path` absolute and return whether it was successful. On failure, | ||
| /// load an error message to \a `err`. | ||
| /// \pre `path` must be in the canonical form, see `CanonicalizePath`. | ||
| static bool MakePathAbsolute(std::string* path, std::string* err); |
There was a problem hiding this comment.
Please change the name to MakeAbsolutePath() to clarify that this modifies |*s| in-place. Return false on error, true on success to simplify error checking.
@digit-google review from previous PR
| static void Relativize(std::string* abs_path, | ||
| const std::vector<StringPiece>& start_list); |
There was a problem hiding this comment.
Similarly, return bool for success/failure here. Or in this specific case, it seems your new implementation never returns an error, so drop the |err| parameter and return void.
@digit-google review from previous PR
| // same drive. Skip comparing them each time and start from the 2nd | ||
| // component. | ||
| assert(EqualsCaseInsensitiveASCII(*path_list.begin(), *start_list.begin())); | ||
| const auto diff = std::mismatch(std::next(path_list.begin()), path_list.end(), |
There was a problem hiding this comment.
I must say that this code is a hard to understand and verify.
In particular because StringPieceRange, when passed an absolute file path as input, will return an empty StringPiece as the first item during iteration. For the loop (before your change) and the std::mistmatch call to work properly requires that
start_listalso starts with an empty StringPiece, which is absolutely not obvious here.It turns out that the only calls to this function pass the value of
split_relative_to_which is itself the result ofSplitStringView(relative_to_, '/')which does indeed create a vector with a first empty StringPiece as its first item. Hence that code is correct, but this is quite fragile as this precondition is not commented anywhere.It seems
split_relative_to_is useful for repeated calls to Normalize() though, so keeping it, instead of calling StringPieceRange(relative_to_, '/') within this function is probably better for performance.What would you think about a helper class to hide these gory details and clarify usage, e.g.:
/// Helper class to convert absolute paths into paths relative to a fixed reference path. /// Useful to perform repeated operations in a loop with minimal allocations. /// /// Usage is: /// RelativePathMaker maker(from_path_abs); /// for (...) { /// std::string path = GetAbsolutePath(...); /// maker.AbsoluteToRelativeInPlace(&path); /// ... // path is now relative to from_path_abs /// } class RelativePathMaker { /// Create instance, input path must be absolute. RelativePathMaker(StringView from_path_abs) : from_path_split_(SpltStringPiece(from_path_abs)) {} /// Convert an absolute path into a path relative to the recorded reference path. /// Modifies |*abs_path| in-place. void AbsoluteToRelativeInPlace(std::string* abs_path) const; /// A variant that takes a StringPiece as input, and generates a new relative path string. std::string AbsoluteToRelative(StringPiece abs_path) const { std::string result = abs_path; AbsoluteToRelativeInPlace(&result); return result; } private: std::vector<StringPiece> from_path_split_; };
@digit-google review from previous PR
There was a problem hiding this comment.
I have added more documentation that Relativize can only be called with absolute, canonicalized paths that share the same drive.
Relativize is only called when we are on the same drive and have absolute paths. The first path segment in Relativize will always be the drive (e.g. "C:") so previously we wasted some time double checking the first segment. I now assert these are the same and start from the second segment. I don't think the first segment would ever have been empty.
I have replaced the SplitStringView with StringPieceRange so it's obvious that we split the paths the same way.
| const std::size_t common_prefix_len = | ||
| (diff.first == path_list.end()) ? abs_path->size() | ||
| : diff.first->str_ - abs_path->data(); |
There was a problem hiding this comment.
Technically, this
diff.firstvalue being the result of an std::mistmatch() call could point topath_list.end()iterator value. In general, dereferencing anend()iterator is undefined behavior in C++, so this code looks suspicious.In that case, and based on the current
StringPieceRangeimplementation,diff.first->str_will point to the end of theStringPiecevalue passed to theStringPieceRangeconstructor.In general, dereferencing a
StringPiece::end()pointer is also undefined behavior, as there is no guarantee that this points to any addressable memory. In this specific case, this works because it points to the buffer of andstd::stringwhich must store a terminating \0 byte, but this is a very subtle implementation detail.So please to check against path_list.end() explicitly before dereferencing it, e.g.:
const size_t bytes_to_delete = (diff.first == path_list.end()) ? 0 : (diff.first->str_ - abs_path->data());Also add a comment explaining that
diff.first->str_points to the first character after the common path prefix, as this is not obvious from the lines above.However, maybe not keep this variable after all (see comment below).
@digit-google review from previous PR
There was a problem hiding this comment.
Because StringPieceRange was private here, I was a bit cheeky and allowed end() to be dereferencable - but I'm no longer doing this and have a conditional.
I don't believe I am dereferencing StringPiece::end() at any point though.
|
|
||
| // The number of ../ to be inserted at the start of the result, corresponding | ||
| // to the number of path segments after the common prefix from start_list. | ||
| const std::size_t dotdot_count = start_list.end() - diff.second; |
There was a problem hiding this comment.
This is hard to follow. I suggest simplifying (e.g. don't use iterators if you don't have to) and using better variable names to make this easier to follow, e.g.:
// The length of the common path prefix, in abs_path characters. const size_t common_prefix_len = (diff.first == path_list.end()) ? abs_path->size() : diff.first->str_ - abs_path->data(); // The number of ../ to be inserted at the start of the result, corresponding // to the number of path segments after the common prefix from start_list. const size_t dotdot_count = start_list.end() - diff.second; const size_t dotdot_len = 3 * dotdot_count; // The following must remove the common prefix characters, then prepend // a sequence of |dotdot_count| "../" segments into the result. The end result // is [<dotdot_len>][<non_common_path>]. // First relocate the non common path to the right position, removing // or inserting bytes if needed. if (common_prefix_len > dotdot_len) { abs_path->erase(0, common_prefix_len - dotdot_len); } else if (common_prefix_len < dotdot_len) { abs_path->insert(0, dotdotlen - common_prefix_len, '\0') } // Now write the ../ sequence in place. for (char* data = abs_path->data(), size_t count = dotdot_count; count > 0; count--) { memcpy(data, "../", 3); data += 3; }
@digit-google review from previous PR
There was a problem hiding this comment.
I have updated my implementation to use most of this - thanks.
Achieve a 2x speedup in `clparser_perftest.exe` by optimizing
`IncludesNormalize`.
Before:
Parse 16384 times in 2917ms avg 178.0us
After:
Parse 32768 times in 2685ms avg 81.9us
In particular:
* Take string as `StringPiece` if possible to avoid `std::string`
copies.
* Swap the `StringPiece` argument in `InternalGetFullPathName` to
`const std::string&` as we need a `null`-terminated string and this
avoids allocating a temporary within the function.
* Use `_MAX_DRIVE` instead of `_MAX_DRIVE` (3 vs 256)
* `IsPathSeparator` has been removed and we check only against `'/'`
because all paths are canonical and do not contain `'\\'`
* `IsFullPathName` is run only on canonical paths, so there is never
going to be a "./" component, and the only "../" components will
appear at the beginning of the string (unit tests added to confirm).
Avoid iterating through the entire string and just look at the
start.
* `AbsPath` is only run against canonical paths so there's no need to
replace backslashes unless we get a new path from `GetFullPathNameA`
- and even in this case we can use `memchr` for performance.
* `Relativize` has been rewritten to avoid almost all allocations and
to lazily parse the path components until a mismatch has been found.
7059488 to
810eadf
Compare
Achieve a 2x speedup in
clparser_perftest.exeby optimizingIncludesNormalize.Before:
After:
In particular:
StringPieceif possible to avoidstd::stringcopies.StringPieceargument inInternalGetFullPathNametoconst std::string&as we need anull-terminated string and this avoids allocating a temporary within the function._MAX_DRIVEinstead of_MAX_DRIVE(3 vs 256)IsPathSeparatorhas been removed and we check only against'/'because all paths are canonical and do not contain'\\'IsFullPathNameis run only on canonical paths, so there is never going to be a "./" component, and the only "../" components will appear at the beginning of the string (unit tests added to confirm). Avoid iterating through the entire string and just look at the start.AbsPathis only run against canonical paths so there's no need to replace backslashes unless we get a new path fromGetFullPathNameAmemchrfor performance.Relativizehas been rewritten to avoid almost all allocations and to lazily parse the path components until a mismatch has been found.