Allow EvalString to retain capacity#2696
Allow EvalString to retain capacity#2696elliotgoodrich wants to merge 1 commit intoninja-build:masterfrom
EvalString to retain capacity#2696Conversation
d691e62 to
3871c6d
Compare
d3cdc29 to
3c979e7
Compare
258dd22 to
782941f
Compare
782941f to
1f0d83f
Compare
27d9fe3 to
0f6cafb
Compare
src/manifest_parser.cc
Outdated
| return false; | ||
| while (!out.empty()) { | ||
| outs_.push_back(std::move(out)); | ||
| outs_.emplace_back(std::move(out).str()); |
There was a problem hiding this comment.
Using emplace_back doesn’t provide more benefit than push_back.
std::move(out).str() is already an EvalString&&, and outs_ stores EvalString.
So we’re inserting a fully constructed rvalue, not building it in place.
There was a problem hiding this comment.
You're right - in this case it's going to be identical. I made this change as I initially made EvalStringBuilder convertible to EvalString and avoided the str() method. However, this resulted in some ambiguous function calls and I had to go back to an explicit method.
However, is there any reason to change it back if it's identical? I generally think preferring emplace_back over push_back is a good way to work since it avoids potentially expensive temporaries.
There was a problem hiding this comment.
Also, I'm not a big fan of std::move(builder).str(), even though it may be the most obvious way of what's happening - IDK, perhaps a builder.release() or something would also work.
I would be happy to take suggestions on a prettier API!
There was a problem hiding this comment.
However, is there any reason to change it back if it's identical?
- changes to the code base should have a good reason
push_backis more clear thanemplace_backhere
I generally think preferring
emplace_backoverpush_backis a good way to work since it avoids potentially expensive temporaries.
Antithesis: if it does yes, if not push_back is more clear(Opinionated).
There was a problem hiding this comment.
I would be happy to take suggestions on a prettier API!
c++17
// minimum c++17
class EvalStringBuilder {
...
operator EvalString() &&;
};
outs_.emplace_back(std::move(out));
// outs_.push_back(std::move(out)); // do not workc++14
I do not know better, but I would prefer the builder.release(). Current solution is fine as well.
There was a problem hiding this comment.
I think emplace_back should be idiomatic at this point. But I agree that it's probably confusing at minimum to make this change as part of this PR since it has no effect. I've reverted the push_back -> emplace_back changes :)
|
|
||
| void Rule::AddBinding(const string& key, const EvalString& val) { | ||
| void Rule::AddBinding(const std::string& key, const EvalString& val) { | ||
| bindings_[key] = val; |
There was a problem hiding this comment.
nit
This requires EvalString::EvalString().
The best option would be std::map::insert_or_assign, but that’s only available in C++17.
In C++14 you can implement the same behavior manually using find and emplace.
EvalString::EvalString() could be made private so that only EvalStringBuild is allowed to construct an EvalString.
There was a problem hiding this comment.
I agree that default ctor gives us the fastest way to do this pre-C++17. Since EvalString is meant to be a value-semantic type, you can make the argument it should have a default constructor (from Elements of Programming), but that view is debated.
There was a problem hiding this comment.
Do you think some additional unittests for this file (and may be the evaluate function) would be appropriate?
There was a problem hiding this comment.
Can never have enough unit tests - I've added evalstring_test.cc to test EvalString and EvalStringBuilder. I haven't tested evaluate though, there is already a decent amount of code testing evaluate based on the unit test failures I had when making this change.
0f6cafb to
ede174f
Compare
|
Lines 1832 to 1834 in c74beb4 use a temporary to limit lifetime, and free memory early: if (!ManifestParser(&ninja.state_, &ninja.disk_interface_, parser_opts)
.Load(options.input_file, &err)) {if you think it is not related to your PR, just ignore... |
| using Token = EvalString::TokenType; | ||
|
|
||
| namespace { | ||
|
|
There was a problem hiding this comment.
More pretty with literals
add here:
std::pair<StringPiece, Token> operator"" _R(const char* str, std::size_t len) {
return std::pair<StringPiece, Token>{ StringPiece(str, len), Token::RAW };
}
std::pair<StringPiece, Token> operator"" _S(const char* str, std::size_t len) {
return std::pair<StringPiece, Token>{ StringPiece(str, len), Token::SPECIAL };
}allows to use
std::vector<std::pair<StringPiece, Token>> data = { "a"_R, "b"_R, "c"_S };
// std::vector<std::pair<StringPiece, Token>> data2 = { { "a", Token::RAW }, { "b", Token::RAW }, { "a", Token::SPECIAL }There was a problem hiding this comment.
I don't think there is a good reason to introduce custom string literals in this code base. While this can make some code prettier, it introduces a mini-language with non-obvious semantics, making everything more difficult to review for multiple maintainers.
While this could be justified if there was a heavy use case to simplify a large code base, but this is clearly not the case here.
| std::unique_ptr<ManifestParser> subparser_; | ||
| std::vector<EvalString> ins_, outs_, validations_; | ||
| // value_ is reused while parsing variables. | ||
| EvalStringBuilder value_; |
There was a problem hiding this comment.
This variable is an replacement for a local scope variable. To prevent a misuse, and make it safe, I propose the following.
// just calls clear() if an local object is "generated"
class EvalStringFactory{
public:
// returns data_ and calls clear()
EvalStringBuilder& make(){
data_.clear();
return data_;
}
private:
EvalStringBuilder data_;
};usage
class X{
EvalStringFactory eval_;
};
void X::function()
{
auto& myEvalStringBuilder = eval_.make();
// myEvalStringBuilder is ready to use;
// do sth.
}Note: as a bonus, a check for debug builds could be inserted. This checks that only one generated variable is alive at the same time. That would require an action in the destructor of the generated object, and therefore an additional wrapper type.
There was a problem hiding this comment.
I think this is a good change and could also be applied to ins_, outs_, and validations_ that are similarly used. I would probably prefer to not make this change in this PR and write a separate generic components that could be used here and in other source files and make the change afterwards in one go.
There was a problem hiding this comment.
There is nothing wrong with this variable (apart the name, see below :-)). Please don't make things more complicated than necessary. In general, if a type has delicate usage semantics that would justify what is proposed here, this is generally the sign of a bad design. If that class was part of a third-party library, that would be a good idea, but this is not the case here.
Change the implementation of `EvalString` from `std::vector<std::string>` (or `std::string` for 1-token strings) to just a `std::string`, where tokens are stored contiguously, prefixed by their size. Introduce an `EvalStringBuilder` to create `EvalString` elements, which allows us to reduce the size of `EvalString` further because appending tokens requires an additional `std::size_t` for bookkeeping. This has the added benefit that `EvalString` is now immutable, which is usually easier to reason about. This implementation has a few performance improvements: * Smaller `sizeof` (roughly 50% of the previous `EvalString`) * Better cache locality of tokens * Retains all capacity on `Clear()` Since all capacity can be retained in `Clear()`, it is possible to reuse `EvalString` (or `EvalStringBuilder`) objects and remove the need for many allocations. In this commit we introduce `value_` to `ManifestParser`, which is reused in parsing variables, pools, rules, and default statements. A future commit will create a class similar to `std::vector<EvalStringBuilder>` that will allow reusing `EvalStringBuilder` elements and this will avoid most allocations when constructing `EvalString`s. The biggest change is that `EvalString` no longer has a `std::string` for the token, but instead returns a `StringPiece`. This has a problem with `Env::LookupVariable` and several other functions that took a `std::string` to lookup values. To avoid creating temporary `std::string` values, we change the comparator of `std::map` to allow looking up using both `StringPiece` and `std::string` and update the function signatures. Now that `EvalString` has a public way to iterate through tokens, the `EvalString::Evaluate` method no longer needs to be on `EvalString`, and has been moved to a static method on `Env`. This has been templated on `Env` because there's no need for polymorphism and it gives the compiler more opportunity to devirtualize calls to `LookupVariable`. The implementation has a shortcut where we can avoid switching on the type when we have a text token since it is either the last one or followed by a special token. The implementation of `EvalString` has been adapted from the one here https://github.com/elliotgoodrich/trimja/blob/e6fb3d188ea018806e28be5b2d4a5d2a82d427a1/src/evalstring.h
ede174f to
19a0146
Compare
In any other function I would agree, but |
| return std::lexicographical_compare(lhs.begin(), | ||
| lhs.end(), | ||
| rhs.begin(), | ||
| rhs.end()); |
There was a problem hiding this comment.
What is the purpose of this change since the result is the same as using StringPiece::operator<, except that the former will always use a fast memcmp() call, while this implementation may be slower, depending on how the C++ library is implemented?
Even if this was a valid change, move this to an independent commit and add some unit test for this.
There was a problem hiding this comment.
I've just double checked and there is no operator< for StringPiece that I can find, which is probably why I added this here.
Clang and MSVC both have fast branches for using memcmp when possible and have the appropriate logic for deciding order when memcmp == 0 and the strings have different lengths.
I'll bring this out into a smaller PR.
| <DisplayString Condition="!isEnd() && !isVariable()">{{ text = {pos_ + sizeof(EvalString::Offset),[length()]s8} }}</DisplayString> | ||
| </Type> | ||
| </AutoVisualizer> | ||
|
|
There was a problem hiding this comment.
It's debugging visualizer for MSVC in the same way you can get gdb pretty printers (https://learn.microsoft.com/en-us/visualstudio/debugger/create-custom-views-of-native-objects?view=visualstudio)
| /// Append (or extend if the last token is already raw text) a raw text | ||
| /// token to the end of the held EvalString. | ||
| /// @pre `!text.empty()` | ||
| void AddText(StringPiece text); |
There was a problem hiding this comment.
nit: It is common in Builder patterns to return an EvalStringBuilder& reference here and other methods, which will allow chaining method calls, which is very handy, especially in unit-tests which can do things like:
EvalStringBuilder builder;
builder.AddText("foo").AddSpecial("bar");
|
|
||
| /// Return a const reference to the held EvalString. | ||
| const EvalString& str() const &; | ||
| operator const EvalString&() const; |
There was a problem hiding this comment.
nit: The purpose of a Builder is to create a frozen / immutable object, so returning references to the builder's state is generally a bad idea and makes the code more fragile. A common name for the method that returns the newly-constructed instance is also "Build()" or "Take()" (the latter clearing the builder at the same time) and makes for clearer code.
|
|
||
| /// Return a copy of the EvalString by move-constructing the held string. | ||
| /// @post After calling this method `this->empty() == true`. | ||
| EvalString str() &&; |
There was a problem hiding this comment.
nit: The name str() is misleading, just use a Build() method that returns a new EvalString instance. With the chaining mentioned above, you can write tests simply with statements like:
EvalString s = EvalStringBuilder().AddText("foo").AddSpecial("bar").Build();
| #include <vector> | ||
|
|
||
| #include "dyndep.h" | ||
| #include "evalstring.h" |
There was a problem hiding this comment.
To follow conventions of the actual code, please call this file eval_string.h instead.
There was a problem hiding this comment.
I am stopping my comments here, this PR is far too large and should be split into several commits for readbility.
There was a problem hiding this comment.
Thank you for your review, I will break this up into:
- Changing methods in
graph.hto takeStringPieceinstead ofstd::string - Add
EvalString - Add
EvalStringto manifest parser
| static_cast<EvalString::Offset>(1) | ||
| << (std::numeric_limits<EvalString::Offset>::digits - 1); | ||
|
|
||
| EvalString::Offset clearLeadingBit(EvalString::Offset in) { |
There was a problem hiding this comment.
nit: respect coding conventions by capitalizing the first letter of functions, e.g. ClearLeadingBit().
In this case, it might be simpler to make EvalString::Offset a struct using sized bitfields, e.g.:
struct Offset {
bool is_special : 1;
unsigned size : 31;
Offset() : is_special(0), size(0) {}
Offset(bool a_special, size_t a_size) : is_special(a_special), size(static_cast<unsigned>(a_size)) {
assert(static_cast<size_t>(size) == a_size);
}
};
The rest of the code will be considerably clearer, and the compiler will handle all the bit fiddling for you.
There was a problem hiding this comment.
Will do, thanks.
| struct StringPiece; | ||
|
|
||
| /// A tokenized string that contains variable references. | ||
| /// Can be evaluated relative to an Env. |
There was a problem hiding this comment.
Please update this documentation to clarify the purpose of this class, in particular:
-
EvalString instances are immutable, and non-empty ones must be created by an EvalStringBuilder.
-
Their main use is to be evaluated into a final string by an Env::Lookup
-
An EvalString models a sequence of (token_string token_kind) pairs, and only supports range-based iteration is supported through begin() and end() methods, which return forward-only iterators.
I also recommend defining an EvalString::Token struct type to hold the type and string for each token, which will make uses clearer compared to std::pair::first and std::pair::second.
There was a problem hiding this comment.
Will make those changes.
| std::unique_ptr<ManifestParser> subparser_; | ||
| std::vector<EvalString> ins_, outs_, validations_; | ||
| // value_ is reused while parsing variables. | ||
| EvalStringBuilder value_; |
There was a problem hiding this comment.
nit: This name is misleading, call this value_builder_ for clarity.
| // what is a common case. If parsed_ is nonempty, then this value | ||
| // must be ignored. | ||
| std::string single_token_; | ||
| template <typename E> |
There was a problem hiding this comment.
nit: This template function should be a template method of EvalString instead. Also keep it simple, performance will be dominated by the lookups in env->LookupVariable(), there is little evidence that the fallthrough improves anything except micro benchmarks. E.g.
std::string result;
bool last_was_string = false;
for (const auto& token : *this) {
if (token.is_special) {
result.append(env->LookupVariable(token.str));
} else {
result.append(token.str.str_, token.str.len_);
}
}
return result;
Also please document that the E type should provide a LookupVariable(StringView) method for clarity.
There was a problem hiding this comment.
I'll resubmit this as a method on EvalString.
For EvalStrings that alternative between text and variables (presumably most of them as the only other alternative is multiple consecutive variables) the fallthrough trick simplifies work for the branch predictor as the switch will almost always be for raw text.
| /// Returns the shell-escaped value of |key|. | ||
| std::string GetBinding(const std::string& key) const; | ||
| bool GetBindingBool(const std::string& key) const; | ||
| std::string GetBinding(const StringPiece& key) const; |
There was a problem hiding this comment.
nit: Pass StringPiece by value, it's simpler and has the same performance in nearly all cases.
|
This is being closed in favor of smaller PRs. The first of which is introducing |
Change the implementation of
EvalStringfromstd::vector<std::string>(orstd::stringfor 1-token strings) to just astd::string, where tokens are stored contiguously, prefixed by their size. Introduce anEvalStringBuilderto createEvalStringelements, which allows us to reduce the size ofEvalStringfurther because appending tokens requires an additionalstd::size_tfor bookkeeping. This has the added benefit thatEvalStringis now immutable, which is usually easier to reason about.This implementation has a few performance improvements:
sizeof(roughly 50% of the previousEvalString)Clear()Since all capacity can be retained in
Clear(), it is possible to reuseEvalString(orEvalStringBuilder) objects and remove the need for many allocations. In this commit we introducevalue_toManifestParser, which is reused in parsing variables, pools, rules, and default statements. A future commit will create a class similar tostd::vector<EvalStringBuilder>that will allow reusingEvalStringBuilderelements and this will avoid most allocations when constructingEvalStrings.The biggest change is that
EvalStringno longer has astd::stringfor the token, but instead returns aStringPiece. This has a problem withEnv::LookupVariableand several other functions that took astd::stringto lookup values. To avoid creating temporarystd::stringvalues, we change the comparator ofstd::mapto allow looking up using bothStringPieceandstd::stringand update the function signatures.Now that
EvalStringhas a public way to iterate through tokens, theEvalString::Evaluatemethod no longer needs to be onEvalString, and has been moved to a static method onEnv. This has been templated onEnvbecause there's no need for polymorphism and it gives the compiler more opportunity to devirtualize calls toLookupVariable. The implementation has a shortcut where we can avoid switching on the type when we have a text token since it is either the last one or followed by a special token.The implementation of
EvalStringhas been adapted from the one here https://github.com/elliotgoodrich/trimja/blob/e6fb3d188ea018806e28be5b2d4a5d2a82d427a1/src/evalstring.h