-
Notifications
You must be signed in to change notification settings - Fork 174
Description
Comments are currently a post-parse oddity. There is a lot of infrastructure in Prism to support putting comments on locations, but it's not very well understood. The general flow is:
- Do a regular parse
- Call
attach_comments!on the ParseResult, which will:- For each comment, walk down the AST
- Find the nearest enclosing, preceding, and following nodes
- Attach with some logic depending on if it's a trailing comment
There are a couple of issues with this approach:
- The C extension has no knowledge of where comments are. If you are using the C API, you have to handle comments entirely on your own.
- Comment attachment is opinionated — Prism has decided for you which comments belong to which nodes.
- Comments can become clumped together, regardless of whitespace, depending on which nodes surrounded them.
I would like to rewrite all of this to instead attach the comments in C, and fold comments into the AST where appropriate. This actually gets us closer to the old Syntax Tree AST, with a fair amount of refinement baked in now that we know better. The general scheme will be:
- The closest node to a comment gets the comment, where closest is defined as deepest in the tree.
- Every node can optionally have a Comments object attached, and that object has a schema like any other.
- If the deepest node in the tree is list-like (StatementsNode, ArgumentsNode, ParametersNode) then comments get folded into their list fields.
The benefits of this are:
- We do not decide for consumers whether something is "leading" or "trailing". Consumers get to specialize based on their needs.
- We have a comment schema for each node, and it is much more clear for consumers where comments actually exist. I have been burned by this so many times in formatters, as comment placement is one of the most painful parts. Having a dedicated schema will make this so much easier.
- The C extension will see this, and consumers of the C API will be able to gather these comments and handle them. This is necessary for something like Sorbet which would like to be able to handle inline RBS comments.
- We get to strip out a whole bunch of code from Prism that feels like it's in the wrong place: comment_targets, locations having comments (which will now be much smaller), etc.
Here are some examples:
alias foo # comment
# comment
# comment
bar # commentIn the above example, the foo SymbolNode will get a LeafComments object attached, which only has the trailing_comment field. The same happens with the bar SymbolNode. The other two comments are nearest the AliasMethodNode, which will get an AliasComments object attached, which has a single field for gap_comments which is a list of comments.
[
# comment
1, # comment
# comment
# comment
# comment
2, # comment
]In the above example, there are now 6 child nodes in the ArrayNode as opposed to the previous 2 child nodes. The other four are CommentNode instances. The 1 IntegerNode gets a LeafComments object attached with the trailing_comment field set, and the same for the 2. This is the fundamental difference from the previous algorithm: those two inner comments are nodes in the array now, as opposed to Prism opinionatedly saying that they belong to the 2 as leading comments.
class Foo
# comment
# comment
# comment
def foo
end
endA very similar example to the above, the statements inside the Foo class get 3 new children. This is as opposed to attaching all 3 as leading comments on the foo method. This is probably the most controversial change, as previously you could "simply" call node.leading_comments and get the 3 comments. However, I think this is much more correct for us to not make that call for you. It is up to the consumer whether or not that blank line in the middle constitutes a semantic break in the comments, Prism should not be making this call for you. Consider also:
class Foo
# comment
# comment
def foo
endRDoc would consider this to be a valid documentation comment for foo, but not all consumers feel that way. I want Prism to no longer be in the business of deciding if this is a leading comment.