Skip to content

Revisit comments #4044

@kddnewton

Description

@kddnewton

Comments are currently a post-parse oddity. There is a lot of infrastructure in Prism to support putting comments on locations, but it's not very well understood. The general flow is:

  • Do a regular parse
  • Call attach_comments! on the ParseResult, which will:
    • For each comment, walk down the AST
    • Find the nearest enclosing, preceding, and following nodes
    • Attach with some logic depending on if it's a trailing comment

There are a couple of issues with this approach:

  • The C extension has no knowledge of where comments are. If you are using the C API, you have to handle comments entirely on your own.
  • Comment attachment is opinionated — Prism has decided for you which comments belong to which nodes.
  • Comments can become clumped together, regardless of whitespace, depending on which nodes surrounded them.

I would like to rewrite all of this to instead attach the comments in C, and fold comments into the AST where appropriate. This actually gets us closer to the old Syntax Tree AST, with a fair amount of refinement baked in now that we know better. The general scheme will be:

  • The closest node to a comment gets the comment, where closest is defined as deepest in the tree.
  • Every node can optionally have a Comments object attached, and that object has a schema like any other.
  • If the deepest node in the tree is list-like (StatementsNode, ArgumentsNode, ParametersNode) then comments get folded into their list fields.

The benefits of this are:

  • We do not decide for consumers whether something is "leading" or "trailing". Consumers get to specialize based on their needs.
  • We have a comment schema for each node, and it is much more clear for consumers where comments actually exist. I have been burned by this so many times in formatters, as comment placement is one of the most painful parts. Having a dedicated schema will make this so much easier.
  • The C extension will see this, and consumers of the C API will be able to gather these comments and handle them. This is necessary for something like Sorbet which would like to be able to handle inline RBS comments.
  • We get to strip out a whole bunch of code from Prism that feels like it's in the wrong place: comment_targets, locations having comments (which will now be much smaller), etc.

Here are some examples:

alias foo # comment
  # comment
  # comment
  bar # comment

In the above example, the foo SymbolNode will get a LeafComments object attached, which only has the trailing_comment field. The same happens with the bar SymbolNode. The other two comments are nearest the AliasMethodNode, which will get an AliasComments object attached, which has a single field for gap_comments which is a list of comments.

[
  # comment
  1, # comment

  # comment
  # comment

  # comment
  2, # comment
]

In the above example, there are now 6 child nodes in the ArrayNode as opposed to the previous 2 child nodes. The other four are CommentNode instances. The 1 IntegerNode gets a LeafComments object attached with the trailing_comment field set, and the same for the 2. This is the fundamental difference from the previous algorithm: those two inner comments are nodes in the array now, as opposed to Prism opinionatedly saying that they belong to the 2 as leading comments.

class Foo
  # comment
  # comment

  # comment
  def foo
  end
end

A very similar example to the above, the statements inside the Foo class get 3 new children. This is as opposed to attaching all 3 as leading comments on the foo method. This is probably the most controversial change, as previously you could "simply" call node.leading_comments and get the 3 comments. However, I think this is much more correct for us to not make that call for you. It is up to the consumer whether or not that blank line in the middle constitutes a semantic break in the comments, Prism should not be making this call for you. Consider also:

class Foo
  # comment
  # comment

  def foo
  end

RDoc would consider this to be a valid documentation comment for foo, but not all consumers feel that way. I want Prism to no longer be in the business of deciding if this is a leading comment.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions