Revisit comments

Comments are currently a post-parse oddity. There is a lot of infrastructure in Prism to support putting comments on locations, but it's not very well understood. The general flow is:

* Do a regular parse
* Call `attach_comments!` on the ParseResult, which will:
  * For each comment, walk down the AST
  * Find the nearest enclosing, preceding, and following nodes
  * Attach with some logic depending on if it's a trailing comment

There are a couple of issues with this approach:

* The C extension has no knowledge of where comments are. If you are using the C API, you have to handle comments entirely on your own.
* Comment attachment is opinionated — Prism has decided for you which comments belong to which nodes.
* Comments can become clumped together, regardless of whitespace, depending on which nodes surrounded them.

I would like to rewrite all of this to instead attach the comments in C, and fold comments into the AST where appropriate. This actually gets us closer to the old Syntax Tree AST, with a fair amount of refinement baked in now that we know better. The general scheme will be:

* The closest node to a comment gets the comment, where closest is defined as deepest in the tree.
* Every node can optionally have a Comments object attached, and that object has a schema like any other.
* If the deepest node in the tree is list-like (StatementsNode, ArgumentsNode, ParametersNode) then comments get folded into their list fields.

The benefits of this are:

* We do not decide for consumers whether something is "leading" or "trailing". Consumers get to specialize based on their needs.
* We have a comment schema for each node, and it is much more clear for consumers where comments actually exist. I have been burned by this so many times in formatters, as comment placement is one of the most painful parts. Having a dedicated schema will make this so much easier.
* The C extension will see this, and consumers of the C API will be able to gather these comments and handle them. This is necessary for something like Sorbet which would like to be able to handle inline RBS comments.
* We get to strip out a whole bunch of code from Prism that feels like it's in the wrong place: comment_targets, locations having comments (which will now be much smaller), etc.

Here are some examples:

```ruby
alias foo # comment
  # comment
  # comment
  bar # comment
```

In the above example, the `foo` `SymbolNode` will get a `LeafComments` object attached, which only has the `trailing_comment` field. The same happens with the `bar` `SymbolNode`. The other two comments are nearest the `AliasMethodNode`, which will get an `AliasComments` object attached, which has a single field for `gap_comments` which is a list of comments.

```ruby
[
  # comment
  1, # comment

  # comment
  # comment

  # comment
  2, # comment
]
```

In the above example, there are now 6 child nodes in the `ArrayNode` as opposed to the previous 2 child nodes. The other four are `CommentNode` instances. The `1` `IntegerNode` gets a `LeafComments` object attached with the `trailing_comment` field set, and the same for the `2`. This is the fundamental difference from the previous algorithm: those two inner comments are nodes in the array now, as opposed to Prism opinionatedly saying that they belong to the `2` as leading comments.

```ruby
class Foo
  # comment
  # comment

  # comment
  def foo
  end
end
```

A very similar example to the above, the statements inside the `Foo` class get 3 new children. This is as opposed to attaching all 3 as leading comments on the `foo` method. This is probably the most controversial change, as previously you could "simply" call `node.leading_comments` and get the 3 comments. However, I think this is much more correct for us to not make that call for you. It is up to the consumer whether or not that blank line in the middle constitutes a semantic break in the comments, Prism should not be making this call for you. Consider also:

```ruby
class Foo
  # comment
  # comment

  def foo
  end
```

RDoc would consider this to be a valid documentation comment for `foo`, but not all consumers feel that way. I want Prism to no longer be in the business of deciding if this is a leading comment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revisit comments #4044

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Revisit comments #4044

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions