Skip to content

Treat backslash as token, not as whitespace#10

Open
WebFreak001 wants to merge 1 commit intos-ludwig:masterfrom
WebFreak001:backslash-token
Open

Treat backslash as token, not as whitespace#10
WebFreak001 wants to merge 1 commit intos-ludwig:masterfrom
WebFreak001:backslash-token

Conversation

@WebFreak001
Copy link
Copy Markdown

reverts #9

I tried to make fancy code with .substitute and other range methods, but nothing worked well and in the end I think this is better anyway lol.

@WebFreak001
Copy link
Copy Markdown
Author

@s-ludwig @l-kramer ping

@s-ludwig
Copy link
Copy Markdown
Owner

Hm, I'm still not convinced that this token representation actually solves a real problem, while it definitely is an additional caveat when parsing the token sequence. As it stands, it also currently accepts invalid sequences, as there must not be any white space or comments between the backslash and the following line break. Probably instead of a "backslash" token, this should be a "lineContinuation" token that includes the new line, which would also simplify the filtering logic for the parser.

But then there is another point that I didn't realize until now. Line continuations can also occur inside of double-quote strings:

"foo \
    bar"

In this case, the spec says that all white space following the line break should be ignored, so that this parses as just "foo bar". There is no way the parser can handle this correctly.

IMO, almost everything, also the C preprocessor heritage, here speaks for making/keeping this part of the lexer.

@WebFreak001
Copy link
Copy Markdown
Author

the lexer itself needs to be able to extract this information in order to be able to reconstruct the input string just from the token stream, aka not "normalizing" what the user explicitly wrote, just because it's technically the same. If that were the case, we could just as well omit the whitespace tokens, but then this library would completely lose its use for me and I would just start to maintain my own fork.

@s-ludwig
Copy link
Copy Markdown
Owner

What do you mean by white space tokens? BTW, I really don't understand what isn't working for your use case right now.

@s-ludwig
Copy link
Copy Markdown
Owner

Nothing needs to change in the quoted string case, this is already implemented and the lexer reports the raw input string. parseValue then performs the unescaping and drops the white space after line continuations.

@WebFreak001
Copy link
Copy Markdown
Author

ok sorry, you are right, just putting the backslashes in the whitespace, like #9 has done, would be workable.

However I think the API how I proposed it here is a bit more understandable from a maintainer perspective, as well as keeping lexer API usage intact[1], since backslashes were regular tokens before, they were just part of the "invalid" tokens, and now they are called "backslash".

Furthermore backslash tokens at invalid positions (not before line endings) throw an exception in the parser now, instead of being silently ignored, which I think makes more sense.

[1]: I guess not a large concern in general, it just fit very well with the other code in my formatter I think, it's much easier to work with that than with whitespace (which I just completely skip everywhere): https://github.com/Pure-D/sdlfmt/blob/eaea6ff29cc7f88bbb5ac429bb55936505c3264a/source/sdlfmt.d#L196

@s-ludwig
Copy link
Copy Markdown
Owner

Furthermore backslash tokens at invalid positions (not before line endings) throw an exception in the parser now, instead of being silently ignored, which I think makes more sense.

They should already throw in the parser in the form of unexpected invalid tokens, but looking at the code, error messages definitely need an overhaul - I'll look into that.

fit very well with the other code in my formatter I think, it's much easier to work with that than with whitespace (which I just completely skip everywhere)

So, if I understand this correctly, the goal here is to retain line continuations, but to remove/normalize proper white space? I always assumed that you'd just ignore existing line continuations and insert new ones to keep a particular line width, in which case the #9 solution would be a very natural fit.

@s-ludwig
Copy link
Copy Markdown
Owner

error messages definitely need an overhaul

Opened #11 to bring error messages to an acceptable state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants