Treat backslash as token, not as whitespace by WebFreak001 · Pull Request #10 · s-ludwig/sdlite

WebFreak001 · 2023-05-20T22:47:01Z

reverts #9

I tried to make fancy code with .substitute and other range methods, but nothing worked well and in the end I think this is better anyway lol.

WebFreak001 · 2023-05-22T19:35:07Z

@s-ludwig @l-kramer ping

s-ludwig · 2023-06-12T08:12:18Z

Hm, I'm still not convinced that this token representation actually solves a real problem, while it definitely is an additional caveat when parsing the token sequence. As it stands, it also accepts invalid sequences, since no white space or comments are allowed between the backslash and the following line break. Instead of a "backslash" token, this should probably be a "lineContinuation" token that includes the new line, which would also simplify the filtering logic for the parser.
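To make the "lineContinuation" idea concrete, here is a minimal Python sketch of a toy tokenizer. It is not sdlite's lexer: the token kind names and the `lex` function are hypothetical, and comments and the rest of the SDLang syntax are ignored. A backslash only becomes a line continuation when the line break follows it directly; any other backslash comes out as `invalid`.

```python
import re
from typing import List, Tuple

# Hypothetical token kinds for illustration only; these are not sdlite's names.
TOKEN_RE = re.compile(
    r"(?P<line_continuation>\\\r?\n)"  # backslash directly followed by a line break
    r"|(?P<eol>\r?\n)"                 # ordinary line break
    r"|(?P<whitespace>[ \t]+)"         # spaces and tabs
    r"|(?P<invalid>\\|\r)"             # stray backslash or lone carriage return
    r"|(?P<text>[^\\ \t\r\n]+)"        # everything else, lumped together
)


def lex(source: str) -> List[Tuple[str, str]]:
    """Split source into (kind, text) pairs covering every input character."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        assert match is not None  # the alternatives above cover every character
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

With this shape, `lex("a \\\n b")` produces a single `line_continuation` token between two `whitespace` tokens, while `lex("a \\ \n")` reports the backslash as `invalid`. Joining the token texts gives back the input unchanged.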

But then there is another point that I didn't realize until now. Line continuations can also occur inside of double-quote strings:

"foo \
    bar"

In this case, the spec says that all white space following the line break should be ignored, so that this parses as just "foo bar". There is no way the parser can handle this correctly.
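As a rough illustration of that rule, the following Python sketch removes line continuations from the raw content of a double-quoted string (the text between the quotes). The function name is hypothetical, and the sketch only handles continuations: every other escape pair is copied through unchanged, so that an escaped backslash followed by a real line break is not mistaken for a continuation.

```python
def drop_string_continuations(raw: str) -> str:
    """Remove backslash + line break pairs and the white space after them."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            if nxt == "\n" or raw.startswith("\r\n", i + 1):
                # Skip the backslash and the line break (LF or CRLF) ...
                i += 2 if nxt == "\n" else 3
                # ... and all spaces and tabs that follow it.
                while i < len(raw) and raw[i] in " \t":
                    i += 1
                continue
            # Any other escape pair is kept as-is in this sketch.
            out.append(raw[i:i + 2])
            i += 2
            continue
        out.append(ch)
        i += 1
    return "".join(out)
```

For the example above, `drop_string_continuations("foo \\\n    bar")` returns `"foo bar"`.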

IMO, almost everything here, including the C preprocessor heritage, speaks for making/keeping this part of the lexer.

WebFreak001 · 2023-06-12T09:11:46Z

The lexer itself needs to be able to extract this information so that the input string can be reconstructed from the token stream alone, i.e. without "normalizing" what the user explicitly wrote just because it is technically equivalent. If normalizing were acceptable, we could just as well omit the whitespace tokens, but then this library would completely lose its use for me and I would just start to maintain my own fork.

s-ludwig · 2023-06-12T09:44:39Z

What do you mean by white space tokens? BTW, I really don't understand what isn't working for your use case right now.

s-ludwig · 2023-06-12T09:54:07Z

Nothing needs to change in the quoted string case, this is already implemented and the lexer reports the raw input string. parseValue then performs the unescaping and drops the white space after line continuations.

WebFreak001 · 2023-06-12T10:03:38Z

OK, sorry, you are right: just putting the backslashes in the whitespace, as #9 did, would be workable.

However, I think the API as I proposed it here is a bit more understandable from a maintainer perspective, and it keeps lexer API usage intact[1]: backslashes were regular tokens before, just part of the "invalid" tokens, and now they are called "backslash".

Furthermore backslash tokens at invalid positions (not before line endings) throw an exception in the parser now, instead of being silently ignored, which I think makes more sense.
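A minimal Python sketch of that parser-side check, assuming a token stream of `(kind, text)` pairs in which a backslash is its own `backslash` token and a line break is an `eol` token. These kind names and the function are hypothetical and do not reflect sdlite's API.

```python
from typing import Iterable, Iterator, Tuple

Token = Tuple[str, str]  # (kind, text); hypothetical shape for illustration


def drop_line_continuations(tokens: Iterable[Token]) -> Iterator[Token]:
    """Swallow backslash + eol pairs and reject backslashes anywhere else."""
    pending_backslash = False
    for tok in tokens:
        kind = tok[0]
        if pending_backslash:
            if kind != "eol":
                raise ValueError("backslash must be directly followed by a line break")
            pending_backslash = False  # drop both the backslash and the line break
            continue
        if kind == "backslash":
            pending_backslash = True
            continue
        yield tok
    if pending_backslash:
        raise ValueError("backslash at end of input")
```

A stream such as `text, backslash, eol, text` passes through as `text, text`, while `backslash, whitespace, eol` raises instead of being silently ignored.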

[1]: I guess not a large concern in general, it just fit very well with the other code in my formatter I think, it's much easier to work with that than with whitespace (which I just completely skip everywhere): https://github.com/Pure-D/sdlfmt/blob/eaea6ff29cc7f88bbb5ac429bb55936505c3264a/source/sdlfmt.d#L196

s-ludwig · 2023-06-16T07:05:15Z

> Furthermore backslash tokens at invalid positions (not before line endings) throw an exception in the parser now, instead of being silently ignored, which I think makes more sense.

They should already throw in the parser in the form of unexpected invalid tokens, but looking at the code, error messages definitely need an overhaul - I'll look into that.

> fit very well with the other code in my formatter I think, it's much easier to work with that than with whitespace (which I just completely skip everywhere)

So, if I understand this correctly, the goal here is to retain line continuations, but to remove/normalize proper white space? I always assumed that you'd just ignore existing line continuations and insert new ones to keep a particular line width, in which case the #9 solution would be a very natural fit.

s-ludwig · 2023-06-16T08:20:00Z

> error messages definitely need an overhaul

Opened #11 to bring error messages to an acceptable state.

Treat backslash as token, not as whitespace

4979a79

reverts s-ludwig#9

WebFreak001 wants to merge 1 commit into s-ludwig:master from WebFreak001:backslash-token
