👋 Hey folks! I saw the NUM project on Reddit and thought it was a great idea. Diving in, I found MODL, which made me even more excited, but I noticed it didn't have a ton of libraries yet, so I started hacking on one just to see if I could get something working. It's still very much a work in progress that I'm hacking on in my free time, so no promises on quality.
https://github.com/bign8/modl.go
Anyway, I ran into an issue with my Unicode parsing logic. Based on the test added in d066849, it appears MODL supports Unicode escapes that aren't exactly 4 hex digits, which doesn't seem to match the grammar defined below or the written specification: https://www.modl.uk/specification#hex-values.
grammar/antlr4/MODLLexer.g4, lines 73 to 78 in 3c78809:

    fragment UNICODE
      : 'u' HEX HEX HEX HEX
      ;

    fragment HEX
      : [0-9a-fA-F]
      ;

But the Java library looks to support this behavior, which is great; I just didn't see it documented anywhere besides the test case and the Java source.
https://github.com/MODLanguage/java-interpreter/blob/d9cc9d76f73687a03114d57fccc253c3c82fad71/src/main/java/uk/modl/utils/UnicodeEscapeReplacer.java#L104-L174
Given the complexity of the UnicodeEscapeReplacer, I'm not really sure of the best way to represent those nuances in the grammar. But having a note somewhere that non-4-digit code points are supported would be dope. Anyway, let me know what you think and I can get something in a PR for ya.
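To make the mismatch concrete, here's a rough Go sketch of the two behaviors. Everything in it is hypothetical: `decodeUnicode`, `minDigits`, and `maxDigits` are names I made up, and the "4 to 6 digits, greedy" lenient mode is only my guess at what non-4-digit support might mean. It is not a port of UnicodeEscapeReplacer and may not match what the Java library actually does.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// isHex reports whether b matches the grammar's HEX fragment: [0-9a-fA-F].
func isHex(b byte) bool {
	return (b >= '0' && b <= '9') || (b >= 'a' && b <= 'f') || (b >= 'A' && b <= 'F')
}

// decodeUnicode reads the hex digits that follow the 'u' of an escape.
// minDigits and maxDigits are hypothetical knobs: (4, 4) mirrors the grammar's
// UNICODE fragment, while (4, 6) is an ASSUMED lenient mode, not confirmed
// MODL behavior. It returns the decoded rune and the number of digits consumed.
func decodeUnicode(digits string, minDigits, maxDigits int) (rune, int, error) {
	// Greedily count leading hex digits, up to maxDigits.
	n := 0
	for n < len(digits) && n < maxDigits && isHex(digits[n]) {
		n++
	}
	if n < minDigits {
		return 0, 0, fmt.Errorf("need at least %d hex digits, got %d", minDigits, n)
	}
	v, err := strconv.ParseUint(digits[:n], 16, 32)
	if err != nil {
		return 0, 0, err
	}
	// Reject surrogate halves and values above U+10FFFF.
	r := rune(v)
	if !utf8.ValidRune(r) {
		return 0, 0, fmt.Errorf("U+%X is not a valid code point", v)
	}
	return r, n, nil
}

func main() {
	// Strict mode, as the grammar reads today: exactly 4 digits.
	r, n, _ := decodeUnicode("00e9xyz", 4, 4)
	fmt.Printf("%c %d\n", r, n) // é 4

	// Assumed lenient mode: 4 to 6 digits, consumed greedily.
	r, n, _ = decodeUnicode("1F37B!", 4, 6)
	fmt.Printf("%c %d\n", r, n) // 🍻 5
}
```

One thing the sketch surfaces: with greedy matching, an escape followed directly by text that happens to be hex (say `00e9ab`) decodes as U+E9AB rather than é followed by "ab". That kind of ambiguity is part of why I'm unsure how to express this in the grammar.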
Cheers 🍻