Parsing of escaped characters in quoted strings and backslashes in unquoted strings #335
First of all, there's some interference from triplequote here:
This can be fixed with:
When we try to parse that we get an error:
this is probably a mistake as there is a character (the
Which is what you'd expect, I guess. I think there are issues around escaping for
sealed abstract class ScalarStyle(indicator: Char)
object ScalarStyle {
  case object Plain        extends ScalarStyle(' ')
  case object DoubleQuoted extends ScalarStyle('"')
  case object SingleQuoted extends ScalarStyle('\'')
  case object Folded       extends ScalarStyle('>')
  case object Literal      extends ScalarStyle('|')

  def escapeSpecialCharacter(scalar: String, scalarStyle: ScalarStyle): String =
    scalarStyle match {
      case ScalarStyle.DoubleQuoted => scalar
      case ScalarStyle.SingleQuoted => scalar
      case ScalarStyle.Literal      => scalar
      case _ =>
        scalar.flatMap { char =>
          char match {
            case '\\'  => "\\\\"
            case '\n'  => "\\n"
            case other => other.toString
          }
        }
    }
}
but those are limited to some escapes for unquoted strings, which do make sense to be honest, but I'm not sure they are 100% correct, as they were here before I started maintaining the lib.
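For readers following along, here is a minimal, self-contained sketch of what the snippet quoted above does to a plain (unquoted) scalar. It is a copy of the quoted code wrapped in a made-up `EscapeDemo` object so it can run on its own; whether the library calls this function while parsing, while encoding, or both is not stated in the thread, so this only shows the function in isolation.

```scala
object EscapeDemo {
  // Copy of the snippet quoted in the thread; `EscapeDemo` is a hypothetical wrapper.
  sealed abstract class ScalarStyle(indicator: Char)
  object ScalarStyle {
    case object Plain        extends ScalarStyle(' ')
    case object DoubleQuoted extends ScalarStyle('"')
    case object SingleQuoted extends ScalarStyle('\'')
    case object Folded       extends ScalarStyle('>')
    case object Literal      extends ScalarStyle('|')

    def escapeSpecialCharacter(scalar: String, scalarStyle: ScalarStyle): String =
      scalarStyle match {
        case ScalarStyle.DoubleQuoted => scalar
        case ScalarStyle.SingleQuoted => scalar
        case ScalarStyle.Literal      => scalar
        case _ =>
          // Plain and Folded scalars: double every backslash, spell out newlines.
          scalar.flatMap { char =>
            char match {
              case '\\'  => "\\\\"
              case '\n'  => "\\n"
              case other => other.toString
            }
          }
      }
  }

  // Three characters: 'a', a backslash, 'b' (triple quotes keep the backslash literal).
  val input: String = """a\b"""

  val plainResult: String  = ScalarStyle.escapeSpecialCharacter(input, ScalarStyle.Plain)
  val quotedResult: String = ScalarStyle.escapeSpecialCharacter(input, ScalarStyle.DoubleQuoted)

  def main(args: Array[String]): Unit = {
    println(plainResult)  // a\\b  (the single backslash has been doubled)
    println(quotedResult) // a\b   (returned unchanged)
  }
}
```

A single backslash in a plain scalar comes out doubled, which matches the "double backslash" symptom reported for unquoted strings in this issue.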
Triple quotes prevent backslashes from being used as escapes; they are treated as literal characters instead. This is expected, as triple quotes define raw string literals.
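A minimal sketch of that difference in plain Scala, with no YAML library involved (the object name `RawStringDemo` is made up for illustration):

```scala
object RawStringDemo {
  // Regular string literal: the compiler turns \b into one backspace character (U+0008).
  val escaped: String = "\b"

  // Triple-quoted literal: no escape processing, so this is a backslash followed by 'b'.
  val raw: String = """\b"""

  def main(args: Array[String]): Unit = {
    println(escaped.length) // 1
    println(raw.length)     // 2
    println(raw == "\\b")   // true
  }
}
```

So a triple-quoted YAML document that contains `\b` hands the parser exactly the two characters a file on disk would contain.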
Yeah, but it's not a problem with the YAML parser; you get what you see.
I think
scala> val yaml = s"""
     | regexBoundary: "${"\\b"}"
     | backspace: "${"\b"}"
     | regexBoundaryUnquoted: some${"\b"}text${"\b"}
     | """
val yaml: String = "
regexBoundary: "\b"
backspace: "
regexBoundaryUnquoted: somtext
"
notice
scala> YamlEncoder.escapeSpecialCharacters(yaml)
val res6: String = "
regexBoundary: "\b"
backspace: "\u0008"
regexBoundaryUnquoted: some\u0008text\u0008
"
scala> val example = yaml.as[Example].right.get
val example: Example = Example(\b,somtext)
scala> YamlEncoder.escapeSpecialCharacters(example.regexBoundaryUnquoted)
val res8: String = some\u0008text
notice missing
I am not sure I understand. Compare this with Circe's behaviour in https://scastie.scala-lang.org/OndrejSpanel/cz1HDKc7RoaOgEnrg6YcQA/5. Instead of triple quotes I could use input from a file. When there is a backslash in a quoted string, it should be processed as an escape by the YAML parser. When it is in unquoted input, it should be processed as a backslash character. What I see instead is that it is processed as a backslash character in a quoted string and as two backslashes in an unquoted string.
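The expected behaviour described above can be sketched as follows. This is a hypothetical helper, not scala-yaml or Circe code: `YamlEscapeSketch`, `unescapeDoubleQuoted` and `plain` are names invented here, and only a small subset of the escapes from YAML 1.2.2 section 5.7 is covered.

```scala
object YamlEscapeSketch {
  // Hypothetical helper: resolves a few of the double-quoted escapes from
  // YAML 1.2.2 section 5.7. `body` is the text between the double quotes.
  def unescapeDoubleQuoted(body: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < body.length) {
      val c = body.charAt(i)
      if (c == '\\' && i + 1 < body.length) {
        val resolved: Char = body.charAt(i + 1) match {
          case '\\' => '\\' // escaped backslash
          case '"'  => '"'  // escaped double quote
          case 'b'  => '\b' // backspace, U+0008
          case 't'  => '\t' // tab
          case 'n'  => '\n' // line feed
          case other =>
            throw new IllegalArgumentException(s"escape not covered by this sketch: \\$other")
        }
        out.append(resolved)
        i += 2
      } else {
        // Ordinary character (a trailing lone backslash is also copied as is in this sketch).
        out.append(c)
        i += 1
      }
    }
    out.toString
  }

  // Escape sequences are only interpreted in double-quoted scalars,
  // so a plain (unquoted) scalar keeps its backslashes untouched.
  def plain(body: String): String = body

  def main(args: Array[String]): Unit = {
    println(unescapeDoubleQuoted("""\\b""") == "\\b") // true: backslash followed by 'b'
    println(unescapeDoubleQuoted("""\b""") == "\b")   // true: a single backspace
    println(plain("""\b""") == "\\b")                 // true: backslash followed by 'b'
  }
}
```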
Note: it is not my intention to have backspace characters present in my input. I want the \b escape sequence to be present there, which is exactly what triple quotes allow me to do - you can imagine you are reading the input from a file instead. The code using interpolation places a backspace character into the input, which is not what I am interested in, and I have no idea how such a thing should be handled by a parser.
From specs: https://yaml.org/spec/1.2.2/#57-escaped-characters
Ahhh, I misunderstood your intent. Ok, I get it now.
Another example which is related, but perhaps simpler: at the moment I cannot find a way to represent a backslash character in my input. Using \ in unquoted strings results in a double backslash. Using a double backslash in quoted strings results in a crash or strange behaviour. Check:
Or even worse:
Which results in the strange:
In the following code the escape sequences are ignored and parsed as a double backslash and a backslash followed by b, instead of as a backslash and a backspace:
See also https://scastie.scala-lang.org/OndrejSpanel/jfavH3u2Sq29Gs5nQHmXKQ/92
The output is:
None of this is correct. There should be no double backslashes and there should be a backspace, not \b in the second line.
Note: even the unquoted string is parsed incorrectly. The single backslash is converted to a double backslash in the case class value.