Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add -quote argument to kakoune #5211

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

pjungkamp
Copy link
Contributor

Many kakoune scripts evaluate kak script printed from %sh blocks.
Some of these script want to interpolate strings into the printed script and need to escape these strings properly.
Common cases that need to be escaped are interpolated kakoune and shell commands.
This is currently done using sed scripts which are duplicated across many files.

Kakoune already provides all the necessary quoting logic in e.g. the echo command with its -quoting switch.

This PR adds a -quote switch which allows scripts to use kak -quote kakoune or kak -quote shell instead of sed and applies these changes to some .kak scripts.

I personally think that the quoting in e.g. menu.kak becomes way more readable this way.

@Screwtapello
Copy link
Contributor

This is currently done using sed scripts which are duplicated across many files.

One reason for using sed is that it's pretty quick to start up. On my machine, the GNU sed binary is 124K, while the Kakoune binary is 57M - I worry that this approach is going to slow things down.

@pjungkamp
Copy link
Contributor Author

pjungkamp commented Aug 5, 2024

At least for me, kak -quote kakoune is actually faster than the kakquote sh function using sed for simple cases. sed can be faster if I use a statically linked version.

$ du -Dh (which kak)
53M ...

$ du -Dh (which sed)
140K ...

$ du -DhL (which sed-static)
276K ...

$ time sh -c 'kakquote() { printf "%s" "$*" | sed "s/'\''/'\'''\''/g; 1s/^/'\''/; \$s/\$/'\''/"; }
kakquote github
kakquote kakoune
kakquote quoting'
'github''kakoune''quoting'
________________________________________________________
Executed in   23.63 millis    fish           external
   usr time   15.54 millis    0.00 micros   15.54 millis
   sys time    9.43 millis  774.00 micros    8.65 millis

$ time sh -c '
kak -quote kakoune -- github
kak -quote kakoune -- kakoune
kak -quote kakoune -- quoting'
'github''kakoune''quoting'
________________________________________________________
Executed in   16.39 millis    fish           external
   usr time    9.36 millis    0.00 micros    9.36 millis
   sys time    6.99 millis  705.00 micros    6.29 millis

$ time sh -c 'kakquote() { printf "%s" "$*" | sed-static "s/'\''/'\'''\''/g; 1s/^/'\''/; \$s/\$/'\''/"; }
  kakquote github
  kakquote kakoune
  kakquote quoting'
'github''kakoune''quoting'
________________________________________________________
Executed in    9.42 millis    fish           external
   usr time    3.54 millis  346.00 micros    3.19 millis
   sys time    7.10 millis  122.00 micros    6.98 millis

I don't think that sed-static is representative of sed performance since most distros seem to ship a dynamically linked sed.

EDIT: Add du -Dh for executable size.
EDIT: Add sed-static test.

@krobelus
Copy link
Contributor

Not sure if this is worth it. When there are multiple layers of escaping, things can indeed get complicated but those cases are infrequent. For the occasional complex task it's best to use a different scripting language instead of shell.
For menu.kak specifically, I've been using a version that renders the menu into a scratch buffer (see my menu branch).
The UI is not without surprises but its usable to me

@krobelus
Copy link
Contributor

BTW it's interesting that your (presumably -O0) kak.debug is faster than the sed solution. A kak.opt binary is much smaller

@arachsys
Copy link
Contributor

arachsys commented Dec 4, 2024

Agree it's fun that the kak version is faster that sed - possibly something to do with initialising and parsing the sed script? mmap()ing the kak binary is going to be pretty fast as it's already mapped for the running editor that called the script, although ld.so still has to resolve everything each time.

One place kakoune-quoting from shell code is invariably needed is when using kak_response_fifo. Locally, I'm happy to assume kakoune's shell is set to a recent bash (because it always is) and write (for example)

  printf "echo -to-file '%s' %%val{selection}\n" \
    "${kak_response_fifo//\'/\'\'}" > "$kak_command_fifo"
  dosomething < "$kak_response_fifo"

but POSIX has failed to standardise a lot of these expansions, leading to expensive fork+exec out for trivial tasks if you can't assume a sufficiently capable shell. (I'd love the Austin Group to be less indolent; alas we are where we are.)

As far as I can see, many kak scripts just get this wrong and do

  printf 'echo -to-file %s %%val{selection}\n' \
    "$kak_response_fifo" > "$kak_command_fifo"

or worse, interpolate $kak_response_fifo directly into the printf-format, and since TMPDIR defaults to /tmp which doesn't contain spaces or special characters, it probably doesn't bite much so they never realise.

I note that ${var#pattern} and so on have (somehow) made it into POSIX, so I wonder if the high-speed portable option is a shell function that builds the quoted version in a while loop - something like this, maybe?

quote() {
  while [ "${1#*\'}" != "$1" ]; do
    printf "%s''" "${1%%\'*}"
    set -- "${1#*\'}"
  done
  printf '%s\n' "$1"
}

This never execs anything and only loops once for each ', so is probably faster than both kak -quote and sed. Using it in

  printf "echo -to-file '%s' %%val{selection}\n" \
    "$(quote "$kak_response_fifo")" > "$kak_command_fifo"
  dosomething < "$kak_response_fifo"

doesn't obfuscate too badly either.

Edit: use

quote() {
  printf "'"
  while [ "${1#*\'}" != "$1" ]; do
    printf "%s''" "${1%%\'*}"
    set -- "${1#*\'}"
  done
  printf "%s'\n" "$1"
}

if you prefer your quoted version to include the outer single-quotes as with the sed-script above. The kak_response_fifo printf would need %s rather than '%s' in that case.

Edit 2: For me, your sed-based kakquote function above takes around 1.4ms (1.4s in a loop of a thousand with GNU sed linked against musl) whereas that quote function takes 67us (67ms in a loop of a thousand, with bash as sh, again linked against musl) - 21x faster. kak -help takes 3.1ms on the same box, linked against musl and llvm libc++, which is probably similar to cost of kak -quote.

@Screwtapello
Copy link
Contributor

I spent some time investigating writing a quoting function in shell in #3340 (comment) - the fastest implementation I came up with used case and string slicing, and was about ten times faster than sed. However, the sed implementation can be written in a single line, which makes it hard to pass up when you're pasting it into a bunch of shell blocks.

@arachsys
Copy link
Contributor

arachsys commented Dec 4, 2024

Ah, interesting. For me, your case version is around 67us for the "Don't Let's Start" test string you benchmark, my quote function above is 59us, the sed script takes 1.26ms, and just running /bin/true is around 0.98ms, bounding below the time a lighter-weight sed could manage.

So on my system, your and my shell functions are 19-21x faster. I'm guess the key difference between my test and yours is that you used dash, which I can believe is a lot slower than bash.

It would be fairly easy to squeeze out more performance, e.g.

quote2() {
  set -- "$1" ""
  while [ "${1#*\'}" != "$1" ]; do
    set -- "${1#*\'}" "$2${1%%\'*}''"
  done
  printf '%s\n' "$2$1"
}

should be a bit quicker (about 10% for me), but I'd imagine these will be diminishing gains vs massive win of not exec()ing out at all unless there's something very wrong with the one of the builtins on whichever shell is under test.

# time for i in {1..10000}; do single_builtin_quoter "Don't Let's Start" >/dev/null; done

real	0m0.670s
user	0m0.619s
sys	0m0.051s

# time for i in {1..10000}; do quote "Don't Let's Start" >/dev/null; done

real	0m0.593s
user	0m0.550s
sys	0m0.043s

# time for i in {1..10000}; do sed "s/'/''/g" <<< "Don't Let's Start" >/dev/null; done

real	0m12.638s
user	0m2.574s
sys	0m10.872s

# time for i in {1..10000}; do /bin/true; done

real	0m9.840s
user	0m2.513s
sys	0m8.089s

(Though to be honest, I'll do none of these things locally. I just define /bin/sh as a decent shell on my systems and use ${x//\'/\'\'}. That's about 3us on both bash and zsh - cheaper than the echo builtin! - and is both space efficient and nice to read.)

@alexherbo2
Copy link
Contributor

alexherbo2 commented Dec 5, 2024

@arachsys In your example if FIFOs were available as %val{...} expansions it could be simpler.

echo 'echo -to-file %val{response_fifo} -- %val{selection}' > "$kak_command_fifo"

Not sure it’s possible though.

EDIT: You can also use the $kak_quoted_response_fifo form.

@arachsys
Copy link
Contributor

arachsys commented Dec 5, 2024

In your example if FIFOs were available as %val{...} expansions it could be simpler.

echo 'echo -to-file %val{response_fifo} -- %val{selection}' > "$kak_command_fifo"

Not sure it’s possible though.

I'm not sure how tricky this would be either, but it would certainly encourage people not to make the mistake of badly-quoting this common idiom. 👍

EDIT: You can also use the $kak_quoted_response_fifo form.

This is the wrong kind of quoting, although I'm sure people have mistakenly used it in kak scripts. It does shell quoting, which replaces ' with '\'' then wraps the result in single quotes, like the bash ${x@Q} parameter expansion. Kakoune expects a simpler style of quoting which just doubles up single quotes before wrapping in them. (If you try :echo 'foo''bar' and compare with :echo 'foo'\''bar', you'll see how the latter breaks.)

One could definitely imagine an alternative $kak_altquoted_ construct which did the ${x//\'/\'\'} transform rather than the ${x@Q} one, but this might not address the PR author's desire to correctly interpolate arbitrary strings generated by the script.

(Perhaps one good thing to do here is to put a clean, well-written shell function to kak-quote strings in the documentation, so people don't write something worse or fail to realise it's necessary. If you're sticking to strict POSIX, remember that POSIX doesn't even manage to specify local, so you have to work with positional parameters to avoid trampling on the caller's own variables. Luckily, that's not too unpleasant here.)

@Screwtapello
Copy link
Contributor

It would be fairly easy to squeeze out more performance, e.g.

Congratulations, @arachsys, you have the fastest Kakoune-quoting implementation in pure POSIX shell. :)

I added this to the benchmark harness I linked to previously, and that implementation won handily over all the others I'd tried. I've now made a thread on the forum so these tricks won't be forgotten in some random PR comment.

I'm guess the key difference between my test and yours is that you used dash, which I can believe is a lot slower than bash.

It turns out, this is not the case! Running in my benchmark harness, your quote2() function runs 25,242 iterations in 5 seconds with bash 5.2, but 89,146 iterations in 5 seconds with dash: more than three times faster. Your ${x//\'/''} implementation runs 41,545 iterations in five seconds, nearly twice as fast as quote2() if you're using bash, but less than half as fast as quote2() in dash.

I'd imagine these will be diminishing gains vs massive win of not exec()ing out at all

Oh my, yes. This stuff would definitely be a rounding error in most Kakoune plugins.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants