// vim: textwidth=99
/*
Meta note: This file is loaded as a .rs file by rustdoc only.
*/
/*!

A more detailed version of the [warning at the top level](super#warning) about the `quote`/`join`
family of APIs.

In general, passing the output of these APIs to a shell should recover the original string(s).
This page lists cases where it fails to do so.

In noninteractive contexts, there are only minor issues. 'Noninteractive' includes shell scripts
and `sh -c` arguments, or even scripts `source`d from interactive shells. The issues are:

- [Nul bytes](#nul-bytes)

- [Overlong commands](#overlong-commands)

If you are writing directly to the stdin of an interactive (`-i`) shell (i.e., if you are
pretending to be a terminal), or if you are writing to a cooked-mode pty (even if the other end is
noninteractive), then there is a **severe** security issue:

- [Control characters](#control-characters-interactive-contexts-only)

Finally, there are some [solved issues](#solved-issues).

# List of issues

## Nul bytes

For non-interactive shells, the most problematic input is nul bytes (bytes with value 0). The
non-deprecated functions all default to returning [`QuoteError::Nul`] when encountering them, but
the deprecated [`quote`] and [`join`] functions leave them as-is.

In Unix, nul bytes can't appear in command arguments, environment variables, or filenames. It's
not a question of proper quoting; they just can't be used at all. This is a consequence of Unix's
system calls all being designed around nul-terminated C strings.

Shells inherit that limitation. Most of them do not accept nul bytes in strings even internally.
Even when they do, it's pretty much useless or even dangerous, since you can't pass them to
external commands.

In some cases, you might fail to pass the nul byte to the shell in the first place.
For example, the following code uses [`join`] to tunnel a command over an SSH connection:

```rust
std::process::Command::new("ssh")
    .arg("myhost")
    .arg("--")
    .arg(join(my_cmd_args))
```

If any argument in `my_cmd_args` contains a nul byte, then `join(my_cmd_args)` will contain a nul
byte. But `join(my_cmd_args)` is itself being passed as an argument to a command (the ssh
command), and command arguments can't contain nul bytes! So this will simply result in the
`Command` failing to launch.

Still, there are other ways to smuggle nul bytes into a shell. How the shell reacts depends on the
shell and the method of smuggling. For example, here is Bash 5.2.21 exhibiting three different
behaviors:

- With ANSI-C quoting, the string is truncated at the first nul byte:
  ```bash
  $ echo $'foo\0bar' | hexdump -C
  00000000  66 6f 6f 0a                                       |foo.|
  ```

- With command substitution, nul bytes are removed with a warning:
  ```bash
  $ echo $(printf 'foo\0bar') | hexdump -C
  bash: warning: command substitution: ignored null byte in input
  00000000  66 6f 6f 62 61 72 0a                              |foobar.|
  ```

- When a nul byte appears directly in a shell script, it's removed with no warning:
  ```bash
  $ printf 'echo "foo\0bar"' | bash | hexdump -C
  00000000  66 6f 6f 62 61 72 0a                              |foobar.|
  ```

Zsh, in contrast, actually allows nul bytes internally, in shell variables and even arguments to
builtin commands. But if a variable is exported to the environment, or if an argument is used for
an external command, then the child process will see it silently truncated at the first nul. This
might actually be more dangerous, depending on the use case.

## Overlong commands

If you pass a long string into a shell, several things might happen:

- It might succeed, yet the shell might have trouble actually doing anything with it.
  For example:

  ```bash
  $ x=$(printf '%010000000d' 0); /bin/echo $x
  bash: /bin/echo: Argument list too long
  ```

- If you're using certain shells (e.g. Busybox Ash) *and* using a pty for communication, then the
  shell will impose a line length limit, ignoring all input past the limit.

- If you're using a pty in cooked mode, then by default, if you write so many bytes as input that
  it fills the kernel's internal buffer, the kernel will simply drop those bytes, instead of
  blocking until the shell empties the buffer. In other words, random bits of input can be lost,
  which is obviously insecure.

Future versions of this crate may add an option to [`Quoter`] to check the length for you.

## Control characters (*interactive contexts only*)

Control characters are the bytes from `\x00` to `\x1f`, plus `\x7f`. `\x00` (the nul byte) is
discussed [above](#nul-bytes), but what about the rest? Well, many of them correspond to terminal
keyboard shortcuts. For example, when you press Ctrl-A at a shell prompt, your terminal sends the
byte `\x01`. The shell sees that byte and (if not configured differently) takes the standard
action for Ctrl-A, which is to move the cursor to the beginning of the line.

This means that it's quite dangerous to pipe bytes to an interactive shell.
For example, here is a
program that tries to tell Bash to echo an arbitrary string, 'safely':
```rust
use std::process::{Command, Stdio};
use std::io::Write;

let evil_string = "\x01do_something_evil; ";
let quoted = shlex::try_quote(evil_string).unwrap();
println!("quoted string is {:?}", quoted);

let mut bash = Command::new("bash")
    .arg("-i") // force interactive mode
    .stdin(Stdio::piped())
    .spawn()
    .unwrap();
let stdin = bash.stdin.as_mut().unwrap();
write!(stdin, "echo {}\n", quoted).unwrap();
```

Here's the output of the program (with irrelevant bits removed):

```text
quoted string is "'\u{1}do_something_evil; '"
/tmp comex$ do_something_evil; 'echo '
bash: do_something_evil: command not found
bash: echo : command not found
```

Even though we quoted it, Bash still ran an arbitrary command!

This is not because the quoting was insufficient, per se. In single quotes, all input is supposed
to be treated as raw data until the closing single quote. And in fact, this would work fine
without the `"-i"` argument.

But line input is a separate stage from shell syntax parsing. After all, if you type a single
quote on the keyboard, you wouldn't expect it to disable all your keyboard shortcuts. So a control
character always has its designated effect, no matter if it's quoted or backslash-escaped.

Also, some control characters are interpreted by the kernel tty layer instead, like Ctrl-C to send
SIGINT. These can be an issue even with noninteractive shells, but only if using a pty for
communication, as opposed to a pipe.

To be safe, you just have to avoid sending them.

### Why not just use hex escapes?

In any normal programming language, this would be no big deal.

Any normal language has a way to escape arbitrary characters in strings by writing out their
numeric values.
For example, Rust lets you write them in hexadecimal, like `"\x4f"` (or
`"\u{1d546}"` for Unicode). In this way, arbitrary strings can be represented using only 'nice'
simple characters. Any remotely suspicious character can be replaced with a numeric escape
sequence, where the escape sequence itself consists only of alphanumeric characters and some
punctuation. The result may not be the most readable[^choices], but it's quite safe from being
misinterpreted or corrupted in transit.

Shell is not normal. It has no numeric escape sequences.

There are a few different ways to quote characters (unquoted, unquoted-with-backslash, single
quotes, double quotes), but all of them involve writing the character itself. If the input
contains a control character, the output must contain that same character.

### Mitigation: terminal filters

In practice, automating interactive shells like in the above example is pretty uncommon these days.
In most cases, the only way for a programmatically generated string to make its way to the input of
an interactive shell is if a human copies and pastes it into their terminal.

And many terminals detect when you paste a string containing control characters. iTerm2 strips
them out; gnome-terminal replaces them with alternate characters[^gr]; Kitty outright prompts for
confirmation. This mitigates the risk.

But it's not perfect. Some other terminals don't implement this check or implement it incorrectly.
Also, these checks tend to not filter the tab character, which could trigger tab completion. In
most cases that's a non-issue, because most shells support paste bracketing, which disables tab and
some other control characters[^bracketing] within pasted text. But in some cases paste bracketing
gets disabled.

### Future possibility: ANSI-C quoting

I said that shell syntax has no numeric escapes, but that only applies to *portable* shell syntax.
Bash and Zsh support an obscure alternate quoting style with the syntax `$'foo'`. It's called
["ANSI-C quoting"][ansic], and inside it you can use all the escape sequences supported by C,
including hex escapes:

```bash
$ echo $'\x41\n\x42'
A
B
```

But other shells don't support it, including Dash, a popular choice for `/bin/sh`, and Busybox's
Ash, frequently seen on stripped-down embedded systems. This crate's quoting functionality [tries
to be compatible](crate#compatibility) with those shells, plus all other POSIX-compatible shells.
That makes ANSI-C quoting a no-go.

Still, future versions of this crate may provide an option to enable ANSI-C quoting, at the cost of
reduced portability.

### Future possibility: printf

Another option would be to invoke the `printf` command, which is required by POSIX to support octal
escapes. For example, you could 'escape' the Rust string `"\x01"` into the shell syntax `"$(printf
'\001')"`. The shell will execute the command `printf` with the first argument being literally a
backslash followed by three digits; `printf` will output the actual byte with value 1; and the
shell will substitute that back into the original command.

The problem is that 'escaping' a string into a command substitution just feels too surprising. If
nothing else, it only works with an actual shell; [other languages' shell parsing
routines](crate#compatibility) wouldn't understand it. Neither would this crate's own parser,
though that could be fixed.

Future versions of this crate may provide an option to use `printf` for quoting.

### Special note: newlines

Did you know that `\r` and `\n` are control characters? They aren't as dangerous as other control
characters (if quoted properly). But there's still an issue with them in interactive contexts.

Namely, in some cases, interactive shells and/or the tty layer will 'helpfully' translate between
different line ending conventions. The possibilities include replacing `\r` with `\n`, replacing
`\n` with `\r\n`, and others. This can't result in command injection, but it's still a lossy
transformation which can result in a failure to round-trip (i.e. the shell sees a different string
from what was originally passed to `quote`).

Numeric escapes would solve this as well.

# Solved issues

## Solved: Past vulnerability (GHSA-r7qv-8r2h-pg27 / RUSTSEC-2024-XXX)

Versions of this crate before 1.3.0 did not quote `{`, `}`, and `\xa0`.

See:
- <https://github.com/advisories/GHSA-r7qv-8r2h-pg27>
- (TODO: Add Rustsec link)

## Solved: `!` and `^`

There are two non-control characters which have a special meaning in interactive contexts only: `!` and
`^`. Luckily, these can be escaped adequately.

The `!` character triggers [history expansion][he]; the `^` character can trigger a variant of
history expansion known as [Quick Substitution][qs]. Both of these characters get expanded even
inside of double-quoted strings\!

If we're in a double-quoted string, then we can't just escape these characters with a backslash.
Only a specific set of characters can be backslash-escaped inside double quotes; the set of
supported characters depends on the shell, but it often doesn't include `!` and `^`.[^escbs]
Trying to backslash-escape an unsupported character produces a literal backslash:
```bash
$ echo "\!"
\!
```

However, these characters don't get expanded in single-quoted strings, so this crate just
single-quotes them.
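
As a rough illustration of the single-quoting idea (a minimal sketch, not this crate's actual
implementation), the hypothetical helper `single_quote` below wraps a string in single quotes so
that `!` reaches the shell literally. It assumes the input contains no nul bytes or control
characters, and it splices embedded single quotes as `'\''`, which is one common convention and
not necessarily what this crate emits:

```rust
/// Hypothetical helper for illustration only; not part of this crate's API.
/// Wraps `s` in single quotes so the shell treats `!` (and most other characters) literally.
fn single_quote(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('\'');
    for c in s.chars() {
        if c == '\'' {
            // A single quote can't appear inside single quotes: close the quoted
            // section, emit a backslash-escaped quote, then reopen the quotes.
            out.push_str("'\\''");
        } else {
            out.push(c);
        }
    }
    out.push('\'');
    out
}
```

With this sketch, `single_quote("echo hi!")` produces `'echo hi!'`, in which `!` is not subject to
history expansion.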

But there's a Bash bug where `^` actually does get partially expanded in single-quoted strings:
```bash
$ echo '
> ^a^b
> '

!!:s^a^b
```

To work around that, this crate forces `^` to appear right after an opening single quote. For
example, the string `"^` is quoted into `'"''^'` instead of `'"^'`. This restriction is overkill,
since `^` is only meaningful right after a newline, but it's a sufficient restriction (after all, a
`^` character can't be preceded by a newline if it's forced to be preceded by a single quote), and
for now it simplifies things.

## Solved: `\xa0`

The byte `\xa0` may be treated as a shell word separator, specifically on Bash on macOS when using
the default UTF-8 locale, only when the input is invalid UTF-8. This crate handles the issue by
always using quotes for arguments containing this byte.

In fact, this crate always uses quotes for arguments containing any non-ASCII bytes. This may be
changed in the future, since it's a bit unfriendly to non-English users. But for now it
minimizes risk, especially considering the large number of different legacy single-byte locales
someone might hypothetically be running their shell in.

### Demonstration

```bash
$ echo -e 'ls a\xa0b' | bash
ls: a: No such file or directory
ls: b: No such file or directory
```
The normal behavior would be to output a single line, e.g.:
```bash
$ echo -e 'ls a\xa0b' | bash
ls: cannot access 'a'$'\240''b': No such file or directory
```
(The specific quoting in the error doesn't matter.)

### Cause

Just for fun, here's why this behavior occurs:

Bash decides which bytes serve as word separators based on the libc function [`isblank`][isblank].
On macOS on UTF-8 locales, this passes for `\xa0`, corresponding to U+00A0 NO-BREAK SPACE.

This is doubly unique compared to the other systems I tested (Linux/glibc, Linux/musl, and
Windows/MSVC). First, the other systems don't allow bytes in the range [0x80, 0xFF] to pass
<code>is<i>foo</i></code> functions in UTF-8 locales, even if the corresponding Unicode codepoint
does pass, as determined by the wide-character equivalent function, <code>isw<i>foo</i></code>.
Second, the other systems don't treat U+00A0 as blank (even using `iswblank`).

Meanwhile, Bash checks for multi-byte sequences and forbids them from being treated as special
characters, so the proper UTF-8 encoding of U+00A0, `b"\xc2\xa0"`, is not treated as a word
separator. Treatment as a word separator only happens for `b"\xa0"` alone, which is illegal UTF-8.

[ansic]: https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html
[he]: https://www.gnu.org/software/bash/manual/html_node/History-Interaction.html
[qs]: https://www.gnu.org/software/bash/manual/html_node/Event-Designators.html
[isblank]: https://man7.org/linux/man-pages/man3/isblank.3p.html
[nul]: #nul-bytes

[^choices]: This can lead to tough choices over which
    characters to escape and which to leave as-is, especially when Unicode gets involved and you
    have to balance the risk of confusion with the benefit of properly supporting non-English
    languages.
    <br>
    <br>
    We don't have the luxury of those choices.

[^gr]: For example, backspace (in Unicode lingo, U+0008 BACKSPACE) turns into U+2408 SYMBOL FOR BACKSPACE.

[^bracketing]: It typically disables almost all handling of control characters by the shell proper,
    but one necessary exception is the end-of-paste sequence itself (which starts with the control
    character `\x1b`).
    In addition, paste bracketing does not suppress handling of control
    characters by the kernel tty layer, such as `\x03` sending SIGINT (which typically clears the
    currently typed command, making it dangerous in a similar way to `\x01`).

[^escbs]: For example, Dash doesn't remove the backslash from `"\!"` because it simply doesn't know
    anything about `!` as a special character: it doesn't support history expansion. On the other
    end of the spectrum, Zsh supports history expansion and does remove the backslash, though only
    in interactive mode. Bash's behavior is weirder. It supports history expansion, and if you
    write `"\!"`, the backslash does prevent history expansion from occurring, but it doesn't get
    removed!

*/

// `use` declarations to make auto links work:
use ::{quote, join, Shlex, Quoter, QuoteError};

// TODO: add more about copy-paste and human readability.