I find it easier to understand in terms of the Unix syscall API. `2>&1` literally translates as `dup2(1, 2)`, and indeed that's exactly how it works. In the classic unix shells that's all that happens; in more modern shells there may be some additional internal bookkeeping to remember state. Understanding it as dup2 means it's easier to understand how successive redirections work, though you also have to know that redirection operators are executed left-to-right, and traditionally each operator was executed immediately as it was parsed, left-to-right. The pipe operator works similarly, though it's a combination of fork and dup'ing, with the command being forked off from the shell as a child before processing the remainder of the line.
Though, understanding it this way makes the direction of the angled bracket a little odd; at least for me it's more natural to understand dup2(2, 1) as 2<1, as in make fd 2 a duplicate of fd 1, but in terms of abstract I/O semantics that would be misleading.
Another fun consequence of this is that you can initialize otherwise-unset file descriptors this way:
$ cat foo.sh
#!/usr/bin/env bash
>&1 echo "will print on stdout"
>&2 echo "will print on stderr"
>&3 echo "will print on fd 3"
$ ./foo.sh 3>&1 1>/dev/null 2>/dev/null
will print on fd 3
It's a trick you can use if you've got a super chatty script or set of scripts, you want to silence or slurp up all of their output, but you still want to allow some mechanism for printing directly to the terminal.
The danger is that if you don't open it before running the script, you'll get an error:
$ ./foo.sh
will print on stdout
will print on stderr
./foo.sh: line 5: 3: Bad file descriptor
The aws cli has a set of porcelain for s3 access (aws s3) and plumbing commands for lower level access to advanced controls (aws s3api). The plumbing command aws s3api get-object doesn't support stdout natively, so if you need it and want to use it in a pipeline (e.g. pv), you would naively do something like
Unfortunately, aws s3api already prints the API response to stdout, and error messages to stderr, so if you do the above you'll clobber your pipeline with noise, and using /dev/stderr has the same effect on error.
The existing build system I did not have control over, and would produce output on stdout/stderr. I wanted my build scripts to be able to only show the output from the build system if building failed (and there might have been multiple build system invocations leading to that failure). I also wanted the second level to be able to log progress messages that were shown to the user immediately on stdout.
Level 1: create fd=3, capture fd 1/2 (done in one place at the top-level)
Level 2: log progress messages to fd=3 so the user knows what's happening
Level 3: original build system, will log to fd 1/2, but will be captured
It was janky and it's not a project I have a need for anymore, but it was technically a real world use case.
And just like dup2 allows you to duplicate into a brand new file descriptor, shells also allow you to specify bigger numbers so you aren’t restricted to 1 and 2. This can be useful for things like communication between different parts of the same shell script.
The comments on stackoverflow say the words out of my mouth so I'll just copy & paste here:
> but then shouldn't it rather be &2>&1?
> & is only interpreted to mean "file descriptor" in the context of redirections. Writing command &2>& is parsed as command & and 2>&1
That's where all the confusion comes from. I believe most people can intuitively understand > is redirection, but the asymmetrical use of & throws them off.
Interestingly, Powershell also uses 2>&1. Given an once-a-lifetime chance to redesign shell, out of all the Unix relics, they chose to keep (borrow) this.
They're more like capabilities or handles than pointers. There's a reason in Rust land many systems use handles (indices to a table of objects) in absence of pointer arithmetic.
In the C API of course there's symbolic names for these. STDIN_FILENO, STDOUT_FILENO, etc for the defaults and variables for the dynamically assigned ones.
> At least allow us to use names instead of numbers.
You can for the destination. That's the whole reason you need the "&": to tell the shell the destination is not a named file (which itself could be a pipe or socket). And by default you don't need to specify the source fd at all. The intent is that stdout is piped along but stderr goes directly to your tty. That's one reason they are separate.
And for those saying "<" would have been better: that is used to read from the RHS and feed it as input to the LHS so it was taken.
This is misleading because you use plural for both and I'm sure most of these UX missteps were _each_ made by a _single_ person, and there were >1 users even at the time.
Yeah but now they're using npm to install a million packages to do things like tell if a number is greater than 10000. The chances of the programmer wanting to understand the underlying system they are using is essentially nil.
Which is about the same as `2>&1` but with a friendlier name for STDOUT. And this way `2> /dev/stdout`, with the space, also works, whereas `2> &1` doesn't which confuses many. But it's behavior isn't exactly the same and might not work in all situations.
And of course I wish you could use a friendlier name for STDERR instead of `2>`
Bash syntax is anything but simple or logical. Just look at the insane if-statement syntax. Or how the choice of quotes fundamentally changes behavior. Argument parsing, looping, the list goes on.
> Also the choice of quotes changing behavior is a thing in:
In those languages they change what's contained in the string. Not how many strings you get. Or what the strings from that string look like. ($@ being an extreme example)
Counter reason in favor is that you can always count on it being there and working the same way. Perl is too out of fashion and python has too many versioning/library complexities.
You have to write the crappy sh script once but then you get simple, easy usage every time. (If you're revising the script frequently enough that sh/bash are the bottleneck, then what you have is a dev project and not a script, use a programming language).
You're not wrong, but there's fairly easy ways to deal with filenames containing spaces - usually just enclosing any variable use within double quotes will be sufficient. It's tricker to deal with filenames that contain things such as line breaks as that usually involves using null terminated filenames (null being the only character that is not allowed in filenames). e.g find . -type f -print0
You're not wrong, but at my place, our main repository does not permit cloning into a directory with spaces in it.
Three factors conspire to make a bug:
1. Someone decides to use a space
2. We use Python
3. macOS
Say you clone into a directory with a space in it. We use Python, so thus our scripts are scripts in the Unix sense. (So, Python here is replacable with any scripting language that uses a shebang, so long as the rest of what comes after holds.) Some of our Python dependencies install executables; those necessarily start with a shebang:
#!/usr/bin/env python3
Note that space.
Since we use Python virtualenvs,
#!/home/bob/src/repo/.venv/bin/python3
But … now what if the dir has a space?
#!/home/bob/src/repo with a space/.venv/bin/python3
Those look like arguments, now, to a shebang. Shebangs have no escaping mechanism.
As I also discovered when I discovered this, the Python tooling checks for this! It will instead emit a polyglot!
#!/bin/bash
# <what follows in a bash/python polyglot>
# the bash will find the right Python interpreter, and then re-exec this
# script using that interpreter. The Python will skip the bash portion,
# b/c of cleverness in the polyglot.
Which is really quite clever, IMO. But, … it hits (2.). It execs bash, and worse, it is macOS's bash, and macOS's bash will corrupt^W remove for your safety! certain environment variables from the environment.
Took me forever to figure out what was going on. So yeah … spaces in paths. Can't recommend them. Stuff breaks, and it breaks in weird and hard to debug ways.
I quite like how archaic it is. I am turned off by a lot of modern stuff. My shell is nice and predictable. My scripts from 15 years ago still work just fine. No, I don't want it to get all fancy, thanks.
Nushell, Powershell, Python, Ruby, heck even Perl is better. Shell scripting is literally the worst language I've ever seen in common use. Any realistic alternative is going to be better.
Anything that is infected by UCS-2 / UTF-16 garbage should be revised and reconsidered... Yeah UTF-8 has carve outs for those escape sequences... However JSON is even worse, you _have_ to use UTF-16 escapes. https://en.wikipedia.org/wiki/JSON#Character_encoding
Trying to be language agnostic: it should be as self-explanatory as possible. 2>&1 is all but.
Why is there a 2 on the left, when the numbers are usually on the right. What's the relationship between 2 and 1? Is the 2 for std err? Is that `&` to mean "reference"? The fact you only grok it if you know POSIX sys calls means it's far from self explanatory. And given the proportion of people that know POSIX sys calls among those that use Bash, I think it's a bit of an elitist syntax.
POSIX has a manual for shell. You can read 99% of it without needing to know any syscalls. I'm not as familiar with it but Bash has an extensive manual as well, and I doubt syscall knowledge is particularly required there either.
If your complaint is "I don't know what this syntax means without reading the manual" I'd like to point you to any contemporary language that has things like arrow functions, or operator overloading, or magic methods, or monkey patching.
Is it more sane, or is it just what you are used to?
Python doesn't really have much that makes it a sensible choice for scripting.
Its got some basic data structures and a std-lib, but it comes at a non-trivial performance cost, a massive barrier to getting out of the single thread, and non-trivial overhead when managing downstream processes. It doesn't protect you from any runtime errors (no types, no compile checks). And I wouldn't call python in practice particularly portable...
Laughably, NodeJS is genuinely a better choice - while you don't get multithreading easily, at least you aren't trivially blocked on IO. NodeJS also has pretty great compatibility for portability; and can be easily compiled/transformed to get your types and compile checks if you want. I'd still rather avoid managing downstream processes with it - but at least you know your JSON parsing and manipulation is trivial.
Go is my goto when I'm reaching for more; but (ba)sh is king. You're scripting on the shell because you're mainly gluing other processes together, and this is what (ba)sh is designed to do. There is a learning curve, and there are footguns.
I regularly refer to [the unix shell specification][1] to remember the specifics of ${foo%%bar} versus ${foo#bar}, ${parameter:+word} versus ${parameter:-word}, and so on.
It also teaches how && and || work, their relation to [output redirection][3] and [command piping][2], [(...) versus {...}][4], and tricky parts like [word expansion][5], even a full grammar. It's not exciting reading, but it's mostly all there, and works on all POSIXy shells, e.g. sh, bash, ksh, dash, ash, zsh.
Process substitution and calling it file redirect is a bit misleading because it is implemented with named pipes which becomes relevant when the command tries to seek in them which then fails.
Also the reason why Zsh has an additional =(command) construct which uses temporary files instead.
As someone who use LLMs to generate, among others, Bash script I recommend shellcheck too. Shellcheck catches lots of things and shall really make your Bash scripts better. And if for whatever reason there's an idiom you use all the time that shellcheck doesn't like, you can simply configure shellcheck to ignore that one.
Humans used this combination extensively for decades too. I'm no aware of any other simple way to grep both stdout and stderr from a process. (grep, or save to file, or pipe in any other way).
"not humans" are using this extensively precisely because humans used this combination extensively for decades. It's muscle-memory for me. And so is it for LLMs.
I found the explanation useful, about "why" it is that way. I didn't realize the & before the 1 means to tell it is the filedescriptor 1 and not a file named 1.
Well sure, but surely this takes some inspiration from both `&` as the "address of" operator in C as well as the `>` operator which (apart from being the greater-than operator) very much implies "into" in many circumstances.
So `>&1` is "into the file descriptor pointed to by 1", and at the time any reasonable programmer would have known that fd 1 == STDOUT.
I've also found llms seem to love it when calling out to tools, I suppose for them having stderr interspersed messaged in their input doesn't make much difference
I know the underlying call, but I always see the redirect symbols as indicating that "everything" on the big side of the operator fits into a small bit of what is on the small side of the operator. Like a funnel for data. I don't know the origin, but I'm believing my fiction is right regardless. It makes <(...) make intuitive sense.
The comment about "why not &2>&1" is probably the best one on the page, with the answer essentially being that it would complicate the parser too much / add an unnecessary byte to scripts.
It means redirect file descriptor 2 to the same destination as file descriptor 1.
Which actually means that an undelrying dup2 operation happens in this direction:
2 <- 1 // dup2(2, 1)
The file description at [1] is duplicated into [2], thereby [2] points to the same object. Anything written to stderr goes to the same device that stdout is sending to.
The notation follows I/O redirections: cmd > file actually means that a descriptor [n] is first created for the open file, and then that descriptor's decription is duplicated into [1]:
I enjoyed the commenter asking “Why did they pick such arcane stuff as this?” - I don’t think I touch more arcane stuff than shell, so asking why shell used something that is arcane relative to itself is to me arcane squared.
I love myself a little bit of C++. A good proprietary C++ codebase will remind you that people just want to be wizards, solving their key problem with a little bit of magic.
I've only ever been tricked into working on C++...
> I am thinking that they are using & like it is used in c style programming languages. As a pointer address-of operator. [...] 2>&1 would represent 'direct file 2 to the address of file 1'.
I had never made the connection of the & symbol in this context. I think I never really understood the operation before, treating it just as a magic incantation but reading this just made it click for me.
No, the shell author needed some way to distinguish file descriptor 1 from a file named "1" (note that 2>1 means to write stderr to the file named "1"), and '&' was one of the few available characters. It's not the address of anything.
To be consistent, it would be &2>&1, but that makes it more verbose than necessary and actually means something else -- the first & means that the command before it runs asynchronously.
The point is that the order in which that is processed is not left to right.
First the | pipe is established as fd [1]. And then 2>&1 duplicates that pipe into [2]. I.e. right to left: opposite to left-to-right processing of redirections.
When you need to capture both standard error and standard output to a file, you must have them in this order:
bob > file 2>&1
It cannot be:
bob 2>&1 > file
Because then the 2>&1 redirection is performed first (and usually does nothing because stderr and stdout are already the same, pointing to your terminal). Then > file redirects only stdout.
But if you change > file to | process, then it's fine! process gets the combined error and regular output.
So if i happen to know the numbers of other file descriptors of the process (listed in /proc), i can redirect to other files opened in the current process? 2>&1234? Or is it restricted to 0/1/2 by the shell?
Would probably be hard to guess since the process may not have opened any file once it started.
I've used (see: example) to handle applications that just dump pointless noise into stdout/stderr, which is only useful when the binary crashes/fails. Provided the error is marked by a non-zero return code, this will then correctly display the stdout/stderr (provided there is <64KiB of it).
I always wondered if there ever was a standard stream for stdlog which seems useful, and comes up in various places but usually just as an alias to stderr
Somewhat off topic, but related: I worked at this place that made internet security software. It ran on Windows, and on various flavors of Unix.
One customer complained about our software corrupting files on their hard disk. Turns out they had modified their systems so that a newly-spawned program was not given a stderr. That is, it was not handed 0, 1, and 2 (file descriptors), but only 0 and 1. So whenever our program wrote something to stderr, it wrote to whatever file had been the first one opened by the program.
We talked about fixing this, briefly. Instead we decided to tell the customer to fix their broken environment.
I understand how this works, but wouldn’t a more clear syntax be:
command &2>&1
Since the use of & signifies a file descriptor. I get what this ACTUALLY does is run command in the background and then run 2 sending its stout to stdout. That’s completely not obvious by the way.
Though, understanding it this way makes the direction of the angled bracket a little odd; at least for me it's more natural to understand dup2(2, 1) as 2<1, as in make fd 2 a duplicate of fd 1, but in terms of abstract I/O semantics that would be misleading.
The danger is that if you don't open it before running the script, you'll get an error:
You can, though, do the following:
This will pipe only the object contents to stdout, and the API response to /dev/null.https://github.com/jez/symbol/blob/master/scaffold/symbol#L1...
The existing build system I did not have control over, and would produce output on stdout/stderr. I wanted my build scripts to be able to only show the output from the build system if building failed (and there might have been multiple build system invocations leading to that failure). I also wanted the second level to be able to log progress messages that were shown to the user immediately on stdout.
It was janky and it's not a project I have a need for anymore, but it was technically a real world use case.Which is lost when using more modern or languages foreign to Unix.
And I also disagree, your suggestion is not easier. The & operator is quite intuitive as it is, and conveys the intention.
open a terminal (OSX/Linux) and type:
open a browser window and search for: Both will bring up the man page for the function call.To get recursive, you can try:
(the unix is important, otherwise it gives you manly men)That's only just after midnight [1][2]
[1] - https://www.youtube.com/watch?v=XEjLoHdbVeE
[2] - https://unix.stackexchange.com/questions/405783/why-does-man...
> but then shouldn't it rather be &2>&1?
> & is only interpreted to mean "file descriptor" in the context of redirections. Writing command &2>& is parsed as command & and 2>&1
That's where all the confusion comes from. I believe most people can intuitively understand > is redirection, but the asymmetrical use of & throws them off.
Interestingly, Powershell also uses 2>&1. Given an once-a-lifetime chance to redesign shell, out of all the Unix relics, they chose to keep (borrow) this.
File descriptors are like handing pointers to the users of your software. At least allow us to use names instead of numbers.
And sh/bash's syntax is so weird because the programmer at the time thought it was convenient to do it like that. Nobody ever asked a user.
In the C API of course there's symbolic names for these. STDIN_FILENO, STDOUT_FILENO, etc for the defaults and variables for the dynamically assigned ones.
You can for the destination. That's the whole reason you need the "&": to tell the shell the destination is not a named file (which itself could be a pipe or socket). And by default you don't need to specify the source fd at all. The intent is that stdout is piped along but stderr goes directly to your tty. That's one reason they are separate.
And for those saying "<" would have been better: that is used to read from the RHS and feed it as input to the LHS so it was taken.
Are you sure there wasn't >&1 users... Sorry I'll get my coat.
And of course I wish you could use a friendlier name for STDERR instead of `2>`
if $command; then <thing> else <thing> fi
You may be complaining about the syntax for the test command specifically or bash’s [[ builtin
Also the choice of quotes changing behavior is a thing in:
1. JavaScript/typescript 2. Python 3. C/C++ 4. Rust
In some cases it’s the same difference, eg: string interpolation in JavaScript with backticks
In those languages they change what's contained in the string. Not how many strings you get. Or what the strings from that string look like. ($@ being an extreme example)
You can use /dev/stdin, /dev/stdout, /dev/stderr in most cases, but it's not perfect.
Never ever write code that assumes this. These dev shorthands are Linux specific, and you'll even need a certain minimum Linux version.
I cringe at the amount of shell scripts that assume bash is the system interpreter, and not sh or ksh.
Always assume sh, it's the most portable.
Linux != Unix.
Which means that reading someone else's shell script (or awk, or perl, or regex) is INCREDIBLY inconvenient.
But my main reason is that most scripts break when you call them with filenames that contain spaces. And they break spectacularly.
You have to write the crappy sh script once but then you get simple, easy usage every time. (If you're revising the script frequently enough that sh/bash are the bottleneck, then what you have is a dev project and not a script, use a programming language).
Three factors conspire to make a bug:
Say you clone into a directory with a space in it. We use Python, so thus our scripts are scripts in the Unix sense. (So, Python here is replacable with any scripting language that uses a shebang, so long as the rest of what comes after holds.) Some of our Python dependencies install executables; those necessarily start with a shebang: Note that space.Since we use Python virtualenvs,
But … now what if the dir has a space? Those look like arguments, now, to a shebang. Shebangs have no escaping mechanism.As I also discovered when I discovered this, the Python tooling checks for this! It will instead emit a polyglot!
Which is really quite clever, IMO. But, … it hits (2.). It execs bash, and worse, it is macOS's bash, and macOS's bash will corrupt^W remove for your safety! certain environment variables from the environment.Took me forever to figure out what was going on. So yeah … spaces in paths. Can't recommend them. Stuff breaks, and it breaks in weird and hard to debug ways.
I suppose it would also need env to be able to handle paths that have spaces in them.
Even if you're a programmer, that doesn't mean you magically know what other programmers find easy or logical.
What should be the syntax according to contemporary IT people? JSON? YAML? Or just LLM prompt?
Why is there a 2 on the left, when the numbers are usually on the right. What's the relationship between 2 and 1? Is the 2 for std err? Is that `&` to mean "reference"? The fact you only grok it if you know POSIX sys calls means it's far from self explanatory. And given the proportion of people that know POSIX sys calls among those that use Bash, I think it's a bit of an elitist syntax.
If your complaint is "I don't know what this syntax means without reading the manual" I'd like to point you to any contemporary language that has things like arrow functions, or operator overloading, or magic methods, or monkey patching.
Python doesn't really have much that makes it a sensible choice for scripting.
Its got some basic data structures and a std-lib, but it comes at a non-trivial performance cost, a massive barrier to getting out of the single thread, and non-trivial overhead when managing downstream processes. It doesn't protect you from any runtime errors (no types, no compile checks). And I wouldn't call python in practice particularly portable...
Laughably, NodeJS is genuinely a better choice - while you don't get multithreading easily, at least you aren't trivially blocked on IO. NodeJS also has pretty great compatibility for portability; and can be easily compiled/transformed to get your types and compile checks if you want. I'd still rather avoid managing downstream processes with it - but at least you know your JSON parsing and manipulation is trivial.
Go is my goto when I'm reaching for more; but (ba)sh is king. You're scripting on the shell because you're mainly gluing other processes together, and this is what (ba)sh is designed to do. There is a learning curve, and there are footguns.
It also teaches how && and || work, their relation to [output redirection][3] and [command piping][2], [(...) versus {...}][4], and tricky parts like [word expansion][5], even a full grammar. It's not exciting reading, but it's mostly all there, and works on all POSIXy shells, e.g. sh, bash, ksh, dash, ash, zsh.
[1]: https://pubs.opengroup.org/onlinepubs/7908799/xcu/chap2.html
[2]: https://pubs.opengroup.org/onlinepubs/7908799/xcu/chap2.html...
[3]: https://pubs.opengroup.org/onlinepubs/7908799/xcu/chap2.html...
[4]: https://pubs.opengroup.org/onlinepubs/7908799/xcu/chap2.html...
[5]: https://pubs.opengroup.org/onlinepubs/7908799/xcu/chap2.html...
Also the reason why Zsh has an additional =(command) construct which uses temporary files instead.
[1]: https://www.oreilly.com/library/view/essential-system-admini...
It's very, very easy to get shell scripts wrong; for instance the location of the file redirect operator in a pipeline is easy to get wrong.
It redirects STDERR (2) to where STDOUT is piped already (&1). Good for dealing with random CLI tools if you're not a human.
So `>&1` is "into the file descriptor pointed to by 1", and at the time any reasonable programmer would have known that fd 1 == STDOUT.
The comment about "why not &2>&1" is probably the best one on the page, with the answer essentially being that it would complicate the parser too much / add an unnecessary byte to scripts.
Which actually means that an undelrying dup2 operation happens in this direction:
The file description at [1] is duplicated into [2], thereby [2] points to the same object. Anything written to stderr goes to the same device that stdout is sending to.The notation follows I/O redirections: cmd > file actually means that a descriptor [n] is first created for the open file, and then that descriptor's decription is duplicated into [1]:
I've only ever been tricked into working on C++...
I had never made the connection of the & symbol in this context. I think I never really understood the operation before, treating it just as a magic incantation but reading this just made it click for me.
To be consistent, it would be &2>&1, but that makes it more verbose than necessary and actually means something else -- the first & means that the command before it runs asynchronously.
Thus you cannot write:
You also cannot write However you may write The n>& is one clump.But also | isnt a redirection, it takes stdout and pipes it to another program.
So, if you want stderr to go to stdout, so you can pipe it, you need to do it in order.
bob 2>&1 | prog
You usually dont want to do this though.
First the | pipe is established as fd [1]. And then 2>&1 duplicates that pipe into [2]. I.e. right to left: opposite to left-to-right processing of redirections.
When you need to capture both standard error and standard output to a file, you must have them in this order:
It cannot be: Because then the 2>&1 redirection is performed first (and usually does nothing because stderr and stdout are already the same, pointing to your terminal). Then > file redirects only stdout.But if you change > file to | process, then it's fine! process gets the combined error and regular output.
# echo 1 >&2 2>| echo
Would probably be hard to guess since the process may not have opened any file once it started.
It is not. You can use any arbitrary numbers provided they're initialized properly. These values are just file descriptors.
For Example -> https://gist.github.com/valarauca/71b99af82ccbb156e0601c5df8...
I've used (see: example) to handle applications that just dump pointless noise into stdout/stderr, which is only useful when the binary crashes/fails. Provided the error is marked by a non-zero return code, this will then correctly display the stdout/stderr (provided there is <64KiB of it).
> Would probably be hard to guess since the process may not have opened any file once it started.
You need to not only inspect the current state, but also race the process before the assignments change.
[0] https://stackoverflow.com/questions/3618078/pipe-only-stderr...
One customer complained about our software corrupting files on their hard disk. Turns out they had modified their systems so that a newly-spawned program was not given a stderr. That is, it was not handed 0, 1, and 2 (file descriptors), but only 0 and 1. So whenever our program wrote something to stderr, it wrote to whatever file had been the first one opened by the program.
We talked about fixing this, briefly. Instead we decided to tell the customer to fix their broken environment.
command &2>&1
Since the use of & signifies a file descriptor. I get what this ACTUALLY does is run command in the background and then run 2 sending its stout to stdout. That’s completely not obvious by the way.
command &stderr>&stdout