Too much discussion of the XOR swap trick

(heather.cafe)

112 points | by CJefferson 3 days ago

19 comments

pjc50 6 hours ago
The fun bit is right at the start when the author notices that the compiler spots this and optimizes it away.
We didn't get into the deeper question of benchmarking it vs. a three-register swap, because I suspect the latter would be handled entirely by register renaming and end up being faster due to not requiring allocation of an ALU unit. Difficult to benchmark that because in order for it to make a difference, you'd need to surround it with other arithmetic instructions.
A meta question is why this persists. It has the right qualities for a "party trick": slightly esoteric piece of knowledge, not actually that hard to understand when you do know about it, but unintuitive enough that most people don't spontaneously reinvent it.
See also: https://en.wikipedia.org/wiki/Fast_inverse_square_root , which requires a bit more maths.
The other classic use of XOR - cursor overdrawing - has also long since gone away. It used to be possible to easily draw a cursor on a monochrome display by XORing it in, then to move it you simply XOR it again, restoring the original image.
[-]
- chrisss395 17 minutes ago
  Reminds me of the time developing the 747-8 avionics and the systems engineers started observing a bug where the entire box would just stop randomly, all output stopped, including HW traces...the thing would just halt.
  About a month later an engineer decided to turn on all warnings for gcc...and behold a message stating something to the effect of "WARNING: execution will halt upon reaching this statement." The compiled code basically just halted and never returned from the function call (can't remember the specifics).
  And that is how we learned not to hide compiler messages.
  [-]
  - warmwaffles 10 minutes ago
    `-Wall` is your friend and boy it is a major irritant using third party sloppy libraries when statically compiling it all.
- ahoka 14 minutes ago
  When I opened the article, I have immediately searched for "register renaming"...
- jcalvinowens 2 hours ago
  The "party trick" is more general than just xor, for example:
```
    void swap(unsigned* a, unsigned* b)
    {
        *a = *a + *b;
        *b = *a - *b;
        *a = *a - *b;
    }
```
  GCC still sees through it with restrict: https://godbolt.org/z/8n3bGha3e
  [-]
  - warmwaffles 4 minutes ago
    Yea this is operating on the pointer to the values, but if you use just an `unsigned a` it uses xor
    clang: https://godbolt.org/z/5PEMTrxba
    gcc: https://godbolt.org/z/7j8h4zo4v
  - sifar 5 minutes ago
    This has overflow issues unlike xor.
- jimjag 1 hour ago
  You can even do it with complex images in a multi-color bitmap. I do it with my 6502 SBC setup: https://github.com/jimjag/JJ65c02/blob/main/Software/pico-co...
- vidarh 3 hours ago
  The cursor overdrawing trick in part started going away before it's time thanks to patent enforcement (one of the lawsuits infamously exacerbated Commodores financial woes towards the end)... By the time the patent expired there was really no longer much value in going back to it.
danhau 6 hours ago
> Are there other XOR tricks?
Yes, error correction.
You have some packets of data a, b, c. Add one additional packet z that is computed as z = a ^ b ^ c. Now whenever one of a, b or c gets corrupted or lost, it can be reconstructed by computing the XOR of all the others.
So if b is lost: b = a ^ c ^ z. This works for any packet, but only one. If multiple are lost, this will fail.
There are way better error correction algorithms, but I like the simplicity of this one.
[-]
- throwaway2027 4 hours ago
  https://en.wikipedia.org/wiki/XOR_linked_list
- amelius 4 hours ago
  XOR is also great for storing copyrighted works without liability.
  a = the bits of some song or movie
  b = pure noise
  Store c = a^b.
  Give b to a friend. Throw away a.
  Now both you and your friend have a bit vector of pure noise. Together you can produce the copyrighted work. But nobody is liable.
  [-]
  - aleph_minus_one 4 hours ago
    This is the "What Colour are your bits?" argument:
    > https://ansuz.sooke.bc.ca/entry/23
    > https://ansuz.sooke.bc.ca/entry/24
    As these article outline, the legal situation is much more complicated (and unintuitive to people who are used to "computer science thinking").
    [-]
    - amelius 2 hours ago
      This is because legal people want something to exist that does not physically exist.
      There is no trace back from pure noise to the original work.
      Colour of bits is just magical thinking.
      [-]
      - TeMPOraL 1 hour ago
        Colour of bits isn't a property of bits. It's provenance. It's facts about history of the things.
        There may be no trace from pure noise to original work, but you didn't get that particular noise randomly, you in fact got it from the original work.
        Once you understand that law cares less about the thing itself, and more about the causal chain that led to it, it stops seeming magical and becomes perfectly reasonable.
        (Also, FWIW, it's not that far conceptually from code = data, but there's still tons of technical people who can't comprehend the fact that there is no code/data distinction in reality. "Code" vs "data" too isn't a property of bits, it's only a matter of perspective.)
        [-]
        amelius 56 minutes ago
        That's true, but without a warrant the law cannot see how someone fabricated that noise.
        [-]
        TeMPOraL 33 minutes ago
        In this particular case there's also the simpler, more technical/mathematical argument: you cannot possibly just "accidentally" have that exact noise. Getting those specific bits instead of any other sequence from the space of random numbers that much long requires you to extend effort at least equivalent to possession of the exact copyrighted work that happens to fall out of the XOR exercise.
        [-]
        amelius 29 minutes ago
        Except there are two people, B and C with noise. The only thing you can prove is that if you XOR both noise vectors together, you have a copyrighted work.
        Both people will say they are innocent and that the other person used the other's noise vector and the copyrighted work to produce their noise vector.
      - direwolf20 54 minutes ago
        There is no trace from a dead body back to the original act of killing, but police regularly manage to link them anyway (at least when the body had a large enough bank account).
        They do this by means such as "questioning people" and "finding evidence". For example, if you have a file on your computer describing your plan to use XOR to infringe copyright, that would be considered "evidence".
      - aleph_minus_one 2 hours ago
        > This is because legal people want something to exist that does not physically exist.
        No law exists "physically".
        Otherwise: Even in Computer Science the situation is more complicated, as is explained in the linked articles). Relevant excerpt from the first linked article:
        "Child pornography is an interesting case because I find myself, and I think many people in the computing community will find themselves, on the opposite side of the Colourful/Colour-blind gap from where I would normally be. In copyright I spend a lot of time explaining why Colour doesn't exist and it doesn't matter where the bits came from. But when it comes to child pornography, I think maybe Colour should make a difference - if we're going to ban it at all, it should matter where it came from. Whether any children were actually involved, who did or didn't give consent, in short: what Colour the bits are. The other side takes the opposite tack: child pornography is dangerous by its very existence, and it doesn't matter where it came from. They're claiming that whether some bits are child pornography or not, and if so, whether they're illegal or not, should be entirely determined by (strictly a function of) the bits themselves. Legality, at least under the obscenity law, should not involve Colour distinctions.
        [...]
        The computer science applications of Colour seem to be mostly specific to security. Suppose your computer is infected with a worm or virus. You want to disinfect it. What do you do? You boot it up from original write-protected install media. Sure, you have a copy of the operating system on the drive already, but you can't use that copy - it's the wrong Colour. Then you go through a process of replacing files, maybe examining files, swapping disks around and carefully write-protecting them; throughout, you're maintaining information on the Colour of each part of the system and each disk until you've isolated the questionable files and everything else is known to be the "not infected with virus" Colour. Note that developers of Web applications in Perl use a similar scorekeeping system to keep track of which bits are "tainted" by influence from user input.
        When we use Colour like that to protect ourselves against viruses or malicious input, we're using the Colour to conservatively approximate a difficult or impossible to compute function of the bits. Either our operating system is infected, or it is not. A given sequence of bits either is an infected file or isn't, and the same sequence of bits will always be either infected or not. Disinfecting a file changes the bits. Infected or not is a function, not a Colour. The trouble is that because any of our files might be infected including the tools we would use to test for infection, we can't reliably compute the "is infected" function, so we use Colour to approximate "is infected" with something that we can compute and manage - namely "might be infected". Note that "might be infected" is not a function; the same file can be "might be infected" or "not (might be infected)" depending on where it came from. That is a Colour.
        [...]
        Random numbers have a Colour different from that of non-random numbers. [...]
        Note my terminology - I spoke of "randomly generated" numbers. Conscientious cryptographers refuse to use the term "random numbers". They'll persistently and annoyingly correct you to say "randomly generated numbers" instead, because it's not the numbers that are or are not random, it's the source of the numbers that is or is not random. If you have numbers that are supposed to come from a random source and you start testing them to make sure they're really "random", and you throw out the ones that seem not to be, then you end up reducing the Shannon entropy of the source, violating the constraints of the one-time pad if that's relevant to your application, and generally harming security. I just threw a bunch of math terms at you in that sentence and I don't plan to explain them here, but all cryptographers understand that it's not the numbers that matter when you're talking about randomness. What matters is where the numbers came from - that is, exactly, their Colour.
        So if we think we understand cryptography, we ought to be able to understand that Colour is something real even though it is also true that bits by themselves do not have Colour. I think it's time for computer people to take Colour more seriously - if only so that we can better explain to the lawyers why they must give up their dream of enforcing Colour inside Friend Computer, where Colour does not and cannot exist."
  - nkrisc 1 hour ago
    Crimes aren’t legal just because you hide it. Intent matters.
    Law is more akin to philosophy than computer science.
  - pjc50 4 hours ago
    Whether people are liable is a question for the courts, and I suspect they simply look through the tech and ask "do you end up with a copy of the work?"
    (unless you're an AI company, in which case you can copy the whole internet just fine)
    [-]
    - amelius 4 hours ago
      But that's an unsolvable question. Just like when A is caught on camera stealing a diamond, but A turns out to have an identical twin B. So the prosecutor can't do anything.
      [-]
      - crustaceansoup 2 hours ago
        And you could say the same is true if you lost an AES key. But if they can establish a chain of evidence that shows (to whatever degree the court you're in requires) that it does contain the work, you've lost.
        How many ways could they do this? Could they note in court that they found you getting your copy from a "super secure no liability legal loophole" piracy service? Could they just get B's side, whether through subpoena or whatever mechanism you have to communicate with B? (You must, since your file is "just noise" and useless to you as it is)
      - AlienRobot 3 hours ago
        It's an unsolvable question if you assume the judge is a moron.
  - lelanthran 4 hours ago
    That's called encryption using a one time pad.
    [-]
    - amelius 4 hours ago
      That's one way of looking at it. Note that you can easily extend it to multiple people holding multiple keys, so that a=b^c^d^e, etc.
      [-]
      - lelanthran 4 hours ago
        My point was that encrypting a copyrighted work does not remove the copyright from it; the result is still copyright and not legally redistributable.
        [-]
        jagged-chisel 1 hour ago
        I think the topic is “evidence of infringement is no longer readily available” not “suddenly, there is no infringement!”
        [-]
        lelanthran 31 minutes ago
        I didn't read it that way - the OP said "liability", after all, not "detection".
        But, meh, could just be he meant "won't get caught" and not "liability"; I make mistakes in comms all the time, after all.
        amelius 3 hours ago
        Maybe, but b and c are indistinguishable from pure noise.
        [-]
        lelanthran 3 hours ago
        > Maybe, but b and c are indistinguishable from pure noise.
        That's the whole point of encryption.
        morning-coffee 3 hours ago
        Right. But the copyright was violated when you used 'a' to begin with.
  - meindnoch 3 hours ago
    Pretty sure the law doesn't care about this XOR trick.
- direwolf20 56 minutes ago
  Important note: Only if you already know which one was corrupted.
- nsteel 5 hours ago
  Also to cheaply (area) create multi-port RAMs.
  [-]
  - pjc50 5 hours ago
    How does that work?
    [-]
    - nsteel 3 hours ago
      It's similar to RAID schemes but instead of drive failure it's port unavailability. There's a reference at [1] or an FPGA-centric one at [2], but it applies to anywhere where dual/single-port rams are readily available but anything more exotic isn't.
      [1] Achieving Multi-Port Memory Performance on Single-Port Memory with Coding Techniques - https://arxiv.org/abs/2001.09599 [2] https://people.csail.mit.edu/ml/pubs/fpga12_xor.pdf
- Terr_ 5 hours ago
  See also: RAID levels that use one disk for parity. Three disks is simplest, but technically you can do more if you trust that only one will go bad at a time.
  A few months ago, I had a rare occasion of trying to explain them to a relative who had just bought a fancy NAS and wanted help setting it up.
gobdovan 7 hours ago
The XOR trick is only cool in its undefined-behavior form:
a^=b^=a^=b;
Which allegedly saves you 0.5 seconds of typing in competitive programming competitions from 20 years ago and is known to work reliably (on MinGW under Windows XP).
Bonus authenticity: use `a^=a` to zero a register in a single x86 instruction (and makes a real difference for compiler toolchains 30+ years old).
For real now, a very useful application of XOR is its relation to the Nim game [0], which comes in very handy if you need to save your village from an ancient disgruntled Chinese emperor.
[0] https://en.wikipedia.org/wiki/Nim
[-]
- MathMonkeyMan 6 hours ago
  `xor eax, eax` is less code when assembled. That's why compilers generate it.
  [-]
  - gobdovan 6 hours ago
    ..and reliably generate it for decades, but you can still manually do it in C for bonus stylistic points!
- leni536 5 hours ago
  > The XOR trick is only cool in its undefined-behavior form:
  > a^=b^=a^=b;
  I believe this is defined in C++ since C++17, but still undefined in C.
- ginko 6 hours ago
  >Bonus authenticity: use `a^=a` to zero a register in a single x86 instruction (and makes a real difference for compiler toolchains 30+ years old).
  Modern compilers will still use xor for zeroing registers on their own.
  For instance: https://godbolt.org/z/n5n35xjqx
  Variable a(register esi) is first initialized to 42 with mov and then cleared to zero using xor.
  [-]
  - gobdovan 6 hours ago
    Yes, the whole comment is tongue in cheek for pointless manual optimizations.
JKCalhoun 2 hours ago
My assumption has been that the XOR trick was closer to a thought experiment or hypothetical. That is to say, no one would ever use it in practice, but by asking an interviewee to walk through it shows they have a low-level understanding of how binary numbers work, bit-masking.
For me it falls in the obfuscated-C quadrant for code. Performance implications aside, it's just not the kind of "self-documenting" code I like lying around in my sources. (And I'll take clarity of purpose over performance every day.)
mmozeiko 8 hours ago
xor swap trick was useful in older simd (sse1/sse2) when based on some condition you want to swap values or not:
```
  tmp = (a ^ b) & mask
  a ^= tmp
  b ^= tmp
```
If mask = 0xfff...fff then a/b will be swapped, otherwise if mask = 0 then they'll remain the same.
[-]
- CJefferson 7 hours ago
  Oh, that is cool, I’ve never seen that. I might add that to an extended version of the post sometime, I’ll be sure to credit you.
- jagged-chisel 1 hour ago
  So mask is marking the bits you want swapped and leaving the others in place.
- koolala 4 hours ago
  Could this still be the ideal way for vectors of Ints in WebGL2?
- jesse__ 4 hours ago
  That's hella cute
pcherna 50 minutes ago
On the Amiga, accessing the pulldown menus caused the display to be temporarily frozen, so that the menu panel could be rendered off screen and then swapped in using the blitter and the XOR trick. It wasn’t as fast as a straight copy, but it saved on precious memory, and it rendered very cleanly.
When you released to the menu button on the mouse, it did a similar swap to restore the screen contents. In version 2.0 I optimized it to do a simple copy of the offscreen rectangle back onto the screen because before it was wasting time in order to preserve the menu pixels that we’re going to be thrown away immediately.
fuglede_ 6 hours ago
The XOR swap trick also features in the compilation/synthesis of quantum algorithms, where the XOR instruction (in the form of a CNOT gate) is fundamental in many architectures, and where native swapping need not be available.
One extension that I ran into, and which I think forms a nice problem is the following:
Just like the XOR swap trick can be used to swap to variables (and let's just say that they're bools), it can be extended to implement any permutation of the variables: suppose that the permutation is written as a composition of n transpositions (i.e., swaps of pairs), and that is the minimal number of transpositions that let's you do that. Each transposition can be implemented by 3 XORs, by the XOR swap trick for pairs, and so the full permutation can be implemented by 3n XORs. Now here's the question: Is it possible to come up with a way of doing it with less than 3n, or can we find a permutation that has a shortcut through XOR-land (not allowing any other kinds of instructions)? In other words, is XOR-swapping XOR-optimal?
I'm not going to spoil it, but only last year a paper was published in the quantum information literature that contains an answer [0]. I ended up making a little game where you get to play around with XOR-optimizing not only permutations, but general linear reversible circuits. [1]
[0] https://link.springer.com/article/10.1007/s11128-025-04831-5
[1] https://swapple.fuglede.dk
HarHarVeryFunny 2 hours ago
> Are there other XOR tricks?
I'm not sure I'd call it a "trick", but since A ^ 0 = A, and B ^ B = 0, then ((A ^ B) ^ B) = A. i.e. XOR-ing any number by the same number twice gets you back the original number.
This used to be used back in the day for cheap and nasty computer graphics, since it means that if you draw to the screen by XOR-ing with the pixels already on the screen then you can undo it, restoring the background, by doing it a second time. The "nasty" part is that XORing with what's already on the screen isn't going to look great, but for something like a rotating wire-frame figure it might be OK.
praptak 7 hours ago
Xor swap trick has perfect profile for underhanded C contests. It generally works until a specific condition triggers its failure. The condition is "the arguments are aliases", so for example XOR_SWAP(a[i], a[j]) when i=j.
[-]
- imploded_sub 4 hours ago
  TFA mentions this as an issue for the trick, whereas you rightly point out it's an evil feature instead.
Panzerschrek 6 hours ago
Such tricks were maybe useful 40 years ago while writing assembly code manually or while using a dumb compiler with no optimizations. But nowadays such tricks are near to useless. All useful ones (like optimizing division by 2 via bitshift ) are already implemented as compiler optimizations. Others shouldn't be used in order to avoid making optimizer's job harder.
[-]
- Tade0 6 hours ago
  Back when I was in college in the late 00s we were advised to not attempt to optimise using assembly unless we really found a bottleneck the compiler missed.
  [-]
  - pjc50 6 hours ago
    Correct. There are still some situations that benefit from it, but only using the extended instruction sets that compilers can't/won't generate. Even then you should at least try writing the C code and seeing if it will auto-vectorize. Even C# can auto-vectorize some cases now.
    Things like "add with saturation" and the special AES instructions.
    [-]
    - jesse__ 4 hours ago
      In my experience, relying on the compiler to auto vectorize your code is a path fraught with peril.
      It'll break eventually. If it matters, write the simd yourself. It'll probably be 2-50x better than the compiler anyways.
fch42 2 hours ago
xor'ing the elements of lists together also allows a test of "unordered equality" because 1^2^3^4^5 == (xor of any permutation of [1,2,3,4,5]).
Yes there are false positives, and the false negative of all-zero/all-equal, but the test can be useful in a "bloom filter" type case.
Have used it in dynamic firewalling rules ... one can do something pretty close to a JA3/JA4 TLS fingerprint in eBPF with that (to match the cipher lists).
torginus 4 hours ago
I would remark that modern CPUs don't use physical registers, so swapping should be just a register rename op, and this kind of bithacking only applies to old machines.
The article makes the same point as well at the end:
It is the kind of technique which might have been occasionally useful in the 1980s, but now is only useful for cute interview questions and as a curiosity.
robinsonb5 3 hours ago
This brings back memories of Chunky-to-Planar conversion on the Amiga, where the Xor trick was combined with bitshifts and masking to rearrange the bits in 8 32-bit words in reasonable time.
YasuoTanaka 4 hours ago
Used to do this in assembly back when registers actually mattered. Today it mostly hurts readability more than anything.
DeathArrow 6 hours ago
30 years ago, when I wanted to 0 a register in assembly I used something like xor ah, ah because it was a bit more performant.
[-]
- icelusxl 6 hours ago
  It still is. The CPU's register renamer can detect these instructions to not have data dependencies and can zero the register itself. It doesn't send the instruction to the execution engine meaning they use no execution resources and have zero latency.
- pjc50 6 hours ago
  IIRC that's because that instruction is one byte while a "load immediate" would have to express 0 as one or more bytes.
  (See also the wacky way in which ARM "load immediate" works)
pjmlp 3 hours ago
Another trick when SIMD is not available, is pseudo SIMD via SWAR, one example,
https://lemire.me/blog/2022/01/21/swar-explained-parsing-eig...
adonovan 2 hours ago
Another situation to avoid the XOR trick, even when registers are tight, is when swapping pointers in a garbage-collected language, since the intermediate bit patterns are invalid pointers: if a GC mark phase occurs at that moment, you might lose some objects, or spuriously mark others as live.
stinkbeetle 7 hours ago
You can use this same property of xor to make a double-linked list using just one pointer per item, which is xor of the previous and next item addresses!
[-]
- Panzerschrek 6 hours ago
  Except such "optimization" may affect performance on CPUs with speculative execution/prediction. But it should be measured.
ranger_danger 8 hours ago
> given a list where every value appears exactly twice except one, XOR all the values together and the duplicates cancel out, leaving the unique element
For some reason this reminds me of the Fourier transform. I wonder if it can be performed with XOR tricks and no complicated arithmetic?
[-]
- fluoridation 8 hours ago
  There's a more general formulation, which is that every value but one must appear even numbers of times, and the one must appear some odd number of times.
  [-]
  - twiceaday 7 hours ago
    How about every value appears 0 mod n times except one which appears 1 mod n times? :)
    Solution: xor is just addition mod 2. Write the numbers in base n and do digit-wise addition mod n (ie without carry). Very intuitive way to see the xor trick.