That is due to the brand new CPU features in 64-bit processors that make them really good at mathematical operations, but that 32-bit software can't use.
Here are a couple of shamefully copy-pasted features from the Wikipedia article on 64-bit: http://en.wikipedia.org/wiki/X86-64
64-bit integer capability: All general-purpose registers (GPRs) are expanded from 32 bits to 64 bits, and all arithmetic and logical operations, memory-to-register and register-to-memory operations, etc. can now operate directly on 64-bit integers. Pushes and pops on the stack are always in 8-byte strides, and pointers are 8 bytes wide.
The ability to work with 8 bytes (64 bits) at once instead of 4 (32 bits) makes it possible to move twice as much data per instruction between CPU and RAM. (And only CPU and RAM. No GPUs here.)
The result is that if you want to move two 64-bit numbers from RAM to the CPU, 64-bit software can do it in two load operations (64 bits at a time), while 32-bit software needs four (32 bits at a time).
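The same halving shows up in arithmetic. Here is a rough Python sketch of how 32-bit code has to handle a 64-bit addition: each number is split into two 32-bit halves, the low halves are added first, and the carry is passed into the high halves. The function names (split64, add64_with_32bit_ops, add64_native) are made up for this illustration, and Python integers only simulate the register widths here.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def split64(value):
    """Split a 64-bit unsigned integer into (low, high) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def add64_with_32bit_ops(a, b):
    """Add two 64-bit unsigned integers using only 32-bit-wide pieces.

    This mimics what 32-bit code does: four 32-bit words are fetched
    (two per number) and two additions are chained through a carry.
    """
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)

    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32          # 1 if the low halves overflowed 32 bits
    lo = lo_sum & MASK32

    hi = (a_hi + b_hi + carry) & MASK32   # wraps around like real hardware
    return (hi << 32) | lo


def add64_native(a, b):
    """What a 64-bit CPU does in a single add on 64-bit registers."""
    return (a + b) & MASK64
```

Both functions give the same answer. The difference is that the 32-bit version needs twice the loads and an extra add with carry to get there.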
Registers are small memory spaces inside the CPU where you store the numbers being worked on. Having more registers means you can keep more numbers there to crunch. In a simplified picture, if you have 2 registers and need to sum three terms that all have to sit in registers, at some point you have to waste time moving partial results out to RAM and back because they don't all fit.
If you had 4 registers for the same sum, everything would fit and you could do it all without leaving the CPU.
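To make the register point concrete, here is a toy simulation in Python. It is only a sketch under loud assumptions: the machine, its four instructions (load, store, add, mul) and the two hand-written programs are invented for illustration and don't correspond to any real CPU. A plain three-term sum can often be done without spilling, so the sketch uses a slightly bigger calculation, (a + b) * (c + d), where the effect is easier to see. It counts how many times each program has to touch RAM.

```python
def run(program, memory, num_registers):
    """Run a toy load-store program; return (value in r0, RAM accesses)."""
    regs = [0] * num_registers
    accesses = 0
    for op, x, y in program:
        if op == "load":       # load  reg, addr : reg <- RAM[addr]
            regs[x] = memory[y]
            accesses += 1
        elif op == "store":    # store reg, addr : RAM[addr] <- reg
            memory[y] = regs[x]
            accesses += 1
        elif op == "add":      # add dst, src : dst <- dst + src
            regs[x] += regs[y]
        elif op == "mul":      # mul dst, src : dst <- dst * src
            regs[x] *= regs[y]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs[0], accesses


# (a + b) * (c + d) with only 2 registers: the partial result a + b
# has to be parked in RAM ("tmp") to make room for c and d.
TWO_REGISTER_PROGRAM = [
    ("load", 0, "a"), ("load", 1, "b"), ("add", 0, 1),
    ("store", 0, "tmp"),                    # spill a + b to RAM
    ("load", 0, "c"), ("load", 1, "d"), ("add", 0, 1),
    ("load", 1, "tmp"),                     # fetch a + b back
    ("mul", 0, 1),
]

# The same calculation with 4 registers: everything stays in the CPU.
FOUR_REGISTER_PROGRAM = [
    ("load", 0, "a"), ("load", 1, "b"),
    ("load", 2, "c"), ("load", 3, "d"),
    ("add", 0, 1), ("add", 2, 3), ("mul", 0, 2),
]

RAM = {"a": 2, "b": 3, "c": 4, "d": 5, "tmp": 0}

result_2, accesses_2 = run(TWO_REGISTER_PROGRAM, dict(RAM), 2)
result_4, accesses_4 = run(FOUR_REGISTER_PROGRAM, dict(RAM), 4)
```

Both programs compute the same value, but the 2-register version makes two extra trips to RAM for the spilled partial result. Going from the 8 general-purpose registers of 32-bit x86 to the 16 of x86-64 helps in the same way.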