Note: You are viewing this blog post without the intended style information, which may result in formatting issues.

I've seen a lot of people expressing concern that Bitcoin mining ASICs are going to lead to some sort of password cracking apocalypse.

They won't.

An "Application Specific Integrated Circuit" (ASIC) is a chip that is designed to do one thing quickly and efficiently. Think of it as a single program that has been made into a hardware and cannot be changed once manufactured. In this case the "Application" is Bitcoin mining, at the heart of which is a hash algorithm called "double SHA256" (DSHA256). There are many password hashing schemes in use, but only two well known schemes can benefit from a fast DSHA256 engine - "salted SHA256" (SSHA256) and plain SHA256. Neither of these schemes is particularly common.

The general operation of a password cracker is pretty simple:

  1. Load the password hashes into memory.
  2. Generate a candidate password.
  3. Hash the candidate password.
  4. Check the hash against those in memory.
  5. Repeat steps 2-4 over and over as fast as possible, outputting results as they are found.

For a moment, let's assume that Bitcoin mining ASICs just takes in a block of data and returns its DSHA256 hash. For SHA256 and SSHA256 hashes, using the DSHA256 engine is fairly straightforward, use a CPU or GPU to compute the SHA256 of each hash (since DSHA256(password) = SHA256(SHA256(password))) then follow the plan above. To simplify things, we'll round the time the ASIC takes to hash data down to zero. Under this simplification, where is the process still slow?

Referring back to our model of a password cracker, we see some steps that must be performed for each hash which are not handled by the ASIC, specifically steps 2 and 4. The CPU is going to need to get involved here - if a quad core CPU clocked at 3.0GHz took one cycle (in reality it will be thousands) to generate each candidate password and another cycle to check the hash, we'd be operating at 6 billion passwords per second. Sounds great, but a high end GPU running oclHashcat can try about 1 billion passwords per second. Our infinitely fast ASIC is only giving us a factor of six speedup vs a GPU! Unfortunately a few steps were left out of the our password cracker model, and there are other bottlenecks to contend with even though our CPU and ASIC are awesome. Adding in the missing steps, we have this model:

  1. Load the password hashes into memory.
  2. Generate a candidate password.
  3. Transfer the candidate password to the ASIC.
  4. Hash the candidate password.
  5. Transfer the hash back from the ASIC.
  6. Check the hash against those in memory.
  7. Repeat steps 2-6 over and over as fast as possible until you have enough results.

It turns out that steps 3 and 5 are slow. Really slow. ASICs are usually connected to a computer via a USB interface that emulates a serial port for easy interfacing. These usually operate at speeds of around 115.2 Kbps, but what if we had the full 480 Mbps of USB 2.0? Assume each password is 8 bytes and each hash is 32 bytes. USB 2.0 is a half duplex protocol, so we have 40 bytes that need to be transferred for each password. 480 Mbps is 480 million bits per second or 60 million bytes per second, divided by 40 bytes gives us an upper bound of 1.5 million passwords per second. The limited speed at which we can get passwords to the ASIC means that GPUs will run circles around it.

This probably leaves you wondering how ASICs can do tens or hundreds of billions of hashes per second given the limited bandwidth. The answer is fairly simple - they do more than just "data in, hash out". A Bitcoin block header contains the following data:

Field Size Notes
version 4B Currently always 1
hashPrevBlock 32B DSHA256 of previous block
hashMerkleRoot 32B DSHA256 of all hashes related to this block
timestamp 4B Seconds since Midnight of January 1st, 1970
bits 4B Compact representation of current difficulty
nonce 4B Chosen to make the hash of the block headers less than the current target value

We send the ASIC the version, hashPrevBlock, hashMerkleRoot, bits, range of timestamps (they do not have to be exactly correct) along with a difficulty target (in a pool an easier "share target" is used to prove the miner is doing work). It then cycles though all possible combinations of nonce and timestamp in the range specified until it either meets the difficulty target or has run out of values to try and reports back. New values are then sent (the coinbase transaction can be tweaked to cause the hashMerkleRoot to change) and the cycle repeats. This scheme dramatically reduces the amount of data that needs to be exchanged, but makes the device totally useless for hashing anything other than block headers.

Even GPUs are limited in password cracking speed by bandwidth, and as alluded to earlier, the speed at which the CPU can generate candidate passwords. The solution is a similar - reduce the required bandwidth by generating the data to be hashed partially on the GPU. In oclHashcat there is an on-GPU rule-based mangling engine that can generate thousands of variations on entries in a wordlist, as well as on-GPU brute force guessing.