1.3 Addition and Subtraction

Adding two binary numbers works in the same way as adding decimal numbers. Let us first consider the sum of two bits  and . In general, the sum of two bits is two bits long, because the addition can cause a carry. Bit 0 of the sum is given by the xor of both bits: . The carry  of that position is 1 if and only if , i.e. .

When adding entire bit strings not just single digits, the carry at a position needs to be propagated into the next higher position. So, if  denotes the carry bit position  generates, the sum bit at position  is given as

The carry bit of position  may also influence the carry bit out of position . For example, if , then . The carry bit position  generates is defined as

The following lemma proves that indeed is the sum of , and .

Lemma 1.3.1

Proof. By value table:

Figure 1.3.2: The bits in the dashed box represent the right-hand side of Lemma 1.3.1 and the digits in the solid box the ones on the left-hand side.

Now, consider two bit strings  and  of equal length . When adding  and  we assume that there flows no carry into the least-significant digit, i.e. .

Definition 1.3.3: Addition of bit strings

Remark 1.3.4: Efficiency and practicability of our adder

While Lemma 1.3.5 shows that the addition algorithm we defined is sound, it has a significant practical disadvantage.

Our algorithm is a so-called ripple-carry adder, which means that the carry “ripples” through the positions: In order to compute , you have to have computed  for which you need  and so forth. This makes the depth of the actual circuit of ands, ors, xors that implements the adder linear in terms of the number of positions.

Since the circuit depth significantly influences the frequency by which you can clock the computer and therefore its speed, practical implementations use different adders, so called carry-lookahead adders that have more gates but are flatter and can therefore be clocked faster.

The following lemma shows that this definition is sound, i.e. the bit operations by which we defined the addition really produce a bit string that represents the natural number of the sum of the natural numbers that are represented by the individual bit strings.

Lemma 1.3.5: Addition of bit strings is sound

Let  and  be two bit strings of length . Then, .

Proof. By induction over .

  1. Base case

    Follows directly from Lemma 1.3.1.

  2. Induction step

    The induction hypothesis is

    By using the definitions and lemmas we have established above, we get:

Note that the sum contains  bits. If the carry bit out of the last position is , Lemma 1.3.5 tell us that the lower  bits are accurate, i.e. . If the carry out of the last position is , the lower  bits are not accurate and the sum is greater or equal . In this case an overflow happened and we would need all  bits to accurately represent the result. Since computers internally operate on bit strings of fixed size, there is no space to store this additional bit directly in-place with the result. If we are only interested in the least significant bits when adding, we write shortly

to indicate that we ignore the carry out of the most significant place.

Notation:
lower  bits of binary addition
Notation:
lower  bits of binary subtraction

Let us now turn to subtraction. Instead of inventing a new algorithm for subtracting, we are going to use a trick to express subtraction by addition. This trick hinges on the fact that we are only interested in subtracting words of  bits and not bit strings of arbitrary length. Let us first understand the overall idea using an example in base 10 arithmetic:

Assume we want to subtract base 10 numbers of length 3, and more concretely suppose we want to compute .

What essentially happens here is that we, instead of subtracting , we added  and just considered the result modulo  which is , the desired result. This of course only works if we limit the digit sequences to a certain length before (here 3).

Notation:
equivalent modulo :

We use the same trick to subtract two bit strings of length  and therefore define subtraction to fulfill the following specification:

So, instead of subtracting  from  we are adding to . Let us first observe that the result is equivalent to modulo  analogously to the example above. This means that the lower  bits of and lower  bits of the true difference are equal which is exactly what we wanted. The number is called the two’s complement of .

Using Lemma 1.1.6 we see that we can easily compute the two’s complement of  by flipping all bits of  and adding . The latter can easily be achieved by setting the carry into the least significant position () to . Additionally, if (i.e. the result of the subtraction is defined), will be greater than . Hence, the carry out of the most significant bit is set if the result of the subtraction is valid and is clear when the subtraction overflowed.

Definition 1.3.6: Subtraction

Lemma 1.3.7: Subtraction is sound

(i.e. (1.4) holds)

Proof.

Remark 1.3.8: vs
Note that is an n+1 bit binary number that is equal to . It is not the n+1 bit binary number that this the result of the subtraction. What Lemma 1.3.7 tells you is that if the most significant bit of is set, is equal to .
Remark 1.3.9: Practical Considerations

Definition 1.3.6 already suggests how an adder has to be extended to support subtraction: The right operand of the ALU must be optionally negateable and the carry bit into the least position must be presettable to either  or .

Many processors provide the carry out of the last position in a special register, the so-called flag register. It is then often just called the carry bit because all the intermediate carries are unaccessible to the programmer. Some architectures negate the carry bit when subtracting, actually making it a borrow bit. For example, on an x86 machine the carry bit is clear when computing  whereas an ARM processor would set the carry in that case.

Pro comment: This is why on x86 the instruction that subtracts and reading  from the flags is called sbb (sub with borrow) while it is called sbc (sub with carry) on ARM processors. Such instructions are used when implementing arithmetic on bit strings that are longer than the word size of the processors.