1.3 Addition and Subtraction

Adding two binary numbers works in the same way as adding decimal numbers. Let us first consider the sum of two bits and . In general, the sum of two bits is two bits long, because the addition can cause a carry. Bit 0 of the sum is given by the xor of both bits: . The carry of that position is 1 if and only if , i.e. .

When adding entire bit strings not just single digits, the carry at a position needs to be propagated into the next higher position. So, if denotes the carry bit position generates, the sum bit at position is given as

The carry bit of position may also influence the carry bit out of position . For example, if , then . The carry bit position generates is defined as

The following lemma proves that indeed is the sum of , , and .

Lemma 1.3.1

Proof. By value table:

Figure 1.3.2: The bits in the dashed box represent the right-hand side of Lemma 1.3.1 and the digits in the solid box the ones on the left-hand side.

Now, consider two bit strings and of equal length . When adding and we assume that there flows no carry into the least-significant digit, i.e. .

Definition 1.3.3: Addition of bit strings

Remark 1.3.4: Efficiency and practicability of our adder

While Lemma 1.3.5 shows that the addition algorithm we defined is sound, it has a significant practical disadvantage.

Our algorithm is a so-called ripple-carry adder, which means that the carry “ripples” through the positions: In order to compute , you have to have computed for which you need and so forth. This makes the depth of the actual circuit of ands, ors, xors that implements the adder linear in terms of the number of positions.

Since the circuit depth significantly influences the frequency by which you can clock the computer and therefore its speed, practical implementations use different adders, so called carry-lookahead adders that have more gates but are flatter and can therefore be clocked faster.

The following lemma shows that this definition is sound, i.e. the bit operations by which we defined the addition really produce a bit string that represents the natural number of the sum of the natural numbers that are represented by the individual bit strings.

Lemma 1.3.5: Addition of bit strings is sound

Let and be two bit strings of length . Then, .

Proof. By induction over .

Base case

Follows directly from Lemma 1.3.1.
Induction step

The induction hypothesis is

By using the definitions and lemmas we have established above, we get:

Note that the sum contains bits. If the carry bit out of the last position is , Lemma 1.3.5 tell us that the lower bits are accurate, i.e. . If the carry out of the last position is , the lower bits are not accurate and the sum is greater or equal . In this case an overflow happened and we would need all bits to accurately represent the result. Since computers internally operate on bit strings of fixed size, there is no space to store this additional bit directly in-place with the result. If we are only interested in the least significant bits when adding, we write shortly

to indicate that we ignore the carry out of the most significant place.

Notation:

lower bits of binary addition

Notation:

lower bits of binary subtraction

Let us now turn to subtraction. Instead of inventing a new algorithm for subtracting, we are going to use a trick to express subtraction by addition. This trick hinges on the fact that we are only interested in subtracting words of bits and not bit strings of arbitrary length. Let us first understand the overall idea using an example in base 10 arithmetic:

Assume we want to subtract base 10 numbers of length 3, and more concretely suppose we want to compute .

What essentially happens here is that we, instead of subtracting , we added and just considered the result modulo which is , the desired result. This of course only works if we limit the digit sequences to a certain length before (here 3).

Notation:

equivalent modulo :

We use the same trick to subtract two bit strings of length and therefore define subtraction to fulfill the following specification:

So, instead of subtracting from we are adding to . Let us first observe that the result is equivalent to modulo analogously to the example above. This means that the lower bits of and lower bits of the true difference are equal which is exactly what we wanted. The number is called the two’s complement of .

Using Lemma 1.1.6 we see that we can easily compute the two’s complement of by flipping all bits of and adding . The latter can easily be achieved by setting the carry into the least significant position () to . Additionally, if (i.e. the result of the subtraction is defined), will be greater than . Hence, the carry out of the most significant bit is set if the result of the subtraction is valid and is clear when the subtraction overflowed.

Definition 1.3.6: Subtraction

Lemma 1.3.7: Subtraction is sound

(i.e. (1.4) holds)

Proof.

Remark 1.3.8: vs

Note that is an n+1 bit binary number that is equal to . It is not the n+1 bit binary number that this the result of the subtraction. What Lemma 1.3.7 tells you is that if the most significant bit of is set, is equal to .

Remark 1.3.9: Practical Considerations

Definition 1.3.6 already suggests how an adder has to be extended to support subtraction: The right operand of the ALU must be optionally negateable and the carry bit into the least position must be presettable to either or .

Many processors provide the carry out of the last position in a special register, the so-called flag register. It is then often just called the carry bit because all the intermediate carries are unaccessible to the programmer. Some architectures negate the carry bit when subtracting, actually making it a borrow bit. For example, on an x86 machine the carry bit is clear when computing whereas an ARM processor would set the carry in that case.

Pro comment: This is why on x86 the instruction that subtracts and reading from the flags is called sbb (sub with borrow) while it is called sbc (sub with carry) on ARM processors. Such instructions are used when implementing arithmetic on bit strings that are longer than the word size of the processors.