Building a computer with a screwdriver

Computation from another angle

Building a Mechanical Computer

Computers are ubiquitous, but few people understand what it's all about. Kids use them with surprising ease, but cannot answer basic questions about the way they behave: they seem to repeat known patterns without understanding their meaning. Others simply don't use them because "it's too complicated".

That's a sad situation: computers take quite an important place in our lives, and understanding what they can and cannot do —at least roughly— is a prerequisite for taking part in the debate on current societal questions, such as the regulation of the Internet.

The main ideas that led to computers are presented on this web site. There are only a handful of concepts to grasp: nothing that complicated, really. You will learn the precise definition of the word "information", and see that a computer is just an apparatus that handles it. You will then build a computer using simple components such as gears, pulleys, and motors.

Let's go.

Chapter 1. The Digital World

One has to understand what "information" means before one can understand what a computer is.

Everybody has an informal understanding of what the word "information" conveys, but we are going to see that it can be defined rigorously. In order to do that, we have to first understand two other related words: digital and analog.

Digital and analog: an example

Computers copy the data they manipulate, all the time and incredibly fast. In order to display the text you are looking at, your computer has duplicated —in one form or another— each of its words numerous times. Let's look at what happens when we try to duplicate data without the help of a computer.

Imagine that you have a drawing that you want to copy in order to give it to a neighbour. Your neighbour will in turn copy the copy in order to give it to his own neighbour, and so on a third time.

[Figure: the original drawing, its copy (altered), the copy's copy (very altered), and the copy of the copy's copy (very altered).]

Everybody did a pretty good job: each copy is pretty similar to the previous one. Yet the difference between the original and the last copy is striking: a computer cannot rely on this process if it has to copy data thousands of times.

Here is a simple modification that will completely change the process: draw a grid on top of the picture, and draw the copies using this grid. The picture below shows that:

  • since the new drawing has to follow the grid, the first copy is awful (but it could be better if we had chosen another grid).
  • however, the later copies do not cause any further degradation: one can use the grid to correct small imprecisions.
This process is called digitalization; the ungridded image is said to be analog.

[Figure: the original drawing, followed by three successive grid copies (the copy, the copy's copy, and the copy of the copy's copy).]
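The difference between the two copying processes can be simulated with a few lines of Python. This is only a sketch under stated assumptions: the drawing is reduced to a handful of points, each copyist is assumed to misplace every point by at most 0.3 grid cells (the hypothetical `MAX_WOBBLE`), and the grid lines are one unit apart. None of these values come from the pictures above.

```python
import random

MAX_WOBBLE = 0.3  # assumed copying error, smaller than half a grid cell


def copy_by_hand(points, rng):
    """Copy every point with a small random error, as a person would."""
    return [
        (x + rng.uniform(-MAX_WOBBLE, MAX_WOBBLE),
         y + rng.uniform(-MAX_WOBBLE, MAX_WOBBLE))
        for x, y in points
    ]


def snap_to_grid(points):
    """Move every point to the nearest grid intersection."""
    return [(round(x), round(y)) for x, y in points]


def simulate(original, generations=3, seed=0):
    """Return (last analog copy, first grid copy, last grid copy)."""
    rng = random.Random(seed)
    analog = list(original)
    digital = snap_to_grid(original)  # the first, "awful" grid copy
    first_digital = list(digital)
    for _ in range(generations):
        analog = copy_by_hand(analog, rng)                  # errors pile up
        digital = snap_to_grid(copy_by_hand(digital, rng))  # errors corrected
    return analog, first_digital, digital


original = [(1.2, 0.8), (1.4, 2.3), (2.1, 2.2)]
analog, first_grid_copy, last_grid_copy = simulate(original, generations=100)
print("analog copy after 100 generations:", analog)
print("grid copy unchanged:", first_grid_copy == last_grid_copy)
```

The grid copy survives because each error is smaller than half a cell, so snapping always brings a point back to the intersection it started from. With larger errors, a point could land closer to a neighbouring intersection and the copy would change.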

Why is this process known as digitalization? Because we can now quite easily represent the drawing as a set of digits, i.e. a number. First, number each horizontal and vertical line: each intersection of the grid can now be identified using two numbers. Then, since a line connects those intersections, it can be represented as the list of the pairs of numbers corresponding to those intersections. In the example below, the zigzagging line is represented by the pairs (1,1), (1,2), (2,2), (2,3), (4,3) and (4,4). Putting those digits next to each other yields the number 111222234344. This number is just another way to represent the zigzagging line: it conveys the same information.

Notice that this number can be copied by hand a hundred times by a hundred different people: at the end, you'll still get the number 111222234344. This is possible because there is only a finite set of digits, so anybody can cope with the differences between two handwritings. There is a (small) chance that one person has handwriting so bad that the next person introduces a mistake, but this is pretty unlikely to happen with a computer: computers write those numbers in such a way that the probability of misreading a digit is, in practice, very small.

[Figure: the zigzagging line drawn on the numbered grid.]
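The encoding of the zigzagging line can be reproduced in Python. This is a minimal sketch: the function names `encode_line` and `decode_line` are made up for illustration, and it assumes every grid line number is a single digit (1 to 9), as in the example.

```python
def encode_line(intersections):
    """Write the digits of every (first, second) pair next to each other."""
    return "".join(f"{first}{second}" for first, second in intersections)


def decode_line(number):
    """Read the digits back two by two to recover the intersections."""
    digits = [int(character) for character in number]
    return list(zip(digits[0::2], digits[1::2]))


zigzag = [(1, 1), (1, 2), (2, 2), (2, 3), (4, 3), (4, 4)]
number = encode_line(zigzag)
print(number)               # 111222234344
print(decode_line(number))  # the six pairs we started from
```

Being able to decode the number back into the same pairs is what it means for the number to convey the same information as the line.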

The Importance of Digitalization

What we just described, digitalization, is the cornerstone of computer science. Once you have described the process you want to study as a number, you can handle it using a computer. A computer manipulates those numbers in four ways:

  • it takes a physical process (e.g. sound, pictures, keystrokes) and turns it into a number (through a connected microphone, camera, or keyboard).
  • it takes a number and turns it back into something in the "physical world" (e.g. images are produced by printers and screens, sounds by speakers).
  • it stores numbers. Hard drives store huge quantities of numbers for long periods of time, but are relatively slow. Random Access Memory (RAM) stores smaller quantities of data, but is extremely fast.
  • it operates on those numbers: a computer can obviously add, subtract, multiply, and divide numbers, but it can also perform more complex operations.

A Real Example

We just gave a toy example that gives the gist of digitalization. Now let's examine a real example that you might use frequently: the digitalization of music as it is done on CDs.

Music is sound, and sound is a wave of vibration caused by varying air pressure. The variation of air pressure can be detected by your ear or by a microphone. A microphone is thus a simple device that measures air pressure (in fact, the mic converts pressure to voltage, a property associated with electricity, but that detail is not important for the matter we want to discuss); it allows one to draw a graph that describes the evolution of pressure as time passes, such as the one sketched below.

Sound signal

This curve is another representation of a given sound and, as we have seen in the previous example, copying this curve leads to problems that we can avoid if we digitalize it (i.e. turn it into a number). We know the drill: we have to fit the curve to a grid.

Gridded sound signal

We now have to convert this curve to a set of digits and, in turn, to a number. We could do it in exactly the same way we did with the drawing above, but that would not be very efficient: there is a better way.

To each time step (i.e. to each vertical line of the grid) corresponds a measurement of the pressure at that time. All we need in order to capture all the data contained in the curve is the sequence of those measurements. There is no need to digitalize the time step itself: we know that the first number corresponds to the first time step, the second number corresponds to the second time step, and so on. If we had encoded the curve the same way we encoded the drawing, we would have encoded an extra number for each time step, which is not necessary: we will see how important it is to keep those lists of numbers as small as possible.

Near the origin (0,0) of the graph, the pressure is equal to 1: this is our first number. Between the first vertical line and the second vertical line, the pressure is equal to 5: it is our second number. The next numbers are: 10, 13, 16, 18, 17, 17, 16, 14, 13, etc. We can now concatenate those numbers in order to get a huge number that is a digital representation of the original sound wave: 0105101316181717161413... Note that we encoded the numbers "1" and "5" as "01" and "05" respectively. This is because writing "1" next to "5" gives "15", which could be read as the single number 15 instead of the two numbers 1 and 5. Writing every measurement with the same number of digits removes the ambiguity.
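The fixed-width trick can be written down as a short Python sketch. The function names and the `width` parameter are illustrative assumptions; the measurements are the ones read off the gridded curve above.

```python
def encode_samples(samples, width=2):
    """Concatenate the samples, padding each one with zeros to `width` digits."""
    for sample in samples:
        if not 0 <= sample < 10 ** width:
            raise ValueError(f"{sample} does not fit in {width} digits")
    return "".join(f"{sample:0{width}d}" for sample in samples)


def decode_samples(text, width=2):
    """Cut the digit string into chunks of `width` digits and read each one."""
    return [int(text[i:i + width]) for i in range(0, len(text), width)]


pressure = [1, 5, 10, 13, 16, 18, 17, 17, 16, 14, 13]
encoded = encode_samples(pressure)
print(encoded)                  # 0105101316181717161413
print(decode_samples(encoded))  # the original list of measurements
```

Because every measurement takes exactly two digits, the position of a digit in the string tells us which time step it belongs to, so the time steps never need to be stored.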

This process might seem imprecise: the gridded curve looks pretty different from the original one. Indeed, digitalization introduces an error (sensibly called digitalization error), but in most cases that's a price worth paying. Furthermore, this error can be made as small as you want: just use a finer grid.
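The effect of a finer grid can be checked numerically. The sketch below is an illustration under assumptions: the sound is replaced by a made-up sine-shaped pressure curve, and `step` is the hypothetical distance between two horizontal lines of the grid. Snapping a value to the nearest line can never move it by more than half a step.

```python
import math


def quantize(value, step):
    """Snap a value to the nearest horizontal grid line."""
    return round(value / step) * step


def max_error(step, n_samples=1000):
    """Largest gap between a made-up pressure curve and its gridded version."""
    worst = 0.0
    for i in range(n_samples):
        pressure = 10 + 8 * math.sin(2 * math.pi * i / n_samples)
        worst = max(worst, abs(pressure - quantize(pressure, step)))
    return worst


for step in (1.0, 0.1, 0.01):
    print(f"grid step {step}: largest error {max_error(step):.4f}")
```

Each time the grid becomes ten times finer, the bound on the error becomes ten times smaller, at the cost of needing one more digit per measurement.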

For each second of recording, the music encoded on a CD uses a grid that has 44100 vertical lines and 65536 horizontal lines (we will see shortly why such an odd number has been chosen): that's pretty big, and that's only for one second! It means that in order to produce music with CD quality, you have to sample air pressure 44100 times per second (i.e. at 44.1 kHz). To get a sense of the kind of numbers we are talking about, let's calculate the number of digits of the number representing the 80 minutes of music held on a CD.

We have seen that we must encode air pressure for each time step (i.e. for each vertical line of our grid). Since the grid has 65536 horizontal lines, pressure can be encoded using a number that has 5 digits (the first line will be encoded as 00000 and the last one as 65535, and 5 digits are sufficient to represent those numbers). Now how many vertical lines do we have? If a grid corresponding to 1 second has 44100 lines, it has n times 44100 lines for a sound n times as long. A CD holds at most 80 minutes (or 4800 seconds) of music, so the grid we are talking about has 44100 times 4800 vertical lines. Furthermore, CDs deliver stereo recordings, meaning that the left and right speakers emit different sounds: we thus have to double that number. In the end, the number necessary to encode the music of one CD has a number of digits equal to:

5 (digits per air pressure sample)
× 44100 (samples per second)
× 4800 (length of a CD in seconds)
× 2 (because of stereo)
= 2,116,800,000
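The same calculation can be reproduced in Python. The constant names are chosen for this sketch; the values are the ones given in the text.

```python
HORIZONTAL_LINES = 65536     # possible pressure values per sample
SAMPLES_PER_SECOND = 44100   # vertical lines per second of music
CD_LENGTH_SECONDS = 80 * 60  # 80 minutes
CHANNELS = 2                 # stereo: left and right

# The largest value to write is 65535, which takes 5 decimal digits.
digits_per_sample = len(str(HORIZONTAL_LINES - 1))

total_digits = (digits_per_sample * SAMPLES_PER_SECOND
                * CD_LENGTH_SECONDS * CHANNELS)
print(f"{total_digits:,}")   # 2,116,800,000
```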

That's a number with... 2 billion digits! If you wrote it down, it would span the Atlantic! ...And yet this number can be written down on a cheap, 12 cm plastic disk. Needless to say, the digits are written on the disk using a technology that packs them extremely densely: they are written along a 4.5 km (3 miles) long spiral, and along that spiral each digit takes about 3 microns (i.e. 3 thousandths of a millimeter). For reference, a human hair has a diameter of about 50 microns.

So What is Information?

The concept of "information" has been formalized in a body of knowledge known as Information Theory. It basically says that the information contained in some data is the number of digits required to write the number obtained once you have digitalized that data. The unit of information is the dit (for decimal digit, the kind of digit we use every day), just like the unit for measuring length is the meter. (This is a very simplified overview of what information theory really tells us. Explaining it more precisely would require studying a field of mathematics known as probability, which is only indirectly related to our final aim, the explanation of computers and computation.)

We have just measured the information stored on a CD to be about 2 billion dits. Other (very) often-used units of information are the bit (1 bit equals about 0.3 dit) and the byte (1 byte equals 8 bits). In fact, for practical reasons we will soon explore, almost everybody measures information in bits and bytes, not dits. A CD thus contains 2.1 billion divided by 0.3 bits (about 7 billion bits), which is roughly 900 million bytes, denoted 900MB (megabytes, or millions of bytes).
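The unit conversion can be sketched in Python. The only ingredient not stated in the text is the exact conversion factor: one decimal digit is worth log2(10) bits, about 3.32, which is the same as saying that one bit is worth about 0.3 dit. The function names are made up for this sketch.

```python
import math

BITS_PER_DIT = math.log2(10)  # about 3.32, so 1 bit is about 0.3 dit


def dits_to_bits(dits):
    return dits * BITS_PER_DIT


def bits_to_bytes(bits):
    return bits / 8


cd_dits = 2_116_800_000
cd_bits = dits_to_bits(cd_dits)
cd_megabytes = bits_to_bytes(cd_bits) / 1_000_000

print(f"1 bit is about {1 / BITS_PER_DIT:.2f} dit")
print(f"{cd_bits / 1e9:.1f} billion bits, about {cd_megabytes:.0f} MB")
```

With the exact factor the result is roughly 7 billion bits, or a little under 900 million bytes, which agrees with the rounded figure of 900MB.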

In fact, a CD contains 800MB, not 900MB. The difference comes from a simplification we made in the way numbers are encoded: CDs use binary numbers instead of the decimal numbers we use every day (apart from this non-critical detail, CDs really are encoded the way we have explained). This, among other things, is the topic of the next chapter.

Notice that this definition of information corresponds to our intuitive feeling. For instance:
  • Two CDs should store twice as much information as one. That's indeed the case: since a CD is just a way to store a number containing 2 billion digits, 2 CDs allow one to store 4 billion digits, i.e. 4 billion dits.
  • The better the quality of an image, the higher the amount of information needed to represent it. Indeed: remember how we crudely encoded an image using a grid? There's a simple way to improve the quality of the digitalization process: we can use a finer grid. Notice what would happen to the number representing the drawing: we now have to encode more intersection coordinates, hence the number of digits has to increase, i.e. more information is needed. (You might hear that your digital camera has, say, 10 megapixels. This means that the grid used to digitalize the pictures you take has 10 million cells, i.e. about 2500 lines and 3500 columns. As you now understand, the higher the number of megapixels, the better the image quality, but the sooner the memory card that holds your pictures will be full, since each picture requires more information.)

We have seen that numbers and digits play an extremely important role in the way information is processed by computers. Unfortunately, the way we have represented numbers so far is not the way digital devices do it; the next chapter will set things straight.

Table of Contents

1. The Digital World
2. The Binary System