Building a Mechanical Computer
Computers are ubiquitous, but few people understand what they are all about. Kids use them with surprising ease, but
cannot answer basic questions about the way they behave: they seem to repeat known patterns without understanding
their meaning. Others simply don't use them because "it's too complicated".
That's a sad situation: computers take quite an important place in our lives, and
understanding what they can and cannot do —at least roughly— is a prerequisite for taking part in
the debate on current societal questions, such as the regulation of the Internet.
The main ideas that led to computers are presented on this web site. There are only a handful of concepts to grasp:
nothing that complicated really. You will learn the precise definition of the word "information", and
see that a computer is just an apparatus that handles it. You will then build a computer using simple components
such as gears, pulleys, and motors.
Let's go.
Chapter 1. The Digital World
One has to understand what "information" means before one can understand what a computer is.
Everybody has an informal understanding of what the word "information" conveys, but we are going to see that it
can be defined rigorously. In order to do that, we have to first understand two other related words: digital
and analog.
Digital and analog: an example
Computers copy the data they manipulate, all the time and incredibly fast. In order to display the text you are looking
at, your computer has duplicated —in one form or another— each of its words numerous times.
Let's look at what happens when we try to duplicate data without the help of a computer.
Imagine that you have a drawing that you want to copy in order to give it to a neighbour. Your neighbour will in
turn copy the copy in order to give it to his own neighbour, and so on a third time.

Original

Copy

Copy's copy

Copy of the copy's copy
Everybody did a pretty good job: each copy is pretty similar to the previous one. Yet the difference between the original
and the last copy is striking: a computer cannot rely on this process if it has to copy data thousands of times.
Here is a simple modification that will completely change the process: draw a grid on top of the picture, and draw the copies
using this grid. The picture below shows that:
- since the new drawing has to be aligned with the grid, the first copy is awful (but could be better if we had chosen
another grid).
- however, the other copies do not cause any degradation: one can use the grid to correct small imprecisions.
This process is called digitalization; the ungridded image is said to be analog.

Original

Copy

Copy's copy

Copy of the copy's copy
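The difference between the two copying processes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the drawing is reduced to a handful of made-up points, each copy shifts every point by a small random amount (at most 0.2 grid units, a value chosen here for the example), and the "digital" copyist snaps each point back to the nearest grid intersection.

```python
import random


def noisy_copy(points, rng, noise=0.2):
    """Copy the points by hand: each coordinate drifts by at most `noise`."""
    return [(x + rng.uniform(-noise, noise), y + rng.uniform(-noise, noise))
            for x, y in points]


def snap_to_grid(points):
    """Correct a copy by moving every point to the nearest grid intersection."""
    return [(round(x), round(y)) for x, y in points]


rng = random.Random(0)  # fixed seed so the run is repeatable
original = [(1, 1), (3, 2), (5, 5), (2, 4)]  # hypothetical drawing

analog = original
digital = original
for _ in range(1000):
    analog = noisy_copy(analog, rng)                   # errors accumulate
    digital = snap_to_grid(noisy_copy(digital, rng))   # errors are corrected

print("analog :", analog)
print("digital:", digital)
```

Because the drift of a single copy is smaller than half a grid cell, snapping always recovers the exact intersection, so the digital version is unchanged after a thousand copies while the analog one has wandered away.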
Why is this process known as digitalization? Because we can now quite easily represent the drawing as a
set of digits, i.e. a number. First, number each horizontal and vertical line:
each intersection of the grid can now be identified using two numbers. Then, since a line connects those
intersections, it can be represented as the list of the pairs of numbers corresponding to those
intersections. In the example below, the zigzagging line is represented by the pairs (1,1), (1,2), (2,2),
(2,3), (4,3) and (4,4).
Putting those digits next to each other yields the number 111222234344. This number is just another way to represent the
zigzagging line: it conveys the same information. Notice that this number can be copied by hand a hundred times
by a hundred different people: at the end, you'll still get the number 111222234344.
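The encoding of the zigzagging line can be sketched as follows. The function names are made up for this illustration, and the sketch assumes that every coordinate fits in a single digit, as it does in the example.

```python
def encode_line(points):
    """Concatenate the coordinates of each intersection into one string of digits."""
    # Assumption: each coordinate is a single digit (0 to 9).
    return "".join(f"{x}{y}" for x, y in points)


def decode_line(digits):
    """Read the digits back two at a time to recover the intersections."""
    return [(int(digits[i]), int(digits[i + 1]))
            for i in range(0, len(digits), 2)]


zigzag = [(1, 1), (1, 2), (2, 2), (2, 3), (4, 3), (4, 4)]
number = encode_line(zigzag)
print(number)               # 111222234344
print(decode_line(number))  # the original list of pairs
```

Decoding gives back exactly the pairs we started from, which is another way of saying that the number and the line carry the same information.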
The Importance of Digitalization
What we just described, digitalization, is the cornerstone of computer science. Once you have described
the process you want to study as a number, you can handle it using a computer. A computer manipulates those numbers in four
ways:
- it takes a physical process (e.g. sound, pictures, keystrokes) and turns it into a number (using
a microphone, a camera, or a keyboard)
- it takes a number and turns it back into the "physical world" (e.g. images can be generated using printers and screens,
sounds are produced by speakers)
- it stores numbers. Hard drives store huge quantities of those numbers for a long period of time, but are relatively
slow. Random Access Memory (RAM) stores smaller quantities of data, but is extremely fast.
- it operates on those numbers: a computer can obviously add, subtract, multiply, and divide numbers, but can also
perform more complex operations.
A Real Example
We just gave a toy example that gives the gist of digitalization. Now let's examine a real example that you might
use frequently: the digitalization of music the way it is done in CDs.
Music is sound, and sound is a wave of vibration caused by varying air pressure. The variation of air pressure can
be detected by your ear or by a microphone. A microphone is thus a simple device that measures air
pressure; it allows one to draw a graph that describes the evolution of pressure as
time passes, such as the one sketched below.
This curve is another representation of a given sound and, as we have seen in the previous example, copying this curve
leads to problems that we can avoid if we digitalize it (i.e. turn it into a number). We know the drill: we have to fit
the curve to a grid.
We now have to convert this curve to a set of digits and, in turn, to a number. We could do it in exactly the same way we
did with the drawing above, but that would not be very efficient: there is a better way.
To each time step (i.e. to each vertical line of the grid) corresponds a measurement of the pressure at that time.
All we need in order to capture all the data contained in the curve is the sequence of those measurements. There is
no need to digitalize the time step itself: we know that the first number corresponds to the first time step, the second
number corresponds to the second time step, and so on. If we had encoded the curve the same way
we encoded the drawing, we would have added a time coordinate to each measurement, which is not necessary: we will see how
important it is to keep those lists of numbers as small as possible.
Near the origin (0,0) of the graph, the pressure is equal to 1: this is our first number. Between the first vertical
line and the second vertical line, the pressure is equal to 5: it is our second number. The next numbers are:
10, 13, 16, 18, 17, 17, 16, 14, 13, etc. Writing each of them with two digits, we can now concatenate those numbers in order to get a huge number that is
a digital representation of the original sound wave: 0105101316181717161413...
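Here is a small sketch of that concatenation. It assumes, as the example does, that every measurement is written with the same number of digits (two here), which is what makes it possible to cut the big number back into measurements; the function names are invented for the illustration.

```python
def encode_samples(samples, width=2):
    """Write every measurement with `width` digits and glue them together."""
    return "".join(f"{s:0{width}d}" for s in samples)


def decode_samples(digits, width=2):
    """Cut the digit string into chunks of `width` digits, one per time step."""
    return [int(digits[i:i + width]) for i in range(0, len(digits), width)]


samples = [1, 5, 10, 13, 16, 18, 17, 17, 16, 14, 13]
number = encode_samples(samples)
print(number)                  # 0105101316181717161413
print(decode_samples(number))  # the original measurements
```

The time steps never appear in the number: the position of each two-digit chunk tells us which time step it belongs to.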
This process might seem ineffective: the gridded curve looks pretty different from the original one. Indeed,
digitalization introduces an error (sensibly called digitalization error), but in most cases that's a price worth
paying. Furthermore, this error can be reduced as much as you want: just use a finer grid.
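To get a feel for how the error shrinks, here is a rough sketch. It is not a measurement of a real recording: it assumes a made-up pressure curve (one period of a sine wave scaled between 0 and 1) and snaps each value to the nearest of a given number of evenly spaced horizontal lines.

```python
import math


def quantize(value, levels):
    """Snap a value between 0 and 1 to the nearest of `levels` horizontal lines."""
    step = 1 / (levels - 1)
    return round(value / step) * step


def max_error(levels, n_samples=1000):
    """Largest gap between the made-up curve and its gridded version."""
    worst = 0.0
    for i in range(n_samples):
        value = 0.5 + 0.5 * math.sin(2 * math.pi * i / n_samples)
        worst = max(worst, abs(value - quantize(value, levels)))
    return worst


for levels in (4, 16, 256, 65536):
    print(f"{levels:>6} lines -> largest error {max_error(levels):.6f}")
```

The largest error can never exceed half the spacing between two lines, so each time the grid gets finer the error bound gets smaller in the same proportion.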
For each second of recording, the music encoded on a CD uses a grid that has 44100 vertical lines and 65536 horizontal
lines: that's pretty big, and that's only for one second! It means that in order to produce CD-quality music, you have to
sample air pressure 44100 times per second (i.e. at 44.1 kHz). In order to get a sense of the kind of numbers
we are talking about, let's calculate the number of digits of the number representing the 80 minutes of music held on a CD.
We have seen that we must encode air pressure for each time step (i.e. for each vertical line of our grid). Since
the grid has 65536 horizontal lines, pressure can be encoded using a number that has 5 digits (the first line will
be encoded as 00000 and the last one as 65535, and 5 digits are sufficient to represent those numbers). Now how many vertical
lines do we have? If a grid corresponding to 1 second has 44100 lines, it has n times 44100 lines for a sound
that lasts n seconds.
A CD holds at most 80 minutes (or 4800 seconds) of music, so the grid we are talking about has 44100 times 4800
vertical lines. Furthermore, CDs deliver stereo recordings, meaning that the left and right
speakers emit different sounds: we thus have to double that number. In the end, the number necessary to encode the music
of one CD has a number of digits equal to:
      5              (digits per air pressure sample)
  x   44100          (samples per second)
  x   4800           (length of a CD in seconds)
  x   2              (because of stereo)
  =   2,116,800,000
That's a number with... 2 billion digits! If you wrote it down, it would span across the Atlantic!
...And yet this number can be written down on a cheap, 12 cm plastic disk. Needless to say,
digits are written on the disk using a technology that packs them extremely densely.
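The calculation above is easy to check with a few lines of Python. The constant names are chosen for this sketch only; the values are the ones given in the text.

```python
HORIZONTAL_LINES = 65536
DIGITS_PER_SAMPLE = len(str(HORIZONTAL_LINES - 1))  # 65535 needs 5 digits
SAMPLES_PER_SECOND = 44100
CD_LENGTH_SECONDS = 80 * 60                         # 80 minutes
CHANNELS = 2                                        # stereo: left and right

total_digits = (DIGITS_PER_SAMPLE * SAMPLES_PER_SECOND
                * CD_LENGTH_SECONDS * CHANNELS)
print(f"{total_digits:,} digits")  # 2,116,800,000 digits
```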
So What is Information?
The concept of "information" has been formalized in a body of knowledge known as Information Theory.
It basically says that the information contained in some data is the number of digits required to write
the number obtained once you have digitalized that data. The unit of information is the dit
(for decimal digit, the kind of digit we use every day), just like the unit for measuring length
is the meter.
We have just measured the information stored on a CD to be about 2 billion dits. Other (very) often-used
units of information are the bit (1 bit equals about 0.3 dit), and the byte
(1 byte equals 8 bits). A CD thus contains about 2.1 billion divided by 0.3 bits (about 7 billion bits),
which is roughly 900 million bytes, denoted 900MB (megabytes, or millions of bytes).
In fact, a CD contains about 800MB, not 900MB. The difference is caused by a simplification made in the way numbers
are encoded: we should have used binary numbers instead of the decimal numbers we use every day. This,
among other things, is the topic of the next chapter.
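The unit conversion, and the gap between the two figures, can be sketched as follows. The first part converts our decimal digit count to bits (1 dit is log2(10) bits, so 1 bit is about 0.3 dit). The second part assumes that each sample is stored directly in binary on 16 bits, which is enough because 2 to the power 16 is exactly 65536; the details are left to the next chapter.

```python
import math

SAMPLES_ON_CD = 44100 * 4800 * 2   # samples per second x seconds x stereo

# Estimate from the decimal encoding: 5 dits per sample.
BITS_PER_DIT = math.log2(10)       # about 3.32, i.e. 1 bit is about 0.3 dit
dits = 5 * SAMPLES_ON_CD
bytes_from_dits = dits * BITS_PER_DIT / 8
print(f"from dits  : {bytes_from_dits / 1e6:.0f} million bytes")

# Assumption: binary encoding, 16 bits per sample (2**16 == 65536 lines).
bytes_from_bits = 16 * SAMPLES_ON_CD // 8
print(f"from binary: {bytes_from_bits / 1e6:.0f} million bytes")
```

The decimal estimate is a little too generous because 5 digits can count up to 99999, while only 65536 values are ever needed for a sample.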
Notice that this definition of information corresponds to our intuitive feeling. For
instance:
- Two CDs should be able to store twice as much information as one. That's
indeed the case: since a CD is just a way to store a number containing 2 billion
digits, 2 CDs allow one to store 4 billion digits, i.e. 4 billion
dits.
- The better the quality of an image, the higher the amount of information
needed to represent it. Indeed: remember how we crudely encoded an image
using a grid? There's a simple way to improve the quality of the digitalization
process: we can simply use a finer grid. Notice what would happen to the
number representing the drawing: we now have to encode more intersection
coordinates, hence the number of digits has to increase, i.e. more
information is needed.
We have seen that numbers and digits have an extremely important role in the way
information is processed by computers. Unfortunately, the way we have represented
numbers so far is not the way digital devices do it; the
next chapter will set things straight.