
Programming for all, part 1: An introduction to writing for computers

Telling a computer what to do is actually quite simple when broken down.


Computers are ubiquitous in modern life. They offer us portals to information and entertainment, and they handle the complex tasks needed to keep many facets of modern society running smoothly. Chances are, there is not a single person in Ars' readership whose day-to-day existence doesn't rely on computers in one manner or another. Despite this, very few people know how computers actually do the things that they do. How does one go from what is really nothing more than a collection—a very large collection, mind you—of switches to the things we see powering the modern world?

We've arranged a civilization in which most crucial elements profoundly depend on science and technology. We have also arranged things so that almost no one understands science and technology. This is a prescription for disaster. We might get away with it for a while, but sooner or later this combustible mixture of ignorance and power is going to blow up in our faces.

—Carl Sagan

At their base, even though they run much of the world, computers are one thing: stupid. A computer knows nothing. Its brain is little more than a large collection of on/off switches. The fact that you can play video games, browse the Internet, and pump gas at a gas station is thanks to the programs the computers have been given by a human. In this article, we'll take a look at some of the basic concepts of computer programming: how a person teaches a computer something and how the ideas encapsulated in the program go from something we can understand to something a computer understands.

First, it needs to be said that programming is not some black art, something arcane that only the learned few may ever attempt. It is a method of communication whereby a person tells a computer what, exactly, they want it to do. Computers are picky and stupid, but they will indeed do exactly as they are told. Therefore, each program you write should be like an elegant recipe that anyone—including a computer—can follow. Ideally, each step in a program should be clearly described and, if it is complicated, broken down into smaller steps to remove all doubt about what is to happen.

Programming is about problem solving and thinking in a methodical manner. Like many other disciplines, it requires someone to be able to look at a complex problem and start whittling away at it, solving easier pieces first until the whole thing is tackled. It is this whittling away, this identification of smaller challenges and developing solutions to them, that requires the real talent—and creativity. If you go out to solve a problem, or create a program no one has ever done before, all the book knowledge in the world won't give you the answer. A creative mind might.

Think of programming like cooking: you learn the basic rules and then you can let your creativity run wild. Few will go on to become the rock stars of the kitchen, but that's OK. The barrier to entry is not high, and once you are in, you are then limited only by your desire and creativity.

Thinking in base 10

Like many things in the modern world, to understand computer programming and how computers function in general, you have to start with numbers—integers, to be precise. The nineteenth-century mathematician Leopold Kronecker is credited with the phrase "God made the integers; all else is the work of man." If one intends to understand computer programming, one must start the journey by looking at simple integers.

In the modern world, we count using what is known as base 10; that is, there are ten distinct digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9). If we want to go higher than those, we prepend a digit representing the number of tens we have. For instance, the number 13 really tells us we have one of the tens and three of the ones. This structure can go on and on. If we write 586, we are really saying that we have five of the one hundreds, eight of the tens, and six of the ones.

Before looking at how we can translate that to the on and off switches of computers, let's take a step back and look at our description of a number in a bit more detail. Let's use 53,897 as our example. Following what we did above, we are really saying that this is five of the ten thousands, three of the one thousands, eight of the hundreds, nine of the tens, and seven of the ones. If we write this out in a more mathematical notation we would arrive at 5*10,000 + 3*1,000 + 8*100 + 9*10 + 7*1. This would evaluate to the number 53,897.

Looking at this summation a little more closely, one might realize the numbers on the right-hand side of each multiplication (the multiplicands) are all powers of 10: 10,000 is 10^4, 1,000 is 10^3, 100 is 10^2, 10 is 10^1, and 1 is 10^0. Re-writing our expression would give 5*10^4 + 3*10^3 + 8*10^2 + 9*10^1 + 7*10^0. In the more general sense, any base-10 number AB,CDE (where A, B, C, D, and E are some arbitrary digit from 0 through 9) is really saying that we have the following summation: A*10^4 + B*10^3 + C*10^2 + D*10^1 + E*10^0.
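To make the idea concrete, here is a short Python sketch (the function name and structure are just for illustration) that breaks a base-10 number into the digit-times-power-of-ten terms described above:

```python
# A minimal sketch: split a base-10 number into its digits and show the
# digit * power-of-ten terms that add back up to the original number.
def decompose_base10(n):
    digits = [int(d) for d in str(n)]            # e.g. 53897 -> [5, 3, 8, 9, 7]
    terms = []
    for position, digit in enumerate(reversed(digits)):
        terms.append(digit * 10 ** position)     # each digit times its place value
    return list(reversed(terms))

print(decompose_base10(53897))       # [50000, 3000, 800, 90, 7]
print(sum(decompose_base10(53897)))  # 53897
```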

All your base... oh, nevermind

Taking the generalization one step further, we can note that the multiplicand is always 10 to some power when we are working in base 10. That is not a coincidence. If we want to work in any other base number system, we would write any number AB,CDE (where again, A, B, C, D, and E are valid digits in the base we are working with) as the following summation: A*base^4 + B*base^3 + C*base^2 + D*base^1 + E*base^0.

If we one day encounter an alien civilization that uses base five for everything and they threaten us by stating that they have 342 battle cruisers en route to Earth, we'd know that we would count that as 3*5^2 + 4*5^1 + 2*5^0, or 97 ships headed to destroy us. (We'll just send out the winner of the USS Defiant vs. the Millennium Falcon to take them on, no problem).
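The same rule works for any base, so a general converter is only a few lines of Python. This is a sketch with a made-up function name; Python's built-in int() happens to accept a base argument and does the same job:

```python
# Interpret a string of digits in any base by repeatedly shifting the
# running value up one place and adding the next digit.
def to_decimal(digits, base):
    value = 0
    for d in digits:
        value = value * base + int(d)
    return value

print(to_decimal("342", 5))     # 97  -- the alien fleet, counted in base 10
print(to_decimal("53897", 10))  # 53897 -- base 10 works the same way
print(int("342", 5))            # 97  -- the built-in does this directly
```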

Now, there is no reason that we must count in base 10; it is simply convenient for us. Given our ten fingers and ten toes, it is only natural that we use a base-10 number system. Past societies have used others. The Babylonians, for instance, used a base-60 number system in which the same symbol represented 1, 60, and 3,600 (a similar ambiguity existed for 61, 3,601, and 3,660).

What does this all have to do with computers? Well, as I mentioned above, computers are nothing more than a big collection of switches: a monstrous set of on/off items, each with only two possible states. If we find ten digits natural because of our biology, then it stands to reason that base-two numbers make the most sense for a computer that can only know on or off.

That means the binary number system. The binary number system is base two, and the only numbers available for counting are zero (0) and one (1). Even though you only have 0s and 1s, any number can be represented, as we just showed above. This gives us a way to tell a computer—a big collection of switches—all about numbers. Counting to 10 (base 10) in binary would go as follows: 0 (0), 1 (1), 10 (2), 11 (3), 100 (4), 101 (5), 110 (6), 111 (7), 1000 (8), 1001 (9), 1010 (10).

More generally any binary number can be computed similarly to how we wrote the general sum for base-10 numbers. The number ABCDE in binary would represent the number A*2^4 + B*2^3 + C*2^2 + D*2^1 + E*2^0. As an example, the number 10011010 in binary would be 1*2^7 + 0*2^6 + 0*2^5 + 1*2^4 + 1*2^3 + 0*2^2 + 1*2^1 + 0*2^0, or 154 in base-10 terminology. (As a nerdy aside, such counting is the origin of the joke "There are only 10 kinds of people in this world. Those who understand binary and those who do not.")
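If you want to check these sums yourself, a few lines of Python will do it (bin() and int() are standard built-ins; the expansion below just mirrors the hand calculation above):

```python
# Count from 0 to 10 in binary; bin() prefixes "0b", so strip the first two chars.
for n in range(11):
    print(n, bin(n)[2:])

# Expand 10011010 digit by digit, then confirm with the built-in conversion.
bits = "10011010"
total = sum(int(b) * 2 ** p for p, b in enumerate(reversed(bits)))
print(total)         # 154
print(int(bits, 2))  # 154
```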

Before moving on, a quick aside about terminology. In computer parlance, a single 0 or 1 (a single on/off value) is termed a "bit." When you group eight bits together, you get a byte (four bits is called a nybble, but you won't see that around much anymore). Therefore, if you have a 1GB drive, that means you have 1024*1024*1024 bytes of information, or 8*1024*1024*1024 individual on/off storage spots to which information can be written. When someone in the computer world says something is 16-bit or 32-bit or 64-bit, they are talking about how many bits are available, by default, for representing an integer (or memory address location) on that computer.
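For anyone who wants the arithmetic spelled out, here is a quick sketch using the article's 1,024-based counting; the values at the end are simply the largest unsigned integers that 16, 32, and 64 bits can hold:

```python
bits_per_byte = 8
bytes_in_1gb = 1024 * 1024 * 1024               # as counted in the paragraph above
print(bytes_in_1gb)                              # 1073741824 bytes
print(bytes_in_1gb * bits_per_byte)              # 8589934592 individual on/off spots
print(2 ** 16 - 1, 2 ** 32 - 1, 2 ** 64 - 1)     # largest unsigned 16-, 32-, 64-bit integers
```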

Early days

With a method to represent numbers, it became possible to use these collections of switches to do things with those numbers. Even with all the innovations in CPU design and technology that the past 50+ years have brought, at the heart of things, there are only a handful of things that a CPU can do with those numbers: simple arithmetic and logical operations. Just as importantly, it can also move bits/bytes around in memory.

Each of those simple operations could be given a numeric code, and a sequence of these operations could then be issued to a computer for it to execute; this sequence is termed a program. In the beginning, developing a program quite literally involved a programmer sitting in front of a large board of switches, setting some to up (on/1) and some to down (off/0) one at a time, thereby entering the computation sequence into the computer directly. Once all the operations in the sequence had been entered, the program could then be executed.
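To give a feel for what a sequence of numeric operation codes means in practice, here is a purely hypothetical toy machine in Python; the opcodes, registers, and instruction format are invented for this sketch and do not correspond to any real CPU:

```python
# A made-up toy machine: each operation is just a number, and a program is a
# sequence of (opcode, register, value) instructions executed in order.
ADD, SUB, LOAD, HALT = 1, 2, 3, 0

def run(program):
    registers = [0, 0]          # two storage slots
    pc = 0                      # which instruction we're on
    while True:
        op, reg, value = program[pc]
        if op == HALT:
            return registers
        if op == LOAD:
            registers[reg] = value
        elif op == ADD:
            registers[reg] += value
        elif op == SUB:
            registers[reg] -= value
        pc += 1                 # move on to the next instruction

# "Program": load 5 into register 0, add 3, subtract 1, stop.
print(run([(LOAD, 0, 5), (ADD, 0, 3), (SUB, 0, 1), (HALT, 0, 0)]))  # [7, 0]
```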

After entering programs by literally flipping switches became tedious (this didn't take too long), hardware manufacturers and software developers came up with the concept of assembly languages. The hardware really only understands combinations of ones and zeros (the machine language), and those are difficult for people to work with. Assembly languages gave a simple name to each possible operation that a given piece of hardware could carry out.

The Fortran Automatic Coding System for the IBM 704, the first Programmer's Reference Manual for Fortran.
Assembly languages are considered low-level programming languages, in that you are working directly with operations that the hardware can do rather than anything more complex or abstract built on top of these commands. For instance, a simple CPU could understand assembly for adding (ADD) two numbers, subtracting (SUB) two numbers, and moving memory between locations (MOV). The development of assembly allowed programmers to create software that was somewhat removed from the absolute binary nature of the machine and quite a bit easier to understand by reading.
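Building on the toy machine sketched earlier, an "assembler" in this spirit is little more than a translation table from mnemonics to numeric codes; again, the mnemonics and codes here are invented for illustration, not taken from any real instruction set:

```python
# A sketch of what an assembler does: turn readable mnemonics into the
# numeric codes the (toy) machine actually runs.
MNEMONICS = {"HALT": 0, "ADD": 1, "SUB": 2, "LOAD": 3}

def assemble(source):
    program = []
    for line in source.strip().splitlines():
        name, reg, value = line.split()
        program.append((MNEMONICS[name], int(reg), int(value)))
    return program

source = """
LOAD 0 5
ADD 0 3
SUB 0 1
HALT 0 0
"""
print(assemble(source))
# [(3, 0, 5), (1, 0, 3), (2, 0, 1), (0, 0, 0)] -- the same program as before, as numbers
```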

Even though this got programmers one step removed from the numeric codes that actually run the computer, the fact that each different piece of hardware from each manufacturer could have its own, non-overlapping assembly language meant it was still difficult to create complex software that carried out large-scale tasks in a general manner.

It wasn't until 1956 that John Backus came down from the mountain with a host of punch cards inscribed with "The FORTRAN Automatic Coding System for the IBM 704" (PDF). FORTRAN (FORmula TRANslator) was the first ever high-level language, a language in which scientists and engineers could interact with a computer without needing to know the down and dirty specifics of the commands the machine actually supported.
