A truly random image would be TV screen static (which is now an anachronism, at least in the US, but I think most readers understand what I mean). That is, each pixel that makes up a random image is completely random- simply rolling a die for each one. This is very different from the so-called "random maps" that are used in games, and both of these are different from truly procedural textures, images, and maps.
In the case of many conventional random maps- and particularly, H&H's current map- the world is defined by features that are plopped, top-down, onto a canvas. This can work well, but has some artifacts. We are familiar with this form of random map; it can be generated quickly, but is completely impossible to expand on-the-fly*. It is simple to code, but looks just as bad as that implies.
Procedural imaging, particularly when used for maps, uses the opposite approach. Using bottom-up rules, a complex image with large-scale features is created.
In essence, it works like this: If you where to transmit an entire picture of a black circle on a white plain, you would have to send a bit of information for each and every pixel in the image. To create this image procedurally, all one would need to do is to simply send this: "For each pixel, find the square root of (this pixel's X co-ord^2 + this pixel's Y co-ord^2). If this is greater than A, color the pixel white, otherwise color it black." Of course, not in those words, but that's the basic thrust of the message one would send.
When that is received, all that needs to be done is run that for each pixel in the image. If you want, you can even enlarge the image by increasing A or move the center of the circle by adding or subtracting from the X and Y co-ords before squaring them.
Now here's a key point that sets Procedural generation apart from conventional random maps: If we where to only run the circle program for, say, the left side of the image, then we would get the left side of the circle, and running it for the right side produces a second image that fits up with the first precisely! In fact, it doesn't matter in what order the pixels are queried in or when; we can only generate every other pixel, then come back three years with a different OS and fill in the rest.
That expandability and seemlessness are the same with any procedural map, though they naturally use larger and more complex instruction sets. In fact, most maps are based on Perlin Noise**, a code that creates a pattern of undulating highs and lows, overlayed over itself several times to create patters rich in detail. This Perlin Noize functions by, for each pixel, figuring out what the four closest pixels with co-ordinates ending in "00" are (so pixel 53:22 is closest to 0:0, 100:0, 100:100, and 0:100) and giving each of those a random value between 1 and 0, based on the position of this "key pixel"***. Then, it averages out those four values based on how far our queried pixel is from these four corners and assigns that value to our pixel. Since it does this for each pixel individually, there is no need for these corner pixels to actually exist!
That is the basic Perlin noise function, and usually it is run over and over again on different scales (for instance, having the corner pixels 10, 100, and 1000 points distant from one another). the closer, smaller scales give sharp detail, while the large, distant ones provide general shape.
Finally, we have defined the value of one pixel of our image or map. At this point, the code is run for however many pixels you like, in any order and any position. The large-scale and the finest details will still line up. It may seem like a lot of calculation, but even a poorly optimized version of that code running in java on this five year old computer can complete a few thousand pixels a minute at least.
We can then take this finished 5000:5000 pixel map and say that all pixels with a value below 0.5 are water, and all those below 0.45 are deep water. Then the remaining land areas can have further maps generated to define dirt and forest and swamp. This is all tweeking and fine-tuning, and is what really makes or breaks the map. There are several ways to interpret this raw procedural height map, but the best one depends on what you are looking to do with the finished map.
At that point you have a basic series of islands and continents. To add stuff like rivers and cliffs, while these can be implemented as a part of the procedural generator, it is often best to have these placed by a simple post-production program of some sort, such as something that randomly places springs and then draws rivers downhill from that point. That and other detail programs are usually better done once an entire area has been generated, so that adjacent terrain can be taken into account.
The point I want to get across about procedural map generation is that compared to conventional generation, it is more difficult to fine-tune, but it creates more realistic and attractive landscapes, it is inherently live-expandable, and it cures fail in 98% of test maps.
--------------------------------------------------------------------------------------------------
*As an aside, the current world with the ability to add new land mass onto the edges is a bit of a workaround of this- each new megagrid is an entirely new random map- but note that this means that rivers, forests, and other features cannot cross the borders between areas, and because of that all edges will always have to be the default type. Thus, multi-multigrid seas will remain impossible without some re-engineering.
**Perlin noise is actually the little brother of the significantly smoother and efficient Simplex noise. Perlin is a bit simpler, though.
***That is, the random number is seeded with the sum of the co-ordinates, or the sum of random numbers seeded by the co-ordinates.
Examples:



Some more examples of Procedurally generated maps are available in my earlier thread.