Meaningful Information and Artifact

Sean D. Pitman, M.D.

Table of Contents

Shannon Information

Finding Randomness

Pattern Entropy

SETI and Intelligent Design

Creating Meaningful Information

Problems of Meaningful Information

Beneficial Random Mutations

Definitions

Levels of Meaningful Complexity

Different types of functions

Sequence Space

Complex Specified Information (CSI)

Claude E. Shannon, born April 30, 1916, grew up to become a true genius in many respects. Perhaps his most recognized contribution came in 1948 with his publication of A Mathematical Theory of Communication in the Bell System Technical Journal - which introduced the word "bit" (as in a "bit" of information) for the first time. This paper became the foundation of what is now called, "Information Theory."

What Shannon basically did was to imagine information as a finite number of character symbols that were transmitted through a channel with each symbol spending a finite amount of time in this channel. Mathematically he showed that there was a limit to the maximum amount of useable information that could be transmitted through this channel in a given span of time without it being significantly corrupted by random disruptions or "noise". This was a significant advancement for the telecommunications industry as well as the exploding computer age. In a 1974 book on the development of information theory, Slepian wrote:

Probably no single work in this century has more profoundly altered man's understanding of communication than C E Shannon's article, "A mathematical theory of communication", first published in 1948. The ideas in Shannon's paper were soon picked up by communication engineers and mathematicians around the world. They were elaborated upon, extended, and complemented with new related ideas. The subject thrived and grew to become a well-rounded and exciting chapter in the annals of science.^7,8

Now, even though Shannon's ideas formed the basis of what is now called "information theory", Shannon information is not really a theory of information in the usual sense of the word. Rather, it is more a theory of maximum information transmission or sequence "complexity". For example, consider a short children's storybook. Such a book contains more Shannon information than a book of equal size composed entirely of a string of As, but less Shannon information than a book of random letters. In other words, a series of random letters has more Shannon information than is contained in a meaningful storybook. Of course, for most people, this description is very confusing since it seems counterintuitive for a string of random letters to have more "information" content than a meaningful storybook with the same number of letters.

Richard Feynman summed up this little problem in his 1999 Lectures on Computation when he asked, "How can a random string contain any information, let alone the maximum amount? Surely we must be using the wrong definition of 'information'?" ¹ In fact, it seems to me quite unfortunate that Shannon used the term "information" at all when it might have saved a lot of confusion to use the term "maximum information transfer" instead. In fact, Warren Weaver (coauthor with Shannon of The Mathematical Theory of Communication) noted, "The word information, in Shannon's theory, is used in a special sense that must not be confused with its ordinary usage. In particular, information must not be confused with meaning..." ²

And yet, many people, even scientists on occasion, seem to confuse information (in the present sense) with "meaning". Crutchfield tries to clarify the seemingly counterintuitive notion of randomness producing a high level of Shannon "information" while at the same time producing a high level of "algorithmic complexity" (as defined by those like Kolmogorov, Chaitin, and Solomonoff, etc.). He noted that, "Information theory as written down by Shannon is not a theory of information content. It's a quantitative theory of information." ³

For example, consider that before a coin is flipped the potential outcome of a fair toss of the coin is either heads or tails, or two possible outcomes. Compare this to a six-sided die where the potential outcome of the roll of the die is one of six options. After the coin is flipped, say the outcome is "heads". This result produced a "reduction in uncertainty". The same thing is true of a roll of a die: if the outcome is, say, "5", this result also produces a reduction in uncertainty. The only difference is that there is a greater reduction in uncertainty for the roll of the die vs. the flip of the coin because there were more unknown potential outcomes for the roll of the die than for the flip of the coin. Therefore, the roll of the die conveys more Shannon information than the flip of the coin because the roll of the die produced a greater reduction in uncertainty. In other words, the greater the number of potential outcomes, the less likely it will be that any particular outcome will be realized, and the more Shannon information is generated when one particular outcome is in fact realized.
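The coin and die comparison can be put into numbers with a short calculation. What follows is only a minimal sketch, assuming every outcome is equally likely; the function name `surprisal_bits` is introduced here for illustration and does not come from any of the cited sources.

```python
import math


def surprisal_bits(num_outcomes: int) -> float:
    """Shannon information, in bits, of one outcome among equally likely options."""
    # With n equally likely outcomes, p = 1/n and -log2(p) = log2(n).
    return math.log2(num_outcomes)


coin = surprisal_bits(2)  # fair coin: two possible outcomes
die = surprisal_bits(6)   # fair six-sided die: six possible outcomes
print(f"coin flip: {coin:.3f} bits, die roll: {die:.3f} bits")
```

Under these assumptions the die roll carries more bits than the coin flip simply because it rules out more alternatives.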

As another example, consider the following sequences:

"In the beginning God created the heavens and the Earth"

"ax epm iwmxseigm uio wazirlf hgt pohmsat qwv hrg ouirm"

Both of these sequences have 54 characters each. Since both are composed of the same 26 letters of the English alphabet, the amount of uncertainty eliminated by each character is identical. The probability of producing each of these two sequences at random is identical. Therefore, both sequences have an equal amount of Shannon information - even though one is "meaningful" while the other is not. This means that the mathematical formula for Shannon information is in fact quite simple. It is I = -log₂ p, where p is the probability of a particular sequence, out of all the potential options, being realized. Obviously then, given that all other things are the same, the Shannon information of a sequence will increase as the length of the sequence increases.
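As a rough illustration of I = -log₂ p applied to the two sequences above, the sketch below assumes a source that emits every symbol independently and with equal probability. The alphabet size is left as a parameter because it is an assumption: the text counts 26 letters, while counting the space character as well would give 27. The function name is hypothetical.

```python
import math


def sequence_information_bits(sequence: str, alphabet_size: int) -> float:
    """-log2(p) for a sequence drawn from a uniform, independent symbol source."""
    # p = (1 / alphabet_size) ** len(sequence); working with logarithms
    # avoids floating-point underflow for long sequences.
    return len(sequence) * math.log2(alphabet_size)


meaningful = "In the beginning God created the heavens and the Earth"
scrambled = "ax epm iwmxseigm uio wazirlf hgt pohmsat qwv hrg ouirm"

for text in (meaningful, scrambled):
    bits = sequence_information_bits(text, alphabet_size=26)
    print(f"{len(text)} characters -> {bits:.1f} bits")
```

Because the calculation depends only on length and alphabet size, both sequences receive the same value, and the value grows linearly with length.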

Why then do some argue that a more random appearing sequence of characters has more Shannon information than does a meaningful sequence of characters? Well, a random sequence of characters has a uniform spread of character usage whereas a meaningful storybook does not have a uniform usage of characters. Therefore, the odds that some characters will turn up more often compared to others can be determined, thus reducing the Shannon information of the storybook more than the random sequence of characters.
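The effect of uneven character usage can be sketched by estimating entropy from observed character frequencies. This is an illustrative approximation only: it assumes each character is drawn independently from the frequencies seen in the sample, and the function name and the two toy strings are made up for the example.

```python
import math
from collections import Counter


def empirical_entropy_bits(text: str) -> float:
    """Average bits per character, estimated from character frequencies in text."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


skewed = "aaaaaaab"   # one symbol dominates, so the next symbol is easy to guess
uniform = "abcdefgh"  # every symbol appears equally often

print(f"skewed:  {empirical_entropy_bits(skewed):.3f} bits per character")
print(f"uniform: {empirical_entropy_bits(uniform):.3f} bits per character")
```

A sample whose characters are used unevenly scores lower per character than one whose characters are spread uniformly, which is the sense in which a storybook carries less Shannon information than random letters of the same length.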

To explain a bit further, Shannon information is determined by reference to a hypothetically random source of string production - a source that produces maximum Shannon information. This is in contrast to Kolmogorov/Chaitin "complexity" (discussed below) which concerns the object itself independent of the knowledge of the actual source of the object. Note again that neither Shannon information nor Kolmogorov/Chaitin complexity is concerned with the actual "meaning" of the "information" represented by the source or the object.

"In the Shannon approach, however, the method of encoding objects is based on the presupposition that the objects to be encoded are outcomes of a known random source. It is only the characteristics of that random source that determine the encoding, not the characteristics of the objects that are its outcomes. . . Shannon ignores the object itself but considers only the characteristics of the random source of which the object is one of the possible outcomes, while Kolmogorov considers only the object itself to determine the number of bits in the ultimate compressed version irrespective of the manner in which the object arose." ²²

This means that Shannon information is more about the type of source it will take to transmit a particular type of string (of digits) rather than the string itself. So, to transmit a number like pi, where all the symbols seem to appear with equal frequency, the source needed to transmit a sequence like pi will have to be able to produce all possible numbers with a similar character frequency. In other words, this source must be able to produce not only pi, but all possible numbers in infinite sequence space - to include truly "random" and "non-computable" numbers like Omega (Ω).

Again, it is all about the "complexity" or "randomness" of the needed source or the reference that produces a particular type of string when it comes to Shannon information, while Kolmogorov complexity is all about the "randomness" of the string itself. Yet, the connection between these two concepts is so close as to make them essentially equivalent. In other words, "Shannon Entropy [and Shannon Information] equals or is at least close to the expected Kolmogorov Complexity (with a constant)." ²² Since KC is measured relative to a specific UTM (Universal Turing Machine), it follows that in some sense Shannon Entropy is also related to this reference UTM (discussed in more detail below).

Finding Randomness

To understand this concept a bit more, it might be helpful to consider the distinct but related concepts of Kolmogorov, Chaitin and Solomonoff "complexity" of numbers or sequences. Many might think that all potential number sequences of a given length, say 10 digits, are equally likely to appear when randomly chosen by a "fair toss" number generator. Although this is true, it is interesting to note that certain patterns of numbers are not equally likely. Intuitively some numbers just look more "random" than other more "orderly" or "predictable" number patterns. Of course, if one had the ability to predict a type of number sequence with a greater than random degree of accuracy, this would be very helpful in places like a Las Vegas casino. Clearly then, the notion of randomness and the ability to produce unpredictable sequences with number generators is very important not only in Las Vegas, but in many aspects of business and daily life - such as identity and password protection.

The problem, as Gregory J. Chaitin observed in a 1975 paper published in Scientific American is that, "Although randomness can be precisely defined and can even be measured, a given number cannot be proved to be random. This enigma establishes a limit to what is possible in mathematics." ⁹ And yet, this intuitive notion of randomness remains. Chaitin describes this problem in the following passage:

     Almost everyone has an intuitive notion of what a random number is. For example, consider these two series of binary digits:

01010101010101010101
01101100110111100010

     The first is obviously constructed according to a simple rule; it consists of the number 01 repeated ten times. If one were asked to speculate on how the series might continue, one could predict with considerable confidence that the next two digits would be 0 and 1. Inspection of the second series of digits yields no such comprehensive pattern. There is no obvious rule governing the formation of the number, and there is no rational way to guess the succeeding digits. The arrangement seems haphazard; in other words, the sequence appears to be a random assortment of 0's and 1's.

     The second series of binary digits was generated by flipping a coin 20 times and writing a 1 if the outcome was heads and a 0 if it was tails. Tossing a coin is a classical procedure for producing a random number, and one might think at first that the provenance of the series alone would certify that it is random. This is not so. Tossing a coin 20 times can produce any one of 2²⁰ (or a little more than a million) binary series, and each of them has exactly the same probability. Thus it should be no more surprising to obtain the series with an obvious pattern than to obtain the one that seems to be random; each represents an event with a probability of 2⁻²⁰. If origin in a probabilistic event were made the sole criterion of randomness, then both series would have to be considered random, and indeed so would all others, since the same mechanism can generate all the possible series. The conclusion is singularly unhelpful in distinguishing the random from the orderly. ⁹

Chaitin goes on to suggest that a "more sensible definition of randomness is required, one that does not contradict the intuitive concept of a 'patternless' number." Efforts to define such a patternless number were not started until fairly recently (~1965). The definition of a patternless number does not depend upon the origin of the number (random or not), but entirely upon the sequence of digits or characters that make up the number or character sequence. In short, the Kolmogorov/Chaitin complexity (KCC) of a sequence is a measure of its compressibility relative to a particular compression mechanism such as a Universal Turing Machine (UTM).
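True Kolmogorov/Chaitin complexity cannot be computed in general, but the idea of compressibility can be illustrated with an ordinary compressor. The sketch below swaps in Python's standard `zlib` module as a stand-in for the compression mechanism; the compressed size is only a crude upper-bound proxy, not the KCC itself. The helper name, the string lengths, and the seeded pseudo-random generator standing in for coin flips are all assumptions made for the example.

```python
import random
import zlib


def compressed_size(bits: str) -> int:
    """Length in bytes of the zlib-compressed form of a string of '0'/'1' characters."""
    return len(zlib.compress(bits.encode("ascii"), 9))


# A longer version of Chaitin's first series: "01" repeated.
patterned = "01" * 5000

# A stand-in for coin flips (pseudo-random, seeded so the run is repeatable).
rng = random.Random(0)
coin_flips = "".join(rng.choice("01") for _ in range(10000))

print("patterned :", compressed_size(patterned), "bytes")
print("coin flips:", compressed_size(coin_flips), "bytes")
```

The repeated pattern shrinks to a small fraction of its length, while the coin-flip string cannot be squeezed much below one bit per digit, which mirrors the intuitive distinction between the "orderly" and the "patternless" series.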

This apparent relationship between KCC and the selection of a UTM poses a bit of a problem. The argument can be made that any finite sequence can be compressed into a single bit representation depending on which UTM is chosen - which is true. "For any string S, there is a universal machine U_s such that the complexity of S relative to U_s is zero."¹⁵

However, this isn't the end of the issue. As it turns out, the choice of the UTM doesn't matter much when it comes to the KCC of longer and longer sequences. "Suppose that U and U₀ are both universal machines. Then there is a constant k such that for any S, the complexity of S relative to U never differs from the complexity of S relative to U₀ by more than k. So: fixing U and U₀, in the limit as the strings get larger, the particular choice of UTM matters less and less."¹⁵ Regardless of the choice of UTM, a program that is capable of successfully predicting longer and longer portions of a finite string, maintaining the string's KC at zero, or at least less than the maximum upper bound KC, is demonstrating, to a greater and greater degree of confidence, that the string is or was not the result of some random process. The hypothesis of less than maximum KC (or the hypothesis that the string is not Omega Ω) is supported to a greater and greater degree of "statistical significance". And, this support is completely independent of the UTM chosen.

On the other hand, if the selected UTM has to increase its size as the size of the string increases in order to make the string's KCC equal to zero, or less than the maximum upper bound on KCC, that UTM is not a good predictor of what will come next. Compare this with certain UTMs that do not have to increase in size to reproduce a specific sequence of increasing size. Such a situation is capable of providing a great deal of predictive value with regard to what will come next.

For example, "Pi [i.e., the ratio of the circumference vs. the diameter of a circle; 3.14159 . . . ], although infinite and non-repeating, has low Kolmogorov complexity because it can be specified by a short finite algorithm (like Gauss's summation or one of the newer digit-specifying algorithms). On the other side of the spectrum, a million randomly produced digits can't be derived in any other way besides specifying them in their entirety. The algorithm used to compute pi doesn't change in size depending on how many digits are needed; it has a constant Kolmogorov complexity as the size of the sequence increases. Since the algorithm used to determine the randomly produced digits is in fact the digits themselves, it gets larger with each additional digit -- the program itself has a linearly increasing Kolmogorov complexity."¹⁶
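The contrast drawn in the passage above can be sketched with a fixed-size program that emits as many digits of pi as requested. The generator below is a widely circulated spigot-style routine, commonly attributed to Jeremy Gibbons; it is swapped in here purely for illustration and is not the algorithm named in the quotation. The function name and variable names are assumptions.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi one at a time, starting with 3."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled: emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: fold in another term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


print("".join(str(d) for d in islice(pi_digits(), 20)))
```

The source text of this program stays the same whether 20 digits or 20 million are requested (only the running time and memory grow), whereas a program that reproduces a run of coin flips would in general have to contain the flips themselves.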

In a sense then it does matter what UTM is chosen first and in a sense it doesn't matter. If an arbitrarily chosen UTM happens to be a good predictor of the future, that was certainly a fortuitous choice. The resulting success was indeed dependent upon the original choice of UTM. At the same time the success of the predictive program supports the hypothesis that the origin of the string is not random more and more over time - until the initial choice of the UTM doesn't really matter at all.

This notion of using a particular pattern that is already known to predict what will come next gave rise to various other closely related concepts by those like Martin-Lof, Leonid Levin, Claus-Peter Schnorr, C.S. Wallace, D.L. Dowe, and others. Many of the descriptions of the problem use the concept of Las Vegas-style gambling as an illustration.

"Martin-Lof's original definition of a random sequence was in terms of constructive null covers; he defined a sequence to be random if it is not contained in any such cover. Leonid Levin and Claus-Peter Schnorr proved a characterization in terms of Kolmogorov complexity: a sequence is random if there is a uniform bound on the compressibility of its initial segments. Schnorr gave a third equivalent definition in terms of martingales (a type of betting strategy). . .

The martingale characterization conveys the intuition that no effective procedure should be able to make money betting against a random sequence. A martingale d is a betting strategy. d reads a finite string w and bets money on the next bit. It bets some fraction of its money that the next bit will be 0, and the remainder of its money that the next bit will be 1. d doubles the money it placed on the bit that actually occurred, and it loses the rest. d(w) is the amount of money it has after seeing the string w. Since the bet placed after seeing the string w can be calculated from the values d(w), d(w0), and d(w1), calculating the amount of money it has is equivalent to calculating the bet. The martingale characterization says that no betting strategy implementable by any computer (even in the weak sense of constructive strategies, which are not necessarily computable) can make money betting on a random sequence." ¹⁷
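The betting rule described in the passage above can be sketched in a few lines. This is a toy illustration only: the function names, the starting capital of 1, the 75% stake, and the simple "expect the bits to alternate" predictor are all assumptions chosen for the example, not part of the cited definition.

```python
from typing import Callable


def run_martingale(bits: str, predict: Callable[[str], str], fraction: float) -> float:
    """Return the capital d(w) after betting along the bit string w = bits.

    Before each bit, `fraction` of the current capital is staked on the
    predicted bit and the remainder on the other bit. The stake on the bit
    that actually occurs is doubled; the other stake is lost.
    """
    capital = 1.0
    for i, actual in enumerate(bits):
        guess = predict(bits[:i])
        stake_on_guess = capital * fraction
        stake_on_other = capital - stake_on_guess
        capital = 2.0 * (stake_on_guess if actual == guess else stake_on_other)
    return capital


def alternate(history: str) -> str:
    """Hypothetical predictor: expect the opposite of the previous bit."""
    if not history:
        return "0"
    return "1" if history[-1] == "0" else "0"


patterned = "01010101010101010101"   # Chaitin's first series
coin_flips = "01101100110111100010"  # Chaitin's second series

print("patterned :", run_martingale(patterned, alternate, 0.75))
print("coin flips:", run_martingale(coin_flips, alternate, 0.75))
```

Against the alternating series the capital multiplies by 1.5 on every bit, while splitting every bet evenly (a fraction of 0.5) leaves the capital unchanged on any sequence. A strategy that keeps winning is, in this picture, evidence that the sequence is not random.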

Consider also the following passage:

"When we toss a coin a number of times we are accustomed to thinking of the resulting sequence of heads and tails as a random sequence. Thus we would consider any sequence that is the result of tossing a coin a sequence of times to be random. This approach makes the sequence HHHHH no less random than the sequence HTHHTH. However, most people would consider the second sequence a random sequence but not the first.

This leads some to believe that people do not understand what random means. However, it led the founder of modern probability Kolmogorov, and others (Chaitin, Solomonoff, and Martin Lof) to try to say what it should mean to say that a specific sequence is random.

Martin Lof took the approach that a sequence of heads and tails should be considered random if it would pass a set of statistical tests for randomness such as: the proportion of heads should be near 1/2, there should not be too many or too few runs of heads or tails etc. That is, a sequence is random if it is a typical sequence in the sense that it would not be rejected by standard tests of randomness. Kolmogorov, Chaitin and Solomonoff took an apparently different approach. They say a sequence of heads and tails is random if it is "complex", meaning that the shortest computer program you can write to produce the sequence is about as long as the sequence itself. The Martin Lof approach was shown to be equivalent to the Kolmogorov approach, and Beltrami restricts himself to the Kolmogorov approach.

Here is a connection between complexity and entropy. Entropy, as defined by Shannon, is a measure of the uncertainty in a chance experiment. It is defined as -sum p(i) log(p(i)), where the sum is over all possible outcomes i of the chance experiment and the log is to the base 2. For a single toss of a biased coin with probability p for heads and q for tails the entropy is -p log p - q log q = -log(p^p q^q). This entropy is maximum when p = 1/2, when it has the value 1. For a fair coin tossed n times, the entropy is n and, for a biased coin tossed n times, the entropy is less than n.

Shannon was interested in the problem of encoding sequences for more efficient transmission. He showed that the expected length of the encoded sequence could be at most the entropy, and there is an encoding that achieves this. Writing a computer program to produce sequences of H's and T's of length n is a way to encode a sequence. Thus Shannon's coding theorem is consistent with the fact that most sequences produced by tossing a fair coin are random but this is not true of the tosses of a biased coin."¹⁸ [Emphasis added]
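The entropy of a single coin toss discussed in the passage above, -p log p - q log q with logs to base 2 (equivalently -log(p^p q^q)), is easy to tabulate. The following is a minimal sketch; the function names are introduced here for illustration, and the n-toss value assumes the tosses are independent.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of one toss of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


def entropy_of_tosses(p: float, n: int) -> float:
    """Entropy in bits of n independent tosses of the same coin."""
    return n * binary_entropy(p)


for p in (0.5, 0.7, 0.9, 0.99):
    print(f"p = {p:.2f}: {binary_entropy(p):.3f} bits per toss, "
          f"{entropy_of_tosses(p, 20):.2f} bits for 20 tosses")
```

The fair coin gives exactly 1 bit per toss (so n bits for n tosses), and every biased coin gives less.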

Those like Wallace and Dowe also show the connection between KCC and sequence prediction in the following excerpts from their famous 1999 paper entitled, "Minimum Message Length and Kolmogorov Complexity":

Suppose a string S has been observed, and we are interested in the relative probabilities of two alternative future events, representable respectively by S1 and S2. . . .It is this universal ability of the complexity-based probabilities to approximate the probability ratios of effectively computable distributions which justifies their use in prediction. . . .

Informally, a random source of binary digits provides an infinite stream of digits such that knowledge of the first N digits provides no information about the next digit and which produces ones and zeros with equal frequency. A property of an efficient (Shannon) code is that if a sequence of events is randomly drawn from the probability distribution for which the code is efficient, and the code words for these events concatenated, then the resulting stream of digits comes from a random source. . . The randomness of a finite string is usually assessed by asking whether any standard statistical test would cause us to reject the hypothesis that the string came from a random source. If the string indeed came from a random source such as an efficient binary coding of an event sequence, the test(s) will of course sometimes result in our falsely rejecting the randomness hypothesis in just those cases where the source happens by chance to give a string exhibiting some pattern. . .

The aim in this stream is to find the hypothesis H which leads to the shortest such string I, which may be regarded as the shortest message encoding the data given in S. For this reason, the technique is termed minimum message length (MML) or minimum description length (MDL) inference. The motivation for this aim is that the MML hypothesis can be shown to capture all of the information in S which is relevant to the selection of an hypothesis and this method of inference automatically embraces selection of an hypothesis of appropriate complexity as well as leading to good estimates of any free parameters of the hypothesis. If no I exists with #I < #S, then there is no acceptable hypothesis for the data, at least within the set considered possible, and the data is concluded to be random noise. . .

Solomonoff explicitly relates complexity and probability. . . The algorithmic complexity KT (S) may also be used to define an unnormalized probability. This probability does not correspond to the probability of an easily-defined event and is always less than the Solomonoff probability PT (S). However, it may reasonably be considered as a conservative estimate of PT (S), since it will often be only a little smaller.²¹

So, it seems that non-randomness of a finite sequence can be detected in such a way that it can be used to predict, statistically, the "significance" of a null hypothesis that the source of the sequence or pattern was "random" vs. the alternate hypothesis that the source was in fact "biased" in some non-random way.
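One simple way to put a "significance" figure on such a null hypothesis is a runs test, of the kind mentioned in connection with Martin Lof's approach. The sketch below uses the standard Wald-Wolfowitz runs test with a normal approximation, which is rough for sequences as short as 20 digits; it is offered only as an illustration, is not taken from any of the cited papers, and the function name is hypothetical. It assumes the sequence contains both symbols.

```python
import math


def runs_test_p_value(bits: str) -> float:
    """Two-sided p-value for the null hypothesis that the order of bits is random."""
    n1 = bits.count("1")
    n0 = bits.count("0")
    n = n0 + n1
    # A run is a maximal block of identical symbols.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    expected = 2.0 * n0 * n1 / n + 1.0
    variance = 2.0 * n0 * n1 * (2.0 * n0 * n1 - n) / (n * n * (n - 1.0))
    z = (runs - expected) / math.sqrt(variance)
    # Normal approximation: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


patterned = "01010101010101010101"   # Chaitin's first series
coin_flips = "01101100110111100010"  # Chaitin's second series

print("patterned :", runs_test_p_value(patterned))
print("coin flips:", runs_test_p_value(coin_flips))
```

The alternating series has far too many runs to be plausible under the randomness hypothesis and receives a very small p-value, while the coin-flip series is not rejected. A single test of this kind can only reject randomness; passing it does not prove a sequence random.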

This process often has a lot to do with the internal symmetry of a sequence or pattern. However, just because non-randomness can be reliably detected does not mean that randomness or even non-randomness can be definitively proved. Of course, this means that apparent randomness cannot be conclusively distinguished from true randomness. In other words, nothing can ever be proven to be truly random. An undiscovered pattern or bias could always have been responsible. On the other hand, it is still useful to be able to at least reasonably detect both randomness and non-randomness. That is the strength of KCC and the various other concepts of algorithmic complexity. By ruling out potential non-random causes to a statistical degree of certainty or "significance" (i.e., with the use of p-values, F-values, etc.), KCC provides a basis upon which to at least estimate the likelihood of true randomness for a given sequence.

The Entropy of Red and Blue Marbles

In this light, consider a box filled with 1,000 red and 1,000 blue marbles, for a total of 2,000 marbles, where the only difference in the marbles is their color. Now, imagine that all the red marbles are on one side of the box while all the blue marbles are on the other side. Clearly, such a pattern of red and blue marbles would appear to have a very low degree of algorithmic complexity or "entropy" - a highly ordered or predictable or informationally compressible state - like the numerical sequence 1111100000. This pattern is not very "complex" or "chaotic" or "random" in the KCC sense. Of course a non-random cause could have been responsible for such a symmetrically ordered state, but what are the odds that a truly random process would give rise to such a highly symmetrical pattern of red and blue marbles that do not inherently tend to create such symmetrical patterns when random energy is applied to the box?
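The odds in question can be estimated under a simplifying assumption that is not in the original description: treat the marbles as laid out in a single row of 2,000 positions, with every arrangement of colors equally likely. The function name below is hypothetical, and the sketch counts only the two perfectly sorted layouts (red on the left or red on the right).

```python
import math


def log2_arrangements(red: int, blue: int) -> float:
    """log2 of the number of distinct color arrangements of the marbles in a row."""
    return math.log2(math.comb(red + blue, red))


total_bits = log2_arrangements(1000, 1000)

# Two of the arrangements are perfectly sorted, so the probability of hitting
# one of them by chance is 2 / C(2000, 1000); in bits that is total_bits - 1.
sorted_bits = total_bits - 1.0

print(f"log2 of all arrangements: {total_bits:.1f}")
print(f"chance of a perfectly sorted row: about 1 in 2^{sorted_bits:.0f}")
```

Under these assumptions a perfectly sorted row is roughly as unlikely as correctly calling almost two thousand fair coin flips in a row.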

It is sort of like asking how long it would take to doubt the truly random nature of the California Lottery if the same person won over and over again. How many times would it take for this person to win the lottery or even the jackpot on the same slot machine before the suspicion of a non-random or even deliberate bias should be reasonably entertained?

It seems then that a non-random cause or even a deliberate or intelligent cause can be detected, with very good predictive value, by studying certain phenomena in sufficient detail. Knowing that random energy applied to a system like red and blue marbles always tends toward pattern homogeny or a non-symmetry (a state of "maximum" algorithmic entropy), the finding of a highly symmetrical state is very good evidence that a non-random selective energy source acted within the box of red and blue marbles. Knowing this, what non-random energy source is most reasonably responsible?

Many potential non-random causes have been suggested to me regarding such hypothetical situations, such as a very strong source of red light close by or red paint dripping from the ceiling of the room. But, what if all of these were ruled out? How long should one continue looking for a mindless biasing agent before one considers the workings of deliberate design? It is kind of like someone walking by a house in the morning and seeing that one of its windows is broken and then walking by in the afternoon and seeing that this same window is now fixed. The broken window can be easily explained by either mindless or mindful causes, but can the fixed window be explained by any mindless cause?

Consider that any time a system goes significantly beyond what the individual parts are collectively capable of achieving, an outside biasing agent must be considered. Sometimes, given the proper system and situation, the level of bias is so great that an intelligent cause is the only rational explanation. For example, many people actually looked for mindless natural causes to explain the sudden appearance of crop circles in England when they first appeared many years ago. Although this might be a reasonable initial response, how long should such a search for potential mindless natural causes continue before the obvious conclusion, a very high-level intelligent cause, is even entertained? It seems that even those who suggested alien intelligences as the ones responsible were more logical than those who refused to consider anything but mindless non-deliberate natural causes. Of course, humans were eventually discovered making these crop circles, but the fact that intelligence had to be involved was clear from the very beginning from the understanding of two basic facts: intelligent activity can be reasonably hypothesized when a given phenomenon is far beyond anything that any known mindless cause has ever even come close to achieving and is, at the same time, at least similar in meaningful informational complexity to that which known mindful causes, like humans, are capable of achieving.

SETI and the Detection of Intelligent Design

This concept is in fact behind the mainstream scientific search for extraterrestrial intelligence (i.e., SETI). H. Paul Shuch, Ph.D., the executive director of the SETI League, Inc., wrote that intelligent "artifact" initially involves the detection of an "anti-entropic" pattern.

We listed at the outset several of the hallmarks of artificiality, which we can expect to be exhibited by an electromagnetic emission of intelligent origin. The common denominator of all these characteristics, in fact of all human (and we anticipate, alien) existence, is that they are anti-entropic. Any emission which appears (at least at the outset) to defy entropy is a likely candidate for an intelligently generated artifact. In that regard, periodicity is a necessary, though not a sufficient, condition for artificiality (remembering once again the pulsar).¹⁹

So, what is it about pulsar radio signals that should have indicated a non-deliberate origin? SETI scientists, like Seth Shostak, a senior astronomer at the SETI Institute, point out some very interesting things about pulsars and why pulsar signals are not "artifactual":

It's worth asking what about a pulsar's precise radio heartbeat would tell you that they're not artificial. To begin with, the emissions occur over a wide bandwidth; a broadcast splatter rather similar to the static caused by lightning, and clearly a very inefficient way to transmit information. In addition, endless, regular pulses don't convey any information. Just as an interminable flute tone would not be music (except, perhaps, to Andy Warhol), so too is an unceasing clock tick devoid of any message. . .

Since [the discovery of pulsars], SETI researchers have expended considerable neural energy in considering what type of radio emissions would unequivocally qualify as artificial. An obvious suggestion . . . is to search for a signal that is branded with a mathematical label. For example, maybe the aliens will tag their transmission with the value of pi. That would clearly bespeak a middle school education, and would prove that the signal comes from thinking beings, rather than witless neutron stars or some other cosmic oddity. . . Perhaps the extraterrestrials will preface their message with a string of prime numbers, or maybe the first fifty terms of the ever-popular Fibonacci series. Well, there's no doubt that such tags would convey intelligence. ²⁰

There is no doubt that such tags would convey intelligence? Really? Consider then the humble Romanesco Broccoli plant which clearly shows an intricate Fibonacci-type fractal pattern - as does the galaxy pictured to the right. Such mathematical patterns are everywhere throughout nature. Is it just that such mathematical patterns in radio signals, in particular, would convey intelligence then? - but not in other media? Does this make any sense? How can Shostak be so sure that a Fibonacci pattern or other such mathematical pattern in radio signals would be clearly artificial while these same patterns in pinecones, sunflowers, broccoli, trees, cacti, and even the human form are clearly the result of non-intelligent non-deliberate natural processes?

But what if the prime numbers are only broadcast at the start of a 100-hour interstellar screed, and we tune in somewhere in the middle? We'd miss the label.

In fact, we don't have to worry about this. Our TV signals, for instance, don't have repetitive headers with the value of pi attached, and yet we rightly suspect that they are recognizable as the product of a technically advanced society (we're not talking program content here.) What is it about a TV signal that marks it as artificial? Surely, the picture and sound components make up a dynamic, and clearly non-random distribution of energy across the band. But these components are subtle, and would be rather difficult to detect at great distance.

However, one-third of the TV signal power is squeezed into a tiny part of the dial a narrow-band carrier that's only 1 Hz wide or so (the picture and sound are five million times wider.) You don't see this carrier on the screen, and you don't hear it from the speakers, thank goodness: your TV tuner uses it to decode the information, and then throws it away. But the point is that such narrow-band signals compress a lot of radio energy into a small bit of spectrum making them the easiest type of signal to pick out in a sea of static.

SETI Institute signal detection expert Kent Cullers, whose clear thinking routinely enlightens both novice and savant, describes the merits of narrow-band radio signals by comparing them to their audio counterparts.

"Imagine the roar of the ocean or the rustling of leaves in a high wind," he says. "For these natural events, the sound is produced simultaneously from many unsynchronized sources. If we plot the frequencies present in such natural events and compare them to artificial sounds, such as a tuning fork or an auto horn, a startling difference appears. Natural signals have a rather broad frequency spectrum, but the artificial ones usually don't."

It's the same with radio emissions. "Sure, if we want to, we can intentionally produce broad messy signals," Cullers notes. "Cell phones that use spread spectrum technology do this, and the military sometimes uses broad signals to intentionally hide them. But it seems that Nature cannot make a pure-tone radio signal." . . .

There's just one trouble with this. A perfect, narrow-band signal can have no message. A tuning fork's steady note is not music. And it seems only reasonable to assume that if another civilization has troubled to build a transmitter, they won't waste the megawatts by merely sending an empty signal into space.²⁰
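The contrast drawn in the quoted passage between broad natural spectra and a pure tone can be illustrated numerically. The following Python sketch is only an illustration under assumed values: the sample count, the tone frequency, and the noise seed are hypothetical and do not come from the article. It computes a direct discrete Fourier transform of white noise and of a single sinusoid, then reports what fraction of the total power lands in the strongest frequency bin.

```python
import cmath
import math
import random

def power_spectrum(samples):
    """One-sided power spectrum (bins 0..N/2) computed with a direct DFT."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def peak_power_fraction(samples):
    """Fraction of total spectral power found in the single strongest bin."""
    spectrum = power_spectrum(samples)
    return max(spectrum) / sum(spectrum)

N = 256                      # assumed number of samples
rng = random.Random(0)       # fixed seed so the run is repeatable

# Broadband stand-in for "many unsynchronized sources": Gaussian white noise.
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]
# Narrow-band stand-in for a carrier: one sinusoid with 32 cycles per window.
tone = [math.sin(2 * math.pi * 32 * t / N) for t in range(N)]

noise_fraction = peak_power_fraction(noise)
tone_fraction = peak_power_fraction(tone)

print(f"noise: strongest bin holds {noise_fraction:.1%} of the power")
print(f"tone:  strongest bin holds {tone_fraction:.1%} of the power")
```

The tone concentrates essentially all of its power in one bin while the noise spreads it across the band, which is the property that makes a narrow-band signal easy to pick out. The sketch says nothing about whether such a signal carries a message.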

These comments are very interesting, especially when SETI scientists like Shostak try, at the same time, to distinguish what they are doing in their search for alien intelligent design from the work of those who look for evidence of intelligent design in living things. Shostak himself goes on to present the difference between SETI and Intelligent Design:

What many readers will not know is that SETI research has been offered up in support of Intelligent Design.

The way this happens is as follows. When ID advocates posit that DNA - which is a complicated, molecular blueprint - is solid evidence for a designer, most scientists are unconvinced. They counter that the structure of this biological building block is the result of self-organization via evolution, and not a proof of deliberate engineering. DNA, the researchers will protest, is no more a consciously constructed system than Jupiter's Great Red Spot. Organized complexity, in other words, is not enough to infer design.

But the adherents of Intelligent Design protest the protest. They point to SETI and say, "upon receiving a complex radio signal from space, SETI researchers will claim it as proof that intelligent life resides in the neighborhood of a distant star. Thus, isn't their search completely analogous to our own line of reasoning - a clear case of complexity implying intelligence and deliberate design?" And SETI, they would note, enjoys widespread scientific acceptance.

If we as SETI researchers admit this is so, it sounds as if we're guilty of promoting a logical double standard. If the ID folks aren't allowed to claim intelligent design when pointing to DNA, how can we hope to claim intelligent design on the basis of a complex radio signal? It's true that SETI is well regarded by the scientific community, but is that simply because we don't suggest that the voice behind the microphone could be God?

In fact, the signals actually sought by today's SETI searches are not complex, as the ID advocates assume. We're not looking for intricately coded messages, mathematical series, or even the aliens' version of "I Love Lucy." Our instruments are largely insensitive to the modulation - or message - that might be conveyed by an extraterrestrial broadcast. A SETI radio signal of the type we could actually find would be a persistent, narrow-band whistle. Such a simple phenomenon appears to lack just about any degree of structure . . .

And yet we still advertise that, were we to find such a signal, we could reasonably conclude that there was intelligence behind it. It sounds as if this strengthens the argument made by the ID proponents. Our sought-after signal is hardly complex, and yet we're still going to say that we've found extraterrestrials. If we can get away with that, why can't they?

Well, it's because the credibility of the evidence is not predicated on its complexity. If SETI were to announce that we're not alone because it had detected a signal, it would be on the basis of artificiality. An endless, sinusoidal signal - a dead simple tone - is not complex; it's artificial. Such a tone just doesn't seem to be generated by natural astrophysical processes. In addition, and unlike other radio emissions produced by the cosmos, such a signal is devoid of the appendages and inefficiencies nature always seems to add - for example, DNA's junk and redundancy.

Consider pulsars - stellar objects that flash light and radio waves into space with impressive regularity. Pulsars were briefly tagged with the moniker LGM (Little Green Men) upon their discovery in 1967. Of course, these little men didn't have much to say. Regular pulses don't convey any information - no more than the ticking of a clock. But the real kicker is something else: inefficiency. Pulsars flash over the entire spectrum. No matter where you tune your radio telescope, the pulsar can be heard. That's bad design, because if the pulses were intended to convey some sort of message, it would be enormously more efficient (in terms of energy costs) to confine the signal to a very narrow band. Even the most efficient natural radio emitters, interstellar clouds of gas known as masers, are profligate. Their steady signals splash over hundreds of times more radio band than the type of transmissions sought by SETI. . .

Junk, redundancy, and inefficiency characterize astrophysical signals. It seems they characterize cells and sea lions, too. These biological constructions have lots of superfluous and redundant parts, and are a long way from being optimally built or operated. ²⁰

One obvious problem with this argument is that so-called "junk DNA", pseudogenes, and other supposedly superfluous evolutionary vestiges are disappearing at a very rapid rate. As it turns out, a lot of what was previously thought of as non-coding junk DNA is now being found to have not only important functions, but even more important functions than the coding portions of DNA. In fact, the non-coding regions are turning out to be the overseers of those portions that do the dirty work of coding for proteins. Junk DNA actually holds the most vital information for building different types of organisms. It is the residence of the lion's share of useful genetic information.

But, beyond this fact, the argument that, "No intelligent designer would have done it that way." isn't a good basis for being unable to detect deliberate artifact. For example, a Picasso painting doesn't necessarily make orderly "sense" to most people. Yet, its intelligent artifactual nature is quite evident. The same thing is true of a first-grader's finger painting of his or her interpretation of their stick-figure family, smiling sun above, and warped house to the side.

I also find it interesting to note that Shostak did in fact argue for the helpfulness of specified sequence complexity, to include a series of prime numbers, pi, or the Fibonacci series (in the first referenced article), but then tried to distance himself from that concept by arguing for the artifactual nature of a "simple" signal pattern in his efforts to separate himself from the ID camp. The problem here is that one just can't have it both ways.

Evolution and the Creation of Meaningful Information

Some refer to the concept of meaningful information as "complex" information and argue that if one understands the difference between concepts like Shannon information and entropy, Kolmogorov/Chaitin complexity, and algorithmic entropy that it is "easy to see what's wrong with the 'information argument' as it is generally used by some, like creationists and design theorists, against the theory of evolution." Some, like David Roche (Committee for the Scientific Investigation of Claims of the Paranormal) explain that, "Natural selection simply removes some members from a population, making it more homogenous and less diverse. The resulting population is easier to describe in detail and so has less Shannon information, making it more homogenous and less diverse. Conversely, mutation makes the population less homogenous and so increases the amount of Shannon information." ¹¹

I would agree that natural selection removes diversity from the gene pool and thus lowers the Shannon information, Shannon entropy, and algorithmic entropy of the gene pool while mutations do the opposite. Of course, as noted previously, increasing the Shannon information or the algorithmic complexity of a gene pool does not necessarily increase the meaningful or usable information of a given gene pool. It only increases the randomness or overall chaos of the gene pool, not the meaningful information. One must be very careful to keep these concepts clearly separated in one's mind since meaningful information is much more than mere random chaos.
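The direction of these changes can be checked with a small calculation. The Python sketch below is a minimal illustration, assuming a toy "gene pool" of eight individuals labeled by variant (the labels and counts are hypothetical). It computes the Shannon entropy of the variant frequencies before and after a selection step that removes one variant, and after a mutation step that introduces new ones.

```python
import math
from collections import Counter

def shannon_entropy(population):
    """Shannon entropy, in bits, of the variant frequencies in a population."""
    counts = Counter(population)
    total = len(population)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical toy gene pools; each letter stands for one genetic variant.
original = ["A"] * 4 + ["B"] * 4                          # two equally common variants
after_selection = ["A"] * 8                               # selection removed variant B
after_mutation = ["A", "A", "B", "B", "C", "C", "D", "D"]  # mutation added C and D

h_original = shannon_entropy(original)
h_selection = shannon_entropy(after_selection)
h_mutation = shannon_entropy(after_mutation)

print(f"original:        {h_original:.2f} bits")
print(f"after selection: {h_selection:.2f} bits")
print(f"after mutation:  {h_mutation:.2f} bits")
```

The measure responds only to how many variants there are and how evenly they are spread. It is blind to whether variants C and D do anything useful, which is the distinction the paragraph above insists on.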

Of course, this issue is commonly addressed by explaining that mutations are "random" and that "random processes do not, at least on their own, generate complexity." ¹¹ However, as previously noted, this statement is incorrect from the perspective of algorithmic complexity where random processes do indeed generate increasing complexity - defined as less predictability or increasing "chaos" or "randomness". The term "complexity" must be carefully defined and understood in the different contexts in which it can have very different meanings. The "complexity" of living things is not the result of a very high level of randomness or chaos nor is it the result of a very high degree of predictability or system order. Rather, the complexity of living things is found somewhere in-between, as is the case with the children's storybook mentioned earlier. The following thoughts from Michel Baranger, a physicist at the Massachusetts Institute of Technology in Cambridge, Massachusetts, may help to clarify this concept:

        "The constituents of a complex system are interdependent. . . Consider first a non-complex system with many constituents - say a gas in a container. Take away 10% of its constituents, which are its molecules. What happens? Nothing very dramatic! The pressure changes a little, or the volume, or the temperature; or all of them. But on the whole, the final gas looks and behaves much like the original gas. Now do the same experiment with a complex system. Take a human body and take away 10%: let's just cut off a leg! The result will be rather more spectacular than for the gas. I leave the scenario up to you. And yet, it's not even the head that I proposed to cut off. . .

        When you look at an elementary mathematical fractal, it may seem to you very 'complex', but this is not the same meaning of complex as when saying 'complex systems'. The simple fractal is chaotic, it is not complex. Another example would be the simple gas mentioned earlier: it is highly chaotic, but it is not complex in the present sense. We already saw that complexity and chaos have in common the property of nonlinearity. Since practically every nonlinear system is chaotic some of the time, this means that complexity implies the presence of chaos. But the reverse is not true. Chaos is a very big subject. There are many technical papers. Many theorems have been proved. But complexity is much, much bigger. It contains lots of ideas, which have nothing to do with chaos. . .

        So the field of chaos is a very small sub-field of the field of complexity. Perhaps the most striking difference between the two is the following. A complex system always has several scales. While chaos may reign on scale n, the coarser scale above it (scale n - 1) may be self-organizing [i.e., greater than the sum of its parts], which in a sense is the opposite of chaos. Therefore, let us add [another] item to the list of the properties of complex systems: Complexity involves an interplay between chaos and non-chaos." In fact, many people have suggested "that complexity occurs 'at the edge of chaos', but no one has been able to make this totally clear."¹⁰

So, although the complexity of living things does have something to do with Shannon information and Kolmogorov/Chaitin complexity, it goes beyond to something very different - meaningful or functional complexity. The question is, can random non-deliberate processes explain the existence of such high levels of meaningful functional complexity as are found within all living things? Well, it seems quite clear that random or apparently random forces, acting alone, lead only to general system chaos - the maximum state of system "algorithmic entropy" or Kolmogorov/Chaitin complexity. Of course, the power of natural selection is brought to the rescue with the explanation that, "Natural selection is not a random process but an ordering process that creates structure from noise." ¹¹

Structure from noise . . . think about that for just a minute. Think back to the box of red and white marbles again. Random energy applied to the box would tend to make a very "noisy" pattern of red and white marbles. In fact, the noisiness of the pattern would eventually reach a state of "maximum" noisiness or algorithmic entropy - maximum chaos basically. What can be applied to this system to reduce this maximally chaotic state? Certainly not more random energy which is itself chaotic - right? Not more marbles which do not, in themselves, contain enough information to reduce the chaos of the system. It seems then that the only agents capable of reducing the chaos of the marble patterns are those that contain more internal informational complexity (as defined above) than does the system of marbles. In other words, the biasing agent must be more informationally rich or informationally complex than the system which it acts upon if it is to reduce the pre-established chaos of that system or add informational complexity to the system by redirecting random energy into directed non-random energy.
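The "noisiness" of a marble pattern can be given a rough number. Kolmogorov/Chaitin complexity itself cannot be computed, so the Python sketch below uses compressed size as a stand-in, which is an assumption of this example and not a method from the text. The marble counts and the random seed are likewise hypothetical.

```python
import random
import zlib

def compressed_size(pattern: str) -> int:
    """Bytes needed by zlib to store the pattern; a crude proxy for
    algorithmic entropy (an assumption of this sketch)."""
    return len(zlib.compress(pattern.encode("ascii"), 9))

# 500 red and 500 white marbles, neatly sorted.
ordered = "R" * 500 + "W" * 500

# The same marbles after "random energy" is applied to the box.
rng = random.Random(42)          # fixed seed so the run is repeatable
marbles = list(ordered)
rng.shuffle(marbles)
shuffled = "".join(marbles)

ordered_size = compressed_size(ordered)
shuffled_size = compressed_size(shuffled)

print(f"ordered pattern:  {ordered_size} bytes after compression")
print(f"shuffled pattern: {shuffled_size} bytes after compression")
```

The shuffled box needs a much longer description than the sorted one even though both contain exactly the same marbles. Shaking the box again would produce a different pattern of about the same compressed size, which is the sense in which the chaos has reached its maximum.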

Of course, one might now ask the obvious question: Is Natural Selection such a biasing agent? Do the mindless processes of natural selection contain enough meaningful informational complexity to take existing systems (living things) and increase or change their functional complexity in entirely new directions by directing random energy in non-random ways?

The Problem of Meaningful Information

Of course, most evolutionists ardently believe that mindless non-deliberate natural selection, combined with random mutation, does indeed possess the power to create higher and higher levels of meaningful information. The problem here is that no one seems to seriously address the problem of how natural selection "selects" if two genetic sequences are equally meaningful or meaningless. The fact of the matter is that most of the potential genetic sequences of a given length are not meaningful at all from the perspective of a given life form just like most potential letter sequences of a given length are not meaningful to an English speaking person - much less beneficial in a given situation or environment. Obviously then, all such non-meaningful sequences have the same non-meaningful meaning - right? How then does a selection process that only recognizes differences in meaning, such as the process of natural selection, select between equally meaningless options? For example, what's the meaningful difference between quiziligook and quiziliguck?

Most evolutionists do not even consider this problem much less discuss it. It would be no problem if natural selection could recognize and select between all changes in a genetic sequence, but it cannot do this for many types of functions. It is simply powerless to recognize those genetic changes that do not result in a meaningful or functional change in the organism. These non-meaningful changes do in fact occur and they are very common. In fact, the large majority of mutations are neutral with regard to function or a "meaningful" change, especially at lower levels of meaningful informational complexity.

This is well recognized. In fact, there is even a Neutral Theory of Evolution (Motoo Kimura) that describes how mutations change the genetics of a gene pool without changing the meaningful information content of that gene pool. Such neutral mutations would increase the Shannon information and KCC of the gene pool, but do not increase the meaningful information content of the gene pool. This is a very important distinction between Shannon information or KCC and meaningful information and yet many who discuss this topic of informational complexity fail to make this distinction.

Beneficial Random Mutations

Now of course, on relatively rare occasions, apparently random mutations do actually change the meaning and/or function of a genetic sequence. Almost always this change results in a loss, not a gain, of meaning. However, very rarely, a mutation may result in a gain of a truly new type of meaningful/beneficial function - like evolving from cat to hat to bat to bad to bid to big to dig to dog where a uniquely meaningful potentially beneficial change is obtained each step of the way from "cat" to "dog". Of course, natural selection is able to select against "bad" mutations and for "good" mutations as far as meaningful functions are concerned and so the good mutations are kept and the bad mutations are discarded. Obviously then, it seems as though good mutations could add up over time to eventually give rise to the fantastic variety of type and level of function that we see in the vast array of living organisms on this planet.

There is just one problem. The "good" mutations never produce any new or unique types of functions beyond the lowest levels of functional complexity or meaning. For example, it is very easy to find pathways between just about every 3-letter word and short phrase, but it gets exponentially harder to find intact pathways for longer and longer "beneficial" sequences in the English language system. Very quickly it becomes enormously difficult to evolve any novel beneficial functional sequence or system that requires a minimum structural threshold of more than a few hundred specifically arranged characters without having to traverse vast non-beneficial oceans of completely meaningless or detrimental sequences within the potential of sequence space.
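The idea of an intact pathway between short words can be made concrete with a search over single-letter changes. The Python sketch below is an illustration only: the eight-word dictionary is a hypothetical stand-in for "defined" sequences, and breadth-first search is my choice of method, not something the text specifies. A step is allowed only if it changes exactly one letter and lands on another word in the dictionary.

```python
from collections import deque

def one_letter_apart(a: str, b: str) -> bool:
    """True when the two words have equal length and differ at one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def find_pathway(start, goal, words):
    """Breadth-first search for a chain of defined words from start to goal,
    changing one letter per step. Returns None if no intact pathway exists."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for word in sorted(words):
            if word not in seen and one_letter_apart(path[-1], word):
                seen.add(word)
                queue.append(path + [word])
    return None

# Hypothetical miniature dictionary of "meaningful" 3-letter sequences.
WORDS = {"cat", "hat", "bat", "bad", "bid", "big", "dig", "dog"}

path = find_pathway("cat", "dog", WORDS)
print(" -> ".join(path))

# Remove a single stepping stone and the pathway is no longer intact.
broken = find_pathway("cat", "dog", WORDS - {"bid"})
print(broken)
```

With this small dictionary, deleting one intermediate word is enough to leave "cat" and "dog" on separate islands. Whether real dictionaries or real protein families behave this way at larger sizes is the claim the surrounding text argues for, and the sketch does not test it.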

Beyond this, no functional system that requires a minimum of more than a few hundred specifically arranged amino acid residues working together at the same time has ever been shown to evolve in observable time - period. In fact, a function requiring just a couple thousand fairly specified amino acid residues working together at the same time (as in a multiprotein motility system etc.) would take trillions upon trillions of years, on average, to evolve using random mutation and natural selection. Why? Because, at this level of meaningful/functional complexity, each beneficially meaningful sequence is separated from every other potentially beneficial sequence in "sequence space" by oceans upon oceans of non-meaningful / non-beneficial sequences.

In other words, there is a "gap" between islands of potentially beneficial sequences in sequence space. This gap grows linearly with each increase in the level of the functions in question. With each linear increase in the gap size, the average random walk time needed to traverse this gap increases exponentially (i.e., many fold per step). This is because once random mutations take a genetic sequence into this ocean, the ocean of non-beneficial sequences that separates the potentially beneficial islands, natural selection is powerless to guide the evolution of this genetic sequence and it starts to wander blindly around this huge ocean of neutral non-beneficial options. This "random walk" simply takes far, far longer than any sort of direct path would have taken. Unfortunately, there is no direct path since there is no guide (remember, natural selection is blind now).
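The scaling described here, a linear gap with an exponential cost, can be written out as a toy model. The Python sketch below assumes the simplest possible picture: crossing a gap means getting a given number of positions exactly right, each position has 20 possible residues, and nothing guides the search. These are assumptions of the sketch that restate the paragraph's own premises. They are not measurements of real proteins.

```python
def blind_search_space(alphabet_size: int, gap: int) -> int:
    """Number of equally likely candidates when `gap` positions must all be
    specified and selection gives no guidance (toy model assumption)."""
    return alphabet_size ** gap

AMINO_ACIDS = 20  # possible residues per position

# Each additional required position multiplies the candidate count by 20.
candidates_by_gap = {gap: blind_search_space(AMINO_ACIDS, gap)
                     for gap in range(1, 7)}

for gap, candidates in candidates_by_gap.items():
    print(f"gap of {gap} positions: {candidates:,} candidates")
```

Under these premises every extra position in the gap costs a factor of 20. The model leaves out anything that could shorten a real search, such as multiple acceptable targets or partially functional intermediates.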

This "limited" characteristic of the evolutionary mechanism has been demonstrated in the laboratory over and over again. In fact, Barry Hall, after doing many such experiments with E. coli bacteria, concluded that these creatures did in fact have significantly "limited evolutionary potential" when it comes to evolving certain types of functions. ^4,5,6

Are these types of functional systems therefore "artifactual" in nature? - clearly beyond the realm of non-deliberate natural creative processes? What conclusion carries with it the most predictive value in such cases?

Appendix Definitions:

Levels of meaningful informational complexity are defined by the minimum number of characters and minimum degree of specificity or variability that can be tolerated in the arrangement of the characters to achieve a particular type of beneficial system and function to any useful degree (i.e., a threshold level of usefulness of a particular type). For example, different biosystem functions require different minimum numbers of fairly specifically arranged amino acid residues. The insulin function requires about 50aa at minimum, cytochrome c function about 80aa, nylonase ~200aa, lactase ~380aa, flagellar motility >10,000aa. Without these minimums in place, none of these functions could be realized at all - not even a little bit. Beneficially functional systems that have lower minimum requirements are many fold easier to evolve than higher-level systems because they are that much easier to find via random searches through the vastness of sequence space.

Different types of functions are not the same thing as different degrees of the same function. For example, some bacteria can make the anti-penicillin enzyme penicillinase, but just not very much of it. Deregulation mutations can remove this limitation, resulting in a marked increase in penicillinase production and a corresponding increase in resistance to the penicillin antibiotic. This is an example of increasing degree of activity of the same type of functional system. No new type of function is evolved here - just more of the same thing. However, consider the mutation of a genetic sequence that does not yet have the minimum size and specificity requirement for the lactase function. Yet, with just one point mutation this sequence, otherwise known as evolved beta-galactosidase (ebg), is suddenly able to have the lactase function to a minimally selectable degree. This example does qualify as the evolution of a new type of function that wasn't there before.

Sequence space is made up of the total number of potential sequences for a given sequence size. For example, the total sequence space size for English language 3-letter sequences is 26³ = 17,576. Of these, only about 1 in 18 are defined in the English language system. The sequence space size for 7-letter sequences is 26⁷ = 8,031,810,176. Of these, only about 1 in 250,000 are defined. The same thing is true of biosystem genetic and protein sequences. The sequence space for a 100aa sequence takes in the possibility of 20 potential amino acid residues per position for a total of 20¹⁰⁰ ≈ 1.27e130. Depending on the functional system in question that requires 100aa at minimum, the vast majority of potential sequences in the sequence space of roughly 1e130 sequences would not carry that function to any selectable degree. Scientists like Sauer and Yockey have done some interesting research in this regard. ^12-14
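These sizes are easy to verify directly. The short Python sketch below uses exact integer arithmetic; the function name is only illustrative.

```python
import math

def sequence_space(alphabet_size: int, length: int) -> int:
    """Total number of possible sequences of the given length."""
    return alphabet_size ** length

three_letter = sequence_space(26, 3)    # English 3-letter sequences
seven_letter = sequence_space(26, 7)    # English 7-letter sequences
protein_100 = sequence_space(20, 100)   # 100-residue protein sequences

# Order of magnitude of 20^100, computed without building the big integer.
protein_exponent = 100 * math.log10(20)

print(f"26^3   = {three_letter:,}")
print(f"26^7   = {seven_letter:,}")
print(f"20^100 = {protein_100:.3e} (10^{protein_exponent:.1f})")
```

Python integers have arbitrary precision, so `protein_100` is the exact 131-digit value and not a floating-point approximation.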

Complex Specified Information (Pitman version): My version of complex specified information (CSI) is somewhat different from other versions, like Dembski's version (see formula listed below). My view of CSI is based on the above mentioned concepts of algorithmic complexity combined with the concept of being able to make predictions based on the past successes/failures of a prediction algorithm being able to predict what will come next in a growing sequence or string. A string that is truly the result of random production will tend toward non-predictability or maximum entropy over time as the string grows in size, regardless of the original reference algorithms or "strings" chosen - to an exponential degree.

        Therefore, CSI is all about being able to predict non-randomness or bias. A reference that has a history of very good predictive success carries with it a very high CSI value. The higher the CSI value, the less likely the string's true origin is or was truly random.

        Of course this means that CSI isn't really about detecting design. It is about detecting bias or non-randomness. Can mindless non-deliberate forces of nature produce strings or sequences in a non-random way with high CSI values? Sure they can. For example, a pulsar produces a very regular repeating radiosignal and is very predictable and therefore carries with it a very high CSI value that increases exponentially over time.

        So, what's the point of testing for CSI? The point is that one must first be able to detect bias to various degrees before one can detect artifact or deliberate design of any kind. If one is unable to first detect bias, the detection of artifact is impossible. After one detects that bias is in fact responsible for a given feature or phenomenon, one can then go about researching that bias to see if any non-deliberate force of nature even comes close to producing that type of bias in the material under investigation (like a highly symmetrical polished granite cube, the patterns of red and white marbles mentioned above, or the first 50 expressions of the Fibonacci series repeated over and over again in a radiosignal coming from outer space). It turns out that some types of biased sequences, depending upon the medium in which they appear, are well beyond anything that non-deliberate forces of nature can achieve and are therefore clear indications of artifact or deliberate intelligence.

The formula for my version of CSI:

For binary strings:

The number of strings at a given Hamming Distance:

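\[ \frac{n!}{(n - \mathrm{hd})!\,\mathrm{hd}!} \]

My version of CSI is a take on this formula:

\[ \mathrm{CSI} = \frac{n!}{\left(n - \left|\tfrac{n}{2} - \mathrm{hd}\right|\right)!\,\left|\tfrac{n}{2} - \mathrm{hd}\right|!} \]

For strings X > 2:

\[ \mathrm{CSI} = \frac{\left(\log_2 X^n\right)!}{\left(\log_2 X^n - \left|\tfrac{\log_2 X^n}{2} - \mathrm{hd}\right|\right)!\,\left|\tfrac{\log_2 X^n}{2} - \mathrm{hd}\right|!} \]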

X = number of possible characters per position

n   = size of the sequence

hd = Hamming Distance
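The formulas above can be sketched in Python. Several choices here are assumptions of the sketch and are not stated in the text: the binary version requires an even n so that the quantity n/2 - hd is a whole number, the general version extends the factorial to non-integer arguments with the gamma function and returns log₂ of the CSI value to avoid overflow, and hd is taken in the same units as log₂(X^n). All function names are illustrative.

```python
from math import comb, lgamma, log, log2

def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

def strings_at_distance(n: int, hd: int) -> int:
    """Number of binary strings at Hamming distance hd from a reference."""
    return comb(n, hd)

def csi_binary(n: int, hd: int) -> int:
    """CSI for binary strings: n! / ((n - k)! k!) with k = |n/2 - hd|.
    Assumes even n so that k is an integer (assumption of this sketch)."""
    if n % 2 != 0 or not 0 <= hd <= n:
        raise ValueError("need even n and 0 <= hd <= n")
    return comb(n, abs(n // 2 - hd))

def log2_csi_general(x: int, n: int, hd: float) -> float:
    """log2 of the CSI value for an alphabet of x characters.
    Factorials are extended with the gamma function, which is an
    assumption of this sketch and not part of the original formula."""
    m = n * log2(x)                  # log2(X^n)
    k = abs(m / 2 - hd)
    ln_csi = lgamma(m + 1) - lgamma(m - k + 1) - lgamma(k + 1)
    return ln_csi / log(2)

# A reference that predicts half of the bits (chance level) versus all of them.
chance_level = csi_binary(8, 4)
perfect_match = csi_binary(8, hamming_distance("10101010", "10101010"))

print(f"CSI at chance level:  {chance_level}")
print(f"CSI at perfect match: {perfect_match}")
```

For binary strings the expression reduces to a binomial coefficient, so a reference that matches at chance level scores the minimum and a reference that matches (or mismatches) at every position scores the maximum.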

Interesting Quote:

     Unsurprisingly, mathematicians had a difficult time coming to terms with Omega. But there is worse to come. "We can go beyond Omega," Chaitin says. In his new book, Exploring Randomness (New Scientist, 10 January, p 46), Chaitin has now unleashed the "Super-Omegas".

     Like Omega, the Super-Omegas also owe their genesis to Turing. He imagined a God-like computer, much more powerful than any real computer, which could know the unknowable: whether a real computer would halt when running a particular program, or carry on forever. He called this fantastical machine an "oracle". And as soon as Chaitin discovered Omega--the probability that a random computer program would eventually halt--he realized he could also imagine an oracle that would know Omega. This machine would have its own unknowable halting probability, Omega'.

     But if one oracle knows Omega, it's easy to imagine a second-order oracle that knows Omega'. This machine, in turn, has its own halting probability, Omega'', which is known only by a third-order oracle, and so on. According to Chaitin, there exists an infinite sequence of increasingly random Omegas. "There is even an all-seeing infinitely high-order oracle which knows all other Omegas," he says. ( Link )

References:

1. Richard Feynman (1999), Feynman Lectures on Computation, p. 120.
2. Claude Shannon and Warren Weaver (1949), The Mathematical Theory of Communication.
3. Tom Siegfried (2000), The Bit and the Pendulum, chapter 8, pp. 169-171.
4. B.G. Hall, Evolution on a Petri Dish: The Evolved B-Galactosidase System as a Model for Studying Acquisitive Evolution in the Laboratory, Evolutionary Biology, 15 (1982): 85-150.
5. http://naturalselection.0catch.com/Files/galactosidaseevolution.html
6. http://naturalselection.0catch.com/Files/steppingstones.html
7. Slepian, D., Key Papers in the Development of Information Theory, Institute of Electrical and Electronics Engineers, Inc., New York, 1974.
8. http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/Shannon.html
9. Gregory J. Chaitin, Randomness and Mathematical Proof, Scientific American, 232, No. 5 (May 1975), pp. 47-52.
10. Michel Baranger, Chaos, Complexity, and Entropy - A physics talk for non-physicists, Center for Theoretical Physics, Laboratory for Nuclear Science and Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA and New England Complex Systems Institute, Cambridge, MA 02138, USA, MIT-CTP-3112.
11. David Roche, A Bit Confused, Skeptical Inquirer, March 2001.
12. Sauer, R.T., James U. Bowie, John F.R. Olson, and Wendall A. Lim, 1989, Proceedings of the National Academy of Sciences USA 86, 2152-2156; and 1990, March 16, Science, 247; and Olson and R.T. Sauer, Proteins: Structure, Function and Genetics, 7:306-316, 1990.
13. Yockey, H.P., J Theor Biol, p. 91, 1981.
14. Yockey, H.P., Information Theory and Molecular Biology, Cambridge University Press, 1992.
15. Adam Elga, Intricacy Handout, Princeton, February 18, 2003.
16. DG, Kolmogorov Complexity, Everything2.com, September 26, 2000.
17. Wikipedia, Algorithmically Random Sequence, accessed March 15, 2007.
18. Edward Beltrami, What is Random?, Springer-Verlag, 1999 (excerpt at ChanceNews.com).
19. H. Paul Shuch, Standards of Proof for the Detection of Extra-Terrestrial Intelligence, The SETI League Inc., 1999.
20. Seth Shostak, How to Sort Signs of Artificial Life from the Real Thing, SETI Thursday, Space.com, January 30, 2003 (Parts 1 and 2).
21. C.S. Wallace and D.L. Dowe, Minimum Message Length and Kolmogorov Complexity, The Computer Journal, Vol. 42, No. 4, 1999.
22. Peter Grunwald and Paul Vitanyi, Shannon Information and Kolmogorov Complexity, Essay, September 14, 2004.