THE END OF THE WORLD
It is coming. Unless we get rid of the weapons of war. Doing that requires a
lot of people dedicated to that goal. And that requires a lot of people
understanding that the world really is doomed if we don’t all join together in
that effort.
Convincingly predicting an annihilating nuclear war is a hard sell given the many factors involved. But we make its coming unarguably clear with mathematics. Accepting that grim reality then makes the need for eliminating weapons on planet earth an unavoidable choice except for the suicidal and those who reject the logic of 2+3=5 mathematical truth. For there are some so mentally crippled as to believe that God can make 2+3 be something other than 5 if He wishes and that His laws supersede the laws of mathematical science.
To the yet sane we address a mathematical argument that points the way to dodging the impending extinction of the human race. The resurgence of the Cold War from the proxy battle in the Ukraine between America and Russia, each armed with 4000 nuclear weapons, may be more than enough to convince some of the nuclear problem without the need for mathematical persuasion. Former Soviet leader Mikhail Gorbachev recently said that “the United States has already drawn us into a new Cold War, trying openly to achieve its main idea of triumphalism” and that he feared hostilities could escalate into armed conflict. And on the very same day he said that, the Russian envoy to the OSCE said that “catastrophe” could result from US support for the Ukrainians.
One does not have to agree that America is the cause of the problem to see armed conflict and catastrophe between nuclear powers as realistically alarming enough to inspire a dedication to making A World with No Weapons happen. But for doubting optimists who are yet open to argument, the math that follows spells out the stark reality of nuclear conflict on the horizon and what must be done to avoid it.
We also include data from our true-life experiences for the mathematical analysis of human nature, to counter the suppression of reality by the media and other privileged-class information outlets. Our story contains the basic ideas for those who dislike math; the issues discussed are contentious enough, though, that considering them mathematically is highly recommended for those who can handle math.
By Ruth Marion Graf and Dr. Peter V. Calabria, PhD, Biophysics
© Ruth Marion Graf, 1/29/15
contact: ruthmariongraf@gmail.com
If by the end of reading this it becomes clear why we must have a president whose primary concern is getting us to A World with No Weapons, please encourage and support my candidacy. The time is right for having a woman mathematician at the helm, for the men in charge have all proven themselves to be insincere, selfish and stupidly shortsighted. We talk now about the big issues in life in mathematical language that starts with the very basics.
1. Counting
The simplest things in mathematics are the counting numbers: 1, 2, 3, and so on. But even counting isn’t as simple as it seems. Count the number of objects in (■■■■). There are 4 objects here, of course. Now count the number of objects in (■■■■), where the objects are now unequal in size. Is it also 4? Is this count of 4 exact?
There’s something wrong with counting the unequal-sized objects in (■■■■) as 4. It is important to make clear what it is, for doing so leads to a revolution in cognitive science that solves the mystery of how the human mind works, and in physical science unravels the mystery of entropy. These lead in turn to a mathematical understanding of emotion, including the feelings people have that underpin violence, which will make clear the awful place the world is heading and why we have to get rid of its weapons.
Counting 4 objects in (■■■■) is inexact. You remember the grade school dictum against adding together things that are different in kind, like 2 galaxies and 2 chickens. This restriction also holds for adding or counting together things that are different in size. Consider (■■■■) as fresh-killed chickens at the grocery store weighing (5, 3, 3, 1) pounds. Is a count of them as 4 chickens exact? I am not questioning whether calling them “4 chickens” is useful in planning your cookout this weekend, but rather whether this count of 4 chickens is mathematically exact.
A grocer would say no: the inexactness of specifying chickens of differing poundage, (5, 3, 3, 1), as 4 chickens is exactly why chickens are sold not by the chicken but by the pound, all pounds being exactly the same size. Four pounds is an exact count of pounds because all pounds are the same size; four chickens is an inexact count when the chickens counted are not the same size. That characteristic of sameness in size applies to all standard measures, whether pounds or fluid ounces or inches or minutes, and it is why standard measure leads to commercially exact transactions.
We can make this point, that counting objects of the same size is exact while counting objects of unequal size is not, more rigorous by next considering a set of K=12 objects, all the same size, (■■■■■, ■■■, ■■■, ■), divided into N=4 color subsets. We stipulate the K=12 objects to be “unit objects”, all of the same size, because this mathematical regularity ensures that the K=12 count of them is exact.
We will now show that the N=4 count of the subsets, which are unequal in size in that they contain unequal numbers of unit objects, is inexact. To prove that there is an error in counting the number of subsets in (■■■■■, ■■■, ■■■, ■) as N=4, we need a formal description of the set as having x_{1}=5 red objects, x_{2}=3 green ones, x_{3}=3 purple ones and x_{4}=1 black object, or in shorthand notation, (5, 3, 3, 1). The sum of the objects in each of the N=4 subsets is the K=12 total number of objects in the set. Or formally,
1.) K = \sum_{i=1}^{N} x_{i}
This says for our example set that the total number of objects is
1a.) K = x_{1} + x_{2} + x_{3} + x_{4} = 5 + 3 + 3 + 1 = 12
Now it is easy to show that the N=4 count of the unequal sized subsets in (■■■■■, ■■■, ■■■, ■) is in error or inexact as follows. A basic statistic of a set of numbers like (5, 3, 3, 1), here representing the set of objects, (■■■■■, ■■■, ■■■, ■), is the mean or arithmetic average, µ, (mu).
2.) \mu = \frac{K}{N} = \frac{1}{N} \sum_{i=1}^{N} x_{i}
For the K=12, N=4 set, (5, 3, 3, 1), the arithmetic average is µ=K/N=12/4=3. That the µ arithmetic average is inexact is well fleshed out over the 47 chapters and myriad examples of the modern classic, The Flaw of Averages. A quicker way of understanding the inexactness or error in the µ arithmetic average is to note that the mean or arithmetic average of any number set is accompanied, explicitly or implicitly, by a measure of its statistical error. The most common statistical error used is the standard deviation, σ, (sigma),
3.) \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_{i} - \mu)^{2}}
For the N=4, µ=3, (5, 3, 3, 1) set,
3a.) \sigma = \sqrt{\frac{(5-3)^{2} + (3-3)^{2} + (3-3)^{2} + (1-3)^{2}}{4}} = \sqrt{2} = 1.414
And another commonly used statistical error is the relative or percent error, r,
4.) r = \frac{\sigma}{\mu}
For the µ=3, σ=1.414, (5, 3, 3, 1), set, the relative error is r=σ/µ=1.414/3=.471=47.1%. The statistical error in the µ=K/N=3 arithmetic average of (5, 3, 3, 1), whether of σ=1.414 or r=47.1%, can be understood as a counting error or inexactness in the functions that make up µ=K/N, namely K and N. The error does not derive from K as a counting error because all K=12 unit objects in (■■■■■, ■■■, ■■■, ■) are, by definition of a unit object, the same size. Rather the statistical error in µ=K/N derives from the counting error or inexactness in the N=4 count of the subsets in the (■■■■■, ■■■, ■■■, ■), (5, 3, 3, 1) set.
To make this point firmly, let’s next look at the µ, σ and r of the K=12 object, N=4 subset, “balanced” set of objects, (■■■, ■■■, ■■■, ■■■), (3, 3, 3, 3). It also has a µ=K/N=12/4=3 arithmetic average, but from Eqs3&4 a σ=r=0 statistical error, which indicates no error or inaccuracy in µ and, hence, none in the K or N variables that make up µ=K/N. That is because the count of K=12 objects in (■■■, ■■■, ■■■, ■■■), (3, 3, 3, 3), is exact, all K=12 objects being the same size; and because the count of N=4 subsets is exact, all the subsets being the same size in each having 3 unit objects.
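A short Python sketch can check the statistics of Eqs1 through 4 for both the unbalanced and the balanced set. The function name `stats` is our own, and we assume σ is the population standard deviation (dividing by N rather than N-1), which is the form that reproduces σ=1.414 for (5, 3, 3, 1).

```python
from math import sqrt

def stats(x):
    """Return (K, N, mu, sigma, r) for a list of subset counts x_i (Eqs1-4)."""
    K = sum(x)                                          # Eq1: total number of unit objects
    N = len(x)                                          # number of subsets
    mu = K / N                                          # Eq2: arithmetic average
    sigma = sqrt(sum((xi - mu) ** 2 for xi in x) / N)   # Eq3: population standard deviation
    r = sigma / mu                                      # Eq4: relative error
    return K, N, mu, sigma, r

print(stats([5, 3, 3, 1]))  # unbalanced: sigma is about 1.414, r about 0.471
print(stats([3, 3, 3, 3]))  # balanced: sigma = r = 0
```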
Since many sets in mathematical science and in the natural world are unbalanced, and the N count of their subsets hence inexact, this error in counting would seem to pose a genuine problem. And it does, as we shall make clear later, both for understanding the information the human mind processes in our daily existence and for understanding entropy, which, as we shall see, also bears very much on a correct understanding of the mind. Fortunately there is a way around inexact counting in the concept of diversity and the functions that measure it, which solve the problem of inexact counting and in doing so tell us much we did not know about the workings of the human mind in its thoughts and emotions and about the nature of entropy.
2. Diversity
While diversity seems such a general concept that we would guess it to have been part of science for a long time, it is relatively new in mathematical form, newer than quantum mechanics or relativity, having come about only after WWII from the pioneering efforts of two men, an Englishman, Edward Hugh Simpson, and an American, Claude Shannon. Simpson developed his diversity indices specifically to measure diversity, while Shannon’s “information entropy” came to be used as a diversity index by biologists and sociologists after researchers in those fields recognized its diversity-like properties. We will begin by focusing on one of the Simpson diversity indices, which resolves the problem of the inexact counting of unbalanced subsets like (■■■■■, ■■■, ■■■, ■).
We say intuitively that the N=4 color set of objects, (■■■, ■■■, ■■■, ■■■), (3, 3, 3, 3), is more diverse colorwise than the N=2 color set, (■■■■■■, ■■■■■■), (6, 6), for the color diversity of any set of objects obviously depends on the N number of different colors the set has. A measure of diversity that fits this intuitive sense of it is Simpson’s Reciprocal Diversity Index, defined below in terms of the K and x_{i} parameters of a set as
5.) D = \frac{K^{2}}{\sum_{i=1}^{N} x_{i}^{2}}
For the N=4, (■■■, ■■■, ■■■, ■■■), (3, 3, 3, 3), set, the D diversity index is
6.) D = \frac{12^{2}}{3^{2} + 3^{2} + 3^{2} + 3^{2}} = \frac{144}{36} = 4
And for the N=2, (■■■■■■, ■■■■■■), (6, 6), set, it is
7.) D = \frac{12^{2}}{6^{2} + 6^{2}} = \frac{144}{72} = 2
This fits our sense of color diversity being greater the greater the N number of colors in a set. Indeed, we see for both of these balanced sets and for all balanced sets that
8.) D = N (balanced)
For the N=4 color unbalanced set, (■■■■■, ■■■, ■■■, ■), (5, 3, 3, 1), from Eq5,
9.) D = \frac{12^{2}}{5^{2} + 3^{2} + 3^{2} + 1^{2}} = \frac{144}{44} = 3.273
One can interpret the D=3.273 diversity of the unbalanced N=4 set, (■■■■■, ■■■, ■■■, ■), being less than its N=4 color subsets as deriving from the lone x_{4}=1 object in the black subset contributing a reduced, or only token, diversity to the set’s diversity. For (5, 3, 3, 1) and all unbalanced sets, Eq5 gives
10.) D < N (unbalanced)
Now while N=D is an exact measure for balanced sets, N, which exceeds D for every unbalanced set, is, as we made clear earlier, an inexact measure for unbalanced sets. D, by contrast, is an exact quantification of the subsets in a set, balanced or unbalanced. It is so because, as we see in Eq5, D is a function of the exact K count of the total number of objects in a set and of the exact x_{i} counts of the objects in each subset, i=1, 2,…N. Hence D is understood as an exact correlate of the inexact N number of subsets in an unbalanced set.
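As a minimal sketch, Eq5 can be computed directly from the subset counts; the function name `simpson_reciprocal` is ours, not a library routine. It reproduces D=N for the two balanced sets and D<N for the unbalanced one.

```python
def simpson_reciprocal(x):
    """Simpson's Reciprocal Diversity Index, Eq5: D = K^2 / sum(x_i^2)."""
    K = sum(x)
    return K ** 2 / sum(xi ** 2 for xi in x)

for x in ([3, 3, 3, 3], [6, 6], [5, 3, 3, 1]):
    D = simpson_reciprocal(x)
    print(x, "N =", len(x), "D =", round(D, 3))  # D equals N only for the balanced sets
```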
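Another way of appreciating the exactness of the D diversity index comes from developing it as a statistical function. To do that we first express the σ standard deviation as its square, σ^{2}, which is the variance of statistics.
11.) \sigma^{2} = \frac{1}{N} \sum_{i=1}^{N} (x_{i} - \mu)^{2} = \frac{1}{N} \sum_{i=1}^{N} x_{i}^{2} - \mu^{2}
Then solving the above for its summation term gives
12.) \sum_{i=1}^{N} x_{i}^{2} = N(\sigma^{2} + \mu^{2})
Now inserting this summation term into D of Eq5, and using K=Nµ from Eq2 and r=σ/µ from Eq4, gives
13.) D = \frac{K^{2}}{N(\sigma^{2} + \mu^{2})} = \frac{N^{2}\mu^{2}}{N(\sigma^{2} + \mu^{2})} = \frac{N}{1 + r^{2}}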
This shows exact D deriving from inexact N via the inclusion of the r relative error measure of the inexactness in N. This reinforces D as an exact correlate of inexact N that can be used in place of N as an exact quantification of the subsets of an unbalanced set. This has important ramifications for explaining information and entropy correctly.
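The identity D=N/(1+r^{2}) of Eq13 can be spot-checked numerically against Eq5. This is an illustrative sketch with hypothetical function names; each set is evaluated both ways and the two results are asserted to agree to floating-point precision.

```python
from math import isclose

def d_from_counts(x):
    K = sum(x)
    return K ** 2 / sum(xi ** 2 for xi in x)        # Eq5

def d_from_relative_error(x):
    N = len(x)
    mu = sum(x) / N                                 # Eq2
    var = sum((xi - mu) ** 2 for xi in x) / N       # Eq11: sigma^2
    r2 = var / mu ** 2                              # r^2 = sigma^2 / mu^2, from Eq4
    return N / (1 + r2)                             # Eq13

for x in ([5, 3, 3, 1], [6, 5, 1], [50, 3], [3, 3, 3, 3]):
    assert isclose(d_from_counts(x), d_from_relative_error(x))
    print(x, round(d_from_relative_error(x), 3))
```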
As to information processed by the human mind, D is readily interpreted as the number of significant subsets in a set, which, as we shall see next, explains how the mind intuitively distinguishes the significant from the insignificant. We illustrate this with an item in the news on the makeup of the K=53 officer Ferguson Police Dept.: x_{1}=50 Caucasian officers and x_{2}=3 Black officers. While a mathematical understanding is not strictly necessary for people to see the Black contingent of the force as (quantitatively) insignificant, the D diversity index, as the number of significant subsets in a set, makes it mathematically clear.
The Ferguson PD number set of (50, 3) has from Eq5 a diversity index of D=1.12, which rounded off to the nearest integer as D=1 implies that there is only 1 significant subset in it. Were the force made up in a more diverse way of, say, x_{1}=28 Caucasians and x_{2}=25 Blacks, the diversity for its (28, 25) number set of D=1.994 rounded off to D=2 would indicate 2 significant subsets, that both racial contingents were (quantitatively) significant. Returning to the actual (50, 3) makeup calculated to have a rounded diversity measure of D=1 significant subset, the x_{1}=50 preponderance of the Caucasian contingent suggests that it is the significant subset or subgroup in the force and, hence, that the x_{2}=3 Black officer subgroup is insignificant, which might also be interpreted as its contributing only token diversity to the force.
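The Ferguson figures can be reproduced with the same Eq5 function, rounding D to the nearest integer to get the number of significant subsets. The labels and the function name are our own; the counts are the ones given above.

```python
def simpson_reciprocal(x):
    K = sum(x)
    return K ** 2 / sum(xi ** 2 for xi in x)    # Eq5

actual = [50, 3]         # Caucasian, Black officers (K = 53)
hypothetical = [28, 25]  # the more diverse makeup considered above

for label, x in (("actual", actual), ("hypothetical", hypothetical)):
    D = simpson_reciprocal(x)
    print(label, round(D, 3), "->", round(D), "significant subset(s)")
```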
It is also possible to assign a significance index to each subset in a set as a more direct way to specify the subsets that are significant and those that are insignificant. We will use the K=12, N=3, (■■■■■■, ■■■■■, ■), (6, 5, 1), x_{1}=6, x_{2}=5, x_{3}=1, set to introduce significance indices. We calculate from Eq5 a D=2.323 diversity index for this set, which rounded off to D=2 suggests 2 significant subsets, the red and the green, with the purple subset that has only x_{3}=1 object in it understood as insignificant. To specify these attributions of significant and insignificant in a more direct way, we first introduce the root mean square (rms) average, ξ, (xi),
14.) \xi = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_{i}^{2}}
And the rms average squared, ξ^{2}, is
15.) \xi^{2} = \frac{1}{N} \sum_{i=1}^{N} x_{i}^{2}
The rms average of the K=12, N=3, (■■■■■■, ■■■■■, ■), (6, 5, 1), unbalanced set is ξ=4.546, with ξ^{2}=20.667=62/3. And the rms average of the K=12, N=3 balanced set, (■■■■, ■■■■, ■■■■), (4, 4, 4), which we will use for comparison’s sake, is ξ=µ=4, with ξ^{2}=µ^{2}=16. Next note from Eqs5,15,2&1 the D diversity index given as
16.) D = \frac{K^{2}}{N\xi^{2}} = \frac{N\mu^{2}}{\xi^{2}}
Now defining the significance index of the i^{th} subset of a set as s_{i}, i=1, 2,…N, as
17.) s_{i} = \frac{x_{i}\mu}{\xi^{2}}
gives the D diversity index as the sum of its s_{i} significance indices:
18.) D = \sum_{i=1}^{N} s_{i}
For sets, balanced and unbalanced, that have N=3 subsets, x_{1}, x_{2} and x_{3},
19.) D = s_{1} + s_{2} + s_{3}
For the balanced, N=3, (■■■■, ■■■■, ■■■■), (4, 4, 4), set, x_{1}=4, x_{2}=4 and x_{3}=4, D computed from the above is
20.) D = s_{1} + s_{2} + s_{3} = 1 + 1 + 1 = 3 = N
What D=1+1+1 means is that all N=3 subsets, in having the value of unity or one, are significant. The D diversity of the unbalanced (■■■■■■, ■■■■■, ■), which has subsets, x_{1}=6, x_{2}=5 and x_{3}=1, is
21.) D = s_{1} + s_{2} + s_{3} = 1.161 + .968 + .194 = 2.323
What D=1.161+.968+.194 means is that the subset with x_{1}=6 red objects in it, with a significance index of s_{1}=1.161, is significant, rounding off to s_{1}=1; that the subset with x_{2}=5 green objects in it, with a significance index of s_{2}=.968, is significant, rounding off to s_{2}=1; and that the purple subset with x_{3}=1 object in it, with a significance index of s_{3}=.194, is insignificant, rounding off to s_{3}=0.
Now returning to the Ferguson Police Dept., (50, 3), we see from Eq17 that s_{1}=1.056, rounded off to s_{1}=1, indicates that the x_{1}=50 Caucasian cops are significant, and that s_{2}=.063, rounded off to s_{2}=0, indicates that the x_{2}=3 Black cops are insignificant.
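A sketch of the significance indices, assuming Eq17 takes the form s_{i}=x_{i}µ/ξ^{2}, which is the form that makes the s_{i} sum to D and reproduces the values quoted for (6, 5, 1). The function name `significance` is hypothetical. For the Ferguson set it gives s_{1}≈1.056 and s_{2}≈0.063, which sum to the D≈1.12 of Eq5, and it also reproduces the significance indices of the K=21 sets in Table 22.

```python
def significance(x):
    """Significance indices s_i = x_i * mu / xi^2 (equivalently K * x_i / sum(x_j^2))."""
    K, N = sum(x), len(x)
    mu = K / N                               # Eq2
    xi2 = sum(v ** 2 for v in x) / N         # Eq15: rms average squared
    return [v * mu / xi2 for v in x]

for x in ([4, 4, 4], [6, 5, 1], [50, 3], [7, 7, 7], [6, 6, 9], [10, 10, 1]):
    s = significance(x)
    D = sum(s)                               # Eq18: D is the sum of the s_i
    print(x, [round(v, 3) for v in s], "D =", round(D, 3),
          "rounded:", [round(v) for v in s])
```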
That the human mind actually operates with these significance functions, or some neurobiological facsimile of them, is made clear by the next illustration of significance and insignificance with the three sets of colored objects below, each of which has K=21 objects in N=3 colors.
Set of K=21 Objects | Number Set Values | D from Eq5 | D Rounded | Significance Indices
(■■■■■■■, ■■■■■■■, ■■■■■■■) | x_{1}=7, x_{2}=7, x_{3}=7 | 3 | 3 | s_{1}=1, s_{2}=1, s_{3}=1
(■■■■■■, ■■■■■■, ■■■■■■■■■) | x_{1}=6, x_{2}=6, x_{3}=9 | 2.88 | 3 | s_{1}=.824, s_{2}=.824, s_{3}=1.24
(■■■■■■■■■■, ■■■■■■■■■■, ■) | x_{1}=10, x_{2}=10, x_{3}=1 | 2.19 | 2 | s_{1}=1.04, s_{2}=1.04, s_{3}=.104
Table 22. Sets of K=21 Objects in N=3 Colors and Their D Diversity and s Significance Indices
The N=3 set, (■■■■■■■■■■, ■■■■■■■■■■, ■), (10, 10, 1), has a diversity index of D=2.19, which rounded off to D=2 implies 2 significant subsets, the red and the green, via their s_{1}=s_{2}=1.04 significance indices, with the one-object purple subset insignificant via its s_{3}=.104 significance index, which might also be interpreted as the purple subset contributing only token diversity to the set. In contrast, the D=3, (■■■■■■■, ■■■■■■■, ■■■■■■■), (7, 7, 7), set has 3 significant subsets, red, green and purple, s_{1}=1, s_{2}=1, s_{3}=1; as does the (■■■■■■, ■■■■■■, ■■■■■■■■■), (6, 6, 9), set, whose D=2.88 diversity rounds off to D=3, with s_{1}=.824, s_{2}=.824, s_{3}=1.24.
One can get a better sense of how automatically the human mind evaluates significance and insignificance by manifesting the K=21, N=3, colored object sets in Table 22 as K=21 threads in N=3 colors in a swath of plaid cloth.
[Figure: three plaid swaths woven from K=21 threads in N=3 colors: (10, 10, 1), D≈2 (left); (7, 7, 7), D=3; (6, 6, 9), D≈3]
A woman with a plaid skirt with the (10, 10, 1), D≈2, pattern on the left would spontaneously describe it as a red and green plaid, omitting reference to the insignificant thread of purple. She would do this automatically or subconsciously, without any conscious calculation, because the human mind automatically disregards the insignificant both in sensing and in verbalizing the things in its visual field. This verbalization of only the significant colors in the plaid swath, red and green, should not be surprising given that the word “significant” has as its root “sign”, meaning “word”, which suggests that what the mind senses as significant is signified or verbalized, while what is insignificant isn’t signified, verbalized or given a word in discourse or in thought.
Quantitative significance is a characteristic not just of the size or quantity of the objects observed but also of the frequency with which we observe objects or events. Consider as an illustration a game where you guess the color of an object picked blindly from a bag of objects, (■■■■■■■■■■, ■■■■■■■■■■, ■). Assume that you don’t know the makeup of the objects in the bag. Then your sense of which colors are significant or insignificant comes only from the frequency with which the colors are picked from the bag (with replacement). Over time, as you see purple picked infrequently, that color will come to seem insignificant in your mind and will be disregarded as a color likely to be picked.
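The bag game can be simulated. Because s_{i}=x_{i}µ/ξ^{2} equals x_{i}K/Σx_{j}^{2}, it can be rewritten as p_{i}/Σp_{j}^{2} with p_{i}=x_{i}/K, so the significance indices can be estimated from observed pick frequencies alone, without knowing the makeup of the bag. The sketch below is illustrative only: the seed, the number of draws and the function name are arbitrary choices of ours.

```python
import random
from collections import Counter

def significance_from_frequencies(draws):
    """Estimate s_i = p_i / sum(p_j^2) from observed picks, with p_i the observed frequency."""
    counts = Counter(draws)
    total = len(draws)
    p = {color: c / total for color, c in counts.items()}
    sum_p2 = sum(v ** 2 for v in p.values())
    return {color: v / sum_p2 for color, v in p.items()}

bag = ["red"] * 10 + ["green"] * 10 + ["purple"] * 1    # the (10, 10, 1) bag
rng = random.Random(0)                                  # fixed seed for repeatability
draws = [rng.choice(bag) for _ in range(10_000)]        # picks with replacement

s = significance_from_frequencies(draws)
for color in ("red", "green", "purple"):
    print(color, round(s[color], 3), "->", round(s[color]))
```

With enough draws the estimates settle near the exact values of Table 22 (about 1.04, 1.04 and 0.104), and the purple estimate rounds to 0.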
The human mind’s automatic disregard of the insignificant is an important factor in behavior, because we generally think, talk about, pay attention to and act on what we consider significant while automatically disregarding the insignificant in our thoughts, conversations and behavior. This is an important aspect of propaganda and mind control, for issues and opinions frequently disseminated through mass media and other ruling class information outlets are subconsciously or automatically sensed as significant and tend to take up the bulk of one’s thoughts, conversations and behavioral considerations, while issues, observations and opinions transmitted infrequently or not at all are appraised as insignificant and disregarded. In this way sports, entertainment and vacuous political opinions are made to seem significant, crowding out issues genuinely meaningful to people that are shown infrequently or not at all, like the abuse in workday life from bosses, which then become insignificant in discourse and thought, or less significant than they should be. This does not come about by chance, for people drugged with misinformation tend to stay in line. To hear this set to music, take a few minutes’ break from the math and listen to Curse That TV Set.
Applying the D diversity index to explain the mind’s sense of significance and insignificance gives a proper understanding of information as the human mind processes it, because D is an exact function that shows how the mind actually views and automatically compares sets of things, balanced and unbalanced. To make this clearer, we next look at what information is from the broadest perspective. And in the section after that we will reformulate entropy correctly. This will not only explain that heretofore mysterious phenomenon clearly; in locating significance in physical systems, by showing entropy to be the number of energetically significant molecules in a thermodynamic system, it will also reinforce the above explanation of the mind’s sense of significance and insignificance, which is central to understanding how we think and feel about things and how that is affected by ideological propaganda.
3. Information
The best case for D diversity based significance and insignificance as an intrinsic part of mental information is made with a revised elaboration of information theory, which has major limitations as it stands. The inability of information theory to address the problem of meaning, which includes the meaning of things in terms of their significance, is made clear in a Scientific American article, From Complexity to Perplexity (John Horgan, June 1995):
Created by Claude Shannon in 1948, information theory provided a way to quantify the information content in a message. The hypothesis still serves as the theoretical foundation for information coding, compression, encryption and other aspects of information processing. Efforts to apply information theory to other fields ranging from physics and biology to psychology and even the arts have generally failed – in large part because the theory cannot address the issue of meaning.
This shortcoming of information theory is remedied by understanding the prime information functions in information theory as diversity. This not only develops quantitative significance as one of the two primary factors for meaningfulness in information but also develops the association of emotion with objects and events as the other primary factor for meaningfulness. This revision of information theory also aids in clarifying thermodynamic entropy as a physical manifestation of diversity based significance, which further reinforces the reality of the mind’s diversity based sense of significance and insignificance. The central function for information in information theory is the Shannon information entropy.
24.) H = -\sum_{i=1}^{N} p_{i} \log_{2} p_{i}
This is an exceedingly messy-looking thing, so we will introduce it in the simplest way possible. The sole variable in H is the p_{i} term. The simplest way to understand it is with a set of colored objects. Recall the K=12 object, N=4 color, (■■■, ■■■, ■■■, ■■■), (3, 3, 3, 3), set that has x_{1}=3 red, x_{2}=3 green, x_{3}=3 purple and x_{4}=3 black objects. The p_{i} term in Eq24 is most basically just the fractional measure of each color of objects. Formally we define p_{i} as
25.) p_{i} = \frac{x_{i}}{K}
For (■■■, ■■■, ■■■, ■■■), the p_{i} weight fractions of the set are p_{1}=x_{1}/K=3/12=1/4, p_{2}=x_{2}/K=1/4, p_{3}=1/4 and p_{4}=1/4. The p_{3}=1/4, for example, just says that the purple objects in the set are 1/4 of all the objects in the set. Note that p_{i} is an exact property of a set as the ratio of two exact functions, x_{i} and K. Also note that the p_{i} weight fractions of a set must sum to one.
26.) \sum_{i=1}^{N} p_{i} = 1
Information theory was developed in 1948 by Claude Shannon, then a communications engineer with Bell Labs, to characterize messages sent from a source to some destination. Consider (■■■, ■■■, ■■■, ■■■) as a set of K=12 colored buttons in a bag in N=4 colors. I’m going to pick one of the buttons without looking and then send a message of the color I picked to some destination. The probability of any one of the N=4 colors being picked is just its p_{i} weight fraction.
27.) p_{i} = 1/N = 1/4
So there’s a p_{1}=1/4 probability of my sending a message saying “I picked red.” And a p_{2}=1/4 probability of my message saying, “I picked green,” and so on. Plugging these p_{i}=1/4 probabilities into messy Eq24 obtains the amount of information in the color message sent.
28.) H = -\sum_{i=1}^{4} \frac{1}{4} \log_{2} \frac{1}{4} = -4 \cdot \frac{1}{4} \cdot (-2) = 2 bits
That tells us that there are H=2 bits of information in a message sent. What does that mean? This H=2 bits is the number of binary digits, 0s or 1s, minimally needed to encode the color messages derived from the N=4 color set, (■■■, ■■■, ■■■, ■■■), as bit signals, namely as [00, 01, 10, 11]. Red might be encoded as 00, green as 01, and so on. So when the receiver gets 00 as the message, he decodes it back to red. The H=2 bits in each bit signal are considered to be the amount of information in a message. This kind of bit signal information is the synthetic or digital information that computers run on. All of this may seem quite out of the way from our primary goal of understanding violence, war and the need to rid the world of weapons, but be patient, for eventually this exercise will develop precise mathematical functions for all of the emotions, including excitement, sex, fear, love and anger, that are quite relevant.
To continue on technically, there is a simpler form of the Shannon entropy of Eq24 for balanced or equiprobable sets like (■■■, ■■■, ■■■, ■■■). Because the p_{i} probabilities for them are all alike, p_{i}=1/N, substituting 1/N for p_{i} in Eq24 gives for them the much simpler form
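The H=2 bits result, and the H=3 bits result for the N=8 balanced set of buttons, can be checked with a short sketch of Eq24 together with a plain fixed-length binary code. The helper names are ours; the code assignment simply counts 0, 1, 2, … in binary, which is one of many valid assignments.

```python
from math import log2

def weight_fractions(x):
    """Eq25: p_i = x_i / K."""
    K = sum(x)
    return [xi / K for xi in x]

def shannon_entropy(p):
    """Eq24: H = -sum(p_i * log2(p_i)) in bits; zero-probability terms are skipped."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def fixed_length_codes(n_messages, n_bits):
    """One distinct n_bits-long binary string per message, assigned by counting in binary."""
    return [format(i, f"0{n_bits}b") for i in range(n_messages)]

H4 = shannon_entropy(weight_fractions([3, 3, 3, 3]))  # N=4 equiprobable colors
H8 = shannon_entropy(weight_fractions([2] * 8))       # N=8 equiprobable colors

# For these balanced sets H is a whole number, so it can serve directly as a code length
print(H4, fixed_length_codes(4, round(H4)))  # 2.0 ['00', '01', '10', '11']
print(H8, fixed_length_codes(8, round(H8)))
```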
29.) H= log_{2}N
This simpler equation gets us the same result as Eq28 in a faster way as
30.) H= log_{2}N = log_{2}4 = 2 bits
Now let’s evaluate the amount of information in a message that derives from a random pick from another set of buttons, K=16 of them in N=8 colors, (■■, ■■, ■■, ■■, ■■, ■■, ■■, ■■). Because this set is balanced, the probability of picking a particular color and sending a message about it is the same for all N=8 colors, p_{i}=1/N=1/8. And the amount of information in a color message from this set can be calculated from the simple, equiprobable, form of the Shannon entropy of Eq29 as
31.) H= log_{2}N= log_{2}8= 3 bits
This tells us to encode messages derived from the N=8 color set, (■■, ■■, ■■, ■■, ■■, ■■, ■■, ■■), with the N=8 bit signals [000, 010, 100, 001, 110, 101, 011, 111]. Each of them has H=log_{2}8=3 bits in it as the amount of information in a message from this set. Now we want to make the case that information can be understood as diversity and, conversely, that diversity is a measure of information. A very direct corroboration of a synonymy between diversity and information lies in the H Shannon entropy expressed in natural log terms as the Shannon Diversity Index, which has been used over the last 60 years in the scientific literature as a measure of ecological and sociological diversity. Paralleling Eqs24&29 for the Shannon information entropy is the Shannon Diversity Index of
32.) H = -\sum_{i=1}^{N} p_{i} \ln p_{i} (general); H = \ln N (equiprobable)
The difference between the Shannon entropy as information and the Shannon entropy as diversity derives merely from the different logarithm bases used in the two, which changes H only by the constant factor ln 2 and so in no way affects the mathematical equivalence of H as information and as diversity. We can also generate a linear measure of H as
33.) M = 2^{H}
Termed the
number of messages (in Pierce, Introduction to Information Theory),
it is readily understood as a linear measure of diversity akin to the D
Simpson’s Reciprocal Diversity Index as seen in the list of sets below.
Set | Number Set Values | D from Eq5 or Eq8 | M from Eqs24&33
(■■■, ■■■, ■■■, ■■■) | x_{1}=x_{2}=x_{3}=x_{4}=3 | 4 | 4
(■■■■■, ■■■, ■■■, ■) | x_{1}=5, x_{2}=x_{3}=3, x_{4}=1 | 3.273 | 3.543
(■■■■■■, ■■■■■■) | x_{1}=x_{2}=6 | 2 | 2
(■■■■, ■■■■, ■■■■) | x_{1}=x_{2}=x_{3}=4 | 3 | 3
(■■■■■■, ■■■■■■, ■■■■■■■■■) | x_{1}=x_{2}=6, x_{3}=9 | 2.882 | 2.942
(■■■■■■, ■■■■■, ■) | x_{1}=6, x_{2}=5, x_{3}=1 | 2.323 | 2.505
(■■■■■■■■■■, ■■■■■■■■■■, ■) | x_{1}=x_{2}=10, x_{3}=1 | 2.194 | 2.343
(■■, ■■, ■■, ■■, ■■, ■■, ■■, ■■) | All x_{i}=2 | 8 | 8
Table 34. Various Sets and Their D and M Biased Diversity Indices
We see in the above that M=2^{H} functions as a measure of diversity much like D, in having the same value as it, M=D=N, for balanced sets and in being less than N for unbalanced sets as D is, even if not by quite the same amount. That does not make M any less a measure of diversity, for the specific reduction from N in the unbalanced case is inherently arbitrary, given that our intuitive sense of diversity is the only guide we have to the functions for diversity being correct.
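The D and M columns of Table 34 can be recomputed with the sketch below, which writes Eq5 as D=1/Σp_{i}^{2} (using p_{i}=x_{i}/K) and also checks that 2^{H} with H in bits equals e^{H} with H in natural-log units, so M does not depend on the logarithm base. The function name is ours. For (5, 3, 3, 1) it gives M≈3.543.

```python
from math import exp, log, log2

def diversity_indices(x):
    K = sum(x)
    p = [xi / K for xi in x]                    # Eq25
    D = 1 / sum(pi ** 2 for pi in p)            # Eq5 rewritten with p_i = x_i / K
    H_bits = -sum(pi * log2(pi) for pi in p)    # Eq24
    H_nats = -sum(pi * log(pi) for pi in p)     # Eq32, Shannon Diversity Index
    M = 2 ** H_bits                             # Eq33
    assert abs(M - exp(H_nats)) < 1e-9          # same M from either logarithm base
    return D, M

sets = [[3, 3, 3, 3], [5, 3, 3, 1], [6, 6], [4, 4, 4],
        [6, 6, 9], [6, 5, 1], [10, 10, 1], [2] * 8]
for x in sets:
    D, M = diversity_indices(x)
    print(x, "D =", round(D, 3), "M =", round(M, 3))
```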
Another connection of the D diversity index with accepted information functions lies in D being the nonlogarithmic term in the Renyi entropy of order 2 in information theory:
35.) R = \log D = -\log \sum_{i=1}^{N} p_{i}^{2}
The Renyi entropy, R, is considered in information theory a bona fide information function as a generalization of the H Shannon entropy, the details of the close relationship between R and H being omitted here. The important thing to point out is the intimate relationship between D as diversity and R as information, which makes the R Renyi entropy a logarithmic form of diversity and suggests that D is information in some way in being the variable part of R.
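A sketch, assuming base-2 logarithms, of the general Renyi entropy of order α, R_{α}=log_{2}(Σp_{i}^{α})/(1-α). At α=2 it equals log_{2}D, and as α approaches 1 it approaches the Shannon entropy of Eq24, which is the sense in which R generalizes H. The function names are illustrative.

```python
from math import log2

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha in bits: log2(sum p_i^alpha) / (1 - alpha), alpha != 1."""
    return log2(sum(pi ** alpha for pi in p)) / (1 - alpha)

def simpson_reciprocal(x):
    K = sum(x)
    return K ** 2 / sum(xi ** 2 for xi in x)    # Eq5

x = [5, 3, 3, 1]
p = [xi / sum(x) for xi in x]                   # Eq25
print(renyi_entropy(p, 2), log2(simpson_reciprocal(x)))   # order 2 equals log2 D
# Close to alpha = 1 the Renyi entropy is close to the Shannon entropy of Eq24
print(renyi_entropy(p, 1.000001))
```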
The above diversity, information connections suggest two kinds of diversity indices that fit two kinds of information. The two kinds of diversity indices are the logarithmic kind, as with H and R; and the linear kind, as with D and M. To better understand the two kinds of information that the two kinds of diversity, logarithmic and linear, represent, we next develop the D diversity index as a bit encoding recipe that parallels H as we showed it to be when we first introduced it.
Recall the H=2 bits measure for (■■■, ■■■, ■■■, ■■■) specify a bit encoding of the N=4 color messages derived from it of N=4 bit signals, [00, 01, 10, 11], each consisting of H=2 bits. We can also use the D=4 diversity index of this set as a bit encoding recipe for bit signals, each of which have D=4 bits: [0001, 0011, 0111, 1111]. And for the N=8 color set of (■■, ■■, ■■, ■■, ■■, ■■, ■■, ■■) whose H=3 bits measure encoded it as [000, 001, 010, 100, 110, 101, 011, 111], the D=8 diversity index used as a coding recipe encodes it as [00000001, 00000011, 00000111, 00001111, 00011111, 00111111, 01111111, 11111111] with each bit signal consisting of D=8 bits. Note in both D encodings that only one permutation of a given combination of 1s and 0s is used. This restricts us writing the 20s and 61s combination of bits in any one permutation of it, as 01111011 or 00111111 and so on, but not more than one permutation of the 20S and 61s combination. Also note that the all 0s bit signal is disallowed in this D encoding recipe.
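The D encoding recipe described here, in which message k is written as k 1s right-aligned in a D-bit word, is what engineers would recognize as a thermometer (unary) code. A minimal sketch, with a hypothetical function name, that generates it for a balanced set with D=N:

```python
def d_encoding(n):
    """Message k (1..n) becomes k ones, right-aligned, in an n-bit word; no all-0s signal."""
    return ["0" * (n - k) + "1" * k for k in range(1, n + 1)]

print(d_encoding(4))  # ['0001', '0011', '0111', '1111']
print(d_encoding(8))
```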
Now anyone familiar with information theory will immediately note that the D bit recipe is quite inefficient as a practical coding scheme in its requiring significantly more bit symbols for a message than the H Shannon entropy coding recipe. This is not surprising since Claude Shannon devised his H entropy initially strictly as an efficient coding recipe for generating the minimum number of bit symbols needed to encode a message in bit signal form. The D diversity index as a coding recipe fails miserably at that task of bit symbol minimization. But we have developed it not trying to engineer a practical coding system in any way but rather to show how D can be understood in parallel to H as an information function, the efficiency of D for message transmission being quite beside the point.
We show this for D by next looking carefully at the details of the difference between the H and D bit encodings. Recall the (■■■, ■■■, ■■■, ■■■) set, whose N=4 colors are encoded in H coding with [00, 01, 10, 11] and in D coding with [0001, 0011, 0111, 1111]. Look closely to see that these are two very different ways of encoding the N=4 distinguishable color messages from (■■■, ■■■, ■■■, ■■■) with N=4 distinguishable bit signals. What is special about the D bit encoding of (■■■, ■■■, ■■■, ■■■) with [0001, 0011, 0111, 1111] is that these N=4 bit signals are all quantitatively distinguishable, each bit signal being distinct numerically from the other bit signals in having a different number of 0s and 1s.
This is not the case for the H encoding of (■■■, ■■■, ■■■, ■■■) with [00, 01, 10, 11], for there the 01 and 10 signals have the exact same number of 0s and 1s in them and are, thus, not quantitatively distinguished from each other. Rather, the 01 and 10 bit signals are positionally distinct from each other, the 0 and 1 bits being in different positions in 01 and 10. They are, we might say, qualitatively distinct: different in kind, by position, rather than in the amounts of 1 and 0 bits.
This quantitative versus qualitative distinction between D and H encoding is also clear for the N=8 set, (■■, ■■, ■■, ■■, ■■, ■■, ■■, ■■), and its D=8 bit encoding as [00000001, 00000011, 00000111, 00001111, 00011111, 00111111, 01111111, 11111111]. For there we see that every one of the N=8 bit signals is quantitatively distinguished from every other bit signal, each having a different number of 0s and 1s in it. This quantitatively distinguishable bit encoding with D=8 contrasts with the H=3 bit encoding of that color message set as [000, 001, 010, 100, 110, 101, 011, 111], in which we see that the 001, 010 and 100 signals are not quantitatively distinguished, each of them having two 0s and one 1, but are rather distinguished entirely by the positions of the 1 and 0 bits in those bit signals. And that positional or qualitative distinction in bit signals is also seen between the 011, 101 and 110 signals, all of which are quantitatively the same rather than quantitatively distinguishable.
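The two recipes and the count-of-1s test can be sketched in a few lines of Python. This is a minimal illustration, assuming the H recipe is simply every binary string of H bits and the D recipe is one "count" signal per k=1..N with k ones right-aligned; `h_codes`, `d_codes` and `quantitatively_distinct` are hypothetical helper names, not taken from the text.

```python
from itertools import product

def h_codes(n_bits):
    """H recipe (assumed): every binary string of length n_bits, 2**n_bits in all."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]

def d_codes(n):
    """D recipe (assumed): for k = 1..n, a length-n signal with k ones.
    One permutation per count is kept and the all-0s signal is excluded."""
    return ["0" * (n - k) + "1" * k for k in range(1, n + 1)]

def quantitatively_distinct(codes):
    """True if no two signals contain the same number of 1s."""
    counts = [code.count("1") for code in codes]
    return len(set(counts)) == len(counts)

if __name__ == "__main__":
    for n, h in [(4, 2), (8, 3)]:
        print("H:", h_codes(h), quantitatively_distinct(h_codes(h)))
        print("D:", d_codes(n), quantitatively_distinct(d_codes(n)))
```

`itertools.product` lists the H signals in standard binary order, which differs from the order used in the text but yields the same set. The D signals pass the count-of-1s test and the H signals fail it, because 01 and 10 (or 001, 010 and 100) carry the same number of 1s.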
This qualitative versus quantitative difference between these two kinds of bit information, H and D, corresponds to our everyday sense of information as being either qualitative or quantitative. When I tell you General George Washington worked his Virginia plantation with slaves rather than hired help, that’s qualitative information for you. While if I tell you that he owned 123 slaves at the time of his death, that’s quantitative information. Nearer at hand with our example set, it is clear in (■■■, ■■■, ■■■, ■■■) that the color subsets are all qualitatively distinct from each other, as is well denoted with their [00, 01, 10, 11] bit encoding. It is also, though, clear that there are N=D=4 color subsets, which is well denoted with [0001, 0011, 0111, 1111], which effectively counts them.
This explains why the H (qualitative) coding recipe is logarithmic in form and why the D (quantitative) coding recipe is linear in form. H is logarithmic because it is a coding recipe for information communicated from one person to another. The human mind intuitively distinguishes between things by their positions, as between two 0s and one 1 arranged as 001 or as 010. This property of mind allows us to represent distinguishable messages sent from one person to another, like ■ and ■, encoded with signals distinguished via positional or qualitative distinction, like 001 and 010. Because the N number of distinguishable messages that can be constructed from H variously permuted, variously positioned bit symbols is determined by N=2^{H}, a power function, the information in one of those messages, specified as the H number of bits in each bit signal, is inherently logarithmic via the inversion of N=2^{H} as H=log_{2}N.
Compare this to the D=4 encoding of the N=4 colors in (■■■, ■■■, ■■■, ■■■) as [0001, 0011, 0111, 1111]. This D encoding recipe encodes the colors via the number of 1s in the bit signals, or effectively with ordinal numbers that encode the colors as the 1^{st} color, the 2^{nd} color, the 3^{rd} color and the 4^{th} color, which is most basically just a count of the number of distinguishable colors and clearly represents quantitative information about them. Information that comes to us from nature, in contrast to information communicated between one person and another, is, when it is a precise description of nature, quantitative information, as every serious practitioner of the physical sciences knows. As such, an encoding in bit form of such quantitative information whose source is nature must be, in contrast to information communicated from person to person, of the D-encoded linear type, because the fundamental operation for quantification, counting, is inherently linear, as 1, 2, 3, and so on.
Hence the most general understanding of information is as diversity. That includes logarithmic diversity for communicated information, as H most basically is, as we see unarguably when it is understood as the Shannon Diversity Index; and linear diversity, whose form is specified by the counting numbers, 1, 2, 3, and so on. It is not that such quantitative descriptions of items cannot be conveyed in communication via positional distinctions, as seen in the Arabic numerals that write thirteen as 13, distinct from 31 positionally, rather than as 1111111111111, and in binary numbers that write thirteen with positional distinctions as 1101, distinct from 1110 positionally. But that should not take away from the reality of the elemental linear nature of counting and, hence, of science’s distinguishing things quantitatively in the linear form in which the D diversity exists.
Further, as is clear from our introductory take on the inexactness in counting things unequal in size, the quantitative information that comes to us from nature when the things to be counted in a natural system are unequal in size must come in the form of the exact D diversity rather than the inexact N. And in that case, as we have also made clear, D must be understood as the number of significant things or subsets in a set.
It is important to demonstrate this convincingly in a physical system, in order to show that the human mind works this way as a matter of course, obeying physical laws, or better said, biophysical laws, that are as inviolate and compelling in controlling human nature and the behavior that flows from it as the law of gravitation is in controlling planetary behavior.
4. Thermodynamic Systems
This sense of diversity as an exact specification of the number of significant subsets in a set is made thoroughly clear physically when shown for a thermodynamic system consisting of N molecules over which are distributed K energy units, and whose entropy we will formulate in terms of diversity as the number of energetically significant molecules. Diversity must be used to describe such a system because of the empirical fact that the K energy units of a gaseous thermodynamic system are distributed over its N molecules in an unbalanced way, per the Maxwell-Boltzmann energy distribution (Figure 36). This tells us that N is inexact and mandates that the molecules be specified quantitatively in exact form with a diversity function, which is what entropy is physically.
Figure 36. The Maxwell-Boltzmann energy distribution
To unequivocally derive entropy as diversity interpreted as the number of energetically significant molecules, we need to show that diversity fits both of the formal representations of entropy. One of them is the Clausius formulation of entropy,
37.) $dS = \dfrac{dQ}{T}$
The dS term means a small change in S entropy, the dQ term means a small change in Q heat, and T is the absolute temperature of the system undergoing change. This Clausius expression says that a small change in heat energy, dQ, divided by the temperature, T, gives the change in entropy, dS. We will spend most of this section showing how the dQ/T term in the Clausius expression is, dimensionally, a diversity.
And the other entropy function we must show is correlated with diversity is the Boltzmann microstate entropy formulation, S=k log W, inscribed on the tombstone of Ludwig Boltzmann (1844–1906) in Vienna.
In modern terminology it is
38.) S=k_{B}lnΩ
It shows entropy, S, to be a function of the natural logarithm of the Ω number of microstates in a random distribution with k_{B} a constant called Boltzmann’s constant. We will show the correlation of S=k_{B}lnΩ to diversity in the next section.
Considering first the Clausius entropy of Eq37 dimensionally as energy diversity requires us to now introduce a new diversity index to our menagerie of diversities, the Square Root Diversity Index, h.
39.) $h = \dfrac{\sum_{i=1}^{N} p_i^{1/2}}{\sum_{i=1}^{N} p_i^{3/2}}$
This h diversity index, as it turns out, is the proper diversity measure for entropy. We will derive it formally in a while and will explain why it is called the square root diversity index. For the moment we will evaluate it for the (6, 5, 1) number set depicted earlier as (■■■■■■, ■■■■■, ■), which is also understandable as K=12 discrete energy units distributed over the N=3 molecules of a gaseous mini-thermodynamic system as x_{1}=6 energy units on the 1^{st} molecule, x_{2}=5 energy units on the 2^{nd} molecule and x_{3}=1 energy unit on the 3^{rd} molecule. The p_{i} for this (6, 5, 1) distribution are p_{1}=6/12, p_{2}=5/12 and p_{3}=1/12. This obtains the h square root diversity of this (6, 5, 1) distribution as
40.) $h = \dfrac{(6/12)^{1/2} + (5/12)^{1/2} + (1/12)^{1/2}}{(6/12)^{3/2} + (5/12)^{3/2} + (1/12)^{3/2}} \approx \dfrac{1.6413}{0.6466} \approx 2.538$
To illustrate h as a diversity measure, we compare it for the sets in Table 34 to the D and M diversity indices as follows.
| Set of Unit Objects | Subset Values | D, Eq5 or Eq8 | M, Eqs24&33 | h, Eq39 |
| (■■■, ■■■, ■■■, ■■■) | x_{1}=x_{2}=x_{3}=x_{4}=3 | 4 | 4 | 4 |
| (■■■■■, ■■■, ■■■, ■) | x_{1}=5, x_{2}=x_{3}=3, x_{4}=1 | 3.273 | 3.543 | 3.562 |
| (■■■■■■, ■■■■■■) | x_{1}=x_{2}=6 | 2 | 2 | 2 |
| (■■■■, ■■■■, ■■■■) | x_{1}=x_{2}=x_{3}=4 | 3 | 3 | 3 |
| (■■■■■■, ■■■■■■, ■■■■■■■■■) | x_{1}=x_{2}=6, x_{3}=9 | 2.882 | 2.942 | 2.941 |
| (■■■■■■, ■■■■■, ■) | x_{1}=6, x_{2}=5, x_{3}=1 | 2.323 | 2.505 | 2.538 |
| (■■■■■■■■■■, ■■■■■■■■■■, ■) | x_{1}=x_{2}=10, x_{3}=1 | 2.194 | 2.343 | 2.394 |
| (■■, ■■, ■■, ■■, ■■, ■■, ■■, ■■) | All x_{i}=2 | 8 | 8 | 8 |
Table 41. Various Sets and Their D, M and h Biased Diversity Indices
We see that h=N for balanced sets and that h<N for unbalanced sets, just as for D and M=2^{H}, and that h is very close in value for all sets to M=2^{H}, the linear diversity form of the H Shannon entropy. Note also that the h diversity is an exact measure, as is M=2^{H}, in h and H both being solely functions of the p_{i} of a set, which we made clear earlier are exact.
Next we return to the topic of the inexactness in the arithmetic average, µ, first discussed following Eq2, to ask whether there is an exact average for an unbalanced set that is a correlate of the inexact µ mean and can be used in place of it, just as the D diversity index serves as an exact correlate of, and replacement for, inexact N. Yes, there is. It is a biased average formed by substituting the exact D diversity correlate of N for inexact N in K/N=µ. This forms K/D as an exact average, to which we give the symbol φ (phi).
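The D, M and h columns of Table 41 can be recomputed with a short Python sketch. It assumes D=1/∑p_{i}^{2} (which matches the listed D values), M=2^{H} with H the Shannon entropy in bits, and h=∑p_{i}^{1/2}/∑p_{i}^{3/2}, the form of the square root diversity that reproduces the h=2.538 value computed for (6, 5, 1); the function names are hypothetical.

```python
import math

def proportions(xs):
    """p_i = x_i / K for a set of subset sizes x_i."""
    k = sum(xs)
    return [x / k for x in xs]

def diversity_d(xs):
    """D = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in proportions(xs))

def diversity_m(xs):
    """M = 2**H, with H the Shannon entropy in bits (empty subsets skipped)."""
    h_bits = -sum(p * math.log2(p) for p in proportions(xs) if p > 0)
    return 2.0 ** h_bits

def diversity_h(xs):
    """Square root diversity h = sum(p_i^(1/2)) / sum(p_i^(3/2))."""
    ps = proportions(xs)
    return sum(math.sqrt(p) for p in ps) / sum(p ** 1.5 for p in ps)

# Subset values of the sets in Table 41, in table order.
TABLE_SETS = [
    [3, 3, 3, 3],
    [5, 3, 3, 1],
    [6, 6],
    [4, 4, 4],
    [6, 6, 9],
    [6, 5, 1],
    [10, 10, 1],
    [2] * 8,
]

if __name__ == "__main__":
    for xs in TABLE_SETS:
        print(xs, round(diversity_d(xs), 3), round(diversity_m(xs), 3),
              round(diversity_h(xs), 3))
```

For every balanced set the three functions return N, and for the unbalanced sets h lands within a few hundredths of M, as the paragraph above observes.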
42.) $\varphi = \dfrac{K}{D}$
The φ=K/D biased average is an exact average in being a function of K, which is exact, and D, which is also exact as we showed earlier. The K=12, N=3, µ=K/N=4, D=2.323, (■■■■■■, ■■■■■, ■), (6, 5, 1) set has a biased average of φ=K/D=12/2.323=5.166. It is greater than this set’s arithmetic average of µ=K/N=4 in being weighted or biased towards the larger x_{i} values in (6, 5, 1). We detail the origin of the bias in the φ average towards the larger x_{i} values in the set by expressing φ via Eqs5,2&25 as
43.) $\varphi = \dfrac{K}{D} = K\sum_{i=1}^{N} p_i^{2} = \sum_{i=1}^{N} p_i x_i$
This shows φ to be the sum of fractional “slices” of the x_{i} of a set, slices that are p_{i} in thickness, which biases the average towards the larger x_{i} in the set by weighting them with their paired larger p_{i} weight fractions. Biased averages are not new to mathematics. Recall the ξ rms (root mean square) average of Eq14 seen to be ξ=4.546 for the (6, 5, 1) set, also greater than the µ=4 arithmetic average in being biased towards the larger x_{i} values. Unlike the φ biased average, though, the ξ rms average is not an exact average because it is a function of inexact N as seen in Eq14. Note that we can invert Eq42 to express the D diversity as a function of the φ biased average as
44.) $D = \dfrac{K}{\varphi}$
Much as the human mind operates on the D diversity index to sense what is significant and what is insignificant, so also does it operate on the φ=K/D biased average by intuitively sensing the size of a thing as biased towards the (quantitatively) significant examples of the thing. An example might be a person’s first sighting of a cluster of four Northern California mountain goats with heights in feet of (6, 2, 2, 2), one of them a really “big Billy Goat Gruff.” While the arithmetic average of these mountain goats is µ=3 feet high, one’s automatic estimation of the goats’ size from the bias of the biggest one would be that they are generally “big”, an intuitive impression of their size tilted more towards the φ biased average of φ=4 feet tall. In general we automatically bias our sense of the size of something towards its larger-sized or more significant representatives.
Our interests in this section are more for understanding significance in physical than in mental systems, and towards that end we next develop an exact average from the h square root diversity of Eq39 called the square root biased average. In parallel to K/N and K/D, it is K/h, to which we give the symbol ψ (psi).
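The φ biased average can be compared with the µ arithmetic average and the ξ rms average in a short sketch. It assumes p_{i}=x_{i}/K, so that φ=K/D is computed in its slice form ∑p_{i}x_{i}; the function names are hypothetical, and the goat heights (6, 2, 2, 2) from the example above are included alongside (6, 5, 1).

```python
import math

def mean(xs):
    """Arithmetic average mu = K / N."""
    return sum(xs) / len(xs)

def phi(xs):
    """Biased average phi = sum(p_i * x_i), algebraically equal to K / D."""
    k = sum(xs)
    return sum((x / k) * x for x in xs)

def rms(xs):
    """Root mean square average xi, which depends on N."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

if __name__ == "__main__":
    for xs in ([6, 5, 1], [6, 2, 2, 2]):
        print(xs, mean(xs), round(phi(xs), 3), round(rms(xs), 3))
```

Both biased averages exceed µ because the larger x_{i} carry the larger weights; only φ avoids dividing by N.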
45.) $\psi = \dfrac{K}{h}$
The ψ=K/h biased average is an exact average in being a function of K, which is exact, and h, which is also exact as we made clear earlier. The K=12, N=3, µ=K/N=4, h=2.538, (■■■■■■, ■■■■■, ■), (6, 5, 1) set has a square root biased average of ψ=K/h=12/2.538≈4.73, greater than the arithmetic average of this set, µ=4, in being biased towards the larger x_{i} values in the (6, 5, 1) set. We detail the origin of the bias in the ψ average towards the larger x_{i} in a set by expressing ψ via Eqs5,2&25 as
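46.) $\psi = \dfrac{K}{h} = \dfrac{K\sum_{i=1}^{N} p_i^{3/2}}{\sum_{i=1}^{N} p_i^{1/2}} = \dfrac{\sum_{i=1}^{N} p_i^{1/2} x_i}{\sum_{i=1}^{N} p_i^{1/2}}$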
The numerator in the end term shows the ψ square root biased average to be the sum of “slices” of the x_{i} of a set, each slice p_{i}^{1/2} in thickness, which biases this average towards the larger x_{i} in the set. The ∑p_{i}^{1/2} term in the denominator of the end term is a normalizing function that makes the p_{i}^{1/2} “slices” in the numerator sum to one, a summing to one of the fractional “slices” that is necessary for the construction of any kind of average of a number set. We can invert Eq45 to express the h square root diversity index as a function of the ψ square root biased average as
47.) $h = \dfrac{K}{\psi}$
Now we list the sets in Table 41 along with their φ=K/D and ψ=K/h biased averages.
| Set | Subset Values | φ=K/D | ψ=K/h |
| (■■■, ■■■, ■■■, ■■■) | x_{1}=x_{2}=x_{3}=x_{4}=3 | 3 | 3 |
| (■■■■■, ■■■, ■■■, ■) | x_{1}=5, x_{2}=x_{3}=3, x_{4}=1 | 3.667 | 3.369 |
| (■■■■■■, ■■■■■■) | x_{1}=x_{2}=6 | 6 | 6 |
| (■■■■, ■■■■, ■■■■) | x_{1}=x_{2}=x_{3}=4 | 4 | 4 |
| (■■■■■■, ■■■■■■, ■■■■■■■■■) | x_{1}=x_{2}=6, x_{3}=9 | 7.286 | 7.139 |
| (■■■■■■, ■■■■■, ■) | x_{1}=6, x_{2}=5, x_{3}=1 | 5.167 | 4.727 |
| (■■■■■■■■■■, ■■■■■■■■■■, ■) | x_{1}=x_{2}=10, x_{3}=1 | 9.571 | 8.771 |
| (■■, ■■, ■■, ■■, ■■, ■■, ■■, ■■) | All x_{i}=2 | 2 | 2 |
Table 48. Various Sets and Their φ and ψ Biased Averages
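The ψ column can be checked two ways, as K/h and as the normalized sum of p_{i}^{1/2} slices, which should agree exactly. This sketch assumes the square root diversity has the form h=∑p_{i}^{1/2}/∑p_{i}^{3/2} that reproduces the h values of Table 41; `psi_from_h` and `psi_from_slices` are hypothetical names.

```python
import math

def psi_from_h(xs):
    """psi = K / h, with h = sum(p_i^(1/2)) / sum(p_i^(3/2))."""
    k = sum(xs)
    ps = [x / k for x in xs]
    h = sum(math.sqrt(p) for p in ps) / sum(p ** 1.5 for p in ps)
    return k / h

def psi_from_slices(xs):
    """psi as the x_i weighted by p_i^(1/2) slices normalized to sum to one."""
    k = sum(xs)
    weights = [math.sqrt(x / k) for x in xs]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)

if __name__ == "__main__":
    for xs in ([5, 3, 3, 1], [6, 6, 9], [6, 5, 1], [10, 10, 1]):
        print(xs, round(psi_from_h(xs), 3), round(psi_from_slices(xs), 3))
```

The two functions differ only in the order of operations: substituting x_{i}=Kp_{i} into the slice form turns it into K∑p_{i}^{3/2}/∑p_{i}^{1/2}, which is K/h.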
Now let’s apply the above to a thermodynamic system of N gas molecules over which are distributed K discrete energy units. In it the microstate absolute temperature is understood, in the standard physics rubric, as proportional to the arithmetic average per molecule of the system’s kinetic energy. Via the equipartition of energy theorem, this implies a normalized microstate temperature equal to the arithmetic average per molecule of the K total energy, µ=K/N.
There is something wrong or inexact with that, though, for we earlier learned that the K energy units of the system are distributed over its N molecules in an unbalanced way, as seen in the Maxwell-Boltzmann energy distribution of Figure 36. Hence, though the count of K discrete energy units in the system is exact because they all have the same value of 1 energy unit, the N number of molecules is inexact because the N molecules are not all the “same size” energetically, in not all containing the same number of energy units, as the Maxwell-Boltzmann distribution of Figure 36 makes clear. Hence the arithmetic average molecular energy of the N molecules, µ=K/N, must also be inexact in being a function of inexact N. This suggests its replacement with one of the exact biased averages we have developed, φ or ψ.
We need not equivocate between the two, because the specification of the square root biased average, ψ, as normalized microstate temperature quickly becomes clear from how temperature is physically measured with a thermometer. Each of the N molecules in the system collides with the thermometer wall to contribute to the temperature measure, at a frequency of collision proportional to the velocity of the molecule, which is itself proportional to the square root of the x_{i} number of energy units on the molecule. Hence the slower-moving, lower-energy molecules in the Maxwell-Boltzmann energy distribution of Figure 36 collide with the thermometer less frequently and have their energies “recorded” by the thermometer less frequently, with smaller p_{i}^{1/2} slices of their energies. In this way they contribute less to the temperature measure than the faster-moving molecules, which collide with the thermometer more frequently and so contribute their energies to the temperature measure with a larger p_{i}^{1/2} slice, in an obviously biased way.
As the velocities, and hence the collision rates, of the molecules are directly proportional to the square root of the x_{i} energy of the molecules, the true average molecular energy that is temperature is the weighting of the x_{i} molecular energies by their square-root-of-the-energy velocities, whose fractional weightings are the p_{i}^{1/2} slices that determine the biased energy average to be the square root biased average energy per molecule, ψ. This is the inarguable physical basis of temperature measurement.
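The step from collision rates to p_{i}^{1/2} slices can be made explicit: since p_{i}=x_{i}/K, a weight of √x_{i} differs from √p_{i} only by the common factor √K, which cancels in a normalized average. The sketch below assumes, as the argument above does, that a molecule strikes the thermometer wall at a rate proportional to its speed and that speed goes as the square root of its energy units; `wall_recorded_mean` and `psi` are hypothetical names, and no actual collision dynamics are simulated.

```python
import math

def wall_recorded_mean(energies):
    """Average energy 'recorded' by a wall when molecule i strikes it at a
    rate assumed proportional to its speed, taken here as sqrt(x_i)."""
    rates = [math.sqrt(x) for x in energies]
    return sum(r * x for r, x in zip(rates, energies)) / sum(rates)

def psi(energies):
    """Square root biased average psi = sum(p_i^(1/2) x_i) / sum(p_i^(1/2))."""
    k = sum(energies)
    w = [math.sqrt(x / k) for x in energies]
    return sum(wi * x for wi, x in zip(w, energies)) / sum(w)

if __name__ == "__main__":
    sample = [6, 5, 1]  # the (6, 5, 1) energy distribution over N=3 molecules
    print(wall_recorded_mean(sample), psi(sample), sum(sample) / len(sample))
```

The rate-weighted mean and ψ agree to floating-point precision, and both sit above the arithmetic average µ=K/N.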
Now from Eq47 we see K energy divided by the ψ biased energy average, K/ψ, to be the h square root diversity, h=K/ψ. In parallel, the Clausius formulation of Eq37, dS=dQ/T, tells us that dimensionally S entropy is Q energy divided by T temperature. With normalized microstate temperature understood as the ψ square root biased average, the division of dQ energy by T temperature as ψ leads dimensionally to dS entropy being some measure of the h square root diversity index. This gives an understanding of entropy as energy diversity, or energy dispersal, in terms of h diversity, further understood as the number of energetically significant molecules in the system.
But to be sure of that understanding, which also provides physical insight into the human mind’s sense of significance, we need to understand entropy in terms of h as a property of a clearly pictured thermodynamic system. And to get that picture we need to show how h diversity based entropy also fits Boltzmann’s microstate entropy, S=k_{B}lnΩ, quantitatively.
5. Microstate Entropy
We begin that task by first clarifying the distinctions that we make between things. Consider the K=8 objects in (■■■■, ■■■■). On the one hand they are distinguished by color. A red object, ■, is intuitively distinguished from a green object, ■. This is called categorical distinction or distinction of kind. But we also, in real experience, distinguish between two materially different objects of the same kind. We do this in an intuitive way by noting that two objects of the same kind are residing in different places. Consider two disposable pens of the same kind fresh out of a package of two, one of which I am holding in my right hand and one in my left. Clearly these are two distinguishable objects even if of the same kind. From that perspective, we understand the set of 4 red objects in (■■■■) as fundamentally distinct and represent them as (abcd) to make it clear that though they are all red objects, of the same kind, they are yet distinct, fundamentally distinct, from each other.
We next want to understand the dividing of sets of objects into subsets, not only as objects differentiated by color as we see in the set (■■, ■■), but as objects divided up into different containers as subsets. Thus we may think of a set of K=4 fundamentally distinct objects, say four otherwise identical red candies written as (abcd), divided into N=2 subsets that are the 2 children, Jack and Jill. We can divide up the K=4 set of red candies into these N=2 child subsets, for example, as (a, bcd), which denotes x_{1}=1 candy for Jack and x_{2}=3 candies for Jill, and in a number of other ways. Note that this particular distribution of objects has, from Eq2, an arithmetic average of µ=K/N=4/2=2 pieces of candy per child and, from Eq5, a D diversity of D=1.6.
This analysis of a set of K=4 candies divided up between N=2 kids will have great relevance to understanding entropy properly and thoroughly when we use a candy distribution as a model for the distribution of K discrete energy units over the N molecules of a thermodynamic system. But for now it is easier to talk about candies and kids rather than energy units and molecules, because the dynamics of the two systems, essentially the same for both, are easier to picture and intuitively understand with the candies and kids.
Specifically, we are going to consider a random distribution of K=4 candies to N=2 children, Jack and Jill, as done by their grandmother tossing the candies blindly over her shoulder, one at a time, to the kids. Such a distribution is understood as equiprobable because each of the N=2 children has an equal probability of P=1/N=1/2 of getting a candy on any given toss. There are ω=N^{K}=2^{4}=16 (ω is little omega) permuted states, or different ways of candy distribution, as given in the {braces} in the table below, with the candies Jack gets to the left of the comma in the {braces} and the candies that Jill gets to the right of the comma. We explain the other properties of this random, or equiprobable, distribution listed in the table in the text that follows it.
| ω=16 permuted states | {abcd, 0} | {abc, d} | {ab, cd} | {a, bcd} | {0, abcd} |
| | | {abd, c} | {ac, bd} | {b, acd} | |
| | | {acd, b} | {ad, bc} | {c, abd} | |
| | | {bcd, a} | {bd, ac} | {d, abc} | |
| | | | {bc, ad} | | |
| | | | {cd, ab} | | |
| Ω=5 microstates (Ω is capital omega) | [4, 0] | [3, 1] | [2, 2] | [1, 3] | [0, 4] |
| Permuted states per microstate | 1 | 4 | 6 | 4 | 1 |
| Probability of a microstate | 1/16 | 4/16=1/4 | 6/16=3/8 | 4/16=1/4 | 1/16 |
| Number set notation of a microstate | x_{1}=4, x_{2}=0 | x_{1}=3, x_{2}=1 | x_{1}=2, x_{2}=2 | x_{1}=1, x_{2}=3 | x_{1}=0, x_{2}=4 |
Table 49. The Permuted States and so on of the Random Distribution of K=4 Candies to N=2 Children
All the ω=16 permuted states are inherently equiprobable, the probability of each permuted state that arises after all K=4 candies are tossed being, from elementary probability theory, 1/ω=1/N^{K}=1/16. If grandma repeated her random tossing of K=4 candies to the N=2 grandkids 16 times, on average each of the ω=16 permuted states shown on the first through sixth lines in the table would occur 1 time.
The ω=N^{K}=16 permutations are grouped into Ω=5 microstates, [4, 0], [3, 1], [2, 2], [1, 3] and [0, 4], on the seventh line in the table, with each microstate consisting of a given number of permuted states, as listed on the eighth line in the table. The [1, 3] microstate, for example, consists of 4 permuted states, which tells us that there are 4 ways that Jack can get 1 candy and Jill get 3.
On the ninth line in the table is the probability of each microstate coming about. For example, the probability of each child getting 2 of the K=4 candies tossed, the [2, 2] state, is 6/16=3/8=.375. And on the last line in the table are the number set notations of the states, micro and permuted, x_{1} being the number of candies that Jack gets and x_{2} the number that Jill gets in any state.
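Table 49 can be regenerated by brute force. The sketch below treats each candy as distinguishable, enumerates all N^{K} assignments of candies to children, and groups them by how many candies each child receives; `permuted_states` and `microstate_counts` are hypothetical names, and child 0 stands for Jack, child 1 for Jill.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def permuted_states(k=4, n=2):
    """All N**K assignments: entry j is the child (0..n-1) who gets candy j."""
    return list(product(range(n), repeat=k))

def microstate_counts(k=4, n=2):
    """Number of permuted states in each microstate [x_1, ..., x_n]."""
    counts = Counter()
    for assignment in permuted_states(k, n):
        occupancy = tuple(assignment.count(child) for child in range(n))
        counts[occupancy] += 1
    return counts

if __name__ == "__main__":
    total = len(permuted_states())
    counts = microstate_counts()
    for micro in sorted(counts, reverse=True):
        print(list(micro), counts[micro], Fraction(counts[micro], total))
```

The printed rows match the eighth and ninth lines of the table: 1, 4, 6, 4, 1 permuted states with probabilities 1/16, 1/4, 3/8, 1/4, 1/16.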
We are using this random distribution of K candies over N kids as an easy to follow model for the random distribution of K discrete (whole numbered) energy units over the N molecules of a thermodynamic system, so we will toggle back and forth between the two distributions, candies over kids and energy units over molecules, as we proceed.
To get a physical picture of the distributions, consider for the candies over kids distribution grandma repeating the toss of the K=4 candies 16 times. This produces, on average, each of the ω=N^{K}=16 permuted states once. Now consider the K=4 parameter as the number of energy units distributed randomly between N=2 gas molecules by collisions that transfer the energy units between the colliding molecules in an overall random way, such that after 16 collisions, on average, every permuted state in Table 49 occurs once. That, of course, holds only if the energy units, like the candies, are distinguishable from each other.
We will consider this unorthodox view in detail in a moment. Boltzmann considered the energy units to be indistinguishable from each other and would consider all Ω=5 of the microstates in Table 49 to be equiprobable. The difference between these two views is easy to depict physically: Line50 below lists the 16 permuted states of the K=4 candies, or of distinguishable energy units, and Line51 lists 15 microstates of indistinguishable energy units. The latter are specified only numerically, because with the indistinguishable energy unit assumption there are no energy unit distinctions with which to spell them out.
50.) {abcd, 0}, {abc, d}, {abd, c}, {acd, b}, {bcd, a}, {ab, cd}, {ac, bd}, {ad, bc}, {bd, ac}, {bc, ad}, {cd, ab}, {a, bcd}, {b, acd}, {c, abd}, {d, abc}, {0, abcd}
51.) [4, 0], [3, 1], [2, 2], [1, 3], [0, 4], [4, 0], [3, 1], [2, 2], [1, 3], [0, 4], [4, 0], [3, 1], [2, 2], [1, 3], [0, 4]
Neither distribution necessarily occurs in the order shown; over time either may appear in any random ordering. Now note that the two kinds of distribution are quite different in frequency or probability. The [4, 0] microstate, for example, occurs about three times more frequently with energy unit indistinguishability in Line51 than with distinguishability assumed in Line50. We can also express the distinguishable case of Line50 in terms of microstates, which in the unorthodox take would appear over 16 collisions with frequencies that fit Line50 as
50a.) [4, 0], [3, 1], [3, 1], [3, 1], [3, 1], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [1, 3], [1, 3], [1, 3], [1, 3], [0, 4]
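The frequency difference between Line50a and Line51 can be quantified for the two-child case. A minimal sketch, assuming the Boltzmann view makes all K+1 microstates equally likely and the distinguishable view gives the microstate [j, K−j] the binomial weight C(K, j)/2^{K}; `distinguishable_probs` and `indistinguishable_probs` are hypothetical names.

```python
from fractions import Fraction
from math import comb

def distinguishable_probs(k):
    """Two children, distinguishable units: P([j, k-j]) = C(k, j) / 2**k."""
    return {(j, k - j): Fraction(comb(k, j), 2 ** k) for j in range(k, -1, -1)}

def indistinguishable_probs(k):
    """Two children, Boltzmann view: all k+1 microstates equiprobable."""
    return {(j, k - j): Fraction(1, k + 1) for j in range(k, -1, -1)}

if __name__ == "__main__":
    d, b = distinguishable_probs(4), indistinguishable_probs(4)
    for micro in d:
        # expected occurrences over 16 distributions under each view
        print(list(micro), 16 * d[micro], 16 * b[micro])
```

For [4, 0] the ratio of the two probabilities is (1/5)/(1/16)=16/5, the roughly threefold difference noted above.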
These are two very different pictures of a thermodynamic system, inherently incompatible, so they cannot both be correct, even though, as we shall see, they are mathematically equivalent. Boltzmann statistical mechanics assumes that the molecules of a thermodynamic system are distinguishable, as we do. That is, even when the molecules of the system are of the same kind, say, all helium molecules, they are materially distinct from each other, as seen in their residing in different places in space.
Hence, one can argue that Lines50&50a, based on the assumption of energy unit distinguishability, are correct for both candy and energy random distributions, from the perspective that energy units are distinguishable in residing on molecules that are in different places in space. One can also understand distinguishable energy units from the argument that upon collision, one or more of the energy units on a colliding molecule may be transferred to the other molecule, and that the energy unit/s so transferred are distinguishable from the energy units that remain on that molecule.
But we don’t want to disqualify the Boltzmann view which has held sway for 100 years with too facile an argument on distinguishability that almost comes down to a “how many angels on the head of a pin” debate. In the end Boltzmann’s theory does fit the empirical data, so however it is that the assumption of distinguishable energy units might be argued, Boltzmann’s theory must be taken seriously and debated further. On the other hand, we shall see that the unorthodox assumption of distinguishable energy units produces an equally perfect mathematical fit to data and one that makes near infinitely more physical and intuitive sense out of entropy.
As we further compare the two theories, note that the sense of microstates appears in both theories, equiprobable in the Boltzmann theory and, as is made clear in Table 49, nonequiprobable in the unorthodox theory. Note also that the permuted states are the “microstates” of a thermodynamic system, its fundamental states, under the distinguishable energy unit assumption, but are given the different name of “permuted states” so as not to confuse the two formulations.
That said, let’s get on to explaining Boltzmann’s S=k_{B}lnΩ entropy in terms of the Ω number of microstates. There is a textbook shortcut formula for calculating the Ω number of microstates for any distribution of K objects over N containers, whether the Ω microstates are nonequiprobable subsets of the ω permuted states in the unorthodox view or the equiprobable fundamental microstates of the Boltzmann orthodox view.
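52.) $\Omega = \dfrac{(K+N-1)!}{K!\,(N-1)!}$
This formula calculates the Ω=5 number of microstates for a K=4 over N=2 random distribution that we saw in Table 49 as
53.) $\Omega = \dfrac{(4+2-1)!}{4!\,(2-1)!} = \dfrac{5!}{4!\,1!} = \dfrac{120}{24} = 5$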
Eq52 is useful, indeed necessary, for calculating the Ω number of microstates of very large K over N distributions. If grandma distributed K=145 candies randomly to N=25 children in the neighborhood, the Ω number of microstates calculated from Eq52 is Ω≈8.55EXP28, which one would have a hard time arriving at by listing them all out as we did the Ω=5 microstates for the K=4 over N=2 distribution in Table 49. Note, importantly, that this formula of Eq52 is valid for Ω whether one is talking about Ω for the Boltzmann orthodox theory or for our unorthodox theory.
The Boltzmann microstate entropy is centrally the natural logarithm of Ω, as S=k_{B}lnΩ, when the thermodynamic system is understood to consist of indistinguishable energy units. What this means physically, though, is quite impossible to make any sense out of. In the recently published book A Conceptual Guide to Thermodynamics (Wiley, 2014), the author, Bill Poirier of Texas Tech University, quotes, in his Chapter 10 devoted to entropy, a dozen other textbook authors characterizing entropy as mysterious, uncomfortable, most misunderstood, less than satisfying, less than rigorous, confused, rather abstract, rather subjective, difficult to comprehend, not a household word like energy, and as having no adequate answer to the question, “What is entropy?”
Professor Poirier ends this litany of opinions on the lack of clarity of entropy as currently understood with the encouraging words, “But it doesn’t have to be this way!”, as a preview of his attempt to clarify the mystery of entropy via information theory, an attempt at which he then fails miserably in not understanding information correctly. If this assessment seems excessively curt, the reader is invited to read this chapter of Poirier’s book and to find, after doing so, that the concept of entropy remains mysterious, uncomfortable, most misunderstood, less than satisfying, less than rigorous, confused, rather abstract, rather subjective and difficult to comprehend. The reason is that the problem with explaining entropy in an intuitively sensible way arises not from any inherent inexplicability of entropy but from the incorrectness in its formulation, which, corrected as we will do, erases all the confusions about it.
We will develop our explanation of entropy in terms of the easy-to-visualize candy-over-children distribution and will stick to that picture for as long as it serves its purpose well. Understand that we proceed under the distinguishable energy unit assumption, in which the Ω=5 microstates serve solely as a convenient grouping of the underlying physical and countable reality of the permuted states of the system jotted in red in Table 49, and as they appear over time from repeated distributions in Lines 50 and 50a.
It should be clear that some of the Ω=5 microstates in the K=4 over N=2 candy distribution are more diversely distributed than others. The [2, 2] balanced microstate, in which both children get the same number of candies in the random tossing of K=4 candies, has greater diversity than do the unbalanced [1, 3], [3, 1], [0, 4] and [4, 0] microstates. We next calculate the diversity indices of these microstates, focusing for the most part on the D diversity because it is easier to manipulate mathematically than the h diversity, which we will pick up on later.
From Eq13 we obtain D as a function of the σ^{2} variance of a microstate (of its number set representation), as obtained from Eq11, and as a function of its µ mean, as obtained from Eq2. The variance of the [3, 1] microstate of the K=4 over N=2 distribution, x_{1}=3 and x_{2}=1, is σ^{2}=1, with a µ mean of µ=K/N=2, and hence, from Eq13, a D diversity of
54.) D = Nµ^{2}/(µ^{2} + σ^{2}) = (2)(2^{2})/(2^{2} + 1) = 8/5 = 1.6
The µ mean of
all Ω=5 microstates of the K=4 over N=2 distribution is µ=K/N=4/2=2 and
their σ^{2} variances and D diversities from Eqs11&13 are
Microstate | Mean, µ | Variance, σ^{2} | Diversity, D
[4, 0] | 2 | 4 | 1
[3, 1] | 2 | 1 | 1.6
[2, 2] | 2 | 0 | 2
[1, 3] | 2 | 1 | 1.6
[0, 4] | 2 | 4 | 1
Table 55. Set Properties of the Ω=5 States of the K=4 over N=2 Distribution
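As a check on Table 55, here is a small Python sketch. It assumes Eq11 is the population variance, σ^{2} = (1/N)Σ(x_{i} − µ)^{2}, and that Eq13 is equivalent to D = K^{2}/Σx_{i}^{2} = Nµ^{2}/(µ^{2} + σ^{2}). Neither equation is restated in this section, so both forms are our reconstruction, chosen because they reproduce every entry of the table; the function names are hypothetical.

```python
from fractions import Fraction

def mean(microstate):
    """Arithmetic mean mu = K/N of a microstate's number set."""
    return Fraction(sum(microstate), len(microstate))

def variance(microstate):
    """Population variance (assumed form of Eq11): mean squared deviation from mu."""
    mu = mean(microstate)
    return sum((x - mu) ** 2 for x in microstate) / len(microstate)

def diversity_d(microstate):
    """Assumed form of Eq13: D = K^2 / sum(x_i^2), equal to N*mu^2 / (mu^2 + sigma^2)."""
    k = sum(microstate)
    return Fraction(k * k, sum(x * x for x in microstate))

for state in [(4, 0), (3, 1), (2, 2), (1, 3), (0, 4)]:
    print(list(state), mean(state), variance(state), float(diversity_d(state)))
```

Exact fractions are used so that values such as D=8/5 come out without rounding.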
The average of the σ^{2} variances of the Ω=5 microstates in the K=4 over N=2 distribution is a probability-weighted average that weights the variance of each microstate by that microstate’s probability of occurrence, listed on the ninth line of Table 49. To repeat, in case this seems to conflict with standard statistical mechanics terminology: while the Ω microstates in the Boltzmann take are assumed equiprobable, under our unorthodox distinguishable energy units assumption the microstates are not equiprobable, as seen on the ninth line of Table 49 and as is also made clear by the frequency of occurrence of microstates diagram under the distinguishable energy unit assumption in Line 50a.
Microstate | Variance, σ^{2} | Probability of the Microstate | Probability Weighted Variance
[4, 0] | 4 | 1/16 | (4)(1/16)=1/4
[3, 1] | 1 | 1/4 | (1)(1/4)=1/4
[2, 2] | 0 | 3/8 | (0)(3/8)=0
[1, 3] | 1 | 1/4 | (1)(1/4)=1/4
[0, 4] | 4 | 1/16 | (4)(1/16)=1/4
Sum is the average variance: σ^{2}_{AV}=1
Table 56. The Average Variance, σ^{2}_{AV}, of the Ω=5 Microstates of the K=4 over N=2 Distribution
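The probabilities in Table 56 can be checked by brute force. The Python sketch below is our own illustration (the function names are hypothetical): it enumerates all N^{K} equally likely permuted states, the 16 of Line 50 for K=4 over N=2, by assigning each distinguishable unit to one container, tallies the microstate each assignment produces, and then forms the probability-weighted average variance. It assumes each unit lands in any container with probability 1/N, and uses exact fractions to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def microstate_probabilities(k, n):
    """Probability of each microstate (x_1, ..., x_N) when K distinguishable units
    are each assigned to one of N containers with equal probability 1/N."""
    counts = Counter()
    for assignment in product(range(n), repeat=k):      # one permuted state each
        occupancy = tuple(assignment.count(c) for c in range(n))
        counts[occupancy] += 1
    total = n ** k
    return {state: Fraction(c, total) for state, c in counts.items()}

def population_variance(state):
    mu = Fraction(sum(state), len(state))
    return sum((x - mu) ** 2 for x in state) / len(state)

def average_variance(k, n):
    """Probability-weighted average of the microstate variances, as in Table 56."""
    probs = microstate_probabilities(k, n)
    return sum(p * population_variance(s) for s, p in probs.items())

probs = microstate_probabilities(4, 2)
for state in sorted(probs, reverse=True):
    print(list(state), probs[state])
print("average variance:", average_variance(4, 2))
```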
The average variance is given as σ^{2}_{AV}, which for the
K=4 over N=2 distribution is σ^{2}_{AV}=1. Now let’s
modify D in Eq13 where it is a function of σ^{2} so we can
calculate an average diversity, D_{AV}, as a function of the average
variance, σ^{2}_{AV}.
57.) D_{AV} = Nµ^{2}/(µ^{2} + σ^{2}_{AV})
This calculates the average diversity, D_{AV}, of the K=4 over N=2 distribution from the σ^{2}_{AV}=1 average variance that was obtained in Table 56 as
58.) D_{AV} = (2)(2^{2})/(2^{2} + 1) = 8/5 = 1.6
We can obtain a shortcut formula for the D_{AV} average diversity of a K over N equiprobable distribution from a shortcut formula for the σ^{2}_{AV }average variance that we will derive from a textbook expression for the variance of a multinomial distribution (see Wikipedia). For the general case, that variance expression is
59.) σ^{2} = KP_{i}(1 − P_{i})
This simplifies for the equiprobable case, where the P_{i} term in the above becomes P_{i}=1/N, which tells us that each of the N containers in a K over N distribution has an equal, 1/N, probability of getting any one of the K objects distributed, be the K objects candies or energy units and the N containers children or molecules respectively. This P_{i}=1/N probability for an equiprobable distribution is familiar from the K=4 candies over N=2 children equiprobable distribution as P=1/N=1/2, with each child having a 1/2 or 50% probability of getting any one candy blindly tossed by grandma. This P_{i}=1/N probability greatly simplifies the multinomial variance expression of Eq59 for the equiprobable case to
60.) σ^{2} = K(1/N)(1 − 1/N) = K(N − 1)/N^{2}
This variance of an equiprobable multinomial distribution, as it turns out, is the average variance of an equiprobable distribution, σ^{2}_{AV}, that we calculated for our example distribution in Table 56. Hence we can write Eq60 as
61.) σ^{2}_{AV} = K(N − 1)/N^{2}
That the variance of an equiprobable multinomial distribution is, indeed, the average variance, σ^{2}_{AV}, is demonstrated by calculating the σ^{2}_{AV}=1 average variance of the K=4 over N=2 distribution obtained in Table 56 from Eq61 as
62.) σ^{2}_{AV} = (4)(2 − 1)/2^{2} = 4/4 = 1
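Grandma’s blind tossing can also be simulated directly. The following Python sketch is our own illustration (the function name, trial count and seed are arbitrary choices, not from the text): it tosses K candies to N children many times, each child being equally likely to receive each candy, and averages the population variance of the resulting microstates. Under these assumptions the estimate should settle near K(N − 1)/N^{2}, the equiprobable multinomial variance.

```python
import random

def simulated_average_variance(k, n, trials=20_000, seed=0):
    """Monte Carlo estimate of the average microstate variance when K units are
    tossed one at a time, each landing in any of N containers with P = 1/N."""
    rng = random.Random(seed)              # fixed seed so runs are repeatable
    mu = k / n                             # every microstate has mean K/N
    total = 0.0
    for _ in range(trials):
        counts = [0] * n
        for _ in range(k):
            counts[rng.randrange(n)] += 1                  # one blind toss
        total += sum((x - mu) ** 2 for x in counts) / n    # population variance
    return total / trials

for k, n in [(4, 2), (12, 6), (36, 10)]:
    estimate = simulated_average_variance(k, n)
    print(f"K={k:2d} N={n:2d}  simulated {estimate:.3f}  K(N-1)/N^2 = {k * (n - 1) / n ** 2:.3f}")
```

Simulation only approximates the exact value; the agreement improves as the number of trials grows.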
Eq61 can now be used to generate a shortcut formula for the average diversity, D_{AV}, by substituting its σ^{2}_{AV} into Eq57 to obtain D_{AV} as
63.) D_{AV} = KN/(K + N − 1)
And we demonstrate the validity of the above by calculating the D_{AV}=1.6 average diversity of the K=4 over N=2 distribution obtained in Eq58 as
64.) D_{AV} = (4)(2)/(4 + 2 − 1) = 8/5 = 1.6
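Both shortcut formulas can be confirmed exactly for small distributions. This Python sketch (our illustration, with hypothetical names) averages the population variance over all N^{K} equally likely permuted states and compares it with K(N − 1)/N^{2}; it then plugs that average into an assumed Eq57 form, D_{AV} = Nµ^{2}/(µ^{2} + σ^{2}_{AV}) with µ=K/N, and compares the result with KN/(K + N − 1). Note that D_{AV} here is the D computed from the average variance, not the average of the microstates’ D values.

```python
from fractions import Fraction
from itertools import product

def exact_average_variance(k, n):
    """Average population variance over all N**K equally likely permuted states."""
    mu = Fraction(k, n)
    total = Fraction(0)
    for assignment in product(range(n), repeat=k):
        counts = [assignment.count(c) for c in range(n)]
        total += sum((x - mu) ** 2 for x in counts) / n
    return total / n ** k

def average_diversity(k, n, avg_var):
    """Assumed Eq57 form: D_AV = N*mu^2 / (mu^2 + sigma^2_AV), with mu = K/N."""
    mu = Fraction(k, n)
    return n * mu ** 2 / (mu ** 2 + avg_var)

for k in range(1, 7):
    for n in range(2, 5):
        avg_var = exact_average_variance(k, n)
        assert avg_var == Fraction(k * (n - 1), n ** 2)                        # Eq61
        assert average_diversity(k, n, avg_var) == Fraction(k * n, k + n - 1)  # Eq63
print("Eq61 and Eq63 agree with brute force for K = 1..6, N = 2..4")
```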
Eq63 enables us to obtain the average diversity of a large K and large N equiprobable distribution. Next we show that the D_{AV} average diversity of the Ω microstates of a large K over N equiprobable distribution (and, indeed, also of the underlying ω permuted states of the system) is nearly perfectly directly proportional to the logarithm of the Ω number of microstates, lnΩ, as given in terms of K and N from Eq52 as
65.) lnΩ = ln[(K + N − 1)!/(K!(N − 1)!)]
For the K=145 over N=25 distribution, lnΩ=71.75. For much larger K and N equiprobable distributions that are closer to realistic thermodynamic distributions of K energy units over N molecules it is easier to calculate lnΩ using Stirling’s Approximation. It approximates the ln (natural logarithm) of the factorial of any number, n, as
66.) ln n! ≈ n ln n − n + (1/2)ln(2πn)
This approximation is excellent for large n. For example, ln(170!)=706.5731 is well approximated with Eq66 as ln(170!)≈706.5726. Stirling’s Approximation for the lnΩ expression in Eq65 is
67.)
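Eq66 is easy to check numerically. In this Python sketch, math.lgamma supplies the exact ln(n!) (since n! = Γ(n + 1)), and the hypothetical helper ln_omega assumes that Eq52 counts microstates as Ω = (K + N − 1)!/(K!(N − 1)!), the count that gives Ω=5 for K=4 over N=2.

```python
import math

def ln_factorial(n):
    """Exact ln(n!) via the log-gamma function, since n! = Gamma(n + 1)."""
    return math.lgamma(n + 1)

def stirling_ln_factorial(n):
    """Stirling's Approximation: ln(n!) ~ n ln n - n + (1/2) ln(2 pi n)."""
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)

def ln_omega(k, n):
    """ln of Omega = (K+N-1)! / (K! (N-1)!), computed without large factorials."""
    return ln_factorial(k + n - 1) - ln_factorial(k) - ln_factorial(n - 1)

print(round(ln_factorial(170), 4), round(stirling_ln_factorial(170), 4))
print(round(ln_omega(4, 2), 6), round(math.log(5), 6))
```

Working with logarithms throughout avoids ever forming numbers like 170!, which already exceed the range of a double-precision float.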
Now let’s use this formula for lnΩ to compare the lnΩ of randomly
chosen large K over N equiprobable distributions to their D_{AV}
average diversity in Eq63.
K | N | lnΩ | D_{AV}
145 | 30 | 75.71 | 25
500 | 90 | 246.86 | 76.4
800 | 180 | 462.07 | 147.09
1200 | 300 | 745.12 | 240.16
1800 | 500 | 1151.2 | 381.13
2000 | 800 | 1673.9 | 571.63
3000 | 900 | 2100.88 | 692.49
Table 68. The lnΩ and D_{AV} of Large K over N Distributions
The Pearson’s correlation between D_{AV} and lnΩ for these distributions is .9995, indicating a near perfect direct proportionality between the two, as can be appreciated visually from the near straight line in the scatter plot below of the D_{AV} versus lnΩ values in Table 68.
Figure 69. A plot of the D_{AV}
versus lnΩ data in Table 68
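The correlation can be recomputed with a short Python sketch. It takes D_{AV} from the Eq63 shortcut and computes lnΩ exactly with math.lgamma, assuming the Eq52 count Ω = (K + N − 1)!/(K!(N − 1)!), rather than through Eq67; the printed values therefore need not match Table 68 digit for digit. Pearson’s r is written out by hand so the sketch runs on any Python 3 version.

```python
import math

def ln_omega(k, n):
    """Exact ln of Omega = (K+N-1)! / (K! (N-1)!) via log-gamma (assumed Eq52 count)."""
    return math.lgamma(k + n) - math.lgamma(k + 1) - math.lgamma(n)

def average_diversity(k, n):
    """Eq63 shortcut: D_AV = K N / (K + N - 1)."""
    return k * n / (k + n - 1)

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

pairs = [(145, 30), (500, 90), (800, 180), (1200, 300), (1800, 500), (2000, 800), (3000, 900)]
ln_omegas = [ln_omega(k, n) for k, n in pairs]
diversities = [average_diversity(k, n) for k, n in pairs]
for (k, n), lw, d in zip(pairs, ln_omegas, diversities):
    print(f"K={k:5d} N={n:4d} lnOmega={lw:9.2f} D_AV={d:8.2f}")
print("Pearson r =", round(pearson(diversities, ln_omegas), 4))
```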
This high .9995 correlation between lnΩ and D_{AV} becomes greater yet the larger the K and N values, K>N, of the distribution. And it has these very high correlation coefficients no matter how one randomly selects the K>N distributions to be tested. For values of K on the order of EXP20, the correlation for K>N distributions is .9999999≈1, indicating effectively a perfect direct proportionality between lnΩ and D_{AV} for very large, realistic, K over N random distributions.
This correlation is so high that we must conclude that as the Boltzmann S entropy is adjudged correct from its fit to laboratory data, so also, from the same empirical perspective, must the diversity based entropy formulation be correct. However the two cannot both be correct even though they are both numerically or “mathematically correct” because the assumptions for the two of distinguishable versus indistinguishable energy units are mutually contradictory. So they cannot be adjudged as two mutually valid, mutually supportive ways of understanding the same phenomena, entropy. Either the Boltzmann entropy remains the correct formulation and the diversity based entropy we have just developed is some odd and irrelevant coincidence; or the diversity based entropy is correct and we must attribute Boltzmann’s entropy to mathematical formulations appropriately sensible back in 1900 long before mathematical diversity came on the scene in 1948.
The latter conclusion is most difficult to embrace, of course, because Boltzmann’s famous entropy function, which was inscribed on his tombstone and is revered by all scientists, is not likely to be reevaluated by them as a serendipitous numerical coincidence with the newcomer, diversity entropy. Because of natural resistance to the overthrow of a revered theory and its deservedly revered developer, we must produce more evidence that unorthodox diversity based entropy is superior to its Boltzmann predecessor. In the next leg of our argument that evidence takes the form of showing that the diversity entropy theory better explains the Maxwell-Boltzmann energy distribution.
Figure 36.
To do that we next introduce a property of an equiprobable distribution that is closely related to its Ω microstates, called a configuration. A configuration is the collection of all microstates of a distribution that have the same number set representation. For example, the microstates [0, 4] and [4, 0] of the K=4 over N=2 distribution in Table 49 are represented by the same number set, (4, 0), which is then understood as a configuration of the K=4 over N=2 distribution. Note that we write a configuration in parentheses, (4, 0), in contrast to the [brackets] used for a microstate. Hence we say that [4, 0] and [0, 4] are the microstates of the (4, 0) configuration.
The K=4 over N=2 equiprobable distribution of Table 49 has 3 configurations,
(4, 0), (3, 1) and (2, 2), which the Ω=5 microstates belong to as
The 3 configurations of the K=4 over N=2 Distribution | (4, 0) | (3, 1) | (2, 2)
The Ω=5 microstates of the K=4 over N=2 Distribution | [4, 0], [0, 4] | [3, 1], [1, 3] | [2, 2]
Table 70. The Configurations of the K=4 over N=2 Distribution and Their States
A look back to Table 49 makes it clear that a configuration has the same σ^{2}
variance and D diversity index as the microstates that comprise it.
Configuration | Microstates | Variance, σ^{2} | Diversity, D
(4, 0) | [4, 0], [0, 4] | 4 | 1
(3, 1) | [3, 1], [1, 3] | 1 | 1.6
(2, 2) | [2, 2] | 0 | 2
Table 71. The Variance, σ^{2}, and Diversity, D, of the Configurations of the K=4 over N=2 Distribution
Note carefully now from the above table that the average variance of σ^{2}_{AV}=1
of the K=4 over N=2 distribution from Table 56 and Eq62 and its average
diversity of D_{AV}=1.6 from Eqs58&64 have the same value as the σ^{2}=1
variance and D=1.6 diversity of the (3, 1) configuration of the distribution. On
that basis the (3, 1) configuration is understood to be a compressed
representation of all three of the distribution’s configurations, (4,
0), (3, 1) and (2, 2), and as such is called the Average Configuration
of a random distribution. The Average Configuration is one configuration as a
compressed representation of all the configurations of a random distribution much
like the µ arithmetic average is one number as a compressed representation of all
the numbers in a number set as with the numbers in the K=24, N=6, (6, 4, 2, 1,
5, 6), set being represented in compressed form by the one number of the mean, μ=K/N=4.
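Grouping microstates into configurations amounts to ignoring the order of the number set. In the minimal Python sketch below, each configuration is represented by its values sorted in non-increasing order (our convention), and D = K^{2}/Σx_{i}^{2} is taken as an assumed form of Eq13 that reproduces Tables 55 and 71. The sketch confirms that every member microstate carries its configuration’s σ^{2} and D.

```python
from collections import defaultdict
from fractions import Fraction

def configuration_of(microstate):
    """Order-free number set of a microstate, written as a non-increasing tuple."""
    return tuple(sorted(microstate, reverse=True))

def variance(values):
    """Population variance of a number set."""
    mu = Fraction(sum(values), len(values))
    return sum((x - mu) ** 2 for x in values) / len(values)

def diversity_d(values):
    """Assumed Eq13 form: D = K^2 / sum(x_i^2)."""
    return Fraction(sum(values) ** 2, sum(x * x for x in values))

microstates = [(4, 0), (3, 1), (2, 2), (1, 3), (0, 4)]
groups = defaultdict(list)
for state in microstates:
    groups[configuration_of(state)].append(state)

for config, members in groups.items():
    # Variance and D ignore order, so each member shares its configuration's values.
    assert all(variance(m) == variance(config) for m in members)
    assert all(diversity_d(m) == diversity_d(config) for m in members)
    print(config, members, variance(config), diversity_d(config))
```

The (3, 1) row prints σ^{2}=1 and D=8/5, the same values as σ^{2}_{AV} and D_{AV} of the whole distribution.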
Further, it should be clear that the Average Configuration is a compressed representation not only of all of a random distribution’s configurations but also of all of its Ω microstates and, most fundamentally, in representing the physical reality of an ever changing thermodynamic system, of all its ω permuted states, as were drawn out in Line 50 for the K=4, N=2 distribution.
50.) {abcd, 0}, {abc, d}, {abd, c}, {adc, b}, {bcd, a}, {ab, cd}, {ac, bd}, {ad, bc}, {bd, ac}, {bc, ad}, {cd, ab}, {a, bcd}, {b, acd}, {c, abd}, {d, bca}, {0, abcd}
That is, the σ^{2}_{AV}=1 average variance and D_{AV}=1.6 average diversity of the above permuted states are the σ^{2} variance and D diversity of the (3, 1) Average Configuration of the K=4 over N=2 random distribution. This makes it easy to understand how, for thermodynamic systems whose collisions and procession of permuted states occur rapidly relative to the laboratory measurements that reveal their Maxwell-Boltzmann energy distribution of Figure 36, the Maxwell-Boltzmann energy distribution should be a manifestation of the average energy distribution of all the configurations and permuted states of the system, which, we shall show, is the energy distribution of the Average Configuration.
The K=4 energy units over N=2 molecules distribution has too few energy units and molecules for its (3, 1) Average Configuration to bear any resemblance to the Maxwell-Boltzmann distribution of Figure 36. We need random distributions with higher K and N values to show it, starting with the K=12 energy units over N=6 molecules distribution. To find its Average Configuration we first calculate from Eq61 the σ^{2}_{AV} average variance of the distribution, which is a defining property of it.
72.) σ^{2}_{AV} = (12)(6 − 1)/6^{2} = 60/36 = 1.667
The Average Configuration of the K=12 over N=6 distribution is a configuration whose variance equals this σ^{2}_{AV}=1.667 value. The easiest way to find it is with a Microsoft Excel program that generates all the configurations of this distribution and their σ^{2} measures and picks out one whose σ^{2} equals σ^{2}_{AV}=1.667. It is the (4, 3, 2, 2, 1, 0) configuration, which is the Average Configuration on the basis of its variance being σ^{2}_{AV}=1.667. A plot of the number of energy units on a molecule vs. the number of molecules that have that energy for this (4, 3, 2, 2, 1, 0) Average Configuration is shown below.
Figure 73. Number of Energy Units per
Molecule vs. the Number of Molecules Which
Have That Energy for the Average Configuration of the K=12 over N=6
Distribution
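The Excel search can be sketched in Python as a hypothetical stand-in (not the author’s program). It enumerates every configuration, a non-increasing list of N non-negative integers summing to K, and keeps those whose variance equals σ^{2}_{AV}. Since a configuration’s variance is Σx_{i}^{2}/N − (K/N)^{2} and σ^{2}_{AV} = K(N − 1)/N^{2}, the match can be tested exactly in integers as N·Σx_{i}^{2} = K(K + N − 1).

```python
def configurations(k, n, largest=None):
    """Yield every configuration of K units over N containers as a non-increasing
    n-tuple of non-negative integers summing to k."""
    if largest is None:
        largest = k
    if n == 0:
        if k == 0:
            yield ()
        return
    for first in range(min(k, largest), -1, -1):
        if first * n < k:          # the remaining parts, each <= first, cannot hold k
            break
        for rest in configurations(k - first, n - 1, first):
            yield (first,) + rest

def average_configurations(k, n):
    """Configurations whose variance equals sigma^2_AV = K(N-1)/N^2,
    tested exactly as N * sum(x^2) == K * (K + N - 1)."""
    target = k * (k + n - 1)
    return [c for c in configurations(k, n) if n * sum(x * x for x in c) == target]

print(average_configurations(4, 2))
print(average_configurations(12, 6))
print(len(average_configurations(36, 10)), "matching configurations for K=36 over N=10")
```

This plain enumeration is quick for the K=12, 36 and 40 cases, but for K=145 over N=30 the number of configurations is likely far too large for it to finish in reasonable time, so a pruned search would be needed there.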
Seeing this distribution as the Maxwell-Boltzmann energy distribution of Figure 36 is a bit of a stretch, though it might be characterized as a very simple, very choppy Maxwell-Boltzmann distribution. Next let’s consider a larger K over N distribution, one of K=36 energy units over N=10 molecules. Its σ^{2}_{AV} is, from Eq61, σ^{2}_{AV}=3.24. The Microsoft Excel program runs through the configurations of this distribution to find one whose σ^{2} variance has the same value as σ^{2}_{AV}=3.24, namely, (1, 2, 2, 3, 3, 3, 4, 5, 6, 7). A plot of the energy distribution of this Average Configuration is
Figure 74. Number of Energy Units per
Molecule vs. the Number of Molecules Which
Have That Energy for the Average Configuration of the K=36 over N=10
Distribution
This curve was greeted without prompting by Dr. John Hudson, Professor Emeritus of Materials Engineering at Rensselaer Polytechnic Institute and author of the graduate text Thermodynamics of Surfaces, with “It’s an obvious proto-Maxwell-Boltzmann.” Next we look at the K=40 energy units over N=15 molecules distribution, whose σ^{2}_{AV} average variance is, from Eq61, σ^{2}_{AV}=2.489. The Microsoft Excel program finds four configurations that have this variance, including (0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6), which is an Average Configuration of the distribution on the basis of its σ^{2}=2.489 variance. A plot of its energy distribution is
Figure 75. Number of Energy Units per
Molecule vs. the Number of Molecules Which
Have That Energy for the Average Configuration of the K=40 over N=15
Distribution
And next we look at the K=145 energy units over N=30 molecules distribution, whose average variance is, from Eq61, σ^{2}_{AV}=4.672. There are nine configurations with σ^{2}=4.672, including the natural number set (0, 0, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 10), which is an Average Configuration of the distribution on that basis. A plot of its energy distribution is
Figure 76. Number of Energy Units per
Molecule vs. the Number of Molecules Which
Have That Energy for the Average Configuration of the K=145 over N=30
Distribution
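The data behind plots like Figures 73-76 can be tabulated with a few lines of Python. This minimal sketch (hypothetical function name) counts, for each energy from 0 up to the largest present, how many molecules of an Average Configuration carry that many energy units, and prints a crude text bar chart; a plotting library could draw the same data but is not assumed here.

```python
from collections import Counter

def energy_histogram(configuration):
    """(energy units on a molecule, number of molecules with that energy) for every
    energy from 0 up to the largest value present in the configuration."""
    counts = Counter(configuration)
    return [(energy, counts[energy]) for energy in range(max(configuration) + 1)]

# Average Configuration of the K=145 over N=30 distribution quoted in the text
config_145_30 = (0, 0, 3, 3) + (4,) * 13 + (5,) * 4 + (6,) * 3 + (7, 7, 8, 8, 9, 10)
assert sum(config_145_30) == 145 and len(config_145_30) == 30

for energy, molecules in energy_histogram(config_145_30):
    print(f"{energy:2d} | {'#' * molecules}")   # crude text bar chart
```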
At this level we are considering K and N values large enough to show a good resemblance to the classical Maxwell-Boltzmann distribution.
Figure 36.
All configurations of the K=145 energy units over N=30 molecules distribution bear a reasonable resemblance to the Maxwell-Boltzmann distribution of Figure 36, individually and collectively. And as we progressively increase the K and N values of distributions, the plot of their energy per molecule versus the number of molecules with that energy more and more closely approaches, and eventually perfectly fits, the shape of the above realistic Maxwell-Boltzmann distribution.
Now it should be made clear that this development of the Maxwell-Boltzmann distribution as a property of the Average Configuration of a random distribution is valid for any and all kinds of random distributions of K units over N subset containers, whether they be K energy units over N molecular containers of that energy or K candies over N kid containers of the candies. The Average Configuration of Figure 76 would also come about from grandma repeatedly distributing K=145 candies to N=30 children randomly. In our unorthodox take, the Average Configuration and its Maxwell-Boltzmann shape come about from the very nature of the random or equiprobable distribution.
This is most definitely not the case in the Boltzmann stat. mech. development of the Maxwell-Boltzmann distribution, which does not come about from the intrinsic nature of a thermodynamic system as an equiprobable or random distribution, but rather from a quite separate assumption that higher energy molecules retain their energy units for shorter periods of time. That assumption is not a general property of random distributions and as such can reasonably be accused of being “curve fitted” ad hoc to the empirical data.
This is another strong point in accepting our unorthodox statistical mechanics and entropy formulation over the standard Boltzmann take. As only one theory can be correct because of the mutually contradicting assumptions in the two of energy unit distinguishability versus indistinguishability, note that there is a strong argument in favor of our unorthodox theory from the Occam’s razor principle used in science generally for deciding between two contending explanations that states that among competing hypotheses, the one with the fewest assumptions should be selected.
There are other strong reasons for choosing entropy formulated as diversity, but we hold off on them for now in order to clarify which diversity measure is the proper one for entropy. So far we have given arguments for the D diversity as entropy because D has mathematical regularities that make it relatively easy to manipulate mathematically. But the h=K/ψ Square Root Diversity Index of Eqs39&47 has a distinct property that makes it the actual diversity that is entropy: its ψ Square Root Biased Average is the correct form for normalized microstate temperature, as we showed in the previous section.
But to accept it as such we must show that h_{AV} also has a very high correlation to the lnΩ variable in S=k_{B}lnΩ before we can consider it as a valid replacement for Boltzmann’s S entropy. To demonstrate this, though, is not as straightforward as was done between D_{AV} and lnΩ earlier because h_{AV} is not a simple function of the K energy units and N molecules of a thermodynamic system as D_{AV }was in Eq63 as D_{AV}=KN/(K+N−1).
Because h_{AV} is the h square root diversity index of the Average Configuration, much as D_{AV} was its D diversity index, we can obtain h_{AV} for the K over N distributions whose specific Average Configurations and their x_{i} and p_{i} values we know, namely those in Figures 73-76. We list their h_{AV} values as calculated from Eq46 below, alongside the lnΩ values of those distributions as calculated from Eq65. We also include their D_{AV} diversity indices from Eq63 for comparison’s sake.
Figure | K | N | lnΩ | D_{AV} | h_{AV}
73 | 12 | 6 | 8.73 | 4.24 | 4.57
74 | 36 | 10 | 18.3 | 8 | 8.85
75 | 40 | 15 | 26.1 | 11.11 | 12.33
76 | 145 | 30 | 75.88 | 25 | 26.49
Table 77. The lnΩ, D_{AV} and h_{AV} of the Distributions in Figures 73-76
The correlation between the lnΩ and D_{AV} of the above random distributions is .997. Though quite high, this is less than the .9995 correlation between lnΩ and D_{AV} seen in Table 68 for large K and N distributions. This difference in correlation coefficients is attributable to the fact that the Pearson’s correlation coefficient is a function of the magnitude of the K and N parameters, those of the distributions in Figures 73-76 and in Table 77 definitely being smaller than those in Table 68. However, the .995 Pearson’s correlation between lnΩ and h_{AV} for the K over N distributions in Table 77 is not much different from the .997 correlation between lnΩ and D_{AV} for them, which implies that the correlation between lnΩ and h_{AV} is, as it was between lnΩ and D_{AV}, generally very high for all distributions, high enough for h_{AV} to be accepted on that basis as entropy, a replacement for lnΩ based entropy.
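To reproduce the h_{AV} column of Table 77 one needs the ψ Square Root Biased Average of Eq46, which is not restated in this section. The Python sketch below assumes ψ = Σx_{i}^{3/2}/Σx_{i}^{1/2}, that is, each molecular energy x_{i} weighted by its square root, together with h = K/ψ from Eqs39&47. The ψ form is our inference rather than a quotation of Eq46, chosen because it reproduces the h_{AV} values in the table.

```python
import math

def psi(configuration):
    """Assumed Eq46 form: each energy x_i weighted by sqrt(x_i),
    psi = sum(x_i**1.5) / sum(x_i**0.5)."""
    weights = [math.sqrt(x) for x in configuration]
    return sum(w * x for w, x in zip(weights, configuration)) / sum(weights)

def h_index(configuration):
    """Square Root Diversity Index h = K / psi (Eqs39&47)."""
    return sum(configuration) / psi(configuration)

average_configs = {
    (12, 6): (4, 3, 2, 2, 1, 0),
    (36, 10): (1, 2, 2, 3, 3, 3, 4, 5, 6, 7),
    (40, 15): (0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6),
    (145, 30): (0, 0, 3, 3) + (4,) * 13 + (5,) * 4 + (6,) * 3 + (7, 7, 8, 8, 9, 10),
}
for (k, n), config in average_configs.items():
    d_av = k * n / (k + n - 1)                      # Eq63 shortcut
    print(f"K={k:3d} N={n:2d}  D_AV={d_av:6.2f}  h_AV={h_index(config):6.2f}")
```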
To repeat now, while the D_{AV} and h_{AV} average diversities would both be candidates for diversity based entropy in both having a very high correlation to lnΩ and in both fitting the Average Configuration manifest as the Maxwell-Boltzmann energy distribution, we take h_{AV} as the proper diversity based entropy on the basis of the relationship of the h_{AV} average square root diversity index to ψ_{AV} that parallels the h=K/ψ relationship of Eq47.
78.) h_{AV} = K/ψ_{AV}
The ψ_{AV} function in the above is the average square root biased average energy per molecule. We earlier introduced ψ as a function that represented normalized microstate temperature dimensionally. But ψ, in itself, is not temperature, because each permuted state of a thermodynamic system has a ψ measure as the square root biased average energy of all of the molecules in that permuted state. Rather, it is the average of ψ over all the permuted states, ψ_{AV}, which is also the ψ of the Average Configuration, that is the normalized microstate temperature of a thermodynamic system. As such we understand ψ_{AV} as a double average of molecular energy: first as the ψ biased average of the energy of all of the N molecules of the system in a specific permuted state that exists physically at a specific time in the random sequence of permuted states that is a thermodynamic system; and second as ψ averaged over all the ω permuted states of the system to form ψ_{AV}, which is also the ψ of the Average Configuration that is a compressed representation of the entire thermodynamic system. This has us understand entropy as the average square root diversity index, h_{AV},
79.)
Now let us demonstrate how this diversity index replaces S in the 2^{nd} Law of Thermodynamics. The usual form of the 2^{nd} Law is
80.) ΔS > 0
We replace it with diversity as
81.) Δh > 0
The reason we use h for diversity based entropy rather than h_{AV} in the above diversity based rendition of the 2^{nd} Law will become apparent in a moment. The mini-thermodynamic system we use to illustrate the 2^{nd} Law of Thermodynamics with a thermal equilibration process comprises two subsystems. Subsystem A has K_{A}=12 energy units distributed randomly over N_{A}=3 molecules, and Subsystem B has K_{B}=84 energy units distributed randomly over N_{B}=3 molecules.
These two subsystems are initially isolated from each other, out of thermal contact. From Eq61 we see the average variance of the K_{A}=12 energy units over N_{A}=3 molecules subsystem to be σ^{2}_{AV}=2.667, with an Average Configuration that has that variance of (6, 4, 2). Its normalized microstate temperature is, from Eq46, ψ_{AVA}=4.353. For the K_{B}=84 energy units over N_{B}=3 molecules subsystem, the Eq61 average variance is σ^{2}_{AV}=18.667 and the Average Configuration is (34, 26, 24). Its normalized microstate temperature is, from Eq46, ψ_{AVB}=28.328.
Upon thermal contact the system as a whole has K=K_{A}+K_{B}=96 energy units and N=N_{A}+N_{B}=6 molecules. At the first moment of contact the K=96 energy units of the system are distributed over the N=6 molecules as (6, 4, 2, 34, 26, 24). At this moment there is no ψ_{AV} temperature of the system because it is not in thermal equilibrium. But it does have a square root diversity index of h=4.394 from Eq39.
After enough molecular collisions to achieve an equilibrium random distribution of the K=96 energy units over the N=6 molecules, the Average Configuration, from the σ^{2}_{AV}=13.333 variance of Eq61, is (11, 14, 15, 16, 17, 23), with a diversity index from Eq39 of h_{AV}=5.85 and a normalized microstate temperature from Eq46 of ψ_{AV}=16.409. This is the whole system’s temperature at equilibrium. Note that the standard computation of what the temperature should be from the 1^{st} Law of Thermodynamics, an energy conservation law, suggests rather a temperature that is the simple average of the subsystem temperatures
82.) (ψ_{AVA }+ ψ_{AVB})/2 = (4.353 + 28.328)/2 = 16.341
The discrepancy between the above value of 16.341 and ψ_{AV}=16.409 from Eq46 as temperature does not indicate a violation of energy conservation, for temperature here is the average molecular energy biased toward the high energy molecules. This discrepancy is all but undetectable in temperature measurements of realistic large K and N thermodynamic systems with K>>N.
Important is that we see the energy diversity of the system as the entropy increasing from an initial value of h_{i}=4.39 for (6, 4, 2, 34, 26, 24) to a final value of h_{AV}=h_{f }=5.85 for (11, 14, 15, 16, 17, 23). The change in energy diversity is, hence,
83.) Δh = h_{f} – h_{i} = 5.85 – 4.39 = +1.46
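The numbers in this equilibration example can be re-derived with a short Python sketch. As with any use of Eq46 here, it assumes ψ = Σx_{i}^{3/2}/Σx_{i}^{1/2} (each energy weighted by its square root), a form we inferred because it reproduces ψ_{AVA}=4.353 and ψ_{AVB}=28.328, and it takes h = K/ψ. It also checks that the equilibrium Average Configuration has the Eq61 variance K(N − 1)/N^{2}.

```python
import math

def psi(configuration):
    """Assumed Eq46 form: square-root-weighted average energy per molecule."""
    roots = [math.sqrt(x) for x in configuration]
    return sum(r * x for r, x in zip(roots, configuration)) / sum(roots)

def h_index(configuration):
    """Square Root Diversity Index, h = K / psi."""
    return sum(configuration) / psi(configuration)

def variance(configuration):
    n = len(configuration)
    mu = sum(configuration) / n
    return sum((x - mu) ** 2 for x in configuration) / n

subsystem_a = (6, 4, 2)               # K_A = 12 over N_A = 3
subsystem_b = (34, 26, 24)            # K_B = 84 over N_B = 3
initial = subsystem_a + subsystem_b   # first moment of thermal contact
final = (11, 14, 15, 16, 17, 23)      # equilibrium Average Configuration, K = 96 over N = 6

k, n = sum(final), len(final)
print("variance of final:", round(variance(final), 3), "vs K(N-1)/N^2 =", round(k * (n - 1) / n ** 2, 3))
print("psi_A, psi_B:", round(psi(subsystem_a), 3), round(psi(subsystem_b), 3))
print("psi_AV:", round(psi(final), 3), " simple average:", round((psi(subsystem_a) + psi(subsystem_b)) / 2, 3))
h_i, h_f = h_index(initial), h_index(final)
print("h_i =", round(h_i, 3), " h_f =", round(h_f, 3), " delta h =", round(h_f - h_i, 3))
```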
We see that Δh>0, as fits a 2^{nd} Law increase in entropy for thermal equilibration with entropy expressed as h energy diversity. Two things are different about this unorthodox formulation of the 2^{nd} Law entropy increase in thermal equilibration. The first is that it is a change of the whole system of N=6 molecules that we are considering. The second is that what happens physically is intuitively very clear, the entropy increase being understood as greater energy diversity, or energy dispersal, coming about from the random mixing of the total K=K_{A}+K_{B}=96 energy units of the two initially isolated subsystems over all N=N_{A}+N_{B}=6 molecules of the whole system. Nothing could be clearer. That is especially so in comparison to the standard take on microstate entropy increase from thermal equilibration as an increase in the Ω microstates of the system, which makes no sense whatever of entropy physically. The diversity based entropy change quantitatively fits the sense of entropy as energy dispersal (see Wikipedia), which, though taken by many scientists to be the qualitatively sensible interpretation of entropy, has never before been given a firm mathematical underpinning.
It is not by sheer coincidence that entropy and information have long been thought of as closely related to each other. Indeed, Professor Poirier’s Chapter 10 in A Conceptual Guide to Thermodynamics is focused on making conceptual sense of entropy by explaining it in terms of information. Readers can judge for themselves whether any real clarification is achieved; in our opinion, not in the slightest, and impossibly so, because the so-called experts in these fields today, including Professor Poirier, self-described as such, are confused about both entropy and information. That is not surprising for, as we have been trying to make clear, the science in both areas is seriously flawed.
Now let’s try to make sense of the close relationship between information and entropy with the revisions and clarifications we have made in both with diversity. First note how close in value the M=2^{H} diversity index of Eq33 is to the h square root diversity index of Eq39 for the sets in Table 41.
Set of Unit Objects | Subset Values | M=2^{H} | h
(■■■, ■■■, ■■■, ■■■) | x_{1}=x_{2}=x_{3}=x_{4}=3 | 4 | 4
(■■■■■, ■■■, ■■■, ■) | x_{1}=5, x_{2}=x_{3}=3, x_{4}=1 | 3.444 | 3.468
(■■■■■■, ■■■■■■) | x_{1}=x_{2}=6 | 2 | 2
(■■■■, ■■■■, ■■■■) | x_{1}=x_{2}=x_{3}=4 | 3 | 3
(■■■■■■, ■■■■■■, ■■■■■■■■■) | x_{1}=x_{2}=6, x_{3}=9 | 2.942 | 2.941
(■■■■■■, ■■■■■, ■) | x_{1}=6, x_{2}=5, x_{3}=1 | 2.505 | 2.538
(■■■■■■■■■■, ■■■■■■■■■■, ■) | x_{1}=x_{2}=10, x_{3}=1 | 2.343 | 2.394
(■■, ■■, ■■, ■■, ■■, ■■, ■■, ■■) | All x_{i}=2 | 8 | 8
Table 41. Various Sets and Their D, M and h Biased Diversity Indices
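M and h can be compared numerically with a small Python sketch. It rests on two assumptions about equations not restated in this section: that Eq33 defines M = 2^{H}, with H = −Σp_{i}log_{2}p_{i} the Shannon information of the proportions p_{i} = x_{i}/K, and that h = K/ψ with ψ = Σx_{i}^{3/2}/Σx_{i}^{1/2} (each x_{i} weighted by its square root). Both are our reading of the earlier equations, not quotations of them.

```python
import math

def shannon_m(subsets):
    """M = 2**H, where H = -sum(p_i log2 p_i) and p_i = x_i / K (assumed reading of Eq33)."""
    k = sum(subsets)
    h_bits = -sum((x / k) * math.log2(x / k) for x in subsets if x > 0)
    return 2 ** h_bits

def h_index(subsets):
    """h = K / psi, with psi = sum(x**1.5) / sum(sqrt(x)) (assumed Eq46 form)."""
    psi = sum(x ** 1.5 for x in subsets) / sum(math.sqrt(x) for x in subsets)
    return sum(subsets) / psi

for s in [(3, 3, 3, 3), (6, 6), (4, 4, 4), (6, 6, 9), (6, 5, 1), (10, 10, 1), (2,) * 8]:
    print(s, round(shannon_m(s), 3), round(h_index(s), 3))
```

For evenly divided sets both indices equal the number of subsets, which is why M and h coincide in those rows.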
With such closeness in value between M and h, it should not be surprising that the average of the M=2^{H} diversity over all the permuted states of a thermodynamic system, M_{av}=2^H_{AV}, has the same extremely high Pearson’s correlation with S=k_{B}lnΩ as h_{AV}. This, however, does not come about from any direct conceptual connection between information and entropy, as Professor Poirier argues, but rather because information is most basically a diversity function and entropy is most basically a diversity function, be it as h_{AV} or as an approximation very close in value, M_{av}=2^H_{AV}. That’s their actual connection, their diversity commonality, as is also made clear by diversity underpinned information being a measure of significance and diversity underpinned entropy being a measure of significance as the number of energetically significant molecules in a thermodynamic system.
Poirier, though, by expressing both S entropy and H information in terms of the same variable, Ω, as S=k_{B}lnΩ and H=log_{2}Ω, assumes that physical entropy somehow is connected up with the information we have about it, a mental association quite different from the diversity commonality that exists between the two concepts, which then gives him license to describe entropy “as the amount of information we don’t know about a thermodynamic system.” To make the point of how little we think of scientists in both areas, we would like to ridicule Poirier for possibly thinking that an explanation of entropy in terms of the information we don’t have about it could somehow be a clarification of it.
No, what ties information and entropy together is the common basis they have in both being specified functionally as mathematical diversity. The reason that they have this commonality is, going back to the introductory point we made about exactness in counting, that both information and entropy have to deal with sets of things that are different in size that can’t be enumerated in an exact way with a direct count of them. One reason for this boneheaded thinking is the inability to get off old ways of thinking in terms of theories developed by a saint of science like Boltzmann back a century ago who is understood as impossible to be wrong because his ideas conform to laboratory data numerically.
Accepting that as bedrock principle in physical science was, of course, the big error of the many who refused to give up the pre-Newtonian, pre-Keplerian astronomy of Ptolemy, which, though it provided precise numbers for predicting astronomical events, had no basis in physical reality. In a parallel way we argue against accepting numerically correct Boltzmann statistical mechanics from that principle, because it is blind to the more fundamental principle that nature cannot possibly operate from inexact functions. Hence when two functions are both correct numerically, one must choose the function that is exact. Conversely, it should be clear that explanations of exactly operating nature expressed in inexact functions must minimally be understood as being inexact explanations, that is, wrong to some degree.
Which is the case for entropy as presently understood, an understanding that has produced characterizations of it over the last century like mysterious, uncomfortable, most misunderstood, less than satisfying, less than rigorous, confused, rather abstract, rather subjective and difficult to comprehend. And its incorrect formulation tolerates a hogwash Zen Buddhist explanation of entropy, reeking with vagueness and pomposity, like Poirier’s take on entropy as the amount of information we don’t know, a quite silly sense of entropy that just perpetuates the confusion about it. To repeat, as the proper corrective for both entropy and information science: what unites the two, properly understood, is the quantification, with exact diversity functions, of elements of different size not amenable to an exact count.
Very interesting in unraveling the twin mysteries of information and entropy is that understanding both as diversity reinforces our understanding of significance, the concept most important to both. Significance can only be made clear sense of when it is appreciated at an intuitive level as information, the mind’s distinguishing between things (quantitatively) significant and insignificant, and at a physical level as a central facet of diversity based entropy, the number of energetically significant molecules.
This significance of the higher energy molecules, as explained by diversity based entropy, is also what clearly explains for the first time in science the equally mysterious phenomenon of free energy. To wit, the decrease in free energy that comes about from an increase in entropy in chemical reactions and thermochemical processes, as specified by the Gibbs free energy equation, is clearly explained by a reduction in the high frequency collisional bias of high energy molecules: the entropy increase produces a less biased distribution of energy and a concomitant reduction in the high collision frequencies that are the underlying basis of free energy.
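For reference, the Gibbs free energy equation invoked above is shown below in its standard textbook form for a process at constant temperature and pressure; the collision-frequency explanation in the paragraph is our interpretation of it and is not part of the standard equation. With ΔH held fixed, a larger positive ΔS makes ΔG more negative. The second line uses purely illustrative numbers (ΔH = 0, T = 300 K, ΔS = +10 J/K), not data.

```latex
\begin{align}
\Delta G &= \Delta H - T\,\Delta S \\
\Delta G &= 0 - (300\,\mathrm{K})(10\,\mathrm{J\,K^{-1}}) = -3000\,\mathrm{J}
\end{align}
```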
And worth repeating as the last word in this entropy argument is the most obvious and unarguable point of our theory: temperature understood as a square root biased average of energy per molecule, from the way it is physically measured by a thermometer. That understanding leads instantly to entropy as diversity dimensionally, via the ratio of energy to temperature (properly understood as a biased energy average) that defines entropy in the Clausius expression for it, as we made clear in detail in the previous section. That is impossible to argue with, yet we have run into a goodly number of physical scientists who wear blinders to such obviously inarguable reasoning and, businesslike, clutch onto the outmoded ideas that define their expertise and maintain their status in the academic community. Ignorance isn’t just what we don’t know but also retaining inferior notions out of habit and not caring whether what we know is maximally correct as long as it retains its value for us in the marketplace. The public in America, as a recent survey shows, thinks little of science these days. The attitudes of stagnation in thinking we are talking about here only perpetuate and increase this disrespect.
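For reference, the Clausius expression referred to above, in its standard form, defines an entropy change as reversibly transferred heat divided by absolute temperature; for heat absorbed reversibly at a constant temperature it reduces to the simple ratio of energy to temperature. Reading T as a square root biased energy average is the interpretation argued in this document, not part of the standard definition.

```latex
\begin{align}
dS &= \frac{\delta Q_{\mathrm{rev}}}{T} \\
\Delta S &= \frac{Q_{\mathrm{rev}}}{T} \quad \text{(reversible, isothermal)}
\end{align}
```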
With that we leave further consideration of entropy to the professional physical scientists out there who are bright enough to catch the value of our seminal efforts and want to go on to refine and embellish their subject matter with this diversity based correction of it. And we return to a consideration of significance that goes beyond the mind’s assessment of quantitative significance to the significance of objects and events that comes from the emotions we associate with them.
Note: The best synopsis of what we have done so far, before we get into the human emotions mathematically, is in this email I am sending to Bill Poirier of Texas Tech. It makes clear not only the reality of the relationship between information and entropy but also the resistance of stagnated theoretical physics to new ideas in quantum mechanics and statistical mechanics, because its practitioners guard their domains of confusing, incorrect ideas as territory that confers on them the halo of “expert,” making their status and position their prime concern rather than honesty in argument and respect for truth, Poirier being a perfect example. This is a good part of the public’s disdain for science and scientists these days.
You are up on the Internet now, Bill. What’s there is much better edited than the first draft I sent you and less scathing in its characterization of you. I can explain it all in about 90 seconds if you would take the cotton out of your ears. There is no conceptual connection between information, a phenomenon of the human mind and its biophysics, and entropy, a phenomenon of molecules and energy and their much simpler physics. The only connection is the similarity in form of their mathematical specification. That similarity is what led the mathematician John von Neumann to suggest the name “entropy” for Claude Shannon’s function for information back in 1948, hence the Shannon entropy, which has since confused professionals and students alike about the possibility of some conceptual connection between thermodynamic entropy and information.
Let me say it again. Information is something human beings get from other human beings and/or from the world around them as a central part of the human mind’s neural optimization of behavior. Information has nothing to do phenomenologically with entropy, which operates on a much different physics and is a part of nature that has no functional purpose for humans. The two are endlessly distant conceptually, whatever your efforts in Chapter 10 of Conceptual Guide to make some sort of New Age connection between them with entropy as the information we don’t know about a thermodynamic system.
What they have in common, rather, and very much so, is their underlying mathematical structure: both are described by and operate on the mathematically well-defined number set property of diversity. Both information and entropy do this because nature necessarily operates on exact measures, and diversity, as we explain in the introduction clearly enough for a high school student to understand, is an exact quantification of a set of elements unequal in size, whether a set of messages with unequal probabilities of being sent or a set of molecules in a thermodynamic system that are unequal in size in having different energies via the Maxwell-Boltzmann energy distribution.
That both information and entropy take the form of diversity is difficult to deny if you will open your dumb eyes for a minute or two. For information, it is clear that the Shannon entropy (with base e) has been used in the technical literature as the Shannon Diversity Index for the last 60 years, same function. And M=2^{H}, the linear expression for Shannon entropy (see Pierce, Introduction to Information Theory), is near perfectly mimetic of the Simpson Reciprocal Diversity Index, which in turn is the sole nonlogarithmic variable in the Rényi (information) entropy, whose logarithmic form is also diversity, a logarithmic diversity little different from the Shannon (Entropy) Diversity Index. With that understanding, which is absolutely inarguable except by a fellow who stuck his foot in his mouth (in a text no less) by pontificating a juvenile conceptual connection between information and entropy and won’t dare admit his mistake, all information functions in information theory, logarithmic and linear, are seen to be diversity measures of one form or another.
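The functions named above can be compared side by side. This is a sketch under stated assumptions: it uses the standard textbook formulas (Shannon entropy in bits, the same sum in base e as the Shannon Diversity Index, M = 2^H, the Simpson reciprocal 1/Σp², and the Rényi entropy of order α), and the two probability lists are made-up examples, not data. For a balanced set the linear forms all equal the number of elements; for an unbalanced set they separate somewhat, while the order-2 Rényi entropy stays exactly the logarithm of the Simpson reciprocal.

```python
import math

def shannon_entropy_bits(p):
    """Shannon entropy H = -sum(p * log2 p) in bits; zero-probability terms skipped."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def shannon_diversity_index(p):
    """Shannon Diversity Index H' = -sum(p * ln p): the same function in base e."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def simpson_reciprocal(p):
    """Simpson Reciprocal Diversity Index 1 / sum(p^2)."""
    return 1.0 / sum(pi * pi for pi in p)

def renyi_entropy(p, alpha, base=2):
    """Renyi entropy of order alpha (alpha != 1): log(sum p^alpha) / (1 - alpha)."""
    return math.log(sum(pi ** alpha for pi in p), base) / (1 - alpha)

# Illustrative probability sets only (not data from the text).
balanced = [0.25] * 4
unbalanced = [0.5, 0.25, 0.125, 0.125]

for name, p in [("balanced", balanced), ("unbalanced", unbalanced)]:
    H = shannon_entropy_bits(p)
    print(f"{name:10s} H={H:.3f}  M=2^H={2 ** H:.3f}  "
          f"Simpson 1/sum(p^2)={simpson_reciprocal(p):.3f}  "
          f"Renyi(2)={renyi_entropy(p, 2):.3f}")
```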
Now fast forward to entropy. There is no point belaboring in this email that it is also diversity when the website has three mathematically rigorous arguments clarifying entropy as diversity. If you had the sense to connect the dots on these very straightforward arguments, you would see that what information and entropy have in common is that both are properly specified as diversity. That is their connection, period, a coincidence in mathematical structure. They both take the diversity form unavoidably because diversity is an exact quantification of sets of elements unequal in size. That is, the straight count of unequal sized elements, be they messages or molecules, is inarguably inexact, as shown both by the common sense commercial mathematics argument in Section 2 or, if you prefer your arguments in print, by the 47 chapters of examples of such inaccuracy in Sam Savage’s book, The Flaw of Averages.
In short, if it needs repeating one last time, message sets and molecules are both sets of elements unequal in size that must be characterized by diversity to be mapped exactly as they exist in nature, for nature is not composed of inexact functions that are approximations of convenience by the human mind. Your conclusion in Conceptual Guide that entropy is the information we don’t know about thermodynamic systems misinterprets the mathematical similarity between the two as a conceptual similarity, simplistically interpreting entropy as a property of our knowledge or information about things. That is worthy of a quick dismissal at first but, after you persist in your error repeatedly in the course of our correspondence, of ridicule. What you do marks you as either a true dummy, possible, and/or a manipulator like a predatory businessman who operates with only self-interest in mind. What you do stinks.
6. The Mathematics of Human Emotion
We will develop the human emotions mathematically with information theory expanded and revised with the D linear diversity measure. Another interpretation of the H Shannon entropy of Eq. 24 in information theory is as the amount of uncertainty a message resolves when it is received. Uncertainty and information are closely related, information coming about as the resolution of uncertainty. If you have no idea how Company XYZ, which you hold stock in, is doing and I tell you, from what my cousin, the president of the company, told me, that they are contemplating bankruptcy in two weeks, that message is information for you because you had uncertainty about the company’s situation to begin with. But if I tell you that Osama bin Laden was the mastermind of 9/11, something you certainly knew beforehand, that message would not be information for you because you had no uncertainty about it.
In a more mathematically treatable way, suppose you are playing a game where you must guess which of N=4 colors I’ll pick from the set of K=8 colored buttons, (■■, ■■, ■■, ■■). Inherently you have uncertainty about what the color is. Keep in mind from our earlier considerations the H=2 bits of information associated with this set. That value of H=2 is a measure of the amount of uncertainty you have, as the number of yes-no binary questions one needs to ask about the colors in (■■, ■■, ■■, ■■) to determine which color I picked. By a yes-no binary question is meant one that is answered with a “yes” or a “no” and, being binary, cuts the number of possible color answers in half.
One might ask of (■■, ■■, ■■, ■■), “Is the color picked a dark color?”, meaning either purple or black. Whatever the answer, “yes” or “no”, the number of possible colors is cut in half. Assume the answer to the question was “no”; then the next question might be, “Is the color green?” If the answer to that question is also “no”, by process of elimination the color I picked was red. It took H=2 such questions to find that out. So the amount of uncertainty about which color I picked is understood to be H=2, and the amount of information gotten from receiving a message about the color picked is H=2 bits, understood as the amount of uncertainty felt beforehand.
Let’s play that game now with (■■, ■■, ■■, ■■, ■■, ■■, ■■, ■■), whose H=3 bits Shannon entropy is the amount of uncertainty you feel about which color I picked from that set of buttons, because it takes H=3 yes-no binary questions to determine the color. The first question for (■■, ■■, ■■, ■■, ■■, ■■, ■■, ■■) might be, “Is the color a light color?”, meaning red, green, aqua or orange. When “no” is the answer, it halves the field of possible colors to (■■, ■■, ■■, ■■). Two more yes-no binary questions will then reveal the color picked. The amount of uncertainty for the color picked from (■■, ■■, ■■, ■■, ■■, ■■, ■■, ■■) is then its H Shannon entropy of H=3 bits, interpreted as 3 binary questions. And the amount of information you would get if I sent you a message about which color was picked would be H=3 bits of information, as the resolution of the H=3 bits of uncertainty felt beforehand.
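The halving game can be simulated directly. This minimal sketch assumes the colors are equally likely and stands them in with hypothetical integer labels 0 to N−1, since the original button colors are not recoverable here; each question asks whether the picked color is in the first half of the remaining candidates. For N=4 and N=8 every possible pick takes exactly log2 N questions, 2 and 3, matching H.

```python
import math

def questions_needed(n_colors, picked):
    """Count the yes-no questions needed to find `picked` among n_colors
    equally likely colors, each question halving the remaining candidates."""
    candidates = list(range(n_colors))  # hypothetical labels 0..n_colors-1
    questions = 0
    while len(candidates) > 1:
        first_half = candidates[: len(candidates) // 2]
        questions += 1
        # Question asked: "Is the color one of first_half?"
        if picked in first_half:      # answer "yes": keep the first half
            candidates = first_half
        else:                         # answer "no": keep the second half
            candidates = candidates[len(candidates) // 2 :]
    return questions

for n in (4, 8):
    counts = {questions_needed(n, c) for c in range(n)}
    print(n, "colors:", counts, "questions; log2 N =", math.log2(n))
```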
That information is affected by emotion is obvious from the sense of information underpinned by uncertainty, something people generally feel as an unpleasant emotion. Moreover when uncertainty is resolved, by whatever means, a person tends to feel something akin to relief or elation, a generally pleasant emotion. Now while it is true that the H Shannon entropy provides some measure of uncertainty as discussed above, the human mind really doesn’t work on logarithmic measures for the most part. We tend rather to evaluate uncertainty probabilistically. Let’s go back to guessing the color picked from the N=4 color set, (■■, ■■, ■■, ■■).
The probability of guessing correctly, which we’ll give the symbol Z, is
42.) $Z = \dfrac{1}{N}$
And the probability of failing to make the correct guess, understood as the uncertainty in guessing, is
43.) $U = 1 - Z = \dfrac{N-1}{N}$
Now recall from Eq. 4 that the D diversity of a balanced set is D=N. This allows us to express the U uncertainty as
44.) $U = \dfrac{D-1}{D}$
Now let’s make a table of sets of buttons that have more and more D diversity and list the U uncertainty in guessing the color picked from them.
Sets of Colored Buttons           D=N   U=(D–1)/D
(■■, ■■)                          2     1/2 = .5
(■■, ■■, ■■)                      3     2/3 = .667
(■■, ■■, ■■, ■■)                  4     3/4 = .75
(■■, ■■, ■■, ■■, ■■)              5     4/5 = .8
(■■, ■■, ■■, ■■, ■■, ■■)          6     5/6 = .833
(■■, ■■, ■■, ■■, ■■, ■■, ■■)      7     6/7 = .857
(■■, ■■, ■■, ■■, ■■, ■■, ■■, ■■)  8     7/8 = .875
Figure 45. Various Sets and Their D and U Values
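The entries of Figure 45 follow directly from Eqs. 42 to 44. This sketch recomputes them with exact fractions, assuming a balanced set of N equally likely colors so that D=N; the function name guess_probabilities is ours, for illustration only.

```python
from fractions import Fraction

def guess_probabilities(n):
    """For a balanced set of n equally likely colors (so D = N, Eq. 4):
    Z = 1/N is the chance of a correct guess (Eq. 42) and
    U = 1 - Z = (D - 1)/D is the chance of failing (Eqs. 43-44)."""
    d = n
    z = Fraction(1, n)
    u = 1 - z
    assert u == Fraction(d - 1, d)  # Eq. 43 and Eq. 44 agree exactly
    return d, z, u

for n in range(2, 9):
    d, z, u = guess_probabilities(n)
    print(f"D={d}  Z={z}  U={u} = {float(u):.3f}")
```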
Very obviously the U uncertainty is an increasing function of D. That is, as D increases, U increases. More formally, U is an effectively continuous, monotonically increasing function of D. This, from measure theory, a standard rubric of mathematics, tells us that whatever D is a measure of, U is a measure of. Earlier we made it clear that D diversity is a measure of information. That would also make U, as uncertainty, understandable as a measure of information, which fits with the classical information theory take on it.
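A quick numerical check of the ordering described here, for balanced sets where H = log2 D (as with H=2 for D=4 and H=3 for D=8 above) and U = (D−1)/D. It only checks the sample values D=2 to 8 and proves nothing about measure theory in general. It also shows one difference between the two measures: H grows without bound as D grows, while U stays below 1.

```python
import math
from fractions import Fraction

d_values = list(range(2, 9))
h_values = [math.log2(d) for d in d_values]         # H = log2 D for a balanced set
u_values = [Fraction(d - 1, d) for d in d_values]   # U = (D - 1)/D, Eq. 44

def strictly_increasing(xs):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(xs, xs[1:]))

for d, h, u in zip(d_values, h_values, u_values):
    print(f"D={d}  H={h:.3f} bits  U={float(u):.3f}")
print("H increasing:", strictly_increasing(h_values),
      " U increasing:", strictly_increasing(u_values))
```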
From the simplest perspective, this gives us two ways to evaluate uncertainty and information: logarithmically as H or linearly and probabilistically as U. But as we venture into the area of the human emotions, our intuitive sense tells us we are heading into meaningful information territory, much in the same way that something being significant was synonymous with its being meaningful. One obvious way to make uncertainty meaningful is to associate it with something meaningful, and that something is money. What is great about configuring uncertainty in terms of the probabilistic U function is that we can connect it with money by playing a guessing game that penalizes us in cash if we fail to guess the color picked correctly. Let’s say that penalty is v=$100.
TO BE CONTINUED