Logical Categories of Learning and Communication: Part 1
All behavioral scientists are interested in “learning” in one sense or another. Moreover, since “learning” is a communicative phenomenon, all of them have been affected by the cybernetic revolution in thinking that has taken place over the past twenty-five years. This revolution was initiated by engineers and communication theorists, but its roots go back to the physiological work of Cannon and Claude Bernard, the physics of James Clerk Maxwell, and the mathematical philosophy of Russell and Whitehead. Since behavioral scientists continue to ignore the problems of the Principia Mathematica[1], they can be said to be about sixty years behind. However, it seems that the barriers of misunderstanding separating different groups of behavioral scientists can be clarified (though not eliminated) by applying Russell’s Theory of Logical Types to the concept of “learning,” which interests them all. The purpose of this article is to attempt such clarification.
Gregory Bateson

The Theory of Logical Types
First, let us state what the Theory of Logical Types is about: the theory asserts that in formal logical or mathematical discourse no class can be a member of itself; that a class of classes cannot be one of the classes that are its members; that a name is not the thing named; that “John Bateson” is the class of which that boy is the unique member; and so on. These statements may seem trivial or even obvious, but as we will see, it is not uncommon for behavioral theorists to make mistakes exactly analogous to classifying the name together with the thing named, that is, errors of logical typing. It is like eating the menu instead of the meal.
Slightly less obvious is the following theoretical statement: a class cannot be one of those units that are correctly classified as its non-members. If we classify all chairs as the class of chairs, we may further note that tables and lamps are members of the broader class of “non-chairs,” but we would make a mistake in formal discourse if we considered the class of chairs to be a unit in the class of non-chairs.
Since no class can be a member of itself, the class of non-chairs clearly cannot be a non-chair. A simple consideration of symmetry may be convincing enough for the non-mathematical reader:
- a) The class of chairs belongs to the same order of abstraction (i.e., logical type) as the class of non-chairs;
- b) Since the class of chairs is not a chair, accordingly, the class of non-chairs is not a non-chair.
Finally, the theory asserts that if these simple rules of formal discourse are violated, paradoxes arise and the discourse becomes invalid.
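The typing rules stated above can be made concrete in a short sketch. The following Python toy is my own illustration, not part of Russell’s or Bateson’s formalism; all class and variable names are hypothetical. It tags every object with a logical-type level and rejects any membership that crosses levels, mirroring the rule that a class cannot be a member of itself or of another class of the same type:

```python
# Toy model of Russell's typing discipline: level 0 = things,
# level 1 = classes of things, level 2 = classes of classes, ...
# A class of level n+1 may only contain members of level n.

class Typed:
    def __init__(self, name, level):
        self.name = name
        self.level = level

class TypedClass(Typed):
    def __init__(self, name, members):
        levels = {m.level for m in members}
        if len(levels) > 1:
            raise TypeError("members must share one logical type")
        level = (levels.pop() + 1) if levels else 1
        super().__init__(name, level)
        self.members = set(members)

    def add(self, member):
        if member.level != self.level - 1:
            # e.g. trying to put the class of chairs into itself,
            # or into the class of non-chairs (both level-1 classes)
            raise TypeError(f"{member.name} (level {member.level}) cannot "
                            f"be a member of {self.name} (level {self.level})")
        self.members.add(member)

chair1 = Typed("kitchen chair", 0)
table1 = Typed("kitchen table", 0)
chairs = TypedClass("chairs", [chair1])
non_chairs = TypedClass("non-chairs", [table1])

non_chairs.add(Typed("lamp", 0))   # fine: a lamp is a non-chair
try:
    non_chairs.add(chairs)         # the class of chairs is NOT a non-chair
except TypeError as e:
    print("rejected:", e)
```

The symmetry argument of a) and b) is exactly the level check: both classes sit at level 1, so neither can appear among the other’s level-0 members.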
Thus, the theory deals with highly abstract matters and first arose in the abstract world of logic. If it is shown in this world that a sequence of statements generates a paradox, then the entire structure of axioms, theorems, etc., involved in generating this paradox is denied and destroyed. It is as if it never existed. But in the real world (or at least in our descriptions of the real world), time is always present, and something that once existed cannot be totally denied in this way. A computer encountering a paradox due to programming errors does not itself disappear.
Logical “if…, then…” statements do not contain time. In a computer, causes and effects are used to simulate logical “if…, then…” statements; and the sequences of causes and effects necessarily include time. Conversely, one could say that in scientific reasoning, logical “if…, then…” statements are used to simulate causal “if…, then…” relationships.
A computer never actually encounters a logical paradox, only a simulation of a paradox through chains of cause and effect. Therefore, the computer does not disappear. It simply oscillates.
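The point that a simulated paradox yields oscillation rather than annihilation can be sketched directly. Assume a Python stand-in for a self-negating circuit, a NOT gate wired to its own input (the function name is mine): in timeless logic, x = not-x is a contradiction, but in a causal system each evaluation takes one time step, so the value simply flips forever.

```python
# Cause-and-effect simulation of the paradox x = not x.
# Instead of "disappearing", the state oscillates, one flip per time step.

def run_self_negating_circuit(initial, steps):
    history = [initial]
    state = initial
    for _ in range(steps):
        state = not state   # the causal version of the logical contradiction
        history.append(state)
    return history

print(run_self_negating_circuit(True, 6))
# → [True, False, True, False, True, False, True]
```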
In fact, there are important differences between the world of logic and the phenomenal world, and these differences must always be taken into account when our arguments are based on the important but partial analogy between them.
The thesis of this article is that this partial analogy can give behavioral scientists an important key to classifying phenomena related to learning. Something like the theory of types should be applied specifically in the field of animal and machine communication.
Unfortunately, questions of this sort are not often discussed in zoological laboratories, anthropological field camps, or at psychiatric meetings. Therefore, it is necessary to show that these abstract considerations are important for behavioral scientists.
Consider the following syllogism:
- a) Changes in the frequency of observed types of mammalian behavior can be described and predicted in terms of various “laws” of reinforcement;
- b) The phenomenon of “exploration” observed in rats is a category or class of mammalian behavior;
- c) Therefore, changes in the frequency of “exploration” phenomena should be describable in terms of the same “laws” of reinforcement.
Let’s state right away: first, empirical data show that conclusion c) is incorrect; second, if it were possible to show that c) is correct, then either a) or b) would be incorrect[2].
For both logic and natural science, it is better if conclusion c) is expanded and corrected roughly as follows:
c) If, as stated in b), the phenomenon of “exploration” is not a type of mammalian behavior but a category of such types, then no descriptive statement true for types of behavior can be true for the phenomenon of “exploration.” Conversely, if descriptive statements true for types of behavior are also true for “exploration,” then this “exploration” is a type of behavior, not a category of types of behavior.
The whole question comes down to whether the distinction between a class and its members is an organizing principle for the behavioral phenomena we study.
In less formal language: when a rat investigates a certain unfamiliar object, it can be given reinforcement (positive or negative), and it will accordingly learn to approach or avoid it. But the very goal of exploration is to obtain information about which objects can be approached and which should be avoided. Therefore, discovering that a given object is dangerous is a success in gathering information. This success does not discourage the rat from further exploring other unfamiliar objects.
It can be asserted a priori that all perceptions and all reactions, all behavior and all classes of behavior, all learning and all genetics, all neurophysiology and endocrinology, all organization and all evolution—in short, the entire subject—should be considered communicative in nature and therefore related to those broad generalizations or “laws” that apply to communicative phenomena. Therefore, we are warned of the possibility of finding in our data those principles of order offered by fundamental communication theory. We expect that the Theory of Logical Types, Information Theory, and others will be our guides.
“Learning” in Computers, Rats, and Humans
The word “learning” undoubtedly indicates some kind of change. However, specifying what kind of change this is can be a delicate matter.
Nevertheless, such a broad common denominator as “change” allows us to conclude that our descriptions of “learning” must make the same sort of allowance for varieties of logical type that has been standard in the physical sciences since the days of Newton. The simplest and most familiar form of change is motion, and even if we work at that very simple physical level, we must structure our descriptions in terms such as “position, or zero motion,” “constant velocity,” “acceleration,” “rate of change of acceleration,” and so on[3].
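This Newtonian hierarchy of change can be sketched as successive finite differences, each level describing change in the level below (a minimal Python illustration, not part of the original text):

```python
# Position, velocity (change of position), acceleration (change of
# velocity): each list is one logical step "above" the one before it.

def differences(values):
    return [b - a for a, b in zip(values, values[1:])]

position = [0, 1, 4, 9, 16, 25]        # x = t^2, sampled at t = 0..5
velocity = differences(position)        # change of position
acceleration = differences(velocity)    # change of that change

print(velocity)       # → [1, 3, 5, 7, 9]
print(acceleration)   # → [2, 2, 2, 2]
```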
Change indicates a process. But processes themselves are subject to “changes.” A process can speed up, slow down, or undergo other types of changes that allow us to say it is now a “different” process.
These considerations show that we should begin organizing our ideas about “learning” from the simplest level.
Consider the case of specificity of response, or zero learning. In this case, the organism shows minimal change in its response to a repeated item of sensory input. Phenomena reaching this level of simplicity arise in various contexts:
- a) Under experimental conditions when “learning” is complete and the animal gives approximately 100% correct responses to repeated stimuli;
- b) In cases of habituation, when the animal stops giving a clear response to a previously disturbing stimulus;
- c) In cases where the response pattern is minimally determined by experience and maximally determined by genetic factors;
- d) In cases where the response becomes highly stereotyped;
- e) In simple electronic circuits where the structure of the circuit cannot be changed as a result of impulses passing through it—that is, when the causal chains between “stimulus” and “response,” as engineers say, are “hardwired.”
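Case e) can be sketched in a few lines. Assuming a fixed lookup table stands in for the “hardwired” wiring (the stimulus and response names are illustrative), repetition leaves the stimulus-response mapping untouched; that is zero learning:

```python
# A "hardwired" circuit: the wiring is never altered by the impulses
# passing through it, so repeated stimulation yields identical responses.

WIRING = {"whistle": "note the hour", "bell": "answer the door"}  # fixed

def respond(stimulus):
    return WIRING.get(stimulus, "no response")

for _ in range(3):                                # repeated stimulation...
    assert respond("whistle") == "note the hour"  # ...identical response
```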
In ordinary, non-technical speech, the word “learning” is often applied to what is called “zero learning” here, i.e., simply receiving information from an external event so that a similar event at the appropriate time in the future conveys the same information. For example: the factory whistle “taught” me that it is now twelve o’clock.
It is also interesting to note that, under our definition, many very simple mechanical devices display at least the phenomenon of zero learning. The question, therefore, is not whether machines can learn, but what level or order of learning a given machine achieves.
It is worth considering an extreme, though hypothetical, case: the “player” in a von Neumann game is a mathematical fiction, comparable to the Euclidean line in geometry or the Newtonian particle in physics. By definition, the “player” is capable of performing all calculations necessary to solve any problem arising in the game; he is incapable of failing to perform these calculations where they are needed; he always obeys the results of his calculations. Such a “player” receives information from the events of the game and acts according to this information. But his learning is limited to what is called zero learning here.
Exploring this formal fiction expands our definition of zero learning.
- From the events of the game, the “player” can receive information of higher or lower logical type, and he can use this information to make decisions of higher or lower level. That is, his decisions can be either strategic or tactical, and he can identify and respond to both tactical and strategic actions of his opponent. However, it is true that in the formal definition of a von Neumann game, all problems presented by the game are considered computable; that is, although the game may contain problems and information of many different logical types, the hierarchy of these types is strictly finite.
It becomes clear that the definition of zero learning does not depend on either the logical typing of the information received by the organism or the logical typing of the adaptive decisions made by the organism. A very high (though finite) order of complexity may characterize adaptive behavior that is based on nothing more than zero learning.
- The “player” can calculate the value of information useful to him, as well as calculate that this information is worth obtaining through “exploratory” moves. Or he can make empty and trial moves in anticipation of the needed information.
From this, it follows that a rat engaged in exploratory behavior may be doing so on the basis of zero learning.
- The “player” can calculate that random moves may be advantageous. In a coin toss game, he can calculate that if he chooses “heads” or “tails” at random, he will have an equal chance of winning. If he uses some plan or pattern, it will appear as a pattern or redundancy in the sequence of his moves, and his opponent will thus gain information. Therefore, the “player” will choose random play.
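The argument for random play can be checked numerically. Below is a hedged sketch of a coin-matching game (the function names and win convention are my assumptions, not Bateson’s): a player who randomizes uniformly wins about half the time against any patterned opponent, and his move sequence offers no exploitable redundancy.

```python
import random

# The "player" wins a round when his coin matches the opponent's call.
# Uniform random play guarantees a ~50% win rate against ANY opponent
# strategy, patterned or not, because the sequence carries no information.

def play(rounds, opponent_strategy, seed=0):
    rng = random.Random(seed)
    wins = 0
    for i in range(rounds):
        my_move = rng.choice(["heads", "tails"])  # no pattern to exploit
        if my_move == opponent_strategy(i):
            wins += 1
    return wins / rounds

patterned = lambda i: "heads" if i % 2 == 0 else "tails"
print(play(100_000, patterned))   # close to 0.5: the pattern gains nothing
```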
- The “player” is incapable of “error.” He may, for serious reasons, make random or exploratory moves, but by definition, he is incapable of “learning by trial and error.”
If we assume that the word “error” in the name of this learning process means the same thing it meant when we said the “player” is incapable of error, then “trial and error” is excluded from the repertoire of the von Neumann “player.” In fact, the von Neumann “player” forces us to examine very carefully what we mean by “learning by trial and error,” and indeed by learning of any kind. The assumption about the meaning of the word “error” is nontrivial and must be examined.
In a certain sense, the “player” can make mistakes. For example, he may make a decision based on probability and then make a move that, in light of the limited available information, is the most likely correct one. When more information becomes available, he may discover that the move was a mistake. But this discovery cannot add anything to his future skills. By definition, the player used all available information correctly. He correctly assessed the probabilities and made the move that was most likely correct. Discovering that at some point he was wrong cannot affect future situations. If the same problem arises again, he will do the same calculations, come to the same decisions—and be right. Moreover, the set of alternatives from which he will make his choice will remain the same—and that is correct.
By contrast, an organism is capable of making mistakes in many “ways” that the “player” is not. These wrong choices are appropriately called “errors” when they are such that they provide the organism with information that can increase its future skills. In all these cases, some available information is either ignored or used incorrectly. Various types of such useful errors can be classified.
Suppose an external event contains details that can inform the organism:
- a) from which set of alternatives it should choose its next move;
- b) which element of that set it should choose.
Such a situation allows for two orders of errors:
- 1) The organism may correctly use the information about which set of alternatives to choose from, but select the wrong alternative within that set;
- 2) It may choose from the wrong set of alternatives.
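These two orders of error can be sketched in code. In the toy Python classifier below (the contexts and labels are hypothetical), the external event fixes both the correct set of alternatives and the correct element within it:

```python
# Two orders of error: choosing the wrong element within the right set
# of alternatives, versus choosing from the wrong set altogether.

SETS = {
    "food_context":   ["approach", "sniff"],
    "danger_context": ["freeze", "flee"],
}
CORRECT = ("danger_context", "flee")   # what the event actually calls for

def classify_error(chosen_set, chosen_element):
    assert chosen_element in SETS[chosen_set]
    right_set, right_element = CORRECT
    if chosen_set != right_set:
        return "second-order error: wrong set of alternatives"
    if chosen_element != right_element:
        return "first-order error: wrong element within the right set"
    return "correct"

print(classify_error("danger_context", "freeze"))  # first-order error
print(classify_error("food_context", "sniff"))     # second-order error
```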
There is also an interesting class of cases in which sets of alternatives contain common elements. Therefore, the organism has the opportunity to be “right,” but for the wrong reasons. This form of error is inevitably self-reinforcing.
If we now accept the general proposition that any learning other than zero learning is to some extent stochastic (i.e., contains components of “trial and error”), then it follows that the ordering of the learning process can be built on a hierarchical classification of the types of errors that must be corrected in various learning processes. Zero learning will then become the designation for the immediate basis of all those acts (simple and complex) that are not corrected by trial and error. Learning-I will be an appropriate designation for revising choices within an unchanged set of alternatives; Learning-II will denote revising the set from which the choice is made, and so on.
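The hierarchy just named can be sketched as three toy learners (a loose illustration under my own assumptions, not Bateson’s formalism): Learning-0 ignores feedback, Learning-I revises its choice within a fixed set of alternatives, and Learning-II replaces the set itself.

```python
# Learning-0: hardwired response; trial and error never corrects it.
class Learner0:
    def __init__(self, response):
        self.response = response
    def act(self):
        return self.response
    def correct(self, feedback):
        pass                     # zero learning: feedback changes nothing

# Learning-I: revise the choice WITHIN an unchanged set of alternatives.
class Learner1(Learner0):
    def __init__(self, alternatives):
        self.alternatives = list(alternatives)   # the set itself is fixed
        super().__init__(self.alternatives[0])
    def correct(self, feedback):
        if feedback == "error":
            i = self.alternatives.index(self.response)
            self.response = self.alternatives[(i + 1) % len(self.alternatives)]

# Learning-II: revise the SET from which the choice is made.
class Learner2(Learner1):
    def __init__(self, sets_of_alternatives):
        self.sets = list(sets_of_alternatives)
        self.set_index = 0
        super().__init__(self.sets[0])
    def correct(self, feedback):
        if feedback == "persistent error":
            self.set_index = (self.set_index + 1) % len(self.sets)
            self.alternatives = self.sets[self.set_index]
            self.response = self.alternatives[0]
        else:
            super().correct(feedback)
```

For example, a `Learner1(["left", "right"])` told “error” switches from “left” to “right,” while a `Learner2` told “persistent error” abandons its current set of alternatives entirely.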
To be continued…