Abstract: This paper supports the claims made in 'Chance And Degrees
Of Belief' concerning the testing of chancy hypotheses. We start from the
Inductive Presupposition that what we observe is typical of Nature. This
human presupposition is embodied in three important relationships summarising
behaviour: Cournot's rule, interpreted methodologically, describing how
humans link chances of events to degrees of belief in events; the Principal
Principle, describing the same link; Bayes' inverse relationship, describing
how humans link degrees of belief in events to degrees of belief in chances.
These relationships are only justified to the extent that the Inductive Presupposition is justified. We are sceptical concerning this justification.
We reject Howson and Urbach's argument that reasonable degrees of belief in a hypothesis, given some evidence, can be justified without this presupposition, using betting quotients.
(16.4.1997; 10 384 words; version 2.1)
Our problem is: "To what extent can chance hypotheses be tested ?"
An answer to this question, based on accepting an Inductive Presupposition, is outlined in 'Chance And Degree Of Belief'. In this essay we fill in that outline. As usual, we make no claims to originality. At most, we are placing some familiar ideas into new juxtapositions, as we seek the truth. Also, as usual, we are doing descriptive methodology, quarantining inductive doubt, and making no presupposition that human best practice is justified. The aim of our investigation is to reconstruct consensus practice, in a principled way, and to reveal the extent to which it is justified. If this extent is far less than we might hope, 'so it goes'.
The problem is this: According to Von Mises' theory of chance, when we ascribe a chance of 1/6 to a 5 coming up in a die-throwing system, we mean to hypothesise that the system possesses the property of generating, in an infinite sequence of outcomes, 5s with a limiting relative frequency of 1/6. However, this hypothesis, if true, unfortunately leads to no prediction as to how often 5 would appear in any finite sequence. Ratios of 5s in actual finite sequences generated by the system are not a consequence of the hypothesis. Indeed, it is a consequence that actual starting sections of the infinite sequence will include cases (events) in which 5 appears never, and cases in which 5 appears always. These sections can be immensely long.
Onto-semantically, this is satisfactory. If that is what a chance hypothesis is to be like, so be it.
But how can a chance hypothesis possibly be tested? Do chance hypotheses have no empirical significance?
The problem firstly concerns testing such chancy claims as "This coin has a chance of 0.5 of coming up heads" and "This cobalt 60 nucleus has a chance of 0.002 of decaying in the next day". These claims are at the base level.
But secondly it also concerns testing non-chancy physics claims, at the meta-level. Consider low-observability claims leading to novel fact predictions. It is the low chance of these predictions being true, if the claims are entirely human fictions, that intuitively provides successful novel fact predictions with their evidential force. In other words, the intuition is that such success, for a claim involving, say, low-observability entities, would be extremely low-chance unless the claim has some truth in it. The chancy hypothesis that is here being tested is that a completely false low-observability claim will, by chance, have consequences for which it was not designed, and which were not expected, on background knowledge, to be true, but which are nonetheless true.
A similar intuitive argument involves simplicity: the idea that the structural simplicity of a low-observability claim's explanation of a variety of high-observability facts is evidence for its truth, because such simplicity is so unlikely to have arisen by chance.
THE POSITIVE ARGUMENT THAT COURNOT'S RULE, THE PRINCIPAL
PRINCIPLE, AND BAYES' RELATIONSHIP, ALL DEPEND ON THE INDUCTIVE PRESUPPOSITION
Part 1: The Inductive Presupposition
Human methods for using evidence to seek the truth are supported by
a grand presupposition, which can be broadly stated as: "Our experience
of Nature does not deceive us". One(1) of the more specific forms of this presupposition can be summarised in the slogan: "Very low-chance events aren't observed"(2).
This is an aspect of induction. We shall refer to it as the Inductive Presupposition (IP).
It is very hard for humans to tolerate the tension, the cognitive dissonance, generated by sceptical doubt - by the problem of induction (In this paper 'scepticism' refers to the problem of induction - not to the problem concerning the ability of our senses to reveal the nature of an external world). As a result of this tension, philosophical work often contains implicit contradictions: at one point, writers feel that they must not rely on explicit inductive intuitions; at another, they covertly accept them; at another, they criticise others for accepting them. They are torn, because on the one hand they feel that accepting the intuitions is intellectually unacceptable, but on the other hand they feel that not accepting them will lead to sterility.
This tension has become a characteristic feature of philosophy of science, with baleful effects.
To eliminate this tension, in the apparent absence of a solution to scepticism, we must explicitly state our inductive presumptions. We need not pretend to justify them; this would be to solve the problem of scepticism, which has plagued us for thousands of years. We merely locate them, and quarantine them. We identify them, and then we put them to one side, awaiting solution. We conditionalise our empirical methods: we make them conditional on the IP, whether this presupposition is justifiable or not.
By doing this, philosophy takes a large step towards common sense. Sceptical doubts seem absurd to physicists, mathematicians, statisticians, and everyday people. They are completely ignored by non-philosophers. Therefore, by quarantining them - by conditionalising our procedures for deriving consequences from chancy hypotheses; by conditionalising our methods for testing chancy, and non-chancy, claims - the Descriptive Methodologist regains contact with humanity. We find, unsurprisingly, that experimental methods that were not justified are justified/IP; evidence that provided no support provides support/IP.
Our problem suddenly dissolves into common sense. The only philosophy left is to try to justify the IP, which can be attempted in the quarantine ward.
This attitude does not imply that "anything goes", and subjectivism and relativism triumph. The IP is, as we have seen, extremely general. Once doubt concerning it is quarantined, all other specific conjectures are subject to the rules of consensus methodology. Astrologers accused of misusing evidence for their claims cannot, without hypocrisy, defend themselves against the charge of irrationality by counterattacking that the IP is unacceptable - because their every action shows that they presuppose it.
Isn't this victory valueless, because it is too easily won? It is indeed easy - but that is good news. Our philosophical task has been divided into two: one, the Complete Justificationist one, is seemingly impossible; the other, the Descriptive one, is easy. The fly can come to see that there may be truly no way out of the very big bottle - but he is in good company, since everyone is in there with him. However, he sees that he can easily get out of the little bottle; in fact, he had got the little bottle confused with the very big one.
So the thesis of this essay is that:
Cournot's rule, the Principal Principle, and Bayes' relationship(3), all correctly describe human methods for the testing of chancy claims; all can only be justified conditional on the IP; all are therefore presently unjustified(4).
"Our experience of nature does not deceive us"; "Very low-chance events aren't observed". These were the ways that we described the IP . They make the testing of chances unproblematic/IP, such testing becomes the usual commonsense, unreflective, practical procedure.
Suppose that we are on a jury trying to choose between two hypotheses to explain the death of a man's wife (a real case, I believe): H1 is the hypothesis that he murdered her by putting weedkiller into her dinner on purpose; H2 is that while he was filling the watering can in the kitchen, the weedkiller splashed into the open casserole dish. Suppose that we have no strong reason to think that he is a murderer.
We do a series of tests in which we fill a watering can in his kitchen, and record the amount of poison that splashes into the casserole. We find that in 50 000 tests, only once does enough poison splash to kill someone.
Perhaps, we muse, there is actually a chance of 0.999 that enough poison splashes; we just happen to have observed a very low-chance sequence. Perhaps... - but very low-chance events aren't observed (IP), so this is not true. So we need a chancy hypothesis that makes what we have observed higher-chance; we settle on the hypothesis that the chance of enough poison splashing is 1/50 000. Thus H2 is false, because, with a chance this small, major poison-splashing will not/IP be observed. The finger of suspicion begins to point at H1. (The man was convicted.)
Are we saying that real people are sent to prison on the flimsy, unjustifiable, basis of the Inductive Presupposition? Indeed we are; if we were on the jury I hope that we would ignore our philosophy, and make the same decision.
The link between the evidence, the hypothesis, and the support, made for this everyday example, can be made for any case. In statistical testing, we simply presuppose that a hypothesis is false if it has the consequence of giving observed events a very low chance of occurrence. The lower the chance, the more confident we are. 5%? 1%? It is arbitrary; the choice of confidence limits is arbitrary. Would we have convicted the man if our evidence had been that 5 times in 100 enough poison splashed to kill someone? 1 time in 100? 1 in 1000? There is no right answer, merely an intuition of increasing confidence.
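The arithmetic behind the jury's intuition can be sketched as a toy likelihood comparison (our own illustration, not part of the text's argument; the binomial model and the two candidate chances are assumptions). We compare how probable the observed evidence - one lethal splash in 50 000 trials - is under each conjectured chance, working in logs to avoid underflow:

```python
from math import comb, log

def log_binomial_pmf(k, n, p):
    """Natural log of the binomial probability of k successes in n
    trials, each with chance p of success (logs avoid underflow)."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

n, k = 50_000, 1  # 50 000 fillings of the watering can, 1 lethal splash

# Conjecture A: the chance of a lethal splash is 0.999.
# Conjecture B: the chance of a lethal splash is 1/50 000.
log_like_a = log_binomial_pmf(k, n, 0.999)
log_like_b = log_binomial_pmf(k, n, 1 / 50_000)
```

Under the 1/50 000 conjecture the evidence has a chance of roughly 1/e; under the 0.999 conjecture its chance is astronomically small, so, given the IP, that conjecture is rejected.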
We have developed our use of the IP from digital to analog. We are now saying that the lower the chance of an event, the less we expect to observe it. If the chance is 0.0001, we do not expect to observe it. If the chance is 0.9999, we do expect to observe it. If the chance is in between, we expect intermediately to observe it. In fact, roughly speaking, the degree of our expectation (our reasonable/IP degree of belief) equals the magnitude of the chance.
In the next section we show how Cournot's rule is a version of the Inductive Presupposition.
Cournot's rule was proposed in 1843 (Cournot 1843, p.155). Given its intuitive plausibility, it was doubtless proposed by others. Quoting from Howson and Urbach (1989, p.120), Cournot wrote that events which are sufficiently improbable "are rightly regarded as physically impossible". Others have followed a similar path. Popper wrote that scientists should make "a methodological decision to regard highly improbable events as ruled out - as prohibited", and Watkins (1984, p.244) wrote that this is a "non-arbitrary way of reinterpreting probabilistic hypotheses so as to render them falsifiable". Following the same line of thought, Gillies (1973, pp.171-72) proposed that certain very improbable consequences of a chance hypothesis should be effectively prohibited. Howson and Urbach (1989, p.219) summarise a strand of the thought of Kolmogorov and Cramér as "you are entitled to regard a hypothesis as refuted by sample data if it ascribes a very small probability to an event instantiated by that sample".
"Our experience of nature does not deceive us"; "Very low-chance events aren't observed". These were the ways that we described the IP . We will now argue that these are identical to Cournot's rule, interpreted charitably. Before we reach this interpretation, however, we will need to dispose of an incoherent alternative interpretation.
The rule should be interpreted methodologically, not as a constraint on consequences in general , but as a constraint on observed consequences. It is then seen to be a version of the Inductive Presupposition, and hence both acceptable and unjustifiable.
Consider the following diagram:
Using an interpretation of Cournot's rule (what we call the rulec) to assist in deriving consequences of the chance hypothesis is contradictory, as indicated on the first line of the diagram. It is, to quote Howson and Urbach (p.220), "obviously unsound ... (by this rule) every hypothesis about the value of this probability is rejected". This is because, on Von Mises' theory, any particular conjectured value of the chance of 5, from nearly 0 to nearly 1, will generate, in consequential finite sequences, any other relative frequency of appearance of 5. This interpretation of the rule, which seeks to limit the consequences of a chance hypothesis, is inconsistent with the onto-semantics of chance.
But if Cournot's rulec is unsound, then those other philosophers' intuitions are also obviously unsound. The principle of charity requires that we therefore consider the methodological interpretation - Cournot's rulem.
On the second line of the diagram, we have indicated what we suggest these philosophers were intending (or what we are intending). We propose that the rule is consistent with a chance hypothesis, interpreted according to Von Mises, but only if it is used, not as a rulec, a constraint on allowable consequences flowing from the hypothesis, but as a rulem, a Methodological Guideline for testing. All kinds of consequences (finite sequences) flow from the hypothesis that 5 has a chance of 1/6 of coming up; but humans never observe the vast majority of these consequences. What we observe is, perhaps, just one. What are humans to do, with their grand ambitions, clutching their pathetic little bit of information? "Give up", would be the natural answer from a superbeing. But we are genetically determined not to do so. "Nature won't be deceiving us", we plead, "it won't have tricked us by allowing us to have collected one of the sequences which has a relative frequency completely different from the true chance". It is a sadly inadequate response.
So we are not proposing the following, incoherent, rulec: "Some events which, as a consequence of our hypothesis that the chance is 1/6, will occur, will not occur". Instead we are proposing the following, unjustifiable, rulem: "Some events which, as a consequence of our hypothesis that the chance is 1/6, will occur, are not the ones we humanly observe". We suggest that all the quotations with which our essay began should be interpreted in this way.
At the moment the rulem is vague, but we first want to establish this fundamental alteration in its role - as an aid to testing, rather than as a constraint on consequences. Notice that the rule now seems familiar: It is the Inductive Presupposition.
We do not have to make the rule completely precise. It is a methodological guideline, which summarises the judgements regarded by the human consensus as most reasonable. It is as precise as it is - as humanity has managed to make it. A descriptive methodologist must just take the methodology as he finds it. And a normative methodologist has no grounds for criticism, unless he can justify greater precision.
What is it, then, that people appear to presume? "Some events which, as a consequence of our hypothesis that the chance is 1/6, will occur, are not the ones we have humanly observed". OK, but which events? Howson and Urbach are rightly critical of the attempts of thinkers to specify this: words like 'overwhelmingly large', 'extremely unlikely', 'so very few', 'enormous coincidence', 'practically certain', 'a large number of times', 'differ very slightly', do not inspire confidence.
Von Mises' definition has established what we will take 'chance' to mean. We are therefore at liberty to use the idea to assist us in clarifying, just as far as is descriptively correct, the human presumption we are calling Cournot's rulem. If we choose a random finite section of the collective, of length n, then the chance that the relative frequency of a 5 in it will be further from the chance (1/6) than a given small number e decreases as n increases. This chance can be specified. In other words, in an infinite sequence of such choices, the proportion of choices which would display a relative frequency further than e from 1/6 is calculable. As n increases, the proportion decreases. Cournot's rulem, therefore, is, roughly, that the smaller the chance is of randomly obtaining a finite sequence with that relative frequency from the system, given the conjectured chance in the system, the less the reasonable/IP degree of belief we have in the conjecture, if that finite sequence is obtained.
The rule can be put another way: the greater the chance, if the 1/6 chance conjecture was true, that a randomly sampled finite sequence would fall as far from 1/6 as the observed sample has fallen, or further, the more reasonable/IP degree of belief humans have in the conjecture. Consider an example: Suppose that the relative frequency in a humanly-observed 300-outcome sample is 0.1663. For n = 300, if C=1/6, the observed frequency lies very close to the conjectured chance; the chance of a randomly chosen 300-outcome event falling within 0.01 of 1/6 may be 95%. Meanwhile, if C=5/6, the observed frequency deviates from the conjectured chance by the necessary 0.67, and the chance of a randomly chosen 300-outcome event deviating that far, or further, may be 0.02% or less. The smaller this chance, the less reasonable/IP degree of belief we have in the hypothesis.
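This judgement can be given a rough numerical sketch (our own illustration; the percentages in the text are illustrative, not exact). Assuming a binomial model for a randomly sampled n-outcome sequence, the tail chance - the chance that a sample's relative frequency deviates from the conjectured chance at least as far as the observed sample's does - can be computed directly:

```python
from math import comb

def tail_chance(n, c, observed_freq):
    """Chance, if the true chance is c, that the relative frequency in
    a randomly sampled n-outcome sequence deviates from c at least as
    far as the observed frequency does (binomial model)."""
    d = abs(observed_freq - c)
    return sum(comb(n, k) * c**k * (1 - c)**(n - k)
               for k in range(n + 1)
               if abs(k / n - c) >= d)

# Observed relative frequency of 5s in a 300-outcome sample: 0.1663.
support_for_sixth = tail_chance(300, 1 / 6, 0.1663)
support_for_five_sixths = tail_chance(300, 5 / 6, 0.1663)
```

Under C=1/6 the observed frequency is utterly unremarkable (the tail chance is high, near 0.94); under C=5/6 a sample this deviant has a vanishingly small chance, and the reasonable/IP degree of belief in that conjecture collapses.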
There is no cut-off value in the human judgement. The relationship is not quantitative - it is just a broad impression. It is metamethodologically unsound to demand a cut-off value. To do so, is to contradict the descriptive methodologists' aim, which is to describe, to the true extent of precision, what the human intuitions, the guidelines for hypothesis-choice, are. If the intuitive guidelines contain no cut-off value, then our description must not provide one. If the reader feels that this is unacceptably vague, he should not complain to us, but to humanity - who will presumably respond: "Very interesting. Perhaps you might like to help us to establish this numerical cut-off of yours - it would be very helpful". The prognosis for success is not good.
The reference to 'random sampling' of the events (the finite sequences) is acceptable, because we know what we are taking 'random' to mean - it is given in the axiom of randomness. We are not still in the analytical, onto-semantic, stage. We are meaning that events of length 300 are taken from the collective in a way such that there is no pattern to the selection - in such a way that no place selection function operating on the resulting sequence of events would alter the relative frequency of occurrence of any particular aspect of the events (such as the relative frequency of 5 in them).
But do we have evidence that the human observing of an event, a 300-outcome sequence, is random? Can humans justify the conjecture that it is?
(i) No - such justification is impossible: Suppose that the judgements include reference to random sampling of a collective. To justify the judgement, we would need to explain what evidence human beings have for the truth of the claim that human sampling of Nature's collectives, to put it loosely, is random - and how this evidence justifies the claim. But since the judgement we are discussing is precisely how to use evidence to justify chancy claims like this, we would need already to have a justification for our judgements, which, ex hypothesi, we do not have. In brief, no human evidence could justify the claim that the event of human observing of a 300-outcome sequence from a die-tossing system is random.
(ii) Absence of justification does not matter. Our task was to describe, as precisely as is truthful, the human judgements expressed by Cournot's rulem, and the extent of their justification. This we have done.
Part 4: Cournot's Rulem and Consequences
Cournot's rulem, we have proposed, applies without inconsistency to the testing of chance hypotheses - leading back from observed data to the hypothesis. But suppose that we have the hypothesis. Cannot some definite data consequences be drawn from it?
It would seem bizarre if we could test the hypothesis, yet not draw consequences - predictions - from it. Yet we have agreed that the hypothesis that a die-system S1 has a chance of 1/6 of giving a 5, has the inescapable consequence that in a finite sequence of tests of length n , any relative frequency could occur in Nature.
Our solution is artificially to adjust the consequences, to bring them into line with our rulem's attitude to tests. After all, the limiting frequencies that could occur in Nature are of little concern to us. Suppose that we instead consider the chance that a human being who runs a sequence of tests tomorrow, of length n, will get a relative frequency within 0.001 of 1/6. Given Cournot's rulem, we can propose that what held for testing also holds for consequences: consequence sequences of length n in which the relative frequency is much further than 0.001 from 1/6 will occur - Oh, yes, definitely - but not tomorrow - not in our back yard. The further the proposed frequency is from 1/6, the less we take to be the chance that we will record it. If, after testing, one of these sequences is observed, then the further it is from 1/6, the more suspicious we will be of the truth of the chance hypothesis. Although we know that such events are inescapable, we will be uneasy.
Expressing this constraint is a delicate matter. We cannot consistently claim that our hypothesis does not have these rare events as consequences - it does. What we can claim is that our next test is not going to be such an event. Such events will eventually occur, but our next test will not happen to be one of them. We will not happen to observe an event which has a very low relative frequency of occurrence in the infinite sequence.
"Why?", any self-respecting philosopher interrupts, really quite upset. But there is no answer. We are descriptive methodologists; we have found that this is what human beings presume. It is completely unjustifiable.
We have referred to the 'chance' that that consequence sequence occurs. This is only a consistent usage if we suppose that there is a system to possess this property. The system cannot be S1. Instead it is S2: a person, choosing a time and situation in which to draw consequences from the conjectured S1. We conjecture that this choice is random. If we consider the set of finite sequences of length n generated by S1, and consider that S2 selects from them at random, then we can estimate the chance that S2 will choose a sequence that departs further than an amount e from 1/6. The smaller the chance, the more we are inclined to believe that we will not choose a sequence which deviates that far from 1/6.
We repeat that this is unjustified. All the above argument does is to establish, if we wish it, the human right to talk of the chance that our chancy system S1 will actually have the consequence, tomorrow, that the 100-term sequence we plan will display a relative frequency of 5s which is closer than 0.001 to 1/6.
We have argued that Cournot's rule, sensibly interpreted, is a version of the Inductive Presupposition. We now show the same of the Principal Principle.
"Our experience of Nature does not deceive us"; "Very low-chance events aren't observed". These were the ways that we described the IP . At one extreme, if we have conjectured that the chance of an event is very low, then it is reasonable/IP to have a very small degree of belief that we will observe it. At the other extreme, If we have conjectured that the chance is very high, then it is reasonable/IP to have a very high degree of belief that we will observe it. Overall, our reasonable/IP degree of belief in an event, conditional on the chance of it being a , is a . This is the Principal Principle.
This explains why the Principle's status has proved puzzling: it is a contentful statement - it is synthetic; it is unsupported, and seemingly unsupportable, by empirical evidence, or in any other way; yet it is seemingly inescapable. A distressing combination - yet all these are unremarkable consequences of its being the IP.
So far, we have shown that Cournot's rule, and the Principal Principle, are forms of the Inductive Presupposition. It remains for us to show how Bayes's relationship also joins our club.
"Our experience of Nature does not deceive us"; "Very low-chance events aren't observed". These were the ways that we described the IP. Since this relationship summarises important features of physicists' reasoning, we will invent an illustrative example - whose historical accuracy is irrelevant.
Suppose that we have the hypothesis h1 that Wresnel's low-observability wave theory of light is entirely a human fiction, with no truth at all in it. Wresnel has shown that a bright dot in the middle of the shadow of a small object illuminated by a parallel-sided beam of light is an inevitable consequence e of his theory. But if his theory is pure fiction, then any consequences drawn from it are mere unsupported guesses. Suppose further that background knowledge BK implies that the guess should be false; no available laws imply that the shadow should have a bright dot at its centre. We then consider that the chance of any such guesses turning out true, is very low.
Suppose now that the consequence turns out to be true. According to the above argument, this is a very low-chance event, and therefore should not have been observed. Therefore there is a fault somewhere in the premises. We propose that h1 is false; Wresnel's theory is not entirely a human fiction. If our degree of belief in the occurrence of e/(h1&BK) is low, then our degree of belief in h1/(e&BK) is low.
Suppose instead that our degree of belief in the occurrence of e was very high anyway, on the basis of background knowledge; suppose that Wresnel's theory had implied that a ray of light should reflect off a plane mirror at the same angle that it was incident on it - which had been known for millennia. This is a high-chance event, and therefore should/IP have been observed. The falsity or truth of h1 is irrelevant to the chance of the event, and therefore its occurrence is irrelevant to h1's truth or falsity. A high degree of belief in e/BK justifiably/IP drastically reduces the effect of e's occurrence on our degree of belief in h1.
Finally, suppose that according to background knowledge, our degree of belief in h1 is very high anyway. This will tend to increase our final degree of belief given e, whatever effect the occurrence of e has. (This has nothing to do with the IP.)
Combining these three factors, we conclude that the reasonable/IP degree of belief in h1, given e and BK, is proportional to RDB(e/h1), proportional to RDB(h1/BK), and inversely proportional to RDB(e/BK). This is Bayes' relationship.
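As a sketch, Bayes' relationship can be run with hypothetical numbers for the Wresnel example (the priors and likelihoods below are our own inventions, chosen only to display the three factors; RDB(e/BK) is expanded by the law of total probability):

```python
def posterior(prior_h, rdb_e_given_h, rdb_e_given_not_h):
    """Bayes' relationship: reasonable degree of belief in h given e,
    with RDB(e/BK) expanded by the law of total probability."""
    rdb_e = rdb_e_given_h * prior_h + rdb_e_given_not_h * (1 - prior_h)
    return rdb_e_given_h * prior_h / rdb_e

# Hypothetical numbers (not from the text).
# h1 = "Wresnel's theory is entirely a human fiction".
# Novel bright-dot prediction: low chance of truth if h1, high if not.
novel = posterior(prior_h=0.9, rdb_e_given_h=0.001, rdb_e_given_not_h=0.9)
# Long-known reflection law: high chance of truth either way.
known = posterior(prior_h=0.9, rdb_e_given_h=0.99, rdb_e_given_not_h=0.99)
```

The low-chance novel prediction, once observed, collapses belief in h1 (from 0.9 to about 0.01); the long-expected reflection law leaves it essentially untouched at 0.9, exactly as the three factors above dictate.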
The importance of this argument is that the link between Bayes' relationship and the IP is so intimate that we find it hard to imagine a justification for the relationship that was not also a justification for the IP. In the present absence of a justification for the IP, we conclude that there is presently no justification for Bayes' relationship.
There is no alternative route to the justification of the relationship via mathematics. Being a theorem in the axiomatisation of the probability calculus, is no assistance, because the four axioms are merely a formal summary of the informal patterns of reasoning associated with the word 'probability' in consensus use. The abstracting and formalising injects no additional justification into the relationships. They are justified precisely to the extent that those informal patterns of reasoning agreed by a consensus of humans are justified, and no more.
On this note, dismal to a Complete Justificationist but, to a Descriptive Methodologist, merely interesting, we turn to a recent attempt to solve our problem without the IP . We will argue that it fails.
HOWSON AND URBACH FAIL TO PROVIDE ANY JUSTIFICATION FOR THE TESTING OF CHANCE HYPOTHESES
Can the Principal Principle, Bayes' relationship, and, in general, our
degree of belief in a hypothesis, given the evidence, be justified, as
opposed to justified/IP? Our thesis is that they cannot - that no degree
of belief in a chancy hypothesis can be justified by evidence. In order
to make this more plausible, we now criticise one recent attempt to claim
otherwise - Colin Howson and Peter Urbach's book Scientific Reasoning. Page references are to the first edition.
It is a shame that my criticism of this book conceals the pleasure that I have obtained from studying it, and the amount that I have learned from it. Its clarity is exemplary.
We agree that the probability calculus, including Bayes' relationship, elegantly describes human behaviour. We also agree that our task is both descriptive and normative: p.1 "Precisely how this (calculating the probability of a claim in the light of given information) is done and why it is reasonable is the topic of this book".
The normative thrust of their argument is consistency . Probability claims form a consistent network, such that no-one can, without contradicting themselves, simultaneously insist both that they are using 'probability' in the usual way, and that they do not accept the Bayesian relationship.
Such consistency, though necessary, is not sufficient for justification. It merely transfers the onus for justification from the individual to the consensus - from one claim to the linked system of claims. Suppose that a person not following the guidance of, say, Bayes' relationship, is accused of using words like 'probably' inconsistently. He could respond: "I am free, as an investigator, to coin new words if I wish; I have good reasons for coining a new sense of 'probablyx' not summarised by Bayes' relationship". To our reply that the common use should not be abandoned so lightly, because it enshrined best practice, he responds: "What are your arguments that it is best practice?". What can we reply? Consistency arguments do not protect us from this cold wind.
What the consistency requirement is rightly saying - and all that it is saying - is that Bayes' relationship is a summary of consensus human intuition about how evidence supports hypotheses. Certainly humans should not contradict themselves. I cannot, without contradiction, claim both that the far side of the Moon is made of cheddar cheese, and that it is made of camembert. OK, so I claim that it is made of cheddar, and that it is made of a quite firm, yellow, cheese. Good, now I am being consistent. But this has not got me any distance along the road to justifying my claim, in competition with you, who think that the Moon is made of rock.
In brief, Bayes' relationship, and the other axioms of the calculus, have no methodological force. What they give is a partial principled descriptive reconstruction of human intuitive methodology (how, for example, theories are supported by evidence; in particular how a chancy conjecture is supported). Bayes' relationship offers no philosophical foundation for a logic of inductive inference, merely a neat descriptive summary of practice.
So principled description is not justification. Consistency is not justification. And agreement with the calculus is not justification. These all merely establish agreement with the majority view of what is reasonable. What extent of justification for this majority view do Howson and Urbach provide? Why is the majority view reasonable?
Their argument, which we suggest fails, is in several steps:
They argue that the fair betting quotient (FBQ) is an intuitive measure of reasonable degree of belief.
They then argue (pp.63-5) that the axioms of a calculus, and hence its theorems (especially including Bayes' relationship), summarise the fairness of these quotients; if the axioms are denied, then the quotients can no longer all be fair. In effect, if 'fairness' is defined for betting in the everyday way, then fair betting odds are related in the same way as the probability calculus. This constitutes a proof that the calculus describes fair betting. So the calculus then needs no further support.
Consider the fourth axiom, of conditional probability, which is important because of its close relationship to Bayes' relationship. The form of their argument is this:
(i) Define 'fair' such that, if a bettor can lay a stake at odds such that the bettor-on is sure to profit, whatever the outcome (calculated by considering the payoffs for all possible outcomes), then the odds are 'unfair'.
(ii) Now offer particular odds for initial bets on (a&b) and on b, and show that the fair payoff for a consequent conditional bet on a/b implies precisely those odds on the bet which follow from the calculus. This will show that the calculus tracks fairness.
To do this, they consider placing a bet, at arbitrary odds, on (a&b), on b, and on a/b. Their aim is to show that unless the odds on a/b are related to those for (a&b) and b so as to be consistent with the 4th axiom, P(a/b) = P(a&b)/P(b), the bettor-on will always gain or lose at the payoff(5). Thus for all combinations of odds and stakes, fairness implies the calculus.
Definition of fairness : This is vital to clarify. The bettor-on offers certain odds for a group of associated bets, based, say, on two propositions whose truth-values neither party knows. Typically the payoff depends on the odds, and on the stakes.
How shall we define 'fair odds' or a 'fair bet'?
Definition 1 : 'A fair set of odds' is one such that, for all possible stakes, the bettor-on will not definitely win (or lose); equivalently, odds such that there are no particular stakes which would enable the bettor-on definitely to win or lose. If we could find stakes which make the bettor-on win, come what may, then the odds are not fair. In other words, on this definition a set of fair odds is one where the bettor-on cannot get a definite advantage, whatever the stake.
Definition 2 : 'Bets' include both the stakes and the odds; 'a fair set of bets' is one such that the bettor-on will not definitely win or lose.
Definition 2 is no use to us. The set of bets no longer represents an associated set of fair degrees of belief or confidence in an outcome; for each bet, the stake is involved as well. It is therefore useless for the ultimate aim of establishing a degree of belief in a hypothesis, given some evidence.
Definition 1 is that used by Howson and Urbach; they are thinking about fair betting quotients, not fair bets. On p.59: "fair odds have been characterised as those which assign zero advantage to either side of a bet at those odds, where the advantage is regarded as a function of the odds only and not the stakes"; and (p.71) the "set of betting quotients ... could not all be fair".
What is to be proved?
Using Definition 1, we now try to state the claim that Howson and Urbach are making, and what their basic strategy is for proving it. They are apparently following Ramsey and de Finetti (p.71): if the betting quotients are not calculus-related, then the bettor-on, either by dictating stakes or perhaps with any stakes, will win, come what may; "if people were to bet at odds derived from betting quotients which do not satisfy the probability calculus, and the side of the bet they took and the size of the stake could be dictated by the other bettor, the latter could in principle win a positive sum come what may". The need for the reference to "dictating the size of stake" is, perhaps, that they realise that the payoff depends both on the odds and on the stake. They are worried that for some sets of non-calculus-related quotients, there would be some sets of stakes where the payoff would still be nothing. Still, they argue that with a non-calculus-related set of quotients, you will be able to find a particular set of stakes where there is a net payoff.
Note, however, that finding such a set of quotients would prove nothing. What needs proving is that if the quotients are calculus-related, the bettor-on cannot play this nasty trick; in this case, he or she cannot find a particular set of stakes where there is a net payoff. The most obvious way of proving this is to show that the net payoff is independent of the stakes. Disproving it, however, may be easier, since it merely requires finding one set of stakes for which a calculus-related set of quotients gives a net payoff.
Notice that "a particular set of stakes" could figure in a proof that a set of betting quotients is fair. But it would not be by Method A: choosing a particular set of stakes, and showing that only with these stakes are calculus-related quotients necessarily fair. On the contrary, it would be by Method B: showing that no such choice of a particular set of stakes could upset the necessary relationship.
Our task, then, is to show that, for example for axiom 4, if the odds on (a&b), on b, and on (a/b) are related according to the calculus, then there is no set of stakes which would give the bettor-on a definite advantage.
Given the way this argument now develops, very much to Howson and Urbach's disadvantage, the reader must persuade herself that this characterisation is fair. Is this the correct task? Is there any other possibility? This is important, because Howson and Urbach take the wrong turning at this point: they choose Method A; they pick one set of stakes, and then show that calculus-related quotients are fair. This fails to prove their general result: that quotients give the fair zero payoff, independent of the stakes, iff they are calculus-related. Worse, we can use Method B to show that this general result is false.
Howson and Urbach successfully show that if odds (BQ) don't obey axioms 1 and 2, then the bettor-on can definitely make a profit, come what may - regardless of the stake. The trouble starts with axiom 3. We propose that all they manage to show for this, and axiom 4, is that if the odds are related according to the calculus, there is one linked special set of stakes which would not give the bettor-on a definite profit; to put it another way, they show that for just one particular set of stakes, the odds are calculus-related. Unfortunately they thus prove that if the odds are calculus-related, all the other infinitely many sets of stakes give the bettor-on a definite advantage. They thus fail to achieve their aim, which was to prove that, for calculus-related odds, there is no set of stakes which would give the bettor-on a definite advantage.
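Their success with axioms 1 and 2 can be sketched numerically. The following is our own illustration, not theirs, with hypothetical numbers: a bet on h at betting quotient p with stake S pays the bettor-on S(1-p) if h is true and -Sp if h is false, so a quotient outside [0, 1] decides the bet in one side's favour whatever the outcome, and for every stake.

```python
def payoff(p, stake, h_true):
    """Payoff to the bettor-on of a bet on h at betting quotient p."""
    return stake * (1 - p) if h_true else -stake * p

# A quotient above 1 guarantees the bettor-on a loss, for any stake ...
for stake in (1, 10, 250):
    assert payoff(1.2, stake, True) < 0 and payoff(1.2, stake, False) < 0

# ... and a negative quotient guarantees a gain, for any stake.
for stake in (1, 10, 250):
    assert payoff(-0.3, stake, True) > 0 and payoff(-0.3, stake, False) > 0

print("quotients outside [0,1] are unfair for every stake")
```

This is the stake-independence that, we argue below, the proofs for axioms 3 and 4 fail to reproduce.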
Showing that there is one set of stakes which enables the calculus-related odds to give no definite advantage is irrelevant. The proof needs to show that there is no definite advantage for calculus-related odds, independent of the set of stakes. Not only do Howson and Urbach not show this, but their method shows that it is not true. In general, the bettor-on can gain an advantage, using the calculus-related odds.
Fairness implies the calculus result only for a very particular case - if the odds and stakes are specially related. In general, with arbitrary odds and arbitrary stakes, fairness does not imply the calculus result. For axioms 3 and 4, the reason lies within their algebra:
Axiom 3 : The axiom states that for h1 and h2, which cannot simultaneously be true, a fair betting quotient for h1 ∨ h2 equals the sum of the fair betting quotients for h1 and h2.
Suppose that we bet S1 on h1, at odds p1/(1-p1), and S2 on h2, at odds p2/(1-p2). Suppose that we bet S3 on h1 ∨ h2, at odds x/(1-x). Using Howson and Urbach's method, we can establish the payoff that the bettor-on can expect in each of the three possible outcome states of affairs.
First we look at the case where all three stakes are equal:
To get no guaranteed pay-off - for a Dutch book to be impossible - the payoff of the bet on h1 ∨ h2 must match that of the paired bets on h1 and h2 in each outcome state: S(1-x) = S(1-(p1 + p2)) and -Sx = -S(p1 + p2). So x has to equal p1 + p2.
This is Howson and Urbach's proof. But we have argued that showing that the set of fair betting quotients is calculus-related for a single set of stakes is irrelevant. We need to show it for all stakes.
If we now look at the case where the three stakes are different (it is probably obvious to the reader that it is not going to work, so we will compress it):
Equating the payoffs as before, we find, from the bottom row, that:
x = (S1p1 + S2p2)/S3. Call this f. From the first row we find that:
x = f + (S3 - S1)/S3
Thus in general x does not equal p1 + p2.
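The algebra can be checked numerically. A minimal sketch (our own illustration, with hypothetical quotients): the bettor-on backs h1 and h2 and lays h1 ∨ h2, and we tabulate the net payoff in each of the three exclusive outcome states.

```python
def payoffs(S1, S2, S3, p1, p2, x):
    """Net payoff to the bettor-on in each exclusive outcome state."""
    return (
        S1*(1-p1) - S2*p2 - S3*(1-x),   # h1 true
        -S1*p1 + S2*(1-p2) - S3*(1-x),  # h2 true
        -S1*p1 - S2*p2 + S3*x,          # neither true
    )

p1, p2 = 0.3, 0.4
x = p1 + p2          # the calculus-related quotient of axiom 3

# Equal stakes: every outcome state pays zero, as in the proof above.
assert all(abs(v) < 1e-9 for v in payoffs(10, 10, 10, p1, p2, x))

# Unequal stakes: the same calculus-related x no longer gives zero payoffs.
print(payoffs(10, 10, 20, p1, p2, x))   # the payoffs now depend on the stakes
```

The point of the second call is only the one made above: the zero payoff of the calculus-related quotients is tied to one special choice of stakes.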
It would be natural for Howson and Urbach to respond at this point, seeing how the wind is blowing, seeing that the same will happen with axiom 4 (which is vital to their argument): "This is ridiculous. Obviously you won't get the same payoffs if you stake different amounts. That's why we did our proof with the intuitively fair same amount staked on all three bets".
Firstly, this is unconvincing because, as we will see, the proof for axiom 4 only works for very unintuitively fair stakes: two have to be equal; the third has to be in a ratio to the first two which is the inverse of the ratio of the betting quotients of the first to the second.
Secondly, it is indeed obvious. But unfortunately it is not ridiculous. If we are trying to prove things about fair betting quotients, fair odds, then we have no freedom to choose stakes to suit ourselves. Our proofs have to work for all stakes.
Once again, I urge the reader to study Howson and Urbach's text (1st edn. pp.63-5; 2nd edn. pp.82-4) to decide for herself whether I have missed something.
We now supply an outline of the failure of the proof of axiom 4:
Axiom 4 : The stake V on (a&b), though arbitrary at the start, is later specially chosen to be equal to r, where the odds on b are r/(1-r); similarly, the stake S on b is put equal to q, where the odds on (a&b) are q/(1-q). In general, if, and only if, they choose the stakes such that V/S = r/q, will consistent fair odds - no payoff - imply the odds of the 4th axiom. If the stakes are in any other ratio, then the middle term, -Vq + Sr, in the payoff matrix for (a&b), given b, does not cancel. The resulting payoff matrix for a, given b, is no longer equivalent to the one we would get by using q/r in our odds formula (i.e. having odds on (a/b) of (q/r)/(1-q/r)).
Far from being true in the general case, only in a very particular family of cases - those where the stakes are specially related to the odds - does this axiom give the correct association between fair odds for betting on a given b, on (a&b), and on b. In most cases, it gives the wrong, unfair, odds.
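The cancellation claim can also be checked numerically. The sketch below is our own rendering of the construction just described, with hypothetical quotients: a bet of V on (a&b) at quotient q is combined with a bet of S against b at quotient r, and the combination behaves as a conditional bet on a given b (called off when b is false) only when the -Vq + Sr term vanishes, i.e. only when V/S = r/q.

```python
def combined(V, S, q, r):
    """Payoffs of: V on (a&b) at quotient q, plus S against b at quotient r."""
    return (
        V*(1-q) - S*(1-r),   # a and b both true
        -V*q - S*(1-r),      # b true, a false
        -V*q + S*r,          # b false: a conditional bet should pay zero here
    )

q, r = 0.2, 0.5            # hypothetical quotients on (a&b) and on b

# Stakes in the special ratio V/S = r/q (here V = r, S = q): the
# 'called off' payoff vanishes, and the live payoffs are exactly those
# of a stake-r conditional bet on a/b at the axiom-4 quotient q/r.
a_b, not_a_b, off = combined(r, q, q, r)
assert abs(off) < 1e-9
assert abs(a_b - r*(1 - q/r)) < 1e-9    # win of a stake-r bet at quotient q/r
assert abs(not_a_b + r*(q/r)) < 1e-9    # loss of that bet

# Any other stake ratio leaves a residual payoff when b is false.
assert abs(combined(1.0, 1.0, q, r)[2]) > 1e-6
print("the -Vq + Sr term cancels only when V/S = r/q")
```

This is the narrow family of cases within which, we have argued, their proof lives.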
This failure fatally undermines the rest of their argument, which is to obtain a fair betting quotient for some evidence, given a chance (Step 2), and then use Bayes' relationship to obtain a fair betting quotient on the chance, given some evidence (Step 3). Nevertheless, we now show that their second step also lacks justification. Even if they had not tripped at step 1, they would have fallen at step 2. Still, there was no trouble with step 3...
Step 2 : They now aim to justify the Principal Principle. A reasonable degree of belief (RDB) in the outcome, given the chance hypothesis, has to equal the hypothesised value of the chance, on consistency grounds. This is argued via fair betting quotients (the chance-betting quotient link). They thus argue either:
(i) that the Principal Principle follows analytically from the definition of 'fairness', such that it is provably fair (i.e. justified) to have a degree of belief a in an event, if the chance of it occurring is taken to be a.
(ii) that the Principal Principle follows from the definition of 'fairness', combined with the definition of 'chance', according to which an outcome sequence converges as n increases.
The argument is that if a person is laying odds on an event occurring, given that the chance of it occurring is a, then he must use a betting quotient of a, and odds of a/(1-a), on pain of inconsistency. After all, fair odds are supposed to be ones which will not guarantee that the bettor-on can always gain (obtain a positive pay-off); but if the chance is as the person presumes, then in the long run the fair bet must be in accordance with the supposed relative frequency.
Comment : If 'fair' is taken to mean 'fair-i', meaning 'fair over the entire infinite sequence', then argument (i) is possible. By postulating a chance for heads of p, we are postulating that, in the limit, the relative frequency of heads in the collective is p; but only a betting quotient of p is then fair-i, since any other quotient would give a guaranteed payoff. This is true, in the limit at infinity. Whether this falls within the intuitive meaning of a fair bet, intended to measure a reasonable degree of belief or confidence in an outcome, we doubt. No one - including Howson and Urbach - would judge the intuitive concept of a fair bet/reasonable degree of belief to refer only to the result of an infinite series of tests. It would be regarded as a misuse of language (i.e. counter to common intuition) to insist that fair odds on heads are 1:1, while accepting with equanimity a run of 1 000 000 heads in a row. If the fair odds really are 1:1, you should not be accepting with equanimity that every trial of 1 000 000 tosses may come up heads every time.
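For scale, a quick computation of our own: on the 0.5 hypothesis the chance of such a run is positive, but its order of magnitude is beyond anything a bettor could take with equanimity.

```python
from math import log10

# Chance, on the 0.5 hypothesis, of 1 000 000 heads in a row is
# 0.5**1_000_000, which underflows an ordinary float, so we work
# with its order of magnitude (base-10 logarithm) instead.
log10_chance = 1_000_000 * log10(0.5)
print(log10_chance)   # about -301030: a positive chance, but absurdly small
```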
But this is not important, because Howson and Urbach are arguing for (ii). On p.228 they write: "you can infer that the odds you have stated would lead to a loss (or gain) after finite time, and one which would continue thereafter" (my emphasis). They aim to employ the way that the relative frequency of heads in a coin-tossing collective converges closer and closer to its infinite limit as n increases. We therefore now interpret 'fair-x' to mean 'fair-f' - meaning 'fair over a finite sequence'.
Failure of the convergence argument
For a coin, if we think that it has a chance of 0.5 of coming up heads, then an RDB in getting a head on the next throw cannot, we suggest, be proved, from the Von Mises definition of 'chance', to be 0.5. What can be proved is that the RDB in getting a head 0.5 of the times in an infinite sequence of throws is 0.5.
The Von Mises axiom of convergence is equivalent to an empirical Strong Law of Large Numbers. We know about the infinite sequence; the consequences problem is how to make the move from the infinite definition to the finite, even singular, consequence (equivalent to the move from the Strong Law to the Weak Law).
The key question is : Does Von Mises' theoretical model of a chance include, or imply, that if the chance of heads is 0.5, then, as n increases but remains finite, the recorded relative frequency of heads in a particular n-tuple test steadily, predictably, approaches nearer and nearer to 0.5? We suggest that the answer, fatally for Step 2, is "No". It is perfectly consistent with a coin having a Von Mises chance of 0.5 that the first 1 000 000 throws of the coin come up heads. 1 000 000 is negligible, compared to infinity. There is still an infinite number of tests to be done before the limit is reached.
Indeed, using an argument that Howson and Urbach use themselves against Cournot's rule, Von Mises' model cannot consistently include this feature of finite convergence, as we now call it. There is a finite chance that an n-tuple selected from a collective of outcomes of coin tossing will have a relative frequency of heads which differs as much as we like from 0.5. If we happen to be collecting data within this n-tuple, then we will see no sign of the relative frequency approaching 0.5. If Von Mises' model included a condition eliminating this n-tuple, it would become inconsistent.
It would not even help us if Ron Mises (Von Mises' English cousin) had devised a second onto-semantic model for chance, chance2, in which finite convergence was guaranteed. This would fit as well as chance1 with our observations of real systems, as long as we ignored the very rare occasions where finite convergence was not observed. We could now infer that if a system has a chance2 of 0.5 of giving heads, then after a sequence of tests of length n, the relative frequency will be within e of 0.5. Would we, by devising this model, have solved our problem? We would, since a bet at these odds (1:1) would now give no guaranteed payoff to the bettor-on. But the question now arises: "What is the extent of our justification for claiming that the real coin-tossing system is an example of Ron Mises' chance, in preference to Von Mises' chance?" We seem to have no extent of justification at all.
This may seem odd to the reader. How could Howson and Urbach have come to use the supposition of finite convergence (the weak law of large numbers) if it is not justified? The point seems even odder when we consider that in mathematics books this law is regarded as derivable from Von Mises' model.
The resolution of this oddity demonstrates the difference between mathematics and philosophy. The philosopher intuitively realises that Von Mises' model could not establish finite convergence, because the model involves no presupposition that sceptical doubt is quarantined, no presumption that our human sampling of Nature is fair; its axioms allow for anything to happen. What mathematicians do, and then statisticians, is to feed into their derivations the natural, but unjustified, inductive presupposition. They are able to show Z : As n increases, the chance of an n-tuple differing from 0.5 by more than e decreases rapidly; as Howson and Urbach write (p.229): "the function of x, x^r(1-x)^(n-r), 0<x<1, peaks very sharply, for large n, in the neighbourhood of x = r/n, and is close to 0 elsewhere". Z is absolutely true, in the model - when chance is, naturally, consistently interpreted according to Von Mises; we are considering the collective of n-tuples extracted randomly from a collective of outcomes. Now we consider our human situation. If the model is presupposed, and if the system is conjectured to be chancy (on this model), then as n increases the chance of an n-tuple having a relative frequency very different from 0.5 becomes extremely small. This sounds excellent.
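Z itself can be exhibited with exact binomial arithmetic. The sketch below is our own numerical illustration: within the model, the chance that an n-tuple's relative frequency of heads differs from 0.5 by more than e shrinks rapidly as n grows. Note what the code does not do: it does not touch the philosophical question of why a small chance should command a small degree of belief.

```python
from math import comb

def tail_chance(n, e, p=0.5):
    """Chance that the relative frequency r/n differs from p by more than e."""
    return sum(comb(n, r) * p**r * (1-p)**(n-r)
               for r in range(n + 1)
               if abs(r/n - p) > e)

# The deviation chance falls steeply as n increases.
for n in (10, 100, 1000):
    print(n, tail_chance(n, 0.1))
```

For n = 10 the chance of a deviation beyond 0.1 is about 0.34; for n = 1000 it is astronomically small. This is exactly the "peaks very sharply" behaviour, and, as argued above, it is exactly beside the point.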
But unfortunately no progress has been made. The fundamental philosophical, methodological, problem is the extent of justification of associating claims of chances with reasonable degrees of belief. Suppose that I have just done an n-tuple test, where n is large. It is very nice to know that in an infinite collective of n-tuple tests, the relative frequency of n-tuples in which the relative frequency is very different from 0.5 is extremely small. But unfortunately I am left wondering if the particular n-tuple test that I have just done (or set of tests - it doesn't help) is typical. This wondering works, as usual, both ways: for consequences and for testing. Maybe I happen to have observed one of the inevitable n-tuples which has a relative frequency very different from 0.5.
This mistake occurs because of slipping between mathematics and philosophy. The central philosophical problem is so silly, practically speaking, that any thinker tends to forget it, and latch onto arguments, especially technical ones, that short-circuit it. This is understandable, because it is commonsense - the one thing not allowed in philosophical thinking. There is absolutely no possibility that mathematicians and statisticians will have taken any notice at all of sceptical doubts when they were developing their procedures for consequences and testing of their model.
Remember that in Howson and Urbach's argument the reasonable betting quotient is conditional on the conjecture that the Von Mises chance of heads is 0.5. But if this conjecture gives no justification for predicting any particular n-tuple consequence from the system, then the RBQ is not constrained in any way by the chance condition. There is no inconsistency in conjecturing that the chance of heads is 0.5, but betting on heads with a betting quotient of 0.99.
We conclude that this vital step in Howson and Urbach's argument is unsound. All their references to the way the function "peaks very sharply" are irrelevant.
The only way to avoid this conclusion is to insist, which they explicitly do not do, that the betting is done on the basis of expectations of payoff not for finite, but only for infinite sequences. This would be a strange concept of betting; a reasonable degree of belief in a certain outcome is based on expectations about the finite runs of events a bettor could actually live through.
Given its importance, we now repeat our argument against Step 2 with a different example, and some diagrams:
Betting involves two concepts: (i) the limiting relative frequency (lrf) with which, say, Red Rum would win, if there were a collective of similar races; (ii) the RDB in the outcome of just this one race.
The bettor-on and bettor-against have judged the lrf as best they can. Suppose that they have agreed that it is about 1/60. On this basis they agree that a reasonable-L bet is at these odds - where 'reasonable-L' means 'such that neither would gain an advantage in the limit'. This is provably reasonable-L, in that if they offered another figure as a reasonable limiting betting quotient (RLBQ), they would be contradicting themselves. This is one step along the path of the argument.
The next step, however, is to move from infinite limiting cases to the claim that the odds we have stated would lead to a loss (or gain) after finite time, and one that would continue thereafter. This needs a justification, and the natural one would be that in a collective, as n increases, the relative frequency narrows down on the value of the chance C, such that, after a certain number of tests, it is as close as you like to the value of C.
If we can provide a reason to judge that our relative frequency will vary in this way, converging on the limiting frequency, then our claim goes through. So far, so good.
But what can our reason be? Our question is clear: What reason do we have for denying the possibility that, in the particular cases of human observation of finite sections of collectives, our observations find the cumulative relative frequency varying as shown in Fig. 3 (below), systematically misleading us - despite the fact that, ex hypothesi, if we carried on observing to infinity, the observed relative frequency would eventually drop down to the dotted line, satisfying our claim concerning the chance?
However large the number of tests completed, compared to infinity it is negligible.
We know that Von Mises' model includes the possibility of this happening, so our only hope is to appeal to other aspects of the model, to establish that the chance of it happening is very small. After all, considering the collective of n-tuple collectives consistent with the chance being 0.5, it may well be that the relative frequency of occurrence of n-tuple collectives in which the relative frequency of heads differs from 0.5 by more than e, for a particular n, is extremely small.
But this is useless, merely pushing our problem one level further up. What is our justification for believing that the n-tuple that we have just collected is typical of the collective of n-tuple collectives? Certainly the chance that we have collected an aberrant n-tuple, with a relative frequency far away from 0.5, is small. But why do small chances justify small degrees of belief? No progress has been made.
Step 3 : Finally they use Bayes' relationship (pp.228-230) to derive a reasonable degree of belief (RDB) in a chancy hypothesis, given the outcome evidence, from an RDB in the outcome, given the hypothesis (the evidence-hypothesis link).
Response : This step would be fine, but our intrepid intellectual traveller sadly never reaches it.
Summary: Howson and Urbach's task is to describe the How, and the Why, of 'Scientific Reasoning', when evidence is used to support a chancy hypothesis - a situation which is common at base level, and very common at the meta-level. Do they succeed?
How? We accept that the calculus describes aspects of scientific reasoning, including how evidence supports hypotheses - in general, and in the particular case of chancy hypotheses.
Why? They hope to prove that evidence provides a degree of support, a degree of confidence, a reasonable degree of belief, a fair betting quotient, in a chancy hypothesis. Unfortunately, their proof contains two serious errors. The first means that fair betting quotients are not, in general, calculus-related. The second, much more important for this essay's theme, shows that the Inductive Presupposition is very hard to evade. Without it, attempts to establish evidential support tend to stagger forward a little way, and then collapse.
Even if their version of the Principal Principle were justified, which it isn't, fair betting quotients do not in general obey the probability calculus. Therefore there is no justification for assuming that we can use Bayes' relationship to get from a FBQ on the evidence, given the hypothesis, to our goal, which is a FBQ on the hypothesis, given the evidence.
How much of their book survives this failure, we will leave to the reader to judge.
SECTION C: CONCLUSION
Our problem was: "To what extent can chance hypotheses be tested?".
We conclude that:
(i) Onto-semantics : Von Mises' chance is the best available model to describe the property that we conjecture some systems to have.
(ii) Descriptive Methodology : All humans, including sophisticated investigators of Nature, unthinkingly use a consensus guideline summarised by the Principal Principle (Cournot's rule, interpreted methodologically; Bayes' relationship) to link their limited observations to their chance hypotheses. It is a form of the Inductive Presupposition. The same guideline summarises the procedure for statistical testing of hypotheses at the 5%, 1%, or other significance levels.
(iii) Extent of Justification of Methods : The problem of justifying this guideline has, so far, not been solved; it should be quarantined. Philosophers - the only group of people interested in it - should remember that no one else cares, and that nothing practical would change if it were solved.
It is the baleful influence of sceptical doubt - whose various camouflaged appearances can sometimes seem to define problems as 'philosophy' - which has generated this problem. With this influence located, and the inductive presupposition stated, sceptical doubt is forced into quarantine, and the problems disappear. The only work left to be done is for foolhardy philosophers - within the quarantine ward.
References (not completed)
Cournot, A.A. (1843) Exposition De La Théorie Des Chances Et Des Probabilités, Paris
Howson, C. and Urbach, P. (1989) Scientific Reasoning, Open Court (1st edn.)
Howson, C. (1995) Theories Of Probability, British Journal For The Philosophy Of Science, Vol. 46, No. 1, pp.1-32
Kolmogorov, A.N. (1950) Foundations Of The Theory Of Probability, New York, Chelsea
Popper, K.R. (1959) The Logic Of Scientific Discovery, London, Hutchinson
Watkins, J.W.N. (1985) Science And Scepticism, London, Hutchinson
Gillies, D.A. (1973) An Objective Theory Of Probability, London, Methuen
Miller, D. (1994) Critical Rationalism
1 The other is "Nature is uniform, unless proven non-uniform". This summarises the human attitude to generalising, and ties in with the conjecture of Natural Kinds. We will not discuss it further in this essay.
2 Not "very low-chance events don't occur"; that is an unacceptable version, contradicting the meaning of chancy hypotheses (discussed further below).
3 I use the word 'relationship' because 'theorem' sounds misleadingly powerful. I know that mathematically it is a theorem. I use the weaker-sounding word to emphasise that a theorem is only justified to the extent that its axioms are justified.
4 The methods of Fisher, and of Neyman and Pearson, are natural extensions of Cournot's approach, taking us into the details of hypothesis testing. These classic methods are sound/IP. Howson and Urbach are right to point out that they all suffer from the problem that Cournot's rule suffers from. But in this essay we argue that the problem - identifiable as the Inductive Presupposition - is incurable, and universally ignored by non-philosophers. The classic methods are therefore justified/IP - and that is enough.
The predictable failure of Howson and Urbach to provide a cure is detailed in the second section of this essay.
5 Odds are directly related to the betting quotient, in that, for example, if BQ(b) is r, then the odds on b are r/(1-r).