An Analysis upon Concept and Basics of Probability Theory: A Review

by Rajkumar Ahuja*, Dr. Vinod Kumar Sharma

- Published in Journal of Advances in Science and Technology, E-ISSN: 2230-9659

Volume 11, Issue No. 22, May 2016

Published by: Ignited Minds Journals


ABSTRACT

In recent years, probabilistic knowledge-based systems such as Bayesian networks and influence diagrams have come to the fore as a means of representing and reasoning about complex real-world situations. Although some of the probabilities used in these models may be obtained statistically, modellers rely on expert knowledge where this is impossible or simply inconvenient. Experts, however, typically find it difficult to specify exact probabilities, and conventional representations cannot reflect any uncertainty they may have. Consequently, the use of conventional point probabilities can damage the accuracy, robustness and interpretability of the models acquired in this way. When dealing with non-discrete probability spaces, it is difficult to avoid measure-theoretic concepts. However, to develop extensive formal machinery from measure theory before going into probability (as is done in most graduate programs in mathematics) would be inappropriate for the particular audience to whom the book is addressed. Thus I have tried to suggest, when possible, the underlying measure-theoretic ideas, while emphasizing the probabilistic way of thinking, which is likely to be quite novel to anyone studying this subject for the first time.

KEYWORDS

probability theory, probabilistic knowledge-based systems, Bayesian networks, influence diagrams, expert knowledge, uncertainty, point probabilities, measure-theoretic concepts, formal machinery, underlying measure-theoretic ideas, probabilistic way of thinking

INTRODUCTION

The origin of probability theory lies in physical observations associated with games of chance. It was found that if an “unbiased” coin is tossed independently n times, where n is very large, the relative frequency of heads, that is, the ratio of the number of heads to the total number of tosses, is very likely to be very close to 1/2. Similarly, if a card is drawn from a perfectly shuffled deck and then replaced, the deck is reshuffled, and the process is repeated over and over again, there is (in some sense) convergence of the relative frequency of spades to 1/4. In the card experiment there are 52 possible outcomes when a single card is drawn. There is no reason to favor one outcome over another (the principle of “insufficient reason” or of “least astonishment”), and so the early workers in probability took as the probability of obtaining a spade the number of favorable outcomes divided by the total number of outcomes, that is, 13/52 or 1/4. This so-called “classical definition” of probability (the probability of an event is the number of outcomes favorable to the event divided by the total number of outcomes, where all outcomes are equally likely) is, first of all, restrictive (it considers only experiments with a finite number of outcomes) and, more seriously, circular (no matter how you look at it, “equally likely” essentially means “equally probable,” and thus we are using the concept of probability to define probability itself). Thus we cannot use this idea as the basis of a mathematical theory of probability; however, this did not prevent the early probabilists from deriving many valid and useful results. Similarly, an attempt at a frequency definition of probability will cause trouble. If $S_n$ is the number of occurrences of an event in $n$ independent performances of an experiment, we expect physically that the relative frequency $S_n/n$ should converge to a limit; however, we cannot assert that the limit exists in a mathematical sense.
In the case of the tossing of an unbiased coin, with $S_n$ the number of heads in $n$ tosses, we expect that $S_n/n \to 1/2$, but a conceivable outcome of the process is that the coin will keep coming up heads forever. In other words, it is possible that $S_n/n \to 1$, or that $S_n/n$ converges to any number between 0 and 1, or that $S_n/n$ has no limit at all. In this study we introduce the concepts that are to be used in the construction of a mathematical theory of probability. The first ingredient we need is a set $\Omega$, called the sample space, representing the collection of possible outcomes of a random experiment. For example, if a coin is tossed once we may take $\Omega = \{H, T\}$, where H corresponds to a head and T to a tail. If the coin is tossed twice, this is a different experiment and we need a different $\Omega$, say $\Omega = \{HH, HT, TH, TT\}$; in this case one performance of the experiment corresponds to two tosses of the coin. If a single die is tossed, we may take $\Omega$ to consist of six points, say $\Omega = \{1, 2, 3, 4, 5, 6\}$. However, another possible sample space consists of two points, corresponding to the outcomes “N is even” and “N is odd,” where N is the result of the toss. Thus different sample spaces can be associated with the same experiment. The nature of the particular problem under consideration will dictate which sample space is to be used. If we are interested, for example, in whether or not $N \geq 3$ in a given performance of the experiment, the second sample space, corresponding to “N even” and “N odd,” will not be useful to us.
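A minimal Python sketch can make the two ideas in this passage concrete. It computes the classical ratio 13/52 for the spade example and simulates the relative frequency $S_n/n$ of heads for a coin that is assumed to be unbiased; the function names are illustrative. A simulation of this kind only reflects the physical expectation described above; it cannot show that a mathematical limit exists.

```python
import random
from fractions import Fraction


def classical_probability(favorable: int, total: int) -> Fraction:
    """Classical definition: favorable outcomes / total (equally likely) outcomes."""
    return Fraction(favorable, total)


def relative_frequency_of_heads(n: int, seed: int = 0) -> float:
    """Simulate n independent tosses of an unbiased coin and return S_n / n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    s_n = sum(rng.random() < 0.5 for _ in range(n))  # S_n = number of heads
    return s_n / n


print(classical_probability(13, 52))  # 1/4
for n in (10, 1_000, 100_000):
    # S_n / n typically moves toward 1/2 as n grows, but no run can prove a limit
    print(n, relative_frequency_of_heads(n))
```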

In general, the only physical requirement on $\Omega$ is that a given performance of the experiment must produce a result corresponding to exactly one of the points of $\Omega$. We have as yet no mathematical requirements on $\Omega$; it is simply a set of points. Next we come to the notion of event. An “event” associated with a random experiment corresponds to a question about the experiment that has a yes or no answer, and this in turn is associated with a subset of the sample space. For example, if a coin is tossed twice and $\Omega = \{HH, HT, TH, TT\}$, “the number of heads is $\leq 1$” will be a condition that either occurs or does not occur in a given performance of the experiment. That is, after the experiment is performed, the question “Is the number of heads $\leq 1$?” can be answered yes or no. The subset of $\Omega$ corresponding to a “yes” answer is $\{HT, TH, TT\}$; that is, if the outcome of the experiment is HT, TH, or TT, the answer to the question “Is the number of heads $\leq 1$?” will be “yes,” and if the outcome is HH, the answer will be “no.” Similarly, the subset of $\Omega$ associated with the “event” that the result of the first toss is the same as the result of the second toss is $\{HH, TT\}$. Thus an event is identified with a subset of the sample space. Events will be denoted by capital letters at the beginning of the English alphabet, such as A, B, C, and so on. An event may be characterized by listing all of its points, or equivalently by describing the conditions under which the event will occur. For example, in the coin-tossing experiment just considered, we write A = {the number of heads is less than or equal to 1}. This expression is to be read as “A is the set consisting of those outcomes which satisfy the condition that the number of heads is less than or equal to 1,” or, more simply, “A is the event that the number of heads is less than or equal to 1.” The event A consists of the points HT, TH, and TT; therefore we write $A = \{HT, TH, TT\}$, which is to be read “A is the event consisting of the points HT, TH, and TT.” As another example, if B is the event that the result of the first toss is the same as the result of the second toss, we may describe B by writing B = {first toss = second toss} or, equivalently, $B = \{HH, TT\}$ (see Figure 2.1). Each point belonging to an event A is said to be favorable to A. The event A will occur in a given performance of the experiment if and only if the outcome of the experiment corresponds to one of the points of A. The entire sample space $\Omega$ is said to be the sure (or certain) event; it must occur on any given performance of the experiment. On the other hand, the event consisting of none of the points of the sample space, that is, the empty set $\emptyset$, is called the impossible event; it can never occur in a given performance of the experiment.

Figure 2.1 Coin-Tossing Experiment.
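The correspondence between a condition on the outcome and a subset of $\Omega$ can be sketched with Python's built-in sets. This is an illustration only: outcomes are encoded as strings such as "HT", and the helper `occurs` is a hypothetical name, not notation from the text.

```python
from itertools import product

# Sample space for two tosses: {"HH", "HT", "TH", "TT"}
omega = {"".join(t) for t in product("HT", repeat=2)}

# An event is the subset of outcomes satisfying a condition
A = {w for w in omega if w.count("H") <= 1}  # number of heads <= 1
B = {w for w in omega if w[0] == w[1]}       # first toss = second toss


def occurs(event: set, outcome: str) -> bool:
    """An event occurs iff the observed outcome is one of its points."""
    return outcome in event


print(sorted(A))        # ['HT', 'TH', 'TT']
print(sorted(B))        # ['HH', 'TT']
print(occurs(A, "HH"))  # False
print(all(occurs(omega, w) for w in omega))  # True: the sure event
print(any(occurs(set(), w) for w in omega))  # False: the impossible event
```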

PROBABILITY

We now consider the assignment of probabilities to events. A technical complication arises here. It may not always be possible to regard all subsets of $\Omega$ as events. We may discard or fail to measure some of the information in the outcome corresponding to the point $\omega \in \Omega$, so that for a given subset A of $\Omega$, it may not be possible to give a yes or no answer to the question “Is $\omega \in A$?” For example, if the experiment involves tossing a coin five times, we may record the results of only the first three tosses, so that A = {at least four heads} will not be “measurable”; that is, membership of $\omega$ in A cannot be determined from the given information about $\omega$. In a given problem there will be a particular class $\mathcal{F}$ of subsets of $\Omega$ called the “class of events.” We require that the event class $\mathcal{F}$ form a sigma field, which is a collection of subsets of $\Omega$ satisfying the following three requirements.

$\Omega \in \mathcal{F}$ (2.1)

$A_1, A_2, \ldots \in \mathcal{F}$ implies $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$ (2.2)

That is, $\mathcal{F}$ is closed under finite or countable union.

$A \in \mathcal{F}$ implies $A^c \in \mathcal{F}$ (2.3)

That is, $\mathcal{F}$ is closed under complementation. Notice that if $A_1, A_2, \ldots \in \mathcal{F}$, then $A_1^c, A_2^c, \ldots \in \mathcal{F}$ by (2.3); hence $\bigcup_n A_n^c \in \mathcal{F}$ by (2.2). By the De Morgan laws, $\bigcap_n A_n = \left(\bigcup_n A_n^c\right)^c$; hence, by (2.3), $\bigcap_n A_n \in \mathcal{F}$. Thus $\mathcal{F}$ is closed under finite or countable intersection. Also, by (2.1) and (2.3), the empty set $\emptyset = \Omega^c$ belongs to $\mathcal{F}$. Thus, for example, if the question “Did $A_n$ occur?” has a definite answer for n = 1, 2, ..., so do the questions “Did at least one of the $A_n$ occur?” and “Did all the $A_n$ occur?” Note also that if we apply the algebraic operations to sets in $\mathcal{F}$, the new sets we obtain still belong to $\mathcal{F}$. In many cases we shall be able to take $\mathcal{F}$ = the collection of all subsets of $\Omega$, so that every subset of $\Omega$ is an event. Problems in which $\mathcal{F}$ cannot be chosen in this way generally arise in uncountably infinite sample spaces; for example, $\Omega$ = the reals.

We are now ready to talk about the assignment of probabilities to events. If $A \in \mathcal{F}$, the probability P(A) should somehow reflect the long-run relative frequency of A in a large number of independent repetitions of the experiment. Thus P(A) should be a number between 0 and 1, and $P(\Omega)$ should be 1. Now if A and B are disjoint events, the number of occurrences of $A \cup B$ in n performances of the experiment is obtained by adding the number of occurrences of A to the number of occurrences of B. Thus we should have $P(A \cup B) = P(A) + P(B)$ if A and B are disjoint and, similarly, $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$ when we have a countably infinite family of disjoint events $A_1, A_2, \ldots$. The assumption of countable rather than simply finite additivity has not been convincingly justified physically or philosophically; however, it leads to a much richer mathematical theory. A function that assigns a number P(A) to each set A in the sigma field $\mathcal{F}$ is called a probability measure on $\mathcal{F}$, provided that the following conditions are satisfied.

$P(A) \geq 0$ for every $A \in \mathcal{F}$ (2.4)

$P(\Omega) = 1$ (2.5)

If $A_1, A_2, \ldots$ are disjoint sets in $\mathcal{F}$, then

$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$ (2.6)
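The measurability problem raised at the start of this section can be checked mechanically on a finite sample space. In the sketch below, which is an illustration and not a general construction of sigma fields, a hypothetical `record` function describes the information actually kept about an outcome; an event is decidable from that information exactly when outcomes sharing the same record are all inside it or all outside it. The die example uses the condition $N \geq 3$ simply as a question that the parity of the result cannot answer.

```python
from itertools import product


def is_determined(event, omega, record) -> bool:
    """True iff membership in `event` can be decided from record(w) alone."""
    answer = {}  # record value -> whether outcomes with that record are in the event
    for w in omega:
        inside = w in event
        if answer.setdefault(record(w), inside) != inside:
            return False  # two outcomes with the same record disagree
    return True


def first_three(w: str) -> str:
    """Only the results of the first three tosses are recorded."""
    return w[:3]


def parity(n: int) -> int:
    """Only whether the die shows an even (0) or odd (1) number is recorded."""
    return n % 2


tosses = ["".join(t) for t in product("HT", repeat=5)]
at_least_four_heads = {w for w in tosses if w.count("H") >= 4}
first_toss_head = {w for w in tosses if w[0] == "H"}
print(is_determined(at_least_four_heads, tosses, first_three))  # False
print(is_determined(first_toss_head, tosses, first_three))      # True

die = range(1, 7)
print(is_determined({2, 4, 6}, die, parity))     # True: "N is even"
print(is_determined({3, 4, 5, 6}, die, parity))  # False: "N >= 3"
```

On a finite sample space, the events that pass this test are exactly the unions of outcomes sharing a record, and that collection is closed under complements and unions, so it forms a sigma field.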

We may now give the underlying mathematical framework for probability theory.

BASIC AXIOMS

In a diagram in which events are drawn as regions, the probability of an event is the area of the rectangle that represents the event, and the sample space is the union of all events. This representation can be generalized to more abstract spaces and leads to an axiomatic definition of probability in terms of a measure over a collection $\mathcal{F}$ of subsets. This collection is assumed to contain the empty set and to be closed under complementation and countable union (i.e., $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$ whenever $A_1, A_2, \ldots \in \mathcal{F}$).

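Theorem 2.1 Let S denote the sample space. A set function $P(\cdot)$ defined on $\mathcal{F}$ is a probability function if:

1. for any event A in $\mathcal{F}$, $P(A) \geq 0$;
2. $P(S) = 1$;
3. if $A_1, A_2, \ldots$ are events in $\mathcal{F}$ such that $A_i \cap A_j = \emptyset$ for all $i \neq j$, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.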
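For a finite sample space, the three conditions of Theorem 2.1 can be verified directly. The sketch below assumes a fair die with every outcome equally likely and takes $\mathcal{F}$ to be all subsets of S; because the space is finite, condition 3 reduces to additivity over pairs of disjoint events (longer finite unions then follow by induction).

```python
from fractions import Fraction
from itertools import chain, combinations

S = frozenset(range(1, 7))  # one toss of a die; fairness is an assumption


def P(event) -> Fraction:
    """Uniform probability function on S: |A| / |S|."""
    return Fraction(len(event), len(S))


def all_subsets(s):
    """The power set of s, used here as the event class F."""
    items = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


F = all_subsets(S)

assert all(P(A) >= 0 for A in F)        # condition 1
assert P(S) == 1                        # condition 2
assert all(P(A | B) == P(A) + P(B)      # condition 3, finite case
           for A in F for B in F if A.isdisjoint(B))
print(len(F), P(frozenset({2, 4, 6})))  # 64 1/2
```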

From these axioms, the following elementary properties can be derived.

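Properties 1 Let $P(\cdot)$ be a probability function defined over the sample space S. Then $P(\cdot)$ satisfies the following properties:

1. $P(\emptyset) = 0$;
2. $P(\cdot)$ is finitely additive: if $A_1, A_2, \ldots, A_n$ are events in $\mathcal{F}$ such that $A_i \cap A_j = \emptyset$ for all $i \neq j$, then

$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$ (2.7)

If these events form a partition of S, i.e. they are such that $\bigcup_{i=1}^{n} A_i = S$, then $\sum_{i=1}^{n} P(A_i) = 1$;
3. $P(A \cup \bar{A}) = P(A) + P(\bar{A}) = 1$, so that $P(A) \leq 1$ for any A in $\mathcal{F}$;
4. if $A \subseteq B$, then $P(A) \leq P(B)$.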

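Axiom 3 in Theorem 2.1 is known as countable additivity, and it is rejected by a school of probabilists who replace countable additivity by finite additivity. Consider now two events A and B that have outcomes in common. If we computed $P(A \cup B)$ as $P(A) + P(B)$, we could obtain a value that exceeds 1. The error here is that, in computing $P(A \cup B)$ as $P(A) + P(B)$, the event $A \cap B$ is counted twice. Indeed, we can decompose A into $(A \cap \bar{B}) \cup (A \cap B)$ and similarly B into $(\bar{A} \cap B) \cup (A \cap B)$. Since the intersection $(A \cap \bar{B}) \cap (A \cap B) = \emptyset$, the events $A \cap \bar{B}$ and $A \cap B$ are exclusive, and it follows from item 3 in Theorem 2.1 that $P(A) = P(A \cap \bar{B}) + P(A \cap B)$ and similarly $P(B) = P(\bar{A} \cap B) + P(A \cap B)$. The event $A \cup B$ is given by $(A \cap \bar{B}) \cup (A \cap B) \cup (\bar{A} \cap B)$, and the three events are exclusive. Thus, from property (2.7) we have

$P(A \cup B) = P(A \cap \bar{B}) + P(A \cap B) + P(\bar{A} \cap B) = [P(A) - P(A \cap B)] + P(A \cap B) + [P(B) - P(A \cap B)] = P(A) + P(B) - P(A \cap B)$.

The rule derived in this example holds in general: for any two events A and B in $\mathcal{F}$,

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.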
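The double-counting error and its correction can be checked numerically. This sketch assumes a fair die and picks two overlapping events for illustration: A (an even number) and B (a number of at least 3).

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # fair die: all six outcomes equally likely (assumed)


def P(event) -> Fraction:
    return Fraction(len(event), len(S))


A = frozenset({2, 4, 6})     # even number
B = frozenset({3, 4, 5, 6})  # at least 3

naive = P(A) + P(B)                     # counts A & B = {4, 6} twice
addition_rule = P(A) + P(B) - P(A & B)

# A | B split into the three exclusive pieces: A but not B, A and B, B but not A
pieces = [A - B, A & B, B - A]
assert sum(P(p) for p in pieces) == P(A | B)

print(naive, addition_rule, P(A | B))  # 7/6 5/6 5/6
```

The naive sum exceeds 1, while the addition rule agrees with the probability of $A \cup B$ computed directly.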

Sample Space -

A probabilistic (or statistical) experiment has the following characteristics: (a) the set of all possible outcomes of the experiment can be described; (b) the outcome of the experiment cannot be predicted with certainty prior to the performance of the experiment. The set of all possible outcomes (or sample points) of the experiment is called the sample space and is denoted by S. For a given experiment it may be possible to define several sample spaces. Example For the experiment of tossing a coin three times, we could define (a) $S = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$, each outcome being an ordered sequence of results; or (b) $S = \{0, 1, 2, 3\}$, each outcome being a possible value for the number of heads obtained. If S consists of a list of outcomes (finite or infinite in number), S is a discrete sample space. Examples (i) Tossing a die: $S = \{1, 2, 3, 4, 5, 6\}$. (ii) Tossing a coin until the first head appears: $S = \{H, TH, TTH, TTTH, \ldots\}$. Otherwise S is an uncountable sample space. In particular, if S is a subset of a Euclidean space (e.g. the real line or the plane), S is a continuous sample space. Example Lifetime of an electronic device: $S = \{t : t \geq 0\}$.
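The two sample spaces for three tosses of a coin can be generated and compared in a short Python sketch (an illustration, not part of the original text). Each point of space (b) lumps together several points of space (a); if the coin is assumed fair, this also shows that the points of (b) are not equally likely.

```python
from collections import Counter
from itertools import product

# (a) ordered sequences of results: 8 outcomes
space_a = ["".join(t) for t in product("HT", repeat=3)]

# (b) possible values of the number of heads: 4 outcomes
space_b = sorted({w.count("H") for w in space_a})

# How many sequences in (a) map to each point of (b)
lumped = sorted(Counter(w.count("H") for w in space_a).items())

print(len(space_a), space_b)  # 8 [0, 1, 2, 3]
print(lumped)                 # [(0, 1), (1, 3), (2, 3), (3, 1)]
```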

Event Space -

Events - A specified collection of outcomes in S is called an event: i.e., any subset of S (including S itself) is an event. When the experiment is performed, an event A occurs if the outcome is a member of A. Example In tossing a die once, let the event A be the occurrence of an even number: i.e., $A = \{2, 4, 6\}$. If a 2 or 4 or 6 is obtained when the die is tossed, event A occurs. If an event contains only one outcome, it is called an elementary event. If an event contains no outcomes, it is called the impossible or null event and is denoted by $\emptyset$.

Combination of events - Since events are sets, they may be combined using the notation of set theory; Venn diagrams are useful for exhibiting definitions and results, and you should draw such a diagram for each operation and identity introduced below. In the following, A, B, C, $A_1, \ldots, A_n$ are events in the event space $\mathcal{F}$ (discussed below), and are therefore subsets of the sample space S. The union of A and B, denoted by $A \cup B$, is the event “either A or B, or both”. The intersection of A and B, denoted by $A \cap B$, is the event “both A and B”. The union and intersection operations are commutative, i.e.

$A \cup B = B \cup A, \quad A \cap B = B \cap A$ (2.8)

associative, i.e.

$A \cup (B \cup C) = (A \cup B) \cup C, \quad A \cap (B \cap C) = (A \cap B) \cap C$ (2.9)

and distributive:

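$A \cap (B \cup C) = (A \cap B) \cup (A \cap C), \quad A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ (2.10)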

If A is a subset of B, denoted by $A \subseteq B$, then $A \cup B = B$ and $A \cap B = A$.

The difference of A and B, denoted by $A \setminus B$, is the event “A but not B”.

The complement of A, denoted by $\bar{A}$, is the event “not A”. The complement operation has the properties:

$\bar{\bar{A}} = A, \quad \bar{S} = \emptyset, \quad \bar{\emptyset} = S$ (2.11)

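and

$A \cup \bar{A} = S, \quad A \cap \bar{A} = \emptyset$ (2.12)

Two events A and B are termed mutually exclusive if $A \cap B = \emptyset$. Two events A and B are termed exhaustive if $A \cup B = S$. The above results may be generalized to combinations of n events: thus $\bigcup_{i=1}^{n} A_i$ is the event “at least one of $A_1, \ldots, A_n$”, and $\bigcap_{i=1}^{n} A_i$ is the event “all of $A_1, \ldots, A_n$”.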

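$B \cap \left(\bigcup_{i=1}^{n} A_i\right) = \bigcup_{i=1}^{n} (B \cap A_i)$ (2.13)

$B \cup \left(\bigcap_{i=1}^{n} A_i\right) = \bigcap_{i=1}^{n} (B \cup A_i)$ (2.14)

$\overline{\bigcup_{i=1}^{n} A_i} = \bigcap_{i=1}^{n} \bar{A_i}$ (2.15)

$\overline{\bigcap_{i=1}^{n} A_i} = \bigcup_{i=1}^{n} \bar{A_i}$ (2.16)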

(The last two results are known as De Morgan's laws; see e.g. Ross for proofs.)
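Python's built-in set operators mirror the event operations defined above, so the identities (2.8) to (2.16) can be spot-checked on a concrete example. The sketch assumes the sample space of one die toss and three arbitrarily chosen events; passing these checks illustrates the laws but does not prove them.

```python
S = frozenset(range(1, 7))  # sample space: one toss of a die
A, B, C = frozenset({2, 4, 6}), frozenset({4, 5, 6}), frozenset({1, 2})


def complement(E: frozenset) -> frozenset:
    """The event 'not E' relative to the sample space S."""
    return S - E


# Union 'A or B, or both', intersection 'both A and B', difference 'A but not B'
print(sorted(A | B), sorted(A & B), sorted(A - B))  # [2, 4, 5, 6] [4, 6] [2]

# Commutative, associative and distributive laws
assert A | B == B | A and A & B == B & A
assert A | (B | C) == (A | B) | C and A & (B & C) == (A & B) & C
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# Complement properties and De Morgan's laws for n events
events = [A, B, C]
assert complement(complement(A)) == A and A | complement(A) == S
union_all = frozenset().union(*events)
inter_all = S.intersection(*events)
assert complement(union_all) == S.intersection(*(complement(E) for E in events))
assert complement(inter_all) == frozenset().union(*(complement(E) for E in events))
```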

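The events $A_1, \ldots, A_n$ are termed mutually exclusive if $A_i \cap A_j = \emptyset$ for all $i \neq j$. The events $A_1, \ldots, A_n$ are termed exhaustive if $\bigcup_{i=1}^{n} A_i = S$. If the events $A_1, \ldots, A_n$ are both mutually exclusive and exhaustive, they are called a partition of S.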

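Since events are combined by these set operations, it is natural to require that the collection of events be closed under the operations when they are applied to its members. This suggests the concept of event space. A collection $\mathcal{F}$ of subsets of the sample space is called an event space (or $\sigma$-field) if (a) the certain event S and the impossible event $\emptyset$ belong to $\mathcal{F}$; (b) if $A \in \mathcal{F}$, then $\bar{A} \in \mathcal{F}$; (c) if $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$, i.e. $\mathcal{F}$ is closed under the operation of taking countable unions. It is readily shown that, if $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcap_{i=1}^{\infty} A_i \in \mathcal{F}$. For, invoking properties (b) and (c) of $\mathcal{F}$, $\bigcup_{i=1}^{\infty} \bar{A_i} \in \mathcal{F}$; by De Morgan's laws, $\bigcap_{i=1}^{\infty} A_i = \overline{\bigcup_{i=1}^{\infty} \bar{A_i}}$, and the result follows from property (b). For a finite sample space, we normally use the collection of all subsets of S (the power set of S) as the event space. For $S = \mathbb{R}$ (or a subset of the real line), the usual event space is the smallest event space that contains all one-point sets and all intervals (the Borel sets).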
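Conditions (a) to (c) can be checked by brute force for a finite collection of subsets. In this hypothetical helper, closure under countable unions is tested through pairwise unions, which suffices only because a finite collection contains finitely many distinct sets.

```python
from itertools import chain, combinations


def power_set(s):
    items = sorted(s)
    return {frozenset(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))}


def is_event_space(F, S) -> bool:
    """Check conditions (a)-(c) for a finite collection F of subsets of S."""
    S = frozenset(S)
    if S not in F or frozenset() not in F:        # (a) certain and impossible events
        return False
    if any(S - A not in F for A in F):            # (b) closed under complements
        return False
    return all(A | B in F for A in F for B in F)  # (c) unions, finite case


S = {1, 2, 3}
print(is_event_space(power_set(S), S))  # True
print(is_event_space({frozenset(), frozenset({1}), frozenset({2, 3}), frozenset(S)}, S))  # True
print(is_event_space({frozenset(), frozenset({1}), frozenset(S)}, S))  # False
```

The last collection fails because the complement of {1}, namely {2, 3}, is missing.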

CONDITIONAL PROBABILITY

The probabilities considered so far are unconditional probabilities. In some situations, however, we may be interested in the probability of an event given the occurrence of some other event. For instance, the probability of R: “Tomorrow, January 16th, it will rain in Amherst” would change if we happened to know that tomorrow is a cloudy day. Formally, if we denote by C the event “Tomorrow, 16th of January, will be cloudy”, assuming the occurrence of C is equivalent to restricting our sample space, because other events, such as a sunny day, are ruled out. We thus need to recompute the probability of R by taking into account this new piece of information. This is formally done by considering the conditional probability of R given that C occurs, denoted by $P(R \mid C)$. Consider the events A and B. If we limit the scenario of possible events to A, the occurrence of B would be restricted to $A \cap B$. If we knew that A occurs with certainty, we would then deduce $P(B \mid A) = P(A \cap B)$. However, since P(A) may be less than 1, we can only state that $P(B \mid A) = k\,P(A \cap B)$, where k is a proportionality constant that accounts for the uncertainty in the occurrence of A. Clearly, we have $P(A \mid A) = 1$ and also $P(A \mid A) = k\,P(A \cap A) = k\,P(A)$. From this, we deduce that $k = 1/P(A)$, and the conditional probability is thus defined as follows.

Definition 2.1 (Conditional Probability) Let A and B be events in $\mathcal{F}$, and suppose that $P(A) > 0$. The conditional probability of B given A is:

$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$ (2.17)

To emphasize that $P(A)$ is unconditional, $P(A)$ is called a marginal probability.

Example 2.1 (Conditional Probability) Consider choosing a card from a well-shuffled standard deck of 52 playing cards. The probability that the first card drawn is an ace is clearly 4/52. Suppose that, after the first draw, the card is not returned to the deck. What is the probability that the second card is an ace, given that the first card is an ace? Let A be the event that the first card is an ace, and let B be the event that the second card is an ace. The probability of $A \cap B$ is

$P(A \cap B) = \frac{4 \cdot 3}{52 \cdot 51} = \frac{12}{2652} = \frac{1}{221}$

and $P(A) = 4/52 = 1/13$. On using (2.17) we have

$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{1/221}{1/13} = \frac{13}{221} = \frac{3}{51}$.

Indeed, there are three aces left in a deck of 51 cards.
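Example 2.1 can also be checked by brute-force enumeration of every ordered pair of distinct cards, which makes the equally-likely assumption explicit. The card labels below are illustrative; only the number of aces matters.

```python
from fractions import Fraction
from itertools import permutations

deck = ["ace"] * 4 + ["other"] * 48       # 52 cards, 4 of them aces
draws = list(permutations(range(52), 2))  # 52 * 51 equally likely ordered draws


def P(event) -> Fraction:
    """Probability of an event given as a predicate on (first card, second card)."""
    hits = sum(1 for i, j in draws if event(deck[i], deck[j]))
    return Fraction(hits, len(draws))


p_A = P(lambda first, second: first == "ace")                       # P(A)
p_AB = P(lambda first, second: first == "ace" and second == "ace")  # P(A and B)
p_B_given_A = p_AB / p_A                                            # (2.17)

print(p_A, p_AB, p_B_given_A)  # 1/13 1/221 1/17
```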

From the definition of conditional probability in (2.17), we derive the probability of the intersection of two events, called their joint probability, in terms of conditional and marginal probabilities:

$P(A \cap B) = P(B \mid A)\,P(A)$ (2.18)

This rule can be applied to a larger number of events and produces the multiplication rule or factorization rule.

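Definition 2.2 (Multiplication Rule) The joint probability of a set of events $A_1, \ldots, A_n$ can be expressed as

$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1})$.

Consider again the events A and B. The events A and $\bar{A}$ form a partition of S, so that we can decompose B into the union of the two exclusive events $A \cap B$ and $\bar{A} \cap B$. Thus, if we use (2.18) and the third axiom of probability, we have:

$P(B) = P(A \cap B) + P(\bar{A} \cap B) = P(B \mid A)\,P(A) + P(B \mid \bar{A})\,P(\bar{A})$ (2.19)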


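This result, known as the Total Probability Theorem, expresses the marginal probability $P(B)$ as a weighted average of the conditional probabilities $P(B \mid A)$ and $P(B \mid \bar{A})$, with weights given by $P(A)$ and $P(\bar{A})$. The importance of the Total Probability Theorem is that, sometimes, expressing conditional probabilities can be easier than expressing marginal probabilities, and (2.19) can be used to “break down” an event into more specific events about which more precise knowledge is available. Suppose, as an example, that B is the event that the result of a test to diagnose the presence of a disease A is positive. Quantifying the incidence of false positives $P(B \mid \bar{A})$ and false negatives $P(\bar{B} \mid A)$ can be easier than quantifying the marginal probability of B. If, further, the incidence rate $P(A)$ of the disease is known, then (2.19) can be used to derive $P(B)$, since $P(B \mid A) = 1 - P(\bar{B} \mid A)$. The multiplication rule and the Total Probability Theorem can be extended to conditional probabilities. So,

$P(A_1 \cap A_2 \mid C) = P(A_1 \mid C)\,P(A_2 \mid A_1 \cap C)$

and

$P(B \mid C) = P(B \mid A \cap C)\,P(A \mid C) + P(B \mid \bar{A} \cap C)\,P(\bar{A} \mid C)$.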
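The multiplication rule and the Total Probability Theorem (2.19) reduce to a few lines of arithmetic. In the sketch below the four-aces calculation follows directly from drawing without replacement, while the diagnostic-test rates are invented numbers chosen only for illustration, not data from any study; the function names are hypothetical.

```python
from fractions import Fraction
from math import prod


def multiplication_rule(factors):
    """P(A1 & ... & An) = P(A1) * P(A2 | A1) * ... * P(An | A1 & ... & An-1)."""
    return prod(factors, start=Fraction(1))


def total_probability(p_a, p_b_given_a, p_b_given_not_a):
    """(2.19): P(B) = P(B | A) P(A) + P(B | not A) P(not A)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)


# Four aces in a row, drawing without replacement from 52 cards
four_aces = multiplication_rule(
    [Fraction(4, 52), Fraction(3, 51), Fraction(2, 50), Fraction(1, 49)])
print(four_aces)  # 1/270725

# Hypothetical test for a disease A; B = "the test is positive"
prevalence = Fraction(1, 100)      # P(A), incidence rate (made-up value)
false_negative = Fraction(5, 100)  # P(not B | A) (made-up value)
false_positive = Fraction(2, 100)  # P(B | not A) (made-up value)
p_positive = total_probability(prevalence, 1 - false_negative, false_positive)
print(p_positive)  # 293/10000
```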


Corresponding Author Rajkumar*


E-Mail – ahujarajkumar22@gmail.com