Below are notes that I took while going through Bite Size Bayes. I kinda gave up in/after chapter 7, as the theory wasn't explained clearly enough for me to follow what's happening in the code and graphs. I've since been supplementing this by reading the Bayesian statistics chapter of Statistical Thinking for the 21st Century.

Ch 3

Unnormalized probabilities (Unnorms) = the prior × likelihood products, before they're divided by the total probability

The Unnorms, summed across all hypotheses considered, add up to P(D). I suspect they add up this way because we're assuming that all possible cases (hypotheses) are accounted for and don't overlap, so the ways the data can arise under each hypothesis can be summed to give every way the data can arise.
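A minimal sketch of that bookkeeping, to make the "Unnorms sum to P(D)" idea concrete. The two hypotheses and their numbers are illustrative assumptions of mine, not values taken from these notes, and exact fractions are used only to keep the arithmetic easy to read.

```python
from fractions import Fraction

# Illustrative numbers (an assumption): two equally likely hypotheses
# that assign different likelihoods to the observed data D.
priors = {"H1": Fraction(1, 2), "H2": Fraction(1, 2)}
likelihoods = {"H1": Fraction(3, 4), "H2": Fraction(1, 2)}

# Unnormalized probability of each hypothesis: P(H) * P(D | H)
unnorms = {h: priors[h] * likelihoods[h] for h in priors}

# Summing the unnorms over all hypotheses gives the total probability P(D)
prob_data = sum(unnorms.values())

# Dividing each unnorm by P(D) yields posteriors that sum to 1
posteriors = {h: unnorms[h] / prob_data for h in unnorms}

print(prob_data)   # 5/8
print(posteriors)
```

The normalization step only works as a probability if the hypotheses are both mutually exclusive and collectively exhaustive; otherwise the sum would not be P(D).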

Ch 4

At this point your intuition might tell you that it is the weighted sum of the conditional probabilities:

Nope, it did not.

we can describe [it] in terms of logical operators like this:

The outcome is 1 if you choose the 4-sided die **and** roll 1 **or** you roll the 6-sided die **and** roll 1.

Intriguing. I *had* intuitively considered this, but not to this degree of specificity: I'd considered it as if hypothesis 1 is true [while rolling a 1, the data] **or** hypothesis 2 is true [while rolling a 1, the data], which I suspect would indeed be formulated as above!

However, I had not realized that there's a distinction between hypotheses being "mutually exclusive, which means only one of them can be true," and being "Collectively exhaustive," where "at least one of them must be true."

Exercise: Suppose you have a 4-sided, 6-sided, and 8-sided die. You choose one at random and roll it, what is the probability of getting a 1?

Do you expect it to be higher or lower than in the previous example [which had only the first two dice]?

I expected it to be lower. At first I thought I'd got this wrong, because the equation for total probability (which includes disjunctions, but the example is a case of mutual exclusivity so we don't need to worry about this much) looks like it can only increase with additional hypotheses, since probabilities are never negative. That only holds if the existing terms stay fixed, though. Choosing among three dice at random shrinks each prior from 1/2 to 1/3, so the total is (1/3)(1/4 + 1/6 + 1/8) = 13/72 ≈ 0.18, which is lower than the two-dice value of (1/2)(1/4 + 1/6) = 5/24 ≈ 0.21.
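To check the arithmetic, here is a minimal sketch of the total-probability sum for both setups. It assumes each die is equally likely to be chosen (my reading of "choose one at random"), and `prob_of_one` is a hypothetical helper name, not something from the book.

```python
from fractions import Fraction

def prob_of_one(sides):
    """P(rolling a 1) when one die is picked uniformly at random from `sides`."""
    prior = Fraction(1, len(sides))  # assumption: every die is equally likely
    # Law of total probability: sum of P(die) * P(1 | die) over all dice
    return sum(prior * Fraction(1, n) for n in sides)

two_dice = prob_of_one([4, 6])       # (1/2)(1/4 + 1/6) = 5/24
three_dice = prob_of_one([4, 6, 8])  # (1/3)(1/4 + 1/6 + 1/8) = 13/72

print(two_dice, float(two_dice))
print(three_dice, float(three_dice))
print(three_dice < two_dice)
```

Under that assumption, adding the 8-sided die contributes a third term but also shrinks every prior from 1/2 to 1/3, and the second effect wins: 13/72 is smaller than 5/24 (which is 15/72).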

Regarding the section "Prediction and inference," I do not like how it's presented. I need more explanation of how to restructure the problems, so that I can see how the stated solutions fit the new, different scenario.

Ch 5

the positive test is evidence in favor of condition. But the prior is small and the evidence is not strong enough to overcome it; despite the positive test, the probability that Jimmy has the condition is only about 16%.

How do we determine when evidence is "strong enough"? [I'm still not sure.]
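One way to make "strong enough" concrete is the odds form of Bayes's theorem: posterior odds = prior odds × likelihood ratio. The sketch below assumes a 1% base rate, 95% sensitivity, and a 5% false positive rate. Those numbers are my assumption (they reproduce roughly 16%), and `posterior_from_odds` is a hypothetical helper name.

```python
def posterior_from_odds(prior, sensitivity, false_positive_rate):
    """Posterior P(condition | positive test), computed via odds."""
    prior_odds = prior / (1 - prior)
    # Likelihood ratio: how much more likely a positive test is
    # when the condition is present than when it is absent
    likelihood_ratio = sensitivity / false_positive_rate
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Assumed numbers: 1% base rate, 95% sensitivity, 5% false positives
weak = posterior_from_odds(0.01, 0.95, 0.05)    # likelihood ratio of 19
# With prior odds of 1/99, a likelihood ratio of 99 reaches even odds
even = posterior_from_odds(0.01, 0.99, 0.01)    # likelihood ratio of 99

print(round(weak, 3))
print(round(even, 3))
```

Read this way, the evidence overcomes the prior (posterior above 50%) only when the likelihood ratio exceeds 1 / prior odds. With prior odds of 1 to 99, a ratio of 19 is not enough, while a ratio above 99 would be.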

Regarding "The Elvis problem," I again want a better explanation of how to structure the problem appropriately (how to interpret it and then formulate it). The notebook that explores the Elvis problem is nice, but it doesn't quite address my concerns.


I did not understand this chapter's treatment of PMFs. I'll try to read it more, but I think it needs more explanation of what a PMF *is*.

Oh, now I see. As stated, "A PMF is a set of possible outcomes and their corresponding probabilities." The author describes the code that represents the PMF as "a function that takes a sequence of outcomes, xs, and a sequence of probabilities, ps, and returns a Pandas Series that represents a PMF." The "sequence of outcomes" is the "set of possible outcomes" mentioned before, and the "sequence of probabilities" is the outcomes' "corresponding probabilities". When the function bayes_update is later defined and used, it modifies the probabilities in the PMF but leaves the outcomes unchanged.
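A minimal sketch of how I read that description, assuming pandas is installed. The function bodies are my reconstruction from the quoted sentences rather than the book's exact code, and `make_pmf` is a name I've assumed. The dice numbers reuse the 4-sided and 6-sided example.

```python
import pandas as pd

def make_pmf(xs, ps):
    """Represent a PMF as a Series: outcomes in the index, probabilities as values."""
    return pd.Series(ps, index=xs, dtype=float)

def bayes_update(pmf, likelihood):
    """Multiply by the likelihoods and renormalize in place; return P(D)."""
    pmf *= likelihood          # unnormalized posteriors
    prob_data = pmf.sum()      # total probability of the data
    pmf /= prob_data           # normalize so the values sum to 1
    return prob_data

# Prior: each die equally likely. Data: a roll of 1.
pmf = make_pmf(["4-sided", "6-sided"], [0.5, 0.5])
prob_data = bayes_update(pmf, [1 / 4, 1 / 6])

print(pmf)         # the index (outcomes) is unchanged, the values are updated
print(prob_data)
```

The index still lists the same two outcomes after the update; only the values change, which matches the "modifies the probabilities but not the outcomes" reading.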

Ch 7

Typo: In "For example, here's the probability that x is less than or equal to 60%", it should be 60, not 60%, since it's dealing with the index values, not their probabilities (which it sums later on).
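A small sketch of the distinction, assuming the PMF is a pandas Series whose index runs over the integers 0 to 100. The uniform probabilities are an illustrative assumption, not the distribution from the chapter.

```python
import numpy as np
import pandas as pd

xs = np.arange(101)                    # index values: 0, 1, ..., 100
pmf = pd.Series(1 / 101, index=xs)     # uniform probabilities (illustrative)

# The comparison is against the index values (x <= 60), not the probabilities;
# the probabilities of the selected rows are what get summed afterwards.
prob_le_60 = pmf[pmf.index <= 60].sum()

print(prob_le_60)
```

Here the selection keeps the 61 rows with index 0 through 60, so the result is 61/101 of the total probability under this uniform assumption.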

I wonder, how related are the concepts of "credible intervals" and "confidence intervals"? [Statsthinking21 makes this comparison.]