I debated leaving this posting alone, but I didn't like any of the answers. Of course, my liking something must be the defining methodology to determine truth!
My difficulty is that the conclusion doesn't follow from the premises in the questions.
My simple questions are:
1) If we consider statistics as a science, instead of a philosophy or even a psychology, isn't it necessary that a statistical prediction (or the outcome of a statistical calculation) be tested in the real world (at least in principle), as physical sciences usually do?
2) If statistical calculations need to be tested (at least in principle), is the method of testing the same as in the physical sciences, namely, through experiments and observations?
My thinking is that if the answers to both questions are yes, then Bayesian probability can be interpreted as frequentist, and ultimately the two are not so different.
The answer to (1) is yes, depending on what you mean by "as physical sciences do." Statistics are often tested by numerical simulation, which is physical in the sense that it is performed with pseudorandom numbers on a real machine. They are also tested through validation in specific studies, whether in the hard or the soft sciences.
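One common form of that numerical testing is a Monte Carlo check of a procedure's operating characteristics. As a sketch (the scenario, sample size, and distribution below are all assumptions chosen for illustration), the following checks whether a nominal 95% normal-theory confidence interval for a mean actually covers the true value close to 95% of the time:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps, z = 30, 2000, 1.96        # z-multiplier for a nominal 95% interval
mu = 5.0                           # true mean, known to the simulation

hits = 0
for _ in range(reps):
    x = rng.normal(loc=mu, scale=2.0, size=n)
    half = z * x.std(ddof=1) / np.sqrt(n)   # half-width of the interval
    if abs(x.mean() - mu) <= half:
        hits += 1

coverage = hits / reps
print(coverage)   # should land near 0.95
```

Because the z-multiplier ignores the extra variability of the estimated standard deviation, the simulation will show coverage slightly below the nominal level, which is exactly the kind of property such a test reveals.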
The answer to (2) may be no, again depending on what you mean by experiment and observation. Many sciences have to rely on observational studies and cannot perform experiments. The public might get angry if meteorologists created a category five hurricane just to see whether a hypothesis about it was false. They might get even angrier if it were done to test the sufficiency of building materials in Miami. Finally, some statistics don't really need to be "tested," because if their assumptions are met in fact and their properties are understood, then a test serves no purpose except to waste resources.
What does not follow is that Bayesian probability can be interpreted as Frequentist probability, or that the two are not different. That is not remotely true unless the sample size approaches infinity. In that limit, their predictions will match: subjective beliefs will converge to objective realities, and bad objective models will vanish, so that the surviving models produce identical predictions.
Consider the cookie story in Keith Winstein's answer at https://stats.stackexchange.com/questions/2272/whats-the-difference-between-a-confidence-interval-and-a-credible-interval
What is important here is that in Winstein's first table, the probabilities sum to one vertically; that is a Frequentist table. In his fourth table, the same data yield probabilities, but they sum to one horizontally.
Consider the case in Winstein's first table and take as the null hypothesis that the jar is of class A. There is a 1% chance of observing a cookie with no chips given that the null is true. Now consider Winstein's fourth table. Bayesian methods have no null hypothesis to condition on; as a result, the Bayesian probability that the jar is of class A, given that the drawn cookie had no chips, is 1.9%.
These are very different probability statements. The former is Pr(data|model), whereas the latter is Pr(model|data); they are not at all the same thing. The Frequentist is, in a sense, trying to minimize the maximum possible damage that could be done, with no prior information, from using a model to make a wrong choice. The Bayesian is trying to minimize the average damage, using all available information, from using a model to make a wrong choice. The Bayesian can do this because the data choose the model, rather than the other way around as for the Frequentist.
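The mechanics of the two conditional probabilities can be shown in a few lines of code. The likelihood table below is made up (the real numbers are in Winstein's answer) and the equal prior over jar types is an assumption; the point is only that the Frequentist quantity conditions on the jar, while the Bayesian quantity conditions on the observed cookie:

```python
import numpy as np

# Hypothetical table: rows = jar types A..D, columns = chip counts 0..3.
# Each ROW sums to 1, so entry [j, c] is Pr(chips = c | jar = j).
jars = ["A", "B", "C", "D"]
lik = np.array([
    [0.01, 0.29, 0.50, 0.20],   # Pr(chips | jar A), assumed numbers
    [0.30, 0.40, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [1.00, 0.00, 0.00, 0.00],
])
prior = np.full(4, 0.25)        # assumed equal prior over jar types

# Frequentist statement: Pr(no-chip cookie | jar A)
p_data_given_A = lik[0, 0]

# Bayesian statement: Pr(jar A | no-chip cookie), by Bayes' rule
posterior = prior * lik[:, 0]
posterior /= posterior.sum()
p_A_given_data = posterior[0]

print(p_data_given_A)                      # 0.01
print(round(float(p_A_given_data), 4))     # quite different from 0.01
```

The two numbers condition on different things, which is why they can disagree so sharply even though they come from the same table.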
Consider how the two would solve the linear regression problem ax+by+cz+d+e=0, where e is an error term, (x,y,z) are random variables, and (a,b,c,d) are unknown constants. The Frequentist would assert that this is the true model and, using Fisher's "no effect" hypothesis, would take a=b=c=0 as the null. If that null is rejected, the estimated parameters would be used. This is nothing like the Bayesian method.
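To make the Frequentist recipe concrete, here is a sketch: the data are simulated (the response variable w, the coefficient values, the noise level, and the sample size are all assumptions, recasting the implicit relation as a regression), and Fisher's no-effect null a=b=c=0 is tested with the standard F statistic comparing the full fit to the intercept-only fit:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: only x truly matters in this simulation.
n = 100
X = rng.normal(size=(n, 3))                    # columns: x, y, z
w = X @ np.array([2.0, 0.0, 0.0]) + 1.0 + rng.normal(size=n)

# Full model (intercept plus x, y, z) versus the intercept-only null.
A_full = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A_full, w, rcond=None)
rss_full = float(np.sum((w - A_full @ beta) ** 2))
rss_null = float(np.sum((w - w.mean()) ** 2))

k, p = 3, A_full.shape[1]                      # restrictions, full-model params
F = ((rss_null - rss_full) / k) / (rss_full / (n - p))

# F_{0.95}(3, 96) is roughly 2.70 (from a standard F table).
reject_null = F > 2.70
print(reject_null)   # True here, so the estimated parameters would be used
```

Note the logic: the model is held fixed as true, and only one all-or-nothing null about its coefficients is tested.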
The Bayesian method would calculate the probability that each of the separate following models is true:
- ax+d+e=0
- by+d+e=0
- cz+d+e=0
- d+e=0
- ax+by+d+e=0
- ax+cz+d+e=0
- by+cz+d+e=0
- ax+by+cz+d+e=0
A probability weight would be attached to each one. Then, depending on the cost or utility function, either one model would be chosen or some or all would be averaged together. This is possible only because a prior distribution over the parameter and model space exists; that prior knowledge is what allows the researcher to let the data dominate the solution. Indeed, if averaging is used, then the researcher truly averages the risk of loss over the entire set of possible outcomes.
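A sketch of that enumeration follows, using a BIC-based approximation to the posterior model probabilities rather than full marginal likelihoods; the simulated data, noise level, and equal model priors are all assumptions made for illustration:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Simulate data where only x truly matters (illustrative choice).
n = 200
X_full = rng.normal(size=(n, 3))               # columns: x, y, z
w = X_full @ np.array([2.0, 0.0, 0.0]) + 1.0 + rng.normal(size=n)

def bic(X, w):
    """OLS fit and BIC for w ~ X (intercept always included)."""
    A = np.column_stack([np.ones(len(w)), X]) if X.size else np.ones((len(w), 1))
    beta, *_ = np.linalg.lstsq(A, w, rcond=None)
    resid = w - A @ beta
    sigma2 = float(resid @ resid) / len(w)
    return len(w) * np.log(sigma2) + A.shape[1] * np.log(len(w))

# All 2^3 subsets of {x, y, z}; the empty subset is the model d+e=0.
models = list(itertools.chain.from_iterable(
    itertools.combinations(range(3), r) for r in range(4)))
bics = np.array([bic(X_full[:, list(m)], w) for m in models])

# Under equal priors, BIC weights approximate posterior model probabilities.
weights = np.exp(-0.5 * (bics - bics.min()))
weights /= weights.sum()

for m, prob in zip(models, weights):
    print(m, round(float(prob), 3))
```

In this simulation nearly all of the weight lands on models containing x, and one could either pick the highest-weight model or average predictions across all eight with these weights.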
There is another set of issues that makes them different. Bayesian probabilities are coherent and Frequentist probabilities are not: fair gambles can be placed on Bayesian predictions, while Frequentist probabilities and statistics cannot be gambled on, because a crafty opponent can set a bookie up for a sure loss in all possible outcomes (a Dutch book). Conversely, a Bayesian cannot discuss statistical power, and there is no guaranteed coverage against false positives, because the Bayesian posterior Pr(model|data) is driven by only and exactly this one sample.
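The coherence point has a simple arithmetic illustration, a toy version of de Finetti's Dutch book argument (the quoted prices are made up): if a bookie's prices for an event and its complement sum to more than one, an opponent can lock in a profit no matter what happens.

```python
# Incoherent prices: they sum past 1, quoted as "fair" prices per $1 payout.
price_A, price_not_A = 0.60, 0.60

# The opponent sells the bookie one $1 ticket on A and one on not-A.
stake_collected = price_A + price_not_A   # collects 1.20 up front
payout_either_way = 1.0                   # exactly one ticket pays out

profit = stake_collected - payout_either_way
print(profit)   # positive regardless of whether A happens
```

Coherent prices, by contrast, must behave like a probability measure, which is exactly the Bayesian requirement.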
Power and guarantees against false positives come from seeing the sample with respect to the sample space, which implies infinite repetition. That view is only credible under Pr(data|model), where the model is held fixed and the sample inherits its properties from this potentially infinite set of possible outcomes.
Because Bayesian methods work in the parameter space, things such as testing logical assertions or gambling on the result of an outcome are reasonable things to do. Because Frequentist methods work in the sample space, things such as guaranteeing a minimum level of protection against false positives and a guaranteed level of power are possible.
Bayesian and Frequentist methods use the same data to answer very different types of questions. Bayesian probabilities cannot be interpreted as frequencies. Conversely, classical frequencies cannot be interpreted as Bayesian probabilities, as they are worst-case distributions and not actual distributions.