a) Entropy is not about probabilities (although probabilities may be used to calculate it). In simple terms, entropy is a quantity that measures the potential for disorder; it is not a direct measure of disorder. The fact that low entropy implies high predictability comes from the reduction of the potential for disorder, not from the reduction of disorder itself. At S=0 there is no potential for disorder, so there is no disorder.
b) Entropy is a macrostatic quantity. Memorize that. That is, it is a descriptor of facts of perception: ideas, sensations. For example, temperature is a macrostatic quantity: it is a feeling which (thanks, 0th Law; sorry you weren't defined first) can be assessed as a physical quantity.
c) While macrostatic facts relate to ideas, microstatic facts relate to physical phenomena. For example, microstatically, temperature is proportional to the average kinetic energy of the molecules in a gas container.
d) Statistical entropy ($S = k \ln \Omega$, where $\Omega$ is the number of microstates) describes the information carried by a macroscopic state:
$S = \log_2(64) = 6$
That is, 6 bits are necessary to label 64 states (logarithm base 2). So, in this example, the entropy of a system that can be in any of 64 states is 6 bits.
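For instance, here is a minimal Python sketch of that calculation (the function name is my own, purely for illustration):

```python
import math

def entropy_bits(num_states: int) -> float:
    """Base-2 entropy of a system whose microstates are all equally likely."""
    return math.log2(num_states)

print(entropy_bits(64))  # 6.0 -> six bits suffice to label each of the 64 microstates
```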
e) Contrary to entropy, information is a microstatic quantity. Following the previous example, 111000 is the information that corresponds to a particular microstate of the system. Although this information is effectively carried in 6 bits, that is not necessarily the same as the entropy of the system. For example, the Gibbs entropy can have a different value for the same 6 bits of information, as the sketch below shows.
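To make that concrete, here is a small sketch of the Gibbs entropy $-\sum_i p_i \log_2 p_i$; the skewed distribution is a hypothetical choice of mine, just to show that the same 6-bit labels can coexist with an entropy below 6 bits:

```python
import math

def gibbs_entropy_bits(probs):
    """Gibbs/Shannon entropy -sum(p * log2(p)) in bits, skipping zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [1 / 64] * 64           # all 64 microstates equally likely
skewed = [0.5] + [0.5 / 63] * 63  # one microstate dominates

print(gibbs_entropy_bits(uniform))  # 6.0 bits
print(gibbs_entropy_bits(skewed))   # ~3.99 bits, yet each label is still 6 bits long
```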
f) Thermodynamic entropy is equivalent to statistical entropy: it is a macrostatic quantity that measures the potential for disorder of a particular microstate. Perhaps the best interpretation here is to say that thermodynamic entropy is a measure of energetic order.
g) The big difference between thermodynamic and statistical entropy is that the latter allows measuring each microstate, at the cost of always addressing systems as discrete entities.
In addition to knowing that the entropy of a 6-bit message is 6, each message (the information) can also be obtained: 111000. Statistical entropy is mostly used to analyze fixed-size messages (though it also allows addressing messages of different sizes). That is, statistical entropy is mostly used not to measure disorder as such, but to assess the performance of a process at different values of disorder (see the sketch below).
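As one illustration of that usage (my own example, with an arbitrary binary source), the entropy gives the minimum average bits per symbol an ideal coder can achieve at each level of disorder:

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary source emitting 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Minimum bits per symbol an ideal coder needs, at different disorder levels
for p in (0.5, 0.9, 0.99):
    print(f"p = {p}: {binary_entropy(p):.3f} bits/symbol")
```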
But for the former, thermodynamic entropy, microstates cannot be measured. How would we know the actual configuration of X molecules from an entropy value of S=0.nnnnn...? In addition, the absolute entropy of a substance cannot be measured directly; the only measurable value is the change, $dS = \delta Q_{\mathrm{rev}}/T$. So, thermodynamic entropy is mostly used to measure disorder as such.
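To see what "change only" looks like in practice, here is a sketch assuming reversible heating at constant heat capacity, where $\Delta S = \int \delta Q/T = m\,c\,\ln(T_2/T_1)$; the numbers (1 kg of water) are illustrative choices of mine:

```python
import math

def delta_S(mass_kg: float, c: float, T1: float, T2: float) -> float:
    """Entropy change of reversible heating at constant heat capacity:
    integrating dQ/T with dQ = m*c*dT gives m*c*ln(T2/T1)."""
    return mass_kg * c * math.log(T2 / T1)

# Illustrative numbers: 1 kg of water (c ~ 4186 J/(kg K)), heated 293.15 K -> 373.15 K
print(delta_S(1.0, 4186.0, 293.15, 373.15))  # ~1010 J/K (a change, never an absolute S)
```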
Don't forget that $S = 0$ at zero temperature is a convention (the Third Law), reflecting the experimental fact that matter tends toward order at 0 K: a perfectly ordered state has $\Omega = 1$, so $S = k \ln 1 = 0$.