update monty
pglpm committed Oct 18, 2024
1 parent 958bae6 commit 7c3b55c
Showing 3 changed files with 5 additions and 6 deletions.
3 changes: 1 addition & 2 deletions docs/monty.html
@@ -634,7 +634,7 @@ <h2 data-number="10.1" class="anchored" data-anchor-id="sec-monty-motivation"><s
</div>
<div class="callout-body-container callout-body">
<p>As an example of how our intuition can go completely astray in problems involving many data dimensions, consider the following fact.</p>
-<p>Take a one-dimensional Gaussian distribution of probability. You probably know that the probability that a data point is within three standard deviations from the peak is approximately <span style="display:inline-block;"><span class="math inline">\(99.73\%\)</span>.</span> If we take a two-dimensional (symmetric) Gaussian distribution, the probability that a data point (two real numbers) is within three standard deviations from the peak is <span style="display:inline-block;"><span class="math inline">\(98.89\%\)</span>:</span> slightly less than the one-dimensional case. For a three-dimensional Gaussian, the probability the analogous probability is <span style="display:inline-block;"><span class="math inline">\(97.07\%\)</span>:</span> slightly smaller yet.</p>
+<p>Take a one-dimensional Gaussian distribution of probability. You probably know that the probability that a data point is within three standard deviations from the peak is approximately <span style="display:inline-block;"><span class="math inline">\(99.73\%\)</span>.</span> If we take a two-dimensional (symmetric) Gaussian distribution, the probability that a data point (two real numbers) is within three standard deviations from the peak is <span style="display:inline-block;"><span class="math inline">\(98.89\%\)</span>,</span> slightly less than the one-dimensional case. For a three-dimensional Gaussian, the analogous probability is <span style="display:inline-block;"><span class="math inline">\(97.07\%\)</span>,</span> slightly smaller yet.</p>
<p>Now try to answer this question: for a <em>100-dimensional</em> Gaussian, what is the probability that a data point is within three standard deviations from the peak? The answer is <span style="display:inline-block;"><span class="math inline">\(\boldsymbol{(1.83 \cdot 10^{-32})\%}\)</span>.</span> This probability is so small that you would never observe a data point within three standard deviations from the peak, even if you checked one data point every second for the same duration as the present age of the universe – which is “only” around <span style="display:inline-block;"><span class="math inline">\(4\cdot 10^{17}\)</span></span> seconds.</p>
</div>
</div>
@@ -953,7 +953,6 @@ <h2 data-number="10.6" class="anchored" data-anchor-id="sec-monty-solution"><spa
</section>
<section id="sec-monty-remarks" class="level2" data-number="10.7">
<h2 data-number="10.7" class="anchored" data-anchor-id="sec-monty-remarks"><span class="header-section-number">10.7</span> Remarks on the use of Bayes’s theorem</h2>
-<p>The previous calculations may have been somewhat boring. But, again, our purpose was to see with our own eyes that the final result comes from the application of the four fundamental laws of inference to the initial probabilities – and from nothing else.</p>
<p>You notice that at several points our calculations could have taken a different path. For instance, in order to find <span style="display:inline-block;"><span class="math inline">\(\mathrm{P}(\mathsfit{\small car1} \nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{\small host2} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{\small you1} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K})\)</span></span> we applied Bayes’s theorem to swap the sentences <span style="display:inline-block;"><span class="math inline">\(\mathsfit{\small car1}\)</span></span> and <span style="display:inline-block;"><span class="math inline">\(\mathsfit{\small host2}\)</span></span> in their proposal and conditional positions. Couldn’t we have swapped <span style="display:inline-block;"><span class="math inline">\(\mathsfit{\small car1}\)</span></span> and <span style="display:inline-block;"><span class="math inline">\(\mathsfit{\small host2}\land \mathsfit{\small you1}\)</span></span> instead? That is, couldn’t we have made a calculation starting with</p>
<p><span class="math display">\[
\mathrm{P}(\mathsfit{\small car1} \nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{\small host2} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{\small you1} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K})
4 changes: 2 additions & 2 deletions docs/search.json
@@ -592,7 +592,7 @@
"href": "monty.html#sec-monty-motivation",
"title": "10  Monty Hall and related inference problems",
"section": "",
"text": "Suppose you are on a game show and given a choice of three doors. Behind one is a car; behind the others are goats. You pick door No. 1, and the host, who knows what is behind them [and wouldn’t open the door with the car], opens No. 2, which has a goat. He then asks if you want to pick No. 3. Should you switch?\n\n\n\n\n\n\nWe want to be able to implement or encode the procedure algorithmically in an AI agent.\nWe generally cannot ground inferences on intuition. Intuition is shaky ground, and hopeless in data-science problems involving millions of data with thousands of numbers in abstract spaces of thousands of dimensions. To solve such complex problems we need to use a more mechanical procedure, a procedure mathematically guaranteed to be self-consistent. That’s the probability calculus. Intuition is only useful for arriving at a method which we can eventually prove, by mathematical and logical means, to be correct; or for approximately explaining a method that we already know, again by mathematical and logical means, to be correct.\n\n\n\n\n\n\n\n Misleading intuition in high dimensions\n\n\n\nAs an example of our intuition can be completely astray in problems involving many data dimensions, consider the following fact.\nTake a one-dimensional Gaussian distribution of probability. You probably know that the probability that a data point is within three standard deviations from the peak is approximately \\(99.73\\%\\). If we take a two-dimensional (symmetric) Gaussian distribution, the probability that a data point (two real numbers) is within three standard deviations from the peak is \\(98.89\\%\\): slightly less than the one-dimensional case. For a three-dimensional Gaussian, the probability the analogous probability is \\(97.07\\%\\): slightly smaller yet.\nNow try to answer this question: for a 100-dimensional Gaussian, what is the probability that a data point is within three standard deviations from the peak? The answer is \\(\\boldsymbol{(1.83 \\cdot 10^{-32})\\%}\\). This probability is so small that you would never observe a data point within three standard deviations from the peak, even if you checked one data point every second for the same duration as the present age of the universe – which is “only” around \\(4\\cdot 10^{17}\\) seconds.\n\n\n\n\n\n\n\n\n\n\n For the extra curious\n\n\n\nFor further examples of how our intuition leads us astray in high dimensions see\n\nCounterintuitive Properties of High Dimensional Space\nExercise 2.20 (and its solution) in Information Theory, Inference, and Learning Algorithms\n\n\n\n\n\n\n\n\n\n\n Exercise\n\n\n\nExamine what your intuition tells you the answer should be, without spending too much time thinking, just as if you were on the game show. Examine which kind of heuristics your intuition uses. If you already know the solution to this puzzle, try to remember what your intuition told you the first time you faced it. Keep your observations in mind for later on.",
"text": "Suppose you are on a game show and given a choice of three doors. Behind one is a car; behind the others are goats. You pick door No. 1, and the host, who knows what is behind them [and wouldn’t open the door with the car], opens No. 2, which has a goat. He then asks if you want to pick No. 3. Should you switch?\n\n\n\n\n\n\nWe want to be able to implement or encode the procedure algorithmically in an AI agent.\nWe generally cannot ground inferences on intuition. Intuition is shaky ground, and hopeless in data-science problems involving millions of data with thousands of numbers in abstract spaces of thousands of dimensions. To solve such complex problems we need to use a more mechanical procedure, a procedure mathematically guaranteed to be self-consistent. That’s the probability calculus. Intuition is only useful for arriving at a method which we can eventually prove, by mathematical and logical means, to be correct; or for approximately explaining a method that we already know, again by mathematical and logical means, to be correct.\n\n\n\n\n\n\n\n Misleading intuition in high dimensions\n\n\n\nAs an example of our intuition can be completely astray in problems involving many data dimensions, consider the following fact.\nTake a one-dimensional Gaussian distribution of probability. You probably know that the probability that a data point is within three standard deviations from the peak is approximately \\(99.73\\%\\). If we take a two-dimensional (symmetric) Gaussian distribution, the probability that a data point (two real numbers) is within three standard deviations from the peak is \\(98.89\\%\\), slightly less than the one-dimensional case. For a three-dimensional Gaussian, the analogous probability is \\(97.07\\%\\), slightly smaller yet.\nNow try to answer this question: for a 100-dimensional Gaussian, what is the probability that a data point is within three standard deviations from the peak? The answer is \\(\\boldsymbol{(1.83 \\cdot 10^{-32})\\%}\\). This probability is so small that you would never observe a data point within three standard deviations from the peak, even if you checked one data point every second for the same duration as the present age of the universe – which is “only” around \\(4\\cdot 10^{17}\\) seconds.\n\n\n\n\n\n\n\n\n\n\n For the extra curious\n\n\n\nFor further examples of how our intuition leads us astray in high dimensions see\n\nCounterintuitive Properties of High Dimensional Space\nExercise 2.20 (and its solution) in Information Theory, Inference, and Learning Algorithms\n\n\n\n\n\n\n\n\n\n\n Exercise\n\n\n\nExamine what your intuition tells you the answer should be, without spending too much time thinking, just as if you were on the game show. Examine which kind of heuristics your intuition uses. If you already know the solution to this puzzle, try to remember what your intuition told you the first time you faced it. Keep your observations in mind for later on.",
"crumbs": [
"[**Inference I**]{.green}",
"<span class='chapter-number'>10</span>  <span class='chapter-title'>[Monty Hall and related inference problems]{.green}</span>"
@@ -658,7 +658,7 @@
"href": "monty.html#sec-monty-remarks",
"title": "10  Monty Hall and related inference problems",
"section": "10.7 Remarks on the use of Bayes’s theorem",
"text": "10.7 Remarks on the use of Bayes’s theorem\nThe previous calculations may have been somewhat boring. But, again, our purpose was to see with our own eyes that the final result comes from the application of the four fundamental laws of inference to the initial probabilities – and from nothing else.\nYou notice that at several points our calculations could have taken a different path. For instance, in order to find \\(\\mathrm{P}(\\mathsfit{\\small car1} \\nonscript\\:\\vert\\nonscript\\:\\mathopen{} \\mathsfit{\\small host2} \\mathbin{\\mkern-0.5mu,\\mkern-0.5mu}\\mathsfit{\\small you1} \\mathbin{\\mkern-0.5mu,\\mkern-0.5mu}\\mathsfit{K})\\) we applied Bayes’s theorem to swap the sentences \\(\\mathsfit{\\small car1}\\) and \\(\\mathsfit{\\small host2}\\) in their proposal and conditional positions. Couldn’t we have swapped \\(\\mathsfit{\\small car1}\\) and \\(\\mathsfit{\\small host2}\\land \\mathsfit{\\small you1}\\) instead? That is, couldn’t we have made a calculation starting with\n\\[\n\\mathrm{P}(\\mathsfit{\\small car1} \\nonscript\\:\\vert\\nonscript\\:\\mathopen{} \\mathsfit{\\small host2} \\mathbin{\\mkern-0.5mu,\\mkern-0.5mu}\\mathsfit{\\small you1} \\mathbin{\\mkern-0.5mu,\\mkern-0.5mu}\\mathsfit{K})\n=\\frac{\n\\mathrm{P}(\\mathsfit{\\small host2} \\mathbin{\\mkern-0.5mu,\\mkern-0.5mu}\\mathsfit{\\small you1}\\nonscript\\:\\vert\\nonscript\\:\\mathopen{} \\mathsfit{\\small car1} \\mathbin{\\mkern-0.5mu,\\mkern-0.5mu}\\mathsfit{K}) \\cdot\n\\mathrm{P}(\\mathsfit{\\small car1} \\nonscript\\:\\vert\\nonscript\\:\\mathopen{} \\mathsfit{K})\n}{\\dotso} \\enspace ?\n\\]\nafter all, this is also a legitimate application of Bayes’s theorem.\nThe answer is: yes, we could have, and the final result would have been the same. The self-consistency of the probability calculus guarantees that there are no “wrong steps”, as long as every step is an application of one of the four fundamental rules (or of their shortcuts). The worst that can happen is that we take a longer route – but to exactly the same result. In fact it’s possible that there’s a shorter calculation route to arrive at the probabilities that we found in the previous section. But it doesn’t matter, because it would lead to the same result that we found.",
"text": "10.7 Remarks on the use of Bayes’s theorem\nYou notice that at several points our calculations could have taken a different path. For instance, in order to find \\(\\mathrm{P}(\\mathsfit{\\small car1} \\nonscript\\:\\vert\\nonscript\\:\\mathopen{} \\mathsfit{\\small host2} \\mathbin{\\mkern-0.5mu,\\mkern-0.5mu}\\mathsfit{\\small you1} \\mathbin{\\mkern-0.5mu,\\mkern-0.5mu}\\mathsfit{K})\\) we applied Bayes’s theorem to swap the sentences \\(\\mathsfit{\\small car1}\\) and \\(\\mathsfit{\\small host2}\\) in their proposal and conditional positions. Couldn’t we have swapped \\(\\mathsfit{\\small car1}\\) and \\(\\mathsfit{\\small host2}\\land \\mathsfit{\\small you1}\\) instead? That is, couldn’t we have made a calculation starting with\n\\[\n\\mathrm{P}(\\mathsfit{\\small car1} \\nonscript\\:\\vert\\nonscript\\:\\mathopen{} \\mathsfit{\\small host2} \\mathbin{\\mkern-0.5mu,\\mkern-0.5mu}\\mathsfit{\\small you1} \\mathbin{\\mkern-0.5mu,\\mkern-0.5mu}\\mathsfit{K})\n=\\frac{\n\\mathrm{P}(\\mathsfit{\\small host2} \\mathbin{\\mkern-0.5mu,\\mkern-0.5mu}\\mathsfit{\\small you1}\\nonscript\\:\\vert\\nonscript\\:\\mathopen{} \\mathsfit{\\small car1} \\mathbin{\\mkern-0.5mu,\\mkern-0.5mu}\\mathsfit{K}) \\cdot\n\\mathrm{P}(\\mathsfit{\\small car1} \\nonscript\\:\\vert\\nonscript\\:\\mathopen{} \\mathsfit{K})\n}{\\dotso} \\enspace ?\n\\]\nafter all, this is also a legitimate application of Bayes’s theorem.\nThe answer is: yes, we could have, and the final result would have been the same. The self-consistency of the probability calculus guarantees that there are no “wrong steps”, as long as every step is an application of one of the four fundamental rules (or of their shortcuts). The worst that can happen is that we take a longer route – but to exactly the same result. In fact it’s possible that there’s a shorter calculation route to arrive at the probabilities that we found in the previous section. But it doesn’t matter, because it would lead to the same result that we found.",
"crumbs": [
"[**Inference I**]{.green}",
"<span class='chapter-number'>10</span>  <span class='chapter-title'>[Monty Hall and related inference problems]{.green}</span>"
4 changes: 2 additions & 2 deletions monty.qmd
@@ -25,7 +25,7 @@ The web is full of insightful intuitive solutions and of informal probability di
## {{< fa exclamation-triangle >}} Misleading intuition in high dimensions
As an example of how our intuition can go completely astray in problems involving many data dimensions, consider the following fact.

-Take a one-dimensional Gaussian distribution of probability. You probably know that the probability that a data point is within three standard deviations from the peak is approximately $99.73\%$. If we take a two-dimensional (symmetric) Gaussian distribution, the probability that a data point (two real numbers) is within three standard deviations from the peak is $98.89\%$: slightly less than the one-dimensional case. For a three-dimensional Gaussian, the probability the analogous probability is $97.07\%$: slightly smaller yet.
+Take a one-dimensional Gaussian distribution of probability. You probably know that the probability that a data point is within three standard deviations from the peak is approximately $99.73\%$. If we take a two-dimensional (symmetric) Gaussian distribution, the probability that a data point (two real numbers) is within three standard deviations from the peak is $98.89\%$, slightly less than the one-dimensional case. For a three-dimensional Gaussian, the analogous probability is $97.07\%$, slightly smaller yet.

Now try to answer this question: for a *100-dimensional* Gaussian, what is the probability that a data point is within three standard deviations from the peak? The answer is $\boldsymbol{(1.83 \cdot 10^{-32})\%}$. This probability is so small that you would never observe a data point within three standard deviations from the peak, even if you checked one data point every second for the same duration as the present age of the universe -- which is "only" around $4\cdot 10^{17}$ seconds.
:::
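
The probabilities above have a closed form: for a $d$-dimensional symmetric Gaussian, the squared distance from the peak, measured in units of the standard deviation, follows a chi-squared distribution with $d$ degrees of freedom, so the probability of landing within three standard deviations is the chi-squared cumulative distribution function evaluated at $3^2 = 9$. Here is a minimal sketch, assuming Python with SciPy available, that reproduces all four numbers:

```python
from scipy.stats import chi2

# For a d-dimensional symmetric Gaussian, the squared radius in units of
# the standard deviation is chi-squared with d degrees of freedom, so
# P(within 3 standard deviations) = chi2.cdf(3**2, df=d).
for d in (1, 2, 3, 100):
    p = chi2.cdf(9, df=d)
    print(f"d = {d:3d}: {100 * p:.4g}%")  # 99.73%, 98.89%, 97.07%, ~1.8e-32%

# At one data point per second, the mean wait for d = 100 is about 1/p,
# roughly 5e33 seconds -- far beyond the ~4e17-second age of the universe.
```
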
@@ -365,7 +365,7 @@ Note that we found these probabilities, and solved the Monty Hall problem, just

## Remarks on the use of Bayes's theorem {#sec-monty-remarks}

-The previous calculations may have been somewhat boring. But, again, our purpose was to see with our own eyes that the final result comes from the application of the four fundamental laws of inference to the initial probabilities -- and from nothing else.
+<!-- The previous calculations may have been somewhat boring. But, again, our purpose was to see with our own eyes that the final result comes from the application of the four fundamental laws of inference to the initial probabilities -- and from nothing else. -->

You notice that at several points our calculations could have taken a different path. For instance, in order to find $\P(\car{1} \| \host{2} \and \you{1} \and \yH)$ we applied Bayes's theorem to swap the sentences $\car{1}$ and $\host{2}$ in their proposal and conditional positions. Couldn't we have swapped $\car{1}$ and $\host{2}\land \you{1}$ instead? That is, couldn't we have made a calculation starting with
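
Whatever route is taken, the probabilities are fixed by the sample space itself, and a direct enumeration shows what any correct chain of rules must yield. The following minimal sketch in Python, separate from the chapter's pen-and-paper derivation, assumes the usual symmetric setup (the car is equally likely behind each door, and a host who can open two goat doors opens either with probability $1/2$) and computes $\P(\car{1} \| \host{2} \and \you{1} \and \yH)$ by direct conditioning:

```python
from fractions import Fraction

def host_options(car, pick):
    """Doors the host may open: neither the contestant's pick nor the car."""
    return [d for d in (1, 2, 3) if d not in (pick, car)]

pick = 1                    # you1: you picked door No. 1
joint = {}                  # joint probability of (car position, host's door)
for car in (1, 2, 3):
    prior = Fraction(1, 3)  # the car is equally likely behind each door
    options = host_options(car, pick)
    for h in options:       # host picks uniformly among the allowed doors
        joint[(car, h)] = joint.get((car, h), 0) + prior / len(options)

# Condition on host2: the host opened door No. 2
evidence = sum(p for (car, h), p in joint.items() if h == 2)
posterior = {car: joint.get((car, 2), 0) / evidence for car in (1, 2, 3)}
print(posterior)  # {1: Fraction(1, 3), 2: Fraction(0, 1), 3: Fraction(2, 3)}
```

Any sequence of applications of the four fundamental rules, in whatever order, must reproduce these fractions.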

