<p>Ryan Honea, Project Management Analyst at FedEx Express (ryan@honea.info). Feed: https://honearyan.github.io/feed.xml, generated by Jekyll, last updated 2019-06-06T00:44:06-05:00.</p>
<h1 id="upcoming-individual-project">Upcoming Individual Project</h1>
<p>Posted 2019-01-13 by Ryan Honea at https://honearyan.github.io/posts/2019/01/blog-post-5</p>
<p>It’s been a really long time (nearly 9 months!) since I posted here, but as part of my ongoing professional development, I’ve decided that in 2019 I will post at least weekly. I’ve been working on various projects behind the scenes, and I’d like to introduce two new ones that I will be working on in the near future. One will be an independent project and the other will be done with a close friend of mine. Without further ado, I’ll introduce the first project.</p>
<h3 id="the-solo-project---ahrimans-moon">The Solo Project - Ahriman’s Moon</h3>
<h4 id="problem-formulation">Problem Formulation</h4>
<p>I recently got married, and while on the honeymoon cruise, my wife and I did a lot of reading. One evening, while reading the second book of the Warhammer 40k Ahriman series from the Black Library, I had a vision of a fun stochastic problem for simulation as well as true mathematical analysis. The rules are as follows:</p>
<ol>
<li>Ahriman wants to reach the center of a moon but must go through a spherical maze with several entrances.</li>
<li>Every entrance has a guaranteed path to the center, since all entrances are connected to one another.</li>
<li>Due to inconsistencies in the warp (a magical mysterious force of the universe), Ahriman can only remember his previous $m$ moves.</li>
<li>With some probability $p$, Ahriman gains warp insight and knows some amount $\rho$ of the shortest path to the center and travels it.</li>
<li>Ahriman never makes the same move on a path twice unless a dead end is reached.</li>
<li>Otherwise, Ahriman moves as a random walk.</li>
</ol>
<p>We can approach this problem as an $n$-layered sphere of nodes that form circles of points at various rotations. We can ask ourselves several questions about this problem:</p>
<ul>
<li>What is the average normalized path length as a function of $n$, $\rho$, $p$, and $m$?</li>
<li>What is the minimum $p$ or insight $\rho$ needed, for various $n$, for Ahriman’s expected time to reach the center of the moon to be finite?</li>
<li>Does this converge to some constant ratio?</li>
</ul>
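<p>To make the rules concrete, here is a rough two-dimensional sketch in R. It is purely illustrative and not the planned implementation. Everything in it is an assumption: the maze is a set of concentric rings with random “doors” inward, memory is taken to mean the last $m$ nodes visited, $\rho$ is read as a number of moves along the shortest path, and the names <code class="highlighter-rouge">build_ring_maze</code>, <code class="highlighter-rouge">bfs_dist</code>, <code class="highlighter-rouge">ahriman_walk</code>, and <code class="highlighter-rouge">door_prob</code> are all hypothetical.</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Assumed maze: node 1 is the center; each layer is a ring of ring_size nodes
# (ring_size >= 3) with at least one radial "door" to the layer inside it.
build_ring_maze <- function(n_layers, ring_size, door_prob = 0.3) {
  id <- function(layer, pos) 1L + (layer - 1L) * ring_size + pos
  edges <- NULL
  for (layer in seq_len(n_layers)) {
    pos <- seq_len(ring_size)
    nxt <- pos %% ring_size + 1L
    edges <- rbind(edges, cbind(id(layer, pos), id(layer, nxt)))  # ring edges
    doors <- pos[runif(ring_size) < door_prob]
    if (length(doors) == 0) doors <- sample(pos, 1)  # guarantee a path inward
    inner <- if (layer == 1) rep(1L, length(doors)) else id(layer - 1L, doors)
    edges <- rbind(edges, cbind(id(layer, doors), inner))
  }
  both <- rbind(edges, edges[, 2:1])  # undirected: store both directions
  split(both[, 2], both[, 1])         # adjacency list indexed by node id
}

# Breadth-first distances from the center to every node.
bfs_dist <- function(adj, source = 1L) {
  d <- rep(NA_integer_, length(adj))
  d[source] <- 0L
  frontier <- source
  while (length(frontier) > 0) {
    nbrs <- unique(unlist(adj[frontier], use.names = FALSE))
    nbrs <- nbrs[is.na(d[nbrs])]
    d[nbrs] <- d[frontier[1]] + 1L
    frontier <- nbrs
  }
  d
}

# One walk from start to the center; returns the number of moves taken.
ahriman_walk <- function(adj, d_center, start, m, p, rho, max_steps = 1e5) {
  pick <- function(v) v[sample.int(length(v), 1)]  # safe for length-1 vectors
  here <- start
  memory <- start
  steps <- 0
  while (here != 1L && steps < max_steps) {
    if (runif(1) < p) {
      # Warp insight: follow a shortest path for up to rho moves
      for (s in seq_len(rho)) {
        if (here == 1L) break
        nbrs <- adj[[here]]
        here <- pick(nbrs[d_center[nbrs] < d_center[here]])
        memory <- tail(c(memory, here), m)
        steps <- steps + 1
      }
    } else {
      # Random move that avoids remembered nodes unless stuck at a dead end
      nbrs <- adj[[here]]
      fresh <- setdiff(nbrs, memory)
      here <- if (length(fresh) > 0) pick(fresh) else pick(nbrs)
      memory <- tail(c(memory, here), m)
      steps <- steps + 1
    }
  }
  steps
}

set.seed(7)
adj <- build_ring_maze(n_layers = 5, ring_size = 12)
d_center <- bfs_dist(adj)
start <- length(adj)  # a node on the outermost ring
lengths <- replicate(500, ahriman_walk(adj, d_center, start, m = 3, p = 0.1, rho = 2))
mean(lengths) / d_center[start]  # average path length relative to the shortest path
</code></pre></div></div>
<p>Dividing by the shortest-path distance is one possible reading of “normalized path length”; sweeping $p$, $\rho$, and $m$ over a grid would give a first look at the questions above.</p>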
<h4 id="solving-the-problem">Solving the Problem</h4>
<p>In order to solve this, I will need to separate my objectives into planning increments (because working on a scrum team has contagious effects). Based on how I complete my first objectives, I can determine my next increment with higher accuracy.</p>
<p>By January 27th, 2019, I’d like the following accomplished:</p>
<ul>
<li>Determine the best way to randomly generate a spherical maze with a guaranteed solution</li>
<li>Prototype a 2-dimensional solution to the problem in R or Python (to be determined)</li>
<li>Create an animation of a solved version of the 2-dimensional solution</li>
<li>Have necessary classes created in a C++ implementation for future simulation</li>
</ul>
<h3 id="additional-news">Additional News</h3>
<p>Hopefully, two weeks from now, I’ll have some great updates on the above project. Tomorrow, I’ll be posting details on the joint project with an old friend of mine, Logan Farr.</p>
<h1 id="simulations-at-a-bar">Simulations at a Bar</h1>
<p>Posted 2018-04-11 by Ryan Honea at https://honearyan.github.io/posts/2018/04/blog-post-4</p>
<p>I remember when I first entered the mathematics department, I did it specifically because I saw just how excited the professors were about their field. Those first few months were obviously not super exciting as I hadn’t really done any math since high school, and at the time, I had no interest in it nor any interest in continuing it.</p>
<p>Fast forward to last night where I’m up at 1 AM at a bar writing a simulation for fun. My friends and I had just finished a great round of trivia (3rd place and a correct sports question!) when <a href="http://paulmwatkins.com/">Matt</a> asked</p>
<blockquote>
<p>“What would happen if we generated points uniformly and had them connect to their nearest neighbor?”</p>
</blockquote>
<p>After much debate, with one of us arguing that it would be a cycle and two arguing that it would be a tree, we decided to simulate it and just see what happens! So, the code to generate this random graph is as follows:</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">X</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">();</span><span class="w"> </span><span class="n">X</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="n">Y</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">();</span><span class="w"> </span><span class="n">Y</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">Y</span><span class="p">,</span><span class="w"> </span><span class="n">xlim</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">-1.05</span><span class="p">,</span><span class="m">1.05</span><span class="p">),</span><span class="w"> </span><span class="n">ylim</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">-1.05</span><span class="p">,</span><span class="m">1.05</span><span class="p">))</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">2</span><span class="o">:</span><span class="m">1000</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">runif</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">-1</span><span class="p">,</span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">Y</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">runif</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">-1</span><span class="p">,</span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">points</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">],</span><span class="w"> </span><span class="n">Y</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w">
</span><span class="n">min</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10000</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="nf">length</span><span class="p">(</span><span class="n">X</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">j</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
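</span><span class="c1"># Euclidean distance between point i and point j (base R's dist() does not accept four coordinates)</span><span class="w">
</span><span class="n">dist_j</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">sqrt</span><span class="p">((</span><span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">X</span><span class="p">[</span><span class="n">j</span><span class="p">])</span><span class="o">^</span><span class="m">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">Y</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">Y</span><span class="p">[</span><span class="n">j</span><span class="p">])</span><span class="o">^</span><span class="m">2</span><span class="p">)</span><span class="w">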
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">dist_j</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">min</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">min</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dist_j</span><span class="w">
</span><span class="n">min_x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">X</span><span class="p">[</span><span class="n">j</span><span class="p">]</span><span class="w">
</span><span class="n">min_y</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Y</span><span class="p">[</span><span class="n">j</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">segments</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">],</span><span class="w"> </span><span class="n">Y</span><span class="p">[</span><span class="n">i</span><span class="p">],</span><span class="w"> </span><span class="n">min_x</span><span class="p">,</span><span class="w"> </span><span class="n">min_y</span><span class="p">,</span><span class="w"> </span><span class="n">lwd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>Excuse the brute force, but post-trivia isn’t always the time for optimization. So, we observe the below, which is a .gif of 20 different randomly generated trees (Take that, Joe!) as well as one static example for ease of viewing.</p>
<table>
<thead>
<tr>
<th style="text-align: center">Twenty Examples</th>
<th style="text-align: center">One Example</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><img src="https://honearyan.github.io/images/RanTree.gif" alt="RanTree" /></td>
<td style="text-align: center"><img src="https://honearyan.github.io/images/ExTree.png" alt="RanTreeEx" /></td>
</tr>
</tbody>
</table>
<p>Once we get here, we could stop, look, and say, “Yep. That’s a tree.” Instead, we start to question different properties of it like</p>
<ol>
<li>What is the average number of leaves if we consider the tree starting at the origin?</li>
<li>What is the expected depth of a tree?</li>
<li>For fun, what if we add colors to the edges! (Maybe these are weights? Maybe it just looks cool?)</li>
</ol>
<p>The first two questions are definitely solvable, but not in one evening by what is now two guys at a bar who both want to go to sleep. That being said, we still added colors for fun.</p>
<table>
<thead>
<tr>
<th style="text-align: center">Twenty Examples</th>
<th style="text-align: center">One Example</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><img src="https://honearyan.github.io/images/RanColorTree.gif" alt="RanColorTree" /></td>
<td style="text-align: center"><img src="https://honearyan.github.io/images/ExColorTree.png" alt="RanColorTreeEx" /></td>
</tr>
</tbody>
</table>
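<p>Coming back to the first two questions, a rough way to get a feel for them is to estimate the leaf count and depth by simulation. The sketch below is only an illustration: it records, for each new point, the index of the earlier point it attaches to, and treats the origin as the root. The helper names <code class="highlighter-rouge">nearest_neighbor_tree</code>, <code class="highlighter-rouge">count_leaves</code>, and <code class="highlighter-rouge">tree_depth</code> are hypothetical, and the choice of 200 trees is arbitrary.</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Build one tree on n >= 2 points and return each point's parent index.
# Point 1 is the root at the origin; point i attaches to the closest earlier point.
nearest_neighbor_tree <- function(n) {
  x <- c(0, runif(n - 1, -1, 1))
  y <- c(0, runif(n - 1, -1, 1))
  parent <- integer(n)  # parent[1] stays 0 for the root
  for (i in 2:n) {
    earlier <- seq_len(i - 1)
    d <- sqrt((x[i] - x[earlier])^2 + (y[i] - y[earlier])^2)
    parent[i] <- which.min(d)
  }
  parent
}

# Leaves are the nodes that never appear as anyone's parent.
count_leaves <- function(parent) {
  length(setdiff(seq_along(parent), parent))
}

# A node's depth is its parent's depth plus one; parents always have a smaller index.
tree_depth <- function(parent) {
  depth <- integer(length(parent))
  for (i in seq_along(parent)[-1]) {
    depth[i] <- depth[parent[i]] + 1L
  }
  max(depth)
}

set.seed(1)
trees <- replicate(200, nearest_neighbor_tree(1000), simplify = FALSE)
mean(sapply(trees, count_leaves))  # estimated average number of leaves
mean(sapply(trees, tree_depth))    # estimated average depth
</code></pre></div></div>
<p>Because every new point adds exactly one edge to an already connected graph, the result always has $n - 1$ edges and no cycle, which is why the parent vector is enough to describe it.</p>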
<p>I made this post today because it was an exciting moment. It was a random question that we wanted to approach, and it’s those questions that remind me why I love my field.</p>
<h1 id="the-secretary-problem-part-two">The Secretary Problem Part Two</h1>
<p>Posted 2018-02-22 by Ryan Honea at https://honearyan.github.io/posts/2018/02/blog-post-3</p>
<p>Before reading this, it’s important to have an understanding of The Secretary Problem, which is detailed in my previous post <a href="http://honea.info/posts/2018/02/SecretaryProblem/">The Secretary Problem</a>.</p>
<p>Now, suppose that we don’t just rank them from 1 to $n$ where $n$ is the number of candidates. Instead, we generate $n$ samples from the standard normal distribution. Since the distribution is continuous, all values are distinct with probability 1, so the nature of the problem remains the same. By simulating candidates from the standard normal, we are able to see the effectiveness of this algorithm in choosing strong candidates. The asymptotic probability of selecting the optimal candidate is only $\dfrac{1}{e}$, and so it is important to show that this algorithm still selects strong candidates.</p>
<p>From the first post, we know that the optimal $k$ number of candidates to skip is $\lfloor \dfrac{n}{e} \rfloor$, and so we will use that algorithm. This post will be by no means as in depth as the previous, so the code is presented as is.</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sims</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">10000</span><span class="w">
</span><span class="n">candidates</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1000</span><span class="w">
</span><span class="n">k</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">floor</span><span class="p">(</span><span class="n">candidates</span><span class="o">/</span><span class="m">2.718</span><span class="p">)</span><span class="w">
</span><span class="n">score</span><span class="w"> </span><span class="o"><-</span><span class="nf">c</span><span class="p">()</span><span class="w">
</span><span class="k">for</span><span class="p">(</span><span class="n">sim_i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">sims</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rnorm</span><span class="p">(</span><span class="n">candidates</span><span class="p">)</span><span class="w">
</span><span class="n">init_best</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="m">1</span><span class="o">:</span><span class="n">k</span><span class="p">])</span><span class="w">
</span><span class="c1"># Default score of 0 records the case where no candidate is hired</span><span class="w">
</span><span class="n">score</span><span class="p">[</span><span class="n">sim_i</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">(</span><span class="n">k</span><span class="m">+1</span><span class="p">)</span><span class="o">:</span><span class="n">candidates</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">init_best</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">score</span><span class="p">[</span><span class="n">sim_i</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w">
</span><span class="k">break</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">sims</span><span class="p">,</span><span class="w"> </span><span class="n">score</span><span class="p">,</span><span class="w"> </span><span class="n">xlab</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Sim Number"</span><span class="p">,</span><span class="w">
</span><span class="n">ylab</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Candidate Score"</span><span class="p">,</span><span class="w">
</span><span class="n">main</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"The Secretary Problem: Candidate Scoring"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>First, define the number of simulations and candidates, and then initialize a collection of scores. Finally, simulate the Secretary Problem with the chosen skipping algorithm and observe the score of the selected candidate.</p>
<p>The code results in the plot below.</p>
<p><img src="https://honearyan.github.io/images/CandidateScoring.png" alt="Results" /></p>
<p>It’s clear that the majority of selected candidates score around 3 standard deviations above the mean, which means that most of them are above the 98th percentile. If we include the simulations in which the manager doesn’t hire anyone (scored as 0), the expected candidate score is approximately 1.94. If we exclude the cases where no candidate is hired, then we have</p>
<script type="math/tex; mode=display">\mu_{score} = 3.07 \quad \quad \sigma_{score} = .38</script>
<p>In the simulation, the probability of not hiring a candidate at all is $.3689$ which intuitively makes sense because the probability that the best candidate is within the skipped candidates should also be $\dfrac{1}{e}$.</p>
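<p>For anyone who wants to recompute these summaries, here is a minimal sketch. It differs from the code above in one labeled way: a no-hire simulation is recorded as <code class="highlighter-rouge">NA</code> rather than 0, which makes it easy to report the statistics with and without those cases. The variable names <code class="highlighter-rouge">no_hire</code>, <code class="highlighter-rouge">later</code>, and <code class="highlighter-rouge">hit</code> are my own, and the exact numbers will vary with the random seed.</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set.seed(42)
sims <- 10000
candidates <- 1000
k <- floor(candidates / exp(1))

# One score per simulation; NA marks the case where nobody is hired
score <- replicate(sims, {
  x <- rnorm(candidates)
  init_best <- max(x[1:k])
  later <- x[(k + 1):candidates]
  hit <- which(later > init_best)
  if (length(hit) > 0) later[hit[1]] else NA_real_
})

no_hire <- mean(is.na(score))         # share of simulations with no hire
mean(score, na.rm = TRUE)             # mean score among hired candidates
sd(score, na.rm = TRUE)               # spread among hired candidates
mean(ifelse(is.na(score), 0, score))  # mean score when a no-hire counts as 0
no_hire
</code></pre></div></div>
<p>The no-hire share should land near $k/n$, since nobody is hired exactly when the best candidate falls among the first $k$.</p>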
<p>So, in conclusion, the $\dfrac{n}{e}$ skip algorithm for the Secretary Problem still typically chooses candidates above the 99th percentile when a candidate is actually chosen, which happens a fraction $\dfrac{e-1}{e} \approx 0.632$ of the time.</p>
<h1 id="the-secretary-problem">The Secretary Problem</h1>
<p>Posted 2018-02-21 by Ryan Honea at https://honearyan.github.io/posts/2018/02/blog-post-2</p>
<p>When perusing Cross Validated, I came across someone seeking to simulate the Secretary Problem, which was summed up quite nicely by this problem text:</p>
<blockquote>
<p>When applying for a secretarial position, $n$ candidates line up in random order for interviews, where the manager after each interview has to decide to take or decline that candidate. His assessment is qualitative in the sense that at interview $k\geq 2$, he observes only $Y_{k,n}$, the indicator that candidate $k$ is better than candidates $1,\ldots,k-1,$ and his goal is to find a strategy whose probability of selecting the best candidate is close to the maximal value $p_n^*$. It is easy to see that the optimal strategy is to select candidate</p>
<script type="math/tex; mode=display">\tau_{k,n}\stackrel{def}{=}\inf\{l\geq k: Y_{l,n} = 1\}\land n</script>
<p>for some $k\in\{2,\ldots,n\}$. Let $k_n^*$ denote the optimal such $k$. Plot simulation estimates of $k_n^*$ and $p_n^*$ for $n = 5,\ldots, 50,$ and compare with the known asymptotics $k_n^*\sim n/e, p_n^*\sim 1/e$.</p>
</blockquote>
<p>His questions were as follows:</p>
<ol>
<li>Why would you choose to simulate $k_n^*$?</li>
<li>How would you do it?</li>
</ol>
<p>Simulation is a hobby of mine so I decided to take up this question and answer it, the results of which will be here.</p>
<h2 id="why-simulate">Why Simulate?</h2>
<p>Quick research into the problem will reveal that the optimal algorithm is to interview and rank the first $\frac{n}{e}$ applicants without hiring any of them, and then choose the first candidate who is better than all of those. For example, if there are ten candidates each with unique rankings, you might observe the following sequence.</p>
<script type="math/tex; mode=display">2 \quad 3 \quad 6 \quad 5 \quad 4 \quad 8 \quad 9 \quad 10 \quad 1 \quad 7</script>
<p>We observe that in the first $\lfloor \frac{10}{e} \rfloor = 3$ candidates, the max score was a 6. So, we check the fourth candidate and observe a 5, so we don’t hire. Then we observe a 4, so we don’t hire. Then we observe an 8, so we hire them and halt the interview process. In this case, we didn’t select the optimal candidate, but we were close.</p>
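<p>The same walk-through can be reproduced in a few lines of R. This is just a sketch applied to the ten scores above; the names <code class="highlighter-rouge">later</code> and <code class="highlighter-rouge">hire</code> are chosen for illustration.</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x <- c(2, 3, 6, 5, 4, 8, 9, 10, 1, 7)

k <- floor(length(x) / exp(1))  # number of candidates to skip: 3
init_best <- max(x[1:k])        # best score among the skipped candidates: 6

# Position of the first later candidate who beats everyone skipped
later <- (k + 1):length(x)
hire <- later[x[later] > init_best][1]

hire     # 6, the sixth candidate interviewed
x[hire]  # 8, their score
</code></pre></div></div>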
<hr />
<p>But what if we don’t know this algorithm and we want to determine which $k$ is best? It isn’t immediately intuitive that $k \approx \frac{n}{e}$ is the best number of candidates to skip, and so simulating can help us find the $k$ for which our probability of picking the best candidate is maximized. In the general case, simulations aid intuition. If we can guess an optimal solution, or perhaps even prove an optimal solution through mathematics, then simulation helps to verify our results.</p>
<h2 id="how-would-you-simulate">How would you simulate?</h2>
<p>This is perhaps my favorite part. I will be using R and will walk through the steps of writing this simulation.</p>
<p>We will start with a single simulation for arbitrary $k$, where we want to create 1000 candidates each with distinct rankings.</p>
<p><code class="highlighter-rouge">x <- sample(1:1000, 1000)</code></p>
<p>Now that we have our sample, we want to find the best candidate among the first $k$ candidates, which can be done with</p>
<p><code class="highlighter-rouge">init_best <- max(x[1:k])</code></p>
<p>Next, we loop through the remaining candidates until we find one who is better than the best of our first $k$ candidates. Note that we only simulate $k$ up to $n-1$, because it doesn’t make sense to skip every candidate as that always results in no hire.</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">(</span><span class="n">k</span><span class="m">+1</span><span class="p">)</span><span class="o">:</span><span class="m">1000</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">init_best</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">candidate_score</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w">
</span><span class="n">candidate_num</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">i</span><span class="w">
</span><span class="k">break</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>So, from here, we have recorded the candidate’s score. We can quickly verify if this candidate is the best candidate by checking if their ranking is the max ranking. Since we have 1000 candidates, we do this with</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">candidate_score</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1000</span><span class="p">)</span><span class="w">
</span><span class="n">success</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">success</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1</span><span class="w">
</span></code></pre></div></div>
<p>That is, we record that we successfully chose the best candidate and increment some value that keeps track of that. Using these, we can write a few loops to run this simulation several times which is shown below</p>
<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sims</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">10000</span><span class="w">
</span><span class="n">p</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">()</span><span class="w">
</span><span class="n">p</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="n">cand_ave</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">()</span><span class="w">
</span><span class="n">cand_ave</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">k</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">2</span><span class="o">:</span><span class="m">999</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">success</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">sims</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sample</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">1000</span><span class="p">,</span><span class="w"> </span><span class="m">1000</span><span class="p">)</span><span class="w">
</span><span class="n">init_best</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="m">1</span><span class="o">:</span><span class="n">k</span><span class="p">])</span><span class="w">
</span><span class="n">candidate_score</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">-1</span><span class="w">
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">(</span><span class="n">k</span><span class="m">+1</span><span class="p">)</span><span class="o">:</span><span class="m">1000</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">init_best</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">candidate_score</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w">
</span><span class="k">break</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">candidate_score</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1000</span><span class="p">)</span><span class="w">
</span><span class="n">success</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">success</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">p</span><span class="p">[</span><span class="n">k</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">success</span><span class="o">/</span><span class="n">sims</span><span class="w">
</span><span class="n">cat</span><span class="p">(</span><span class="n">k</span><span class="p">,</span><span class="w"> </span><span class="s2">" complete \n"</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">999</span><span class="p">,</span><span class="w"> </span><span class="n">p</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"l"</span><span class="p">,</span><span class="w"> </span><span class="n">xlab</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Candidates Skipped"</span><span class="p">,</span><span class="w">
</span><span class="n">ylab</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Probability of selecting Best Candidate"</span><span class="p">,</span><span class="w">
</span><span class="n">main</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"The Secretary Problem"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>So, we store one success probability per $k$ so that we can plot these values later. We also initialize candidate_score to -1 before each loop to signify the case where we end up not hiring anyone. After running this simulation 10000 times for each $k$, the results are as follows.</p>
<p>Within this simulation, we find that the $k$ value that maximizes $p$ is
$k = 332$ or $k = 374$ with $p = .3789$.</p>
<p>We know that the asymptotic $k_n^*$ and $p_n^*$ values are $\dfrac{n}{e}$ and $\dfrac{1}{e}$ respectively, and these results are fairly close to those values, which in this case would be</p>
<script type="math/tex; mode=display">k_{1000}^* \approx 368 \quad \quad p_{1000}^* \approx .368</script>
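<p>The simulated curve can also be compared against the exact success probability. If the first $k$ of $n$ candidates are skipped, the best candidate sits at position $i$ with probability $1/n$ and is hired exactly when the best of the first $i-1$ falls among the skipped $k$, which gives $P(k) = \frac{k}{n}\sum_{j=k}^{n-1}\frac{1}{j}$. The following is a minimal sketch in C++ rather than R; the function names are my own and are not part of the original simulation.</p>

```cpp
#include <cassert>
#include <utility>

// Exact probability of hiring the best of n candidates when the first k are
// skipped and the next candidate better than all of those is hired:
//   P(k) = (k / n) * sum_{j = k}^{n - 1} 1 / j,   for 1 <= k <= n - 1.
double exactSuccessProbability(int n, int k) {
    assert(n >= 2 && k >= 1 && k < n);
    double harmonicTail = 0.0;
    for (int j = k; j <= n - 1; ++j) {
        harmonicTail += 1.0 / j;
    }
    return (static_cast<double>(k) / n) * harmonicTail;
}

// Scan every k from 1 to n - 1 and return (best k, its success probability).
std::pair<int, double> bestSkipCount(int n) {
    std::pair<int, double> best{1, exactSuccessProbability(n, 1)};
    for (int k = 2; k < n; ++k) {
        const double p = exactSuccessProbability(n, k);
        if (p > best.second) {
            best = {k, p};
        }
    }
    return best;
}
```

<p>Calling <code>bestSkipCount(1000)</code> should return a $k$ near $n/e$ and a probability near $1/e$, which gives a noise-free reference for the simulated plot.</p>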
<p>So the simulation approximately confirms the asymptotic values.</p>Ryan Honearyan@honea.infoWhen perusing Cross Validated, I came across someone seeking to simulate the Secretary Problem which was summed up quite nicely by this problem text:High Lambda Poisson approaches Normal Distribution2017-08-31T00:00:00-05:002017-08-31T00:00:00-05:00https://honearyan.github.io/posts/2017/08/blog-post-1<p>In my Generalized Linear Model class, we were tasked with proving that with high $\lambda$, the Poisson distribution approaches the standardized normal distribution. I went ahead and wrote up the proof for the extra credit and really enjoyed doing it. I’ll note that converting the exponential to a series was not my idea and had a little Stack Exchange inspiration. It reminded me how useful series expansions were, so that will hopefully prove useful in future research.</p>
<h3 id="the-theorem">The Theorem</h3>
<p>The limiting distribution of the standardized Poisson$(\lambda)$ random variable $(X-\lambda)/\sqrt{\lambda}$ as $\lambda \rightarrow \infty$ is standard normal.</p>
<h3 id="the-proof">The Proof</h3>
<p>Let $X \sim Poisson(\lambda)$ which has the probability mass function</p>
<script type="math/tex; mode=display">f_{X}(x) = \frac{\lambda^xe^{-\lambda}}{x!} \quad \quad x = 0,1,2,...</script>
<p>and moment generating function</p>
<script type="math/tex; mode=display">\text{M}_X(t) = e^{\lambda(e^t - 1)}</script>
<p>We will specifically consider the standardized Poisson random variable</p>
<script type="math/tex; mode=display">\frac{X-\lambda}{\sqrt{\lambda}}</script>
<p>which has the moment generating function</p>
<script type="math/tex; mode=display">\begin{aligned}
\text{M}_{(X-\lambda)/\sqrt{\lambda}}(t) &= \text{E}\left[\exp\left(t\,\frac{X-\lambda}{\sqrt{\lambda}}\right)\right]\\
&=\exp(-t\sqrt{\lambda})\,\text{E}\left[\exp\left(\frac{tX}{\sqrt{\lambda}}\right)\right]\\
&=\exp\left(-t\sqrt{\lambda}+\lambda(e^{t/\sqrt{\lambda}} - 1)\right)
\end{aligned}</script>
<p>To take the limit as $\lambda$ approaches $\infty$, we use the Taylor series expansion</p>
<script type="math/tex; mode=display">e^{t/\sqrt{\lambda}} = 1 + t\lambda^{-1/2} + \frac{t^2\lambda^{-1}}{2!} + \frac{t^3\lambda^{-3/2}}{3!} + \cdots + \frac{t^n\lambda^{-n/2}}{n!} + \cdots</script>
<p>Therefore, we have</p>
<script type="math/tex; mode=display">\begin{aligned}
&\exp\left(-t\sqrt{\lambda}+\lambda(e^{t/\sqrt{\lambda}} - 1)\right)\\
&=\exp\left(-t\sqrt{\lambda}+\lambda \left[ 1 + t\lambda^{-1/2} + \frac{t^2\lambda^{-1}}{2!} + \frac{t^3\lambda^{-3/2}}{3!} + \cdots + \frac{t^n\lambda^{-n/2}}{n!} + \cdots - 1\right] \right)\\
&= \exp\left(-t\sqrt{\lambda} + t\sqrt{\lambda} + \frac{t^2}{2} + \frac{t^3\lambda^{-1/2}}{6} + \cdots + \frac{t^n\lambda^{-(n-2)/2}}{n!} + \cdots
\right)
\end{aligned}</script>
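<p>Before taking the limit, the exponent above can be checked numerically. The sketch below is my own illustration and not part of the original proof: it evaluates $-t\sqrt{\lambda}+\lambda(e^{t/\sqrt{\lambda}} - 1)$ for a given $t$ and $\lambda$ and measures how far it sits from $\frac{t^2}{2}$. The function names are hypothetical.</p>

```cpp
#include <cassert>
#include <cmath>

// Exponent of the standardized Poisson MGF:
//   -t*sqrt(lambda) + lambda*(exp(t/sqrt(lambda)) - 1).
// std::expm1(x) computes exp(x) - 1 without the cancellation that a plain
// exp(x) - 1 suffers when x is close to zero.
double standardizedPoissonLogMgf(double t, double lambda) {
    assert(lambda > 0.0);
    const double root = std::sqrt(lambda);
    return -t * root + lambda * std::expm1(t / root);
}

// Gap between that exponent and t^2/2, the exponent of the N(0,1) MGF.
double gapToNormal(double t, double lambda) {
    return std::fabs(standardizedPoissonLogMgf(t, lambda) - 0.5 * t * t);
}
```

<p>For fixed $t$, the gap should shrink roughly like $\frac{t^3}{6\sqrt{\lambda}}$, the first leftover term of the series, so multiplying $\lambda$ by 100 should cut it by about a factor of 10.</p>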
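<p>To find the limiting moment generating function, we take the limit of this moment generating function as $\lambda \rightarrow \infty$. The $t\sqrt{\lambda}$ terms cancel, and every term after $\frac{t^2}{2}$ carries a negative power of $\lambda$ and vanishes, which results in</p>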
<script type="math/tex; mode=display">\exp\left(\frac{t^2}{2}\right)</script>
<p>which is the moment generating function of N$(0,1)$. Since the moment generating functions converge for every $t$, the standardized Poisson random variable converges in distribution to the standard normal.</p>Ryan Honearyan@honea.infohttps://honearyan.github.io/2019-02-06-blog-post-6<p>I didn’t update the latest project as soon as I had planned because, quite honestly, work made me forget. I also realized that while I used to enjoy programming in R, and might still, I’d much prefer approaching this from the very start in C++. So, I’d like to go through some of my thoughts during the initial development of this project.</p>
<h3 id="ahrimans-moon-class-formulation">Ahriman’s Moon: Class Formulation</h3>
<h4 id="graphical-representation">Graphical Representation</h4>
<p>I want to begin by approaching this from a 2-dimensional view. We can construct a series of concentric circles of nodes quite easily. If we just want coordinates, then we could do the following for 3 circles of 25 nodes each.</p>
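<pre><code class="language-C++">#include <cmath>
#include <vector>
using std::vector;

int main() {
    vector<double> x;
    vector<double> y;
    double xVal = 0.0;
    double yVal = 0.0;
    int nodes = 25;   // nodes per circle
    int circles = 3;  // number of concentric circles
    // The center of the maze sits at the origin.
    x.push_back(xVal);
    y.push_back(yVal);
    for (int i = 0; i < circles; i++) {      // circle i has radius i + 1
        for (int j = 0; j < nodes; j++) {    // node j sits at angle 2*pi*j/nodes
            // M_PI comes from <cmath> on POSIX systems; it is not standard C++.
            xVal = (i+1.0) * std::cos((2.0*M_PI/nodes) * j);
            yVal = (i+1.0) * std::sin((2.0*M_PI/nodes) * j);
            x.push_back(xVal);
            y.push_back(yVal);
        }
    }
    return 0;
}
</code></pre>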
<p>We can print these for verification, but that’s not necessary. Below are the results in gnuplot. Note the usage of</p>
<p>$\cos\left(\frac{2\pi}{\#\text{nodes}} \cdot j\right)$
and
$\sin\left(\frac{2\pi}{\#\text{nodes}} \cdot j\right)$</p>
<p>which provide the parametric form of a circle. $i$ sets the radius of the circle (circle $i$ has radius $i+1$), while $j$ creates all the possible nodes by sweeping the angle from $0$ up to, but not including, $2\pi$.</p>
<p><img src="https://honearyan.github.io/images/Ahriman_Circles.png" alt="Circles" /></p>
<p>So, we have a visual of circles of nodes in two dimensions. That being said, this picture alone isn’t very useful. We need an actual graph, with actual connections between these nodes.</p>
<h4 id="class-construction">Class Construction</h4>
<p>We can start with the basics. What do we need in each node as per our initial requirements and in order for us to be able to properly access the next ones?</p>
<p>Well, if we want to calculate any form of spatial distance, we need an $x$-coordinate and a $y$-coordinate. So, hypothetically, we could access each node by its coordinates! But these coordinates can get nasty, and floating-point rounding error means slight programmatic changes could produce slightly different values. Instead, I propose the following:</p>
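<pre><code class="language-C++">#include <vector>
using std::vector;

class Node {
    private:
        double _xCoor;              // x-coordinate of the node
        double _yCoor;              // y-coordinate of the node
        int _ringNo;                // ring the node sits on
        int _nodeId;                // position on the ring, counted from angle 0
        vector<Node*> _connList;    // nodes this node is connected to
};
</code></pre>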
<p>The above provides the coordinate variables as well as</p>
<ul>
<li>Ring No: So we can identify a node by the ring it is in</li>
<li>Node ID: So we can identify where on the ring it is, starting from the 0th radian.</li>
<li>Connection List: A vector of pointers to nodes that the current node is connected to.</li>
</ul>
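<p>To make the connection list concrete, here is a minimal sketch of how the rings might be built and linked. Everything beyond the five fields listed above is an assumption on my part: the public <code>struct</code> layout, the <code>buildRings</code> name, and especially the linking rule, which only joins each node to its two neighbors on the same ring and leaves the center and any ring-to-ring connections undecided.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

// Public-member stand-in for the Node class above (hypothetical layout).
struct Node {
    double xCoor;
    double yCoor;
    int ringNo;   // 0 is the center, 1..circles are the rings
    int nodeId;   // position on the ring, counted from angle 0
    std::vector<Node*> connList;
};

// Build the center plus `circles` rings of `nodesPerRing` nodes, and link each
// node to its two neighbors on the same ring. The linking rule is an
// assumption for illustration only.
std::vector<std::unique_ptr<Node>> buildRings(int circles, int nodesPerRing) {
    assert(circles >= 1 && nodesPerRing >= 3);
    const double pi = std::acos(-1.0);
    std::vector<std::unique_ptr<Node>> graph;
    graph.push_back(std::make_unique<Node>(Node{0.0, 0.0, 0, 0, {}}));
    for (int ring = 1; ring <= circles; ++ring) {
        const std::size_t first = graph.size();  // index of node 0 on this ring
        for (int j = 0; j < nodesPerRing; ++j) {
            const double angle = (2.0 * pi / nodesPerRing) * j;
            graph.push_back(std::make_unique<Node>(
                Node{ring * std::cos(angle), ring * std::sin(angle), ring, j, {}}));
        }
        // Join node j to node j + 1, wrapping around at the end of the ring.
        for (int j = 0; j < nodesPerRing; ++j) {
            Node* current = graph[first + j].get();
            Node* next = graph[first + (j + 1) % nodesPerRing].get();
            current->connList.push_back(next);
            next->connList.push_back(current);
        }
    }
    return graph;
}
```

<p>Owning the nodes through <code>std::unique_ptr</code> keeps the raw pointers in each connection list valid even when the outer vector reallocates, which is why the sketch does not store the nodes by value.</p>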
<p>That’s it for now! More to follow soon.</p>Ryan Honearyan@honea.info