<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Many But Finite</title>
  <subtitle>Tech and science for curious people.</subtitle>
  <link href="https://manybutfinite.com/feed.xml" rel="self"/>
  
  <link href="https://manybutfinite.com/"/>
  <updated>2020-06-12T17:36:02.365Z</updated>
  <id>https://manybutfinite.com/</id>
  
  <author>
    <name>Gustavo Duarte</name>
    <email>manybutfinite@duartes.org</email>
  </author>
  
  <generator uri="http://hexo.io/">Hexo</generator>
  
  <entry>
    <title>Understanding and Visualizing Covid Growth in the US</title>
    <link href="https://manybutfinite.com/post/visualizing-covid-growth/"/>
    <id>https://manybutfinite.com/post/visualizing-covid-growth/</id>
    <updated>2020-06-12T17:30:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>This post takes a look at Covid data with a particular focus on the number of
new daily cases and the growth (or reduction) of those daily cases over time. If
this were physics, we’d be looking at speed and acceleration, rather than the
total distance traveled.  I won’t try to convince you of anything, but rather
just try to build an understanding of where we’ve been, where we are, and what
to expect in the next few months.</p>
<p>Let’s start with the growth in daily cases for US states since March 10th, for
states reporting at least 20 cases:</p>
<div class="wideViz" id="nytStatesCaseGrowth7DayByDay"></div>
<p>Each dot represents the <em>growth</em> in the <em>number of new daily cases</em> for a US
state on a given day.  I discuss methodology further at the end of this post if
you’re interested. <sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup></p>
<p>We can clearly see a few crucial trends in this chart. Growth was furious for
all states in mid-March (20% daily growth means doubling in 3.8 days, as you’ve
surely heard) and showed a lot of variance.  Then nearly all states issued
stay-at-home orders between March 23rd and April 3rd.  <sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup> These
orders, no doubt coupled with some amount of anxiety and precautions from the
population, quickly reduced growth rates, which were clustered around 0% by
mid-April. This was a significant accomplishment. Sadly, we were unable to
improve from there, and never brought growth figures consistently or
substantially below zero. Here’s the same data seen by week in a slightly
different way:</p>
<div class="wideViz" id="nytStatesByCaseGrowthByWeek"></div>
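<p>The &#8220;20% daily growth means doubling in 3.8 days&#8221; figure above is simple compound-growth arithmetic. A quick sketch (the <code>doubling_time</code> helper is just for illustration):</p>

```python
import math

def doubling_time(daily_growth):
    """Days for case counts to double at a constant daily growth rate."""
    return math.log(2) / math.log(1 + daily_growth)

print(round(doubling_time(0.20), 1))  # 20% daily growth -> 3.8 days
```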
<p>We started out red-hot and worsening in mid-March, but that gave way to slower
growth and calmer colors. The initial success was followed by stagnation, and
slight worsening in the last two weeks. Let’s look at our nationwide figures:</p>
<div class="wideViz" id="cases-ecdc-usa"></div>
<p>New daily cases peaked on April 10 in the US at about 32,000 cases/day. They
have since fallen to 21,500 cases/day. <sup class="footnote-ref"><a href="#fn1" id="fnref1:1">[1:1]</a></sup> Growth peaked at 40% on
March 24, shortly before the lockdowns started, then fell sharply, hitting 0% on
April 15.</p>
<p>Now consider this: we had about 7,000 cases/day on March 25, as we headed into
lockdowns, and we have 21,500 cases/day now, as we are leaving them. That might
feel a little disheartening. What happened? Was there any point to this whole
thing? Did we just destroy countless jobs, businesses, and dreams for no good
reason?</p>
<p>There are three good answers here. The first is that the precipitous fall in
growth brought about by the lockdowns was a major win that probably averted
total disaster. However, unless you look at a plot of growth rates, or at least
look at daily cases and appreciate the trend, this win is somewhat hidden.
I hope the charts so far have done a decent job of showing this aspect of our
journey.</p>
<p>The second is that the lockdowns were indeed somewhat pointless. Not because
they are inherently so, but <em>because we’ve done a bad job</em> and failed to
significantly bring down case numbers while we had a perfect opportunity to do
it. We bought the lockdown with trillions of dollars and untold sacrifice, and
then squandered it.</p>
<p>The third answer is that we have to consider states separately to really analyze
the situation, because national data is just too blunt. States had varying
levels of success and peaked at different times, and to understand what worked
we need to factor that in.</p>
<p>Let’s look at what other countries achieved with their lockdowns:</p>
<div id="ecdcCases" class="flexCases wideViz"></div>
<p>Those curves show the kind of drastic reduction in the number of daily cases
that well-organized societies can achieve. They are able to push growth
significantly below zero and keep it there long enough to bring case numbers
down an order of magnitude or more. A smaller outbreak is then more amenable to
containment by well-designed policies while economic and social activity is
restored.</p>
<p>Let’s look at more countries for better context. Here are the ten countries most
successful at containing the pandemic from a peak of at least 70 cases/day:</p>
<div class="wideViz" id="ecdcGreatestCaseReductions"></div>
<p>I have excluded China from the list due to controversies around their data. They
would have been 4th place with a 99.8% reduction from a peak of 4,687 cases/day.
We see some islands in there, some smaller populations, and also small peaks.
It’s worth pointing out that neither islandness nor a small population is any
guarantee, as the history of smallpox in Iceland can attest.
<sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup> Still, countries like Switzerland and Austria vanquished
pretty large outbreaks and are not islands last I checked.  Social cohesion and
good policies seem like the overriding factors.  But let’s look at a more
diverse group of places:</p>
<div class="wideViz" id="ecdcPeakBars"></div>
<p>Sweden is the only wealthy country in this list doing worse than the US.  This
was not cherry-picked: that remains true when you look at the whole world, where
the US ranks 62nd by this metric. In the last week Sweden’s top epidemiologist
has admitted mistakes in their strategy. <sup class="footnote-ref"><a href="#fn4" id="fnref4">[4]</a></sup>
<sup class="footnote-ref"><a href="#fn5" id="fnref5">[5]</a></sup> However, the overall number of infections is low in Sweden,
and their growth has been kept mostly in check, never spiraling out of control.
They are a highly conscientious society that took a daring (and often
misrepresented) approach  with a clear understanding of the trade-offs involved.</p>
<p>The situation in the bottom countries is catastrophically different. They all
have strong growth of already sizable outbreaks, with Brazil in an especially
dire situation, no doubt the worst in the world, having recently overtaken the
US for the top spot in daily cases amid continued growth. Their president is now
attempting to censor Covid numbers, and it’s possible Brazilian data will no longer
be reliable over the next few weeks. <sup class="footnote-ref"><a href="#fn6" id="fnref6">[6]</a></sup></p>
<p>Even if we ignore any mistakes made before mid-March, it is clear from this data
that the US has not done a great job containing the pandemic. Despite remaining
in a fairly strict lockdown for weeks, we performed worse than all but one rich
nation in reducing case numbers.  But let’s not yet worry about whether we’re
a <a href="https://www.theatlantic.com/magazine/archive/2020/06/underlying-conditions/610261/" target="_blank" rel="noopener">failed
state</a>
or have been made great again.</p>
<p>After all, the US is a large and heterogeneous place, and looking at national
aggregate data obscures a lot of the story. States like Alaska, Montana and
Wyoming never had more than 25 cases/day, while New York reached 9,900 cases/day,
a peak greater than every nation’s except for Brazil and Russia. Having seen
what other countries look like, here is what happened in US states with a peak
of at least 70 cases/day:</p>
<div class="wideViz" id="nytStatesPeakBars"></div>
<p>A handful of states managed substantial reductions in daily cases, including New
York, which had by far the largest outbreak in the US.  That’s the glass half
full. Still, at a 91.3% decrease New York is behind most developed countries.  It is
striking that <em>none</em> of our states have managed to do as well as Spain, Italy,
or Germany when it comes to reducing case numbers.</p>
<p>And then there are the states at the bottom of this list. When you see 0% that
means no reduction: these states are <em>currently</em> at their historical maximum and
growing, and we don’t know when and where they’ll peak.</p>
<p>Keep in mind the decreases in the chart above show the reduction in each state’s
daily cases measured against <em>its own peak</em>.  To get an idea of how states
changed since the <em>national</em> peak, and how the outbreak decreased in some areas
and increased in others, here are the most substantial deltas in daily cases by
state since the US peaked on April 10th:</p>
<div class="wideViz" id="usStatesCaseDeltasSinceApril10"></div>
<p>Since we peaked nationally on April 10, we have reduced daily cases by about
10,500/day, with most of the reduction coming from New York (9,000 cases/day)
and New Jersey (3,000 cases/day). It might strike you as odd that the national
decrease (10,500 cases/day) is smaller than the decrease from just New York and
New Jersey (a combined 12,000 cases/day). And sure enough, if we exclude those
two states, <em>daily cases have actually increased in the rest of the US</em> since
our national peak.  Without NY and NJ, on April 10 we were at 18,300 cases/day,
then we peaked on May 6 at 21,400 cases/day, and are now at 20,000 cases/day,
for a reduction of 7%.</p>
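<p>The apparent paradox resolves with one line of arithmetic, using the rounded figures above:</p>

```python
# Rounded figures from the text, in cases/day
national_drop = 10_500      # US decrease since the April 10 national peak
ny_nj_drop = 9_000 + 3_000  # New York plus New Jersey combined

# If NY and NJ fell by more than the whole country did, the rest of the
# country must have risen by the difference.
rest_change = national_drop - ny_nj_drop
print(rest_change)  # -> -1500: daily cases elsewhere went UP by ~1,500
```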
<p>So let’s talk about the future and make some predictions. Think about these two
questions:</p>
<ol>
<li>How many states will see a daily cases peak at least 30% greater than any
peak they’ve had so far?</li>
<li>How many states will be forced back into lockdown?</li>
</ol>
<p>Then consider these facts:</p>
<ul>
<li>Compared to other developed nations, we have done a much worse job reducing our outbreak.</li>
<li>We did not use our lockdown period to develop comprehensive policies to fight Covid.</li>
<li>We have not used leadership to galvanize the population to fight the pandemic and adopt practices that mitigate spread - quite the opposite, we have started a culture war around wearing masks, social distancing, and whether to even take Covid seriously.</li>
<li>Many American leaders undermine mitigation by word and deed.</li>
<li>Even while in lockdown, we were only able to achieve modest daily reductions in case numbers.</li>
<li>People feel they have done their duty and should now be able to resume life, being generally sick of hearing about Covid and all its controversies and conspiracies.</li>
<li>Places highly prone to spread, such as gyms, churches, and restaurants, will resume operations.</li>
<li>Domestic travel will resume, so counties with larger outbreaks might seed those with fewer cases.</li>
<li>Finally, if daily growth increases even to a modest 5%, cases will double in two weeks under the inexorable march of exponential increase.</li>
</ul>
<p>Offsetting these is the fact that a large part of the population is much more
careful and attuned to the spread of Covid. Humans are remarkably adaptable, and
maybe smart on-demand interventions at the county and state levels can curb
local outbreaks.</p>
<p>Before answering those two questions, let’s take a look at the familiar
case-and-growth plots for the 40 states with cases/day currently over 70:</p>
<div id="nytStatesCases" class="flexCasesSmall wideViz"></div>
<p>Many of those curves don’t look great. Keep in mind some of the spikes we see
mid-graph are due to specific incidents, like an outbreak at a prison.</p>
<p>But enough of the charts, let’s try our hand at divination.  Only eight states
have managed a decrease of 70% or more in their daily cases (nine if we count
Pennsylvania at 69.8%). These are the states most likely to keep things under
control: most have seen a serious situation, all have been effective by US
standards, and they are further down from their peaks. I’ll round up and say 10
states will avoid a greater peak in the future. The other 40 will see a peak at
least 30% greater than their current peak.  And of these 40, at least half will
adopt lockdown measures before the end of the year that affect a majority of
their population.</p>
<p>This is all the data I’ve got for now, but if you’ve read this far, you might as
well stick around for a few broader considerations.</p>
<p>First, the trade-off between economic outcomes and epidemiological outcomes has
become grossly overstated.  The more infection we have, the more the economy
will be affected as people shy away from economic activity.<sup class="footnote-ref"><a href="#fn7" id="fnref7">[7]</a></sup>
A failure to intelligently fight Covid is an economic failure as well. Brazil is
a sad example of this, as the out-of-control outbreak has wreaked havoc in the
economy.</p>
<p>Almost every containment strategy - personal behaviors, contact tracing,
widespread testing, effective quarantine of sick patients, etc. - ultimately
benefits the economy. Every leader who has mocked or sabotaged Covid containment
is hurting economic output.  And plenty of economic activity can be encouraged
with low risk, especially if smart mitigation is applied. Even where a trade-off
seems obvious, say opening up restaurants without restrictions, things are not
so simple: the net economic effect needs to account for the consequences of the
greater spread of Covid, which unfortunately is very likely in restaurants.</p>
<p>The trade-off is much more direct when it comes to personal freedom.  Church
services are a perfect example. They are simultaneously: 1) prone to spreading
Covid, 2) not responsible for a lot of economic activity, and 3) extremely
important to a large part of the population.</p>
<p>Or to pick a different demographic, look at skiing in Colorado.  Plenty of
people here would be willing to risk infection in order to ski, yet this choice
was denied to them. This may seem like a trivial sacrifice, but to many it is
deeply meaningful.  Skiing is a complex trade-off since it does involve a lot of
economic activity and also enormous Covid risk, as we saw when tourists started
various outbreaks in our ski towns. Yet there is also a strong personal freedom
component embedded in it. It is interesting that the restrictions which most
incensed Michigan protesters were related to personal freedoms, like the use of
personal boats.</p>
<p>The moral calculus around Covid trade-offs is complex. Risk to self; risk to
others you might infect; risk to society at large if we overrun the health
system; how to weigh death against hardship, enjoyment, and freedom; how much we
value the life of elderly people and those at greater risk of complications, and
so on.</p>
<p>But there are a lot of actions and personal decisions that remain invariant no
matter how you feel about trade-offs. God knows we are all sick of Covid, now
that the novelty has worn off and this looks like a long haul. But stay as safe as
possible, and for whatever degree of risk-taking you decide on, mitigate as much
as possible.</p>
<p>I hope this has been useful and informative. Thanks for reading!</p>
<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p>All of the data for this post comes from either the European CDC
or the New York Times state-level dataset for Covid. The Covid Tracking
Project dataset has also been extremely helpful, but is not used here.  I used
7-day rolling averages for all Covid figures. County, state, and national
reporting is very noisy, with frequent spikes and troughs, and is very
sensitive to the day of the week, particularly weekends.  The 7-day
average smooths this out with the nice benefit of capturing exactly one week,
which further helps with the day-of-week variations.  I also use a 7-day
interval to compute growth. This again smooths out noise and allows for more
meaningful comparisons. The growth figure is simply the seventh root of the
factor obtained by dividing a figure for day N by the figure for day N-7.
Whether to use cases, hospitalizations, or deaths is another interesting
decision. Case and death data are more robust and widespread than
hospitalization data. Deaths are
a lot more sensitive to particularities of an outbreak: a high percentage of
deaths is linked to elderly care facilities, for example, so it is possible to
have high death figures that overstate the size of an outbreak. Deaths also
depend on quality of care, and are far more delayed, frequently happening
anywhere from 2 to 12 weeks after infection. Symptoms and detection of a new
case are much quicker and vary less. I feel that to understand the dynamics of
an outbreak, cases are more useful. Since these charts are all generated by
code, I did an experiment using deaths instead of cases and the trends held up
consistently, albeit delayed by 2-3 weeks. Cases are sensitive to the amount
of testing being done. If the amount of testing is somewhat constant, and the
percentage of detected cases is consistent, then at least the <em>relative</em>
changes in the number of cases will be meaningful, even if they only capture
a fraction of the total. But if testing is increased, this can show up as higher
daily case counts, when in reality only detection increased. Looking at the
percentage of positive tests vs. total tests can help detect that issue.
I have used the data from the Covid Tracking Project, which does provide
testing information, and also the figures for deaths, to see whether changes
in testing play a big role in these trends.  That does not seem to be the case
looking at the data. <a href="#fnref1" class="footnote-backref">↩︎</a> <a href="#fnref1:1" class="footnote-backref">↩︎</a></p>
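<p>In code, the smoothing and growth computation described above look roughly like this (a sketch with synthetic data, not the actual script behind these charts):</p>

```python
def rolling_7day(values):
    """7-day trailing average; the first six days have no average."""
    return [sum(values[i - 6:i + 1]) / 7 for i in range(6, len(values))]

def daily_growth(avg, n):
    """Average daily growth: seventh root of (day n / day n-7), minus 1."""
    return (avg[n] / avg[n - 7]) ** (1 / 7) - 1

# Synthetic series growing exactly 10% per day:
cases = [100 * 1.1 ** d for d in range(30)]
avg = rolling_7day(cases)
print(round(daily_growth(avg, len(avg) - 1), 3))  # -> 0.1
```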
</li>
<li id="fn2" class="footnote-item"><p><a href="https://en.wikipedia.org/wiki/U.S._state_and_local_government_response_to_the_COVID-19_pandemic" target="_blank" rel="noopener">U.S. state and local government response to the COVID-19 pandemic</a> <a href="#fnref2" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn3" class="footnote-item"><p><a href="https://www.newyorker.com/magazine/2020/06/08/how-iceland-beat-the-coronavirus" target="_blank" rel="noopener">https://www.newyorker.com/magazine/2020/06/08/how-iceland-beat-the-coronavirus</a> <a href="#fnref3" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn4" class="footnote-item"><p><a href="https://www.bloomberg.com/news/articles/2020-06-03/man-behind-sweden-s-virus-strategy-says-he-got-some-things-wrong" target="_blank" rel="noopener">https://www.bloomberg.com/news/articles/2020-06-03/man-behind-sweden-s-virus-strategy-says-he-got-some-things-wrong</a> <a href="#fnref4" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn5" class="footnote-item"><p><a href="https://www.theguardian.com/world/2020/jun/03/architect-of-sweden-coronavirus-strategy-admits-too-many-died-anders-tegnell#maincontent" target="_blank" rel="noopener">https://www.theguardian.com/world/2020/jun/03/architect-of-sweden-coronavirus-strategy-admits-too-many-died-anders-tegnell#maincontent</a> <a href="#fnref5" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn6" class="footnote-item"><p><a href="https://www1.folha.uol.com.br/equilibrioesaude/2020/06/governo-deixa-de-informar-total-de-mortes-e-casos-de-covid-19-bolsonaro-diz-que-e-melhor-para-o-brasil.shtml" target="_blank" rel="noopener">https://www1.folha.uol.com.br/equilibrioesaude/2020/06/governo-deixa-de-informar-total-de-mortes-e-casos-de-covid-19-bolsonaro-diz-que-e-melhor-para-o-brasil.shtml</a> <a href="#fnref6" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn7" class="footnote-item"><p>Morning Consult tracks <a href="https://morningconsult.com/2020/06/09/tracking-consumer-comfort-with-dining-out-and-other-leisure-activities/" target="_blank" rel="noopener">how safe consumers
feel</a>
and <a href="https://morningconsult.com/2020/06/05/consumer-confidence-50-states/" target="_blank" rel="noopener">consumer
confidence</a>
more broadly. It will be interesting to see the relationship between economic
recovery and successful containment in various countries. <a href="#fnref7" class="footnote-backref">↩︎</a></p>
</li>
</ol>
</section>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;This post takes a look at Covid data with a particular focus on the number of
new daily cases and the growth (or
      
    
    </summary>
    
      <category term="science" scheme="https://manybutfinite.com/category/science/"/>
    
      <category term="covid-19" scheme="https://manybutfinite.com/category/covid-19/"/>
    
      <category term="dataviz" scheme="https://manybutfinite.com/category/dataviz/"/>
    
    
  </entry>
  
  <entry>
    <title>Covid-19 Data Sources for Programmers</title>
    <link href="https://manybutfinite.com/post/covid-data-sources-for-programmers/"/>
    <id>https://manybutfinite.com/post/covid-data-sources-for-programmers/</id>
    <updated>2020-04-08T17:00:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>I’ve been doing analysis of Covid cases to try to understand what to expect in
terms of lockdown length and disease progress, especially in Colorado and
Brazil, the places I spend the most time in. There are a lot of data sources
around, and it took me a few hours to find and test a number of them. I hope
this saves time for anyone interested in crunching Covid numbers. If you have
suggestions and tips on data sources, please open a PR or issue in my <a href="https://github.com/gduarte/blog" target="_blank" rel="noopener">GitHub
repo</a>. Here we go.</p>
<p><a href="https://twitter.com/jburnmurdoch" target="_blank" rel="noopener">John Burn-Murdoch</a> and his team at the
Financial Times have done a <a href="https://www.ft.com/coronavirus-latest" target="_blank" rel="noopener">great job</a> reporting visually on the
pandemic. They have fewer and simpler charts than many other sites but their
charts are done exquisitely and distill a lot of data to provide you the
clearest picture available of each country’s situation, plus a few of the
regional hotspots around the world.</p>
<p><a href="https://ourworldindata.org/" target="_blank" rel="noopener">Our World in Data</a> is a wonderful project based at
Oxford University that attempts to explain the world using rigorous data sources
and beautiful charts. They have been producing a lot of great Covid content
since the pandemic broke out. If you have some time, I suggest exploring the
non-covid areas of the site as well (and if you enjoy that, I highly recommend
the book <a href="https://www.amazon.com/Factfulness-Reasons-World-Things-Better/dp/1250123828/" target="_blank" rel="noopener">Factfulness</a>). All of their work is <a href="https://github.com/owid" target="_blank" rel="noopener">open sourced</a>.</p>
<p>The OWID data is in a <a href="https://github.com/owid/covid-19-data" target="_blank" rel="noopener">GitHub repo</a>.
Their main source is the <a href="https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide" target="_blank" rel="noopener">European CDC</a>, which publishes confirmed cases
and deaths aggregated by date and country for most of the world (not just
Europe) in JSON, CSV, and XML files.</p>
<p>Johns Hopkins University has built a wildly popular dashboard available in
<a href="https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6" target="_blank" rel="noopener">desktop</a> and <a href="http://www.arcgis.com/apps/opsdashboard/index.html#/85320e2ea5424dfaaa75ae62e5c06e61" target="_blank" rel="noopener">mobile</a> versions. Their
<a href="https://github.com/CSSEGISandData/COVID-19" target="_blank" rel="noopener">repository</a> is public and it
aggregates data from a variety of sources into easy-to-use CSV files (there’s
also a <a href="https://github.com/pomber/covid19" target="_blank" rel="noopener">JSON mirror</a>). In
addition to worldwide national totals, data is available for individual US
counties and states.  It includes number of cases and deaths along with
recovered and active patients. Since they aggregate data from the US CDC, China
CDC, European CDC and several other national institutions, this is a great way
to get your hands on worldwide data.</p>
<p>The New York Times offers a plethora of high-quality Covid <a href="https://www.nytimes.com/interactive/2020/world/coronavirus-maps.html" target="_blank" rel="noopener">maps and
visualizations</a>. It’s not a surprise Mike Bostock, creator of the
D3.js library, used to work there. The NYT open sourced a <a href="https://github.com/nytimes/covid-19-data" target="_blank" rel="noopener">repository</a>
providing high-quality and painstakingly verified data for US cases at both the
state and county level. This is probably the best source of data for analyzing
number of cases and deaths in the US.</p>
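<p>As a quick sketch of working with this dataset: the snippet below parses a few rows in the <code>us-states.csv</code> layout (date, state, fips, cumulative cases, cumulative deaths). The sample numbers are made up for illustration; in practice you would read the raw CSV from the repository. Since the counts are running totals, daily new cases are first differences:</p>

```python
import csv, io

# Inline sample mimicking the us-states.csv layout (figures are invented).
sample = """date,state,fips,cases,deaths
2020-04-08,Colorado,08,5655,193
2020-04-09,Colorado,08,6202,226
2020-04-10,Colorado,08,6510,250
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Cumulative totals -> daily new cases via consecutive differences.
new_cases = [int(b["cases"]) - int(a["cases"]) for a, b in zip(rows, rows[1:])]
print(new_cases)  # -> [547, 308]
```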
<p>Another outstanding US data source is the <a href="https://covidtracking.com/" target="_blank" rel="noopener">Covid Tracking
Project</a>, powered by dozens of volunteers attempting
to collect data on number of tests performed, positive and negative results,
hospitalizations, patients in the ICU and on ventilators, and so on. They face
a severe dearth of information in the US and the complete lack of centralized
reporting, but they’re making the best of it. If you want to attempt more
sophisticated analysis, this is a good source. But mind the gaps.</p>
<p>I’m sure you’ve hit <a href="https://www.worldometers.info/coronavirus/" target="_blank" rel="noopener">Worldometer</a>
while googling covid information. They provide encyclopedic amounts of data
about Covid infection worldwide through an effective bare-bones interface with
good charts. Data is aggregated by country and includes deaths and active cases,
both daily and cumulative.</p>
<p>Finally, if you are interested in more regional data for other countries, there
are great repositories for <a href="https://code.montera34.com/numeroteca/covid19/-/tree/master" target="_blank" rel="noopener">Spain</a> and <a href="https://github.com/pcm-dpc/COVID-19" target="_blank" rel="noopener">Italy</a>.
It’s not easy to aggregate UK data, but Tom White has a <a href="https://github.com/tomwhite/covid-19-uk-data" target="_blank" rel="noopener">good
repo</a>. Álvaro Justen has done the
same for <a href="https://github.com/turicas/covid19-br" target="_blank" rel="noopener">Brazil</a>, while research lab
Fiocruz has a good <a href="https://bigdata-covid19.icict.fiocruz.br/" target="_blank" rel="noopener">web UI</a> for
Brazilian data.</p>
<p>If you know of other high-quality regional repos, please send a PR or <a href="https://github.com/gduarte/blog" target="_blank" rel="noopener">GitHub
issue</a>. I’d love to expand this post with the best repos for each
region.</p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;I’ve been doing analysis of Covid cases to try to understand what to expect in
terms of lockdown length and dise
      
    
    </summary>
    
      <category term="science" scheme="https://manybutfinite.com/category/science/"/>
    
      <category term="covid-19" scheme="https://manybutfinite.com/category/covid-19/"/>
    
    
  </entry>
  
  <entry>
    <title>iPhones, Armed Robbery, and Hacking</title>
    <link href="https://manybutfinite.com/post/iphones-armed-robbery-hacking/"/>
    <id>https://manybutfinite.com/post/iphones-armed-robbery-hacking/</id>
    <updated>2018-01-17T18:45:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>(Some security recommendations are summarized at the <a href="#recommendations">end</a>.)</p>
<h3 id="I-The-Robbery">I. The Robbery</h3>
<p>This past summer I was walking around in the neighborhood where I grew up,
happy-go-lucky, when some guy jumped off a motorcycle pointing a gun at me. It
was my first time at gunpoint, and from the outset the weapon was positively
spellbinding.  As I gazed at it, strange thoughts hit me: “Am I going to get
shot by this rusty piece of shit?  What a sorry way to die! And what if I get
tetanus?”</p>
<p>Those were thoughts I wouldn’t have anticipated, but as Dan Carlin says, humans
in extreme situations often behave unexpectedly.  And while a gun-toting thug is
a far cry from the Battle of Verdun, it is pretty extreme for me. This post
tells the story of the robbery and its surprising information security
developments. There are lessons here for both users and designers of technology.</p>
<p><img src="/img/misc/iphone-robbery/robbery.jpg" alt="Robbery scene" title="Everybody be cool, this is a robbery!"></p>
<p>My daughter and I were visiting Brazil in July, taking a carefree walk in
a boulevard lined with lush trees.  She had just gotten into “good kid, m.A.A.d.
city”, ironically enough an album about growing up in Compton amid dire
violence. So we were deep in conversation about the US criminal justice system,
drug laws, and the ideas of people like Bryan Stevenson and Michelle
Alexander<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup>.</p>
<p>Growing up in Brazil you get a crash course in street smarts. I was mugged twice
as a 10-year-old and once at 15.  That’s counting only the times when stuff was
actually taken. There were scores of near-muggings I dodged by either talking my
way out or running my way out.</p>
<p>But after 20 trouble-free years, I let my guard down. Absorbed in conversation,
I barely noticed the motorcycle driving on the other side of the street. By the
time it veered the wrong way into traffic and sped towards the sidewalk we were
on, it was too late.  The passenger jumped out while the motorcycle was still
moving, gun in hand pointed squarely at me.</p>
<p>The scene felt strangely removed - it’s a cliché, but it really did feel like
a movie. Instead of panicked confusion, there was a strong pragmatic voice in my
head. I had thought about “what if” scenarios plenty of times before and they
kicked in. Who is the attacker? What is their motivation? What’s the best course
of action?</p>
<p>There are career criminals in Brazil who are downright <em>professional</em>.  I know
somebody whose house was invaded while they were home and the robbers let them
know how long the “job” was going to take, offered them water, and made sure
nobody freaked out.  Better than some moving companies I’ve used.</p>
<p>But when someone is robbing random people on the street using a <em>gun</em>, that’s
pretty far from professional. Way too volatile a situation with huge risks and
beggarly payoff. These were at best lowlifes and at worst jittery crackheads.
I felt two strong imperatives. First, keep the situation as absolutely relaxed
as possible. When they get nervous, they get scared. And when they get scared,
that’s when I accidentally get shot. But second, and more importantly: if they
want to kidnap my daughter, fight it at any cost whatsoever. Better to die on
that sidewalk than to let them take her.</p>
<p>I remember thinking, “take a deep breath, raise hands slowly, move smoothly,
stay relaxed, hand everything over.”  It worked. Who knows, maybe Andy from the
Headspace app saved my life. The bandits were gone, along with our two iPhones
and my watch. But the real fun was still to come.</p>
<p>After we got home, I logged into iCloud and put both phones in lost mode. They
had been turned off, predictably. Plenty of crooks have been caught by way of
“Find iPhone,” but they’ve learned by now. Thinking of my data in criminal hands
was uncomfortable, but the fact that iOS exploits <a href="https://www.zerodium.com/program.html" target="_blank" rel="noopener">sell for $1.5
million</a> made me feel a lot better. No small-timer is breaking into an
iPhone. I figured they would wipe it out and sell it.</p>
<p>I have two-factor authentication in all the accounts that matter, and whenever
possible my second factor is an iPhone app that generates time-based one-time
passwords (<a href="https://en.wikipedia.org/wiki/Time-based_One-time_Password_Algorithm" target="_blank" rel="noopener">TOTP</a>) for authentication. Google Authenticator is a popular app for
this, but I use <a href="https://itunes.apple.com/us/app/otp-auth-2step-auth-for-pros/id659877384" target="_blank" rel="noopener">OTP Auth</a> instead because it is more flexible (more on this in
the recommendations).  Here’s what it looks like, slightly sped up to make it
more exciting:</p>
<p><img src="/img/misc/iphone-robbery/otp-auth.gif" alt="Time-based one-time passwords" title="Time-based one-time passwords"></p>
<p>When it’s time to log into one of your accounts, you provide your login and
password as you normally would, plus the temporary code being shown by the app.</p>
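<p>Under the hood, these apps derive each code from a shared secret and the
current time. Here is a minimal sketch of the standard algorithm (RFC 6238 TOTP
built on RFC 4226 HOTP), not the code of any particular app; the secret shown is
the RFC’s published test value:</p>

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, step=30, digits=6, now=None):
    """Time-based one-time password (RFC 6238) over HOTP (RFC 4226)."""
    key = base64.b32decode(secret_b32.upper())
    # The moving factor is the current 30-second window number
    counter = int((time.time() if now is None else now) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # RFC 4226 dynamic truncation: low nibble of last byte picks an offset
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test secret ("12345678901234567890" in base32), time T = 59
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", now=59))  # -> 287082
```

<p>The server runs the same computation, so a matching code proves you hold the
secret without the secret ever crossing the network.</p>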
<p>I also use a password manager with unique, long passwords for each site.  So my
main concern at that point was minimizing the impact of this whole thing on my
kid.  We had dinner planned with friends, tasting menu at a good Japanese place,
so I thought it best to go, have a good time, laugh and hopefully cushion the
blow. Later I could call T-Mobile and suspend the cell lines.</p>
<h3 id="II-The-Hacking">II. The Hacking</h3>
<p>A couple of hours later we were back, much happier, imbued with friendship and,
in my case, plenty of sake. I opened Gmail and got some shockers:</p>
<p><img src="/img/misc/iphone-robbery/unfortunate-emails.png" alt="Facebook password reset email" title="A flurry of unfortunate emails"></p>
<p>Wuh-wait what? I wasn’t expecting to see any of these, but least of all the
Facebook password reset. Before you read on, take a good look at those emails.
It’s fun to work out what happened here. Done? Let’s dig in:</p>
<p><img src="/img/misc/iphone-robbery/fb-pw-reset-email.png" alt="Facebook password reset email" title="Facebook password reset"></p>
<p>Whoah! Facebook password reset by phone number? How? Did they unlock my phone?
But also… why? At once I felt the sinking realization that I had misread the
situation. They seemed more sophisticated than I thought - not the motorcycle crew
themselves, but someone else in the operation (his identity would be unmasked
later that night).</p>
<p>The idea that somebody was hacking into my accounts <em>right at that moment</em>, with
<em>my phone</em> in hand, was deeply unsettling. A malevolent twist to the emotional
roller coaster of that evening.  But this was a technical problem, so it was
time to sober up as best as I could and work methodically.</p>
<p>The “how” was simple. The attacker took the SIM card out of the stolen iPhone
and put it in another phone. At that point he found out my phone number, whereas
previously he had no information on me. More importantly, he could also receive
my SMS text messages. He then attempted to log into Facebook using my phone
number as a login, clicked on “Forgot Password,” and reset the password via SMS.</p>
<p>So here is a big screw up and a couple of lessons. As I said, I have 2FA
(two-factor authentication) in the accounts “that matter.” But I rarely use
Facebook, so I didn’t enable 2FA there. Oops. Turns out it’s not such a great
idea to have an account in the world’s most popular app as a weak link in your
defenses.</p>
<p>Now consider Facebook’s account recovery policies. If the account has 2FA
enabled, passwords can only be reset by email. That’s good. But <em>without</em> 2FA,
if an account has an associated phone number, the password can be reset via SMS.
In such a case, a SIM card is an instant ticket to the account: find it and
reset its password in one fell swoop.</p>
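<p>The difference boils down to a tiny decision tree. A hypothetical sketch of
a recovery policy like the one described (not Facebook’s actual code):</p>

```python
def allowed_reset_channels(account):
    """Hypothetical account-recovery policy like the one described above:
    with 2FA on, resets go through email only; without it, a phone number
    on file unlocks an SMS reset."""
    if account.get("2fa_enabled"):
        return ["email"]
    channels = ["email"]
    if account.get("phone_number"):
        channels.append("sms")  # a stolen SIM is enough to receive this
    return channels

# The weak link: no 2FA plus a phone number on file
print(allowed_reset_channels({"phone_number": "+15551234", "2fa_enabled": False}))
# -> ['email', 'sms']
```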
<p>That’s a disaster. Facebook single-handedly provides a way for attackers to go
from a SIM card, or hijacked SMS messages, to a trove of personal information
for the vast majority of people out there. By contrast, attackers made zero
progress in hacking my kid’s accounts, mostly because she doesn’t use Facebook.</p>
<p>But why the hell was this wretch logging into my FB account? I suspect it wasn’t
for my cousin’s mad political rants. Already shaken from the armed robbery,
my mind played tricks on me as paranoid thoughts of identity theft and
fraudulent bank transfers loomed.</p>
<p>I immediately logged into <a href="http://t-mobile.com" target="_blank" rel="noopener">t-mobile.com</a> and suspended both cell lines, disabling
the attacker’s main weapon. As an aside, T-Mobile has been great for
international travel. I love you guys, keep your website safe.  I tested sending
SMS messages to my suspended numbers and happily all attempts generated errors.</p>
<p>On to the other emails. The Facebook password reset arrived at 9:46pm Brazil
time. Curiously, at 11:23pm they briefly turned my phone back on with its SIM
card, and the phone went into lost mode and flashed on Find iPhone
<a href="https://goo.gl/maps/CWanq1At1ez" target="_blank" rel="noopener">here</a>.  But then there is that fourth email with a subject line of
“iPhone SE 64GB Silver Was Found!” arriving at 12:20am. Here it is:</p>
<p><img src="/img/misc/iphone-robbery/icloud-phishing-email.png" alt="iCloud phishing email" title="iCloud phishing attempt"></p>
<p>The phone model and storage capacity are exactly right. The spelling, grammar,
and layout are pretty well done. It was sent to my primary personal email,
lifted from Facebook. Imagine a regular user receiving this <em>right</em> after their
phone has been stolen, while they’re somewhat shaken, and when they’re not
native English speakers to boot. What are the odds they’ll realize this is
a phishing attempt for iCloud credentials?</p>
<p>Apart from checking the URL, the biggest clue is the exclamation mark in the
subject line, a little too enthusiastic for Apple. Either way, this is a nearly
perfect phishing piece, made more so by impeccable timing.  Maybe iCloud
accounts should be placed in some sort of restricted state after a device is put
into lost mode.</p>
<p>It’s stressful to face an ongoing, targeted, personal attack.  Deep breath
again. Time to methodically check every account for suspicious activity, change
passwords just in case, and recover compromised accounts. My main Google
account, protected by 2FA, was safe throughout the ordeal.  I reset my Facebook
password by email and got back in. GitHub, AWS, and other professional accounts
were also on 2FA and had no unauthorized activity. Audit logs never tasted so
sweet.  It was a relief knowing I wouldn’t have to tell clients, “Hey, how are
you? Great!  So, listen, this iPhone thieving ring probably has all your data,
isn’t that funny? Hah! But never mind that! Those Bitcoin prices, huh?”</p>
<p>Then I tried a secondary Gmail account I use for some mailing lists and other
non-critical tasks. You know… the kind of account for which one might leave
2FA disabled. Sure enough, the wretch had been there, and the password was
changed via an SMS password reset. And he had only <em>found</em> the account via the
phone number in the first place. Sound familiar? Here’s a quick recap:</p>
<p><img src="/img/misc/iphone-robbery/sms-hacking-diagram.jpg" alt="SMS hacking diagram" title="SMS hacking diagram"></p>
<p>This Gmail account did not have a recovery email set up, and ironically
I couldn’t use SMS anymore. Google offers a recovery algorithm where you try to
answer different questions with the ability to “Try another way” if you don’t
have a particular answer (quick: in which month and year did you create your
Google account?). I was locked out for a while, long enough to start thinking
I had lost the account, but eventually produced a couple of answers and got back
in.</p>
<p>Finally, all of my accounts were safe again. It was getting late, but I had to
find out why this person was frantically probing my accounts, and maybe, with
some luck, who they were.</p>
<p>I knew the data in the iPhones was safe, as per the Apple vs. government
showdown after the San Bernardino terrorist attack. But earlier I had assumed
the phones could still be wiped clean and used normally. Maybe they couldn’t,
and this whole rigmarole was about breaking into my iCloud account. Hence the
phishing.</p>
<p>A quick search confirmed the idea. Since iOS 7, released in 2013, Apple has
provided the Activation Lock feature, whereby if a device is linked to an iCloud
account, activating it requires the password to that account. This has created
some misery among people buying and selling used iPhones: if the seller forgets
to unlink the device from their iCloud account, the device is bricked until they
do so.</p>
<p>A warm wave of righteous schadenfreude washed over me: all the robbers had were
parts! They would fetch little money from this whole thing, especially since my
kid’s screen was cracked. You go, kid! Glad I hadn’t replaced it yet.  Also glad
for activation lock, though perversely my digital torment was its side effect:
the world is complicated. It turns out there was no sinister plot, just
a miserable scheme for a few hundred dollars.  Straight to the depths of hell is
where those cowards are going.</p>
<p>It was sobering to realize the attacker <em>almost</em> succeeded. Up until a few
months prior to the robbery, my iCloud account did not have 2FA enabled and it
used the compromised Gmail address as the recovery email.  If the robbery had
happened then, they would have been able to get in, unlink the phones, and sell
them at full (used) price. They would have changed my iCloud password in the
process, and might have erased and locked my <em>other</em> Apple devices for the hell
of it, which would have been disastrous and possibly ruinous, depending on
whether I could get back into the account.  Whatever little data I have in
iCloud would have been stolen as well.</p>
<p>I hope this motivates you to enable 2FA on <em>all</em> of your accounts, even the
unimportant ones. They can interact in incremental and unexpected ways to become
your undoing.  Moreover, using TOTP apps as the second factor is far safer than
SMS.</p>
<p>Apple has done a fantastic job with iOS security and Find iPhone, curbing
everything from malware to exploits to theft.  But further improvements can be
made to better protect its customers.  In the next post you’ll see week-long
sustained hacking attempts and meet the maggot behind the attacks, operating in
a wretched hive of “iPhone unlockers.”</p>
<p><a name="recommendations"></a></p>
<h3 id="III-Recommendations">III. Recommendations</h3>
<ul>
<li>
<p>Make sure your accounts cannot be hacked via text message (SMS) password
resets. You can often do so by enabling two-factor authentication (2FA) for an
account, particularly if you use a time-based one-time password (TOTP) app
as the second factor. Two such apps are Google Authenticator and <a href="https://itunes.apple.com/us/app/otp-auth-2step-auth-for-pros/id659877384" target="_blank" rel="noopener">OTP Auth</a>.
You could also withhold your phone number from certain accounts.  Another
advantage of TOTP is that if you’re unable to receive SMS messages for
whatever reason, you can still log in.</p>
</li>
<li>
<p>Beware of your unprotected “less critical” accounts. They might provide a path
to your sensitive ones.</p>
</li>
<li>
<p>If you decide to go with a TOTP app, choose one that allows you to make an
encrypted backup of your account secrets. <a href="https://itunes.apple.com/us/app/otp-auth-2step-auth-for-pros/id659877384" target="_blank" rel="noopener">OTP Auth</a> provides that along with
encrypted iCloud sync, all optional and controlled by the user. <a href="https://authy.com/" target="_blank" rel="noopener">Authy</a> is
another good option. If you use Google Authenticator, make sure losing your
phone won’t lock you out of any accounts.</p>
</li>
<li>
<p>If you design apps, be careful with password resets via SMS. SIM cards are an
easy target, cell providers are subject to social engineering that could lead
to intercepted messages<sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup>, and SMS notifications can be seen on
lock screens in most phones.  Allow users to choose TOTP as a second factor.</p>
</li>
<li>
<p>If your iPhone is lost or stolen, go to <a href="http://iCloud.com" target="_blank" rel="noopener">iCloud.com</a> immediately, put it in lost
mode, and provide a phone number where you can be reached. Once you’ve done
that, you might want to temporarily suspend your phone line (many carriers
offer this on their websites). If you do so, you can no longer call your own
phone, and unless it’s on wifi, you also can’t “Play Sound” or “Erase iPhone”
via iCloud - keep that in mind. On the upside, nobody can use your SIM
card to hack your accounts via SMS password reset. It’s a trade-off.</p>
</li>
<li>
<p>If your iPhone is definitely stolen, rather than lost, it will probably appear
off in iCloud.  Put it in lost mode anyway. If you provide a phone number,
know that it might be targeted for iCloud phishing or social engineering as
crooks try to hack into your iCloud account (that’s why the attacker briefly
turned my phone on: to get a phone number to target). You almost surely want
to suspend your cell service immediately. You lose the tracking and other
goodies, but thieves generally know to keep the phone off, and handing them
a working SIM card is fraught with peril. Tread carefully and may the force be
with you.</p>
</li>
<li>
<p>You might want to protect your SIM card with a PIN. This requires you to enter
the PIN whenever you turn your phone on. Attackers are thus unable to
transplant your SIM card to another device and use it. However, if you lose
your iPhone and the battery dies, or the person who finds it turns it off,
it’s game over. Even if the phone is later turned on, it won’t connect to
the Internet, enter Lost Mode, show a phone number where you can be reached,
or “Play Sound” (this is true even if a known wifi is in
range<sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup>). A phone that otherwise might have been found could
be bricked and lost forever.</p>
</li>
<li>
<p>Beware of “Your iPhone was found” emails, text messages, and WhatsApp
messages. Scammers attempt to phish for your iCloud credentials in devious
ways <em>soon</em> after an iPhone is stolen. If you provided a lost mode phone
number, thieves will attempt to use it against you while trying to break into
your iCloud account.</p>
</li>
<li>
<p>Read Tech Solidarity’s <a href="https://techsolidarity.org/resources/basic_security.htm" target="_blank" rel="noopener">security guide</a>. It’s overkill for a regular user, but
know the rules before breaking them.</p>
</li>
</ul>
<p>Thank you for reading.</p>
<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p>If you’re interested in US criminal justice,
<a href="https://www.amazon.com/Ghettoside-True-Story-Murder-America/dp/0385529996" target="_blank" rel="noopener">Ghettoside</a> is a great book with better-than-fiction LA detective stories
interwoven with a serious discussion of criminality, murder clearance rates,
and other pressing topics.  <a href="https://www.amazon.com/New-Jim-Crow-Incarceration-Colorblindness/dp/1595586431" target="_blank" rel="noopener">The New Jim Crow</a> by Michelle Alexander is an
interesting read on mass incarceration, while Bryan Stevenson’s <a href="https://www.amazon.com/Just-Mercy-Story-Justice-Redemption/dp/081298496X" target="_blank" rel="noopener">Just Mercy</a>
offers a piercing look at the injustices we sometimes create.  Ezra Klein has
a <a href="https://soundcloud.com/ezra-klein-show/bryan-stevenson-on-why-the" target="_blank" rel="noopener">good interview</a> with Stevenson.  Glenn Loury’s
<a href="https://www.samharris.org/podcast/item/racism-and-violence-in-america" target="_blank" rel="noopener">interview</a> with Sam Harris offers a somewhat different
perspective. <a href="#fnref1" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn2" class="footnote-item"><p>VICE <a href="https://motherboard.vice.com/en_us/article/wjx3e4/t-mobile-website-allowed-hackers-to-access-your-account-data-with-just-your-phone-number" target="_blank" rel="noopener">reported</a> on a T-Mobile website bug
that leaked personal data based on a phone number alone, giving social
engineers a leg up. But this type of attack has worked against multiple
carriers all over the <a href="https://www.theguardian.com/money/2015/sep/26/sim-swap-fraud-mobile-phone-vodafone-customer" target="_blank" rel="noopener">world</a>. Prominent
<a href="https://www.wired.com/2016/06/deray-twitter-hack-2-factor-isnt-enough/" target="_blank" rel="noopener">hacks</a> have happened this way. <a href="#fnref2" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn3" class="footnote-item"><p>You can try this at home: turn your iPhone off and back
on. Until the passcode is entered at least once, it won’t connect to wifi. See
<a href="https://www.jamf.com/jamf-nation/discussions/13523/enable-wi-fi-on-device-locked-with-passcode" target="_blank" rel="noopener">this</a>. If you have a better link, please let me
know. <a href="#fnref3" class="footnote-backref">↩︎</a></p>
</li>
</ol>
</section>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;(Some security recommendations are summarized at the &lt;a href=&quot;#recommendations&quot;&gt;end&lt;/a&gt;.)&lt;/p&gt;
&lt;h3 id=&quot;I-The-Robb
      
    
    </summary>
    
      <category term="security" scheme="https://manybutfinite.com/category/security/"/>
    
      <category term="personal" scheme="https://manybutfinite.com/category/personal/"/>
    
    
  </entry>
  
  <entry>
    <title>Goto and the folly of dogma</title>
    <link href="https://manybutfinite.com/post/goto-and-the-folly-of-dogma/"/>
    <id>https://manybutfinite.com/post/goto-and-the-folly-of-dogma/</id>
    <updated>2018-01-11T19:00:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>Many programmers are surprised to find out that the <code>goto</code> statement is still
widely used in modern, high-quality codebases. Here are some examples, using the
first codebases that come to mind:</p>
<table class="table"><thead>
<tr>
<th>Repo</th>
<th><code>goto</code> usages</th>
<th>ratio to <code>continue</code></th>
</tr>
</thead>
<tbody>
<tr>
<td>Linux kernel</td>
<td><a href="https://grokbit.com/torvalds/linux/ref/master?keyword=goto" target="_blank" rel="noopener">150k</a></td>
<td><a href="https://grokbit.com/_s/gduarte/goto-statement/linux" target="_blank" rel="noopener">6.27</a></td>
</tr>
<tr>
<td>.net CLR</td>
<td><a href="https://grokbit.com/dotnet/coreclr/ref/master?keyword=goto" target="_blank" rel="noopener">5k</a></td>
<td><a href="https://grokbit.com/_s/gduarte/goto-statement/clr" target="_blank" rel="noopener">2.13</a></td>
</tr>
<tr>
<td>git</td>
<td><a href="https://grokbit.com/git/git/ref/master?keyword=goto" target="_blank" rel="noopener">960</a></td>
<td><a href="https://grokbit.com/_s/gduarte/goto-statement/git" target="_blank" rel="noopener">0.76</a></td>
</tr>
<tr>
<td>Python runtime</td>
<td><a href="https://grokbit.com/python/cpython/ref/master?keyword=goto" target="_blank" rel="noopener">5k</a></td>
<td><a href="https://grokbit.com/_s/gduarte/goto-statement/cpython" target="_blank" rel="noopener">16.9</a></td>
</tr>
<tr>
<td>Redis</td>
<td><a href="https://grokbit.com/antirez/redis/ref/unstable?keyword=goto" target="_blank" rel="noopener">554</a></td>
<td><a href="https://grokbit.com/_s/gduarte/goto-statement/redis" target="_blank" rel="noopener">2.14</a></td>
</tr>
</tbody>
</table>
<p>The ratio to usages of the <code>continue</code> keyword is provided to normalize for lines
of code and the prevalence of loops in the code.  This is not limited to C code
bases.  Lucene<span>.</span>net for example has <a href="https://grokbit.com/_s/gduarte/goto-statement/lucenenet" target="_blank" rel="noopener">1,511</a> <code>goto</code> usages
and a ratio of 3 <code>goto</code> usages to each <code>continue</code> usage. The C# compiler,
written itself in C#, clocks in at <a href="https://grokbit.com/_s/gduarte/goto-statement/roslyn" target="_blank" rel="noopener">297</a> <code>goto</code> usages and a ratio of 0.22.</p>
<p>People who take “goto is evil” as dogma will point out that each of these usages
could be rewritten as gotoless alternatives. And that’s true, of course. But it
will come at a price: duplication of code, introduction of flags, several
<code>if</code> statements, and overall added complexity. These are highly reviewed
codebases written by talented people. When they use <code>goto</code>, it’s because they
find it to be the simplest approach.</p>
<p>This is exactly how dogma hurts software development. We take a sensible rule
that works most of the time and promote it to sacred edict, deeming violators as
inferior programmers, producers of unclean code.  Thus something that would have
been a helpful guideline becomes a hard constraint.  Pile up enough of these,
and code that could have been simple ends up in a tangled mess, all in the name
of “purity.”</p>
<p>We have a long tradition of dogmas, but <code>goto</code> is the seminal example, denounced
in Edsger Dijkstra’s famous letter, <a href="http://sci-hub.tw/10.1145/362929.362947" target="_blank" rel="noopener">Go To Statement Considered Harmful</a>. Just
barely over a page, it’s a good case study. The letter is good advice in the
vast majority of cases: misuse of <code>goto</code> will quickly land you in a maze of
twisty little passages, all alike.  Less helpful were the creation of a social
taboo (goto is the province of inferior programmers) and the absolutist calls
for abolition. Dijkstra himself came to regret how “others were making
a religion” out of his position, as quoted in Donald Knuth’s more level-headed
paper, <a href="http://sci-hub.tw/10.1145/356635.356640" target="_blank" rel="noopener">Structured Programming with go to Statements</a>.</p>
<p>Taboos tend to accrete over time. For example, overzealous object-oriented
design has produced a lot of lasagna code (too many layers) and a tendency
towards overly complex designs. Chasing semantic markup purity, we sometimes
resorted to hideous and even unreliable CSS hacks when much simpler solutions
were available in HTML. Now, with microservices, people sometimes break up
a trivial app into a hard-to-follow spiderweb of components. Again, these are
cases of people mistaking a valuable guideline for an end in itself.  Always keep
a hard-nosed pragmatic aim at the real goals: simplicity, clarity, generality.</p>
<p>When Linus Torvalds started the Linux kernel in 1991, the dogma was that
“monolithic” kernels were obsolete and that microkernels, a message-passing
alternative analogous to microservices, were the only way to build a new OS.
GNU had been working on microkernel designs since 1986. Torvalds, a pragmatist
if there ever was one, tossed out this orthodoxy to build Linux using the much
simpler monolithic design. Seems to have worked out.</p>
<p>Every programmer pays lip service to simplicity. But when push comes to shove,
most will readily give up simplicity to satisfy dogma. We should be willing to
break generic rules when the circumstances call for it.  Keep it simple.</p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;Many programmers are surprised to find out that the &lt;code&gt;goto&lt;/code&gt; statement is still
widely used in modern, 
      
    
    </summary>
    
      <category term="programming" scheme="https://manybutfinite.com/category/programming/"/>
    
    
  </entry>
  
  <entry>
    <title>Grokbit</title>
    <link href="https://manybutfinite.com/post/launching-grokbit/"/>
    <id>https://manybutfinite.com/post/launching-grokbit/</id>
    <updated>2016-06-21T18:30:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>TLDR: I launched <a href="http://grokbit.com" target="_blank" rel="noopener">Grokbit</a>, a code search and browsing tool.</p>
<p>When I was programming as a kid, I longed for a <em>hardcore codebase</em>, like
a compiler or operating system, to <em>really</em> understand computers. That stuff
just sounded so magical, some kind of Elvish secret way beyond mortals.  There
were decent books explaining how things worked, but that’s a poor substitute for
code.</p>
<p>Then the Internet reached Brazil when I was about 14, and all of a sudden there
was this “GNU C” compiler that was allegedly better than Microsoft’s, and the
code was <em>completely open!</em> And what’s more, there was an open Unix you could
run on your 386! No need to convince your parents to sell the car and buy
a SPARCstation!</p>
<p>This was the best present any kid could hope for. This is why, despite a lot of
issues in the tech community, my gratitude to these people is overwhelming. You
might think Stallman is a lunatic, and you might be right, but damn - he’s the
geek black-bearded Santa who brought the source to the children.</p>
<p>Now, reading this code was hard. The kernel, in particular, is tough to follow
because entry points and flow of execution are unclear. There was no tmux,
I didn’t know Vim, my regexes were weak. So I printed out a whole bunch of code
on my parents’ dot-matrix printer and spread it on the floor. A poor man’s
multi-pane Vim session. Here’s an interrupt handler, there’s a “bottom half”,
and hey, look, the syscall is right over there by the socks.</p>
<p>I think reading code is second only to writing code in making you a better
programmer. When I write stuff like <a href="/post/anatomy-of-a-program-in-memory/">Anatomy of a program in memory</a> or
<a href="/post/what-does-an-idle-cpu-do/">What does an idle CPU do?</a>, a big part of the kick is sharing what
I think are beautiful designs with people who haven’t seen them before.</p>
<p>Still, I always wished I could give readers a better interface to actually dive
into the code. But our tools for handling an entire code base, especially in the
browser, are just not good enough at the moment. Searching also needs a lot of
improvement: I think we can do better than regexes and general full-text
searching when it comes to searching code.</p>
<p>That’s why I built <a href="http://grokbit.com" target="_blank" rel="noopener">Grokbit</a>. It has an indexing and search engine that’s
entirely tailored to code, so you can search semantically, like “give me the
definition of foo” or “search for an identifier named bar”. It’s also wicked
fast: you get real-time suggestions even in the largest repos I could try.</p>
<p>But searching was only half the battle. Having a rich UI, especially
a multi-pane one, was an absolute requirement for me. When you have function
A calling B calling C, it can be <em>enormously</em> helpful to have all three on
the screen at once and navigate seamlessly. Plus being able to load multiple
large files, having back/forward in the browser work well, and an overall
smooth UX.</p>
<p>So that’s what I went for with Grokbit.  It’s still crude, lots of low hanging
fruit, but it has already been very useful, as you’ll be able to tell in future
blog posts. But before I put more weekends into it, I’d like some feedback. Try
it out, let me know what could be better, or which search features I should
build sooner (many easy wins here). I hope it’s as fun to use as it was to
build.</p>
<p>Finally, if you are interested in working on the project, reach out. I don’t
know if this will become a SaaS app, or a feature in another product, or an open
source project, but I am hellbent on solving this problem. Let’s carry the ring
into Mordor.</p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;TLDR: I launched &lt;a href=&quot;http://grokbit.com&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Grokbit&lt;/a&gt;, a code search and brow
      
    
    </summary>
    
      <category term="programming" scheme="https://manybutfinite.com/category/programming/"/>
    
    
  </entry>
  
  <entry>
    <title>Home Row Computing on Macs</title>
    <link href="https://manybutfinite.com/post/home-row-computing-on-mac/"/>
    <id>https://manybutfinite.com/post/home-row-computing-on-mac/</id>
    <updated>2014-11-24T16:15:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>For a number of years I’ve configured my desktops so that most tasks can be done
using only home row keys on the keyboard, a technique I call <strong>home row
computing</strong>. It takes the Vi idea of staying on the home row to every app, all
the time, but <em>without using modes</em> so things are simpler.</p>
<p>I’ve described an implementation for <a href="/post/home-row-computing/">Windows</a>, but I have since
moved to Macs and back to a qwerty keyboard (away from Dvorak). The current
setup is described in this post. It uses familiar Vi key bindings and is far more
suitable.  It’s fairly painless to configure on the Mac and has never given me
any problems, thanks to <a href="https://github.com/tekezo" target="_blank" rel="noopener">Takayama Fumihiko</a>’s awesome keyboard apps.</p>
<p>Using this is a joy. It’s <em>really</em> fast, easy on the hands, and makes you feel
like a geek god. If you don’t use Vim, you’ll now have one of its benefits in
your favorite editor <em>and</em> in other apps, plus a weapon against smug Vimmers. If
you already use Vim, your cherished <code>hjkl</code> keys become universal and pressing
<code>Esc</code> gets a hell of a lot easier.</p>
<p>Some of the important keys that must be moved to home row are the arrow keys,
<code>Esc</code>, <code>delete</code> (backspace) and <code>forward delete</code>. Another helpful home row
task is moving and resizing windows.  The key to all this is remapping <code>Caps Lock</code> to allow combinations of <code>Caps Lock</code> plus a home key to do these tasks.
Again, there are <em>no modes</em> involved here, <code>Caps Lock</code> works as a modifier like
the <code>cmd</code> and <code>fn</code> keys. Here’s a good start:</p>
<img src="/img/screenshots/mac-keyboard.png" class="center">
<p>I have left several keys unmapped so you can customize your own setup, and we’ll
get to window management in a moment. The first step is to set <code>Caps Lock</code> to
<code>No Action</code> in <code>System Preferences > Keyboard > Modifier keys</code>:</p>
<img src="/img/screenshots/disable-capslock.png" class="center">
<p>Now we must remap the <code>Caps Lock</code> key code to something else. To do so, you need
a small tool called <a href="https://pqrs.org/osx/karabiner/seil.html.en" target="_blank" rel="noopener">Seil</a>
(<a href="https://github.com/tekezo/Seil" target="_blank" rel="noopener">open source</a>). You can map <code>Caps Lock</code> to any
other key, like <code>cmd</code> or <code>option</code>. So if you don’t want to go all-out home row,
you can still benefit from the remapping.</p>
<p>I like to remap <code>Caps Lock</code> into something that guarantees <em>no conflicts ever</em>
for our combos. So I use key code 110, which is the Apps key on a Windows
keyboard and is safely absent from Apple keyboards:</p>
<img src="/img/screenshots/seil.png" class="center">
<p>Now we’re in business: the world - or at least the keyboard - is our oyster. The
maker of Seil also makes <a href="https://pqrs.org/osx/karabiner/" target="_blank" rel="noopener">Karabiner</a>,
<a href="https://github.com/tekezo/Karabiner" target="_blank" rel="noopener">open source</a> as well, and an <em>outstanding</em>
keyboard customizer for OS X. I have no affiliation with these tools, apart from
being a happy user for years. If you end up using them, please <a href="https://pqrs.org/osx/karabiner/donation.html.en" target="_blank" rel="noopener">donate</a>. So go
ahead and install Karabiner, and you’ll see a plethora of keyboard tweak
possibilities:</p>
<img src="/img/screenshots/karabiner.png" class="center">
<p>Each of the tweaks can be toggled on and off. There are even native Vi, Vim, and
Emacs modes. However, I don’t like the built-in ones, so I built my own config.
Go to <code>Misc & Uninstall</code> and click <code>Open private.xml</code>:</p>
<img src="/img/screenshots/customKarabiner.png" class="center">
<p>In this file, <code>~/Library/Application Support/Karabiner/private.xml</code>, you can
define your own keyboard remapping scheme. I actually symlink that to
a Dropbox file to keep the configuration consistent across my machines, but
at any rate, <a href="https://github.com/gduarte/blog/blob/master/code/misc/private.xml" target="_blank" rel="noopener">here</a> is a file you can use to implement what we have
discussed so far. Drop the file in, click <code>ReloadXML</code> and you’ll have this:</p>
<img src="/img/screenshots/reloadXML.png" class="center">
<p>Home Row Computing is at the top (prefixed with ! for sorting). Toggle it on,
and you’re done. Enjoy your new keyboard layout, do a search on Spotlight and
see how fast and smooth it is to choose an option.</p>
<p>Finally, there is window management. That’s an area where you can fumble quite
a bit, resizing and moving about clumsily with a mouse. My favorite options to
make it fast and homerow-friendly are
<a href="https://github.com/fikovnik/ShiftIt" target="_blank" rel="noopener">ShiftIt</a> (open source) and
<a href="http://manytricks.com/moom/" target="_blank" rel="noopener">Moom</a> (best $10 I ever spent, no affiliation).
There are some others, but to me Moom towers above the rest. It has a great
two-step usage, where one hot key activates it:</p>
<img src="/img/screenshots/moom.png" class="center">
<p>And the following key triggers a command <em>you</em> get to define using window
primitives like move, zoom, resize, and change monitors. You can also define
shortcuts that run commands directly. Moom has some handy default actions:</p>
<img src="/img/screenshots/moomDefaults.png" class="center">
<p>Out of the box, arrow keys can be used to send a window to the left, right, top,
or bottom of the screen, and Moom natively interprets hjkl as arrows, making it easy
to stay on home row. You can associate keys with various commands and precise
window positions:</p>
<img src="/img/screenshots/moomCustom.png" class="center">
<p>This is gold for large monitors like Apple Thunderbolts.
I remap <code>Caps Lock</code> + <code>M</code> into the global Moom shortcut for painless activation.
This allows me to set the shortcut itself to something bizarre that won’t
conflict with anything but would be a dog to type. Currently it’s an
improbable <code>Fn</code> + <code>Control</code> + <code>Command</code> + <code>M</code>.
I also have <code>Caps Lock</code> + <code>N</code> activating a Moom command that cycles a window
between my two monitors. Both of these shortcuts are in the keyboard map
I provided.</p>
<p>If you have any questions, let me know. I know a number of keyboard nuts out
there use this scheme on Windows and Linux, and I hope this makes it easy to do
so on Macs.</p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;For a number of years I’ve configured my desktops so that most tasks can be done
using only home row keys on the
      
    
    </summary>
    
      <category term="productivity" scheme="https://manybutfinite.com/category/productivity/"/>
    
    
  </entry>
  
  <entry>
    <title>System Calls Make the World Go Round</title>
    <link href="https://manybutfinite.com/post/system-calls/"/>
    <id>https://manybutfinite.com/post/system-calls/</id>
    <updated>2014-11-07T00:00:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>I hate to break it to you, but a user application is a helpless brain in a vat:</p>
<img id="appInVat" class="center" src="/img/os/appInVat.png">
<p><em>Every</em> interaction with the outside world is mediated by the kernel through
<strong>system calls</strong>. If an app saves a file, writes to the terminal, or opens a TCP
connection, the kernel is involved. Apps are regarded as highly suspicious: at
best a bug-ridden mess, at worst the malicious brain of an evil genius.</p>
<p>These system calls are function calls from an app into the kernel. They use
a specific mechanism for safety reasons, but really you’re just calling the
kernel’s API. The term “system call” can refer to a specific function offered by
the kernel (<em>e.g.</em>, the <code>open()</code> system call) or to the calling mechanism.  You
can also say <strong>syscall</strong> for short.</p>
<p>This post looks at system calls, how they differ from calls to a library, and
tools to poke at this OS/app interface.  A solid understanding of what happens
<em>within an app</em> versus what happens through the OS can turn an impossible-to-fix
problem into a quick, fun puzzle.</p>
<p>So here’s a running program, a <em>user process</em>:</p>
<img id="sandbox" class="center" src="/img/os/sandbox.png">
<p>It has a private <a href="/post/anatomy-of-a-program-in-memory">virtual address space</a>, its very own memory sandbox.
The vat, if you will.  In its address space, the program’s binary file plus the
libraries it uses are all <a href="/post/page-cache-the-affair-between-memory-and-files/">memory mapped</a>.  Part of the address
space maps the kernel itself.</p>
<p>Below is the code for our program, <code>pid</code>, which simply retrieves its process id
via <a href="http://linux.die.net/man/2/getpid" target="_blank" rel="noopener">getpid(2)</a>:</p>
<figure class="highlight c"><figcaption><span>pid.c</span><a href="/code/x86-os/pid.c">view raw</a></figcaption><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/types.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><unistd.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">    <span class="keyword">pid_t</span> p = getpid();</span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">"%d\n"</span>, p);</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>In Linux, a process isn’t born knowing its PID. It must ask the kernel, so this
requires a system call:</p>
<img id="syscallEnter" class="center" src="/img/os/syscallEnter.png">
<p>It all starts with a call to the C library’s <a href="https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/getpid.c;h=937b1d4e113b1cff4a5c698f83d662e130d596af;hb=4c6da7da9fb1f0f94e668e6d2966a4f50a7f0d85#l49" target="_blank" rel="noopener">getpid()</a>, which is
a <em>wrapper</em> for the system call. When you call functions like <code>open(2)</code>,
<code>read(2)</code>, and friends, you’re calling these wrappers. This is true for many
languages where the native methods ultimately end up in libc.</p>
<p>Wrappers offer convenience atop the bare-bones OS API, helping keep the kernel
lean. Lines of code are where bugs live, and <em>all kernel code</em> runs in privileged
mode, where mistakes can be disastrous.  Anything that can be done in user mode
should be done in user mode.  Let the libraries offer friendly methods and fancy
argument processing a la <code>printf(3)</code>.</p>
<p>Compared to web APIs, this is analogous to building the simplest possible HTTP
interface to a service and then offering language-specific libraries with
helper methods. Or maybe some caching, which is what libc’s
<code>getpid()</code> does: when first called it actually performs a system
call, but the PID is then cached to avoid the syscall overhead in subsequent
invocations.</p>
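<p>As a sketch of the idea (this is not glibc’s actual code, and the function name
is made up), a caching wrapper only crosses into the kernel once:</p>

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Sketch of a caching getpid() wrapper, in the spirit of glibc's
 * (not its actual code): the first call performs a real system call,
 * later calls return the saved value without entering the kernel.
 * Zero works as the "not yet fetched" sentinel because no userland
 * process has PID 0. */
static pid_t cached_pid;

pid_t my_getpid(void)
{
    if (cached_pid == 0)
        cached_pid = (pid_t)syscall(SYS_getpid); /* the one real syscall */
    return cached_pid;
}
```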
<p>Once the wrapper has done its initial work it’s time to jump into
<del>hyperspace</del> the kernel.  The mechanics of this transition vary by
processor architecture.  In Intel processors, arguments and the
<a href="https://github.com/torvalds/linux/blob/v3.17/arch/x86/syscalls/syscall_64.tbl#L48" target="_blank" rel="noopener">syscall number</a> are <a href="https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sysdep.h;h=4a619dafebd180426bf32ab6b6cb0e5e560b718a;hb=4c6da7da9fb1f0f94e668e6d2966a4f50a7f0d85#l139" target="_blank" rel="noopener">loaded into registers</a>,
then an <a href="https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sysdep.h;h=4a619dafebd180426bf32ab6b6cb0e5e560b718a;hb=4c6da7da9fb1f0f94e668e6d2966a4f50a7f0d85#l179" target="_blank" rel="noopener">instruction</a> is executed to put the CPU
in <a href="/post/cpu-rings-privilege-and-protection">privileged mode</a> and immediately transfer control to a global syscall
<a href="https://github.com/torvalds/linux/blob/v3.17/arch/x86/kernel/entry_64.S#L354-L386" target="_blank" rel="noopener">entry point</a> within the kernel. If you’re interested in
details, David Drysdale has two great articles in LWN (<a href="http://lwn.net/Articles/604287/" target="_blank" rel="noopener">first</a>,
<a href="http://lwn.net/Articles/604515/" target="_blank" rel="noopener">second</a>).</p>
<p>The kernel then uses the syscall number as an <a href="https://github.com/torvalds/linux/blob/v3.17/arch/x86/kernel/entry_64.S#L422" target="_blank" rel="noopener">index</a> into
<a href="https://github.com/torvalds/linux/blob/v3.17/arch/x86/kernel/syscall_64.c#L25" target="_blank" rel="noopener">sys_call_table</a>, an array of function pointers to each syscall implementation.
Here, <a href="https://github.com/torvalds/linux/blob/v3.17/kernel/sys.c#L800-L809" target="_blank" rel="noopener">sys_getpid</a> is called:</p>
<img id="syscallExit" class="center" src="/img/os/syscallExit.png">
<p>In Linux, syscall implementations are mostly arch-independent C functions,
sometimes <a href="https://github.com/torvalds/linux/blob/v3.17/kernel/sys.c#L800-L859" target="_blank" rel="noopener">trivial</a>, insulated from the syscall mechanism by
the kernel’s excellent design. They are regular code working on general data
structures. Well, apart from being <em>completely paranoid</em> about argument
validation.</p>
<p>Once their work is done they <code>return</code> normally, and the arch-specific code takes
care of transitioning back into user mode where the wrapper does some post
processing.  In our example, <a href="http://linux.die.net/man/2/getpid" target="_blank" rel="noopener">getpid(2)</a> now caches the PID returned by the
kernel. Other wrappers might set the global <code>errno</code> variable if the kernel
returns an error. Small things to let you know GNU cares.</p>
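<p>A tiny example makes the <code>errno</code> dance concrete (the helper name and
path are illustrative): ask <code>open(2)</code> for a file that isn’t there, and the
error code the kernel returned shows up in <code>errno</code>:</p>

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* The kernel returns a negative error code from the syscall; the
 * open(2) wrapper converts that into a -1 return value and stores
 * the code in errno for us. */
int open_error_code(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return errno;   /* e.g. ENOENT for a missing file */
    close(fd);
    return 0;
}
```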
<p>If you want to be raw, glibc offers the <a href="http://linux.die.net/man/2/syscall" target="_blank" rel="noopener">syscall(2)</a> function, which makes
a system call without a wrapper.  You can also do so yourself in assembly.
There’s nothing magical or privileged about a C library.</p>
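<p>As an illustration (x86-64 Linux only, and a sketch rather than something
you’d ship), here is <code>getpid</code> invoked straight from inline assembly, following
the convention described above: syscall number into <code>rax</code>, execute the
<code>syscall</code> instruction, result back in <code>rax</code>:</p>

```c
#include <sys/syscall.h>

/* x86-64 Linux only: load the syscall number into rax, execute the
 * syscall instruction, and the kernel's return value comes back in
 * rax. The kernel clobbers rcx and r11, so we tell the compiler. */
long raw_getpid(void)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_getpid)
                      : "rcx", "r11", "memory");
    return ret;
}
```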
<p>This syscall design has far-reaching consequences. Let’s start with the
incredibly useful <a href="http://linux.die.net/man/1/strace" target="_blank" rel="noopener">strace(1)</a>, a tool you can use to spy on system calls made by
Linux processes (on Macs, see <a href="https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/dtruss.1m.html" target="_blank" rel="noopener">dtruss(1m)</a> and the amazing <a href="http://dtrace.org/blogs/brendan/2011/10/10/top-10-dtrace-scripts-for-mac-os-x/" target="_blank" rel="noopener">dtrace</a>; on Windows,
see <a href="http://technet.microsoft.com/en-us/sysinternals/bb842062.aspx" target="_blank" rel="noopener">sysinternals</a>). Here’s strace on <code>pid</code>:</p>
<figure class="highlight console"><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line">~/code/x86-os$ strace ./pid</span><br><span class="line"></span><br><span class="line">execve("./pid", ["./pid"], [/* 20 vars */]) = 0</span><br><span class="line">brk(0)                                  = 0x9aa0000</span><br><span class="line">access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)</span><br><span class="line">mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7767000</span><br><span class="line">access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)</span><br><span class="line">open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3</span><br><span class="line">fstat64(3, {st_mode=S_IFREG|0644, st_size=18056, ...}) = 0</span><br><span class="line">mmap2(NULL, 18056, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7762000</span><br><span class="line">close(3)                                = 0</span><br><span class="line"></span><br><span class="line">[...snip...]</span><br><span class="line"></span><br><span class="line">getpid()                                = 14678</span><br><span class="line">fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 1), ...}) = 0</span><br><span class="line">mmap2(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7766000</span><br><span class="line">write(1, "14678\n", 614678</span><br><span class="line">)                  = 6</span><br><span class="line">exit_group(6)                           = ?</span><br></pre></td></tr></tbody></table></figure>
<p>Each line of output shows a system call, its arguments, and a return value.
If you put <code>getpid(2)</code> in a loop running 1000 times, you would still have only
one <code>getpid()</code> syscall because of the PID caching.  We can also see that
<code>printf(3)</code> calls <code>write(2)</code> after formatting the output string.</p>
<p><code>strace</code> can start a new process and also attach to an already running one.  You
can learn a lot by looking at the syscalls made by different programs.  For
example, what does the <code>sshd</code> daemon do all day?</p>
<figure class="highlight console"><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line">~/code/x86-os$ ps ax | grep sshd</span><br><span class="line">12218 ?        Ss     0:00 /usr/sbin/sshd -D</span><br><span class="line"></span><br><span class="line">~/code/x86-os$ sudo strace -p 12218</span><br><span class="line">Process 12218 attached - interrupt to quit</span><br><span class="line">select(7, [3 4], NULL, NULL, NULL</span><br><span class="line"></span><br><span class="line">[</span><br><span class="line">  ... 
nothing happens ...</span><br><span class="line">  No fun, it's just waiting for a connection using select(2)</span><br><span class="line">  If we wait long enough, we might see new keys being generated and so on, but</span><br><span class="line">  let's attach again, tell strace to follow forks (-f), and connect via SSH</span><br><span class="line">]</span><br><span class="line"></span><br><span class="line">~/code/x86-os$ sudo strace -p 12218 -f</span><br><span class="line"></span><br><span class="line">[lots of calls happen during an SSH login, only a few shown]</span><br><span class="line"></span><br><span class="line">[pid 14692] read(3, "-----BEGIN RSA PRIVATE KEY-----\n"..., 1024) = 1024</span><br><span class="line">[pid 14692] open("/usr/share/ssh/blacklist.RSA-2048", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)</span><br><span class="line">[pid 14692] open("/etc/ssh/blacklist.RSA-2048", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)</span><br><span class="line">[pid 14692] open("/etc/ssh/ssh_host_dsa_key", O_RDONLY|O_LARGEFILE) = 3</span><br><span class="line">[pid 14692] open("/etc/protocols", O_RDONLY|O_CLOEXEC) = 4</span><br><span class="line">[pid 14692] read(4, "# Internet (IP) protocols\n#\n# Up"..., 4096) = 2933</span><br><span class="line">[pid 14692] open("/etc/hosts.allow", O_RDONLY) = 4</span><br><span class="line">[pid 14692] open("/lib/i386-linux-gnu/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = 4</span><br><span class="line">[pid 14692] stat64("/etc/pam.d", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0</span><br><span class="line">[pid 14692] open("/etc/pam.d/common-password", O_RDONLY|O_LARGEFILE) = 8</span><br><span class="line">[pid 14692] open("/etc/pam.d/other", O_RDONLY|O_LARGEFILE) = 4</span><br></pre></td></tr></tbody></table></figure>
<p>SSH is a large chunk to bite off, but it gives a feel for strace usage.  Being
able to see which files an app opens can be useful (“where the hell is this
config coming from?”). If you have a process that appears stuck, you can strace
it and see what it might be doing via system calls. When some app is quitting
unexpectedly without a proper error message, check if a syscall failure explains
it. You can also use filters, time each call, and so on:
<figure class="highlight console"><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">~/code/x86-os$ strace -T -e trace=recv curl -silent www.google.com. > /dev/null</span><br><span class="line"></span><br><span class="line">recv(3, "HTTP/1.1 200 OK\r\nDate: Wed, 05 N"..., 16384, 0) = 4164 <0.000007></span><br><span class="line">recv(3, "fl a{color:#36c}a:visited{color:"..., 16384, 0) = 2776 <0.000005></span><br><span class="line">recv(3, "adient(top,#4d90fe,#4787ed);filt"..., 16384, 0) = 4164 <0.000007></span><br><span class="line">recv(3, "gbar.up.spd(b,d,1,!0);break;case"..., 16384, 0) = 2776 <0.000006></span><br><span class="line">recv(3, "$),a.i.G(!0)),window.gbar.up.sl("..., 16384, 0) = 1388 <0.000004></span><br><span class="line">recv(3, "margin:0;padding:5px 8px 0 6px;v"..., 16384, 0) = 1388 <0.000007></span><br><span class="line">recv(3, "){window.setTimeout(function(){v"..., 16384, 0) = 1484 <0.000006></span><br></pre></td></tr></tbody></table></figure>
<p>I encourage you to explore these tools in your OS. Using them well is like
having a super power.</p>
<p>But enough useful stuff, let’s go back to design. We’ve seen that a userland app
is trapped in its virtual address space running in ring 3 (unprivileged).  In
general, tasks that involve only computation and memory accesses do <em>not</em>
require syscalls. For example, C library functions like <a href="http://linux.die.net/man/3/strlen" target="_blank" rel="noopener">strlen(3)</a> and
<a href="http://linux.die.net/man/3/memcpy" target="_blank" rel="noopener">memcpy(3)</a> have nothing to do with the kernel. Those happen within the app.</p>
<p>The man page sections for a C library function (the 2 and 3 in parentheses) also
offer clues. Section 2 is used for system call wrappers, while section
3 contains other C library functions. However, as we saw with <code>printf(3)</code>,
a library function might ultimately make one or more syscalls.</p>
<p>If you’re curious, here are full syscall listings for <a href="https://github.com/torvalds/linux/blob/v3.17/arch/x86/syscalls/syscall_64.tbl" target="_blank" rel="noopener">Linux</a>
(also <a href="https://filippo.io/linux-syscall-table/" target="_blank" rel="noopener">Filippo’s list</a>) and
<a href="http://j00ru.vexillium.org/ntapi/" target="_blank" rel="noopener">Windows</a>. They have ~310 and ~460 system
calls, respectively. It’s fun to look at those because, in a way, they represent
<em>all that software can do</em> on a modern computer. Plus, you might find gems to
help with things like interprocess communication and performance. This is an
area where “Those who do not understand Unix are condemned to reinvent it,
poorly.”</p>
<p>Many syscalls perform tasks that take <a href="/post/what-your-computer-does-while-you-wait/">eons</a> compared to CPU cycles, for
example reading from a hard drive. In those situations the calling process is
often <em>put to sleep</em> until the underlying work is completed. Because CPUs are so
fast, your average program is <strong>I/O bound</strong> and spends most of its life
sleeping, waiting on syscalls. By contrast, if you strace a program busy with
a computational task, you often see no syscalls being invoked. In such a case,
<a href="http://linux.die.net/man/1/top" target="_blank" rel="noopener">top(1)</a> would show intense CPU usage.</p>
<p>The overhead involved in a system call can be a problem. For example, SSDs are
so fast that general OS overhead can be <a href="http://danluu.com/clwb-pcommit/" target="_blank" rel="noopener">more expensive</a> than the I/O
operation itself. Programs doing large numbers of reads and writes can also have
OS overhead as their bottleneck.  <a href="http://en.wikipedia.org/wiki/Vectored_I/O" target="_blank" rel="noopener">Vectored I/O</a> can help some. So can
<a href="/post/page-cache-the-affair-between-memory-and-files/">memory mapped files</a>, which allow a program to read and write from
disk using only memory access.  Analogous mappings exist for things like video
card memory.  Eventually, the economics of cloud computing might lead us to
kernels that eliminate or minimize user/kernel mode switches.</p>
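<p>Here’s a minimal sketch of that last idea (the helper name and temp-file path
are made up for illustration): once a file is mapped, reading it is an ordinary
memory access rather than a <code>read(2)</code> syscall:</p>

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write a small file, then read it back through a memory mapping:
 * after mmap(2), the access to p[0] is a plain load, served by the
 * kernel's page cache rather than an explicit read(2) call. */
int mmap_first_byte(void)
{
    const char *path = "/tmp/mmap_demo.txt";  /* illustrative temp file */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, 5, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                 /* the mapping survives the close */
    if (p == MAP_FAILED)
        return -1;
    int c = p[0];              /* memory access, not a read(2) */
    munmap(p, 5);
    unlink(path);
    return c;
}
```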
<p>Finally, syscalls have interesting security implications. One is that no matter
how obfuscated a binary, you can still examine its behavior by looking at the
system calls it makes. This can be used to detect malware, for example. We can
also record profiles of a known program’s syscall usage and alert on deviations,
or perhaps whitelist specific syscalls for programs so that exploiting
vulnerabilities becomes harder. There’s a ton of research in this area and a number
of tools, but no killer solution yet.</p>
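<p>One concrete mechanism along these lines is Linux’s seccomp strict mode: once
enabled, any syscall outside <code>read</code>, <code>write</code>, <code>_exit</code>, and <code>sigreturn</code> gets the
process killed on the spot. A sketch (Linux-specific, helper name made up):</p>

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, confine it with seccomp strict mode, then have it
 * attempt a syscall outside the whitelist. The kernel answers with
 * SIGKILL; the parent reports whether that is what happened. */
int forbidden_syscall_is_fatal(void)
{
    pid_t child = fork();
    if (child == 0) {
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
        syscall(SYS_getpid);   /* not whitelisted: killed right here */
        _exit(0);              /* never reached */
    }
    int status;
    waitpid(child, &status, 0);
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```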
<p>And that’s it for system calls. I’m sorry for the length of this post; I hope it
was helpful. More (and shorter) next week: <a href="https://manybutfinite.com/feed.xml">RSS</a> and <a href="http://twitter.com/manybutfinite" target="_blank" rel="noopener">Twitter</a>. Also, last night
I made a promise to the universe. This post is dedicated to the glorious Clube
Atlético Mineiro.</p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;I hate to break it to you, but a user application is a helpless brain in a vat:&lt;/p&gt;
&lt;img id=&quot;appInVat&quot; class=&quot;ce
      
    
    </summary>
    
      <category term="software illustrated" scheme="https://manybutfinite.com/category/software-illustrated/"/>
    
      <category term="internals" scheme="https://manybutfinite.com/category/internals/"/>
    
      <category term="linux" scheme="https://manybutfinite.com/category/linux/"/>
    
    
  </entry>
  
  <entry>
    <title>What does an idle CPU do?</title>
    <link href="https://manybutfinite.com/post/what-does-an-idle-cpu-do/"/>
    <id>https://manybutfinite.com/post/what-does-an-idle-cpu-do/</id>
    <updated>2014-10-29T14:00:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>In the <a href="/post/when-does-your-os-run">last post</a> I said the fundamental axiom of OS behavior is that <em>at any
given time</em>, exactly <strong>one and only one task is active</strong> on a CPU.  But if
there’s absolutely nothing to do, then what?</p>
<p>It turns out that this situation is extremely common, and for most personal
computers it’s actually the norm: an ocean of sleeping processes, all waiting on
some condition to wake up, while nearly 100% of CPU time is going into the
mythical “idle task.” In fact, if the CPU is consistently busy for a normal
user, it’s often a misconfiguration, bug, or malware.</p>
<p>Since we can’t violate our axiom, <em>some task needs to be active</em> on a CPU.
First because it’s good design: it would be unwise to spread special cases all
over the kernel checking whether there <em>is</em> in fact an active task. A design is
far better when there are <em>no exceptions</em>. Whenever you write an <code>if</code> statement,
Nyan Cat cries. And second, we need to do <em>something</em> with all those idle CPUs,
lest they get spunky and, you know, create Skynet.</p>
<p>So to keep design consistency and be one step ahead of the devil, OS developers
create an <strong>idle task</strong> that gets scheduled to run when there’s no other work.
We have seen in the Linux <a href="/post/kernel-boot-process">boot process</a> that the idle task is process 0,
a direct descendant of the very first instruction that runs when a computer is
first turned on. It is initialized in <a href="https://github.com/torvalds/linux/blob/v3.17/init/main.c#L393" target="_blank" rel="noopener">rest_init</a>, where <a href="https://github.com/torvalds/linux/blob/v3.17/kernel/sched/core.c#L4538" target="_blank" rel="noopener">init_idle_bootup_task</a>
initializes the idle <strong>scheduling class</strong>.</p>
<p>Briefly, Linux supports different scheduling classes for things like real-time
processes, regular user processes, and so on. When it’s time to choose a process
to become the active task, these classes are queried in order of priority. That
way, the nuclear reactor control code always gets to run before the web browser.
Often, though, these classes return <code>NULL</code>, meaning they don’t have a suitable
process to run - they’re all sleeping. But the idle scheduling class, which runs
last, never fails: it always returns the idle task.</p>
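<p>In sketch form (illustrative code, not the kernel’s actual data structures),
the selection works like this:</p>

```c
#include <stddef.h>

/* Illustrative sketch of scheduling-class priority (not real kernel
 * code): each class is asked for a runnable task in turn, and the
 * idle class, queried last, always has an answer. */
typedef const char *(*pick_fn)(void);

static const char *rt_pick(void)   { return NULL; } /* no real-time work */
static const char *fair_pick(void) { return NULL; } /* user tasks asleep */
static const char *idle_pick(void) { return "idle task"; } /* never fails */

const char *pick_next_task(void)
{
    pick_fn classes[] = { rt_pick, fair_pick, idle_pick };
    for (size_t i = 0; i < sizeof classes / sizeof classes[0]; i++) {
        const char *task = classes[i]();
        if (task)             /* first class with work wins */
            return task;
    }
    return NULL;              /* unreachable: idle always answers */
}
```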
<p>That’s all good, but let’s get down to just <em>what exactly</em> this idle task is
doing. So here is <a href="https://github.com/torvalds/linux/blob/v3.17/kernel/sched/idle.c#L183" target="_blank" rel="noopener">cpu_idle_loop</a>, courtesy of open source:</p>
<figure class="highlight c"><figcaption><span>cpu_idle_loop</span></figcaption><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">while</span> (<span class="number">1</span>) {</span><br><span class="line">    <span class="keyword">while</span>(!need_resched()) {</span><br><span class="line">        cpuidle_idle_call();</span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">    <span class="comment">/*</span></span><br><span class="line"><span class="comment">      [Note: Switch to a different task. We will return to this loop when the</span></span><br><span class="line"><span class="comment">      idle task is again selected to run.]</span></span><br><span class="line"><span class="comment">    */</span></span><br><span class="line">    schedule_preempt_disabled();</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>I’ve omitted many details, and we’ll look at task switching closely later on,
but if you read the code you’ll get the gist of it: as long as there’s no need
to reschedule, meaning change the active task, stay idle. Measured in elapsed
time, this loop and its cousins in other OSes are probably the most executed
pieces of code in computing history.  For Intel processors, staying idle
traditionally meant running the <a href="https://github.com/torvalds/linux/blob/v3.17/arch/x86/include/asm/irqflags.h#L52" target="_blank" rel="noopener">halt</a> instruction:</p>
<figure class="highlight c"><figcaption><span>native_halt</span></figcaption><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">inline</span> <span class="keyword">void</span> <span class="title">native_halt</span><span class="params">(<span class="keyword">void</span>)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">    <span class="function"><span class="keyword">asm</span> <span class="title">volatile</span><span class="params">(<span class="string">"hlt"</span>: : :<span class="string">"memory"</span>)</span></span>;</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p><code>hlt</code> stops code execution in the processor and puts it in a halted state. It’s
weird to think that across the world millions and millions of Intel-like CPUs
are spending the majority of their time halted, even while they’re powered up.
It’s also not terribly efficient, energy-wise, which led chip makers to develop
deeper sleep states for processors, which trade longer wake-up latency for
lower power consumption. The kernel’s <a href="http://lwn.net/Articles/384146/" target="_blank" rel="noopener">cpuidle subsystem</a> is
responsible for taking advantage of these power-saving modes.</p>
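<p>In spirit, the loop described above boils down to a few lines. Here is a plain JavaScript sketch with invented names (<code>needResched</code>, <code>halt</code>, the toy CPU object); the real loop lives in <code>kernel/sched/idle.c</code> and the real halt is the <code>hlt</code> instruction:</p>

```javascript
// Illustrative sketch of the kernel's idle loop. The cpu object and its
// methods are made up for this example, not the kernel's actual API.
function idleLoop(cpu) {
  var halts = 0;
  while (!cpu.needResched()) { // no task has become runnable yet
    cpu.halt();                // sleep until the next interrupt
    halts++;
  }
  return halts;                // how many times we dozed off
}

// A toy CPU where an "interrupt" makes a task runnable after 3 halts.
function makeToyCpu() {
  var pendingHalts = 3;
  return {
    halt: function () { pendingHalts--; },  // stands in for hlt + wake-up
    needResched: function () { return pendingHalts <= 0; },
  };
}

console.log(idleLoop(makeToyCpu())); // 3
```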
<p>Now once we tell the CPU to halt, or sleep, we need to somehow bring it back to
life. If you’ve read the <a href="/post/when-does-your-os-run">last post</a>, you might suspect <em>interrupts</em> are
involved, and indeed they are.  Interrupts spur the CPU out of its halted state
and back into action. So putting this all together, here’s what your system
mostly does as you read a fully rendered web page:</p>
<img id="idle" class="center" src="/img/os/idle.png" usemap="#mapidle">
<map id="mapidle" name="mapidle">
<area shape="poly" coords="110,6,110,96,20,96,20,6" href="https://github.com/torvalds/linux/blob/v3.17/kernel/sched/idle.c#L183">
<area shape="poly" coords="593,6,593,96,503,96,503,6" href="https://github.com/torvalds/linux/blob/v3.17/kernel/time/tick-common.c#L78">
<area shape="poly" coords="754,6,754,96,664,96,664,6" href="https://github.com/torvalds/linux/blob/v3.17/kernel/sched/idle.c#L183">
</map>
<p>Other interrupts besides the timer interrupt also get the processor moving
again. That’s what happens if you click on a web page, for example: your mouse
issues an interrupt, its driver processes it, and suddenly a process is runnable
because it has fresh input. At that point <code>need_resched()</code> returns true, and the
idle task is booted out in favor of your browser.</p>
<p>But let’s stick to idleness in this post. Here’s the idle loop over time:</p>
<img id="idleCycles" class="center" src="/img/os/idleCycles.png">
<p>In this example the timer interrupt was programmed by the kernel to happen every
4 milliseconds (ms). This is the <em>tick period</em>. That means we get 250 ticks per
second, so the <em>tick rate</em> or <em>tick frequency</em> is 250 Hz. That’s a typical value
for Linux running on Intel processors, with 100 Hz being another crowd favorite.
This is defined in the <code>CONFIG_HZ</code> option when you build the kernel.</p>
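<p>The arithmetic linking tick period and tick rate is worth making concrete. In this small illustration, <code>HZ</code> merely stands in for the kernel’s <code>CONFIG_HZ</code> build option:</p>

```javascript
// CONFIG_HZ fixes the tick rate; the tick period follows directly from it.
var HZ = 250;                   // ticks per second, a common Linux default
var periodMs = 1000 / HZ;       // 4 ms between timer interrupts
var wakeUpsPerMinute = HZ * 60; // 15000 wake-ups in a fully idle minute

console.log(periodMs, wakeUpsPerMinute); // 4 15000
```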
<p>Now that looks like an awful lot of pointless work for an <em>idle CPU</em>, and it is.
Without fresh input from the outside world, the CPU will remain stuck in this
hellish nap, getting woken up 250 times a second while your laptop battery
drains.  If this is running in a virtual machine, we’re burning both power and
valuable cycles from the host CPU.</p>
<p>The solution here is to have a <a href="https://github.com/torvalds/linux/blob/v3.17/Documentation/timers/NO_HZ.txt#L17" target="_blank" rel="noopener">dynamic tick</a> so that when the CPU is idle, the
timer interrupt is either <a href="https://github.com/torvalds/linux/blob/v3.17/Documentation/timers/highres.txt#L215" target="_blank" rel="noopener">deactivated or reprogrammed</a> to
happen at a point where the kernel <em>knows</em> there will be work to do (for
example, a process might have a timer expiring in 5 seconds, so we must not
sleep past that). This is also called <em>tickless mode</em>.</p>
<p>Finally, suppose you have <em>one active process</em> in a system, for example
a long-running CPU-intensive task. That’s nearly identical to an idle system:
these diagrams remain about the same, just substitute the one process for the
idle task and the pictures are accurate. In that case it’s still pointless to
interrupt the task every 4 ms for no good reason: it’s merely OS jitter slowing
your work ever so slightly. Linux can also stop the fixed-rate tick in this
one-process scenario, in what’s called <a href="https://github.com/torvalds/linux/blob/v3.17/Documentation/timers/NO_HZ.txt#L100" target="_blank" rel="noopener">adaptive-tick</a> mode. Eventually,
a fixed-rate tick may be gone <a href="http://lwn.net/Articles/549580/" target="_blank" rel="noopener">altogether</a>.</p>
<p>That’s enough idleness for one post. The kernel’s idle behavior is an important
part of the OS puzzle, and it’s very similar to other situations we’ll see, so
this helps us build the picture of a running kernel. More next week, <a href="https://manybutfinite.com/feed.xml">RSS</a> and
<a href="http://twitter.com/manybutfinite" target="_blank" rel="noopener">Twitter</a>.</p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;In the &lt;a href=&quot;/post/when-does-your-os-run&quot;&gt;last post&lt;/a&gt; I said the fundamental axiom of OS behavior is that &lt;
      
    
    </summary>
    
      <category term="software illustrated" scheme="https://manybutfinite.com/category/software-illustrated/"/>
    
      <category term="internals" scheme="https://manybutfinite.com/category/internals/"/>
    
      <category term="linux" scheme="https://manybutfinite.com/category/linux/"/>
    
    
  </entry>
  
  <entry>
    <title>When Does Your OS Run?</title>
    <link href="https://manybutfinite.com/post/when-does-your-os-run/"/>
    <id>https://manybutfinite.com/post/when-does-your-os-run/</id>
    <updated>2014-10-28T14:00:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>Here’s a question: in the time it takes you to read this sentence, has your OS
been <em>running</em>? Or was it only your browser? Or were they perhaps both idle,
just waiting for you to <em>do something already</em>?</p>
<p>These questions are simple but they cut through the essence of how software
works. To answer them accurately we need a good mental model of OS behavior,
which in turn informs performance, security, and troubleshooting decisions.
We’ll build such a model in this post series using Linux as the primary OS, with
guest appearances by OS X and Windows.  I’ll link to the Linux kernel sources
for those who want to delve deeper.</p>
<p>The fundamental axiom here is that <em>at any given moment, exactly one task is
active on a CPU</em>. The task is normally a program, like your browser or music
player, or it could be an operating system thread, but <strong>it is one task</strong>. Not
two or more. Never zero, either. One. <strong>Always</strong>.</p>
<p>This sounds like trouble. For what if, say, your music player hogs the CPU and
doesn’t let any other tasks run? You would not be able to open a tool to kill
it, and even mouse clicks would be futile as the OS wouldn’t process them.  You
could be stuck blaring “What does the fox say?” and incite a workplace riot.</p>
<p>That’s where <strong>interrupts</strong> come in. Much as the nervous system interrupts the
brain to bring in external stimuli - a loud noise, a touch on the shoulder - the
<a href="/post/motherboard-chipsets-memory-map">chipset</a> in a computer’s motherboard interrupts the CPU to deliver news of
outside events - key presses, the arrival of network packets, the completion of
a hard drive read, and so on.  Hardware peripherals, the interrupt controller on
the motherboard, and the CPU itself all work together to implement these
interruptions, called interrupts for short.</p>
<p>Interrupts are also essential in tracking that which we hold dearest: time.
During the <a href="/post/kernel-boot-process">boot process</a> the kernel programs a hardware timer to issue <strong>timer
interrupts</strong> at a periodic interval, for example every 10 milliseconds.
When the timer goes off, the kernel gets a shot at the CPU to update system
statistics and take stock of things: has the current program been running for
too long? Has a TCP timeout expired? Interrupts give the kernel a chance to both
ponder these questions and take appropriate actions. It’s as if you set periodic
alarms throughout the day and used them as checkpoints: should I be doing what
I’m doing right now? Is there anything more pressing? One day you find ten
years have got behind you.</p>
<p>These periodic hijackings of the CPU by the kernel are called <strong>ticks</strong>, so
interrupts quite literally make your OS tick. But there’s more: interrupts are
also used to handle some software events like integer overflows and page faults,
which involve no external hardware. <strong>Interrupts are the most frequent and
crucial entry point into the OS kernel.</strong> They’re not some oddity for the EE
people to worry about; they’re <em>the</em> mechanism whereby your OS runs.</p>
<p>Enough talk, let’s see some action. Below is a network card interrupt in an
Intel Core i5 system. The diagrams now have image maps, so you can click on
juicy bits for more information. For example, each device links to its Linux
driver.</p>
<p><img id="hw-interrupt" class="center" src="/img/os/hardware-interrupt.png" usemap="#mapHwInterrupt">
<map id="mapHwInterrupt" name="mapHwInterrupt">
<area shape="poly" coords="490,294,490,354,270,354,270,294" href="https://github.com/torvalds/linux/blob/v3.17/drivers/net/ethernet/intel/e1000e/netdev.c">
<area shape="poly" coords="754,294,754,354,534,354,534,294" href="https://github.com/torvalds/linux/blob/v3.16/drivers/hid/usbhid/usbkbd.c">
<area shape="poly" coords="488,490,488,598,273,598,273,490" href="https://github.com/torvalds/linux/blob/v3.16/arch/x86/kernel/apic/io_apic.c">
<area shape="poly" coords="720,490,720,598,506,598,506,490" href="https://github.com/torvalds/linux/blob/v3.17/arch/x86/kernel/hpet.c">
</map></p>
<p>Let’s take a look at this. First off, since there are many sources of
interrupts, it wouldn’t be very helpful if the hardware simply told the CPU
“hey, something happened!” and left it at that.  The suspense would be
unbearable. So each device is assigned an <strong>interrupt request line</strong>, or IRQ,
during power up. These IRQs are in turn mapped into <strong>interrupt vectors</strong>,
each a number between 0 and 255, by the interrupt controller. By the time an
interrupt reaches the CPU it has a nice, well-defined number insulated from the
vagaries of hardware.</p>
<p>The CPU in turn has a pointer to what’s essentially an array of 256 functions,
supplied by the kernel, where each function is the handler for that
particular interrupt vector. We’ll look at this array, the <strong>Interrupt Descriptor Table (IDT)</strong>, in more detail later on.</p>
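<p>The index-by-vector idea can be sketched in a toy model. This is JavaScript standing in for hardware, with made-up vector numbers and handlers; the real IDT holds gate descriptors pointing at kernel code:</p>

```javascript
// A toy interrupt descriptor table: 256 slots indexed by vector number.
var idt = new Array(256).fill(function () { return "unexpected interrupt"; });

idt[32] = function () { return "timer tick"; };     // hypothetical timer vector
idt[33] = function () { return "keyboard press"; }; // hypothetical keyboard vector

// The CPU's job in miniature: use the vector as an index, run the handler.
function deliverInterrupt(vector) {
  return idt[vector]();
}

console.log(deliverInterrupt(32)); // "timer tick"
```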
<p>Whenever an interrupt arrives, the CPU uses its vector as an index into the
IDT and runs the appropriate handler. This happens as a special function call
that takes place in the context of the currently running task, allowing the OS
to respond to external events quickly and with minimal overhead.  So web servers
out there indirectly <em>call a function in your CPU</em> when they send you data,
which is either pretty cool or terrifying.  Below we show a situation where
a CPU is busy running a Vim command when an interrupt arrives:</p>
<img src="/img/os/vim-interrupted.png" class="center">
<p>Notice how the interrupt’s arrival causes a switch to kernel mode
and <a href="/post/cpu-rings-privilege-and-protection">ring zero</a> but it <em>does not change the active task</em>. It’s as if Vim made
a magic function call straight into the kernel, but Vim is <em>still there</em>, its
<a href="/post/anatomy-of-a-program-in-memory" title="Anatomy of a Program in Memory">address space</a> intact, waiting for that call to return.</p>
<p>Exciting stuff! Alas, I need to keep this post-sized, so let’s finish up for
now.  I understand we have not answered the opening question and have in fact
opened up new questions, but you now suspect <strong>ticks</strong> were taking place while
you read that sentence. We’ll find the answers as we flesh out our model of
dynamic OS behavior, and the browser scenario will become clear.  If you
have questions, especially as the posts come out, fire away and I’ll try to
answer them in the posts themselves or as comments. Next installment is
tomorrow on <a href="https://manybutfinite.com/feed.xml">RSS</a> and <a href="http://twitter.com/manybutfinite" target="_blank" rel="noopener">Twitter</a>.</p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;Here’s a question: in the time it takes you to read this sentence, has your OS
been &lt;em&gt;running&lt;/em&gt;? Or was it 
      
    
    </summary>
    
      <category term="software illustrated" scheme="https://manybutfinite.com/category/software-illustrated/"/>
    
      <category term="internals" scheme="https://manybutfinite.com/category/internals/"/>
    
      <category term="linux" scheme="https://manybutfinite.com/category/linux/"/>
    
    
  </entry>
  
  <entry>
    <title>Closures, Objects, and the Fauna of the Heap</title>
    <link href="https://manybutfinite.com/post/closures-objects-heap/"/>
    <id>https://manybutfinite.com/post/closures-objects-heap/</id>
    <updated>2014-10-27T13:40:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>The last post in this series looks at closures, objects, and other creatures
roaming beyond the stack. Much of what we’ll see is language neutral, but I’ll
focus on JavaScript with a dash of C.  Let’s start with a simple C program that
reads a song and a band name and outputs them back to the user:</p>
<figure class="highlight c"><figcaption><span>stackFolly.c</span><a href="/code/x86-stack/stackFolly.c">view raw</a></figcaption><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><string.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">char</span> *<span class="title">read</span><span class="params">()</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">    <span class="keyword">char</span> data[<span class="number">64</span>];</span><br><span class="line">    fgets(data, <span class="number">64</span>, <span class="built_in">stdin</span>);</span><br><span class="line">    <span class="keyword">return</span> data;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc, <span class="keyword">char</span> 
*argv[])</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">    <span class="keyword">char</span> *song, *band;</span><br><span class="line"></span><br><span class="line">    <span class="built_in">puts</span>(<span class="string">"Enter song, then band:"</span>);</span><br><span class="line">    song = <span class="built_in">read</span>();</span><br><span class="line">    band = <span class="built_in">read</span>();</span><br><span class="line"></span><br><span class="line">    <span class="built_in">printf</span>(<span class="string">"\n%sby %s"</span>, song, band);</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>If you run this gem, here’s what you get (=> denotes program output):</p>
<figure class="highlight console"><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">./stackFolly</span><br><span class="line">=> Enter song, then band:</span><br><span class="line">The Past is a Grotesque Animal</span><br><span class="line">of Montreal</span><br><span class="line"></span><br><span class="line">=> ?ǿontreal</span><br><span class="line">=> by ?ǿontreal</span><br></pre></td></tr></tbody></table></figure>
<p>Ayeee! Where did things go so wrong? (Said every C beginner, ever.)</p>
<p>It turns out that the contents of a function’s stack variables are <strong>only valid
while the stack frame is active</strong>, that is, until the function returns.  Upon
return, the memory used by the stack frame is <a href="/post/epilogues-canaries-buffer-overflows/">deemed free</a> and
liable to be overwritten in the next function call.</p>
<p>Below is <em>exactly</em> what happens in this case. The diagrams now have image maps,
so you can click on a piece of data to see the relevant gdb output (gdb commands
are <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/stackFolly-gdb-commands.txt" target="_blank" rel="noopener">here</a>). As soon as <code>read()</code> is done with the song
name, the stack is thus:</p>
<p><img id="readSong" class="center" src="/img/stack/readSong.png" usemap="#mapreadSong">
<map id="mapreadSong" name="mapreadSong">
<area shape="poly" coords="754,6,754,86,14,86,14,6" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/stackFolly-gdb-output.txt#L47">
<area shape="poly" coords="754,146,754,226,114,226,114,146" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/stackFolly-gdb-output.txt#L70">
</map></p>
<p>At this point, the <code>song</code> variable actually points to the song name. Sadly, the
memory storing that string is <em>ready to be reused</em> by the stack frame of
whatever function is called next. In this case, <code>read()</code> is called again, with
the same stack frame layout, so the result is this:</p>
<p><img id="readBand" class="center" src="/img/stack/readBand.png" usemap="#mapreadBand">
<map id="mapreadBand" name="mapreadBand">
<area shape="poly" coords="754,6,754,86,14,86,14,6" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/stackFolly-gdb-output.txt#L76">
<area shape="poly" coords="754,146,754,226,114,226,114,146" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/stackFolly-gdb-output.txt#L79">
</map></p>
<p>The band name is read into the same memory location and overwrites the
previously stored song name. <code>band</code> and <code>song</code> end up pointing to the exact
same spot. Finally, we didn’t even get “of Montreal” output correctly. Can you
guess why?</p>
<p>And so it happens that the stack, for all its usefulness, has this serious
limitation. It cannot be used by a function to store data that needs to outlive
the function’s execution. You must resort to the <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/readIntoHeap.c" target="_blank" rel="noopener">heap</a> and say
goodbye to the hot caches, deterministic instantaneous operations, and easily
computed offsets. On the plus side, it <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/readIntoHeap-gdb-output.txt#L47" target="_blank" rel="noopener">works</a>:</p>
<img id="readIntoHeap" class="center" src="/img/stack/readIntoHeap.png">
<p>The price is you must now remember to <code>free()</code> memory or pay the performance
cost of a garbage collector, which finds unused heap objects and frees them. That’s
the fundamental tradeoff between stack and heap: performance vs. flexibility.</p>
<p>Most languages’ virtual machines take a middle road that mirrors what
C programmers do. The stack is used for <strong>value types</strong>, things like integers,
floats and booleans. These are stored <em>directly</em> in local variables and object
fields as a sequence of bytes specifying a <em>value</em> (like <code>argc</code> above).  In
contrast, heap inhabitants are <strong>reference types</strong> such as strings and
<a href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#37" target="_blank" rel="noopener">objects</a>.  Variables and fields contain a memory address that
<em>references</em> these objects, like <code>song</code> and <code>band</code> above.</p>
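<p>The distinction is easy to demonstrate in a few lines (variable names here are invented for illustration):</p>

```javascript
// Value types are copied outright; reference types share one heap object.
var a = 10;
var b = a;      // the value 10 is copied into b
b = 20;         // a keeps its own copy

var song = { title: "foo" };
var alias = song;     // only the reference (an address) is copied
alias.title = "bar";  // the single shared object changes

console.log(a, song.title); // 10 bar
```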
<p>Consider this JavaScript function:</p>
<figure class="highlight javascript"><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">function</span> <span class="title">fn</span>(<span class="params"></span>)</span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">    <span class="keyword">var</span> a = <span class="number">10</span>;</span><br><span class="line">    <span class="keyword">var</span> b = { <span class="attr">name</span>: <span class="string">'foo'</span>, <span class="attr">n</span>: <span class="number">10</span> };</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>This might produce the following:</p>
<p><img id="fnFrame" class="center" src="/img/stack/fnFrame.png" usemap="#mapFnFrame">
<map id="mapFnFrame" name="mapFnFrame">
<area shape="poly" coords="524,36,524,116,424,116,424,36" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#1671">
<area shape="poly" coords="722,36,722,116,622,116,622,36" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#8656">
<area shape="poly" coords="514,176,514,256,434,256,434,176" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#1264">
</map></p>
<p>I say “might” because specific behaviors depend heavily on implementation. This
post takes a V8-centric approach with many diagram shapes linking to relevant
source code. In V8, only <a href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#1264" target="_blank" rel="noopener">small integers</a> are
<a href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#148" target="_blank" rel="noopener">stored as values</a>.  Also,
from now on I’ll show strings directly in objects to reduce visual noise, but
keep in mind they exist separately in the heap, as shown above.</p>
<p>Now let’s take a look at closures, which are simple but get weirdly hyped up and
mythologized. Take a trivial JS function:</p>
<figure class="highlight javascript"><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">function</span> <span class="title">add</span>(<span class="params">a, b</span>)</span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">        <span class="keyword">var</span> c = a + b;</span><br><span class="line">        <span class="keyword">return</span> c;</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>This function defines a <strong>lexical scope</strong>, a happy little kingdom where the
names <code>a</code>, <code>b</code>, and <code>c</code> have precise meanings. They are the two parameters and
one local variable declared by the function. The program might use those same
names elsewhere, but within <code>add</code> <em>that’s what they refer to</em>.  And while
lexical scope is a fancy term, it aligns well with our intuitive understanding:
after all, we can quite literally <strong>see</strong> the bloody thing, much as a lexer
does, as a textual block in the program’s source.</p>
<p>Having seen stack frames in action, it’s easy to imagine an implementation for
this name specificity.  Within <code>add</code>, these names refer to stack locations
private to <em>each running instance</em> of the function. That’s in fact how it
often plays out in a VM.</p>
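<p>A recursive call makes that per-instance privacy visible. This is an invented example, not from the post’s original code:</p>

```javascript
// Each running instance of fact gets its own n and r: the recursive
// call's n counts down without disturbing the caller's n.
function fact(n) {
  if (n <= 1) return 1;
  var r = n * fact(n - 1); // the inner instance has its own private n
  return r;                // our n is still intact here
}

console.log(fact(5)); // 120
```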
<p>So let’s nest two lexical scopes:</p>
<figure class="highlight javascript"><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">function</span> <span class="title">makeGreeter</span>(<span class="params"></span>)</span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">    <span class="keyword">return</span> <span class="function"><span class="keyword">function</span> <span class="title">hi</span>(<span class="params">name</span>) </span>{</span><br><span class="line">        <span class="built_in">console</span>.log(<span class="string">'hi, '</span> + name);</span><br><span class="line">    }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">var</span> hi = makeGreeter();</span><br><span class="line">hi(<span class="string">'dear reader'</span>); <span class="comment">// prints "hi, dear reader"</span></span><br></pre></td></tr></tbody></table></figure>
<p>That’s more interesting. Function <code>hi</code> is built at runtime within <code>makeGreeter</code>.
It has its own lexical scope, where <code>name</code> is an argument on the stack, but
<em>visually</em> it sure looks like it can access its parent’s lexical scope as well,
which it can. Let’s take advantage of that:</p>
<figure class="highlight javascript"><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">function</span> <span class="title">makeGreeter</span>(<span class="params">greeting</span>)</span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">    <span class="keyword">return</span> <span class="function"><span class="keyword">function</span> <span class="title">greet</span>(<span class="params">name</span>) </span>{</span><br><span class="line">        <span class="built_in">console</span>.log(greeting + <span class="string">', '</span> + name);</span><br><span class="line">    }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">var</span> heya = makeGreeter(<span class="string">'HEYA'</span>);</span><br><span class="line">heya(<span class="string">'dear reader'</span>); <span class="comment">// prints "HEYA, dear reader"</span></span><br></pre></td></tr></tbody></table></figure>
<p>A little strange, but pretty cool. There’s something about it though that
violates our intuition: <code>greeting</code> sure looks like a stack variable, the kind
that should be dead after <code>makeGreeter()</code> returns. And yet, since <code>greet()</code>
keeps working, <em>something funny</em> is going on. Enter the closure:</p>
<p><img id="closure" class="center" src="/img/stack/closure.png" usemap="#mapClosure">
<map id="mapClosure" name="mapClosure">
<area shape="poly" coords="260,36,260,126,80,126,80,36" href="https://code.google.com/p/v8/source/browse/trunk/src/contexts.h#188">
<area shape="poly" coords="681,36,681,126,321,126,321,36" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#7245">
</map></p>
<p>The VM allocated an object to store the parent variable used by the inner
<code>greet()</code>. It’s as if <code>makeGreeter</code>'s lexical scope had been <strong>closed over</strong> at
that moment, crystallized into a heap object for as long as needed (in this case,
the lifetime of the returned function).  Hence the name <strong>closure</strong>, which makes
a lot of sense when you see it that way. If more parent variables had been used
(or <em>captured</em>), the <code>Context</code> object would have more properties, one per
captured variable. Naturally, the code emitted for <code>greet()</code> knows to read
<code>greeting</code> from the Context object, rather than expect it on the stack.</p>
<p>Here’s a fuller example:</p>
<figure class="highlight javascript"><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">function</span> <span class="title">makeGreeter</span>(<span class="params">greetings</span>)</span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">    <span class="keyword">var</span> count = <span class="number">0</span>;</span><br><span class="line">    <span class="keyword">var</span> greeter = {};</span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> (<span class="keyword">var</span> i = <span class="number">0</span>; i < greetings.length; i++) {</span><br><span class="line">        <span class="keyword">var</span> greeting = greetings[i];</span><br><span class="line"></span><br><span class="line">        greeter[greeting] = <span class="function"><span class="keyword">function</span>(<span class="params">name</span>) </span>{</span><br><span class="line">            count++;</span><br><span class="line">            <span class="built_in">console</span>.log(greeting + <span class="string">', '</span> + name);</span><br><span class="line">        }</span><br><span class="line">    
}</span><br><span class="line"></span><br><span class="line">    greeter.count = <span class="function"><span class="keyword">function</span>(<span class="params"></span>) </span>{ <span class="keyword">return</span> count; }</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> greeter;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">var</span> greeter = makeGreeter([<span class="string">"hi"</span>, <span class="string">"hello"</span>, <span class="string">"howdy"</span>])</span><br><span class="line">greeter.hi(<span class="string">'poppet'</span>); <span class="comment">// prints "howdy, poppet"</span></span><br><span class="line">greeter.hello(<span class="string">'darling'</span>); <span class="comment">// prints "howdy, darling"</span></span><br><span class="line">greeter.count(); <span class="comment">// returns 2</span></span><br></pre></td></tr></tbody></table></figure>
<p>Well… <code>count()</code> works, but our greeter is stuck in <em>howdy</em>.  Can you tell why?
What we’re doing with <code>count</code> is a clue: even though the lexical scope is closed
over into a heap object, the <em>values</em> taken by the variables (or object
properties) can still be changed. Here’s what we have:</p>
<img id="greeterFail" class="center" src="/img/stack/greeterFail.png" usemap="#mapGreeterFail">
<map id="mapGreeterFail" name="mapGreeterFail">
<area shape="poly" coords="118,186,118,326,18,326,18,186" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#1671">
<area shape="poly" coords="510,36,510,146,170,146,170,36" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#7245">
<area shape="poly" coords="510,156,510,266,170,266,170,156" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#7245">
<area shape="poly" coords="510,276,510,386,170,386,170,276" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#7245">
<area shape="poly" coords="510,396,510,466,170,466,170,396" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#7245">
<area shape="poly" coords="742,206,742,306,562,306,562,206" href="https://code.google.com/p/v8/source/browse/trunk/src/contexts.h#188">
</map>
<p>There is one common context shared by all functions. That’s why <code>count</code> works.
But the greeting is also being shared, and it was set to the last value iterated
over, “howdy” in this case. That’s a pretty common error, and the easiest way to
avoid it is to introduce a function call to take the closed-over variable as an
argument. In CoffeeScript, the <a href="http://coffeescript.org/#loops" target="_blank" rel="noopener">do</a> keyword makes this easy.
Here’s a simple solution for our greeter:</p>
<figure class="highlight javascript"><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">function</span> <span class="title">makeGreeter</span>(<span class="params">greetings</span>)</span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">    <span class="keyword">var</span> count = <span class="number">0</span>;</span><br><span class="line">    <span class="keyword">var</span> greeter = {};</span><br><span class="line"></span><br><span class="line">    greetings.forEach(<span class="function"><span class="keyword">function</span>(<span class="params">greeting</span>) </span>{</span><br><span class="line">        greeter[greeting] = <span class="function"><span class="keyword">function</span>(<span class="params">name</span>) </span>{</span><br><span class="line">            count++;</span><br><span class="line">            <span class="built_in">console</span>.log(greeting + <span class="string">', '</span> + name);</span><br><span class="line">        }</span><br><span class="line">    });</span><br><span class="line"></span><br><span class="line">    greeter.count = <span class="function"><span class="keyword">function</span>(<span class="params"></span>) </span>{ <span 
class="keyword">return</span> count; }</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> greeter;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">var</span> greeter = makeGreeter([<span class="string">"hi"</span>, <span class="string">"hello"</span>, <span class="string">"howdy"</span>])</span><br><span class="line">greeter.hi(<span class="string">'poppet'</span>); <span class="comment">// prints "hi, poppet"</span></span><br><span class="line">greeter.hello(<span class="string">'darling'</span>); <span class="comment">// prints "hello, darling"</span></span><br><span class="line">greeter.count(); <span class="comment">// returns 2</span></span><br></pre></td></tr></tbody></table></figure>
<p>It now works, and the result becomes:</p>
<p><img id="greeter" class="center" src="/img/stack/greeter.png" usemap="#mapGreeter">
<map id="mapGreeter" name="mapGreeter">
<area shape="poly" coords="118,146,118,286,18,286,18,146" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#1671">
<area shape="poly" coords="290,36,290,116,170,116,170,36" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#7245">
<area shape="poly" coords="290,126,290,206,170,206,170,126" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#7245">
<area shape="poly" coords="290,216,290,296,170,296,170,216" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#7245">
<area shape="poly" coords="290,306,290,386,170,386,170,306" href="https://code.google.com/p/v8/source/browse/trunk/src/objects.h#7245">
<area shape="poly" coords="511,36,511,116,351,116,351,36" href="https://code.google.com/p/v8/source/browse/trunk/src/contexts.h#188">
<area shape="poly" coords="511,126,511,206,351,206,351,126" href="https://code.google.com/p/v8/source/browse/trunk/src/contexts.h#188">
<area shape="poly" coords="511,216,511,296,351,296,351,216" href="https://code.google.com/p/v8/source/browse/trunk/src/contexts.h#188">
<area shape="poly" coords="742,166,742,266,562,266,562,166" href="https://code.google.com/p/v8/source/browse/trunk/src/contexts.h#188 ">
</map></p>
<p>That’s a lot of arrows! But here’s the interesting feature: in our code, we
closed over two nested lexical contexts, and sure enough we get two linked
Context objects in the heap. If you nest and close over many lexical contexts,
Russian-doll style, you end up with essentially a linked list of all these
Context objects.</p>
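<p>As an illustration (my own example, not from the post), three nested closed-over scopes produce a three-deep chain that the innermost function walks on every variable access:</p>

```javascript
// Hypothetical nesting: each function closes over the scope above it.
// An engine like V8 allocates one Context per closed-over scope,
// linked outward, and inner() reaches through the whole chain.
function outer() {
    var a = 1;
    function middle() {
        var b = 2;
        function inner() {
            var c = 3;
            return a + b + c; // resolves c, then b, then a up the chain
        }
        return inner;
    }
    return middle();
}

var f = outer();
f(); // 6
```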
<p>Of course, just as you can implement TCP over carrier pigeons, there are many
ways to implement these language features. For example, the ES6 spec defines
<a href="http://people.mozilla.org/~jorendorff/es6-draft.html#sec-lexical-environments" target="_blank" rel="noopener">lexical environments</a> as consisting of an <a href="http://people.mozilla.org/~jorendorff/es6-draft.html#sec-environment-records" target="_blank" rel="noopener">environment record</a> (roughly, the
local identifiers within a block) plus a link to an outer environment record,
allowing the nesting we have seen. The <em>logical rules</em> are nailed down by the
spec (one hopes), but it’s up to the implementation to translate them into bits
and bytes.</p>
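<p>As a rough sketch of that spec structure (the names and layout here are my own, not the spec’s), an environment record can be modeled as a map of bindings plus an outer link, with name resolution walking the chain much like the spec’s GetIdentifierReference:</p>

```javascript
// Toy model of a lexical environment: a record of local bindings
// plus a pointer to the enclosing environment (null at the top).
function makeEnv(outer) {
    return { bindings: Object.create(null), outer: outer };
}

// Resolve a name by walking outward through the environment chain.
function lookup(env, name) {
    for (var e = env; e !== null; e = e.outer) {
        if (name in e.bindings) return e.bindings[name];
    }
    throw new ReferenceError(name + " is not defined");
}

var globalEnv = makeEnv(null);
globalEnv.bindings.x = 10;
var innerEnv = makeEnv(globalEnv);
innerEnv.bindings.y = 32;

lookup(innerEnv, "x") + lookup(innerEnv, "y"); // 42
```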
<p>You can also inspect the assembly code produced by V8 for specific cases.
<a href="http://mrale.ph" target="_blank" rel="noopener">Vyacheslav Egorov</a> has great posts explaining this process, along with
V8 <a href="http://mrale.ph/blog/2012/09/23/grokking-v8-closures-for-fun.html" target="_blank" rel="noopener">closure internals</a>, in detail. I’ve only started studying V8, so
pointers and corrections are welcome. If you know C#, inspecting the IL code
emitted for closures is enlightening: you will see the analog of V8 Contexts
explicitly defined and instantiated.</p>
<p>Closures are powerful beasts. They provide a succinct way to hide information
from a caller while sharing it among a set of functions.  I love that they
<strong>truly hide</strong> your data: unlike object fields, callers cannot access or even
<em>see</em> closed-over variables. This keeps the interface cleaner and safer.</p>
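<p>A minimal counter (my own example) makes the contrast concrete: callers can invoke the functions that share the variable, but the variable itself is unreachable:</p>

```javascript
// count lives only in the closure's context. Unlike an object field,
// it cannot be read or reassigned from outside.
function makeCounter() {
    var count = 0;
    return {
        increment: function() { return ++count; },
        value: function() { return count; }
    };
}

var counter = makeCounter();
counter.increment();
counter.increment();
counter.value(); // 2
counter.count;   // undefined - no property exposes the variable
```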
<p>But they’re no silver bullet. Sometimes an object nut and a closure fanatic will
argue endlessly about their relative merits. Like most tech discussions, it’s
often more about ego than real tradeoffs. At any rate, this <a href="http://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg03277.html" target="_blank" rel="noopener">epic koan</a> by
Anton van Straaten settles the issue:</p>
<blockquote><p>The venerable master Qc Na was walking with his student, Anton.  Hoping to
prompt the master into a discussion, Anton said “Master, I have heard that
objects are a very good thing - is this true?”  Qc Na looked pityingly at
his student and replied, “Foolish pupil - objects are merely a poor man’s
closures.”</p>
<p>Chastised, Anton took his leave from his master and returned to his cell,
intent on studying closures.  He carefully read the entire “Lambda: The
Ultimate…” series of papers and its cousins, and implemented a small
Scheme interpreter with a closure-based object system.  He learned much, and
looked forward to informing his master of his progress.</p>
<p>On his next walk with Qc Na, Anton attempted to impress his master by
saying “Master, I have diligently studied the matter, and now understand
that objects are truly a poor man’s closures.”  Qc Na responded by hitting
Anton with his stick, saying “When will you learn? Closures are a poor man’s
object.”  At that moment, Anton became enlightened.</p>
<footer><strong>Anton van Straaten</strong><cite><a href="http://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg03277.html" target="_blank" rel="noopener">What's so cool about Scheme?</a></cite></footer></blockquote>
<p>And that closes our stack series. In the future I plan to cover other language
implementation topics like object binding and vtables. But the call of the
kernel is strong, so there’s an OS post coming out tomorrow.  I invite you to
<a href="https://manybutfinite.com/feed.xml">subscribe</a> and <a href="http://twitter.com/manybutfinite" target="_blank" rel="noopener">follow me</a>.</p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;The last post in this series looks at closures, objects, and other creatures
roaming beyond the stack. Much of w
      
    
    </summary>
    
      <category term="software illustrated" scheme="https://manybutfinite.com/category/software-illustrated/"/>
    
      <category term="internals" scheme="https://manybutfinite.com/category/internals/"/>
    
      <category term="programming" scheme="https://manybutfinite.com/category/programming/"/>
    
    
  </entry>
  
  <entry>
    <title>Tail Calls, Optimization, and ES6</title>
    <link href="https://manybutfinite.com/post/tail-calls-optimization-es6/"/>
    <id>https://manybutfinite.com/post/tail-calls-optimization-es6/</id>
    <updated>2014-05-23T11:00:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>In this penultimate post about the stack, we take a quick look at <strong>tail
calls</strong>, compiler optimizations, and the <em>proper tail calls</em> landing in the
newest version of JavaScript.</p>
<p>A <strong>tail call</strong> happens when a function <code>F</code> makes a function call as its final
action. At that point <code>F</code> will do absolutely no more work: it passes the ball to
whatever function is being called and vanishes from the game. This is notable
because it opens up the possibility of <strong>tail call optimization</strong>: instead of
<a href="/post/journey-to-the-stack" title="Journey to the Stack">creating a new stack frame</a> for the function call, we can simply <em>reuse</em>
<code>F</code>'s stack frame, thereby saving stack space and avoiding the work involved in
setting up a new frame. Here are some examples in C and their results compiled
with <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/asm-tco.sh" target="_blank" rel="noopener">mild optimization</a>:</p>
<figure class="highlight c"><figcaption><span>Simple Tail Calls</span><a href="/code/x86-stack/tail.c">view raw</a></figcaption><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">add5</span><span class="params">(<span class="keyword">int</span> a)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">        <span class="keyword">return</span> a + <span class="number">5</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">add10</span><span class="params">(<span class="keyword">int</span> a)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">        <span class="keyword">int</span> b = add5(a); <span class="comment">// not tail</span></span><br><span class="line">        <span class="keyword">return</span> add5(b); <span class="comment">// 
tail</span></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">add5AndTriple</span><span class="params">(<span class="keyword">int</span> a)</span> </span>{</span><br><span class="line">        <span class="keyword">int</span> b = add5(a); <span class="comment">// not tail</span></span><br><span class="line">        <span class="keyword">return</span> <span class="number">3</span> * add5(a); <span class="comment">// not tail, doing work after the call</span></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">finicky</span><span class="params">(<span class="keyword">int</span> a)</span> </span>{</span><br><span class="line">        <span class="keyword">if</span> (a > <span class="number">10</span>) {</span><br><span class="line">                <span class="keyword">return</span> add5AndTriple(a); <span class="comment">// tail</span></span><br><span class="line">        }</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> (a > <span class="number">5</span>) {</span><br><span class="line">                <span class="keyword">int</span> b = add5(a); <span class="comment">// not tail</span></span><br><span class="line">                <span class="keyword">return</span> finicky(b); <span class="comment">// tail</span></span><br><span class="line">        }</span><br><span class="line"></span><br><span class="line">        <span class="keyword">return</span> add10(a); <span class="comment">// tail</span></span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>You can normally spot tail call optimization (hereafter, TCO) in compiler output
by seeing a <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/tail-tco.s#L27" target="_blank" rel="noopener">jump</a> instruction where a <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/tail.s#L37-L39" target="_blank" rel="noopener">call</a> would have been
expected. At runtime TCO leads to a reduced call stack.</p>
<p>A common misconception is that tail calls are necessarily
<a href="/post/recursion/">recursive</a>. That’s not the case: a tail call <em>may</em> be recursive,
such as in <code>finicky()</code> above, but it need not be. As long as caller <code>F</code> is
completely done at the call site, we’ve got ourselves a tail call. <em>Whether it
can be optimized</em> is a different question whose answer depends on your
programming environment.</p>
<p>“Yes, it can, always!” is the best answer we can hope for, which is famously the
case for Scheme, as discussed in <a href="http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-11.html" target="_blank" rel="noopener">SICP</a> (by the way, if, when you program,
you don’t feel like “a Sorcerer conjuring the spirits of the computer with your
spells,” I urge you to read that book). It’s also the case for <a href="http://www.lua.org/pil/6.3.html" target="_blank" rel="noopener">Lua</a>.  And
most importantly, it is the case for the next version of JavaScript, ES6, whose
spec does a good job defining <a href="https://people.mozilla.org/~jorendorff/es6-draft.html#sec-tail-position-calls" target="_blank" rel="noopener">tail position</a> and clarifying the few
conditions required for optimization, such as <a href="https://people.mozilla.org/~jorendorff/es6-draft.html#sec-strict-mode-code" target="_blank" rel="noopener">strict mode</a>.  When
a language guarantees TCO, it supports <em>proper tail calls</em>.</p>
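<p>As a sketch of what a proper tail call looks like in JavaScript (my own example; whether the engine actually reuses the frame depends on its ES6 support), an accumulator argument puts the recursive call in tail position:</p>

```javascript
"use strict"; // ES6 only guarantees proper tail calls in strict mode

// Accumulator-passing factorial: the recursive call is the very last
// action, so an engine with proper tail calls can reuse the frame.
function factorial(n, acc) {
    acc = acc || 1;
    if (n <= 1) {
        return acc;
    }
    return factorial(n - 1, n * acc); // tail call
}

factorial(5); // 120
```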
<p>Now some of us can’t kick that C habit, Heartbleed and all, and the answer
there is a more complicated “sometimes” that takes us into compiler optimization
territory.  We’ve seen the <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/tail.c" target="_blank" rel="noopener">simple examples</a> above; now let’s resurrect
our factorial from <a href="/post/recursion/">last post</a>:</p>
<figure class="highlight c"><figcaption><span>Recursive Factorial</span><a href="/code/x86-stack/factorial.c">view raw</a></figcaption><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">factorial</span><span class="params">(<span class="keyword">int</span> n)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">	<span class="keyword">int</span> previous = <span class="number">0xdeadbeef</span>;	</span><br><span class="line"></span><br><span class="line">	<span class="keyword">if</span> (n == <span class="number">0</span> || n == <span class="number">1</span>) {</span><br><span class="line">		<span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line">	}</span><br><span class="line"></span><br><span class="line">	previous = factorial(n<span class="number">-1</span>);</span><br><span class="line">	<span class="keyword">return</span> n * previous;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span 
class="title">main</span><span class="params">(<span class="keyword">int</span> argc)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">	<span class="keyword">int</span> answer = factorial(<span class="number">5</span>);</span><br><span class="line">	<span class="built_in">printf</span>(<span class="string">"%d\n"</span>, answer);</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>So, is line 11 a tail call? It’s not, because of the multiplication by <code>n</code>
afterwards. But if you’re not used to optimizations, gcc’s
<a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/factorial-o2.s" target="_blank" rel="noopener">result</a> with <a href="https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html" target="_blank" rel="noopener">O2 optimization</a> might shock you: not only does it
transform <code>factorial</code> into a <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/factorial-o2.s#L16-L19" target="_blank" rel="noopener">recursion-free loop</a>, but the
<code>factorial(5)</code> call is eliminated entirely and replaced by a
<a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/factorial-o2.s#L38" target="_blank" rel="noopener">compile-time constant</a> of 120 (5! == 120).  This is why debugging optimized
code can be hard sometimes. On the plus side, if you call this function it will
use a single stack frame regardless of <code>n</code>’s initial value.  Compiler algorithms
are pretty fun, and if you’re interested I suggest you check out
<a href="http://www.amazon.com/Building-Optimizing-Compiler-Bob-Morgan-ebook/dp/B008COCE9G/" target="_blank" rel="noopener">Building an Optimizing Compiler</a> and <a href="http://www.amazon.com/Advanced-Compiler-Design-Implementation-Muchnick-ebook/dp/B003VM7GGK/" target="_blank" rel="noopener">ACDI</a>.</p>
<p>However, what happened here was <strong>not</strong> tail call optimization, since there was
<em>no tail call to begin with</em>. gcc outsmarted us by analyzing what the function
does and optimizing away the needless recursion. The task was made easier by
the simple, deterministic nature of the operations being done. By adding a dash
of chaos (<em>e.g.</em>, <code>getpid()</code>) we can throw gcc off:</p>
<figure class="highlight c"><figcaption><span>Recursive PID Factorial</span><a href="/code/x86-stack/pidFactorial.c">view raw</a></figcaption><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/types.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><unistd.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">pidFactorial</span><span class="params">(<span class="keyword">int</span> n)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">        <span class="keyword">if</span> (<span class="number">1</span> == n) {</span><br><span class="line">                <span class="keyword">return</span> getpid(); <span class="comment">// tail</span></span><br><span class="line">        }</span><br><span class="line"></span><br><span class="line">        <span class="keyword">return</span> n * pidFactorial(n<span class="number">-1</span>) * getpid(); <span class="comment">// not tail</span></span><br><span 
class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">        <span class="keyword">int</span> answer = pidFactorial(<span class="number">5</span>);</span><br><span class="line">        <span class="built_in">printf</span>(<span class="string">"%d\n"</span>, answer);</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>Optimize <em>that</em>, unix fairies! So now we have a regular
<a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/pidFactorial-o2.s#L20" target="_blank" rel="noopener">recursive call</a> and this function allocates O(n) stack
frames to do its work. Heroically, gcc still does <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/pidFactorial-o2.s#L43" target="_blank" rel="noopener">TCO for getpid</a>
in the recursion base case. If we now wished to make this function tail recursive,
we’d need a slight change:</p>
<figure class="highlight c"><figcaption><span>tailPidFactorial.c</span><a href="/code/x86-stack/tailPidFactorial.c">view raw</a></figcaption><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/types.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><unistd.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">tailPidFactorial</span><span class="params">(<span class="keyword">int</span> n, <span class="keyword">int</span> acc)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">        <span class="keyword">if</span> (<span class="number">1</span> == n) {</span><br><span class="line">                <span class="keyword">return</span> acc * getpid(); <span class="comment">// not tail</span></span><br><span class="line">        }</span><br><span class="line"></span><br><span class="line">        acc = (acc * getpid() * n);</span><br><span class="line">        <span 
class="keyword">return</span> tailPidFactorial(n<span class="number">-1</span>, acc); <span class="comment">// tail</span></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">        <span class="keyword">int</span> answer = tailPidFactorial(<span class="number">5</span>, <span class="number">1</span>);</span><br><span class="line">        <span class="built_in">printf</span>(<span class="string">"%d\n"</span>, answer);</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>The accumulation of the result is now <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/tailPidFactorial-o2.s#L22-L27" target="_blank" rel="noopener">a loop</a> and we’ve
achieved true TCO. But before you go out partying, what can we say about the
general case in C? Sadly, while good C compilers do TCO in a number of cases,
there are many situations where they cannot do it. For example, as we saw in our
<a href="/post/epilogues-canaries-buffer-overflows/">function epilogues</a>, the <em>caller</em> is responsible for cleaning up the
stack after a function call using the standard C calling convention. So if
function <code>F</code> takes two arguments, it can only make TCO calls to functions taking
two or fewer arguments. This is one among many restrictions. Mark Probst wrote
an excellent thesis discussing <a href="http://www.complang.tuwien.ac.at/schani/diplarb.ps" target="_blank" rel="noopener">Proper Tail Recursion in C</a> where he
discusses these issues along with C stack behavior. He also does
<a href="http://www.complang.tuwien.ac.at/schani/jugglevids/index.html" target="_blank" rel="noopener">insanely cool juggling</a>.</p>
<p>“Sometimes” is a rocky foundation for any relationship, so you can’t rely on TCO
in C. It’s a discrete optimization that may or may not take place, rather than
a language <em>feature</em> like proper tail calls. If you <em>must have it</em>, say for
transpiling Scheme into C, you will <a href="http://en.wikipedia.org/wiki/Tail_call#Through_trampolining" target="_blank" rel="noopener">suffer</a>.</p>
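<p>The trampolining technique behind that link can be sketched in a few lines of JavaScript (illustrative only; the function names are mine, and a real Scheme-to-C compiler emits something far more involved): tail calls return thunks, and a driver loop bounces them on a single stack frame:</p>

```javascript
// Keep calling until we get a non-function: each bounce runs in the
// trampoline's single stack frame instead of growing the call stack.
function trampoline(value) {
    while (typeof value === "function") {
        value = value();
    }
    return value;
}

// Instead of making the tail call directly, return a thunk that makes it.
function factThunked(n, acc) {
    if (n <= 1) {
        return acc;
    }
    return function() { return factThunked(n - 1, n * acc); };
}

trampoline(factThunked(5, 1)); // 120
```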
<p>Since JavaScript is now the most popular transpilation target, proper tail calls
become even more important there. So kudos to ES6 for delivering it along with
many other significant improvements. It’s like Christmas for JS programmers.</p>
<p>This concludes our brief tour of tail calls and compiler optimization.  Thanks
for reading and see you next time.</p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;In this penultimate post about the stack, we take a quick look at &lt;strong&gt;tail
calls&lt;/strong&gt;, compiler optimiza
      
    
    </summary>
    
      <category term="software illustrated" scheme="https://manybutfinite.com/category/software-illustrated/"/>
    
      <category term="internals" scheme="https://manybutfinite.com/category/internals/"/>
    
      <category term="programming" scheme="https://manybutfinite.com/category/programming/"/>
    
    
  </entry>
  
  <entry>
    <title>Recursion: dream within a dream</title>
    <link href="https://manybutfinite.com/post/recursion/"/>
    <id>https://manybutfinite.com/post/recursion/</id>
    <updated>2014-04-10T18:00:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p><strong>Recursion</strong> is magic, but it suffers from the most awkward introduction in
programming books.  They’ll show you a recursive factorial implementation, then
warn you that while it sort of works it’s terribly slow and might crash due to
stack overflows.  “You could always dry your hair by sticking your head
into the microwave, but watch out for intracranial pressure and head explosions.
Or you can use a towel.” No wonder people are suspicious of it. Which is too
bad, because <strong>recursion is the single most powerful idea in algorithms</strong>.</p>
<p>Let’s take a look at the classic recursive factorial:</p>
<figure class="highlight c"><figcaption><span>Recursive Factorial - factorial.c</span></figcaption><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">factorial</span><span class="params">(<span class="keyword">int</span> n)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">        <span class="keyword">int</span> previous = <span class="number">0xdeadbeef</span>;</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> (n == <span class="number">0</span> || n == <span class="number">1</span>) {</span><br><span class="line">                <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line">        }</span><br><span class="line"></span><br><span class="line">        previous = factorial(n<span class="number">-1</span>);</span><br><span class="line">        <span class="keyword">return</span> n * previous;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span 
class="title">main</span><span class="params">(<span class="keyword">int</span> argc)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">        <span class="keyword">int</span> answer = factorial(<span class="number">5</span>);</span><br><span class="line">        <span class="built_in">printf</span>(<span class="string">"%d\n"</span>, answer);</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>The idea of a function calling itself is mystifying at first. To make it
concrete, here is <em>exactly</em> what is <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/factorial-gdb-output.txt" target="_blank" rel="noopener">on the stack</a> when
<code>factorial(5)</code> is called and reaches <code>n == 1</code>:</p>
<img src="/img/stack/factorial.png" class="center">
<p>Each call to <code>factorial</code> generates a new <a href="/post/journey-to-the-stack" title="Journey to the Stack">stack frame</a>. The creation and
<a href="/post/epilogues-canaries-buffer-overflows/">destruction</a> of these stack frames is what makes the recursive
factorial slower than its iterative counterpart. The accumulation of these
frames before the calls start returning is what can potentially exhaust stack
space and crash your program.</p>
<p>These concerns are often theoretical. For example, the stack frames for
<code>factorial</code> take 16 bytes each (this can vary depending on stack alignment and
other factors). On a modern x86 Linux system, you normally have 8 megabytes of
stack space, so factorial could handle <code>n</code> up to
~512,000. This is a <a href="https://gist.github.com/gduarte/9944878" target="_blank" rel="noopener">monstrously large result</a> that takes
8,971,833 bits to represent, so stack space is the least of our problems: a puny
integer - even a 64-bit one - will overflow tens of thousands of times over
before we run out of stack space.</p>
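<p>To see just how quickly integer overflow beats stack exhaustion, here is a minimal sketch (the helper name and approach are mine, not from the post) that finds the first <code>n</code> whose factorial no longer fits a given unsigned width:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from the post: returns the smallest n whose
 * factorial exceeds max. We multiply up from 1 and stop just before the
 * next step would overflow, so the accumulator itself never wraps. */
static int first_overflow_n(uint64_t max)
{
        uint64_t acc = 1;
        int n = 1;

        while (acc <= max / (uint64_t)(n + 1)) {
                n++;
                acc *= (uint64_t)n;
        }
        return n + 1; /* (n+1)! would not fit */
}
```

With 32-bit ints, 13! already overflows, and even an unsigned 64-bit integer gives out at 21! - both laughably far from the ~512,000 frames the stack could hold.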
<p>We’ll look at CPU usage in a moment, but for now let’s take a step back from the
bits and bytes and look at recursion as a general technique. Our factorial
algorithm boils down to pushing integers N, N-1, … 1 onto a stack, then
multiplying them in reverse order. The fact we’re using the program’s call stack
to do this is an implementation detail: we could allocate a stack on the heap
and use that instead. While the call stack does have special properties, it’s
just another data structure at your disposal. I hope the diagram makes that
clear.</p>
<p>Once you see the call stack as a data structure, something else becomes clear:
piling up all those integers to multiply them afterwards is <em>one dumbass idea</em>.
<em>That</em> is the real lameness of this implementation: it’s using a screwdriver to
hammer a nail. It’s far more sensible to use an iterative process to calculate
factorials.</p>
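<p>For the record, here is what the screwdriver-free version looks like - a sketch of the iterative factorial (the function name is mine):</p>

```c
#include <assert.h>

/* Iterative factorial: a single stack frame, a loop, and an accumulator.
 * No frames pile up, and nothing is multiplied "in reverse" afterwards. */
int factorial_iter(int n)
{
        int result = 1;

        while (n > 1) {
                result *= n;
                n--;
        }
        return result;
}
```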
<p>But there are <em>plenty</em> of screws out there, so let’s pick one. There is
a traditional interview question where you’re given a mouse in a maze, and you
must help the mouse search for cheese. Suppose the mouse can turn either left
or right in the maze. How would you model and solve this problem?</p>
<p>Like most problems in life, you can reduce this rodent quest to a graph, in
particular a binary tree where the nodes represent positions in the maze.
You could then have the mouse attempt left turns whenever possible, and
backtrack to turn right when it reaches a dead end. Here’s the mouse walk in an
<a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/maze.h" target="_blank" rel="noopener">example maze</a>:</p>
<img src="/img/stack/mazeGraph.png" class="center">
<p>Each edge (line) is a left or right turn taking our mouse to a new position. If
either turn is blocked, the corresponding edge does not exist.  Now we’re
talking! This process is <em>inherently</em> recursive whether you use the call stack
or another data structure.  But using the call stack is just <em>so easy</em>:</p>
<figure class="highlight c"><figcaption><span>Recursive Maze Solver</span><a href="/code/x86-stack/maze.c">view raw</a></figcaption><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string">"maze.h"</span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">explore</span><span class="params">(<span class="keyword">maze_t</span> *node)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">	<span class="keyword">int</span> found = <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> (node == <span class="literal">NULL</span>) {</span><br><span class="line">        <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> (node->hasCheese) 
{</span><br><span class="line">        <span class="keyword">return</span> <span class="number">1</span>; <span class="comment">// found cheese</span></span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">	found = explore(node->left) || explore(node->right);</span><br><span class="line">	<span class="keyword">return</span> found;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">        <span class="keyword">int</span> found = explore(&maze);</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>Below is the stack when we find the cheese in maze.c:13. You can also
see the detailed <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/maze-gdb-output.txt" target="_blank" rel="noopener">GDB output</a> and <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/maze-gdb-commands.txt" target="_blank" rel="noopener">commands</a>
used to gather data.</p>
<img src="/img/stack/mazeCallStack.png" class="center">
<p>This shows recursion in a much better light because it’s a suitable problem. And
that’s no oddity: when it comes to algorithms, <em>recursion is the rule, not the
exception</em>. It comes up when we search, when we traverse trees and other data
structures, when we parse, when we sort: it’s <em>everywhere</em>. You know how <strong>pi</strong>
or <strong>e</strong> come up in math all the time because they’re in the foundations of the
universe? Recursion is like that: it’s in the fabric of computation.</p>
<p>Steven Skiena’s excellent <a href="http://www.amazon.com/Algorithm-Design-Manual-Steven-Skiena/dp/1848000693/" target="_blank" rel="noopener">Algorithm Design Manual</a> is a great place to
see that in action as he works through his “war stories” and shows the reasoning
behind algorithmic solutions to real-world problems. It’s the best resource
I know of to develop your intuition for algorithms.  Another good read is
McCarthy’s <a href="https://github.com/papers-we-love/papers-we-love/blob/master/comp_sci_fundamentals_and_history/recursive-functions-of-symbolic-expressions-and-their-computation-by-machine-parti.pdf" target="_blank" rel="noopener">original paper on LISP</a>. Recursion is both in its title
and in the foundations of the language. The paper is readable and fun; it’s
always a pleasure to see a master at work.</p>
<p>Back to the maze. While it’s hard to get away from recursion here, it doesn’t
mean it must be done via the call stack. You could for example use a string like
<code>RRLL</code> to keep track of the turns, and rely on the string to decide on the
mouse’s next move. Or you can allocate something else to record the state of the
cheese hunt. You’d still be implementing a recursive process, but rolling your
own data structure.</p>
<p>That’s likely to be more complex because the call stack fits like a glove.
Each stack frame records not only the current node, but also the state of
computation in that node (in this case, whether we’ve taken only the left, or
are already attempting the right). Hence the code becomes trivial. Yet we
sometimes give up this sweetness for fear of overflows and hopes of performance.
That can be foolish.</p>
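<p>For contrast, here is a sketch of the roll-your-own-stack approach (the node type below is hypothetical, mirroring the shape of <code>maze.h</code>, and the depth limit is my assumption): an iterative depth-first walk where we must hand-manage the state the call stack tracked for free.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node mirroring maze.h's shape; not the post's actual header. */
typedef struct node {
        int hasCheese;
        struct node *left, *right;
} node_t;

/* Iterative depth-first search with an explicit stack of pending nodes.
 * Assumes the maze is fewer than 64 levels deep. */
int explore_iter(node_t *root)
{
        node_t *stack[64];
        int top = 0;

        if (root != NULL)
                stack[top++] = root;

        while (top > 0) {
                node_t *node = stack[--top];
                if (node->hasCheese)
                        return 1;       /* found cheese */
                if (node->right != NULL)
                        stack[top++] = node->right;
                if (node->left != NULL) /* pushed last, so tried first */
                        stack[top++] = node->left;
        }
        return 0;
}
```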
<p>As we’ve seen, the stack is large and frequently other constraints kick
in before stack space does. One can also check the problem size and ensure it
can be handled safely. The CPU worry is instilled chiefly by two widespread
pathological examples: the dumb factorial and the hideous O(2<sup>n</sup>)
<a href="http://stackoverflow.com/questions/360748/computational-complexity-of-fibonacci-sequence" target="_blank" rel="noopener">recursive Fibonacci</a> without memoization. These are <strong>not</strong> indicative of
sane stack-recursive algorithms.</p>
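<p>To be fair to Fibonacci, a single memo array redeems it without giving up recursion. A sketch (the array size is chosen because fib(93) is the largest Fibonacci number that fits in 64 bits):</p>

```c
#include <assert.h>
#include <stdint.h>

/* Recursive Fibonacci with memoization: still call-stack recursion, but
 * each value is computed only once, so the O(2^n) call tree collapses
 * to O(n). Zero marks "not yet computed" in the memo array. */
static uint64_t memo[94];

uint64_t fib(int n)
{
        if (n < 2)
                return (uint64_t)n;
        if (memo[n] == 0)
                memo[n] = fib(n - 1) + fib(n - 2);
        return memo[n];
}
```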
<p>The reality is that stack operations are <em>fast</em>. Often the offsets to data are
known exactly, the stack is hot in the <a href="/post/intel-cpu-caches/">caches</a>, and there are dedicated
instructions to get things done. Meanwhile, there is substantial overhead
involved in using your own heap-allocated data structures.  It’s not uncommon to
see people write something that ends up <em>more complex and less performant</em> than
call-stack recursion.  Finally, modern CPUs are <a href="/post/what-your-computer-does-while-you-wait/">pretty good</a>
and often not the bottleneck. Be careful about sacrificing simplicity and as
always with performance, <a href="/post/performance-is-a-science">measure</a>.</p>
<p>The next post is the last in this stack series, and we’ll look at Tail Calls,
Closures, and Other Fauna. Then it’ll be time to visit our old friend, the Linux
kernel. Thanks for reading!</p>
<img src="/img/stack/1000px-Sierpinski-build.png" class="center">
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;&lt;strong&gt;Recursion&lt;/strong&gt; is magic, but it suffers from the most awkward introduction in
programming books.  Th
      
    
    </summary>
    
      <category term="software illustrated" scheme="https://manybutfinite.com/category/software-illustrated/"/>
    
      <category term="internals" scheme="https://manybutfinite.com/category/internals/"/>
    
      <category term="programming" scheme="https://manybutfinite.com/category/programming/"/>
    
    
  </entry>
  
  <entry>
    <title>Epilogues, Canaries, and Buffer Overflows</title>
    <link href="https://manybutfinite.com/post/epilogues-canaries-buffer-overflows/"/>
    <id>https://manybutfinite.com/post/epilogues-canaries-buffer-overflows/</id>
    <updated>2014-03-19T16:30:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>Last week we looked at <a href="/post/journey-to-the-stack" title="Journey to the Stack">how the stack works</a> and how stack frames are
built during function <em>prologues</em>. Now it’s time to look at the inverse process
as stack frames are destroyed in function <em>epilogues</em>.  Let’s bring back our
friend <code>add.c</code>:</p>
<figure class="highlight c"><figcaption><span>Simple Add Program - add.c</span></figcaption><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">add</span><span class="params">(<span class="keyword">int</span> a, <span class="keyword">int</span> b)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">    <span class="keyword">int</span> result = a + b;</span><br><span class="line">    <span class="keyword">return</span> result;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">    <span class="keyword">int</span> answer;</span><br><span class="line">    answer = add(<span class="number">40</span>, <span class="number">2</span>);</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>We’re executing line 4, right after the assignment of <code>a + b</code> into <code>result</code>. This is
what happens:</p>
<p><img id="returnFromAdd" class="center" src="/img/stack/returnFromAdd.png" usemap="#mapreturnFromAdd">
<map id="mapreturnFromAdd" name="mapreturnFromAdd">
<area shape="poly" coords="754,6,754,312,6,312,6,6" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L156">
<area shape="poly" coords="754,312,754,618,6,618,6,312" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L162">
<area shape="poly" coords="754,618,754,924,6,924,6,618" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L162">
<area shape="poly" coords="754,924,754,1234,6,1234,6,924" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L162">
</map></p>
<p>The first instruction is redundant and a little silly because we know <code>eax</code> is
already equal to <code>result</code>, but this is what you get with optimization turned
off. The <code>leave</code> instruction then runs, doing two tasks for the price of one: it
resets <code>esp</code> to point to the start of the current stack frame, and then restores
the saved ebp value. These two operations are logically distinct and thus are
broken up in the diagram, but they happen atomically if you’re tracing with
a debugger.</p>
<p>After <code>leave</code> runs the previous stack frame is restored. The only vestige of the
call to <code>add</code> is the return address on top of the stack. It contains the address
of the instruction in <code>main</code> that must run after <code>add</code> is done. The <code>ret</code>
instruction takes care of it: it pops the return address into the <code>eip</code>
register, which points to the next instruction to be executed.  The program has
now returned to main, which resumes:</p>
<p><img id="returnFromMain" class="center" src="/img/stack/returnFromMain.png" usemap="#mapreturnFromMain">
<map id="mapreturnFromMain" name="mapreturnFromMain">
<area shape="poly" coords="754,6,754,312,6,312,6,6" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L175">
<area shape="poly" coords="754,312,754,618,6,618,6,312" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L181">
<area shape="poly" coords="754,618,754,924,6,924,6,618" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L181">
<area shape="poly" coords="754,924,754,1234,6,1234,6,924" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L181">
</map></p>
<p><code>main</code> copies the return value from <code>add</code> into local variable <code>answer</code> and then
runs its own epilogue, which is identical to any other. Again the only
peculiarity in <code>main</code> is that the saved ebp is null, since it is the first stack
frame in our code. In the last step, execution has been returned to the
C runtime (<code>libc</code>), which will exit to the operating system. Here’s a diagram
with the <a href="/img/stack/returnSequence.png">full return sequence</a> for those
who need it.</p>
<p>You now have an excellent grasp of how the stack operates, so let’s have some
fun and look at one of the most infamous hacks of all time: exploiting the stack
buffer overflow. Here is a vulnerable program:</p>
<figure class="highlight c"><figcaption><span>Vulnerable Program - buffer.c</span></figcaption><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">doRead</span><span class="params">()</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">        <span class="keyword">char</span> <span class="built_in">buffer</span>[<span class="number">28</span>];</span><br><span class="line">        gets(<span class="built_in">buffer</span>);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">        doRead();</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>The code above uses <a href="http://linux.die.net/man/3/gets" target="_blank" rel="noopener">gets</a> to read from
standard input. <code>gets</code> keeps reading until it encounters a newline or end of
file. Here’s what the stack looks like after a string has been read:</p>
<img src="/img/stack/bufferCopy.png" class>
<p>The problem here is that <code>gets</code> is unaware of <code>buffer</code>'s size: it will blithely
keep reading input and stuffing data into the stack beyond <code>buffer</code>,
obliterating the saved ebp value, return address, and whatever else is below.
To exploit this behavior, attackers craft a precise payload and feed it into the
program. This is what the stack looks like during an attack, after the call to
<code>gets</code>:</p>
<img src="/img/stack/bufferOverflowExploit.png" class>
<p>The basic idea is to provide malicious assembly code to be executed <em>and</em>
overwrite the return address on the stack to point to that code. It is a bit
like a virus invading a cell, subverting it, and introducing some RNA to further
its goals.</p>
<p>And like a virus, the exploit’s payload has many notable features.  It starts
with several <code>nop</code> instructions to increase the odds of successful exploitation.
This is because the return address is absolute and must be guessed, since
attackers don’t know exactly where in the stack their code will be stored. But
as long as they land on a <code>nop</code>, the exploit works: the processor will execute
the nops until it hits the instructions that do work.</p>
<p>The <code>exec /bin/sh</code> symbolizes raw assembly instructions that execute a shell
(imagine for example that the vulnerability is in a networked program, so the
exploit might provide shell access to the system). The idea of feeding raw
assembly to a program expecting a command or user input is shocking at first,
but that’s part of what makes security research so fun and mind-expanding.  To
give you an idea of how weird things get, sometimes the vulnerable program calls
<code>tolower</code> or <code>toupper</code> on its inputs, forcing attackers to write assembly
instructions whose bytes do not fall into the range of upper- or lower-case
ascii letters.</p>
<p>Finally, attackers repeat the guessed return address several times, again to
tip the odds ever in their favor. By starting on a 4-byte boundary and providing
multiple repeats, they are more likely to overwrite the original return address
on the stack.</p>
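<p>The root cause is simply that <code>gets</code> cannot know the buffer’s size; the standard fix is a bounded read, which is why <code>gets</code> was removed from the C standard in C11. Here is a sketch (parameterizing the stream and output is my addition, for illustration):</p>

```c
#define _POSIX_C_SOURCE 200809L /* for fmemopen, handy when exercising this */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Bounded version of doRead: fgets writes at most sizeof(buffer) bytes
 * including the terminating NUL, so overlong input is truncated instead
 * of smashing the saved ebp and return address. */
void doReadSafe(FILE *in, char *out)
{
        char buffer[28];

        if (fgets(buffer, sizeof(buffer), in) == NULL)
                buffer[0] = '\0';
        buffer[strcspn(buffer, "\n")] = '\0'; /* drop trailing newline */
        strcpy(out, buffer); /* out must hold at least sizeof(buffer) */
}
```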
<p>Thankfully, modern operating systems have a host of
<a href="http://paulmakowski.wordpress.com/2011/01/25/smashing-the-stack-in-2011/" target="_blank" rel="noopener">protections against buffer overflows</a>, including non-executable stacks and <em>stack canaries</em>. The “canary” name comes from the <a href="http://en.wiktionary.org/wiki/canary_in_a_coal_mine" target="_blank" rel="noopener">canary in a coal mine</a> expression, an addition to computer science’s rich vocabulary. In the words of Steve McConnell:</p>
<blockquote><p>Computer science has some of the most colorful language of any field. In what other field can you walk into a sterile room, carefully controlled at 68°F, and find viruses, Trojan horses, worms, bugs, bombs, crashes, flames, twisted sex changers, and fatal errors?</p>
<footer><strong>Steve McConnell</strong><cite>Code Complete 2</cite></footer></blockquote>
<p>At any rate, here’s what a stack canary looks like:</p>
<img src="/img/stack/bufferCanary.png" class>
<p>Canaries are implemented by the compiler. For example, GCC’s
<a href="http://gcc.gnu.org/onlinedocs/gcc-4.2.3/gcc/Optimize-Options.html" target="_blank" rel="noopener">stack-protector</a>
option causes canaries to be used in any function that is potentially
vulnerable. The function prologue loads a magic value into the canary location,
and the epilogue makes sure the value is intact. If it’s not, a buffer overflow
(or bug) likely happened and the program is aborted via
<a href="http://refspecs.linux-foundation.org/LSB_4.0.0/LSB-Core-generic/LSB-Core-generic/libc---stack-chk-fail-1.html" target="_blank" rel="noopener">__stack_chk_fail</a>.
Due to their strategic location on the stack, canaries make the exploitation of
stack buffer overflows much harder.</p>
<p>This finishes our journey within the depths of the stack. We don’t want to delve
too greedily and too deep. Next week we’ll go up a notch in abstraction to take
a good look at recursion, tail calls and other tidbits, probably using Google’s
V8. To end this epilogue and prologue talk, I’ll close with a cherished quote
inscribed on a monument in the American National Archives:</p>
<img src="/img/stack/past-is-prologue.jpg" class>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;Last week we looked at &lt;a href=&quot;/post/journey-to-the-stack&quot; title=&quot;Journey to the Stack&quot;&gt;how the stack works&lt;/a&gt;
      
    
    </summary>
    
      <category term="software illustrated" scheme="https://manybutfinite.com/category/software-illustrated/"/>
    
      <category term="internals" scheme="https://manybutfinite.com/category/internals/"/>
    
    
  </entry>
  
  <entry>
    <title>Journey to the Stack, Part I</title>
    <link href="https://manybutfinite.com/post/journey-to-the-stack/"/>
    <id>https://manybutfinite.com/post/journey-to-the-stack/</id>
    <updated>2014-03-10T15:00:00.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>Earlier we’ve explored the <a href="/post/anatomy-of-a-program-in-memory" title="Anatomy of a Program in Memory">anatomy of a program in memory</a>, the
landscape of how our programs run in a computer. Now we turn to the <em>call
stack</em>, the work horse in most programming languages and virtual machines.
Along the way we’ll meet fantastic creatures like closures, recursion, and
buffer overflows. But the first step is a precise picture of how the stack
operates.</p>
<p>The stack is so important because it keeps track of the <em>functions</em> running in
a program, and functions are in turn the building blocks of software.  In fact,
the internal operation of programs is normally very simple. It consists mostly
of functions pushing data onto and popping data off the stack as they call each
other, while allocating memory on the heap for data that must survive across
function calls. This is true for both low-level C software and VM-based
languages like JavaScript and C#. A solid grasp of this reality is invaluable
for debugging, performance tuning and generally knowing what the hell is going
on.</p>
<p>When a function is called, a <strong>stack frame</strong> is created to support the
function’s execution. The stack frame contains the function’s <em>local variables</em>
and the <em>arguments</em> passed to the function by its caller. The frame also
contains housekeeping information that allows the called function (the <em>callee</em>)
to return to the caller safely.  The exact contents and layout of the stack vary
by processor architecture and function call convention. In this post we look at
Intel x86 stacks using C-style function calls (<code>cdecl</code>). Here’s a single stack
frame sitting live on top of the stack:</p>
<img src="/img/stack/stackIntro.png" class>
<p>Right away, three CPU registers burst into the scene. The <em>stack pointer</em>,
<code>esp</code>, points to the top of the stack. The top is always occupied by the <em>last
item that was pushed</em> onto the stack <em>but has not yet been popped off</em>, just as
in a real-world stack of plates or $100 bills.</p>
<p>The address stored in <code>esp</code> constantly changes as stack items are pushed and
popped, such that it always points to the last item. Many CPU instructions
automatically update <code>esp</code> as a side effect, and it’s impractical to use the
stack without this register.</p>
<p>In the Intel architecture, as in most, the stack grows towards <em>lower memory
addresses</em>. So the “top” is the lowest memory address in the stack containing
live data: <code>local_buffer</code> in this case.  Notice there’s nothing vague about the
arrow from <code>esp</code> to <code>local_buffer</code>.  This arrow means business: it points
<em>specifically</em> to the <em>first byte</em> occupied by <code>local_buffer</code> because that is
the exact address stored in <code>esp</code>.</p>
<p>The second register tracking the stack is <code>ebp</code>, the <em>base pointer</em> or <em>frame
pointer</em>. It points to a fixed location within the stack frame of the function
<em>currently running</em> and provides a stable reference point (base) for access to
arguments and local variables. <code>ebp</code> changes only when a function call begins or
ends. Thus we can easily address each item in the stack as an offset from <code>ebp</code>,
as shown in the diagram.</p>
<p>Unlike <code>esp</code>, <code>ebp</code> is mostly maintained by program code with little CPU
interference. Sometimes there are performance benefits in ditching <code>ebp</code>
altogether, which can be done via <a href="http://stackoverflow.com/questions/14666665/trying-to-understand-gcc-option-fomit-frame-pointer" target="_blank" rel="noopener">compiler flags</a>.
The Linux kernel is one example where this is done.</p>
<p>Finally, the <code>eax</code> register is used by convention to transfer return values back
to the caller for most C data types.</p>
<p>Now let’s inspect the data in our stack frame. This diagram shows precise
byte-for-byte contents as you’d see in a debugger, with memory growing
left-to-right, top-to-bottom. Here it is:</p>
<img src="/img/stack/frameContents.png" class>
<p>The local variable <code>local_buffer</code> is a byte array containing a null-terminated
ascii string, a staple of C programs. The string was likely read from somewhere,
for example keyboard input or a file, and it is 7 bytes long. Since
<code>local_buffer</code> can hold 8 bytes, there’s 1 free byte left. The <em>content of this
byte is unknown</em> because in the stack’s infinite dance of pushes and pops, you
never know what memory holds <em>unless you write to it</em>.  Since the C compiler
does not initialize the memory for a stack frame, contents are undetermined
- and somewhat random - until written to. This has driven some into madness.</p>
<p>Moving on, <code>local1</code> is a 4-byte integer and you can see the contents of each
byte.  It looks like a big number, with all those zeros following the 8, but
here your intuition leads you astray.</p>
<p>Intel processors are <em>little endian</em> machines, meaning that numbers in memory
start with the <em>little end</em> first. So the least significant byte of a multi-byte
number is in the lowest memory address. Since that is normally shown leftmost,
this departs from our usual representation of numbers. It helps to know that
this endian talk is borrowed from Gulliver’s Travels: just as folks in Lilliput
eat their eggs starting from the little end, Intel processors eat their numbers
starting from the little byte.</p>
<p>So <code>local1</code> in fact holds the number 8, as in the legs of an octopus.  <code>param1</code>,
however, has a value of 2 in the second byte position, so its mathematical value
is 2 * 256 = 512 (we multiply by 256 because each place value ranges from 0 to
255). Meanwhile, <code>param2</code> is carrying weight at 1 * 256 * 256 = 65536.</p>
<p>The housekeeping data in this stack frame consists of two crucial pieces: the
address of the <em>previous</em> stack frame (saved ebp) and the address of the
instruction to be executed upon the function’s exit (the return address).
Together, they make it possible for the function to return sanely and for the
program to keep running along.</p>
<p>Now let’s see the birth of a stack frame to build a clear mental picture of how
this all works together. Stack growth is puzzling at first because it happens
<em>in the opposite direction</em> you’d expect. For example, to allocate 8 bytes on
the stack one <em>subtracts</em> 8 from <code>esp</code>, and subtraction is an odd way to grow
something.</p>
<p>Let’s take a simple C program:</p>
<figure class="highlight c"><figcaption><span>Simple Add Program - add.c</span></figcaption><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">add</span><span class="params">(<span class="keyword">int</span> a, <span class="keyword">int</span> b)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">	<span class="keyword">int</span> result = a + b;</span><br><span class="line">	<span class="keyword">return</span> result;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">	<span class="keyword">int</span> answer;</span><br><span class="line">	answer = add(<span class="number">40</span>, <span class="number">2</span>);</span><br><span class="line">}</span><br></pre></td></tr></tbody></table></figure>
<p>Suppose we run this in Linux without command-line parameters.  When you run
a C program, the first code to actually execute is in the C runtime library,
which then calls our <code>main</code> function. The diagrams below show step-by-step what
happens as the program runs. Each diagram links to GDB output showing the state
of memory and registers. You may also see the <a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-commands.txt" target="_blank" rel="noopener">GDB commands</a> used and the whole
<a href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt" target="_blank" rel="noopener">GDB output</a>. Here we go:</p>
<p><img id="mainProlog" class="center" src="/img/stack/mainProlog.png" usemap="#mapMainProlog">
<map id="mapMainProlog" name="mapMainProlog">
<area shape="poly" coords="754,6,754,312,6,312,6,6" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L10">
<area shape="poly" coords="754,312,754,618,6,618,6,312" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L32">
<area shape="poly" coords="754,618,754,928,6,928,6,618" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L40">
</map></p>
<p>Steps 2 and 3, along with 4 below, are the <strong>function prologue</strong>, which is
common to nearly all functions: the current value of <code>ebp</code> is saved to the top of
the stack, and then <code>esp</code> is copied to <code>ebp</code>, establishing a new frame. <code>main</code>’s
prologue is like any other, but with the peculiarity that <code>ebp</code> is zeroed out
when the program starts.</p>
<p>If you were to inspect the stack below <code>argc</code> (to the right) you’d find more
data, including pointers to the program name and command-line parameters (the
traditional C <code>argv</code>), plus pointers to Unix environment variables and their
actual contents. But that’s not important here, so the ball keeps rolling
towards the <code>add()</code> call:</p>
<p><img id="callAdd" class="center" src="/img/stack/callAdd.png" usemap="#mapCallAdd">
<map id="mapCallAdd" name="mapCallAdd">
<area shape="poly" coords="754,6,754,312,6,312,6,6" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L46">
<area shape="poly" coords="754,312,754,642,6,642,6,312" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L55">
<area shape="poly" coords="754,642,754,952,6,952,6,642" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L73">
</map></p>
<p>After <code>main</code> subtracts 12 from <code>esp</code> to get the stack space it needs, it sets
the values for <code>a</code> and <code>b</code>. Values in memory are shown in hex and little-endian
format, as you’d see in a debugger. Once parameter values are set, <code>main</code> calls
<code>add</code> and it starts running:</p>
<p><img id="addProlog" class="center" src="/img/stack/addProlog.png" usemap="#mapaddProlog">
<map id="mapaddProlog" name="mapaddProlog">
<area shape="poly" coords="754,6,754,312,6,312,6,6" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L95">
<area shape="poly" coords="754,312,754,618,6,618,6,312" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L104">
<area shape="poly" coords="754,618,754,928,6,928,6,618" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L110">
</map></p>
<p>Now there’s some excitement! We get another prologue, but this time you can see
clearly how the stack frames form a linked list, starting at <code>ebp</code> and going
down the stack. This is how debuggers and <code>Exception</code> objects in higher-level
languages get their stack traces.  You can also see the much more typical
catching up of <code>ebp</code> to <code>esp</code> when a new frame is born. And again, we subtract
from <code>esp</code> to get more stack space.</p>
<p>There’s also the slightly weird reversal of bytes when the <code>ebp</code> register value
is copied to memory. What’s happening here is that registers don’t really have
endianness: there are no “growing addresses” inside a register as there are for
memory. Thus by convention debuggers show register values in the most natural
format to humans: most significant to least significant digits. So the results
of the copy in a little-endian machine appear reversed in the usual
left-to-right notation for memory. I want the diagrams to provide an accurate
picture of what you find in the trenches, so there you have it.</p>
<p>With the hard part behind us, we add:</p>
<p><img id="doAdd" class="center" src="/img/stack/doAdd.png" usemap="#mapdoAdd">
<map id="mapdoAdd" name="mapdoAdd">
<area shape="poly" coords="754,6,754,360,6,360,6,6" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L120">
<area shape="poly" coords="754,360,754,670,6,670,6,360" href="https://github.com/gduarte/blog/blob/master/code/x86-stack/add-gdb-output.txt#L138">
</map></p>
<p>There are guest register appearances to help out with the addition, but
otherwise no alarms and no surprises. <code>add</code> did its job, and at this point the
stack action would go in reverse, but we’ll save that for next time.</p>
<p>Anybody who’s read this far deserves a souvenir, so I’ve made a large diagram
showing <a href="/img/stack/callSequence.png">all the steps combined</a> in a fit of
nerd pride.</p>
<p>It looks tame once it’s all laid out. Those little boxes help <em>a lot</em>. In fact,
little boxes are the chief tool of computer science. I hope the pictures and
register movements provide an intuitive mental picture that integrates stack
growth and memory contents. Up close, our software doesn’t look too far from
a simple Turing machine.</p>
<p>This concludes the first part of our stack tour. There’s some more byte
spelunking ahead, and then it’s on to see higher level programming concepts
built on this foundation. See you next week.</p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;Earlier we’ve explored the &lt;a href=&quot;/post/anatomy-of-a-program-in-memory&quot; title=&quot;Anatomy of a Program in Memory&quot;
      
    
    </summary>
    
      <category term="software illustrated" scheme="https://manybutfinite.com/category/software-illustrated/"/>
    
      <category term="internals" scheme="https://manybutfinite.com/category/internals/"/>
    
    
  </entry>
  
  <entry>
    <title>Page Cache, the Affair Between Memory and Files</title>
    <link href="https://manybutfinite.com/post/page-cache-the-affair-between-memory-and-files/"/>
    <id>https://manybutfinite.com/post/page-cache-the-affair-between-memory-and-files/</id>
    <updated>2009-02-11T13:20:18.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>Previously we looked at how the kernel <a href="/post/how-the-kernel-manages-your-memory">manages virtual memory</a>
for a user process, but files and I/O were left out. This post covers
the important and often misunderstood relationship between files and
memory and its consequences for performance.</p>
<p>Two serious problems must be solved by the OS when it comes to files.  The first
one is the mind-blowing slowness of hard drives, and
<a href="/post/what-your-computer-does-while-you-wait">disk seeks in particular</a>,
relative to memory. The second is the need to load file contents in physical
memory once and <em>share</em> the contents among programs. If you use
<a href="http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx" target="_blank" rel="noopener">Process Explorer</a> to poke
at Windows processes, you’ll see there are ~15MB worth of common DLLs loaded in
every process. My Windows box right now is running 100 processes, so without
sharing I’d be using up to ~1.5 GB of physical RAM <em>just for common DLLs</em>. No
good. Likewise, nearly all Linux programs need <code>ld.so</code> and libc, plus other common
libraries.</p>
<p>Happily, both problems can be dealt with in one shot: the <strong>page
cache</strong>, where the kernel stores page-sized chunks of files. To
illustrate the page cache, I’ll conjure a Linux program named
<strong>render</strong>, which opens file <strong>scene.dat</strong> and reads it 512 bytes at a
time, storing the file contents into a heap-allocated block. The first
read goes like this:</p>
<p><img src="http://static.duartes.org/img/blogPosts/readFromPageCache.png" alt="Reading and the page cache"></p>
<p>After 12KB have been read, <code>render</code>'s heap and the relevant page frames
look thus:</p>
<p><img src="http://static.duartes.org/img/blogPosts/nonMappedFileRead.png" alt="Non-mapped file read"></p>
<p>This looks innocent enough, but there’s a lot going on. First, even
though this program uses regular <code>read</code> calls, three 4KB page frames are
now in the page cache storing part of <code>scene.dat</code>. People are sometimes
surprised by this, but <strong>all regular file I/O happens through the page
cache</strong>. In x86 Linux, the kernel thinks of a file as a sequence of 4KB
chunks. If you read a single byte from a file, the whole 4KB chunk
containing the byte you asked for is read from disk and placed into the
page cache. This makes sense because sustained disk throughput is pretty
good and programs normally read more than just a few bytes from a file
region. The page cache knows the position of each 4KB chunk within the
file, depicted above as #0, #1, etc. Windows uses 256KB <strong>views</strong>
analogous to pages in the Linux page cache.</p>
<p>Sadly, in a regular file read the kernel must copy the contents of the
page cache into a user buffer, which not only takes cpu time and hurts
the <a href="/post/intel-cpu-caches">cpu caches</a>, but
also <strong>wastes physical memory with duplicate data</strong>. As per the diagram
above, the <code>scene.dat</code> contents are stored twice, and each instance of
the program would store the contents an additional time. We’ve mitigated
the disk latency problem but failed miserably at everything else.
<strong>Memory-mapped files</strong> are the way out of this madness:</p>
<p><img src="http://static.duartes.org/img/blogPosts/mappedFileRead.png" alt="Mapped file read"></p>
<p>When you use file mapping, the kernel maps your program’s virtual pages
directly onto the page cache. This can deliver a significant performance
boost: <a href="http://www.amazon.com/Windows-Programming-Addison-Wesley-Microsoft-Technology/dp/0321256190/" target="_blank" rel="noopener">Windows System Programming</a>
reports run time improvements of 30% and up relative to regular file
reads, while similar figures are reported for Linux and Solaris in
<a href="http://www.amazon.com/Programming-Environment-Addison-Wesley-Professional-Computing/dp/0321525949/" target="_blank" rel="noopener">Advanced Programming in the Unix Environment</a>.
You might also save large amounts of physical memory, depending on the
nature of your application.</p>
<p>As always with performance, <a href="/post/performance-is-a-science">measurement is everything</a>,
but memory mapping earns its keep in a programmer’s toolbox. The API is
pretty nice too: it allows you to access a file as bytes in memory and
does not require your soul and code readability in exchange for its
benefits. Mind your <a href="/post/anatomy-of-a-program-in-memory">address space</a>
and experiment with
<a href="http://www.kernel.org/doc/man-pages/online/pages/man2/mmap.2.html" target="_blank" rel="noopener">mmap</a>
in Unix-like systems,
<a href="http://msdn.microsoft.com/en-us/library/aa366537(VS.85).aspx" target="_blank" rel="noopener">CreateFileMapping</a>
in Windows, or the many wrappers available in high level languages. When
you map a file its contents are not brought into memory all at once, but
rather on demand via <a href="http://lxr.linux.no/linux+v2.6.28/mm/memory.c#L2678" target="_blank" rel="noopener">page faults</a>. The fault
handler <a href="http://lxr.linux.no/linux+v2.6.28/mm/memory.c#L2436" target="_blank" rel="noopener">maps your virtual pages</a> onto the
page cache after
<a href="http://lxr.linux.no/linux+v2.6.28/mm/filemap.c#L1424" target="_blank" rel="noopener">obtaining</a> a page
frame with the needed file contents. This involves disk I/O if the
contents weren’t cached to begin with.</p>
<p>Now for a pop quiz. Imagine that the last instance of our <code>render</code>
program exits. Would the pages storing <em>scene.dat</em> in the page cache be
freed immediately? People often think so, but that would be a bad idea.
When you think about it, it is very common for us to create a file in
one program, exit, then use the file in a second program. The page cache
must handle that case. When you think <em>more</em> about it, why should the
kernel <em>ever</em> get rid of page cache contents? Remember that disk is 5
orders of magnitude slower than RAM, hence a page cache hit is a huge
win. So long as there’s enough free physical memory, the cache should be
kept full. It is therefore <em>not</em> dependent on a particular process, but
rather it’s a system-wide resource. If you run <code>render</code> a week from now
and <code>scene.dat</code> is still cached, bonus! This is why the kernel cache
size climbs steadily until it hits a ceiling. It’s not because the OS is
garbage and hogs your RAM, it’s actually good behavior because in a way
free physical memory is a waste. Better use as much of the stuff for
caching as possible.</p>
<p>Due to the page cache architecture, when a program calls
<a href="http://www.kernel.org/doc/man-pages/online/pages/man2/write.2.html" target="_blank" rel="noopener">write()</a>
bytes are simply copied to the page cache and the page is marked dirty.
Disk I/O normally does <strong>not</strong> happen immediately, thus your program
doesn’t block waiting for the disk. On the downside, if the computer
crashes your writes will never make it, hence critical files like
database transaction logs must be
<a href="http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html" target="_blank" rel="noopener">fsync()</a>ed
(though one must still worry about drive controller caches, oy!). Reads,
on the other hand, normally block your program until the data is
available. Kernels employ eager loading to mitigate this problem, an
example of which is <strong>read ahead</strong> where the kernel preloads a few pages
into the page cache in anticipation of your reads. You can help the
kernel tune its eager loading behavior by providing hints on whether you
plan to read a file sequentially or randomly (see
<a href="http://www.kernel.org/doc/man-pages/online/pages/man2/madvise.2.html" target="_blank" rel="noopener">madvise()</a>,
<a href="http://www.kernel.org/doc/man-pages/online/pages/man2/readahead.2.html" target="_blank" rel="noopener">readahead()</a>,
<a href="http://msdn.microsoft.com/en-us/library/aa363858(VS.85).aspx#caching_behavior" target="_blank" rel="noopener">Windows cache hints</a>
).
Linux <a href="http://lxr.linux.no/linux+v2.6.28/mm/filemap.c#L1424" target="_blank" rel="noopener">does read-ahead</a> for
memory-mapped files, but I’m not sure about Windows. Finally, it’s
possible to bypass the page cache using
<a href="http://www.kernel.org/doc/man-pages/online/pages/man2/open.2.html" target="_blank" rel="noopener">O_DIRECT</a>
in Linux or
<a href="http://msdn.microsoft.com/en-us/library/cc644950(VS.85).aspx" target="_blank" rel="noopener">FILE_FLAG_NO_BUFFERING</a>
in Windows, something database software often does.</p>
<p>A file mapping may be <strong>private</strong> or <strong>shared</strong>. This refers only to
<strong>updates</strong> made to the contents in memory: in a private mapping the
updates are not committed to disk or made visible to other processes,
whereas in a shared mapping they are. Kernels use the <strong>copy on write</strong>
mechanism, enabled by page table entries, to implement private mappings.
In the example below, both <code>render</code> and another program called
<code>render3d</code> (am I creative or what?) have mapped <code>scene.dat</code> privately.
<code>Render</code> then writes to its virtual memory area that maps the file:</p>
<p><img src="http://static.duartes.org/img/blogPosts/copyOnWrite.png" alt="The Copy-On-Write mechanism"></p>
<p>The read-only page table entries shown above do <em>not</em> mean the mapping
is read only, they’re merely a kernel trick to share physical memory
until the last possible moment. You can see how ‘private’ is a bit of a
misnomer until you remember it only applies to updates. A consequence of
this design is that a virtual page that maps a file privately sees
changes done to the file by other programs <em>as long as the page has only
been read from</em>. Once copy-on-write is done, changes by others are no
longer seen. This behavior is not guaranteed by the kernel, but it’s
what you get in x86 and makes sense from an API perspective. By
contrast, a shared mapping is simply mapped onto the page cache and
that’s it. Updates are visible to other processes and end up in the
disk. Finally, if the mapping above were read-only, page faults would
trigger a segmentation fault instead of copy on write.</p>
<p>Dynamically loaded libraries are brought into your program’s address
space via file mapping. There’s nothing magical about it: it’s the same
private file mapping available to you via regular APIs. Below is an
example showing part of the address spaces from two running instances of
the file-mapping <code>render</code> program, along with physical memory, to tie
together many of the concepts we’ve seen.</p>
<p><img src="http://static.duartes.org/img/blogPosts/virtualToPhysicalMapping.png" alt="Mapping virtual memory to physical memory"></p>
<p>This concludes our 3-part series on memory fundamentals. I hope the
series was useful and provided you with a good mental model of these OS
topics.</p>
<p><a href="/comments/page-cache.html">62 Comments</a></p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;Previously we looked at how the kernel &lt;a href=&quot;/post/how-the-kernel-manages-your-memory&quot;&gt;manages virtual memory
      
    
    </summary>
    
      <category term="software illustrated" scheme="https://manybutfinite.com/category/software-illustrated/"/>
    
      <category term="internal" scheme="https://manybutfinite.com/category/internal/"/>
    
      <category term="linux" scheme="https://manybutfinite.com/category/linux/"/>
    
    
  </entry>
  
  <entry>
    <title>The Thing King</title>
    <link href="https://manybutfinite.com/post/the-thing-king/"/>
    <id>https://manybutfinite.com/post/the-thing-king/</id>
    <updated>2009-02-05T00:12:01.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>I hope the <a href="/post/how-the-kernel-manages-your-memory">previous post</a> explained virtual memory adequately, but I must admit I held back a much better explanation, which I first saw in <a href="http://www.amazon.com/Expert-Programming-Peter-van-Linden/dp/0131774298/" target="_blank" rel="noopener">Expert C Programming</a>. It wasn't written by the book's author, Peter van der Linden, but rather by Jeff Berryman in 1972. Here goes:</p> <p><b>The Thing King and the Paging Game</b></p> <p>This note is a formal non-working paper of the Project MAC Computer Systems Research Division. It should be reproduced and distributed wherever levity is lacking, and may be referenced at your own risk in other publications.</p> <h4>Rules</h4> <ol> <li>Each player gets several million things.</li> <li>Things are kept in crates that hold 4096 things each. Things in the same crate are called crate-mates.</li> <li>Crates are stored either in the workshop or the warehouses. The workshop is almost always too small to hold all the crates.</li> <li>There is only one workshop but there may be several warehouses. Everybody shares them.</li> <li>Each thing has its own thing number.</li> <li>What you do with a thing is to zark it. Everybody takes turns zarking.</li> <li>You can only zark your things, not anybody else's.</li> <li>Things can only be zarked when they are in the workshop.</li> <li>Only the Thing King knows whether a thing is in the workshop or in a warehouse.</li> <li>The longer a thing goes without being zarked, the grubbier it is said to become.</li> <li>The way you get things is to ask the Thing King. He only gives out things by the crateful. This is to keep the royal overhead down.</li> <li>The way you zark a thing is to give its thing number. If you give the number of a thing that happens to be in a workshop it gets zarked right away. 
If it is in a warehouse, the Thing King packs the crate containing your thing back into the workshop. If there is no room in the workshop, he first finds the grubbiest crate in the workshop, whether it be yours or somebody else's, and packs it off with all its crate-mates to a warehouse. In its place he puts the crate containing your thing. Your thing then gets zarked and you never know that it wasn't in the workshop all along.</li> <li>Each player's stock of things have the same numbers as everybody else's. The Thing King always knows who owns what thing and whose turn it is, so you can't ever accidentally zark somebody else's thing even if it has the same thing number as one of yours.</li> </ol> <h4>Notes</h4> <ol> <li>Traditionally, the Thing King sits at a large, segmented table and is attended to by pages (the so-called "table pages") whose job it is to help the king remember where all the things are and who they belong to.</li> <li>One consequence of Rule 13 is that everybody's thing numbers will be similar from game to game, regardless of the number of players.</li> <li>The Thing King has a few things of his own, some of which move back and forth between workshop and warehouse just like anybody else's, but some of which are just too heavy to move out of the workshop.</li> <li>With the given set of rules, oft-zarked things tend to get kept mostly in the workshop while little-zarked things stay mostly in a warehouse. This is efficient stock control.</li> </ol> <p><strong>Long Live the Thing King!</strong></p> 
<p><b>Update:</b> Alex pointed out the difficulties of measuring grubbiness in a comment below.</p>
<p><a href="/comments/thing-king.html">14 Comments</a></p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;I hope the &lt;a href=&quot;/post/how-the-kernel-manages-your-memory&quot;&gt;previous post&lt;/a&gt; explained virtual memory adequat
      
    
    </summary>
    
      <category term="internals" scheme="https://manybutfinite.com/category/internals/"/>
    
      <category term="culture" scheme="https://manybutfinite.com/category/culture/"/>
    
    
  </entry>
  
  <entry>
    <title>How The Kernel Manages Your Memory</title>
    <link href="https://manybutfinite.com/post/how-the-kernel-manages-your-memory/"/>
    <id>https://manybutfinite.com/post/how-the-kernel-manages-your-memory/</id>
    <updated>2009-02-04T13:35:49.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>After examining the <a href="/post/anatomy-of-a-program-in-memory">virtual address layout</a> of a process, we turn to the kernel and its mechanisms for managing user memory. Here is gonzo again:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/mm_struct.png" alt="Linux kernel mm_struct"></p> <p>Linux processes are implemented in the kernel as instances of <a href="http://lxr.linux.no/linux+v2.6.28.1/include/linux/sched.h#L1075" target="_blank" rel="noopener">task_struct</a>, the process descriptor. The <a href="http://lxr.linux.no/linux+v2.6.28.1/include/linux/sched.h#L1129" target="_blank" rel="noopener">mm</a> field in task_struct points to the <strong>memory descriptor</strong>, <a href="http://lxr.linux.no/linux+v2.6.28.1/include/linux/mm_types.h#L173" target="_blank" rel="noopener">mm_struct</a>, which is an executive summary of a program's memory. It stores the start and end of memory segments as shown above, the <a href="http://lxr.linux.no/linux+v2.6.28.1/include/linux/mm_types.h#L197" target="_blank" rel="noopener">number</a> of physical memory pages used by the process (<strong>rss</strong> stands for Resident Set Size), the <a href="http://lxr.linux.no/linux+v2.6.28.1/include/linux/mm_types.h#L206" target="_blank" rel="noopener">amount</a> of virtual address space used, and other tidbits. Within the memory descriptor we also find the two work horses for managing program memory: the set of <strong>virtual memory areas</strong> and the <strong>page tables</strong>. Gonzo's memory areas are shown below:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/memoryDescriptorAndMemoryAreas.png" alt="Kernel memory descriptor and memory areas"></p> <p>Each virtual memory area (VMA) is a contiguous range of virtual addresses; these areas never overlap. 
An instance of <a href="http://lxr.linux.no/linux+v2.6.28.1/include/linux/mm_types.h#L99" target="_blank" rel="noopener">vm_area_struct</a> fully describes a memory area, including its start and end addresses, <a href="http://lxr.linux.no/linux+v2.6.28/include/linux/mm.h#L76" target="_blank" rel="noopener">flags</a> to determine access rights and behaviors, and the <a href="http://lxr.linux.no/linux+v2.6.28.1/include/linux/mm_types.h#L150" target="_blank" rel="noopener">vm_file</a> field to specify which file is being mapped by the area, if any. A VMA that does not map a file is <strong>anonymous</strong>. Each memory segment above (<em>e.g.</em>, heap, stack) corresponds to a single VMA, with the exception of the memory mapping segment. This is not a requirement, though it is usual in x86 machines. VMAs do not care which segment they are in.</p> <p>A program's VMAs are stored in its memory descriptor both as a linked list in the <a href="http://lxr.linux.no/linux+v2.6.28.1/include/linux/mm_types.h#L174" target="_blank" rel="noopener">mmap</a> field, ordered by starting virtual address, and as a <a href="http://en.wikipedia.org/wiki/Red_black_tree" target="_blank" rel="noopener">red-black tree</a> rooted at the <a href="http://lxr.linux.no/linux+v2.6.28.1/include/linux/mm_types.h#L175" target="_blank" rel="noopener">mm_rb</a> field. The red-black tree allows the kernel to search quickly for the memory area covering a given virtual address. When you read file <tt>/proc/pid_of_process/maps</tt>, the kernel is simply going through the linked list of VMAs for the process and <a href="http://lxr.linux.no/linux+v2.6.28.1/fs/proc/task_mmu.c#L201" target="_blank" rel="noopener">printing each one</a>.</p> <p>In Windows, the <a href="http://www.nirsoft.net/kernel_struct/vista/EPROCESS.html" target="_blank" rel="noopener">EPROCESS</a> block is roughly a mix of task_struct and mm_struct. 
The Windows analog to a VMA is the Virtual Address Descriptor, or <a href="http://www.nirsoft.net/kernel_struct/vista/MMVAD.html" target="_blank" rel="noopener">VAD</a>; they are stored in an <a href="http://en.wikipedia.org/wiki/AVL_tree" target="_blank" rel="noopener">AVL tree</a>. You know what the funniest thing about Windows and Linux is? It's the little differences.</p> <p>The 4GB virtual address space is divided into <strong>pages</strong>. x86 processors in 32-bit mode support page sizes of 4KB, 2MB, and 4MB. Both Linux and Windows map the user portion of the virtual address space using 4KB pages. Bytes 0-4095 fall in page 0, bytes 4096-8191 fall in page 1, and so on. The size of a VMA <em>must be a multiple of page size</em>. Here's 3GB of user space in 4KB pages:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/pagedVirtualSpace.png" alt="4KB Pages Virtual User Space"></p> <p>The processor consults <strong>page tables</strong> to translate a virtual address into a physical memory address. Each process has its own set of page tables; whenever a process switch occurs, page tables for user space are switched as well. Linux stores a pointer to a process' page tables in the <a href="http://lxr.linux.no/linux+v2.6.28.1/include/linux/mm_types.h#L185" target="_blank" rel="noopener">pgd</a> field of the memory descriptor. To each virtual page there corresponds one <strong>page table entry</strong> (PTE) in the page tables, which in regular x86 paging is a simple 4-byte record shown below:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/x86PageTableEntry4KB.png" alt="x86 Page Table Entry (PTE) for 4KB page"></p> <p>Linux has functions to <a href="http://lxr.linux.no/linux+v2.6.28.1/arch/x86/include/asm/pgtable.h#L173" target="_blank" rel="noopener">read</a> and <a href="http://lxr.linux.no/linux+v2.6.28.1/arch/x86/include/asm/pgtable.h#L230" target="_blank" rel="noopener">set</a> each flag in a PTE. 
Bit P tells the processor whether the virtual page is <strong>present</strong> in physical memory. If clear (equal to 0), accessing the page triggers a page fault. Keep in mind that when this bit is zero, <strong>the kernel can do whatever it pleases</strong> with the remaining fields. The R/W flag stands for read/write; if clear, the page is read-only. Flag U/S stands for user/supervisor; if clear, then the page can only be accessed by the kernel. These flags are used to implement the read-only memory and protected kernel space we saw before.</p> <p>Bits D and A are for <strong>dirty</strong> and <strong>accessed</strong>. A dirty page has had a write, while an accessed page has had a write or read. Both flags are sticky: the processor only sets them, they must be cleared by the kernel. Finally, the PTE stores the starting physical address that corresponds to this page, aligned to 4KB. This naive-looking field is the source of some pain, for it limits addressable physical memory to <a href="http://www.google.com/search?hl=en&q=2^20+*+2^12+bytes+in+GB" target="_blank" rel="noopener">4 GB</a>. The other PTE fields are for another day, as is Physical Address Extension.</p> <p>A virtual page is the unit of memory protection because all of its bytes share the U/S and R/W flags. However, the same physical memory could be mapped by different pages, possibly with different protection flags. Notice that execute permissions are nowhere to be seen in the PTE. This is why classic x86 paging allows code on the stack to be executed, making it easier to exploit stack buffer overflows (it's still possible to exploit non-executable stacks using <a href="http://en.wikipedia.org/wiki/Return-to-libc_attack" target="_blank" rel="noopener">return-to-libc</a> and other techniques). This lack of a PTE no-execute flag illustrates a broader fact: permission flags in a VMA may or may not translate cleanly into hardware protection. 
The kernel does what it can, but ultimately the architecture limits what is possible.</p> <p>Virtual memory doesn't store anything, it simply <em>maps</em> a program's address space onto the underlying physical memory, which is accessed by the processor as a large block called the <strong>physical address space</strong>. While memory operations on the bus are <a href="/post/getting-physical-with-memory">somewhat involved</a>, we can ignore that here and assume that physical addresses range from zero to the top of available memory in one-byte increments. This physical address space is broken down by the kernel into <strong>page frames</strong>. The processor doesn't know or care about frames, yet they are crucial to the kernel because <strong>the page frame is the unit of physical memory management.</strong> Both Linux and Windows use 4KB page frames in 32-bit mode; here is an example of a machine with 2GB of RAM:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/physicalAddressSpace.png" alt="Physical Address Space"></p> <p>In Linux each page frame is tracked by a <a href="http://lxr.linux.no/linux+v2.6.28/include/linux/mm_types.h#L32" target="_blank" rel="noopener">descriptor</a> and <a href="http://lxr.linux.no/linux+v2.6.28/include/linux/page-flags.h#L14" target="_blank" rel="noopener">several flags</a>. Together these descriptors track the entire physical memory in the computer; the precise state of each page frame is always known. Physical memory is managed with the <a href="http://en.wikipedia.org/wiki/Buddy_memory_allocation" target="_blank" rel="noopener">buddy memory allocation</a> technique, hence a page frame is <strong>free</strong> if it's available for allocation via the buddy system. An allocated page frame might be <strong>anonymous</strong>, holding program data, or it might be in the <strong>page cache</strong>, holding data stored in a file or block device. 
There are other exotic page frame uses, but leave them alone for now. Windows has an analogous Page Frame Number (PFN) database to track physical memory.</p> <p>Let's put together virtual memory areas, page table entries and page frames to understand how this all works. Below is an example of a user heap:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/heapMapped.png" alt="Physical Address Space"></p> <p>Blue rectangles represent pages in the VMA range, while arrows represent page table entries mapping pages onto page frames. Some virtual pages lack arrows; this means their corresponding PTEs have the <strong>Present</strong> flag clear. This could be because the pages have never been touched or because their contents have been swapped out. In either case access to these pages will lead to page faults, even though they are within the VMA. It may seem strange for the VMA and the page tables to disagree, yet this often happens.</p> <p>A VMA is like a contract between your program and the kernel. You ask for something to be done (memory allocated, a file mapped, etc.), the kernel says "sure", and it creates or updates the appropriate VMA. But <em>it does not</em> actually honor the request right away, it waits until a page fault happens to do real work. The kernel is a lazy, deceitful sack of scum; this is the fundamental principle of virtual memory. It applies in most situations, some familiar and some surprising, but the rule is that VMAs record what has been <em>agreed upon</em>, while PTEs reflect what has <em>actually been done</em> by the lazy kernel. These two data structures together manage a program's memory; both play a role in resolving page faults, freeing memory, swapping memory out, and so on. 
Let's take the simple case of memory allocation:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/heapAllocation.png" alt="Example of demand paging and memory allocation"></p> <p>When the program asks for more memory via the <a href="http://www.kernel.org/doc/man-pages/online/pages/man2/brk.2.html" target="_blank" rel="noopener">brk()</a> system call, the kernel simply <a href="http://lxr.linux.no/linux+v2.6.28.1/mm/mmap.c#L2050" target="_blank" rel="noopener">updates</a> the heap VMA and calls it good. No page frames are actually allocated at this point and the new pages are not present in physical memory. Once the program tries to access the pages, the processor page faults and <a href="http://lxr.linux.no/linux+v2.6.28/arch/x86/mm/fault.c#L583" target="_blank" rel="noopener">do_page_fault()</a> is called. It <a href="http://lxr.linux.no/linux+v2.6.28/arch/x86/mm/fault.c#L692" target="_blank" rel="noopener">searches</a> for the VMA covering the faulted virtual address using <a href="http://lxr.linux.no/linux+v2.6.28/mm/mmap.c#L1466" target="_blank" rel="noopener">find_vma()</a>. If found, the permissions on the VMA are also checked against the attempted access (read or write). If there's no suitable VMA, no contract covers the attempted memory access and the process is punished by Segmentation Fault.</p> <p>When a VMA is <a href="http://lxr.linux.no/linux+v2.6.28/arch/x86/mm/fault.c#L711" target="_blank" rel="noopener">found</a> the kernel must <a href="http://lxr.linux.no/linux+v2.6.28/mm/memory.c#L2653" target="_blank" rel="noopener">handle</a> the fault by looking at the PTE contents and the type of VMA. In our case, the PTE shows the page is <a href="http://lxr.linux.no/linux+v2.6.28/mm/memory.c#L2674" target="_blank" rel="noopener">not present</a>. In fact, our PTE is completely blank (all zeros), which in Linux means the virtual page has never been mapped. 
Since this is an anonymous VMA, we have a purely RAM affair that must be handled by <a href="http://lxr.linux.no/linux+v2.6.28/mm/memory.c#L2681" target="_blank" rel="noopener">do_anonymous_page()</a>, which allocates a page frame and makes a PTE to map the faulted virtual page onto the freshly allocated frame.</p> <p>Things could have been different. The PTE for a swapped out page, for example, has 0 in the Present flag but is not blank. Instead, it stores the swap location holding the page contents, which must be read from disk and loaded into a page frame by <a href="http://lxr.linux.no/linux+v2.6.28/mm/memory.c#L2280" target="_blank" rel="noopener">do_swap_page()</a> in what is called a <a href="http://lxr.linux.no/linux+v2.6.28/mm/memory.c#L2316" target="_blank" rel="noopener">major fault</a>.</p> <p>This concludes the first half of our tour through the kernel's user memory management. In the next post, we'll throw files into the mix to build a complete picture of memory fundamentals, including consequences for performance.</p>
<p><a href="/comments/kernel-memory.html">124 Comments</a></p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;After examining the &lt;a href=&quot;/post/anatomy-of-a-program-in-memory&quot;&gt;virtual address layout&lt;/a&gt; of a process, we t
      
    
    </summary>
    
      <category term="software illustrated" scheme="https://manybutfinite.com/category/software-illustrated/"/>
    
      <category term="internals" scheme="https://manybutfinite.com/category/internals/"/>
    
      <category term="linux" scheme="https://manybutfinite.com/category/linux/"/>
    
    
  </entry>
  
  <entry>
    <title>Anatomy of a Program in Memory</title>
    <link href="https://manybutfinite.com/post/anatomy-of-a-program-in-memory/"/>
    <id>https://manybutfinite.com/post/anatomy-of-a-program-in-memory/</id>
    <updated>2009-01-27T14:34:13.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>Memory management is the heart of operating systems; it is crucial for both programming and system administration. In the next few posts I'll cover memory with an eye towards practical aspects, but without shying away from internals. While the concepts are generic, examples are mostly from Linux and Windows on 32-bit x86. This first post describes how programs are laid out in memory.</p> <p>Each process in a multi-tasking OS runs in its own memory sandbox. This sandbox is the <strong>virtual address space</strong>, which in 32-bit mode is <strong>always a 4GB block of memory addresses</strong>. These virtual addresses are mapped to physical memory by <strong>page tables</strong>, which are maintained by the operating system kernel and consulted by the processor. Each process has its own set of page tables, but there is a catch. Once virtual addresses are enabled, they apply to <em>all software</em> running in the machine, <em>including the kernel itself</em>. Thus a portion of the virtual address space must be reserved to the kernel:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/kernelUserMemorySplit.png" alt="Kernel/User Memory Split"></p> <p>This does <strong>not</strong> mean the kernel uses that much physical memory, only that it has that portion of address space available to map whatever physical memory it wishes. Kernel space is flagged in the page tables as exclusive to <a href="http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection" target="_blank" rel="noopener">privileged code</a> (ring 2 or lower), hence a page fault is triggered if user-mode programs try to touch it. In Linux, kernel space is constantly present and maps the same physical memory in all processes. Kernel code and data are always addressable, ready to handle interrupts or system calls at any time. 
By contrast, the mapping for the user-mode portion of the address space changes whenever a process switch happens:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/virtualMemoryInProcessSwitch.png" alt="Process Switch Effects on Virtual Memory"></p> <p>Blue regions represent virtual addresses that are mapped to physical memory, whereas white regions are unmapped. In the example above, Firefox has used far more of its virtual address space due to its legendary memory hunger. The distinct bands in the address space correspond to <strong>memory segments</strong> like the heap, stack, and so on. Keep in mind these segments are simply a range of memory addresses and <em>have nothing to do</em> with <a href="http://duartes.org/gustavo/blog/post/memory-translation-and-segmentation" target="_blank" rel="noopener">Intel-style segments</a>. Anyway, here is the standard segment layout in a Linux process:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/linuxFlexibleAddressSpaceLayout.png" alt="Flexible Process Address Space Layout In Linux"></p> <p>When computing was happy and safe and cuddly, the starting virtual addresses for the segments shown above were <strong>exactly the same</strong> for nearly every process in a machine. This made it easy to exploit security vulnerabilities remotely. An exploit often needs to reference absolute memory locations: an address on the stack, the address for a library function, etc. Remote attackers must choose this location blindly, counting on the fact that address spaces are all the same. When they are, people get pwned. Thus address space randomization has become popular. 
Linux randomizes the <a href="http://lxr.linux.no/linux+v2.6.28.1/fs/binfmt_elf.c#L542" target="_blank" rel="noopener">stack</a>,  <a href="http://lxr.linux.no/linux+v2.6.28.1/arch/x86/mm/mmap.c#L84" target="_blank" rel="noopener">memory mapping segment</a>, and <a href="http://lxr.linux.no/linux+v2.6.28.1/arch/x86/kernel/process_32.c#L729" target="_blank" rel="noopener">heap</a> by adding offsets to their starting addresses. Unfortunately the 32-bit address space is pretty tight, leaving little room for randomization and <a href="http://www.stanford.edu/~blp/papers/asrandom.pdf" target="_blank" rel="noopener">hampering its effectiveness</a>.</p> <p>The topmost segment in the process address space is the stack, which stores local variables and function parameters in most programming languages. Calling a method or function pushes a new <strong>stack frame</strong> onto the stack. The stack frame is destroyed when the function returns. This simple design, possible because the data obeys strict <a href="http://en.wikipedia.org/wiki/Lifo" target="_blank" rel="noopener">LIFO</a> order, means that no complex data structure is needed to track stack contents - a simple pointer to the top of the stack will do. Pushing and popping are thus very fast and deterministic. Also, the constant reuse of stack regions tends to keep active stack memory in the <a href="http://duartes.org/gustavo/blog/post/intel-cpu-caches" target="_blank" rel="noopener">cpu caches</a>, speeding up access. Each thread in a process gets its own stack.</p> <p>It is possible to exhaust the area mapping the stack by pushing more data than it can fit. This triggers a page fault that is handled in Linux by <a href="http://lxr.linux.no/linux+v2.6.28/mm/mmap.c#L1716" target="_blank" rel="noopener">expand_stack()</a>, which in turn calls <a href="http://lxr.linux.no/linux+v2.6.28/mm/mmap.c#L1544" target="_blank" rel="noopener">acct_stack_growth()</a> to check whether it's appropriate to grow the stack. 
If the stack size is below <tt>RLIMIT_STACK</tt> (usually 8MB), then normally the stack grows and the program continues merrily, unaware of what just happened. This is the normal mechanism whereby stack size adjusts to demand. However, if the maximum stack size has been reached, we have a <strong>stack overflow</strong> and the program receives a Segmentation Fault. While the mapped stack area expands to meet demand, it does not shrink back when the stack gets smaller. Like the federal budget, it only expands.</p> <p>Dynamic stack growth is the <a href="http://lxr.linux.no/linux+v2.6.28.1/arch/x86/mm/fault.c#L692" target="_blank" rel="noopener">only situation</a> in which access to an unmapped memory region, shown in white above, might be valid. Any other access to unmapped memory triggers a page fault that results in a Segmentation Fault. Some mapped areas are read-only, hence write attempts to these areas also lead to segfaults.</p> <p>Below the stack, we have the memory mapping segment. Here the kernel maps contents of files directly to memory. Any application can ask for such a mapping via the Linux <a href="http://www.kernel.org/doc/man-pages/online/pages/man2/mmap.2.html" target="_blank" rel="noopener">mmap()</a> system call (<a href="http://lxr.linux.no/linux+v2.6.28.1/arch/x86/kernel/sys_i386_32.c#L27" target="_blank" rel="noopener">implementation</a>) or <a href="http://msdn.microsoft.com/en-us/library/aa366537(VS.85).aspx" target="_blank" rel="noopener">CreateFileMapping()</a> / <a href="http://msdn.microsoft.com/en-us/library/aa366761(VS.85).aspx" target="_blank" rel="noopener">MapViewOfFile()</a> in Windows. Memory mapping is a convenient and high-performance way to do file I/O, so it is used for loading dynamic libraries. It is also possible to create an <strong>anonymous memory mapping</strong> that does not correspond to any files, being used instead for program data. 
In Linux, if you request a large block of memory via <a href="http://www.kernel.org/doc/man-pages/online/pages/man3/malloc.3.html" target="_blank" rel="noopener">malloc()</a>, the C library will create such an anonymous mapping instead of using heap memory. 'Large' means larger than <tt>MMAP_THRESHOLD</tt> bytes, 128 kB by default and adjustable via <a href="http://www.kernel.org/doc/man-pages/online/pages/man3/undocumented.3.html" target="_blank" rel="noopener">mallopt()</a>.</p> <p>Speaking of the heap, it comes next in our plunge into address space. The heap provides runtime memory allocation, like the stack, meant for data that must outlive the function doing the allocation, unlike the stack. Most languages provide heap management to programs. Satisfying memory requests is thus a joint affair between the language runtime and the kernel. In C, the interface to heap allocation is <a href="http://www.kernel.org/doc/man-pages/online/pages/man3/malloc.3.html" target="_blank" rel="noopener">malloc()</a> and friends, whereas in a garbage-collected language like C# the interface is the <tt>new</tt> keyword.</p> <p>If there is enough space in the heap to satisfy a memory request, it can be handled by the language runtime without kernel involvement. Otherwise the heap is enlarged via the <a href="http://www.kernel.org/doc/man-pages/online/pages/man2/brk.2.html" target="_blank" rel="noopener">brk()</a> system call (<a href="http://lxr.linux.no/linux+v2.6.28.1/mm/mmap.c#L248" target="_blank" rel="noopener">implementation</a>) to make room for the requested block. Heap management is <a href="http://g.oswego.edu/dl/html/malloc.html" target="_blank" rel="noopener">complex</a>, requiring sophisticated algorithms that strive for speed and efficient memory usage in the face of our programs' chaotic allocation patterns. The time needed to service a heap request can vary substantially. 
Real-time systems have <a href="http://rtportal.upv.es/rtmalloc/" target="_blank" rel="noopener">special-purpose allocators</a> to deal with this problem. Heaps also become <em>fragmented</em>, shown below:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/fragmentedHeap.png" alt="Fragmented Heap"></p> <p>Finally, we get to the lowest segments of memory: BSS, data, and program text. Both BSS and data store contents for static (global) variables in C. The difference is that BSS stores the contents of <em>uninitialized</em> static variables, whose values are not set by the programmer in source code. The BSS memory area is anonymous: it does not map any file. If you say <tt>static int cntActiveUsers</tt>, the contents of <tt>cntActiveUsers</tt> live in the BSS.</p> <p>The data segment, on the other hand, holds the contents for static variables initialized in source code. This memory area <strong>is not anonymous</strong>. It maps the part of the program's binary image that contains the initial static values given in source code. So if you say <tt>static int cntWorkerBees = 10</tt>, the contents of cntWorkerBees live in the data segment and start out as 10. Even though the data segment maps a file, it is a <strong>private memory mapping</strong>, which means that updates to memory are not reflected in the underlying file. This must be the case, otherwise assignments to global variables would change your on-disk binary image. Inconceivable!</p> <p>The data example in the diagram is trickier because it uses a pointer. In that case, the <em>contents</em> of pointer <tt>gonzo</tt> - a 4-byte memory address - live in the data segment. The actual string it points to does not, however. The string lives in the <strong>text</strong> segment, which is read-only and stores all of your code in addition to tidbits like string literals. The text segment also maps your binary file in memory, but writes to this area earn your program a Segmentation Fault. 
This helps prevent pointer bugs, though not as effectively as avoiding C in the first place. Here's a diagram showing these segments and our example variables:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/mappingBinaryImage.png" alt="ELF Binary Image Mapped Into Memory"></p> <p>You can examine the memory areas in a Linux process by reading the file <tt>/proc/pid_of_process/maps</tt>. Keep in mind that a segment may contain many areas. For example, each memory mapped file normally has its own area in the mmap segment, and dynamic libraries have extra areas similar to BSS and data. The next post will clarify what 'area' really means. Also, sometimes people say "data segment" meaning all of data + bss + heap.</p> <p>You can examine binary images using the <a href="http://manpages.ubuntu.com/manpages/intrepid/en/man1/nm.1.html" target="_blank" rel="noopener">nm</a> and <a href="http://manpages.ubuntu.com/manpages/intrepid/en/man1/objdump.1.html" target="_blank" rel="noopener">objdump</a> commands to display symbols, their addresses, segments, and so on. Finally, the virtual address layout described above is the "flexible" layout in Linux, which has been the default for a few years. It assumes that we have a value for <tt>RLIMIT_STACK</tt>. When that's not the case, Linux reverts back to the "classic" layout shown below:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/linuxClassicAddressSpaceLayout.png" alt="Classic Process Address Space Layout In Linux"></p> <p>That's it for virtual address space layout. The next post discusses how the kernel keeps track of these memory areas. Coming up we'll look at memory mapping, how file reading and writing ties into all this and what memory usage figures mean.</p>
<p><a href="/comments/anatomy.html">189 Comments</a></p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;Memory management is the heart of operating systems; it is crucial for both programming and system administratio
      
    
    </summary>
    
      <category term="software illustrated" scheme="https://manybutfinite.com/category/software-illustrated/"/>
    
      <category term="internals" scheme="https://manybutfinite.com/category/internals/"/>
    
      <category term="linux" scheme="https://manybutfinite.com/category/linux/"/>
    
    
  </entry>
  
  <entry>
    <title>Getting Physical With Memory</title>
    <link href="https://manybutfinite.com/post/getting-physical-with-memory/"/>
    <id>https://manybutfinite.com/post/getting-physical-with-memory/</id>
    <updated>2009-01-16T01:15:24.000Z</updated>
    
<content type="html"><![CDATA[<html><head></head><body><p>When trying to understand complex systems, you can often learn a lot by stripping away abstractions and looking at their lowest levels. In that spirit we take a look at memory and I/O ports at their simplest and most fundamental level: the interface between the processor and bus. These details underlie higher level topics like thread synchronization and the need for the Core i7. Also, since I'm a programmer, I ignore things EE people care about. Here's our friend the Core 2 again:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/physicalMemoryAccess.png" alt="Physical Memory Access"></p> <p>A Core 2 processor has 775 pins, about half of which only provide power and carry no data. Once you group the pins by functionality, the physical interface to the processor is surprisingly simple. The diagram shows the key pins involved in a memory or I/O port operation: address lines, data pins, and request pins. These operations take place in the context of a <strong>transaction</strong> on the front side bus. FSB transactions go through 5 phases: arbitration, request, snoop, response, and data. Throughout these phases, different roles are played by the components on the FSB, which are called <strong>agents</strong>. Normally the agents are all the processors plus the northbridge.</p> <p>We only look at the <strong>request phase</strong> in this post, in which 2 packets are output by the <strong>request agent</strong>, who is usually a processor. Here are the juiciest bits of the first packet, output by the address and request pins:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/fsbRequestPhasePacketA.png" alt="FSB Request Phase, Packet A"></p> <p>The address lines output the starting physical memory address for the transaction. We have 33 bits but they are interpreted as bits 35-3 of an address in which bits 2-0 are zero.
Hence we have a 36-bit address, aligned to 8 bytes, for a total of <a href="http://www.google.com/search?hl=en&q=2^36+bytes" target="_blank" rel="noopener">64GB</a> addressable physical memory. This has been the case since the Pentium Pro. The request pins specify what type of transaction is being initiated; in I/O requests the address pins specify an I/O port rather than a memory address. After the first packet is output, the same pins transmit a second packet in the subsequent bus clock cycle:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/fsbRequestPhasePacketB.png" alt="FSB Request Phase, Packet B"></p> <p>The attribute signals are interesting: they reflect the 5 types of memory caching behavior available in Intel processors. By putting this information on the FSB, the request agent lets other processors know how this transaction affects their caches, and how the memory controller (northbridge) should behave. The processor determines the type of a given memory region mainly by looking at page tables, which are maintained by the kernel.</p> <p>Typically kernels treat <strong>all RAM memory as write-back</strong>, which yields the best performance. In write-back mode the unit of memory access is the <a href="/post/intel-cpu-caches">cache line</a>, 64 bytes in the Core 2. If a program reads a single byte in memory, the processor loads the whole cache line that contains that byte into the L2 and L1 caches. When a program <em>writes</em> to memory, the processor only modifies the line in the cache, but does <em>not</em> update main memory. Later, when it becomes necessary to post the modified line to the bus, the whole cache line is written at once. So most requests have 11 in their length field, for 64 bytes. 
Here's a read example in which the data is not in the caches:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/memoryRead.png" alt="Memory Read Sequence Diagram"></p> <p>Some of the physical memory range in an Intel computer is <a href="/post/motherboard-chipsets-memory-map">mapped to devices</a> like hard drives and network cards instead of actual RAM memory. This allows drivers to communicate with their devices by writing to and reading from memory. The kernel marks these memory regions as <strong>uncacheable</strong> in the page tables. Accesses to uncacheable memory regions are reproduced on the bus exactly as requested by a program or driver. Hence it's possible to read or write single bytes, words, and so on. This is done via the byte enable mask in packet B above.</p> <p>The primitives discussed here have many implications. For example:</p> <ol> <li>Performance-sensitive applications should try to pack data that is accessed together into the same cache line. Once the cache line is loaded, further reads are <a href="/post/what-your-computer-does-while-you-wait">much faster</a> and extra RAM accesses are avoided.</li> <li>Any memory access that falls within a single cache line is guaranteed to be atomic (assuming write-back memory). Such an access is serviced by the processor's L1 cache and the data is read or written all at once; it cannot be affected halfway by other processors or threads. In particular, 32-bit and 64-bit operations that don't cross cache line boundaries are atomic.</li> <li>The front side bus is shared by all agents, who must arbitrate for bus ownership before they can start a transaction. Moreover, all agents must listen to all transactions in order to maintain cache coherence. Thus bus contention becomes a severe problem as more cores and processors are added to Intel computers.
The Core i7 solves this by having processors attached directly to memory and communicating in a point-to-point rather than broadcast fashion.</li> </ol> <p>These are the highlights of physical memory requests; the bus will surface again later in connection with locking, multi-threading, and cache coherence. The first time I saw FSB packet descriptions I had a huge "ahhh!" moment so I hope someone out there gets the same benefit. In the next post we'll go back up the abstraction ladder to take a thorough look at virtual memory.</p>
<p><a href="/comments/physical-memory.html">22 Comments</a></p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;When trying to understand complex systems, you can often learn a lot by stripping away abstractions and looking 
      
    
    </summary>
    
      <category term="software illustrated" scheme="https://manybutfinite.com/category/software-illustrated/"/>
    
      <category term="internals" scheme="https://manybutfinite.com/category/internals/"/>
    
    
  </entry>
  
  <entry>
    <title>Cache: a place for concealment and safekeeping</title>
    <link href="https://manybutfinite.com/post/intel-cpu-caches/"/>
    <id>https://manybutfinite.com/post/intel-cpu-caches/</id>
    <updated>2009-01-12T13:11:08.000Z</updated>
    
    <content type="html"><![CDATA[<html><head></head><body><p>This post shows briefly how CPU caches are organized in modern Intel processors. Cache discussions often lack concrete examples, obfuscating the simple concepts involved. Or maybe my pretty little head is slow. At any rate, here's half the story on how a Core 2 L1 cache is accessed:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/L1CacheExample.png" alt="Selecting an L1 cache set (row)"></p> <p>The unit of data in the cache is the <strong>line</strong>, which is just a contiguous chunk of bytes in memory. This cache uses 64-byte lines. The lines are stored in cache banks or <strong>ways</strong>, and each way has a dedicated <strong>directory</strong> to store its housekeeping information. You can imagine each way and its directory as columns in a spreadsheet, in which case the rows are the <em>sets</em>. Then each cell in the way column contains a cache line, tracked by the corresponding cell in the directory. This particular cache has 64 sets and 8 ways, hence 512 cells to store cache lines, which adds up to 32KB of space.</p> <p>In this cache's view of the world, physical memory is divided into 4KB physical pages. Each page has <a href="http://www.google.com/search?hl=en&q=(4KB+/+64+bytes)" target="_blank" rel="noopener">4KB / 64 bytes</a> == 64 cache lines in it. When you look at a 4KB page, bytes 0 through 63 within that page are in the first cache line, bytes 64-127 in the second cache line, and so on. The pattern repeats for each page, so the 3rd line in page 0 is different than the 3rd line in page 1.</p> <p>In a <strong>fully associative cache</strong> any line in memory can be stored in any of the cache cells. This makes storage flexible, but it becomes expensive to search for cells when accessing them. 
Since the L1 and L2 caches operate under tight constraints of power consumption, physical space, and speed, a fully associative cache is not a good trade off in most scenarios.</p> <p>Instead, this cache is <strong>set associative</strong>, which means that a given line in memory can only be stored in one specific set (or row) shown above. So the first line of <em>any physical page</em> (bytes 0-63 within a page) <strong>must</strong> be stored in row 0, the second line in row 1, etc. Each row has 8 cells available to store the cache lines it is associated with, making this an 8-way associative set. When looking at a memory address, bits 11-6 determine the line number within the 4KB page and therefore the set to be used. For example, physical address 0x800010a0 has <a href="http://www.google.com/search?q=0x800010a0 in binary" target="_blank" rel="noopener">000010</a> in those bits so it must be stored in set 2.</p> <p>But we still have the problem of finding <em>which</em> cell in the row holds the data, if any. That's where the directory comes in. Each cached line is <em>tagged</em> by its corresponding directory cell; the tag is simply the number for the page where the line came from. The processor can address 64GB of physical RAM, so there are <a href="http://www.google.com/search?hl=en&q=lg(64GB+/+4KB)" target="_blank" rel="noopener">64GB / 4KB</a> == 2<sup>24</sup> of these pages and thus we need 24 bits for our tag. Our example physical address 0x800010a0 corresponds to page number <a href="http://www.google.com/search?hl=en&q=0x800010a0+Bytes+/+4KB" target="_blank" rel="noopener">524,289</a>. Here's the second half of the story:</p> <p align="center"><img src="http://static.duartes.org/img/blogPosts/selectingCacheLine.png" alt="Finding cache line by matching tags"></p> <p>Since we only need to look in one set of 8 ways, the tag matching is very fast; in fact, electrically all tags are compared simultaneously, which I tried to show with the arrows. 
If there's a valid cache line with a matching tag, we have a cache hit. Otherwise, the request is forwarded to the L2 cache, and failing that to main system memory. Intel builds large L2 caches by playing with the size and quantity of the ways, but the design is the same. For example, you could turn this into a 64KB cache by adding 8 more ways. Then increase the number of sets to 4096 and each way can store <a href="http://www.google.com/search?hl=en&q=64+Bytes+*+4096" target="_blank" rel="noopener">256KB</a>. These two modifications would deliver a 4MB L2 cache. In this scenario, you'd need 18 bits for the tags and 12 for the set index; the physical page size used by the cache is equal to its way size.</p> <p>If a set fills up, then a cache line must be evicted before another one can be stored. To avoid this, performance-sensitive programs try to organize their data so that memory accesses are evenly spread among cache lines. For example, suppose a program has an array of 512-byte objects such that some objects are 4KB apart in memory. Fields in these objects fall into the same lines and compete for the same cache set. If the program frequently accesses a given field (<em>e.g.</em>, the <a href="http://en.wikipedia.org/wiki/Vtable" target="_blank" rel="noopener">vtable</a> by calling a virtual method), the set will likely fill up and the cache will start thrashing as lines are repeatedly evicted and later reloaded. Our example L1 cache can only hold the vtables for 8 of these objects due to set size. This is the cost of the set associativity trade-off: we can get cache misses due to set conflicts even when overall cache usage is not heavy.
However, due to the <a href="/post/what-your-computer-does-while-you-wait">relative speeds</a> in a computer, most apps don't need to worry about this anyway.</p> <p>A memory access usually starts with a linear (virtual) address, so the L1 cache relies on the paging unit to obtain the physical page address used for the cache tags. By contrast, the set index comes from the least significant bits of the linear address and is used without translation (bits 11-6 in our example). Hence the L1 cache is <strong>physically tagged</strong> but <strong>virtually indexed</strong>, helping the CPU parallelize lookups: the set can be selected while the paging unit translates the address. Because the L1 way is never bigger than an MMU page, a given physical memory location is guaranteed to be associated with the same set even with virtual indexing. L2 caches, on the other hand, must be physically tagged and physically indexed because their way size can be bigger than MMU pages. But then again, by the time a request gets to the L2 cache the physical address has already been resolved by the L1 cache, so it works out nicely.</p> <p>Finally, a directory cell also stores the <em>state</em> of its corresponding cached line. A line in the L1 code cache is either Invalid or Shared (which means valid, really). In the L1 data cache and the L2 cache, a line can be in any of the 4 MESI states: Modified, Exclusive, Shared, or Invalid. Intel caches are <strong>inclusive</strong>: the contents of the L1 cache are duplicated in the L2 cache. These states will play a part in later posts about threading, locking, and that kind of stuff. Next time we'll look at the front side bus and how memory access <em>really</em> works. This is going to be memory week.</p> 
<p><strong>Update</strong>: <a href="http://www.findinglisp.com/blog/" target="_blank" rel="noopener">Dave</a> brought up direct-mapped caches in a <a href="http://duartes.org/gustavo/blog/post/intel-cpu-caches#comment-12687" target="_blank" rel="noopener">comment below</a>. They’re basically a special case of set-associative caches that have only one way. In the trade-off spectrum, they’re the opposite of fully associative caches: blazing fast access, lots of conflict misses.</p>
<p><a href="/comments/cache.html">26 Comments</a></p>
</body></html>]]></content>
    
    <summary type="html">
    
      
      
        &lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;This post shows briefly how CPU caches are organized in modern Intel processors. Cache discussions often lack co
      
    
    </summary>
    
      <category term="software illustrated" scheme="https://manybutfinite.com/category/software-illustrated/"/>
    
      <category term="internals" scheme="https://manybutfinite.com/category/internals/"/>
    
    
  </entry>
  
</feed>
