Math, Eloquently: Patient, detailed explanations of basic mathematical ideas
What Is the "Margin of Error"?
Posted 2008-07-31 by Polymath
<div xmlns="http://www.w3.org/1999/xhtml"><p>Newspaper and television reporting on election polls often mentions the "margin of error" of a specific poll. In fact, if the poll doesn't state a margin of error, it's not considered very reliable, and it probably won't be reported. Statements like the following are common: "In this recent poll, candidate A is preferred by 44% of those polled, while candidate B is preferred by 42%. But since the margin of error was ±3%, this is a statistical dead heat." Most mathematicians would consider that poor reporting, and most people don't really know what that "margin of error" means. So let me explain.</p><p>I will not go into deep statistical detail here, but I hope to give you enough insight to interpret polls a bit better. So let's consider an example situation: you live in the city of Electopolis, which (conveniently) has a voting population of exactly 1,000,000 people. If you could read everyone's mind about next month's election, you'd know that 490,000 people (or 49%) plan to vote for candidate A, 450,000 people (45%) for candidate B, and 60,000 (6%) haven't made up their minds yet.</p><p>But, of course, you can't read everyone's mind. Yet you'd like to have some idea of who might win this very important election. So you decide to call a bunch of people at home and ask them who they're planning on voting for (that is, you're going to take a poll). After calling 10 people, 6 tell you they're going to vote for candidate A, 3 for B, and 1 hasn't decided yet. Wow! Candidate A has a 30% lead in your poll! She will clearly win, right? Are you pretty confident about that prediction? If I also called 10 random people, are you pretty sure I'd get the same result?</p><p>When pressed, you're not so sure, are you? It does seem possible that the 10 people you happened to call might not accurately represent all 1,000,000 people in your city. 
In fact, while it seems unlikely, it does seem at least <em>a little bit possible</em> that if you did your little calling-10-people experiment often enough, then in some of those experiments, the results would be completely unrepresentative of the population. For example, it turns out that if you did the experiment 100,000 times, then about 34 of those times, all 10 people would happen to be ones planning to vote for candidate B! (The chance of that is 0.45 multiplied by itself 10 times, or about 0.00034.) That's a pretty small chance, of course. But how do you know this isn't one of those times, and your results are just completely useless?</p><p>Well, for most people, the solution is obvious: call more people. I think almost everyone has the right intuition here; that is, if you call 100 (or 1000) people, the chances are much better that your sample results will closely match the actual percentages from the population of the city. In fact, if you do the calculations, you'll find that it takes fewer calls than you might think to be reasonably sure your poll percentage matches your population percentage somewhat closely. 1000 is plenty, it turns out, and if you ask another 1000, it won't improve this accuracy much.</p><p>So here's the first (very rough) answer to the question posed in the title: The margin of error measures the effect of various sample sizes on the likelihood of a match between your poll results and the true percentages in the population. This is a <em>crucial</em> point: there are many, many factors that influence the error in a poll, and I'll discuss some of these below, but the reported "margin of error" measures <span style="text-decoration: underline;"><strong><em>only</em></strong></span> the effects of sample size. In other words, if Polly Pollster's margin of error is smaller than Quincy Questioner's, that can only mean that Polly asked more people than Quincy. 
This emphatically <em>does not</em> mean that Quincy's poll is less accurate, since Quincy might have better polling techniques that reduce other possible sources of error. The only way to know is to look at their past poll results.</p><p>Okay, so the margin of error measures the effect of sample size on the accuracy of a poll. Let's make that more precise; what does the number reported as that margin of error mean, exactly? Well, let's continue our example. It turns out (you can look it up <a href="http://www.agbnielsen.co.nz/moe_table.aspx">here</a>) that if you have a sample of 1000 people, your margin of error is going to be very close to 3%. Let's use the poll results from the first paragraph (44% for A, 42% for B, 14% undecided, margin of error is ±3%) in our Electopolis example. We now know that the 3% comes from having interviewed about 1000 people. But what does the 3% mean? Since the difference between the candidates is only 2 percentage points in the poll, does that mean we have to remain clueless about who's ahead? The answer is no. It is still more likely that candidate A is winning.</p><p>Here's why. Choosing 1000 people randomly <em>might</em> lead you astray if you happen to randomly pick a non-representative sample whose percentages don't match up with the population's percentages. But surely that's not the <em>most likely</em> result. The most likely result is that the sample closely matches the population. If you did the calling-1000-people experiment over and over and over, the results would cluster around the true percentages, except that every now and then, chance dictates that one or more of the numbers will be off: usually only a little, occasionally a lot. How far off? And how often would that happen? Aha! <strong>That</strong> is what the margin of error measures.</p><p>Before continuing, you have to know that in all of these polls, there's one number that they haven't told you. That number is usually (often enough that you can assume it) 95%. 
It's called the confidence level. The idea of a confidence level should make some sense to you since we've been talking about how confident we are that our poll numbers reflect the actual numbers (I should really write "actual" numbers using quotes here; more on that in the first note on technical details below). The 3% margin of error means that the number reported in the poll will fall within 3 percentage points of the true percentage (answering 'how far off?') 95% of the time (answering 'how often?').</p><p>So that brings us to a much more refined answer to the question posed by the title. In our example, 49% of the population plans to vote for candidate A. The ±3% margin of error means that if you do the calling-1000-people experiment 100 times, then in about 95 of them, we'd expect a score between 46% and 52% (that's 3 percentage points on either side of 49%) for candidate A. And on those occasions when we do happen to get a score outside of that range, probability dictates that most of those scores will not fall very far outside that range. Every once in a while, though, crazy things will happen, and we will indeed get a score quite far away from the true percentage.</p><p>Let's look at the poll results again. The 44% score for candidate A is apparently a little bit of a fluke: it falls outside (though not too far outside) the ±3% range. And by chance, candidate B's 42% score is also low, though (barely) within the margin of error. But when you hear the results of the poll, your reaction shouldn't be "the margin of error makes this too close to call". Rather, your reaction should be "This does seem to be a pretty close election, but the most likely state of the world is that candidate A is leading by a little bit".</p><p><em><strong>Technical note #1: the "actual" numbers?</strong></em></p><p>True statisticians might have cringed when I discussed comparing the poll results to the actual numbers. 
In their view, there's simply no way to know the actual numbers without asking every single person. Even the election result won't reveal the actual numbers at the time of the poll, since the undecided people will decide, and since some people will change their minds. So to a statistician, the best available estimates of the actual percentages <em>are</em> the percentages from the poll. Thinking about this in the way described above will help your understanding of poll results immensely, but <em>technically</em>, 44% for candidate A with a ±3% margin of error (at a 95% confidence level) really means that the pollster expects that 95% of identical polls (each using a different but equally sized random sample) would put candidate A's score within 3 percentage points of 44%, not 49% (because there's no way to discover that 49% number). To the pollster, then, the margin of error is a measure, derived from the sample size, of how closely they would expect the results of similar polling experiments to match theirs.</p><p><em><strong>Technical note #2: small numbers.</strong></em></p><p>In our example, 6% of the population of Electopolis was undecided. If you think carefully about the chances of getting that 6% wrong in a 1000-person poll, you'll notice that it's harder to miss the 6% mark by a lot (say, 10 percentage points too high) than it would be to miss a 49% mark by 10 percentage points. There simply aren't that many undecided people to disproportionately choose in your random choice. So at smaller reported percentages in the poll, the margin of error decreases. In <a target="_blank" href="http://www.agbnielsen.co.nz/moe_table.aspx">this table</a>, for instance, look at the right-most column, which represents margins of error for 1000-person samples. The 3% we've been working with shows up at the bottom, for a reported 50% score in the poll. But in the middle of that column, we can see that if the poll reports only 15%, the margin of error is just over 2%. 
So the 14% we found in our example poll for the undecided voters is actually extremely unlikely, given the actual 6% value. 95% of the time, that score should be between about 4% and 8%, roughly 2 percentage points on either side of 6%, because the margin of error for percentages this small is at most about ±2%, not ±3%. The reasoning is that it was very unlikely that A's score would be 5 percentage points off, somewhat unlikely that B's score would be 3 percentage points off, and even more unlikely that they would <em>both</em> turn out low (if you missed some A-voters in your random sampling, it should be more likely that you found B-voters instead of undecided voters, simply because there are more B-voters).</p><p><em><strong>Technical note #3: other sources of error.</strong></em></p><p>This should probably not appear in a technical note because it's so important. But putting it in the text above would have distracted from the main point. We have seen that the margin of error measures the amount of possible discrepancy, due to sample size, between a poll's resulting figures and the presumed actual population figures. Crucially, then, the margin of error does <em>not</em> take other possible polling errors into account. There are many such possible errors, all discussed elsewhere on the web more extensively than I can here. But any list of reasons that it's so hard to conduct an accurate poll includes:</p><blockquote><p>1. How can you be sure your sample is really random?</p></blockquote><p>Do you call people during the day? Then you'll only reach people who don't have an out-of-the-house day job, such as night-shift workers, stay-at-home parents, the unemployed, etc. It seems highly plausible that this group of people would have different voting patterns than people who do have out-of-the-house day jobs. Do you even use the telephone at all? 
The only listed numbers are land lines, not the cell phones that some people (especially young people who move a lot) use as their only phones, and you therefore risk missing a large demographic. This is a huge topic worthy of dissertations.</p><blockquote><p>2. Is your question phrased properly? Or is it leading?</p></blockquote><p>"Do you support redrawing the City's boundaries beyond their historical ones?" could sound a lot different from "Do you support increasing the City's tax base by annexing Edgeville?"</p><blockquote><p>3. Are people telling the truth?</p></blockquote><blockquote><p>4. Are the people who refuse to answer the survey question likely to hold similar positions?</p></blockquote><p>Accurate polling is tough, and pollsters themselves will be the first to tell you that. All sorts of factors can skew a poll away from divulging the true sentiment of the population, and only one of those factors is the sample size. However, that is the only factor whose effect is completely measurable, and that measure is exactly what we mean by the "margin of error".</p></div>
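<p>For readers who like to check numbers, here is a minimal Python sketch of two figures quoted in this post. It assumes each call independently reaches a random voter, and it uses the common normal-approximation formula for a margin of error at the 95% confidence level (z = 1.96). The post itself does not state a formula, so treat that formula, and the function names <code>prob_all_same</code> and <code>margin_of_error</code>, as my assumptions rather than as the method behind the linked table.</p>

```python
import math

# Share of Electopolis voters planning to vote for candidate B (from the post).
SHARE_B = 0.45


def prob_all_same(share: float, sample_size: int) -> float:
    """Chance that every one of sample_size independent calls
    reaches a voter from a group making up `share` of the population."""
    return share ** sample_size


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a reported proportion p
    and a sample of n people; z = 1.96 matches a 95% confidence level."""
    return z * math.sqrt(p * (1 - p) / n)


# Out of 100,000 ten-person polls, how many reach only B voters?
print(round(prob_all_same(SHARE_B, 10) * 100_000))  # 34

print(f"{margin_of_error(0.50, 1000):.1%}")  # 3.1%
print(f"{margin_of_error(0.15, 1000):.1%}")  # 2.2%
print(f"{margin_of_error(0.50, 2000):.1%}")  # 2.2%
```

<p>Under these assumptions, doubling the sample from 1000 to 2000 only trims the margin from about 3.1% to about 2.2%, because the sample size sits under a square root.</p>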
The Logic of Slope (Part 1)
Posted 2008-07-24 by Polymath
<div xmlns="http://www.w3.org/1999/xhtml"><p>If you've learned about graphing lines, you've almost surely learned about the concept of slope. This concept becomes more and more important as math gets more and more advanced: one of the basic forms of calculus (called differential calculus) has two ideas at its core, the concept of slope and the concept of limit (which you may not have studied yet). Yet many students learn a formula for slope without really understanding why that formula makes sense. This essay will try to explain why that formula absolutely must look the way it does if it is going to capture your basic intuitions about slope.</p><p>Let's imagine that you're riding a bike (a magic bike that can't crash!) on a hill. To keep things consistent, let's say you're riding from left to right like the cyclists below. When you look at the pictures, think about what your intuition is telling you about which hill is the steepest.</p><p><a style="display: inline;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e553cb3e4e8834-pi"><img class="at-xid-6a00d8341bfda053ef00e553cb3e4e8834 " alt="Bikers" title="Bikers" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e553cb3e4e8834-800wi" border="0"></a>
</p><p>Most people, I think, have no trouble seeing that cyclist C is riding on the steepest hill. You can even say that C's hill is steeper than B's hill, even though one is uphill and one is downhill (going from left to right, remember). Comparing the steepness of A's hill with B's, though, is a little trickier. They appear to have about the same steepness in opposite directions.</p><p>The concept of 'slope' was created to put a measurement to this intuitive concept. By 'measurement' I mean that we're trying to come up with a number that quantifies the intuition you already have. This number should increase as the steepness gets bigger (like measurements of distance, weight, temperature, etc. increase when the length, mass, hotness, etc. get bigger), and if we can manage it, the ideal formula would also reflect the idea that hills A and B have the same slope in opposite directions.</p><p>So to come up with a way to quantify this idea, let's look at some other examples of steepness—this time using stairs instead of hills. Compare the steepness of the staircases below. Note carefully that among staircases A, D, and E, the <em>rise</em> of each successive step of the staircases is exactly the same. Only the <em>run</em> of each step (the space the penguin has to stand on) changes. This is clearly one way to change the slope of a staircase; even though the penguin has to step up the same height in staircases A and E, we want to be able to say that staircase E is nonetheless steeper.</p><p><a style="display: inline;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e553cb54128834-pi"><img class="at-xid-6a00d8341bfda053ef00e553cb54128834 image-full " alt="Penguins" title="Penguins" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e553cb54128834-800wi" border="0"></a>
</p><p>The other way to increase the steepness of the staircase, of course, is to increase the <em>rise</em> of each successive step while leaving the <em>run</em> alone. This is demonstrated by staircases A, B, and C.</p><p>These two notions give us our first hint about how to construct a calculation to measure slope (of a staircase, at least). Namely, the two things that determine the slope are the rise from each step to the next and the run (or depth) of any individual step. Furthermore the slope has to increase either when the rise increases (e.g. staircase B to staircase A to staircase C) or when the run decreases (e.g. staircase D to staircase A to staircase E).</p><p>But we can say more. Let's look more closely at the stairs above. Staircase A's rise and run are equal. To create staircases D and E, I doubled and halved the run (respectively). To create staircases B and C, I instead halved and doubled A's rise (respectively). And when I rearrange the staircases like I did below, you'll see that doubling the run and halving the rise give the same resulting slope. And, of course, vice-versa.</p><p><a style="display: inline;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e553afab228833-pi"><img class="at-xid-6a00d8341bfda053ef00e553afab228833 " alt="Penguins aligned" title="Penguins aligned" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e553afab228833-800wi" border="0"></a>
</p><p>Since most of you have come here for further explanation of slopes, you probably already know the solution to the slope calculation problem. The trick is to divide those numbers. Notice how well division works here. If we put the rise in the numerator (top) of the division to get the familiar (to most) "rise over run", everything we want to be true about the number really is true! If the rise increases by doubling, but the run stays the same (from staircase A to C), then the fraction increases—in fact, it doubles. Likewise, if the run (the denominator) is cut in half instead (staircase A to E), the fraction also doubles. Just what we want. And of course, decreasing the numerator or increasing the denominator decreases the fraction—again what we want. Note further, that we decided to put the rise on top in order for the fraction to increase and decrease in exactly this manner. If we had put the run on top instead, the slope numbers would increase when we'd want them to decrease. Be sure you understand that decision before you continue reading.</p><p>But "rise over run" is an informal idea. I almost hate for my students to remember it that way, because it doesn't guarantee that they'll remember how to use it to find an actual number when they need to calculate a slope. So let's go further, and see how to calculate using that idea. We will need, of course, some points on a graph to do this.</p><p><a style="display: inline;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e553afcdd38833-pi"><img class="at-xid-6a00d8341bfda053ef00e553afcdd38833 " alt="Graph" title="Graph" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e553afcdd38833-800wi" border="0"></a>
</p><p>First, let's calculate the slope between points Q and R. We need to know the lengths of the <em>rise</em> and the <em>run</em> from Q to R. That is, this calculation requires that we know the change in the vertical (<em>y</em>) coordinate and the change in the horizontal (<em>x</em>) coordinate if you walk from Q to R. That calculation is easy, right? The <em>y</em>-coordinate changes 3 (going from 2 to 5), and the <em>x</em>-coordinate changes 4 (going from 5 to 9). So the fraction we decided on above gives a slope of 3/4.</p><p>So, what arithmetic calculation did we do to get those numbers? Clearly, we subtracted "5 minus 2" to get 3 and "9 minus 5" to get 4. In fact, subtraction is just about always what the word "change" means in math. If the price of gas changes from $3.50 to $3.90 per gallon, we subtract "final price minus initial price" to get a change of +40¢. If it changes from $3.90 to $3.50 per gallon, we still subtract "final price minus initial price" to get a change of –40¢. So it should come as no surprise that we should be subtracting "final <em>y</em>-coordinate minus initial <em>y</em>-coordinate" to get the change in <em>y</em>-coordinate, which is the rise.</p><p>If you can remember that "change" requires a subtraction, then a more reliable way to memorize a word formula is:</p><p><a style="display: inline;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e553d2333b8834-pi"><img class="at-xid-6a00d8341bfda053ef00e553d2333b8834 " alt="Wordformula" title="Wordformula" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e553d2333b8834-800wi" border="0"></a>
</p><p>And once you automatically associate "change" and "subtraction", you can memorize the full symbolic formula:</p><p><a style="display: inline;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e553b60f9f8833-pi"><img class="at-xid-6a00d8341bfda053ef00e553b60f9f8833 " alt="Formula" title="Formula" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e553b60f9f8833-800wi" border="0"></a>
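<br>For readers who cannot see the image, here is a LaTeX sketch of the formula as the surrounding text describes it, followed by the Q-to-R calculation worked earlier. The letter <em>m</em> for the slope is my assumption about the notation, not a transcription of the image; the subscripts 1 and 2 stand for "point number 1" and "point number 2".

```latex
\[
  m \;=\; \frac{\Delta y}{\Delta x} \;=\; \frac{y_2 - y_1}{x_2 - x_1}
\]
% Worked example: Q = (5, 2) as point 1 and R = (9, 5) as point 2.
\[
  m \;=\; \frac{5 - 2}{9 - 5} \;=\; \frac{3}{4}
\]
```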
</p><p>I include the formula that uses the little triangle (the Greek letter 'delta') because that's the way many people learn it. The 'delta' is supposed to be translated as "change in", so that it turns directly into the word formula above it. But for people just learning the formula, the delta can be confusing, so I recommend using the version with plain subtraction. It means "subtract the <em>y</em>-coordinate of point number 2 minus the <em>y</em>-coordinate of point number 1, and the <em>x</em>-coordinate of point number 2 minus the <em>x</em>-coordinate of point number 1, and then divide those differences". And at this point, each part of that should make sense: subtract because you're finding changes, and divide to make the numbers work in a way that matches our intuition.</p><p>Any questions?</p><p><em><strong>But what if you use a different point in your calculation? Won't it turn out different?</strong></em></p><p>No, it won't; not while all the points are on the same line. Remember that if your rise and run are each (say) three times as long, the slope should stay the same (remember conjoining the different staircases?). That's just what happens if we use point P instead of point Q in our calculation. From point P to point R, the change in <em>y</em> is 3 times as large (9; and notice that subtracting "5 minus –4" works well to get the 9, even though one of the numbers is negative) as it was from point Q. But the change in <em>x</em> is also 3 times as large (12) as it was before. The fraction works perfectly! 9/12 simply reduces to 3/4 again, and thus using P instead didn't make any difference. If the calculation <em>had</em> turned out a different result, the three points could not have been on the same line. 
This demonstrates an important geometrical interpretation of a line: <em>a set of points all fall on the same line exactly when all pairs of points you might choose from that set give the same result in a slope calculation</em>.</p><p><em><strong>How did you know to make Q point number 1 and R point number 2? If I did it the other way, would I get it wrong?</strong></em></p><p>No, you wouldn't get it wrong. It shouldn't matter which you <em>call</em> point number 1 and point number 2. The only difference in the calculation would be that you and I would subtract each of our numbers in the reverse order. Notice that this won't change the final slope: reversing a subtraction just flips the sign of the difference, right? (10 minus 17 is the opposite of 17 minus 10.) But that will happen in <em>both</em> the top and the bottom of the fraction. So I will be dividing a positive number by a positive number, and you will be dividing a negative number by a negative number, and our results will be the same. As long as you're consistent (don't call Q point number 1 on top, and R point number 1 on the bottom), the naming of the points makes no difference. In terms of your intuition about hills, walking from point R to point Q does lead you downhill, but you're walking backwards (since left-to-right is forwards, remember), so the slope hasn't changed.</p><p><em><strong>What about downhill slopes? How are they different?</strong></em></p><p>Well, we have a good example of that going from point G to point H. The change in <em>y</em> is 7 minus –8, or 15. The change in <em>x</em> is –2 minus 1, or –3. Note that I used G as point 2 in my subtraction: the subtraction looked easier that way for the <em>y</em>-values, and if it makes no difference, I might as well make it easy on myself. When I divide, I get –5 for the slope. The downhill slope showed up in the algebra as a negative number. 
Note that in the case of a downhill slope, either the rise <em>or</em> the run (but crucially, not both) must be negative, giving a negative result when you divide. This is, of course, exactly the last criterion we wanted our slope calculation to meet. If two slopes are identical, except that one is uphill and the other downhill, that will indeed show up in the measurements of the slopes. The two slopes will be exact opposites of each other: a perfect match for our intuition.</p></div>
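<p>The calculations in this post can be replayed in a few lines of Python. This is only a sketch: the function name <code>slope</code> is mine, and the <em>x</em>-coordinate of P is inferred from the stated change in <em>x</em> of 12 rather than read off the graph. Exact fractions are used so that 9/12 visibly reduces to 3/4.</p>

```python
from fractions import Fraction

# Coordinates taken from the worked examples in the post.
# P's x-coordinate is inferred from the stated change in x (12) up to R.
P = (-3, -4)
Q = (5, 2)
R = (9, 5)
G = (-2, 7)
H = (1, -8)


def slope(point1, point2):
    """Change in y divided by change in x, kept as an exact fraction."""
    x1, y1 = point1
    x2, y2 = point2
    return Fraction(y2 - y1, x2 - x1)


print(slope(Q, R))  # 3/4
print(slope(P, R))  # 3/4, a different point on the same line
print(slope(R, Q))  # 3/4, swapping the point labels changes nothing
print(slope(H, G))  # -5, the downhill example with G as point 2
```

<p>A vertical line has a change in <em>x</em> of zero, so this sketch would raise a <code>ZeroDivisionError</code> for it; the post does not cover that case.</p>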
a Plus b, Squared
Posted 2008-07-03 by Polymath
<div xmlns="http://www.w3.org/1999/xhtml"><br><p>Go ahead, try this: ask a high school math teacher for the single student error across all grade levels that's the most infuriating. I'd bet that the most common answer you get is some version of how students incorrectly square the sum of two numbers (as in (<em>a</em> + <em>b</em>)<sup>2</sup>) to get the sum of the squares of the individual numbers (as in <em>a</em><sup>2</sup> + <em>b</em><sup>2</sup>). In fact, though, those expressions are not equal; that's not how squaring works. Indeed, in my experience, students who can absorb the intuition about why they are not equal are much more likely to succeed in future. This lesson will try to make sense out of this inequality:</p><p><a style="display: block;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e55399b7ac8834-pi"><img class="at-xid-6a00d8341bfda053ef00e55399b7ac8834 " alt="Basic fact" title="Basic fact" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e55399b7ac8834-800wi" border="0"></a>
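<br>If you would like to see the inequality with actual numbers first, here is a small Python sketch. The function names and the sample values are my own choices for illustration; any pair of nonzero numbers should show the same mismatch.

```python
def square_of_sum(a, b):
    """Add first, then square: (a + b) squared."""
    return (a + b) ** 2


def sum_of_squares(a, b):
    """Square each number first, then add: a squared plus b squared."""
    return a ** 2 + b ** 2


# Hypothetical sample values; none of them is zero.
for a, b in [(3, 4), (2, 5), (10, 10)]:
    print(a, b, square_of_sum(a, b), sum_of_squares(a, b))
# prints:
# 3 4 49 25
# 2 5 49 29
# 10 10 400 200
```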
<br>Before explaining why it isn't true, I think it's important to understand why so many people think it <em>is</em> true. There are several reasons a student might think it's true, and each of those reasons has its own logic.</p><p>(1) <em>I know it's true for multiplication:</em></p><p><a style="display: block;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5537e4b3a8833-pi"><img class="at-xid-6a00d8341bfda053ef00e5537e4b3a8833 " alt="Ab squared" title="Ab squared" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5537e4b3a8833-800wi" border="0"></a><em>so why isn't it true if the symbol between them is addition instead?<br></em></p><p>This is a classic case of a student paying more attention to the notation than to what the symbols actually <em>mean</em>. Squaring means multiplying an expression times itself: (<em>ab</em>)<sup>2</sup> means (<em>ab</em>)(<em>ab</em>). But crucially, that's a whole bunch of multiplications. And we all know that you can regroup and rearrange items being multiplied at will (formally, we say that multiplication is associative and commutative). And when you do regroup and rearrange, you see there are two <em>a</em>'s and two <em>b</em>'s all multiplied together; thus <em>a</em><sup>2</sup><em>b</em><sup>2</sup>. But look at what (<em>a</em> + <em>b</em>)<sup>2</sup> means: (<em>a</em> + <em>b</em>)(<em>a</em> + <em>b</em>). Where's the possibility for rearranging so simply? It's not there. 
The mixture of <em>plus</em> and <em>times</em> makes this expression harder to work with than if it's all multiplication.</p><p>(2) <em>I know it's legal to do this thing called 'distributing':</em></p><p><a style="display: block;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5539b7a068834-pi"><img class="at-xid-6a00d8341bfda053ef00e5539b7a068834 " alt="Distribute" title="Distribute" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5539b7a068834-800wi" border="0"></a><em>so why can't I distribute the exponent just like I can distribute that funny symbol? (What is that thing, anyway?)</em></p><p>That thing is a capital Greek letter 'psi', and I used it to demonstrate how <em>anything</em> can be distributed, even if it looks complicated (we'll need that idea in a minute). But many students don't know that the word 'distribute' is actually a nickname for a much longer description of the process shown above: it's the 'distributive property of multiplication over addition'. Distributing leaflets over a parking lot means that every person in the parking lot gets a leaflet, and distributing multiplication over addition means that every element of the addition (the <em>a</em> and the <em>b</em>) gets multiplied. It's just a fact about our number system that you can choose to add two numbers before you multiply the sum by a third number (the left side of the above equation), or you can multiply each of the two numbers by the third number individually first (the right side), then add the products, and you'll get the same result. In fact, the equation in (1) above is a demonstration of a different distributive property: 'the distributive property of exponents over multiplication'. And that one is true because multiplication is associative and commutative.</p><p>But the 'distributive property of exponents over addition' is simply not valid. 
Various distributive properties are different mathematical processes (a subtlety you might miss if you don't know their complete names), and they don't all have to be valid. In fact, most are not; any one that <em>is</em> valid is special and important.</p><p>(3) <em>But...it just seems like they should be equal!</em></p><p>Well...only at first maybe. It's a common misconception that mathematical processes that look right on the page or seem right in your head must be true. Mathematicians approach it in the completely opposite direction—in a way, nothing should be true until you can prove it's true. And equality in this case just...isn't true! Too bad! Out of luck! Oh well, we'll learn to deal with it!</p><p>Once you look at this situation very carefully, using examples and various different mathematical interpretations, you'll find that you probably don't really even believe that the two sides could be equal. You already have the knowledge to convince yourself of that. Let's look at some of those examples.</p><p>If you're in a candy store scooping candy into a bag, and you buy one-and-a-half pounds of candy that costs $1.50 per pound, how much will you get charged? You might not know the answer off the top of your head, but I'll bet you know that $1.25 is a ridiculous answer. You're buying more than a pound of candy, so the price would just <em>have</em> to be more than $1.50. Yet if you think that the square of a sum is the sum of the individual squares, you'd have to believe that $1.25 is right:</p><p><a style="display: block;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5539b84b38834-pi"><img class="at-xid-6a00d8341bfda053ef00e5539b84b38834 " alt="1.5" title="1.5" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5539b84b38834-800wi" border="0"></a>One-and-a-half pounds times one-and-a-half dollars per pound doesn't equal one-and-a-quarter dollars. Note the crucial placement of the not-equals-sign. 
Be sure you understand the reason for every step of that line of math.</p><p>I'll bet you can probably square the number 20 in your head. 2 times 20 is 40, so 20 times 20 is 400. You might not be as quick with squaring 21. The actual answer isn't important, but do you have the sense that it's <em>not</em> 401? Well, that sense comes from a deep-down intuition that the square of a sum is not the sum of the individual squares:</p><p><a style="display: block;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5538012498833-pi"><img class="at-xid-6a00d8341bfda053ef00e5538012498833 " alt="Numbers" title="Numbers" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5538012498833-800wi" border="0"></a>Again, some part of your brain already knows that (20 + 1)<sup>2</sup> isn't the same as 20<sup>2</sup> + 1<sup>2</sup>.</p><p>What <em>does</em> (<em>a</em> + <em>b</em>)<sup>2</sup> equal, then? Or, put another way, is there an expression without parentheses that always has the exact same value? The formal algebra is below. It uses the distributive property of multiplication over addition that we saw above. You might know this process by its popular name, FOILing:</p><p><a style="display: block;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e55380175b8833-pi"><img class="at-xid-6a00d8341bfda053ef00e55380175b8833 image-full " alt="FOIL" title="FOIL" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e55380175b8833-800wi" border="0"></a>Again, be sure you understand every step in that process.</p><p>Now, <em>a</em><sup>2</sup> + 2<em>ab</em> + <em>b</em><sup>2</sup> isn't exactly an obvious way to rewrite (<em>a</em> + <em>b</em>)<sup>2</sup>, but it does have the advantage of being correct! If you interpret it in a different way, though (geometrically, for example), it can make much more sense. 
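</p><p>As a quick check, try that formula on the 21 example from above, with <em>a</em> = 20 and <em>b</em> = 1: <em>a</em><sup>2</sup> + 2<em>ab</em> + <em>b</em><sup>2</sup> = 400 + 40 + 1 = 441, and multiplying 21 × 21 out by hand also gives 441. The tempting but wrong version, <em>a</em><sup>2</sup> + <em>b</em><sup>2</sup>, gives only 400 + 1 = 401, so the missing 40 is exactly the 2<em>ab</em> term.</p><p>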
The natural way to understand squaring is to look at the area of a square, which is calculated by squaring. So below is a picture of a square whose sides are each <em>a</em> + <em>b</em> long. To make that clearer, those sides are broken up into their separate <em>a</em> and <em>b</em> parts.</p><p><a style="display: block;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5539b8bf08834-pi"><img class="at-xid-6a00d8341bfda053ef00e5539b8bf08834 " alt="Square example" title="Square example" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5539b8bf08834-800wi" border="0"></a>
Asking about (<em>a</em> + <em>b</em>)<sup>2</sup>, then, is just like asking about the area of that whole square. But the whole square is broken up into smaller squares and rectangles, and we know enough information to calculate each of those smaller parts separately. The areas of the two smaller squares are calculated below.</p><br><p><a style="display: block;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5539b8e168834-pi"><img class="at-xid-6a00d8341bfda053ef00e5539b8e168834 " alt="Square with areas" title="Square with areas" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5539b8e168834-800wi" border="0"></a>
Notice that the areas of the two smaller squares together come nowhere close to totaling the area of the large square. In algebra terms, we'd have to say that (<em>a</em> + <em>b</em>)<sup>2</sup> must simply be greater than <em>a</em><sup>2</sup> + <em>b</em><sup>2</sup>. Of course that means they can't be equal, which is exactly what we've been trying to understand! This picture actually tells us even more, though. It tells us how much greater. Each of the blue rectangles has a length of <em>a</em> and a width of <em>b</em>, so they each have an area of <em>a</em> times <em>b</em>. And there are two of them, which means precisely that (<em>a</em> + <em>b</em>)<sup>2</sup> = <em>a</em><sup>2</sup> + 2<em>ab</em> + <em>b</em><sup>2</sup>, just as we saw in the algebra.</p><p>Finally, I'll show one more way to understand the original inequality. This last way requires that you know what the Pythagorean theorem says, and I'll assume here that you do (if not, you can skip this part). Note the square built on the hypotenuse of the triangle below:</p><p><a style="display: block;" href="http://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5538475cc8833-pi"><img class="at-xid-6a00d8341bfda053ef00e5538475cc8833 " alt="Pythagorean example" title="Pythagorean example" src="https://polymathematics.typepad.com/.a/6a00d8341bfda053ef00e5538475cc8833-800wi" border="0"></a>
</p><p>The square has an area of <em>c</em><sup>2</sup>, which the Pythagorean theorem says is equal to <em>a</em><sup>2</sup> + <em>b</em><sup>2</sup>. But that is exactly the right-hand side of our original inequality. So it makes sense to ask about the square represented by the left-hand side. A square with <em>that</em> area would have to have a side length of <em>a</em> + <em>b</em>. But it's clear from looking at the triangle that <em>a</em> + <em>b</em> has to be bigger than <em>c</em> (walking along the hypotenuse must require fewer steps than walking along the legs of the triangle; technically, that's called 'the triangle inequality'). That is, the two sides of the inequality each geometrically represent the area of a square, but those squares can't be the same size, so the two expressions can't be equal.</p><p>That is, I hope, enough to convince you.</p></div>