Wednesday, February 11, 2015

Sampling Error - SE(p) Overview

Sampling Error

Research error is not an easy topic to discuss because it involves a variety of concepts, words, and formulas.  Because so many different types of people read this column, I have tried to simplify this discussion so just about anyone can understand it.

However, with that said, you may want to get a 6-pack of your favorite beverage because this answer is a bit long.  OK, so on to your questions . . .

All research that analyzes a sample selected from a population has some degree of sampling error or standard error, both of which are abbreviated "SE."  Sampling error is the estimated difference between the data from a sample and the "true" or "real" value that would be found if a census (analyzing the entire population) were conducted.  There is no sampling error when a census is conducted because the calculations are based on the entire population, but a census still contains measurement error and random error, so even a census is not a perfect measurement.

Now, before I send you to the Arbitron sampling error tables, I need to explain a few basics about sampling error.  First, there are two terms or concepts that are important in sampling error:  Confidence Interval and Confidence Level.

Confidence Interval is the "±" percentage you hear about when a research study is conducted.  For example, a TV news anchor may report the results of a political poll and say something like, "The results of the poll have a margin of error of plus or minus 5 points (or 5 percent)."  The plus or minus 5 percent is the Confidence Interval for the study.

Confidence Level is also a percentage and provides an indication of how confident a person can be with the results of a study.  For example, the 95% Confidence Level indicates that a person can be 95% certain that the results fall within the range of the Confidence Interval; the 99% Confidence Level indicates that a person can be 99% certain that a study's results fall within the range of the Confidence Interval.  Most behavioral researchers use the 95% Confidence Level; researchers in the "hard" sciences of medicine, physics, and chemistry often use Confidence Levels as high as 99.99% (an error probability of .0001).

Another way to understand the Confidence Level is that it provides an indication of how likely it is that the results of a study are due to error or chance.  For example, the 95% Confidence Level means that there is a 5% probability that the results of the study are due to error, chance, or both.  In other words, the higher the Confidence Level, the lower the probability that the results are spurious (false); the lower the Confidence Level, the more likely that the results are due to error or chance.  (Most behavioral science researchers use the 95% or 99% Confidence Level, although the 90% Level is occasionally used in exploratory research—studies where the investigator is in the preliminary stages of an investigation.)

There is no rule related to the Confidence Level used in a research project.  Literally any Confidence Level can be used, and the decision to use one Confidence Level over another is purely arbitrary.  The most fundamental reason for selecting one Confidence Level is the desire for accuracy (remember . . . the higher the Confidence Level, the less likely the probability that results are due to error or chance).

For example, let's say that a radio researcher uses the 90% Confidence Level.  This means that the researcher (or client) is willing to accept the fact that there is a 10% probability that the results are due to error or chance.  There is nothing wrong with this as long as everyone involved in the research knows and understands the "rules" of the study.  However, researchers in the hard sciences would never use such a low Confidence Level.  As mentioned, most researchers in the hard sciences often use Confidence Levels of 99.99%, which means that there is only one chance in 10,000 that the results are due to error or chance.  Can you imagine the criticism a cancer researcher would receive if the results from a study testing a new drug were tested at the 90% Confidence Level?  Gag me.

I hope I haven't lost anyone yet, because this is important stuff if you want to read, interpret, and understand any research, including Arbitron and Nielsen ratings.

Summary up to this point . . . When you read any research study, you need to know the Confidence Interval and the Confidence Level.  The report, or the researcher, should say something like, "The results of this study have a Confidence Interval of ±5% at the 95% Confidence Level."  If you don't know these two pieces of information, you don't know how accurate the results are . . . ya gots ta know.

OK, now we need to look at the formula for sampling error (remember, this is the Confidence Interval).  The formula for the Confidence Interval at the 95% Confidence Level is:

SE(p) = Z × √[ p(1 − p) / N ]

where p is the result from the study (expressed as a proportion), N is the sample size, and Z represents the area under the normal curve related to the desired Confidence Level (1.96 for the 95% Confidence Level).

So, if 20% of the respondents from a sample of 500 give the same answer, the computation at the 95% Confidence Level is:

SE = 1.96 × √[ (.20 × .80) / 500 ] = 1.96 × .0179 ≈ .035

The 20% response has a maximum estimated sampling error of ±3.5%, which means that the "actual" or "real" percentage for the answer is somewhere between 16.5% and 23.5%.
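The arithmetic above is easy to sketch in a few lines of Python (the function name here is just illustrative, not from any Arbitron or Nielsen tool):

```python
import math

def sampling_error(p, n, z=1.96):
    """Maximum estimated sampling error (Confidence Interval half-width)
    for a proportion p from a sample of size n.  The default Z of 1.96
    corresponds to the 95% Confidence Level."""
    return z * math.sqrt(p * (1 - p) / n)

# The worked example above: 20% of a sample of 500
se = sampling_error(0.20, 500)
low, high = 0.20 - se, 0.20 + se
print(f"SE = ±{se:.1%}, range {low:.1%} to {high:.1%}")
```

Running this reproduces the ±3.5% figure and the 16.5%-23.5% range from the example.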

Now, if you wanted to interpret the results at the 90% Confidence Level, you would multiply the square-root result by 1.64 instead of 1.96; at the 99% Confidence Level, multiply by 2.57.
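If you'd rather compute these Z-scores than look them up in a table, Python's standard library can do it—this is a minimal sketch, not part of any ratings service's tools:

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-tailed Z-score for a given Confidence Level,
    e.g. 0.95 -> about 1.96, 0.90 -> about 1.64."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.68, 0.90, 0.95, 0.99):
    print(f"{level:.0%} Confidence Level: Z = {z_for_confidence(level):.2f}")
```

Note that the exact value at the 99% Level is 2.576, which some tables round or truncate to 2.57, and the 68% Level comes out at roughly 1.0, just as discussed below.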

Note:  The corresponding Z-scores in the formula come from a table known as the "Areas Under the Normal Curve."  If you want to see the table, I prepared one for our textbook, Mass Media Research: An Introduction; just click here.  By the way, the Z-score for the 68% Confidence Level is 1.0, so there is no need to multiply the square-root result at all.

So why is the discussion about Confidence Levels important?  Because the sampling error tables supplied by Arbitron are calculated at the 68% Confidence Level, which means that you can only be 68% certain that the data you interpret fall within the range of your Confidence Interval.  In other words, there is a 32% chance that the results you are reading are due to error or chance.
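One practical consequence: because the 68%-level Z is 1.0, a sampling error printed in such a table can be rescaled to the 95% Confidence Level just by multiplying by 1.96.  A quick sketch (the ±1.8 rating points used here is a made-up figure for illustration, not an actual Arbitron value):

```python
def rescale_se(se_68, z=1.96):
    """Convert a sampling error reported at the 68% Confidence Level
    (where Z = 1.0) to another Confidence Level.  The default Z of
    1.96 rescales it to the 95% Confidence Level."""
    return se_68 * z

# A hypothetical table entry of ±1.8 at the 68% Level
print(f"At the 95% Confidence Level: ±{rescale_se(1.8):.2f}")
```

In other words, the "true" 95%-level margin of error is nearly twice as wide as the 68%-level figure in the table.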

One more thing . . . The reason I explained the sampling error formula is that it is discussed on Arbitron's sampling error pages, but the formula is in prose, not numbers and symbols.  You need to know the origin of the sampling error.

So . . . Arbitron does have sampling error tables on the Internet, and you can view them by clicking here.

That's the answer to your first question.  On to the second . . .

You also ask what sampling error I would consider "OK," and what would cause me concern.  Well, we're dealing with behavioral research here, and humans are very difficult to describe, and their behavior is particularly difficult to predict.  I don't like to exceed a Confidence Interval of 5%...
