In two of our previous blogs, we discussed the importance of the sample frame and sampling techniques for any research project. Understanding the sampling frame, potential sample errors, and the best sampling technique for your specific project is a critical step that must be taken before data collection begins. However, even with careful planning, sometimes the sample you end up reaching does not match the sample universe you were aiming to meet. This could be due to factors including time or budget constraints, high non-response from certain groups, or a sample frame that did not perform as expected. In order to mitigate the effects of any sample imbalances, researchers often use survey weighting.
Weighting is a statistical technique in which datasets are manipulated through calculations in order to bring them more in line with the population being studied. The key difference between the initial sample composition and weighting is that weights are applied after data is collected, and allow researchers to correct for issues that occurred during data collection. For this reason, weighting is also known as post-stratification, as it takes place after the sample has been selected, as opposed to pre-stratification, which is used to balance a sample before data has been collected.
Researchers applying weights are most often weighting on demographic characteristics, such as age, gender, location, and education, but weighting can also account for the differences between those who partake or do not partake in research studies (known as self-selection bias). Weights can also minimize any effects the survey design or data collection mode may have on the sample makeup and resulting data.
In addition to weighting on common demographic variables, studies have found that weighting based on other variables such as internet usage and political affiliation can further reduce bias in some cases. If conducting a phone survey, for example, weights can be applied based on mobile versus landline phone users.
Survey Weighting Methods: Raking and Cell Weighting,
One of the simplest types of weighting, cell-based weighting can be used when you know the number of respondents your sample should have who are, for example, males age 15-24 or females age 25-34. If your desired sample included 100 males aged 15-24 and 80 females aged 25-34 but should have included 80 males aged 15-24 and 120 females aged 25-34, you can apply simple cell-based weights as illustrated to the left.
Raking or RIM Weighting
Raking, also known as random iterative method (RIM) weighting or iterative proportional fitting, is a slightly more complex method that can be used when you are weighting to a number of variables, but may not know how the variables interlock; For example, if you need 100 females and 120 people aged 25-34, but do not know how many females aged 25-34 are required. With raking, a researcher would first balance the sample based on one variable, such as gender, and then on the next variable, such as age. If the adjustments for one variable affect another variable too much, then more adjustments are performed until a balanced sample is achieved.
Raking is one of the most common and accepted methods of weighting for public opinion surveys, as it allows for weighting based on multiple variables and aims to adjust each variable by as small an amount as possible. It can be performed quite quickly using a statistical software such as SPSS.
Other methods of weighting include matching, in which a researcher selects a set of cases that is representative of the population from another dataset and aims to match cases from the dataset being studied. Logistic regression modelling and propensity weighting are other types of weighting that are used to account for selection bias amongst a sample. For a more in-depth explanation of various weighting methods see this paper from Pew Research and this from the Journal of Official Statistics.
Pros and Cons of Weighting Data
As with any technique used to manipulate a dataset, there are both pros and cons of weighting, and several guidelines that should be kept in mind when weighting data.
Advantages of weighting data include:
- Allows for a dataset to be corrected so that results more accurately represent the population being studied.
- Diminishes the effects of challenges during data collection or inherent biases of the survey mode being used
- Ensure the views of hard-to-reach demographic groups are still considered at an equal proportion to the population in the final data.
Disadvantages of weighting data are:
- Can over-represent the views of one or several people who may not be an accurate reflection of their entire demographic group
- Can inadvertently introduce additional biases into the dataset
- Can make the findings more variable as it increases the standard deviation of answers (check)
In order to reduce the impacts of data weighting, it’s recommended to weight by as few variables as possible. As the number of weighting variables goes up, the greater the risk that the weighting of one variable will confuse or interact with the weighting of another variable. Also, when data must be weighted, try to minimize the sizes of the weights. A general rule of thumb is never to weight a respondent less than .5 (a 50% weighting) nor more than 2.0 (a 200% weighting).
Additional Information on Data Weighting
GeoPoll provides nationally representative data through a combination of a carefully selected sample frame, use of quotas to manage the demographic composition of those who respond to surveys, and application of weights where necessary. GeoPoll can use multiple weighting methods, including both cell-based and raking, based on the project specifications. To learn more about GeoPoll’s data collection and research process, please contact us here.