







SUBJECT: BENCHMARKS FOR DATA COLLECTION, PROCESSING, AND ANALYSIS
EFFECTIVE DATE: 03/16/87
CES STANDARD 870304









PURPOSE: To establish minimum performance levels for surveys and studies conducted by the Center. Minimum levels of data completeness, and of the data required for processing and analysis, are established to ensure that researchers and users will have confidence in the quality of the data. Benchmarks are reference points for judging quality.









There are occasions when the benchmarks listed below are not appropriate to the study being conducted (e.g. certain types of experiments, pretests, or policy studies where a quick response is the highest priority). In these cases, the researcher will know in advance that the benchmarks are not appropriate to the study; the project officer should request an exception to the specific targets that he or she believes will not be met. This request must come in writing to the Chief Statistician and be part of the CES clearance process for the survey. The request must clearly state which targets or benchmarks will not be met, what the project officer's expectations are for the targets, and the reason why an exception should be granted. Reports for such studies must include in the technical appendices a statement about the deviation from the benchmarks including both the rationale for the deviation and the magnitude of the deviation. 









o The overall survey target response rate specified in RFPs should be at least 90 percent for longitudinal surveys and 85 percent for cross-sectional surveys. In the case where the sample is selected hierarchically (e.g. schools, and then teachers within those schools), these rates apply to each hierarchy (e.g. 85 percent of schools responding, and then 85 percent of teachers within the responding schools, for an overall rate of 85% x 85% = 72.25%). Response rates for sample surveys are calculated on weighted data; response rates for census or administrative record data are based on unweighted data.
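The product rule for hierarchical samples described above can be sketched as follows (a minimal illustration; the function name is hypothetical):

```python
def overall_response_rate(level_rates):
    """Overall response rate of a hierarchically selected sample:
    the product of the response rates at each level of the hierarchy."""
    overall = 1.0
    for rate in level_rates:
        overall *= rate
    return overall

# Example from the standard: 85 percent of schools responding, then
# 85 percent of teachers within the responding schools.
rate = overall_response_rate([0.85, 0.85])  # 0.7225, i.e. 72.25%
```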











o Within any stratum of a sample, the overall survey target response rate should be no less than 85 percent for longitudinal surveys and 80 percent for cross-sectional surveys. In the case where the sample is selected hierarchically, these rates apply to each hierarchy. Response rates for sample surveys are calculated on weighted data; response rates for census or administrative record data are based on unweighted data.










o The target item response rate for each critical variable should be at least 85 percent (critical variables are defined in the analysis plan for the study). Response rates for sample surveys are calculated on weighted data; response rates for census or administrative record data are based on unweighted data.












o Deviations from the benchmark figures given above should be anticipated in the planning phase of a survey. If the project officer expects the deviations to be severe, they should be documented in the analysis plan with a proposal as to how to minimize problems before they happen in the survey and a proposal regarding how the analysis of the data will adjust for deviations that cannot be overcome by the survey design. 









o If response rates and item response rates are lower than anticipated and fall below the benchmark levels, an analysis of the reasons for the low rates and the anticipated impact on the quality of the data must be conducted before any analysis of the survey data is done. 










o Variables with more than 30 percent missing data should not be used in analysis, except in tabulations where the missing data are tabulated as a separate category and clearly identified. 
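The 30 percent missing-data rule above amounts to a simple screen before analysis; a minimal sketch (the function name is hypothetical):

```python
def usable_in_analysis(n_missing, n_total, threshold=0.30):
    """A variable is usable in analysis only if its missing-data share
    does not exceed the 30 percent benchmark; otherwise it may appear
    only in tabulations with missing data shown as a labeled category."""
    return (n_missing / n_total) <= threshold

usable_in_analysis(25, 100)  # True: 25% missing, within the benchmark
usable_in_analysis(35, 100)  # False: 35% missing, exceeds the benchmark
```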











o Weighting of sample data for nonresponse, when done by strata or within cells so that the weighting factors are calculated as ratios, should be based on a minimum of 30 respondents (unweighted) per cell. Cells that have fewer than 30 respondents should be collapsed with the "closest" cell that has the fewest respondents. "Closest" is defined as logically closest, i.e. contiguous age categories or contiguous geographic areas that make substantive sense.
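The cell-collapsing rule above can be sketched as follows. This is a hypothetical illustration: the function name and the neighbor map are assumptions, and which cells are logically "closest" is a substantive judgment the code cannot make.

```python
def collapse_small_cells(cells, neighbors, minimum=30):
    """Merge weighting cells with fewer than `minimum` unweighted
    respondents into the logically adjacent cell with the fewest
    respondents. `cells` maps cell label -> unweighted respondent
    count; `neighbors` maps cell label -> list of "closest" cells
    (e.g. contiguous age categories)."""
    cells = dict(cells)
    small = [c for c in cells if cells[c] < minimum]
    for cell in small:
        if cell not in cells:
            continue  # already merged into another cell
        candidates = [n for n in neighbors.get(cell, []) if n in cells]
        if not candidates:
            continue
        # collapse with the adjacent cell having the fewest respondents
        target = min(candidates, key=lambda n: cells[n])
        cells[target] += cells.pop(cell)
    return cells

ages = {"18-24": 12, "25-34": 40, "35-44": 50}
adjacent = {"18-24": ["25-34"], "25-34": ["18-24", "35-44"],
            "35-44": ["25-34"]}
collapse_small_cells(ages, adjacent)  # {"25-34": 52, "35-44": 50}
```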










o Weighting of sample data for nonresponse should also be done with caution. The ratio of the largest stratum nonresponse weighting factor to the smallest stratum nonresponse weighting factor should be no more than 5. In cases where the ratio is larger than this, the smallest stratum in terms of unweighted respondents should be collapsed with the "closest" cell that has the fewest respondents.
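The ratio check above is a one-line test on the set of stratum weighting factors; a minimal sketch (the function name is hypothetical):

```python
def weighting_ratio_ok(factors, max_ratio=5.0):
    """True if the largest stratum nonresponse weighting factor is no
    more than `max_ratio` times the smallest, per the benchmark."""
    return max(factors) / min(factors) <= max_ratio

weighting_ratio_ok([1.1, 1.4, 2.0])  # True: ratio is about 1.82
weighting_ratio_ok([1.1, 1.4, 6.0])  # False: ratio is about 5.45
```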












o Estimates of means, proportions, and totals should be computed from at least 30 respondents for each subgroup for which the estimate is made; estimates of ratios, rates, regression coefficients (in cases where pairwise deletion methods are used in computation of the correlation matrix), and similar multivariate statistics should be based on at least 30 respondents in both the numerator and denominator of each statistic when the data used come from a survey or the numerator and denominator come from different data sources, at least one of which is a survey. These minimums do not apply when the data come from a census or administrative record study not involving sampling. 











o Confidence levels for any results of statistical tests reported in a document should be at least 90 percent before the null hypothesis is rejected. 
















o Confidence intervals around key statistics (as defined in the analysis plan) reported in a document or a table should be 95 percent confidence intervals and should be clearly identified as such. 
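As an illustration of the 95 percent confidence intervals required above, the following sketch computes one for a simple proportion using the normal approximation (z = 1.96). This is an assumption for illustration only: the standard does not prescribe a method, and the design effect of a complex sample would widen the interval.

```python
import math

def proportion_ci_95(p, n):
    """95 percent confidence interval for a proportion under simple
    random sampling, using the normal approximation (z = 1.96)."""
    se = math.sqrt(p * (1 - p) / n)
    half_width = 1.96 * se
    return (p - half_width, p + half_width)

low, high = proportion_ci_95(0.40, 400)  # roughly (0.352, 0.448)
```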















o There should not be more than 20 "simple" comparisons made within a bulletin or a report. "Simple" is defined as a t-test, chi-square test, or any other test that examines a simple hypothesis such as a difference of means or proportions. Consideration must be given to the use of multivariate techniques in analyses involving multiple variables, factors, or levels, and/or an analysis of overall error rates should be conducted where multiple comparisons and univariate variables are used.
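One common way to control the overall error rate across multiple simple comparisons, as the bullet above calls for, is a Bonferroni adjustment. This is an illustrative sketch, not a method the standard prescribes:

```python
def bonferroni_alpha(overall_alpha, n_comparisons):
    """Per-test significance level that holds the chance of at least
    one false rejection across all comparisons to `overall_alpha`."""
    return overall_alpha / n_comparisons

# Holding the overall error rate to 10 percent (the 90 percent
# confidence benchmark) across the maximum of 20 simple comparisons:
bonferroni_alpha(0.10, 20)  # 0.005 per test
```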














o Overall response rates (Ro) are to be calculated as the ratio of the number of completed interviews (see CES Standard 870501) divided by the number of sample respondents drawn minus respondents considered out-of-scope (in a household interview this would be the number of units sampled minus vacant units, condemned units, or units that have been converted from residential to business use):














               weighted # of completed interviews
Ro = ---------------------------------------------------------
     weighted # of units sampled - weighted # of out-of-scopes
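The Ro formula above can be sketched directly in code (the function name and the example counts are hypothetical):

```python
def overall_response_rate_ro(completed, sampled, out_of_scope):
    """Ro: weighted completed interviews divided by weighted units
    sampled minus weighted out-of-scope units."""
    return completed / (sampled - out_of_scope)

# e.g. 1,700 weighted completes out of 2,100 weighted sampled units,
# 100 of which were out of scope (vacant, condemned, or converted
# from residential to business use):
overall_response_rate_ro(1700.0, 2100.0, 100.0)  # 0.85
```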















o Item response rates (Ri) are to be calculated as the ratio of the number of respondents for which an in-scope response was obtained (i.e. the response conformed to acceptable categories or ranges) divided by the number of completed interviews for which the question (or questions, if a composite variable) was intended to be asked:

















         weighted # of respondents with in-scope response
Ri = ------------------------------------------------------------
     weighted # of completed interviews for which the question was
     intended to be asked















o When the items being studied are continuous or additive (e.g. number of teachers is discrete but additive, whereas affiliation categories are not), coverage rates (Ci) should also be calculated. Coverage rates describe the relative loss of information due to the size of the units that gave incomplete responses, rather than the number of units not responding.




























           weighted total of in-scope responses
Ci = --------------------------------------------------------------
     weighted total of completed interviews for which the question
     was intended to be asked (includes imputed values)
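The Ri and Ci formulas above differ in what is counted: Ri counts responding cases, while Ci totals the quantity itself (e.g. teacher counts), so a large unit's incomplete response costs more coverage. A minimal sketch (function names and example figures are hypothetical):

```python
def item_response_rate_ri(in_scope_respondents, completed_asked):
    """Ri: weighted respondents with an in-scope response divided by
    weighted completed interviews asked the question."""
    return in_scope_respondents / completed_asked

def coverage_rate_ci(in_scope_total, completed_total_incl_imputed):
    """Ci: weighted total of the item from in-scope responses divided
    by the weighted total (including imputed values) for all completed
    interviews asked the question."""
    return in_scope_total / completed_total_incl_imputed

item_response_rate_ri(900.0, 1000.0)  # 0.90: 90% of cases responded
coverage_rate_ci(42000.0, 50000.0)    # 0.84: larger units were missing
```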

