Elizabeth Anne VanderPutten
CHAPTER III RESEARCH METHODOLOGY
This chapter reviews the strengths and limitations of qualitative research in general and of the secondary analysis of qualitative data in particular. It suggests that secondary analysis of such data is an efficient and effective means of understanding the slippage phenomenon.
The data for the study, primarily the field notes, interim reports, and documents collected by a team of researchers at Cornell University under contract with Youthwork, Inc., as well as other Youthwork documents, including contracts, evaluation studies, and internal memos, are reviewed in terms of their quality and relevance to the research design. The particular methods used in this study to collect data are then presented. These include questionnaires employing open- and closed-ended questions. The validity and reliability of these methods are discussed.
Finally, the chapter presents the methods used to demonstrate that slippage exists and that it is a function of the subgoals of the street-level bureaucrats implementing the projects.
B. QUALITATIVE RESEARCH
1. Uses of Qualitative Research
Qualitative research seeks to draw conclusions from observations of persons or sites in a natural setting. The term is sometimes used synonymously with “case study” research, which Yin defines as a research strategy that examines “(a) a contemporary phenomenon in its real-life context, especially when (b) the boundaries between phenomenon and context are not clearly evident.”73-1 Qualitative researchers generally do not seek to make statistical generalizations.
Quantitative researchers, on the other hand, generally select a statistically representative sample of cases and then examine a few items of interest. A survey would be an example. Experimental researchers seek to alter one or more variables while controlling or holding constant other variables.73-2
It would be easy and misleading to make too much of the distinction between and within methods. The appropriateness of a particular method depends on what the researcher wishes to know. If it is essential, and possible, to identify the effects of a single variable (e.g. the effects of a drug), experimental research may be the methodology of choice. If it is necessary to generalize to a population as a whole (e.g., to predict how public opinion will react to a policy change), a survey based on a random sample may be appropriate. If the goal is to understand the intricacies of a person or case (e.g., to understand the implementation of a policy), the case study method may be appropriate.
Case study research seems particularly suited to understanding the process of implementing public policies. Federal and state policies do not get transmitted uniformly into local action. Something happens in the “black box” of the local service deliverer (school, hospital, health facility) that changes the policy and affects clients. The complexity of service agencies, as well as the nature of the services offered, makes it difficult to create experimental conditions.74-1 A survey might miss causal explanations.74-2 Case study methodology, in which the researcher spends considerable time at a site observing and asking site-specific questions, gives a view into the “box.” The researcher is able to observe the dynamics of the process of implementation, to talk to many persons at different levels of the organization, and to shape questions and record events in response to changing conditions.
Examples of case study methodology for policy analysis are numerous; a particularly important one was conducted in the field of educational policy.
2. Limitations of Single Case Study Research
Despite its usefulness and growing popularity, the case study methodology has several drawbacks. Case studies are expensive and time-consuming. Scarcity forces choices. Researchers may spend less time at the site than needed, visit fewer sites, or send in less than senior researchers. The latter creates another problem.
The validity of a case study depends ultimately on the ability of the researcher to accurately observe and record events and phenomena. This requires both training and experience. When researchers are prepared, the results can yield rich descriptive and explanatory findings.
When the researchers are not so trained, the findings may be less useful. A well-trained researcher, however, is not enough. Because any event is more complex than can be recorded, the researcher must decide what to observe or record. Simply going to a site and observing is likely to produce little other than notes. Qualitative researchers generally agree, as Stake points out, that a conceptual framework with theoretically focused research questions is essential for validity.
Finally, an often cited limitation of qualitative research is the inability to generalize to a larger population. At the same time, individual case studies can offer potentially powerful explanations, and leave little doubt that similar conditions exist elsewhere.76-2 Pressman and Wildavsky’s study of an economic development project in California, for example, established a general principle: the multiple layers of decision-making in federal programs will result in considerable slippage.76-3
3. Multiple Case Studies
Multiple case studies make the identification of patterns of behavior possible. When patterns of behavior are identified, it is possible to say that under similar circumstances similar behaviors might be found. This is not the same as saying that one can statistically generalize to other sites. It is saying that the process of pattern matching can lead to suggestions about relationships between phenomena that may hold in other situations.77-1
The difficulty for the multiple case study researcher is developing methods that permit the identification of patterns without doing injustice to the idiosyncratic nature of the sites. The problem has several parts. Data must be collected in a way that permits comparisons yet preserves flexibility in the field. Cross-site analysis must occur while preserving some of the richness of the sites. Finally, means of documenting the basis on which the patterns are identified must be developed.
While there are few generally accepted guides for multiple case study research, a few rules seem generally valid and accepted. One is that field workers should have a thorough knowledge of the conceptual model guiding the work. This is accomplished either by having field workers assist in the design of the study (in effect sending the senior researchers into the field) or by training the workers in the meaning of the study. The second is that most multiple case study research designs call for some form of data analysis packets.77-2 The field workers are given a more or less closed-ended list of research questions. These are not the questions actually asked of respondents; rather, field workers gather information to answer the questions either at the end of each day or at the end of the visit. The more open the questions, the more the worker may get at the unique characteristics of the site, but the more difficult it may be to compare sites. The choice depends on the study’s purpose.
4. Aggregating Data
There is even less consensus on how to aggregate qualitative data. As in collecting data, the analyst is caught between the unique, contextually specific nature of each site and the need to make sense across sites. Under these conditions, the researcher is apt to make accurate but thin generalizations across cases.78-1
Yin suggests that there are two general approaches that could be used, and that most analysts combine the two.78-2 One involves comparing small or isolated variables. The second involves comparing broader patterns within a conceptual framework.
To compare isolated variables, researchers use either a case survey method or some modification of content analysis. In either case, small, discrete units of analysis are identified. In a case survey, a closed-ended questionnaire is developed which is “asked” of the cases by reader analysts.78-3 Answers are tabulated and, where appropriate, statistically analyzed. Similar procedures are used for content analysis, but there are usually fewer units of analysis.79-1
In both approaches, care must be taken to choose discrete units of analysis that relate to the research goals, and there must be sufficient units to identify patterns.
The second approach, comparing patterns across case studies, is somewhat newer as a method. Here the researcher seeks to establish an explanation for a result, or for the relationships between variables, in a single case, and to rule out other explanations. The researcher then sees whether the explanation holds in another case, and then another. In doing so, the original explanation may be modified by such statements as “under different conditions, the pattern does (does not) hold.” As Yin suggests,79-2 both general approaches can be used in analyzing the same case studies. This is the method used in this paper.
A further problem with multi-case analysis is presenting findings so that readers can follow the chain of reasoning. Again, there are few clear guidelines. Some suggest presenting case studies in detail; others prefer a cross-site analysis showing similarities and differences.79-3 In the latter, written case studies are not presented, although patterns found at individual sites are discussed. This study uses the latter option.
Qualitative case study research, then, provides a way of getting at the interactions at a site, and a means of understanding the behavior of the participants from their viewpoint. While it is not possible to statistically generalize from one or even several case studies to a population as a whole, multiple case studies can lead to the discovery of the explanations for observed patterns of behavior.
C. SECONDARY ANALYSIS OF ETHNOGRAPHIC DATA
1. An Emerging Procedure
Secondary analysis is common, indeed expected, for quantitative research. Government agencies supporting major surveys, as well as the researchers themselves, expect that findings will be checked by other researchers using the same data.
Secondary analysis of qualitative research, however, is not often done. While it is fairly common for a researcher to review the findings of different studies in a systematic way, there are few examples of studies in which a second researcher, not part of the original project, reviews the field notes or original data for a new study. The reasons are many. Qualitative data are usually not in easily accessible form. While data tapes can be made and copied at minimal cost, field notes are bulky, often handwritten, and filled with idiosyncratic codes. Moreover, field notes often contain references to individual persons or sites. Sharing field notes could compromise promises of confidentiality.
The more data are refined, merged or sanitized, the more easily the secondary analysis can occur. At the same time, the more this occurs, the less the secondary researcher is in touch with the original data, and the more constrained by the research goals, biases, and analyses of the original study.
Nevertheless, the interest in secondary analysis of qualitative data is growing as the number of studies based on case studies increases.81-1 As support decreases for new research, there will be more of a need to reuse existing data.
2. Qualities of Secondary Analysis
The qualities that make for good secondary analysis of qualitative data are similar to those for original data collection and analysis. In many ways, the secondary analyst is revisiting the sites through the data. The researcher must identify and record events or phenomena, find factors that together lead to explanations for the relationships between events, and interpret those factors.
If the secondary research design depends on the identification of discrete, objective, easily identified units of analysis--the number of times a particular word is used, the number of times an individual is mentioned--minimally trained reader analysts can be used. In fact, computers are frequently used to perform this type of content analysis.
The procedure can be compared to a case study research design that includes administering a closed-ended survey to some persons. A research assistant can be trained to administer the instrument. An essential part of the training would be to ensure that the personality, insights, education, and experience of the person administering the instrument did not influence answers.
When units of analysis call for subjective judgments in original data collection or in secondary analysis, the use of research assistants may be inappropriate. In qualitative research there is no clear distinction between data collection and data analysis. The field worker must decide what evidence is relevant, what persons to interview, and when and how to get a person to elaborate on a point. Training, education and knowledge of the research design, the programs being studied and the constraints on the participants are essential. In secondary analysis, where the data collection procedures call for subjective judgments, similar qualities are required of the data collector.
As in original case study research, care must be taken to avoid, or at least to account for, the biases of the investigators. In both cases, a neutral colleague who provides feedback on the findings can be most useful. This colleague can “point to implausible data, holes in arguments, leaps of logic and alternative interpretations...this feedback can take the form of tough criticism that can wound the pride, but is essential to maintaining a balanced perspective.” 82-1
The quality of secondary research also depends on the quality of the original data and their applicability to the research questions of the second study. The next section discusses the quality of the data used in this study.
D. THE CORNELL UNIVERSITY/YOUTHWORK ETHNOGRAPHIC STUDIES
Youthwork, Inc., a non-profit organization supported in part by federal funds and in part by private foundations, administered ninety-one contracts with school systems, community-based organizations, private employers, and CETA Prime Sponsors to conduct projects to reduce youth unemployment by linking education and work. As part of a “Knowledge Development” initiative, Youthwork awarded a contract to a team of researchers at Cornell University to conduct ethnographic case studies of selected projects.
As previously discussed, the validity of a case study depends to a great extent on the ability of the researcher to develop a conceptual framework consistent with the objectives of the research. The principal investigator was qualified to perform these tasks. Not only had he conducted previous multi—site case studies, but he had also written extensively on the subject.
To ensure that sufficient time was spent in the field, and to promote cross-site comparability, the field workers were given data analysis packets containing open-ended research questions.
The procedures worked reasonably well. Because of the open-endedness of the analysis packets, field workers could capture data about sites’ unique features, but some degree of uniformity across sites was lost. When combined with the fact that some of the field workers were less than highly qualified, this created some problems in cross-site analysis.
The researchers knew they would turn in field notes to Youthwork. To ensure the confidentiality of the data, code names were used for individuals, although sites were identified. Both original researchers and secondary analysts promised to ensure the confidentiality of the subjects.
It is difficult to determine the extent to which the biases of individual field workers affected the data. Undoubtedly, some bias occurred. However, having the workers interview and observe several persons over a period of time, prepare answers to a set of research questions, and hand in field notes seems to have been a reasonable set of procedures for minimizing bias.
E. DETERMINING SLIPPAGE
This study seeks to identify the existence of three types of subgoals in the decision-making of street-level bureaucrats and to relate these subgoals to slippage. Slippage is defined as the lack of congruity between the policy intent and actual outputs. The first tasks, then, are to develop an operational definition of slippage and to categorize sites in terms of the degree of slippage experienced at them. This section discusses the procedures used to determine slippage. Because the degree of slippage experienced at a site is an analytic tool in this study, the findings are presented in this chapter.
1. Sampling Procedures
This study drew a sample of ten sites from the original forty-seven case studies. Since analysis of all sites would not have permitted statistical generalizations in any event, it was decided to select a subsample of the original set. This permitted an in-depth analysis of each site.
Sites were first selected on the basis of geography. Two criteria were used: location and type of area served. Location was defined as one of the four broad Census regions: North, South, East, and West. Four types of areas were chosen: large cities, medium cities, non-urban areas, and county sites. Large cities were defined as cities having a population in excess of one million. Medium cities were defined as cities with populations between 200,000 and 999,999. Non-urban areas were defined as those areas considered by the U.S. Census Bureau as not being urban or within a Standard Metropolitan Statistical Area.86-1 A county site was defined as a site serving an entire county with one or more school districts within that county. A county may serve both non-urban and urban areas.
From these groupings, as Table 3.1 shows, sites were selected to represent the four focus areas of Youthwork interests. These focus areas will be discussed in more detail in the next section.
After an initial selection was made using these criteria, some selected sites were discarded.
As a result of these procedures, a varied group of sites was chosen. No claims are made that the sites selected are a random or cross—sectional representation of the Youthwork projects as a whole.
2. Promised Services
a. Policy goals defined
For purposes of analysis, policy goals are defined as the goals and subgoals in the proposal and the signed contract for each project. There are several reasons for this choice. First, the contract is the primary document for which a contractor is legally held responsible. It is, in effect, the embodiment of the federal policy and program. It is for this reason that each contract contains all the legal requirements of the law, plus a restatement of what was promised in the proposal. As the handbook on contract administration states: “No one, not even the contracting officer, can direct (nor should request) the contractor to do anything that is not in the general and special provisions of the written contract.”88-1
Second, in accepting a proposal and signing a contract, the project and contract officers are expressing their interpretation of the meaning of the policy and the program. Finally, staff of a local project rarely know the law or the regulations, but generally have some knowledge of the proposal submitted.88-2
b. Determining possible service
In order to determine the types of services a project might offer, the application guidelines issued by the Department of Labor were reviewed.
No matter which focus area was selected, all proposals had to promise to provide certain services. Projects were free to determine the relative emphasis to be placed on these services. The core services were: payment for work or work-related experiences to low-income youth; services to combat sex-role stereotyping; youth involvement in planning, operating, and evaluating the project; basic skills instruction; and career planning and search activities. Again, the relative emphasis placed on these activities could vary, as could their specific implementation.
In each of the four focus areas, projects were permitted to provide other services, and to place relative emphasis on those services. These “elective services” included: counseling for personal problems; supplemental services such as babysitting, referrals to social service agencies, and legal help; services designed to raise career and personal expectations; and services designed to improve the quality of the work site or to raise academic standards.
Projects funded by the Department of Labor through Youthwork were considered “exemplary” and, in addition to providing services, were expected to provide information that could guide policymakers and policy implementers. All projects were, therefore, required to carry out research. In Youthwork parlance, all projects were to add to “knowledge development.” At the least, they were to employ an independent evaluator.
Using the application guidelines, a sixteen-item list of possible services was developed. The list was then compared with other data sources to ensure validity. First, summaries of all Youthwork projects were read and categorized according to the types of services Youthwork officials thought the sites were providing.90-1 Second, discussions were held with Youthwork officials who participated in the selection process.90-2 Finally, the list was compared with the lists used in the Standardized Assessment System developed by the Department of Labor to review all projects funded under YEDPA.90-3 As a result, the total number of services remained the same, but descriptions of five of the services were changed.
c. Determining each project’s subgoal levels
The basic document used to determine what specific services or outputs each project agreed to provide, and the level of effort devoted to each type of service, was the contract between the project and the Department of Labor/Youthwork, Inc. The contract incorporated the proposal as revised during negotiations. Note was made of any significant changes between the final and original proposal for later evaluation, but only those items included in the final proposal were used in determining what services were promised. In addition to the contract, proposal review sheets were examined, as were internal Youthwork memos.
Each project service was rated by the researcher according to the emphasis that, in the researcher’s judgment, was placed on the service by the project. Ratings were on a scale of 1 to 10, where 1 indicated a very low emphasis and 10 a very high emphasis.
Findings show that the sites differed considerably in the emphasis placed on each of the possible services they could offer. Table 3.2 shows the range of promised outputs for each project. The range of emphasis on all questions was 6 or more. In eight cases, the range was from a promised "1" from one site to a promised "10" from another.
Two types of services were strongly promised in most proposals:
paid work experience and career education. Even for these categories of services, however, there were variations. While eight of the ten sites promised an 8 or better for paid work experience, one site promised only a 2. For career guidance, again eight sites promised an 8 or better, but one site promised a 1 and another a 3.
It is clear from this table that the offerors took advantage of the freedom to design projects to meet local needs. It also seems that Youthwork and the Department of Labor did not insist on selecting a particular type of project or blend of services.
3. Determining Service Outputs
In order to determine what services were actually delivered by a given project, and at what level, the same list of sixteen services, with a similar ten-point scale, was used. Data were obtained from several sets of documents and from interviews with Youthwork officials, a procedure similar to the "triangulation" used by field workers to ensure validity. The general sequence is discussed below.
First, the Cornell protocols (field notes) for each project were read separately by the researcher and an outside reader. Each person independently rated the project for each type of service. Each item and rating was then discussed, and a preliminary rating was given to each of the sixteen activities.
It is worth noting that no differences greater than two points between the two readers' scores occurred on any item, although sixteen cases of two-point differences were observed. Eleven of these involved either "research" or "youth participation in planning," with the difference between a score of 1 and a score of 3. Four other cases involved differences between a 2 and a 4. The readers differed in how to interpret the absence of information: if something was rarely mentioned, did that mean the service was not delivered, or that it was given a very low emphasis? In the absence of other information at this stage, the lower score was recorded. In the final case of a two-point difference, the lower score was also used.
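The reconciliation rule just described can be sketched in a few lines of code. This is an illustration only: the function name and the sample ratings are invented, and the study's actual reconciliation was done by discussion rather than mechanically.

```python
# A minimal sketch (not the original instrument) of the rule described
# above: two readers rate each service independently on the 1-10 scale,
# and wherever their scores differ the lower score is recorded.

def reconcile(reader_a: dict, reader_b: dict) -> dict:
    """Keep the lower of the two readers' ratings for each service."""
    return {service: min(score, reader_b[service])
            for service, score in reader_a.items()}

# Invented ratings for three of the sixteen services.
a = {"research": 3, "paid work experience": 9, "career guidance": 8}
b = {"research": 1, "paid work experience": 9, "career guidance": 8}
print(reconcile(a, b))  # the disputed "research" item resolves to 1
```

Taking the lower score encodes the conservative assumption the readers settled on: absent evidence, a service is presumed under-emphasized rather than delivered.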
The researcher then reviewed the available evaluation and other reports in the Youthwork files for each site. In most cases, evaluation reports focused primarily on student outcomes (improvement in reading, job knowledge, and attitudes) and were therefore not of direct interest. The evaluators at eight sites, however, made references to what services or outputs were delivered. One evaluator actually scored the project as having delivered eighty percent of what it promised. Of more general use were the site visit reports by Youthwork monitors, reports of the project directors, and various letters from either the project or Youthwork officials.
Where appropriate, the "report card" for the project was adjusted to reflect these data. This happened at three sites. In two cases, scores were lowered by one point; in the other, raised. In addition, it became apparent that the lower scores assigned in the initially disputed ratings were accurate. In one case where "research" had been lowered to a 1, the Youthwork site visit report indicated that the project had failed to deliver any of the required reports, had not hired an evaluator, and had not tested the youth as promised.
Finally, appropriate Youthwork staff members were consulted. For eight of the sites, it was possible to interview someone who had actually made a site visit. For the remaining two sites, the staff person who had made the site visit had left Youthwork and could not be reached. Because the site visits had been made at least two years prior to the interview, and because the Youthwork monitor did not believe he remembered the sites well enough, he did not score each site. Instead, discussions centered on the overall nature and quality of the services delivered. The Youthwork monitor reviewed the sites' "report cards." In no case did he believe the scores were inaccurate, although he did think that the score for research at one site was unduly low. His explanation was not that the site delivered more research, but that, considering the problems faced by the staff, as much was done as possible.
Table 3.3 presents the final score given for each service at each site. Six of the ten sites were judged to have delivered at a level of 8 or above on paid work experience for youth, while another six projects delivered career guidance at the same level. Career guidance, however, showed the greatest range across sites, with one site providing services at a 1 level and another at a 10. No site delivered more than a 5 in supplemental services or in youth participation in planning. The range across sites for basic skills was very small: no site delivered less than a 3 and none more than a 6.
In general, the sites placed greater emphasis on job-related activities than on academic ones. The average score across sites for worksite quality was 4.8, while that for academic standards was 3.1. The average score for paid work experience was 8.2, while the average score for services aimed at basic skills instruction was 4.7.
The lowest average cross-site score was for supplemental services, 2.2, with youth participation in planning close behind with a 2.3. The highest average score was for paid work experience, 8.2, which is far ahead of the next highest score of 6.5 for career guidance.
4. Measuring Slippage
Having determined what services were promised and delivered, the next task was determining the degree of slippage at each site. This was done by comparing, for each item, the scores for "contracted" and "delivered" services. A "slippage" value for each item was calculated as the difference between what was promised and what was delivered. If a site promised a 9 and delivered a 4, a slippage value of 5 was assigned.
If, however, a site promised a 4 and delivered a 9, a score of 0 was given. There are two reasons for this. Defining slippage as an absolute value, the same whether a site delivered more than promised or less, would be contractually irrelevant: from a federal viewpoint, the fact that sites give more on some items is welcome, but does not justify failing to deliver promised services. Besides, as will be discussed later, a higher score is undoubtedly more a function of preexisting factors than a result of the federal policy.
By totaling the slippage rating for each question, an overall slippage index was derived for each site. These slippage indices ranged from a low of 7 to a high of 60, with an average of 28.5. Table 3.4 shows the overall results for the ten sites.
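The computation described above can be summarized in a short sketch. The function names are invented for illustration; the scoring itself follows the rule stated in the text.

```python
# A sketch of the slippage computation: per-item slippage is the
# shortfall between promised and delivered emphasis (over-delivery
# counts as zero), and the per-item values sum to a site's index.

def item_slippage(promised: int, delivered: int) -> int:
    """Shortfall only: a site that over-delivers gets 0, not credit."""
    return max(promised - delivered, 0)

def slippage_index(promised, delivered):
    """Total slippage across all sixteen services for one site."""
    return sum(item_slippage(p, d) for p, d in zip(promised, delivered))

print(item_slippage(9, 4))  # promised 9, delivered 4 -> 5
print(item_slippage(4, 9))  # over-delivery -> 0
```

The one-sided `max(..., 0)` is what makes the index contractual rather than symmetric: extra effort on one service cannot offset a shortfall on another.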
The first three projects were judged "low slippage" sites. The next four were termed "medium slippage," and the bottom three were grouped as "high slippage" sites.
Although the indices are based on a comprehensive review of available information, and steps were taken to ensure the validity of the scores, more should not be made of the actual scores than necessary. The purpose of deriving slippage scores was to identify high, medium, and low slippage sites. The differences in the scores seem to justify the rankings.
Moreover, the slippage index correlates with other ranking criteria for the sites. A total "delivered" score, simply the sum of the scores for all items, was calculated for each site. The three lowest slippage sites also had the highest delivered scores (98, 87, 85), while the three highest slippage sites had the lowest delivered scores (57, 47, 46).
A "gross delivery/slippage balance index" was also developed. This took into account the level of outputs a project delivered for each given service above that specified in the contract. If, for example, a project offered to provide personal counseling at a level of 5, but the staff actually produced a level 9, the project received an additional four points. If it produced only a level of 3 for a different service, but had contracted for a level 7, it was given a minus four points. These scores were combined, producing a gross delivery/slippage "balance."
It is interesting to note that when sites were ranked according to this gross delivery/slippage balance, the rankings remained essentially unchanged. The site with the lowest slippage rating had a balance score of +15, while the next lowest had a rating of 0, and the third lowest slippage site had a rating of -1. For the high slippage sites, the scores were -37, -52, and -61, respectively.
As a final check, the project officer at Youthwork was asked to review the slippage categories and site rankings. He agreed with the categories and that each project had been assigned to the appropriate group, although he had some disagreements about the relative rankings within categories. It may be noted that he ranked sites primarily on the "balance" concept.
F. DATA ON SUBGOALS
1. General Procedures
The development process for the questionnaire used to "interview" project participants was evolutionary. Based on the Simon hypothesis and on an initial reading of three sets of case study materials, thirty-nine questions in five sets were developed. Each question was scored on a scale of 1 to 5. Three sets involved subgoals: professional, organizational, and personal. Another set was designed to obtain relevant background information about the person. The final set was intended to gather the individual's overall reactions.
2. Initial Questionnaire
The preliminary form was then tested on six professionals to determine its clarity and usefulness. The readers were given one set of field notes to read and were asked what problems they would have if they had to answer the questionnaire for the individuals at the site.
Feedback suggested a number of changes. Most felt that the five-point scale required too fine a distinction and that they would probably end up scoring most items as a 3. As a result, a four-point, forced-choice scale was later used. The readers also suggested that several questions needed rewording for the sake of clarity. Readers also asked for more guidance in selecting evidence to answer questions and in assigning values to questions. As a result, a code book was developed. (See Appendix A.)
3. The Second Time Around
The next step involved what could be considered a first site visit. The protocols for each site were read by the investigator and questionnaires completed for all individuals for whom information was available. Two outside readers, one familiar with the research design and one not, read one set of protocols and coded questionnaires.
Several changes were made as a result of the testing. First, three questions were dropped. The first concerned the educational level of the personnel. As it turned out, nearly everyone had a college degree, and in only a few cases was there evidence that the individual had a doctorate. The second question dropped pertained to the person's educational field. This was eliminated because too little information was available. The third concerned previous positions held. The original questionnaire sought information on two previous positions. As it turned out, most persons were holding either their first or second jobs. In many cases, data were not available for those who might have held more positions.
Perhaps more important, questions were added. It became apparent in reading the field notes that ensuring structure and order was an important subgoal for many of the staff, so a question on that point was added. A set of questions was also added that were both theoretically sound and evident in the protocols. These involved the individuals' degree of commitment to the organization, their perceived sense of project ownership, and perceived conflict between professional subgoals and project subgoals, and between organizational subgoals and project subgoals.
4. Code Book
Before a final coding of the questionnaire was done, a detailed "code book" was written. The code book contains detailed explanations of what each question meant, with illustrations from the protocols. In addition, explanations were given for each question as to what a score of 1 or 4 meant. See Appendix A for a copy of the code book.
5. The Final Step: Data Collection
The researcher reread each protocol and recoded each questionnaire. This was done not only to answer the new questions but also to recheck the earlier codings, and it took place over a three-month period.
An outside reader also read and coded questionnaires for all sites. Responses on each of the questions were compared for both persons. An eighty-seven percent agreement rate was noted (i.e., 87% of the time the answers were the same), leaving an average of four disputed questions per interviewee. In only thirty cases (less than one percent) was the difference more than one point, and in no case was it more than two. Very few of the disputed cases involved a score of 2 or 3. In most questions, a score of 1 or 2 indicated a negative response, while a score of 3 or 4 indicated a positive response. In most disputed questions, the difficulty was deciding whether the person was "very negative" or just "negative," or "very positive" or just "positive." All disputed cases were discussed until a final rating was agreed upon.
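The agreement-rate check between the two coders is a simple item-by-item comparison, which can be sketched as follows. The score lists are hypothetical; each holds one coder's 1-to-4 ratings for the same sequence of questionnaire items.

```python
# Sketch of the inter-coder agreement check. Scores are hypothetical;
# each list holds one coder's 1-4 ratings for the same items.

def agreement_rate(coder_a, coder_b):
    """Fraction of items on which the two coders gave identical scores."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

coder_a = [1, 2, 4, 3, 4, 2, 1, 3, 4, 2]
coder_b = [1, 2, 4, 4, 4, 2, 1, 3, 4, 1]

print(agreement_rate(coder_a, coder_b))  # 0.8, i.e., 80% agreement
```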
The softness of the data, which is inherent in qualitative research, and the relatively small number of cases may not support high-level statistical analysis, nor is such analysis called for in the study. The only statistical devices used are tabulations, cross-tabulations, percentages, and weighted averages. The data, which are discussed in the following chapters, are interpreted mainly in terms of perceived trends or apparent tendencies. The data are adequate for such interpretations.
G. RELIABILITY AND VALIDITY OF DATA
Inherent in dealing with qualitative data are a variety of problems concerning bias in the data. Two cases may be of particular concern in this study. The first relates to the possibility that one set of data influences the collection of another; the second, to the possibility that the persons collecting data at the sites and the persons conducting the secondary analyses may be biased in their procedures. Two general strategies were used to minimize the possibility of bias. The first involved using, wherever possible, separate information bases for data; the second, using procedures to minimize researcher bias.
2. Information Bases and Data Sets
Eight major and three secondary sources were used to obtain data for the study. A brief description of each source is presented in Table 3.6. Each of these sources is discussed in more detail in following sections of this and other chapters. The purpose here is not to demonstrate the validity of the bases, but to show the range of information that was used in the study.
To the extent possible, different data bases were used to create the five data sets for the study. These sets were
Table 3.7 shows the primary sources of data for each of the data sets. As the chart indicates, there was no overlap in sources.
Bias is probably inescapable in working with qualitative data. By its nature, qualitative information is the result of someone's judgments. Project directors have an interest in seeing (and particularly reporting) that things are going well. Staff may have an interest in reporting that all is well, or, for that matter, that nothing is well. Youthwork staff and its director had an interest in making the Department of Labor happy by supplying a voluminous amount of material containing head counts of all sorts. The ethnographers and evaluators all had personal biases. Just as clearly, the outside reader and the researcher were susceptible to bias. A number of procedures were employed to mitigate the influence of bias.
It should be noted at the outset, however, that the steps taken were generally aimed at neutralizing the effects of bias, not at eliminating it. Under any circumstance, it would probably not be possible to do so, but, more importantly, it would probably not be very desirable for that would mean eliminating most of the human elements that are, after all, the chief value of qualitative research.
a. Ethnographer bias
Interestingly, in the ethnographers' field notes, which might be expected to be among the most dispassionate and least biased of the main documents used, bias was probably most easily visible. The ethnographers' preferences, likes, and dislikes were clearly apparent. One was particularly interested in sex equity; others had a special concern for competency-based education; and some were more empathetic to staff, others to administrators or employers.
One of the things that made their biases so evident is that they were given a set of general questions and participants to interview. They were to record answers to the questions, as well as observations that were relevant. Personal comments, criticisms, and insights were welcomed, but they were to be clearly labeled as such. Overall, this convention was carefully observed. As a result, a reader could catch the main lines of bias and adjust for them.
A second type of ethnographer bias was evident in the selection of people to interview. Some ethnographers spent more time with principals; others, with staff.
To adjust for this, a procedure was adopted whereby ten people were "reselected" at each site. What this involved was rereading the site protocols and focusing attention specifically on the ten persons selected. They were the project director, one oversight person, one employer, one cooperating principal, one job site developer and five staff members. The oversight person, employer and principal were chosen, to the extent possible, on the basis of their representativeness of their group at the site. In some instances, there were few choices. For example, there was usually only one director. Often there were few CETA oversight people from whom to choose. Staff members were selected on the basis of their "availability" (meaning there was sufficient information about them to make informed judgments) and on their representativeness of staff jobs at the site. Some projects emphasized academic and classroom activity more than did others, and had a higher proportion of teachers, while other projects emphasized counseling and had a higher proportion of counselors.
Many biases were reflected in disproportionately long discussions or reports on certain topics. An ethnographer interested in staff relations might spend more time on it than on sex equity, for example. To mitigate the effects of this kind of bias, protocols were coded line by line in terms of subgoal references. If, for example, sex equity was the subject of discussion, the number assigned to sex-role de-stereotyping was placed in the margin alongside of the line containing the reference. In this way, it was possible to count up the number of references to each service.
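The line-by-line coding just described amounts to tallying the service code noted in the margin beside each protocol line. A minimal sketch, with hypothetical code numbers and protocol lines, follows.

```python
# Sketch of tallying line-by-line subgoal codes from a protocol's
# margins. Code numbers and protocol lines are hypothetical.
from collections import Counter

# Each protocol line is paired with the service code written in its
# margin (None where no service was referenced on that line).
coded_lines = [
    ("...", 3),     # e.g., a sex-role de-stereotyping reference
    ("...", 3),
    ("...", None),  # no service referenced
    ("...", 7),     # e.g., a work-site-quality reference
    ("...", 3),
]

reference_counts = Counter(code for _, code in coded_lines if code is not None)
print(reference_counts[3])  # 3 references to service code 3
print(reference_counts[7])  # 1 reference to service code 7
```

A count built this way lets all references to one service be reviewed as a group, as the next paragraph describes.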
This procedure served two important functions. One was that, at the end of a protocol, it was possible to review relevant information according to service category, and thereupon make a decision about the strength of a given participant's behaviors concerning each service. For example, all references to work site quality could be reviewed as a group for an individual participant, followed by all references to paid work experience, and so on for all services. This procedure also helped the readers avoid being misled by disproportionately long or frequent discussions on given subjects into believing they were more important to participants than they really were. In this way it was possible to avoid concluding that a particular participant felt strongly about a given service merely because there were multiple reports connecting the individual to the service. The fact that the ethnographer was interested and pushed a subject to the point that a staff member referred to it six times did not necessarily make it important to the participant.
Judgment rules were developed (see the Code Book for details) that placed first priority on what a person did, and then on what the person said he or she did. Third came what the ethnographer explained. Lowest weightings went to the ethnographer's recording of what one person said that another did, said, or believed.
b. Project director and outside evaluator bias
Though they clearly existed, the biases of project directors and outside evaluators were less problematic. Unlike the statements and observations of the ethnographers, those of the evaluators and directors were subject to various forms of corroboration. Their reports were used primarily to determine outputs, which could be verified against other Youthwork documents. Moreover, the reports of the evaluators were generally less subject to personal bias, and, in most cases, as trained researchers, they used procedures to minimize the likelihood of bias.
c. Reader bias
Probably the most difficult form of bias to guard against was bias in the readers' judgments. Three procedures were used in this study. The first was simply having two readers code all questionnaires. The second was to have the readers start reading at different places, for balance. The third was for each reader to do a second coding after a lapse of time.
It was recognized at the start that knowing what a project had promised to deliver could influence what judgments were made about professional subgoals. Conversely, knowing professional subgoals could influence judgments about the project.
It was decided, therefore, that the researcher would code information about the project’s subgoals first, and then, several weeks later, code participant subgoals. The second reader coded only participant subgoals. Project outputs were coded at a later date. Finally, both researcher and reader recoded each subgoal score to check against possible bias. Scores were then compared.
An inter-reader reliability coefficient of 0.83 was attained on the first set of site protocols. It reached 0.93 on the last set.
These coefficients suggest that reader bias does not appear to have been a major problem. The rise over time probably reflects an increasingly common understanding of precisely what was meant by each question.
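The chapter does not name the reliability coefficient used. A Pearson correlation between the two readers' score vectors is one common choice, and can be sketched from first principles as follows; the reader scores are hypothetical.

```python
# Sketch of an inter-reader reliability coefficient, computed as a
# Pearson correlation between two readers' score vectors. The chapter
# does not name its coefficient; this is one common choice.
# Scores are hypothetical 1-4 ratings on the same items.
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

reader_1 = [1, 2, 2, 3, 4, 4, 1, 3]
reader_2 = [1, 2, 3, 3, 4, 4, 2, 3]

print(round(pearson(reader_1, reader_2), 2))  # 0.92
```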
4. Testing for Bias
A wide variety of tests were conducted to try to determine the extent to which errors, whether caused by simple misjudgment, bias or tainted data, would change results.
In a preliminary test, one case study was read and questionnaires completed by three readers. Each of the 40 initial items was checked on a person by person basis. A computer check was then run to determine whether or not the responses of any class of participants (administrator, employer, teacher, principal, CETA or other oversight personnel) corresponded consistently with the responses of that same class on any other question. Part of the hypothesis suggested that there would be high similarity in answers, since subgoals are derived, in part at least, from current position. However, any two items that correlated at higher than 80 percent were considered to be measuring the same thing. No pair of items exceeded this threshold.
This does not mean, however, that duplication does not exist. The total n is small, and very small by class. Coding errors are not impossible. It does seem to suggest, nonetheless, that the data are probably sound enough for the types of analyses performed.
5. Testing for Errors in Slippage
The data show a wide range of aggregate values between sites as to what was promised and what was delivered.
Since slippage was defined as the discrepancy between what a site promised and what it delivered, it seemed possible that a site could be both high delivery and high slippage. It delivered a lot, but because it promised so much, slippage occurred.
To investigate this possibility, two checks were conducted. The first involved confirming slippage ratings with Youthwork monitors, who agreed that sites were accurately coded within the categories of high, medium and low.
A second test involved the assumption that slippage could be the result of errors in coding project subgoals. To check for this, an average "project score" of 85 was arbitrarily set. The difference was then computed between (1) the sum of what a site delivered of each service and the reproportioned value (the proportional relationship as a constant between the value of 85 and the value initially assigned as the total of all output values), and (2) the sum of the delivered outputs for each service (using the same constant), but where delivery of a service above the promised level simply counts as zero slippage. Sites were then ranked according to the level delivered, according to net delivery and according to slippage.
The results did not change the composition of the high, medium, and low slippage groupings, although they did change the relative rankings of several projects within the groupings. Site G, for example, which was ranked fourth, or the highest of the medium slippage sites, dropped to seventh. In brief, it did not appear that coding errors, to the extent they existed, had much influence on the analyses.
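The rescaling check described above can be sketched as follows. A site's promised service levels are reproportioned so they sum to a common project score of 85, delivered levels are rescaled by the same factor, and slippage is recomputed with over-delivery counting as zero. The service levels are hypothetical.

```python
# Sketch of the rescaling check: promised levels are reproportioned to
# a common project score of 85, delivered levels are rescaled by the
# same factor, and slippage is recomputed. Levels are hypothetical.

def rescale(levels, target=85):
    """Scale a dict of service levels so their sum equals `target`."""
    factor = target / sum(levels.values())
    return {s: v * factor for s, v in levels.items()}

promised = {"counseling": 5, "placement": 7, "training": 5}   # sums to 17
delivered = {"counseling": 4, "placement": 6, "training": 7}

factor = 85 / sum(promised.values())                # here, 5.0
scaled_promised = rescale(promised)                 # 25, 35, 25
scaled_delivered = {s: v * factor for s, v in delivered.items()}  # 20, 30, 35

# Delivery above the promised level counts as zero slippage.
slippage = sum(max(scaled_promised[s] - scaled_delivered[s], 0)
               for s in scaled_promised)
print(round(slippage, 1))  # 10.0
```

Ranking sites on such rescaled values removes differences that arise only because sites promised different totals, which is the point of the check.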
6. Testing for Professional Subgoal Coding Errors
Briefly, a test was conducted for Chapter V to determine what effect the lowering or raising of average staff professional subgoals on each service by a factor of up to plus or minus three would have on findings. Top project subgoals were defined as those services valued by the project at a level of 6 or above.
When services rated by individual projects at a level of 7 or above were considered, and staff ratings for the same services were raised or lowered by up to three points, no changes were noted. Only when those services rated by the project at a level of 6 or above were considered, and staff subgoals for those services were lowered by three points, could any change be found, and even then it amounted to less than 3 percent.
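The perturbation logic of this sensitivity test can be sketched as follows: average staff subgoal scores for each service are shifted by up to plus or minus three points, and the set of services meeting the threshold is recompared to the unshifted baseline. The averages and threshold values are hypothetical.

```python
# Sketch of the sensitivity test: shift average staff subgoal scores
# by up to +/- 3 points and check whether the set of services meeting
# the threshold changes. Averages are hypothetical.

def top_services(staff_avgs, threshold=6):
    """Services whose average staff subgoal score meets the threshold."""
    return {s for s, v in staff_avgs.items() if v >= threshold}

staff_avgs = {"counseling": 8.0, "placement": 9.5, "training": 2.0}

baseline = top_services(staff_avgs)
for delta in (-3, -2, -1, 1, 2, 3):
    shifted = {s: v + delta for s, v in staff_avgs.items()}
    if top_services(shifted) != baseline:
        print(f"findings change at delta={delta}")
# With these hypothetical averages, only delta=-3 changes the set,
# echoing the pattern reported in the text.
```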
Part of this apparent lack of effect might involve very small numbers: the total of all cases is 160 (16 services at 10 sites), and staff subgoals were averaged. Since, however, there is no way to tie any given staff member's subgoals to the level of output of a given service at a site, about the only thing that could be done was to explore the relationships between the average of staff values for a given service and the output of that service at that site. This fact and the number of cases are simply one more limitation on the data.
Various problems associated with qualitative data were recognized. Contamination and bias were considered. Several controlling procedures were used to mitigate their effects, as were a number of tests to determine their presence. The fact that preventive measures were instituted and the fact that the tests were essentially negative are encouraging.
Overall, it seems appropriate to make two observations. One is that the balance struck here between qualitative data and quantification seems to "feel" right. The data concerning individuals and sites represents what might equitably be termed "an honest and fair" presentation. The second comment is that the ability to quantify even to the limited degree done here and to still use the richness of the qualitative data in fact offers an exciting opportunity for research.
This chapter suggests that the secondary analysis of qualitative data is an efficient research strategy that is particularly appropriate for a study of the implementation of federal policies. Further, it is suggested that combining a closed-ended questionnaire with more traditional open-ended approaches for collecting data from the field provides an opportunity for the secondary analyst to document conceptual analyses.
75-1Paul Berman and Milbrey
Case Studies in Science Education p. C-28.
79-1Content analysis has been used by political scientists as a means of classifying, analyzing and interpreting various types of communication, including speeches, letters, newspapers and interviews. The method has been used to identify the characteristics of the communication, e.g., propaganda; the attitudes and motives of the speakers and writers; and the effects of the communication. The classic description of the method as applied to political science is Ithiel de Sola Pool, Trends in Content Analysis (Champaign-Urbana: University of Illinois Press, 1959).
81-1For a discussion of the problems and potential solutions see Lois-ellin Datta, "Strange Bedfellows: The Politics of Qualitative Methods." Mimeographed.
82-1Jerome T. Murphy, Getting the Facts: A Fieldwork Guide for Evaluators and Policy Analysts (Santa Monica: Goodyear Publishing Co., 1980), p. 27.
83-1Youthwork Knowledge Development Plan (Washington, D.C.: Youthwork, Inc., 1978).
entitled "Program Organization: Inter-Institutional"; "Emergent and Sustaining Relationships in the Delivery System;" "Employment Training and Education: The Interrelationships of the Delivery Systems."
Abstract of the Process: Guide for Project Officers (Washington, D.C.: U.S. Department of Health, Education and Welfare, 1977), p. 13.
89-1See the “Codebook” (Appendix A) for a detailed description of the services.
Annual Report, 1980 (Washington, D.C.: Youthwork, Inc., 1981).