9.4 Issues in Forensic Stylistic Analysis
Various issues have been raised regarding linguistic approaches to questioned authorship cases, both in documented research and criticism and in the direct- and cross-examination testimony of various expert witnesses (linguists and nonlinguists). The principal focus of concern is the methodology used to exclude or identify potential writers as authors.
A critical focus on method is helpful because it advances the process of testing and improving the theory and of unifying its applications. To be ignored, of course, are isolated instances where some investigators take themselves far too seriously on their detour as critics. For example, upon seeing how his own research is used to support nearly all other studies of individuality in writing, Crystal (1995:382) laments for himself and a coauthor, "If we were dead, we would turn in our graves." Alas, poor Yorick! And, after inaccurately representing American questioned document examiners as claiming a zero error rate, Chaski (2001:11) bravely topples her straw man with the comment that such claims should make any scientist shudder in disbelief.
The most constructive critical commentaries related to recent developments in questioned authorship studies are those of Finnegan (1990), Crystal (1995), Goutsos (1995), and Grant and Baker (2001). Many of the issues that they raise have been consolidated in the questions and responses that follow.
9.4.1 Stylistics
Is stylistics an established field? Stylistics is a well-established field in literary and linguistic studies. Its history of study is long and broad-based, spanning many countries and nearly two centuries, and its documented bibliography contains hundreds of works in and on various languages. However, older techniques for establishing authorship remain under continual scientific scrutiny, and new methods are steadily being proposed and tested.
9.4.2 Variation
What is the norm and how can it be established? Since variation presupposes a norm, the norm must be established before variation within or from it can be described. The analyst must be able to establish the norms of language behavior associated with the writings under study (see Section 5.2). Although norms may be difficult to identify in precise terms, speakers and writers do not find it difficult to identify variation, which means that they have at least an unconscious knowledge of the norms governing their language use. Any social, geographic, or situational norm can be used as the basis for describing variation in writing.
What is conscious and what is unconscious in composition? Related questions concern the possibility of forgery or disguised writing. Not enough is known about composition to establish precisely what in writing is conscious or unconscious. The reasons for this lie in the difficulties associated with such studies: every writer's level of conscious choice of forms in writing is different, and writers demonstrate varying levels of consciousness in language production, e.g., unconscious, subconscious, semiconscious, and conscious.
A reasonable assumption is that some linguistic levels are more distant than others from the conscious choice of a writer. For example, words and phrases are viewed as more consciously chosen than syntactic structures. Chaski (2001:8) says that "syntactic processing is automatized, unconscious behavior and therefore is difficult either to disguise or imitate," but she refers to this as a fundamental idea about language individuality, i.e., still an assumption.
A solution to this problem is to exploit the relationship known to exist between natural language and patterned variation. Natural language: applying Labov's (1966:100) definition of casual speech to written language, natural language in writing is everyday writing used in informal situations, where little or no attention is directed to the writing process. Patterned variation: countless studies demonstrate linguistic variation to be structured, meaning that the pattern of linguistic and nonlinguistic constraints on the presence and probability of occurrence of each variable can be specified.
The less attention paid to language production, the more regularly structured (real) the variation will be. On the other hand, if considerable attention is given to the writing process, especially in the contexts of imitation or disguise, variation will be unstructured, unpredictable, and different from that present in like writings of the same author. (See Sections 5.2 and 8.2.6 for examples.)
As a practical matter, then, the analyst can do at least two things to take full advantage of a writer's entire range of variation (important in authorship cases) while mitigating the possibility of imitation or disguise. First, every effort should be made to obtain comparison writings of similar context and purpose. Second, if possible disguise is an issue, the analyst can include all variation as part of a given writer's range of variation, then determine the validity of every variable based on its internal structure.
With respect to attempts to replicate another writer's style, it was demonstrated earlier (Oregon v. Crescenzi in Chapter 3, and Estate of Violet Houssien in Chapters 7 and 8) that attempted disguise in written language is not always difficult to identify. Imitators cannot recognize the type or frequency of the variables in another writer's range of variation. It is well known, for example, that the American radio comedians of decades past who played Amos 'n' Andy were two white men imitating African American English. Although both had considerable lifetime contact with speakers of African American English, they succeeded in using only a few stereotypical dialect features, and even those were not used at frequencies matching those of the speech community (Snyder, 1989).
What is a sufficient sample size? Many stylistic studies have used a variety of sample sizes (see McMenamin, 1993), yet no absolute minimum sample size has been established as adequate for authorship studies. The reason for this lies in the diversity of language itself. A sample of language is adequate when stylistic variables occur with sufficient frequency to establish patterns of variation. Although the test for what constitutes a pattern may be qualitative or quantitative, one of the long-established requirements for a good linguistic variable is frequency of occurrence. At the other end of the scale, the larger the sample, the greater the chance that linguistic patterns will reveal their structure.
9.4.3 Method of Data Analysis
How do different models of stylistic analysis relate to practical cases? The consistency model (as reviewed in Chapter 6) is used in many cases wherein the single or multiple authorship of various writings is in question. For example, an insurance company wanted to know if ten different witness statements were actually written by the people who signed them, or if they were all authored by the claimant and then presented for signature to the ten different writers. Sometimes the consistency of a set of writings is questioned in anticipation of using them as a single-author set of questioned writings to compare to those of a known suspect writer.
The resemblance model is the one most frequently applied. When content or external circumstances permit identification of just one suspect writer, the authorship question is limited to the resemblance between the questioned writings and known writings of that candidate.
The population model is an extension of the resemblance task. When content or external circumstances permit identification of a limited population of writers, the authorship question is narrowed to the resemblance between the questioned writings and known writings of a limited number (a closed set) of writers.
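To make the difference between the resemblance and population models concrete, the Python sketch below assumes that each writing can be reduced to rates of a few word-level markers and compares the questioned writing with each candidate's known writings by cosine similarity. Cosine similarity is a generic measure chosen here only for illustration, not a method this chapter prescribes, and the function names, markers, and sample texts are all hypothetical. A ranking of this kind is at most an input to the descriptive and quantitative analysis; it is not an opinion on authorship.

```python
import math
import re
from collections import Counter


def marker_profile(text: str, markers: list[str]) -> list[float]:
    """Rate of each (lowercase) word marker per 1,000 word tokens of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # guard against empty input
    return [1000 * counts[m] / total for m in markers]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two profiles; 0.0 if either has no marker occurrences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(questioned: str, known: dict[str, str],
                    markers: list[str]) -> list[tuple[str, float]]:
    """Rank candidates by similarity of their known writings to the questioned
    writing. One candidate corresponds to the resemblance task; a closed set
    of several candidates corresponds to the population task."""
    q = marker_profile(questioned, markers)
    scores = {name: cosine(q, marker_profile(text, markers))
              for name, text in known.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical markers and writings, for illustration only.
markers = ["alot", "irregardless", "whilst"]
known = {
    "Writer A": "I have alot to say, irregardless of what they think.",
    "Writer B": "Whilst I agree, I have a lot of doubts.",
}
print(rank_candidates("There is alot here, irregardless.", known, markers))
```

With a single entry in `known`, the same function simply scores one candidate, which is the resemblance task.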
What is the method of analysis?
1. Get organized: arrange and organize questioned and known writings into manageable sets.
2. State the problem: articulate the authorship problem as you see it. Articulate the research questions for descriptive analysis and quantitative analysis. Select the appropriate authorship models.
3. Procedural steps: assemble all questioned and known writings with the same or similar context of writing. Assess the range of stylistic variation in each set. Identify style-markers: deviations from or variations within any appropriate norm. Note single occurrences of variation as well as habitual variation. If the writings are extensive, make a KWIC (keyword-in-context) concordance to help identify variables. If the variation so indicates, make a KPIC concordance of the punctuation in the writings.
4. Specify descriptive results: specify individual style-markers at all linguistic levels. Specify the range of variation: the aggregate set of all deviations and variations. Distinguish style-markers that are class characteristics from those that are individual characteristics.
5. Specify quantitative results: give results of statistical tests used to evaluate the significance of variables. Estimate the joint probability of occurrence of variables in compared writings. Apply other appropriate quantitative approaches to style-marker identification.
6. Specify exclusion conclusion: identify dissimilarities between the style-markers of questioned and known writings. Determine the linguistic or statistical significance of dissimilarities. Determine to what degree the candidate writer can be excluded.
7. Specify identification conclusion: identify similarities between the style-markers of questioned and known writings. Determine the linguistic or statistical significance of similarities. Determine to what degree the candidate writer can be identified.
8. Precedent cases: check linguistic and legal precedents related to the style-markers used.
9. State an opinion: state an opinion for the court (exclusion level, identification level, or inconclusive).
10. Write a report if so requested: write a report or declaration that follows the structure used in the sciences.
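Step 3 mentions KWIC and KPIC concordances. The following Python sketch shows one minimal way to produce keyword-in-context lines and a simple punctuation-in-context listing from plain text. The function names, context widths, and default set of punctuation marks are assumptions, and the punctuation function is only a rough approximation of a KPIC concordance, not the author's tool.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Keyword-in-context lines for each case-insensitive whole-word match."""
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


def punctuation_in_context(text: str, marks: str = ",;:!?",
                           width: int = 20) -> list[str]:
    """One context line per occurrence of any character in `marks`
    (a hypothetical, simplified punctuation concordance)."""
    flat = " ".join(text.split())
    lines = []
    for i, ch in enumerate(flat):
        if ch in marks:
            left = flat[max(0, i - width):i]
            right = flat[i + 1:i + 1 + width]
            lines.append(f"{left:>{width}} [{ch}] {right}")
    return lines


sample = ("Please keep this in confidentuality. "
          "The confidentuality of the file matters; thank you.")
for line in kwic(sample, "confidentuality", width=15):
    print(line)
for line in punctuation_in_context(sample):
    print(line)
```

Right-aligning the left context keeps every keyword in the same column, which is what makes a concordance easy to scan for recurring variants.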
How are style-markers identified? This question is the single most important issue raised in current research on questioned authorship, and it is posed in various other ways, e.g., How are criteria for identification motivated? How are stylistic variables selected and justified?
The single most important starting point for selecting style-markers is to work within a theoretical model of linguistics that views stylistic variation as inherent to the system of language itself, i.e., not a characteristic of language performance. This ensures the discovery of the patterned variation needed for authorship identification, as opposed to the accidental and less-than-systematic characteristics of performance.
The second analytical requirement is to recognize that unique markers are extremely rare, so authorship identification requires the identification of an aggregate of markers, each of which may be found in other writers, but all of which are unlikely to be present together in any other writer. This means that it is highly unlikely that any single marker of writing style could be used to identify all writers of a language or dialect, or even one writer's idiolect.
Therefore, the approach is to identify the whole range of variation in a given set of writings, and analyze it in any acceptable descriptive or quantitative way. The theoretical limitation to this, of course, is the recognition and definition of the norm within or from which the language varies. This, however, does not present significant practical obstacles, except for an analyst who does not know the governing norms or who cannot identify the variation.
One recent attempt to tackle the issue of style-marker identification (Chaski, 2001) is only marginally successful because it assumes that style-markers are like the same relatively small set of chromosomes used for DNA analysis, thereby ignoring the whole range of linguistic variation presented in writing (McMenamin, 2001). However, Grant and Baker (2001:76) and others have proposed a very promising approach to style-marker identification, one consistent with the principles of stylistic variation outlined above: principal component analysis. This approach makes eminent sense. First, it recognizes that authorship identification is achieved through an array of markers: "Components would consist of those markers which collectively account for the most variance in the texts" (Grant and Baker, 2001). Second, it recognizes the near impossibility of identifying style-markers that are generally valid and reliable for all writers (Grant and Baker, 2001). This indicates that they understand the inherent variability of language, i.e., that stylistic variation, whether dialectal or idiolectal, must be analyzed as part of the underlying competence of speakers and writers, not as a never-to-be-found universal of linguistic performance.
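As an illustration of the kind of analysis Grant and Baker propose, the following Python sketch runs a generic principal component analysis over a matrix of marker frequencies, with one row per text and one column per candidate marker. It assumes NumPy is available; the standardization step, the function name, and the sample figures are choices made for the example, not details of Grant and Baker's (2001) procedure.

```python
import numpy as np


def principal_components(freqs, n_components=2):
    """PCA of a texts-by-markers frequency matrix via singular value decomposition.

    Each marker column is standardized (z-scored) so that very frequent
    markers do not dominate; constant columns are left at zero.
    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(freqs, dtype=float)
    centered = X - X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant marker: avoid division by zero
    Z = centered / std
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    variance = S ** 2
    ratio = variance / variance.sum()  # assumes at least one non-constant column
    scores = U[:, :n_components] * S[:n_components]  # texts in component space
    loadings = Vt[:n_components]  # marker weights for each component
    return scores, loadings, ratio[:n_components]


# Hypothetical rates per 1,000 words for three markers in four known texts.
freqs = np.array([
    [5.0, 1.0, 2.0],  # writer A
    [6.0, 0.0, 2.0],  # writer A
    [1.0, 5.0, 2.0],  # writer B
    [0.0, 6.0, 2.0],  # writer B
])
scores, loadings, ratio = principal_components(freqs)
print(np.round(scores[:, 0], 3), np.round(ratio, 3))
```

In this invented matrix the first component separates the two writers' texts because markers 1 and 2 vary together in opposite directions, while the constant third marker contributes nothing. With real writings, the loadings show which markers collectively account for the most variance, which is what Grant and Baker mean by components.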
9.4.4 Qualitative Analysis
What is the role of the analyst's intuition? Intuition is the analyst's use of his or her own judgment to discover linguistic variation and suggest initial hypotheses to investigate. As a speaker or writer of the language and as a linguist, the analyst uses introspection to start the process of analysis. Commenting on the use of introspection and informal observation, Lakoff (1975:5) notes that "any procedure is at some point introspective." A good discussion of the methodological role of intuition in linguistic research can be found in Johnstone (2000).
Are qualitative statements impressionistic? This is also asked in other ways: Is not qualitative analysis subjective and quantitative analysis objective? Is this description without analysis? While this question is elaborated on in Chapter 7, it bears repeating that stylistic analyses are both qualitative and quantitative, but the description of written language is the first and most important means for discovering style variation. The focus of the qualitative study of writing is a systematic linguistic description of what forms are used by a writer, and how and why they may be used.
What is the process of argumentation? The scientific basis of the argument is that of any empirical study: observation, description, measurement, and conclusion. In the specific case of authorship studies, the argument is as follows:
Notice these style-markers in the corpus of writing.
The array of patterned markers is described as a, b, c, and so on.
Each of these markers has x probability of occurring in the writing of the speech community.
Taken as an aggregate set, they have y probability of occurring together in one writer.
The author-specific markers and their joint probability of occurrence are either the same as or different from those of a comparison corpus of writing.
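The last three steps of this argument can be worked through numerically. Under the strong assumption that the markers occur independently of one another, the joint probability y is simply the product of the individual probabilities x for each marker. The Python sketch below computes that product; the marker names and base rates are invented, and in a real case the rates would have to come from baseline norms for the relevant population of writers.

```python
import math


def joint_probability(marker_probs: dict[str, float]) -> float:
    """Probability y that every marker occurs in one writer's text, computed
    as the product of the individual probabilities x. Valid ONLY if the
    markers occur independently of one another."""
    for name, p in marker_probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name}: probability {p} is outside [0, 1]")
    return math.prod(marker_probs.values())


# Hypothetical base rates in a reference population of writers.
probs = {
    "alot": 0.2,
    "confidentuality": 0.01,
    "double space after period": 0.3,
}
print(joint_probability(probs))
```

Because real style-markers are often correlated, the independence assumption can make a combination look rarer than it actually is, so a product like this should be treated as a rough estimate rather than a measured value.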
9.4.5 Quantitative Methods of Data Analysis
Is statistical analysis necessary? As stated in Chapter 8, the measurement of variation in written language complements its description and is therefore important to the successful analysis and interpretation of style. The focus is on how much and how often forms are used by a writer, which is necessary to satisfy current requirements for the study of linguistic variation, as well as to satisfy external requirements for expert evidence as imposed by the judiciary.
Are there baseline norms for frequency statements? Baseline norms can be established on a case-by-case basis. Written language corpora are becoming more available, although it may not always be possible to match exactly the context of writing presented by the documents in a particular case. Often, civil and criminal clients can produce an appropriate corpus from their own workplace or from other writings produced in a context similar to that of the writings being analyzed. Specialized corpora compiled in other fields can sometimes be found, e.g., a corpus of suicide notes collected by a coroner.
How is frequency defined and what is its significance? In descriptive studies, one instance of a style variant does not constitute a pattern unless it can be demonstrated to be unique through frequency estimates based on large corpora. However, a pattern (whose relative strength increases with frequency of occurrence) can be established with two or more instances. For example, the appearance of confidentuality in a questioned document and its repetition twice in known writings is a definite pattern of variation. In quantitative studies, the determination of frequency is somewhat easier insofar as statistical tests specify the minimum number of instances necessary for their reliable use.
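The two-instance rule for a descriptive pattern can be applied mechanically once a variant has been spotted. In the Python sketch below, the function names, the case-insensitive whole-word matching, and the requirement that the variant appear in both the questioned and the known writings are assumptions made for illustration; the `minimum` parameter can be raised when a stronger pattern is wanted.

```python
import re


def count_variant(texts: list[str], variant: str) -> int:
    """Case-insensitive, whole-word occurrences of `variant` across `texts`."""
    pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(t)) for t in texts)


def variant_pattern(questioned: list[str], known: list[str], variant: str,
                    minimum: int = 2) -> dict:
    """Count a variant in questioned and known writings and flag a shared
    pattern when it appears in both sets with at least `minimum` instances."""
    q = count_variant(questioned, variant)
    k = count_variant(known, variant)
    shared = q > 0 and k > 0 and q + k >= minimum
    return {"questioned": q, "known": k, "shared_pattern": shared}


questioned = ["Please respect the confidentuality of this letter."]
known = ["Confidentuality is key.", "I value confidentuality above all."]
print(variant_pattern(questioned, known, "confidentuality"))
```

This mirrors the confidentuality example: one questioned instance plus two known instances yields a shared pattern, while a variant found on only one side does not.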
9.4.6 Other Questions
Does not the use of a suspect writer's own name suggest that he is not the writer? No. It is well known in the field of questioned document examination that anonymous writers often use their own names to encourage investigators to see them as victims rather than possible perpetrators.
Does the analyst fail to look at other possible writers? The specific task determines the model of analysis to be used. If the population model is appropriate to the problem, then other possible writers are studied. If external information dictates the use of the resemblance model, then only one writer is considered. In those cases for which clients cannot articulate the problem and clearly indicate the task, this must be done with the linguist's help before analysis is started.
Does the analyst look for exculpatory characteristics? Whether certain evidence is incriminating or exculpatory is the concern of the client and the trier of fact, not the expert witness. The linguist analyzes the writings presented for both similarities and differences vis-à-vis comparison writings, then states his or her findings, conclusions, and opinions. The linguist's participation usually stops there if the evidence is not consistent with the expectations of the client. The linguist may testify at deposition or in court if his or her evidence supports the client's position.
9.5 The Linguist as Expert Witness
9.5.1 Qualifications
There are two ways to talk about a linguist's qualifications: what level of professional preparation in linguistics and forensic science the individual should have, and what the Court determines is necessary to qualify the linguist as an expert. Although most forensic linguists who now testify in court have a doctorate, a lesser degree, graduate or undergraduate, would suffice when combined with sufficient case experience in analysis and in court. The Court will usually qualify the linguist as an expert in the field if the combination of education and experience demonstrates that the linguist can, in fact, provide evidence that will help a jury make an enlightened decision.
The Federal Rules of Evidence (FRE 702) state three requirements for an expert witness:
1. The witness must qualify as an expert by knowledge, skill, experience, training, or education greater than that of the average layperson in the area of his or her testimony. Since the expert is there to help the jury resolve a relevant issue, his or her relative level of expertise may affect the weight the judge or jury gives the opinion, but not its admissibility.
2. The expert must testify to scientific, technical, or other specialized knowledge. The reliability of the testimony is based on valid reasoning and a reliable methodology, as opposed to subjective observations or speculative conclusions.
3. The expert's testimony must assist the trier of fact, i.e., be relevant to the task of the judge or the jury in understanding the evidence or determining disputed facts.
To be avoided is the imposition of artificial requirements that have nothing to do with the linguist's ability or FRE 702. In one case, an opposing linguist implied that his opinion should carry more weight, noting that, while the other linguist was a professor at a state university, he taught at a well-known private university elsewhere in California. In another case, a linguist proposed the narrow requirement that a good expert linguist be a graduate of one of a select few East Coast universities in the U.S. There are, in fact, excellent linguistics programs throughout the U.S. and the world, and thousands of good linguists (now more than 14,000 on The Linguist List), many capably handling cases in forensic linguistics.
9.5.2 Reports
The structure of the linguist's statement will follow the report style of the empirical sciences, something along these lines:
1. Summary (equivalent to the abstract of an academic paper)
2. Articulation of problem (research problem and statement of hypotheses)
3. Upfront statement of opinion (usually at the end of an academic paper)
4. Previous work
Linguistic studies related to identified variables
Legal precedents related to problem or variation (often best handled by attorney)
5. Method
Outline of research tasks that match the specified problem
Data collection and organization
Data analysis
6. Findings
7. Discussion
8. Conclusion
(9.) Identification
(8.) Highly probably did write
(7.) Probably did write
(6.) Indications did write
(5.) Inconclusive
(4.) Indications did not write
(3.) Probably did not write
(2.) Highly probably did not write
(1.) Elimination
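The nine-point conclusion scale is ordinal, so it can be represented directly as an ordered type. The Python sketch below does so and groups each point into the three kinds of opinion named in step 9 of the method (exclusion level, inconclusive, identification level). The class and function names are hypothetical, and treating points 1 through 4 as exclusion and 6 through 9 as identification is an assumption read off the scale, not a rule stated in the text.

```python
from enum import IntEnum


class Conclusion(IntEnum):
    """The nine-point conclusion scale from the report outline."""
    ELIMINATION = 1
    HIGHLY_PROBABLY_DID_NOT_WRITE = 2
    PROBABLY_DID_NOT_WRITE = 3
    INDICATIONS_DID_NOT_WRITE = 4
    INCONCLUSIVE = 5
    INDICATIONS_DID_WRITE = 6
    PROBABLY_DID_WRITE = 7
    HIGHLY_PROBABLY_DID_WRITE = 8
    IDENTIFICATION = 9


def opinion_type(level: Conclusion) -> str:
    """Assumed grouping of a scale point into the three opinion types:
    points 1-4 exclusion level, 5 inconclusive, 6-9 identification level."""
    if level < Conclusion.INCONCLUSIVE:
        return "exclusion level"
    if level > Conclusion.INCONCLUSIVE:
        return "identification level"
    return "inconclusive"


for level in Conclusion:
    print(level.value, level.name, "->", opinion_type(level))
```

Using an integer-valued enumeration keeps the ordering explicit, so comparisons such as "at least probably did write" are simple inequalities.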
In studying various reports and testimony of opposing linguists, a number of observations can be made. First, forensic linguists must learn to function in the often aggressive context of litigation without becoming angry, defensive, petulant, or aggressive, behaviors learned anywhere but not infrequently reinforced in academia.
Second, linguists should be reserved about the outcomes of cases, even if their testimony was important to the Court's decision. The contrary communicates a lack of scientific detachment, even possible advocacy, and puffery on a website or overstating the importance of testimony in an academic article is unseemly. Also, linguistic testimony is rarely the only evidence. The linguist may not (and should not) know what other external (nonlanguage) evidence is used to support his or her client's case, all of which can have a significant bearing on the outcome of the case. In addition, differences of professional opinion and trial outcomes may result from linguists on opposing sides having different data to work with, or having the occasional client who cares more about his or her advocacy role than the truth. There is also the danger, as once happened, that a linguist rushes into print with the truth, only to discover that the other side won a $6 million judgment against his client.
Some linguists are actually quite careful about this. In one case, reviewed by Kaplan (1998), there were significant differences between the opposing linguistic analyses and testimony. Kaplan's presentation of the evidence clearly documents his analysis (with some reference to the differences), states the positive outcome for his client, and in no way exaggerates the issues.
Third, some linguists do not conduct their own analysis, but simply evaluate the analysis of an opposing linguist.
by GERALD R. McMENAMIN
Various issues have been raised regarding linguistic approaches to questioned authorship cases, both in documented research and criticism, as well as in the direct- and cross-examination testimony of various expert witnesses (lin- guists and nonlinguists). The principal focus of concern is the methodology used to exclude or identify potential writers as authors.
A critical focus on method is helpful because it forwards the process of testing and improving the theory and of unifying applications of it. To be ignored, of course, are isolated instances where some investigators take them- selves far too seriously on their detour as critics. For example, upon seeing how his own research is used to support nearly all other studies of individ- uality in writing, Crystal (1995:382) laments for himself and a coauthor, If we were dead, we would turn in our graves. Alas, poor Yorick! And, after inaccurately representing that American questioned document examiners claim a zero error rate, Chaski (2001:11) bravely tumbles her straw man with the comment that such claims should make any scientist shudder in disbelief.
The most constructive critical commentaries related to recent develop- ments in questioned authorship studies are those of Finnegan (1990), Crystal (1995), Goutsos (1995), and Grant and Baker (2001). Many of the issues that they raise have been consolidated in the questions and responses that follow.
9.4.1 Stylistics
Is stylistics an established field? Stylistics is a well established field in literary and linguistic studies. Its history of study is broad based and long, spanning many countries and nearly two centuries, and its documented bibliography contains hundreds of works in and on various languages. However, older techniques for establishing authorship are continually under the test of scientific scrutiny, and new methods are being progressively proposed and tested.
9.4.2 Variation
What is the norm and how can it be established? Since variation presupposes a norm, it is necessary to establish the norm to describe variation within or from it. The analyst must be able to establish the norms of language behavior associated with writings studied (see Section 5.2). Although norms may be difficult to identify in precise terms, speakers and writers do not find it difficult to identify variation, which means that they have at least an uncon- scious knowledge of the norms governing their language use. Any social, geographic, or situational norm can be used as the basis for describing variation in writing.
What is conscious and what is unconscious in composition? Related questions concern the possibility of forgery or disguised writing. Not enough is known about composition to establish precisely what in writing is con- scious or unconscious. The reasons for this are the difficulties associated with such studies, i.e., that every writers level of conscious choice of forms in writing is different, and that writers demonstrate varying levels of conscious- ness in language production, e.g., unconscious, subconscious, semiconscious, and conscious.
A reasonable assumption is that there are linguistic levels more distant from the conscious choice of a writer. For example, words and phrases are viewed as more consciously chosen than syntactic structures. Chaski (2001:8) says, syntactic processing is automatized, unconscious behavior and therefore is difficult either to disguise or imitate , but she refers to this as a fundamental idea about language individuality, i.e., still an assumption.
A solution to this problem is to exploit the relationship known to exist between natural language and patterned variation. Natural language: what can be applied of Labovs (1966:100) definition of casual speech to written language is that natural language in writing is everyday writing used in informal situations, where little or no attention is directed to the writing process. Patterned variation: countless studies demonstrate linguistic varia- tion to be structured, meaning that the pattern of linguistic and nonlinguistic constraints on the presence and probability of occurrence of each variable can be specified.
The less attention paid to language production, the more regularly struc- tured (real) the variation will be. On the other hand, if considerable attention is given to the writing process, especially in the contexts of imitation or disguise, variation will be unstructured, unpredictable, and different from that present in like writings of the same author. (See Sections 5.2 and 8.2.6 for examples.)
As a practical matter, then, the analyst can do at least two things to take full advantage of a writers entire range of variation (important in authorship cases), while mitigating the possibility of imitation or disguise. First, every effort should be made to obtain comparison writings of similar context and purpose. Second, if possible disguise is an issue, the analyst can include all variation as part of a given writers range of variation, then determine the validity of every variable based on its internal structure.
With respect to attempts to replicate another writers style, it has already been demonstrated earlier (Oregon v. Crescenzi in Chapter 3, and Estate of Violet Houssien in Chapters 7 and 8) that it is not always so difficult to identify attempted disguise in written language. Imitators cannot recognize the type or frequency of the variables in another writers range of variation. It is well known, for example, that the American radio comedians of decades past, Amos n Andy, were two white men imitating African-American English. Although they both had considerable lifetime contact with speakers of Afri- can American English, they really only succeeded in using a few stereotypical dialect features, and even those were not used at frequencies matching those of the speech community (Snyder, 1989).
What is a sufficient sample size? There are very many stylistic studies demonstrating various sample sizes (see McMenamin, 1993), yet an absolute low-end sample size has not been successfully determined as adequate for
authorship studies. The reason for this lies in the diversity of language itself. The sample of language is adequate when stylistic variables occur with suf- ficient frequency to establish patterns of variation. Although the test for what constitutes a pattern may be qualitative or quantitative, one of the long established requirements for a good linguistic variable is frequency of occur- rence. Of course, on the high end, the larger the sample, the more chance that linguistic patterns will demonstrate their structure.
9.4.3 Method of Data Analysis
How do different models of stylistic analysis relate to practical cases? The consistency model (as reviewed in Chapter 6) is used in many cases wherein the single or multiple authorship of various writings is in question. For example, an insurance company wanted to know if ten different witness statements were actually written by the people who signed them, or if they were all authored by the claimant and then presented for signature to the ten different writers. Sometimes the consistency of a set of writings is questioned in anticipation of using them as a single-author set of questioned writings to compare to those of a known suspect writer.
The resemblance model is the one most frequently applied. When content or external circumstances permit identification of just one suspect writer, the authorship question is limited to the resemblance between the questioned writings and known writings of that candidate.
The population model is an extension of the resemblance task. When content or external circumstances permit identification of a limited popula- tion of writers, the authorship question is narrowed to the resemblance between the questioned writings and known writings of a limited number (closed-set) of writers.
What is the method of analysis?
1. Get organized: arrange and organize questioned and known writings into manageable sets.
2. State the problem: articulate the authorship problem as you see it.
Articulate the research questions for descriptive analysis and quanti- tative analysis. Select the appropriate authorship models.
3. Procedural steps: assemble all questioned and known writings with the same or similar context of writing. Assess the range of stylistic varia- tion in each set. Identify style-markers: deviations from or variations within any appropriate norm. Note single occurrences of variation as well as habitual variation. If the writings are extensive, make a KWIC concordance to help identify variables. If variation so indicates, make a KPIC concordance of the punctuation in the writings.
4. Specify descriptive results: specify individual style-markers at all lin- guistic levels. Specify the range of variation: the aggregate set of all deviations and variations. Identify and separate style-markers that are class and individual characteristics.
5. Specify quantitative results: give the results of statistical tests used to evaluate the significance of variables. Estimate the joint probability of occurrence of variables in the compared writings. Apply other appropriate quantitative approaches to style-marker identification.
6. Specify exclusion conclusion: identify dissimilarities between the style-markers of questioned and known writings. Determine the linguistic or statistical significance of the dissimilarities. Determine to what degree the candidate writer can be excluded.
7. Specify identification conclusion: identify similarities between the style-markers of questioned and known writings. Determine the linguistic or statistical significance of the similarities. Determine to what degree the candidate writer can be identified.
8. Precedent cases: check linguistic and legal precedents related to the style-markers used.
9. State an opinion: give the court an opinion at the exclusion level, at the identification level, or as inconclusive.
10. Write a report if so requested: write a report or declaration that follows the structure used in the sciences.
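Step 3 calls for a KWIC concordance and, where the variation warrants it, a KPIC concordance. Below is a minimal Python sketch of both, assuming plain-text input. The function names, the 30-character context window, and the default set of punctuation marks are illustrative assumptions, not features of any particular concordance tool.

```python
import re

def _normalize(text):
    """Collapse runs of whitespace (including line breaks) to single spaces."""
    return " ".join(text.split())

def kwic(text, keyword, width=30):
    """Key Word In Context: each occurrence of `keyword` (whole word, any case)
    with `width` characters of context on each side, keyword bracketed."""
    text = _normalize(text)
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

def kpic(text, marks=",.;:!?", width=30):
    """Key Punctuation In Context: the same display keyed on punctuation marks,
    grouped by mark; order of appearance is kept within each group."""
    text = _normalize(text)
    rows = []
    for i, ch in enumerate(text):
        if ch in marks:
            left = text[max(0, i - width):i]
            right = text[i + 1:i + 1 + width]
            rows.append((ch, f"{left:>{width}}[{ch}]{right}"))
    return sorted(rows, key=lambda row: row[0])  # stable sort groups by mark

sample = "I will sign it, but not today; maybe tomorrow,\nif the claim is paid."
for line in kwic(sample, "claim"):
    print(line)
for mark, line in kpic(sample):
    print(line)
```

Right-aligning the left context keeps every keyword or mark in one column, which is what makes habitual variation easy to spot by eye.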
How are style-markers identified? This question is the single most important issue raised in current research on questioned authorship, and it is posed in various other ways, e.g.: How are criteria for identification motivated? How are stylistic variables selected and justified?
The single most important starting point for selecting style-markers is to work within a theoretical model of linguistics that views stylistic variation as inherent to the system of language itself, i.e., not as a characteristic of language performance. This ensures the discovery of the patterned variation needed for authorship identification, as opposed to the accidental and less than systematic characteristics of performance.
The second analytical requirement is to recognize that unique markers are extremely rare, so authorship identification depends on identifying an aggregate of markers, each of which may be found in other writers, but all of which are unlikely to be present together in any other writer. This means that it is highly unlikely that any single marker of writing style could be used to identify all writers of a language or dialect, or even one writer's idiolect.
Therefore, the approach is to identify the whole range of variation in a given set of writings and to analyze it in any acceptable descriptive or quantitative way. The theoretical limitation to this, of course, is the recognition and definition of the norm within or from which the language varies. This, however, does not present significant practical obstacles, except for an analyst who does not know the governing norms or who cannot identify the variation.
One recent attempt to tackle the issue of style-marker identification (Chaski, 2001) is only marginally successful because it assumes that style-markers are like the same relatively small set of chromosomes used for DNA analysis, thereby ignoring the whole range of linguistic variation presented in writing (McMenamin, 2001). However, Grant and Baker (2001:76) and others have proposed a very promising approach to style-marker identification that is consistent with the principles of stylistic variation outlined above: principal component analysis. This approach makes eminent sense. First, it recognizes that authorship identification is achieved through an array of markers: "Components would consist of those markers which collectively account for the most variance in the texts" (Grant and Baker, 2001). Second, it recognizes "the near impossibility of identifying style-markers that are generally valid and reliable for all writers" (Grant and Baker, 2001). This indicates that they understand the inherent variability of language, i.e., that stylistic variation, whether dialectal or idiolectal, must be analyzed as part of the underlying competence of speakers and writers, not as a never-to-be-found universal of linguistic performance.
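To make the principal component idea concrete, here is a minimal sketch using NumPy's singular value decomposition. It assumes each row is a text and each column is the relative frequency of one candidate style-marker; the matrix layout, the centering-only preprocessing (no scaling to unit variance), the function name, and the data are assumptions for illustration, not Grant and Baker's published procedure.

```python
import numpy as np

def marker_pca(freqs, n_components=2):
    """PCA of a texts-by-markers frequency matrix via SVD.

    Returns (scores, loadings, explained):
      scores    - each text's coordinates on the first n_components components
      loadings  - each marker's weight on those components
      explained - share of total variance carried by each component
    """
    X = np.asarray(freqs, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each marker column
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)         # singular values are sorted, largest first
    loadings = vt[:n_components]
    scores = Xc @ loadings.T                # project texts onto the components
    return scores, loadings, explained[:n_components]

# Hypothetical data: 4 texts x 3 markers; the third marker never varies.
freqs = [[0.1, 0.0, 0.5],
         [0.2, 0.0, 0.5],
         [0.9, 0.1, 0.5],
         [1.0, 0.1, 0.5]]
scores, loadings, explained = marker_pca(freqs)
print(explained)
```

The constant third marker receives no weight on any component, which mirrors the point quoted above: components are built from the markers that collectively account for the most variance.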
9.4.4 Qualitative Analysis
What is the role of the analyst's intuition? Intuition is the analyst's use of his or her own judgment to discover linguistic variation and suggest initial hypotheses to investigate. As a speaker or writer of the language and as a linguist, the analyst uses introspection to start the process of analysis. Commenting on the use of introspection and informal observation, Lakoff notes that "any procedure is at some point introspective" (Lakoff, 1975:5). A good discussion of the methodological role of intuition in linguistic research can be found in Johnstone (2000).
Are qualitative statements impressionistic? This is also asked in other ways: Is not qualitative analysis subjective and quantitative analysis objective? Is this description without analysis? While this question is elaborated on in Chapter 7, it bears repeating that stylistic analyses are both qualitative and quantitative, but the description of written language is the first and most important means for discovering style variation. The focus of the qualitative study of writing is a systematic linguistic description of what forms are used by a writer, and how and why they may be used.
What is the process of argumentation? The scientific basis of the argument is that of any empirical study: observation, description, measurement, and conclusion. In the specific case of authorship studies, the argument is as follows:
Notice these style-markers in the corpus of writing.
The array of patterned markers is described as a, b, c, …
Each of these markers has x probability of occurring in the writing of the speech community.
Taken as an aggregate set, they have y probability of occurring together in one writer.
The author-specific markers and their joint probability of occurrence are either the same as or different from those of a comparison corpus of writing.
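The fourth step combines the individual marker probabilities (x) into a joint probability (y). The simplest combination is the product rule, sketched below. It assumes the markers occur independently of one another, which style-markers often do not: if two markers tend to co-occur, the product understates how often they appear together and so overstates their rarity. The function name and the probabilities in the example are invented for illustration.

```python
import math

def joint_probability(marker_probs):
    """Joint probability that all markers co-occur in one writer, by the
    product rule. Valid ONLY if the markers are statistically independent."""
    for name, p in marker_probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {name!r} must lie in [0, 1]")
    return math.prod(marker_probs.values())

# Hypothetical per-marker probabilities (the x values), not real estimates.
probs = {"alot": 0.2, "whilst": 0.05, "confidentuality": 0.01}
y = joint_probability(probs)
print(y)  # roughly 1e-4
```

Where independence is doubtful, the x values should be estimated jointly from a corpus rather than multiplied.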
9.4.5 Quantitative Methods of Data Analysis
Is statistical analysis necessary? As stated in Chapter 8, the measurement of variation in written language complements its description and is therefore important to the successful analysis and interpretation of style. The focus is on how much and how often forms are used by a writer, which is necessary to satisfy current requirements for the study of linguistic variation, as well as to satisfy external requirements for expert evidence as imposed by the judiciary.
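One familiar way to ask whether a marker's frequency differs between questioned and known writings is Pearson's chi-square test on a two-by-two table of counts. The sketch below is an illustration, not a test the text prescribes; the counts are hypothetical, and the test assumes independent observations and expected counts of roughly five or more per cell, assumptions that running text only approximates.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table

                    marker   other tokens
        questioned    a          b
        known         c          d
    """
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows + cols:
        raise ValueError("every row and column total must be non-zero")
    n = a + b + c + d
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

CRITICAL_05_DF1 = 3.841  # chi-square critical value for df = 1, alpha = .05

stat = chi_square_2x2(30, 70, 10, 90)  # hypothetical counts
print(stat, stat > CRITICAL_05_DF1)
```

A statistic above the critical value suggests the marker's rate differs between the two sets more than chance alone would readily explain.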
Are there baseline norms for frequency statements? Baseline norms can be established on a case-by-case basis. Written language corpora are becoming more available, although it may not always be possible to match exactly the context of writing presented by the documents in a particular case. Often, civil and criminal clients can produce an appropriate corpus from their own workplace, or from other writings produced in a context similar to that of the writings being analyzed. Specialized corpora developed in other fields can sometimes be found, e.g., a corpus of suicide notes collected by a coroner.
How is frequency defined, and what is its significance? In descriptive studies, one instance of a style variant does not constitute a pattern unless it can be demonstrated to be unique through frequency estimates based on large corpora. However, a pattern (whose relative strength increases with frequency of occurrence) can be established with two or more instances. For example, the appearance of "confidentuality" in a questioned document and its appearance twice in known writings is a definite pattern of variation. In quantitative studies, determining frequency is somewhat easier insofar as statistical tests specify the minimum number of instances necessary for their reliable use.
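The descriptive frequency rule (one instance is not a pattern; two or more can establish one) and a case-by-case baseline comparison can be sketched in a few lines. The function names, the per-1,000-words normalization, and the example strings are assumptions for illustration; a real baseline would come from a corpus matched to the context of writing.

```python
import re

def count_variant(text, variant):
    """Occurrences of a variant form as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(variant) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def word_count(text):
    """Number of word tokens in a text."""
    return len(re.findall(r"\b\w+\b", text))

def rate_per_thousand(count, n_words):
    """Normalized frequency, so writings (or a baseline corpus) of different
    lengths can be compared."""
    return 1000 * count / n_words if n_words else 0.0

def is_pattern(counts, minimum=2):
    """One instance is not a pattern; two or more, pooled across writings, can be."""
    return sum(counts) >= minimum

# Hypothetical questioned and known writings.
questioned = "Please treat this with confidentuality."
known = ["Confidentuality is assured.",
         "We value confidentuality and confidentiality."]
counts = [count_variant(t, "confidentuality") for t in [questioned] + known]
print(counts, is_pattern(counts))
```

The same `rate_per_thousand` call applied to a baseline corpus gives the norm against which the rate in the case writings can be judged.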
9.4.6 Other Questions
Does not the use of a suspect writer's own name suggest that he is not the writer? No. It is well known in the field of questioned document examination that anonymous writers often use their own names to encourage investigators to see them as victims rather than as possible perpetrators.
Does the analyst fail to look at other possible writers? The specific task determines the model of analysis to be used. If the population model is appropriate to the problem, then other possible writers are studied. If external information dictates the use of the resemblance model, then only one writer is considered. In cases in which clients cannot articulate the problem and clearly indicate the task, this must be done with the linguist's help before analysis is started.
Does the analyst look for exculpatory characteristics? Whether certain evidence is incriminating or exculpatory is the concern of the client and the trier of fact, not the expert witness. The linguist analyzes the writings presented for both similarities and differences vis-à-vis comparison writings, then states his or her findings, conclusions, and opinions. The linguist's participation usually stops there if the evidence is not consistent with the client's expectations. The linguist may testify at deposition or in court if his or her evidence supports the client's position.
9.5 The Linguist as Expert Witness
9.5.1 Qualifications
There are two ways to talk about a linguist's qualifications: what level of professional preparation in linguistics and forensic science the individual should have, and what the Court determines is necessary to qualify the linguist as an expert. Although most forensic linguists who now testify in court have a doctorate, a lesser degree, graduate or undergraduate, would suffice when combined with sufficient case experience in analysis and in court. The Court will usually qualify the linguist as an expert in the field if the combination of education and experience demonstrates that the linguist can, in fact, provide evidence that will help a jury make an enlightened decision.
The Federal Rules of Evidence (FRE 702) state three requirements for an expert witness:
1. The witness must qualify as an expert by knowledge, skill, experience, training, or education greater than the average layperson in the area of his or her testimony. Since the expert is there to help the jury resolve a relevant issue, his or her relative level of expertise may affect the weight the judge or jury gives the opinion, but not its admissibility.
2. The expert must testify to scientific, technical, or other specialized knowledge. The reliability of the testimony is based on valid reasoning and a reliable methodology, as opposed to subjective observations or speculative conclusions.
3. The expert's testimony must assist the trier of fact, i.e., be relevant to the task of the judge or the jury to understand the evidence or determine disputed facts.
To be avoided is the imposition of artificial requirements that have nothing to do with the linguist's ability or with FRE 702. In one case, an opposing linguist implied that his opinion should carry more weight, noting that, while the other linguist was a professor at a state university, he taught at a well known private university elsewhere in California. In another case, a linguist proposed the narrow requirement that a good expert linguist be a graduate of one of a select few East Coast universities in the U.S. There are, in fact, excellent linguistics programs throughout the U.S. and the world, and thousands of good linguists (now more than 14,000 on The Linguist List), many of whom capably handle cases in forensic linguistics.
9.5.2 Reports
The structure of the linguist's statement will follow the report style of the empirical sciences, something along these lines:
1. Summary (equivalent to the abstract of an academic paper)
2. Articulation of problem (research problem and statement of hypotheses)
3. Upfront statement of opinion (usually at the end of an academic paper)
4. Previous work
Linguistic studies related to identified variables
Legal precedents related to problem or variation (often best handled by attorney)
5. Method
Outline of research tasks that match the specified problem
Data collection and organization
Data analysis
6. Findings
7. Discussion
8. Conclusion
(9.) Identification
(8.) Highly probably did write
(7.) Probably did write
(6.) Indications did write
(5.) Inconclusive
(4.) Indications did not write
(3.) Probably did not write
(2.) Highly probably did not write
(1.) Elimination
In studying various reports and testimony of opposing linguists, a number of observations can be made. First, forensic linguists must learn to function in the often aggressive context of litigation without becoming angry, defensive, petulant, or aggressive: behaviors learned anywhere, but not infrequently reinforced in academia.
Second, linguists should be reserved about the outcomes of cases, even if their testimony was important to the Court's decision. Doing otherwise communicates a lack of scientific detachment, even possible advocacy, and puffery on a website or overstating the importance of testimony in an academic article is unseemly. Also, linguistic testimony is rarely the only evidence. The linguist may not (and should not) know what other external (nonlanguage) evidence is used to support his or her client's case, all of which can have a significant bearing on the outcome. In addition, differences of professional opinion and trial outcomes may result from linguists on opposing sides having different data to work with, or having the occasional client who cares more about his or her advocacy role than about the truth. There is also the danger, as once happened, that a linguist rushes into print with the truth, only to discover that the other side won a $6 million judgment against his client.
Some linguists are actually quite careful about this. In one case, reviewed by Kaplan (1998), significant differences existed between the opposing linguistic analyses and testimony. Kaplan's presentation of the evidence clearly documents his analysis (with some reference to the differences), states the positive outcome for his client, and in no way exaggerates the issues.
Third, some linguists do not conduct their own analysis, but simply evaluate the analysis of an opposing linguist.
by GERALD R. McMENAMIN