CATyPI


Corpus of arguments from thesis and research proposals

 

Introduction

The corpus of arguments from thesis and research proposals (CATyPI) comprises 444 sections, each annotated with argumentative paragraphs, argumentative components, and relations. The texts come from the Coltypi collection of theses (Gonzalez-Lopez and Lopez-Lopez, 2015), which contains 468 theses and research proposals in Spanish from the computer and information technologies domain. The texts were written at the undergraduate level (TSU and Bachelor's degree) and the graduate level (M.Sc. and Ph.D.). Our study focuses on the problem statement, justification, and conclusions sections, which are considered highly argumentative (Lopez Ferrero and Garcia Negroni, 2003).

The CATyPI corpus was created to identify the argumentative characteristics of academic writing at the undergraduate and graduate levels. It has been used to detect paragraphs with arguments, to assess justification sections, and to identify argument components.

Annotation process

For the annotation process, we first designed a guide for argument annotation. Two instructors with experience reviewing theses then annotated the 444 sections following this guide. We consider two argument components, premises and conclusions, and two types of relations between components, support and attack. The guide describes different argumentative structures with their argument components (conclusion/premise) and their relations (attack/support). It also includes types of arguments and a score to establish the level of an argument, together with a set of examples taken from academic theses to support the annotator. The guide ends with the annotation procedure.

The annotation guide is available in anotation_guide_file.pdf

The annotation guide for BRAT is available in annotation_brat_file.pdf
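Since the annotation was carried out with BRAT, the components and relations described above can be read from BRAT standoff (.ann) files. The following is a minimal sketch and not the corpus's official loader: the label names (Premise, Conclusion, Support), the sample text, and the helper parse_brat are assumptions made for illustration, and the actual labels in the distributed files may differ.

```python
from dataclasses import dataclass


@dataclass
class Component:
    id: str
    label: str   # assumed label names, e.g. "Premise" or "Conclusion"
    start: int
    end: int
    text: str


@dataclass
class Relation:
    id: str
    label: str   # assumed label names, e.g. "Support" or "Attack"
    source: str
    target: str


def parse_brat(ann_text):
    """Parse text-bound (T) and relation (R) lines of BRAT standoff text."""
    components, relations = [], []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("T"):
            # "Label start end"; discontinuous spans use ';' between fragments
            label, offsets = fields[1].split(" ", 1)
            numbers = offsets.replace(";", " ").split()
            components.append(
                Component(fields[0], label, int(numbers[0]), int(numbers[-1]), fields[2])
            )
        elif line.startswith("R"):
            # "Label Arg1:T1 Arg2:T2"
            label, arg1, arg2 = fields[1].split()
            relations.append(
                Relation(fields[0], label, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
            )
    return components, relations


# Hypothetical sample, not taken from the corpus.
SAMPLE = (
    "T1\tPremise 0 24\tManual review is costly.\n"
    "T2\tConclusion 25 54\tAn automatic tool would help.\n"
    "R1\tSupport Arg1:T1 Arg2:T2\n"
)

components, relations = parse_brat(SAMPLE)
for rel in relations:
    print(rel.source, rel.label, rel.target)
```

Other BRAT line types (events, attributes, notes) are ignored in this sketch.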

Argumentative paragraphs

The argument level annotated for each paragraph was used to identify paragraphs without arguments (level 0) and paragraphs with arguments (levels 1, 2, and 3). We selected only the paragraphs on which the two annotators agreed, which reduced the set to 1,434 paragraphs with 3,029 sentences and 112,572 words. Of these 1,434 paragraphs, 1,090 (76%) are argumentative. Table 1 shows that more than half of the paragraphs in each section type contain arguments, so a large share of the paragraphs in academic theses are argumentative.

Section             Paragraphs with arguments   Paragraphs without arguments
Problem Statement   275                         119
Justification       268                         92
Conclusion          547                         133
Total               1,090                       344


Table 1: Distribution of argumentative paragraphs per sections
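The selection step and the totals in Table 1 can be illustrated with a short sketch. It assumes that "agreement" means both annotators assign the same binary label (argument present or absent) rather than the same exact level; the source does not specify which. The function names is_argumentative and agreed_paragraphs are hypothetical.

```python
def is_argumentative(level):
    """Level 0 means no argument; levels 1 to 3 mean an argument is present."""
    if level not in (0, 1, 2, 3):
        raise ValueError(f"unexpected argument level: {level}")
    return level > 0


def agreed_paragraphs(levels_a, levels_b):
    """Keep paragraphs where both annotators give the same binary label.

    Assumption: agreement is checked on the binary label, not the exact level.
    Returns a list of (paragraph_index, is_argumentative) pairs.
    """
    kept = []
    for index, (a, b) in enumerate(zip(levels_a, levels_b)):
        if is_argumentative(a) == is_argumentative(b):
            kept.append((index, is_argumentative(a)))
    return kept


# Counts from Table 1: (with arguments, without arguments) per section.
TABLE_1 = {
    "Problem Statement": (275, 119),
    "Justification": (268, 92),
    "Conclusion": (547, 133),
}

with_args = sum(w for w, _ in TABLE_1.values())
without_args = sum(wo for _, wo in TABLE_1.values())
share = with_args / (with_args + without_args)
print(f"{with_args} of {with_args + without_args} paragraphs ({share:.0%})")
# prints "1090 of 1434 paragraphs (76%)"
```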

By academic degree, 56.6% of the paragraphs come from undergraduate texts (812 paragraphs), 36.4% from master's texts (522 paragraphs), and 7% from doctoral texts (100 paragraphs). The undergraduate level contributes the most paragraphs and is the main focus of our analysis, which aims to help university students.

Table 2 shows, per section, the segments labeled by the two annotators as conclusion, premise, or without any label (none). We selected only segments on which the two annotators agreed; in only 75 sections did a judge make a decision to resolve disagreements. This restriction reduced the number of segments to 3,488. We found a total of 1,700 premises and 1,165 conclusions, that is, about 1.5 times as many premises as conclusions.

Section             Conclusions   Premises   None
Problem Statement   268           503        228
Justification       262           408        155
Conclusion          635           789        240
Total               1,165         1,700      623


Table 2: Distribution of argument components per section
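As a quick consistency check, the totals in Table 2 and the ratio of premises to conclusions can be recomputed from the per-section counts. This is only an illustrative sketch; the dictionary layout and variable names are assumptions, and the numbers are copied from Table 2.

```python
# Counts copied from Table 2, per section.
TABLE_2 = {
    "Problem Statement": {"conclusion": 268, "premise": 503, "none": 228},
    "Justification": {"conclusion": 262, "premise": 408, "none": 155},
    "Conclusion": {"conclusion": 635, "premise": 789, "none": 240},
}

LABELS = ("conclusion", "premise", "none")

# Column totals across the three sections.
totals = {label: sum(row[label] for row in TABLE_2.values()) for label in LABELS}
n_segments = sum(totals.values())
ratio = totals["premise"] / totals["conclusion"]

print(totals)
print(f"segments: {n_segments}, premises per conclusion: {ratio:.2f}")

# Per-section ratio of premises to conclusions.
for section, row in TABLE_2.items():
    print(f"{section}: {row['premise'] / row['conclusion']:.2f}")
```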

Corpus download

To download the CATyPI corpus, you must complete the access form. Once the form is completed, an email message with download information will be sent to the address provided. The corpus is a product of the doctoral research entitled "Textual Analysis of Arguments in Academic Writing" by the student Jesús Miguel García-Gorrostieta, advised by Dr. Aurelio López-López. The corpus is shared for academic purposes under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Research published using the corpus must cite the corpus article: Garcia-Gorrostieta, J. M., Lopez-Lopez, A., Rico-Sulayes, A. and Carrillo, M. (2020). Argument corpus development and argument component classification: A study in academic. Digital Scholarship in the Humanities, 1–27. DOI:10.1093/llc/fqaa020

Access form

Full name:

Email address:

Professional title:

Institution or organization:




References

Gonzalez-Lopez, S. and Lopez-Lopez, A. (2015). Coleccion de tesis y propuesta de investigacion en tics: un recurso para su analisis y estudio. In XIII Congreso Nacional de Investigacion Educativa, pages 1–15.

Lopez Ferrero, C. and Garcia Negroni, M. (2003). La argumentacion en los generos academicos. In Actas del Congreso Internacional La Argumentacion, pages 1121–1129. Universidad de Buenos Aires, Buenos Aires.

 

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.