| CATyPI |
|
Corpus of arguments from thesis and research proposals

Introduction

The corpus of arguments from thesis and research proposals (CATyPI) is composed of 444 sections; each section is annotated with argumentative paragraphs, argumentative components, and relations. The texts come from the Coltypi collection of theses (Gonzalez-Lopez and Lopez-Lopez, 2015), which holds 468 theses and research proposals in Spanish in the computer and information technologies domain. The texts are at the undergraduate level (TSU and Bachelor's degree) and the graduate level (M.Sc. and Ph.D.). Our study focuses on the problem statement, justification, and conclusions sections, which are considered highly argumentative (Lopez and Garcia, 2003). The CATyPI corpus was created to identify the argumentative characteristics of academic writing at the undergraduate and graduate levels. It has been used for detecting paragraphs with arguments, assessing justification sections, and identifying argument components.

Annotation process

We first designed a guide for argument annotation. Two instructors with experience reviewing theses then annotated the 444 sections following that guide. We consider two argument components, premises and conclusions, and two types of relations between components, support and attack. The annotation guide describes different argumentative structures with their argument components (conclusion/premise) and their relations (attack/support). It also covers types of arguments and a score that establishes the level of an argument. A set of examples taken from academic theses is included to support the annotator. The guide closes with the annotation procedure.
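The annotation scheme described above can be pictured as a small data structure. The following is only a sketch: the class names, field names, and the example sentences are hypothetical and do not reflect the corpus file format; only the two component types, the two relation types, and the argument level come from the description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ComponentType(Enum):
    PREMISE = "premise"
    CONCLUSION = "conclusion"


class RelationType(Enum):
    SUPPORT = "support"
    ATTACK = "attack"


@dataclass(frozen=True)
class Component:
    comp_id: str          # hypothetical identifier, e.g. "c1"
    kind: ComponentType
    text: str


@dataclass(frozen=True)
class Relation:
    source_id: str        # component that supports or attacks
    target_id: str        # component being supported or attacked
    kind: RelationType


@dataclass
class AnnotatedParagraph:
    text: str
    level: int            # 0 = no argument, 1 to 3 = argument level
    components: List[Component] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.level not in (0, 1, 2, 3):
            raise ValueError("level must be 0, 1, 2 or 3")

    @property
    def is_argumentative(self) -> bool:
        return self.level > 0

    def supporters_of(self, target_id: str) -> List[Component]:
        """Return the components linked to target_id by a support relation."""
        ids = {r.source_id for r in self.relations
               if r.target_id == target_id and r.kind is RelationType.SUPPORT}
        return [c for c in self.components if c.comp_id in ids]


# Invented example paragraph (not taken from the corpus).
paragraph = AnnotatedParagraph(
    text="Manual grading is slow. Therefore an automatic tool is needed.",
    level=1,
    components=[
        Component("c1", ComponentType.PREMISE, "Manual grading is slow."),
        Component("c2", ComponentType.CONCLUSION,
                  "Therefore an automatic tool is needed."),
    ],
    relations=[Relation("c1", "c2", RelationType.SUPPORT)],
)
supporting = paragraph.supporters_of("c2")
```

Keeping relations as separate records that point at component identifiers lets one premise support or attack several components without duplicating its text.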
The annotation guide is available in anotation_guide_file.pdf. The annotation guide for BRAT is available in annotation_brat_file.pdf.

Argumentative paragraphs

The argument level annotated for each paragraph was used to identify paragraphs without arguments (level 0) and paragraphs with arguments (levels 1, 2, and 3). Table 1 shows that in most sections more than half of the paragraphs contain arguments. We kept only the paragraphs on which the two annotators agreed, which reduces the set to 1,434 paragraphs with 3,029 sentences and 112,572 words. Of these 1,434 paragraphs, 1,090 (76%) are argumentative. This indicates that a large share of the paragraphs in academic theses contain arguments.
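The agreement filter and the resulting proportion can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy annotations are hypothetical, and "agreement" is assumed here to mean that both annotators gave the same binary decision (level 0 versus levels 1 to 3), which the description does not spell out.

```python
from typing import Dict, Tuple


def is_argumentative(level: int) -> bool:
    """Level 0 means no argument; levels 1, 2 and 3 mean an argument is present."""
    return level >= 1


def select_agreed(annotations: Dict[str, Tuple[int, int]]) -> Dict[str, bool]:
    """Keep paragraphs whose two annotators give the same binary label.

    `annotations` maps a paragraph id to the levels assigned by annotator A
    and annotator B (assumed input shape, not the corpus format).
    """
    agreed = {}
    for paragraph_id, (level_a, level_b) in annotations.items():
        label_a = is_argumentative(level_a)
        label_b = is_argumentative(level_b)
        if label_a == label_b:
            agreed[paragraph_id] = label_a
    return agreed


def argumentative_share(agreed: Dict[str, bool]) -> float:
    """Fraction of agreed paragraphs labeled as argumentative."""
    if not agreed:
        return 0.0
    return sum(agreed.values()) / len(agreed)


# Toy annotations (invented): p3 is dropped because the annotators disagree.
toy = {"p1": (2, 3), "p2": (0, 0), "p3": (1, 0), "p4": (1, 1)}
agreed = select_agreed(toy)
share = argumentative_share(agreed)

# The counts reported for the corpus give the same kind of ratio.
reported_share = 1090 / 1434
```

With the reported counts, `reported_share` rounds to the 76% stated in the text.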
Table 1: Distribution of argumentative paragraphs per section

The paragraphs are distributed across academic degrees as follows: 56.6% undergraduate (812 paragraphs), 36.4% master's (522 paragraphs), and 7% doctoral (100 paragraphs). The undergraduate level contributes the most paragraphs and is the main focus of our analysis, which aims to help university students. Table 2 shows the segments labeled by the two annotators as conclusion, premise, or without any label (none), per section. We selected only segments on which the two annotators agreed. In only 75 sections did a judge have to make a decision to resolve disagreements. This restriction reduced the number of segments to 3,488. We found a total of 1,700 premises and 1,165 conclusions, roughly 1.5 premises for every conclusion.
Table 2: Distribution of argument components per section

Corpus download

To download the CATyPI corpus, you must complete the access form. Once it is completed, an email with download information will be sent to the address provided. The corpus is a product of the doctoral research "Textual Analysis of Arguments in Academic Writing" by the student Jesús Miguel García-Gorrostieta, advised by Dr. Aurelio López-López. The corpus is shared for academic purposes under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Research published using the corpus must cite the corpus article:

Garcia-Gorrostieta, J. M., Lopez-Lopez, A., Rico-Sulayes, A. & Carrillo, M. 2020. Argument corpus development and argument component classification: A study in academic. Digital Scholarship in the Humanities, 1-27. DOI:10.1093/llc/fqaa020

Access form
|