CATyPI
Corpus of arguments from thesis and research proposals

Introduction

The corpus of arguments from thesis and research proposals (CATyPI) is composed of 300 sections; each section is annotated with argumentative paragraphs, argumentative components, and relations. The texts come from the Coltypi collection of theses (Gonzalez-Lopez and Lopez-Lopez, 2015). The collection contains 468 theses and research proposals in the computer and information technologies domain, written in Spanish. The texts are from the undergraduate (TSU and Bachelor's degree) and graduate (M.Sc. and Ph.D.) levels. Our study focuses on the problem statement, justification, and conclusions sections, which are considered highly argumentative (Lopez and Garcia, 2003).

The CATyPI corpus was created to identify the argumentative characteristics of academic writing at the undergraduate and graduate levels. The corpus has been used for detecting paragraphs with arguments, assessing justification sections, and identifying argument components.

Annotation process

We first designed a guide for argument annotation. Two instructors with experience reviewing theses then annotated the 300 sections following this guide. We consider two argument components, premises and conclusions, and two types of relations between components, support and attack. The annotation guide describes different argumentative structures with their argument components (conclusion/premise) and their relations (attack/support). It also includes types of arguments and a score to establish the level of an argument. A set of examples taken from academic theses is included to support the annotator. Finally, the guide closes with the annotation procedure. The annotation guide is available in anotation_guide_file.pdf. We also converted the annotations from Word documents to the BRAT system.
The annotation guide for BRAT is available in annotation_brat_file.pdf.

Argumentative paragraphs

The argument level annotated for each paragraph was used to identify paragraphs without arguments (level 0) and paragraphs with arguments (levels 1, 2, and 3). Table 1 shows that in most sections more than half of the paragraphs contain arguments. We selected only the paragraphs on which the two annotators agreed. This restriction reduces the number of paragraphs to 856, with 1,913 sentences and 76,841 words. Of the 856 paragraphs analyzed, 584 (68.2%) are argumentative. This analysis indicates that a substantial share of the paragraphs in academic theses contain arguments.
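The paragraph selection described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the function names are hypothetical, the toy levels are invented for the example, and the sketch assumes that "agreement" means both annotators assign the same binary label (argument vs. no argument), which the text does not state explicitly.

```python
def binarize(level):
    """Map an argument level (0-3) to 0 (no argument) or 1 (argumentative)."""
    return 0 if level == 0 else 1


def agreed_labels(levels_a, levels_b):
    """Return binary labels for paragraphs on which both annotators agree.

    Assumption: agreement is checked on the binary label, not the exact level.
    """
    kept = []
    for a, b in zip(levels_a, levels_b):
        if binarize(a) == binarize(b):
            kept.append(binarize(a))
    return kept


def argumentative_share(labels):
    """Fraction of kept paragraphs that are argumentative."""
    return sum(labels) / len(labels) if labels else 0.0


# Toy levels for four paragraphs (hypothetical, not taken from the corpus).
annotator_a = [0, 2, 3, 1]
annotator_b = [0, 1, 0, 1]
labels = agreed_labels(annotator_a, annotator_b)
share = argumentative_share(labels)

# Sanity check of the reported proportion: 584 of 856 paragraphs.
reported = round(584 / 856 * 100, 1)
```

With the counts reported in the text, `reported` reproduces the 68.2% figure. If agreement were instead required on the exact level, the comparison in `agreed_labels` would be `a == b`.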
Table 1: Distribution of argumentative paragraphs per section

The distribution of paragraphs among academic degrees is 63.3% undergraduate (542 paragraphs), 27.7% master's (237 paragraphs), and 9% doctoral (77 paragraphs). The undergraduate level contributes the most paragraphs and is our main focus for analysis, with the aim of helping university students.

Table 2 shows the segments labeled by the two annotators as conclusion, premise, or without any label (none), per section. We selected only the segments on which the two annotators agreed. This restriction reduced the number of segments to 2,104. We found a total of 1,060 premises and 562 conclusions, almost twice as many premises as conclusions.
Table 2: Distribution of argument components per section

Corpus download

To download the CATyPI corpus, you must complete the access form. Once the form is completed, an email with download information will be sent to the address provided. The corpus is a product of ongoing doctoral research entitled "Textual Analysis of Arguments in Academic Writing" by the student Author 1, advised by Author 2. The corpus is shared for academic purposes under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Research published using the corpus must cite "Garcia-Gorrostieta, J. M., & López-López, A. 2019. A Corpus for Argument Analysis of Academic Writing: Argumentative Paragraph Detection. Journal of Intelligent & Fuzzy Systems, 36(5):4565-4577. DOI:10.3233/JIFS-179008".

Access form