Commit adb5df55 authored by David Rouquet

Update README.md

parent 0341ba76
# nlreqdataset-unl-enco
This repository will contain all or part of the nlreqdataset of system requirements (http://fmt.isti.cnr.it/nlreqdataset/), enconverted into UNL (Universal Networking Language) with http://unl.ru/deco.html and possibly post-edited.
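In UNL, each sentence is represented as a set of binary relations between Universal Words, written as `relation(UW1, UW2)` between `{unl}` and `{/unl}` markers. This README does not specify the exact file layout used in this repository, so the following is only a minimal Python sketch that assumes the conventional UNL document layout. The sample sentence, its Universal Words and the helper names (`Relation`, `split_arguments`, `parse_unl_block`) are hypothetical and hand-written for illustration. They are not output from the enconverter at http://unl.ru/deco.html.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class Relation(NamedTuple):
    """One UNL binary relation: name(source, target)."""
    name: str
    source: str
    target: str


# Relation label, an optional scope id such as ":01", then the parenthesised arguments.
RELATION_RE = re.compile(r"^([a-z]+)(?::\w+)?\((.*)\)$")


def split_arguments(args: str) -> tuple[str, str]:
    """Split 'UW1, UW2' at the first comma not nested inside parentheses.

    Assumption: quoted headwords do not themselves contain commas or parentheses.
    """
    depth = 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return args[:i].strip(), args[i + 1:].strip()
    raise ValueError(f"expected two arguments in {args!r}")


def parse_unl_block(text: str) -> list[Relation]:
    """Collect the relations found between {unl} and {/unl} markers."""
    relations: list[Relation] = []
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("{unl"):
            inside = True
        elif stripped.startswith("{/unl"):
            inside = False
        elif inside and stripped:
            match = RELATION_RE.match(stripped)
            if match is None:
                raise ValueError(f"not a UNL relation: {stripped!r}")
            name, args = match.groups()
            source, target = split_arguments(args)
            relations.append(Relation(name, source, target))
    return relations


# Hypothetical, hand-written sample; not produced by the enconverter.
SAMPLE = """[S:1]
{org:en}
The system shall display the balance.
{/org}
{unl}
agt(display(icl>show>do,agt>thing,obj>thing).@entry, system(icl>group))
obj(display(icl>show>do,agt>thing,obj>thing).@entry, balance(icl>amount))
{/unl}
[/S]"""

if __name__ == "__main__":
    for rel in parse_unl_block(SAMPLE):
        print(f"{rel.name}: {rel.source} -> {rel.target}")
```

Splitting on the first top-level comma, rather than on any comma, matters because Universal Words carry their own comma-separated restrictions inside parentheses (e.g. `display(icl>show>do,agt>thing,obj>thing)`).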

The dataset is described in the following abstract:
## PURE: a Dataset of Public Requirements Documents
Ferrari, Alessio; Spagnolo, Giorgio Oronzo; Gnesi, Stefania

This paper presents PURE (PUblic REquirements dataset), a dataset of 79 publicly available natural language requirements documents collected from the Web. The dataset includes 34,268 sentences and can be used for natural language processing tasks that are typical in requirements engineering, such as model synthesis, abstraction identification and document structure assessment. It can be further annotated to work as a benchmark for other tasks, such as ambiguity detection, requirements categorisation and identification of equivalent requirements. In the associated paper, we present the dataset and we compare its language with generic English texts, showing the peculiarities of the requirements jargon, made of a restricted vocabulary of domain-specific acronyms and words, and long sentences. We also present the common XML format to which we have manually ported a subset of the documents, with the goal of facilitating replication of NLP experiments. The XML documents are also available for download.

The paper associated with the dataset is available here:
https://ieeexplore.ieee.org/document/8049173/

More information about the dataset is available here:
http://nlreqdataset.isti.cnr.it

A preprint of the paper is available on ResearchGate:
https://goo.gl/HxJD7X