diff --git a/README.md b/README.md
index 66f1a64b604db5910a213955527e1e278de229a1..e1761b02e50b3166a1767952e72d171d031d1f6a 100644
--- a/README.md
+++ b/README.md
@@ -28,6 +28,19 @@ The encoding script work on xml files conforming to `./data/orig/req_document.xs
 
 Examples of an input anf outputs are provided in `./data/examples/`
 
+Zipped folders of "unlized" XML files of the corpus are available in the `./data` folder.
+
+:bangbang: For some reason, the default namespace attribute (`xmlns="req_document.xsd"`) on the root node makes the script crash on the corpus documents.
+Please modify the following:
+```
+<req_document xsi:schemaLocation="req_document.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="req_document.xsd">
+```
+to
+```
+<req_document xsi:schemaLocation="req_document.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+```
+before submitting an XML file of the corpus to the script.
+
 First clone the repo (or at least download the scripts folder):
 ```
 git clone https://gitlab.tetras-libre.fr/unl/nlreqdataset-unl-enco.git
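The manual edit requested in the README can be automated. Below is a minimal sketch, assuming the corpus files are UTF-8 encoded and that the offending attribute is exactly `xmlns="req_document.xsd"` on the `<req_document>` root tag, as in the README snippet. The script name `strip_ns.py` and the helpers `strip_default_namespace` and `main` are hypothetical and not part of the repository. It works on the raw text rather than through an XML parser, so the rest of each file is left byte-for-byte unchanged.

```python
import re
import sys
from pathlib import Path

# Matches the default-namespace attribute inside the opening <req_document ...> tag.
# Assumption: the attribute value is exactly "req_document.xsd", as in the README.
# [^>]*? keeps the match confined to the root start tag.
_ROOT_XMLNS = re.compile(r'(<req_document\b[^>]*?)\s+xmlns="req_document\.xsd"')


def strip_default_namespace(xml_text: str) -> str:
    """Remove xmlns="req_document.xsd" from the first <req_document> tag only."""
    return _ROOT_XMLNS.sub(r"\1", xml_text, count=1)


def main(paths: list[str]) -> None:
    # Rewrites each file in place (hypothetical helper; keep backups of originals).
    for p in paths:
        path = Path(p)
        text = path.read_text(encoding="utf-8")
        path.write_text(strip_default_namespace(text), encoding="utf-8")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Hypothetical usage: `python strip_ns.py corpus/*.xml`. Files that already lack the attribute are left unchanged, so running it twice is harmless.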