Add a function that ignores namespaces in the document (problem: the NS had...

Add a function that ignores namespaces in the document (the problem arose because the namespace had been removed from the example, whereas it is present in the other XML files of the corpus)

Commit 24553ca7 authored 5 years ago by David Rouquet
--- a/README.md
+++ b/README.md
@@ -30,16 +30,7 @@ Examples of an input anf outputs are provided in `./data/examples/`
 Ziped folders of "unlized" XML files of the corpus are available in the ./data folder.
-:bangbang: For some reason a namespace attribute of the root node make the script crash on the documents of the corpus.
+:bangbang: `unlizeXml.py` ignores namespaces in the XML document.
-Please modify the following :
-```
-<req_document xsi:schemaLocation="req_document.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="req_document.xsd">
-```
-to
-```
-<req_document xsi:schemaLocation="req_document.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
-```
-before submiting an xml of the corpus to the script.
 First clone the repo (or at least download the scripts folder):
 ```

--- a/scripts/unlizeXml.py
+++ b/scripts/unlizeXml.py
@@ -7,6 +7,10 @@ import tempfile
 import os
 from subprocess import Popen, PIPE, STDOUT
+def remove_namespace(doc):
+    # Strip the namespace from every element tag of the parsed document, in place
+    for elem in doc.iter():  # iter() replaces the deprecated getiterator()
+        elem.tag = etree.QName(elem.tag).localname
 def unlize(text, lang, dry_run=False):
@@ -122,6 +126,7 @@ def unl2dot(text, path):
 def unlizeXml(input, output, lang, dry_run, svg, unltools_path):
    doc = etree.parse(input)
+    remove_namespace(doc)
    tags = ['title', 'text_body', 'term', 'meaning']
    for t in tags:
        for node in doc.xpath('//'+t):
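
The reason the namespace matters can be illustrated with a small, self-contained sketch. It is not the code of this commit: it swaps `lxml` for the standard-library `xml.etree.ElementTree` so it runs without extra dependencies, and the sample document, the `strip_namespaces` helper and the variable names are assumptions made up for the illustration. It shows that an unqualified lookup such as `//title` finds nothing while the root declares a default namespace, and finds the element once the tags are reduced to their local names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the root element quoted in the README diff.
SAMPLE = (
    '<req_document xmlns="req_document.xsd" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="req_document.xsd">'
    '<title>Example requirement</title>'
    '</req_document>'
)


def strip_namespaces(root):
    """Reduce every element tag to its local name, in place (illustrative helper)."""
    for elem in root.iter():
        # ElementTree stores namespaced tags as '{namespace}localname'.
        if isinstance(elem.tag, str) and elem.tag.startswith('{'):
            elem.tag = elem.tag.split('}', 1)[1]


root = ET.fromstring(SAMPLE)

# With the default namespace in place, the unqualified name does not match.
titles_before = root.findall('.//title')

strip_namespaces(root)

# After stripping, the same unqualified lookup finds the element.
titles_after = root.findall('.//title')

print(len(titles_before), len(titles_after))
```

Stripping the namespace keeps the tag list in `unlizeXml` simple, at the cost of modifying the parsed tree; a namespace-aware query would be an alternative if the original tags had to be preserved in the output.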