Commit 24553ca7 authored by David Rouquet

Add a function that ignores namespaces in the document (this was a problem because the namespace had been removed from the example, whereas it is present in the other XML files of the corpus)
parent fda7018b
Pipeline #193 passed
@@ -30,16 +30,7 @@ Examples of an input and outputs are provided in `./data/examples/`
 Zipped folders of "unlized" XML files of the corpus are available in the ./data folder.
-:bangbang: For some reason a namespace attribute of the root node makes the script crash on the documents of the corpus.
-Please modify the following:
-```
-<req_document xsi:schemaLocation="req_document.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="req_document.xsd">
-```
-to
-```
-<req_document xsi:schemaLocation="req_document.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
-```
-before submitting an XML file of the corpus to the script.
+:bangbang: `unlizeXml.py` ignores namespaces in the XML document.
 First clone the repo (or at least download the scripts folder):
 ```
......
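The problem described in the commit message can be reproduced in a few lines. This is a minimal sketch, not code from the repository: it uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra dependencies, a trimmed-down `req_document` with invented content, and a hypothetical helper `strip_namespaces` that mirrors what `remove_namespace` does. With the default namespace `xmlns="req_document.xsd"` on the root, every tag is really `{req_document.xsd}title` and so on, so an unqualified path such as `.//title` matches nothing.

```python
import xml.etree.ElementTree as ET

# Corpus form: the root declares a default namespace (invented minimal content).
WITH_NS = (
    '<req_document xsi:schemaLocation="req_document.xsd" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xmlns="req_document.xsd">'
    '<title>T</title></req_document>'
)
# Edited example form: the default namespace declaration was removed.
WITHOUT_NS = (
    '<req_document xsi:schemaLocation="req_document.xsd" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<title>T</title></req_document>'
)

def count_titles(xml_text):
    """Count <title> elements found by a plain, unqualified path."""
    root = ET.fromstring(xml_text)
    return len(root.findall('.//title'))

def strip_namespaces(root):
    """Hypothetical helper: rewrite every '{uri}local' tag to 'local', in place."""
    for elem in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(elem.tag, str) and elem.tag.startswith('{'):
            elem.tag = elem.tag.split('}', 1)[1]

print(count_titles(WITHOUT_NS))  # 1
print(count_titles(WITH_NS))     # 0, the tag is really '{req_document.xsd}title'

root = ET.fromstring(WITH_NS)
strip_namespaces(root)
print(len(root.findall('.//title')))  # 1
```

The `isinstance` guard matters more with lxml, which keeps comments and processing instructions in the tree and gives them a non-string `tag`; the committed `remove_namespace` has no such guard.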
@@ -7,6 +7,10 @@ import tempfile
 import os
 from subprocess import Popen, PIPE, STDOUT
+def remove_namespace(doc):
+    #Remove namespace in the passed document in place
+    for elem in doc.getiterator():
+        elem.tag=etree.QName(elem.tag).localname
 def unlize(text, lang, dry_run=False):
@@ -122,6 +126,7 @@ def unl2dot(text, path):
 def unlizeXml(input, output, lang, dry_run, svg, unltools_path):
     doc = etree.parse(input)
+    remove_namespace(doc)
     tags = ['title', 'text_body', 'term', 'meaning']
     for t in tags:
         for node in doc.xpath('//'+t):
......
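An alternative that leaves the parsed tree untouched is to match elements by local name in the query itself. The sketch below again uses the standard library's `xml.etree.ElementTree` rather than lxml; its `{*}` wildcard (Python 3.8 or later) matches a tag in any namespace or in none. `find_by_local_name` is a hypothetical helper and the document content is invented.

```python
import xml.etree.ElementTree as ET

def find_by_local_name(root, local_name):
    """Hypothetical helper: return all descendants named `local_name`,
    whether they are in a namespace or not."""
    # '{*}' is an ElementPath wildcard for the namespace part (Python 3.8+).
    return root.findall('.//{*}' + local_name)

DOC = (
    '<req_document xmlns="req_document.xsd">'
    '<title>T</title><text_body>B</text_body>'
    '</req_document>'
)

root = ET.fromstring(DOC)
# Same tag list that unlizeXml iterates over.
for t in ['title', 'text_body', 'term', 'meaning']:
    print(t, len(find_by_local_name(root, t)))
# title 1
# text_body 1
# term 0
# meaning 0
```

With lxml, which the script uses, the XPath equivalent would be `doc.xpath('//*[local-name()=$name]', name=t)`. Because no tag is rewritten, if the script writes `doc` back out, its elements keep their original namespace.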