
TENET - Tool for Extraction using Net Extension by (semantic) Transduction


This tool exploits an intermediate semantic representation (UNL-RDF graphs) to construct ontology representations of NL sentences. [TODO: complete]

The processing is carried out in five stages:

  1. Initialization: TODO.
  2. UNL sentence loading: TODO.
  3. Transduction process: the UNL-RDF graphs are extended to obtain semantic nets.
  4. Classification / instantiation
  5. Reasoning
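The staged pipeline above can be sketched as follows. All function names and data shapes here are hypothetical placeholders for illustration only, not TENET's actual API:

```python
# Hypothetical sketch of TENET's staged pipeline.
# Function names and return values are illustrative, not the tool's real API.

def initialize(config_dir):
    """Stage 1: load configuration (schemas, transduction schemes)."""
    return {"config": config_dir}

def load_unl_sentences(corpus_dir):
    """Stage 2: parse UNL sentences into UNL-RDF graphs (stubbed here)."""
    return [f"graph({corpus_dir}/sentence-{i})" for i in range(1, 3)]

def transduce(graphs):
    """Stage 3: extend UNL-RDF graphs to obtain semantic nets."""
    return [g + " + semantic-net" for g in graphs]

def classify_and_instantiate(nets):
    """Stage 4: derive classes and instances from the semantic nets."""
    return {"classes": nets, "instances": nets}

def reason(ontology):
    """Stage 5: apply a reasoner to the extracted terminology."""
    return ontology

def run_pipeline(corpus_dir, config_dir="config"):
    initialize(config_dir)
    graphs = load_unl_sentences(corpus_dir)
    nets = transduce(graphs)
    ontology = classify_and_instantiate(nets)
    return reason(ontology)
```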

[TODO: complete the description]

1 - Implementation

This implementation is written in the Python language, with UNL as the pivot structure.

[TODO: talk about UNL-RDF graph (obtained using UNL-RDF schemas)]

The following module is included as the main process:

  1. Semantic Transduction Process (stp) for semantic analysis with transduction schemes

The Python script tenet.py manages the tool's commands, using components from the scripts directory. The data to be processed must be placed in the corpus directory. All working data, including the results, is processed in the workdata directory.

The transduction process configuration includes an ontology definition for semantic nets, and several transduction schemes expressed as SPARQL requests.
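As an illustration, a transduction scheme can take the shape of a SPARQL CONSTRUCT request that rewrites part of a UNL-RDF graph into semantic-net nodes. The prefixes, classes, and properties below are hypothetical sketches, not the actual vocabulary of unl-rdf-schema.ttl or smenet.ttl:

```sparql
# Illustrative transduction scheme (hypothetical vocabulary):
# derive a semantic-net node from each UNL universal word.
PREFIX unl: <https://unl.example/schema#>
PREFIX net: <https://tenet.example/semnet#>

CONSTRUCT {
  ?node a net:AtomNode ;
        net:coverBaseNode ?uw .
}
WHERE {
  ?uw a unl:UW .
  BIND (IRI(CONCAT(STR(?uw), "-net")) AS ?node)
}
```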

2 - Environment Setup

[TODO: external module sources?]

The Python code has been tested with Python 3.7. All dependencies are listed in requirements.txt; they cover the external modules used by the tool.

The input directories contain evaluation files with some test corpora.

3 - Content

The config directory contains various configuration files for the process:

  • unl-rdf-schema.ttl: RDF schema for the interpretation of UNL graphs
  • smenet.ttl: RDF schema of the semantic rules

The corpus directory contains the corpora to be processed.

The frame directory contains files defining frame ontologies. These establish target frames to be completed by the extraction process, making it possible to obtain the expected representations.
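As an illustration, a frame ontology file declares the target classes and properties that the extraction process is expected to populate. The vocabulary below is a hypothetical sketch, not the content of an actual frame file:

```ttl
# Hypothetical frame ontology sketch (illustrative names only).
@prefix frame: <https://tenet.example/frame#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .

frame:Entity a owl:Class ;
    rdfs:label "Entity" .

frame:Feature a owl:Class ;
    rdfs:label "Feature" .

frame:hasFeature a owl:ObjectProperty ;
    rdfs:domain frame:Entity ;
    rdfs:range  frame:Feature .
```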

4 - Execution

The application runs in a terminal using the tenet.py script: python3 tenet.py.

This prototype was tested with a standard computer configuration. The processing time is reasonable for all processing steps.

The following times were measured for the processing of a file of 10 sentences:

  • about xxx seconds for initialization and UNL sentence loading;
  • about xxx seconds for the transduction, classification and instantiation process;
  • about xxx seconds for the reasoning process.

5 - Commands

The following commands are provided to execute the different steps of the process:

  • select: command to select a corpus.
  • load: command to load the UNL sentences of a given corpus.
  • extract: command to extract terminology data from the UNL-RDF graphs.
  • reason: command to reason on terminology.
  • clean: command to clean the working directories.

These commands are invoked through the Python script tenet.py.
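A typical session might chain these commands as below. The exact argument syntax is an assumption for illustration, not a verified interface:

```shell
# Illustrative session (argument syntax is assumed, not verified):
python3 tenet.py select my-corpus    # choose a corpus from the corpus directory
python3 tenet.py load                # load its UNL sentences as UNL-RDF graphs
python3 tenet.py extract             # run the transduction / extraction process
python3 tenet.py reason              # reason over the extracted terminology
python3 tenet.py clean               # reset the working directories
```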

6 - Example

[TODO: end-to-end example]

7 - Evaluation

[TODO: experimentation]


References

