diff --git a/README.md b/README.md
index bf4441a0fd6471bc1d255fb4e516b8cac51a5262..e28acb3a0ac7f1ff22884184c8046952d8b18dae 100644
--- a/README.md
+++ b/README.md
@@ -1,95 +1,45 @@
 # TENET - Tool for Extraction using Net Extension by (semantic) Transduction
 -------------------------------------------------------------------------------
 
-This tool exploits an intermediate semantic representation (UNL-RDF graphs) to
-construct an ontology representations of NL sentences. [TODO: compléter]
+TENET is a Python library for automatically constructing logical representations (OWL ontologies) from textual documents. It is built on the W3C Semantic Web standards (RDF, OWL, SPARQL). As input, it requires a set of pivot structures (AMR graphs) representing the document to be analysed; as output, it produces a set of RDF-OWL triples forming an ontology composed of classes, properties, instances and logical relations between these elements.
 
-The treatment is carried out in two stages:
- 1. Initialization: TODO.
- 2. UNL sentences Loading: TODO.
- 3. Transduction Process: the UNL-RDF graphs are extended to obtain semantic nets.
- 4. Classification / Instanciation
- 5. Reasonning
-
-[TODO: compléter la description]
-
-
-## 1 - Implementation
-
-This implementation was made using Python languages, with UNL as pivot structure.
-
-[TODO: talk about UNL-RDF graph (obtained using UNL-RDF schemas)]
-
-The following module is included as main process:
-
- 1. Semantic Transduction Process (stp) for semantic analysis with transduction schemes
-
-The python script _tenet.py_ is used to manage the tool's commands, using components of the directory _scripts_.
-The data to be processed must be placed in the directory _corpus_. All working data, including the results,
-will be processed in the directory _workdata_.
-
-Transduction process configuration includes an ontology definition for semantic net,
-and several transduction schemes as SPARQL request.
-
-
-## 2 - Environment Setup
+## 1 - Environment Setup
 
-[TODO: external module souces?]
-
-The python code has been tested on Python 3.7.
-All dependencies are listed in requirements.txt. These dependencies are used for external modules.
+The Python code has been tested under Python 3.7 on Manjaro Linux, but should run on most common systems (Linux, Windows, macOS).
+All external dependencies are listed in **requirements.txt**.
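+They can be installed with pip, for example:
+
+    pip install -r requirements.txt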
 
-The input directories contain evaluation files with some test corpus.
+The **test** directory contains evaluation files with some test corpora.
 
-## 3 - Content
-
-The **config** directory contains various configuration files for the process:
-
- - **unl-rdf-schema.ttl**: RDF schema for the interpretation of UNL graphs
- - **smenet.ttl**: RDF schema of the semantic rules
-
-The **corpus** directory contains the corpora to be processed.
-
-The **frame** directory contains files defining frame ontologies. These establish target frames to be completed
-by the extraction process, making it possible to obtain the expected representations.
-
-## 4 - Execution
+## 2 - Library Usage
 
-The application runs in a terminal using the tenet.py script: **python3 tenet.py <command> <args>**.
+The script **test_tenet_main.py** (in the **test** directory) gives an example of how to use the library.
 
-This prototype was tested with a standard computer configuration. The processing time is reasonable for both processing steps.
+Two main functions are provided to create an ontology, either from a single file in amrlib format or from a directory containing several such files. Both take the path of the file or directory to be processed as their first parameter, plus some optional parameters (**onto_prefix**, **out_file_path**, **technical_dir_path**); a complete, commented example is sketched in section 4 below.
 
-The following times were measured for the processing of a file of 10 sentences:
+The following code can be used to create an ontology from an amrlib file:
 
- - about xxx seconds for initialization and UNL sentences loading;
- - about xxx second for transduction, classification and instanciation process;
- - about xxx second for reasonning process.
+    import tenet
+    factoids = tenet.create_ontology_from_amrld_file(amrld_file_path,
+                                                     onto_prefix=<onto_prefix>,
+                                                     out_file_path=<out_file_path>,
+                                                     technical_dir_path=<technical_dir_path>)
+
+The following code can be used to create an ontology from a directory of amrlib files:
 
-## 5 - Commands
-
-Following commands are proposed to execute the different steps of the process:
-
- - **select**: command to select a corpus.
- - **load**: command to load the UNL sentences of a given corpus.
- - **extract**: command to extract terminologies data from UNL-RDF graph.
- - **reason**: command to reason on terminology.
- - **clean**: command to clean the working directories.
-
-These commands are used with the python script _tenet.py_.
-
-## 6 - Example
-
-[TODO: end-to-end example]
+    import tenet
+    factoids = tenet.create_ontology_from_amrld_dir(amrld_dir_path,
+                                                    onto_prefix=<onto_prefix>,
+                                                    out_file_path=<out_file_path>,
+                                                    technical_dir_path=<technical_dir_path>)
 
+## 3 - Content
+
+TODO
 
-## 7 - Evaluation
-[TODO: experimentation]
--------------------------------------------------------------------------------
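+
+## 4 - Example
+
+A minimal end-to-end sketch is given below. The input path, the ontology prefix and the output path are illustrative assumptions, not values imposed by the library, and the optional **technical_dir_path** parameter is simply omitted:
+
+    import tenet
+
+    # Input: an AMR representation of the document to analyse, in amrlib
+    # format (the path below is purely illustrative)
+    amrld_file_path = 'test/my-document.amr.ttl'
+
+    # Build the ontology: 'factoids' receives the extracted RDF-OWL triples,
+    # which should also be serialized to the given output file
+    factoids = tenet.create_ontology_from_amrld_file(amrld_file_path,
+                                                     onto_prefix='ex',
+                                                     out_file_path='my-document-ontology.ttl')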