    CM-Tool: Corpus Making Tool


    This repository gathers programs for obtaining experimental data and building corpora on various topics.

    Source

    The "source" directory contains the source data: raw text extracted from DBpedia.

    Data

    The "data" directory contains the data in different representations:

    • sequences of sentences ('dataRef.sentence.txt')
    • AMR graphs ('dataRef.amr.graph')
    • AMR linked data ('dataRef.amr.rdf')

    These data were obtained from the source data by applying the script 'convert_text_to_amr.py'.
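As an illustration of the naming scheme above, the following sketch builds the three expected file paths for a given data reference. It assumes that 'dataRef' in the file names stands for the reference passed to the script (for example, 'test') and that the files sit directly in the "data" directory. The helper name `data_paths` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Suffixes taken from the file names listed above.
SUFFIXES = {
    "sentences": ".sentence.txt",
    "amr_graph": ".amr.graph",
    "amr_rdf": ".amr.rdf",
}


def data_paths(data_ref, data_dir="data"):
    """Return the expected path of each representation for one data reference.

    Assumption: files are named '<data_ref><suffix>' inside `data_dir`.
    """
    base = Path(data_dir)
    return {name: base / f"{data_ref}{suffix}" for name, suffix in SUFFIXES.items()}


if __name__ == "__main__":
    # Print the three expected paths for the 'test' data reference.
    for name, path in data_paths("test").items():
        print(f"{name}: {path}")
```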

    Script <convert_text_to_amr.py>

    This script converts raw texts into AMR representations. It can be adapted as needed. In particular, its parameters can be adjusted to specify the data to be processed.
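The core of such a conversion can be sketched with amrlib's sentence-to-graph model. This is a minimal sketch, not the actual content of 'convert_text_to_amr.py': the helper names `load_parser` and `sentences_to_amr` are hypothetical, and it assumes that amrlib and a parse model are installed as described in the Installation section.

```python
def load_parser(model_dir=None):
    """Load an amrlib sentence-to-graph model (the default one if no directory is given)."""
    # Imported lazily so that sentences_to_amr can be used without amrlib installed.
    import amrlib

    if model_dir is None:
        return amrlib.load_stog_model()
    return amrlib.load_stog_model(model_dir=model_dir)


def sentences_to_amr(sentences, parser):
    """Parse the non-empty sentences and return one AMR graph string per sentence.

    `parser` is any object with a `parse_sents(list_of_str)` method,
    such as the model returned by `load_parser`.
    """
    cleaned = [s.strip() for s in sentences if s.strip()]
    if not cleaned:
        return []
    return parser.parse_sents(cleaned)
```

With a model installed, `sentences_to_amr(["The boy wants to go."], load_parser())` would return a list containing one AMR graph string. Passing the parser as an argument keeps model loading, which is slow, separate from the per-sentence processing.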

    Installation

    This project was developed with Python 3 on Manjaro Linux, but it should run on any common system.

    First, it is recommended to use a virtual environment. For example, 'ssc-env' can be created and activated with the following commands:

    python3 -m venv ssc-env
    
    source ssc-env/bin/activate

    The necessary libraries are listed in the file 'requirements.txt'. They can be installed in the virtual environment using a package installer such as pip:

    pip install -r requirements.txt

    See the amrlib-specific installation instructions (amrlib-install).

    In addition, the models used by the amrlib library must be installed. Models can be downloaded from amrlib-models.

    These files need to be extracted into the amrlib install directory under amrlib/data, and the default parse model should be named model_stog. The default model is loaded with 'stog = amrlib.load_stog_model()'. To keep multiple models of the same type, supply the directory name when loading, i.e. 'stog = amrlib.load_stog_model(model_dir=".../amrlib/data/model_parse_t5-v0_1_0")'.

    Usage

    Parameters can be adjusted in the script code as needed. The script takes a data reference as its argument (for example, 'test'). It can be run from the command line:

    python3 convert_text_to_amr.py test
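For readers adapting the script, one way to read such a positional argument is with the standard `argparse` module. This is only a sketch: how 'convert_text_to_amr.py' actually reads its argument is not shown here, and the names `parse_args` and `data_ref` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; the single positional argument is the data reference."""
    parser = argparse.ArgumentParser(
        description="Convert raw text into AMR representations."
    )
    parser.add_argument("data_ref", help="reference of the data to process, e.g. 'test'")
    # With argv=None, argparse reads the arguments from sys.argv[1:].
    return parser.parse_args(argv)
```

Calling `parse_args()` with no argument reads the real command line, while passing an explicit list such as `["test"]` makes the function easy to exercise without launching the script.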

    Library

    The "lib" directory contains supporting library code.

    References


    amrlib: A Python library that makes AMR parsing, generation and visualization simple.

    amr-ld: A Python library for mapping AMRs to linked data formats (such as RDF and JSON-LD).

    Burns, G.A., Hermjakob, U., Ambite, J.L. (2016). Abstract Meaning Representations as Linked Data. In: The Semantic Web – ISWC 2016. Lecture Notes in Computer Science, vol 9982. Springer, Cham. https://doi.org/10.1007/978-3-319-46547-0_2

    DBpedia: A crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects.