
Solar System Corpus


This repository gathers experimental data about the solar system, and some useful scripts/programs to obtain this data.

Data

The data is organized into the following folders:

  • abstractText: raw text data from DBPedia
  • amrGraph: sentence representations as AMR graphs
  • amrLk: sentence representations as AMR Linked Data

Script: <convert_text_to_amr.py>

This script converts raw text into AMR representations. It can be adapted as needed; in particular, parameters can be adjusted to specify the data to be processed.

Installation

This project was developed with Python 3 on a Manjaro Linux system, but it should run on any common system.

First, it is recommended to use a virtual environment. For example, 'ssc-env' can be created and activated with the following commands:

python3 -m venv ssc-env

source ssc-env/bin/activate

The necessary libraries are listed in the file 'requirements.txt'. They can be installed in the virtual environment using a package installer such as pip:

pip install -r requirements.txt
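
After installation, it can be useful to confirm that the libraries are importable from the active environment. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and only 'amrlib' is named here because the full content of 'requirements.txt' is not shown in this document.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["amrlib"])` returns an empty list when amrlib is installed in the current environment, and `["amrlib"]` otherwise.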

See the amrlib installation instructions (amrlib-install) for details specific to that library.

The models used by the amrlib library must also be installed. They can be downloaded from amrlib-models.

These files need to be extracted into the amrlib install directory under amrlib/data, and the directory holding the default parse model should be named model_stog. The default model is loaded with `stog = amrlib.load_stog_model()`. To keep multiple models of the same type, supply the directory name when loading, i.e. `stog = amrlib.load_stog_model(model_dir='.../amrlib/data/model_parse_t5-v0_1_0')`.
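
As an illustration of the loading step, here is a minimal sketch of how a parse model might be loaded and applied to a few sentences. It relies on `amrlib.load_stog_model` and the `parse_sents` method of the returned model; the helper names `load_parser` and `sentences_to_amr` are hypothetical and are not taken from convert_text_to_amr.py.

```python
from typing import Callable, List, Optional, Tuple


def load_parser(model_dir: Optional[str] = None) -> Callable[[List[str]], List[str]]:
    """Load an amrlib sentence-to-graph model and return its parsing function."""
    # Imported lazily so this module can be imported without amrlib installed.
    import amrlib

    if model_dir is None:
        # Default model, expected under amrlib/data/model_stog.
        stog = amrlib.load_stog_model()
    else:
        # Explicit model directory, useful when several models are installed.
        stog = amrlib.load_stog_model(model_dir=model_dir)
    return stog.parse_sents


def sentences_to_amr(
    sentences: List[str],
    parse_fn: Callable[[List[str]], List[str]],
) -> List[Tuple[str, str]]:
    """Pair each input sentence with the AMR graph string produced for it."""
    graphs = parse_fn(sentences)
    return list(zip(sentences, graphs))
```

With the models installed, `sentences_to_amr(["The Sun is a star."], load_parser())` would return one (sentence, graph) pair. Passing the parsing function as an argument keeps the pairing logic usable with any parser, including a stub.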

Usage

Parameters can be adjusted in the script code as needed. The script takes a data reference as its argument (for example, 'test'). It can be run from the command line:

python3 convert_text_to_amr.py test
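
To make the role of the data reference concrete, the sketch below shows one way a script could read it and derive file locations in the three data folders. This is an assumption for illustration only: the function names, the file extensions and the one-file-per-reference layout are hypothetical and may differ from what convert_text_to_amr.py actually does.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Read the data reference (for example 'test') from the command line."""
    parser = argparse.ArgumentParser(
        description="Convert raw text to AMR representations."
    )
    parser.add_argument("data_ref", help="reference of the data to process")
    return parser.parse_args(argv)


def corpus_paths(data_ref, root=Path(".")):
    """Map a data reference to one file per data folder.

    The folder names come from the README; the file extensions are
    hypothetical placeholders.
    """
    return {
        "text": root / "abstractText" / f"{data_ref}.txt",
        "graph": root / "amrGraph" / f"{data_ref}.amr",
        "linked_data": root / "amrLk" / f"{data_ref}.ttl",
    }
```

Calling `corpus_paths(parse_args().data_ref)` would then give the input and output locations for the reference passed on the command line.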

References


amrlib: A Python library that makes AMR parsing, generation and visualization simple.

amr-ld: A Python library for mapping AMRs to linked data formats (such as RDF and JSON-LD).

Burns, G.A., Hermjakob, U., Ambite, J.L. (2016). Abstract Meaning Representations as Linked Data. In: The Semantic Web – ISWC 2016. Lecture Notes in Computer Science, vol 9982. Springer, Cham. https://doi.org/10.1007/978-3-319-46547-0_2

DBPedia: A crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects.