
solar-system-corpus

    data
    source
    .gitignore
    README.md

    Solar System Corpus


    This repository gathers experimental data about the solar system.

    Source

    The "source" directory contains the source data, which is raw text from DBpedia.

    Data

    The "data" directory contains the data in different representations. The data are organized in sub-folders, each corresponding to one sentence, according to the following convention:

    • a first level of folders, SSC-01 to SSC-23, corresponding to the 23 original sentences of the Solar System abstract
    • a second level, inside each SSC-XX folder, of sub-folders named SSC-XX-01, SSC-XX-02, etc., corresponding to several sentences with similar meanings

    For example, SSC-01-01 corresponds to the first sentence of the original English abstract, while SSC-01-02, SSC-01-03, etc. correspond to sentences whose meaning is similar (but not identical) to that first sentence. SSC-02-01 corresponds to the second sentence, and so on.
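The naming convention above can be checked mechanically. The following is a minimal sketch, not part of the repository: the helper names `parse_folder` and `is_original` are hypothetical, and it assumes that indices are always written with exactly two digits, as in the examples.

```python
import re

# Assumed pattern: "SSC-XX" for a first-level folder, "SSC-XX-YY" for a sub-folder.
FOLDER_RE = re.compile(r"SSC-(\d{2})(?:-(\d{2}))?")


def parse_folder(name):
    """Return (sentence, variant) as ints; variant is None for a first-level folder."""
    match = FOLDER_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not an SSC folder name: {name!r}")
    sentence = int(match.group(1))
    # The README states that the first level runs from SSC-01 to SSC-23.
    if not 1 <= sentence <= 23:
        raise ValueError(f"sentence index out of range: {name!r}")
    variant = int(match.group(2)) if match.group(2) else None
    return sentence, variant


def is_original(name):
    """True when the sub-folder holds the original sentence (variant 01)."""
    return parse_folder(name)[1] == 1


if __name__ == "__main__":
    for folder in ("SSC-01", "SSC-01-01", "SSC-01-02", "SSC-02-01"):
        print(folder, parse_folder(folder), is_original(folder))
```

Treating variant 01 as the original sentence follows the SSC-01-01 and SSC-02-01 examples; the README does not state this as a rule for every folder.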

    For each sentence (in each sub-folder SSC-XX-YY), there are several files corresponding to different representations of the sentence.

    Some of these data were obtained from the sources using the cm-tool project:

    • AMR Graph ('dataRef.stog.amr.graph')
    • AMR Linked Data in Turtle format ('dataRef.amr.ttl')
    • AMR Linked Data in N-Triples format ('dataRef.amr.nt')

    Other data were produced by hand:

    • factoids extracted from the sentence, as OWL representations ('dataRef.owl.ttl' or 'dataRef.xx.owl.ttl')
    • CTS (Compositional Transduction Schemes) usable to obtain factoids from the AMR graph ('dataRef.cts.txt')
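The file types listed above can be told apart by their suffix. The sketch below is only an illustration: the function `representation` and the origin labels are hypothetical, and it assumes that 'dataRef' and 'xx' are placeholders, so that only the trailing suffix identifies the representation.

```python
# Suffix table transcribed from the two lists above: (suffix, representation, origin).
SUFFIXES = [
    (".stog.amr.graph", "AMR graph", "cm-tool"),
    (".amr.ttl", "AMR Linked Data (Turtle)", "cm-tool"),
    (".amr.nt", "AMR Linked Data (N-Triples)", "cm-tool"),
    (".owl.ttl", "OWL factoids", "by hand"),
    (".cts.txt", "CTS", "by hand"),
]


def representation(filename):
    """Return (representation, origin) for a known suffix, or None otherwise."""
    for suffix, label, origin in SUFFIXES:
        if filename.endswith(suffix):
            return label, origin
    return None


if __name__ == "__main__":
    for name in ("dataRef.stog.amr.graph", "dataRef.xx.owl.ttl", "README.md"):
        print(name, "->", representation(name))
```

Matching on the suffix rather than the full name lets both 'dataRef.owl.ttl' and 'dataRef.xx.owl.ttl' map to the same representation.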