
Solar System Corpus


This repository gathers experimental data about the solar system.

Source

The "source" directory contains the source data: raw text taken from DBpedia.

Data

The "data" directory contains the data in different representations. The data are organized in sub-folders, one folder per sentence, according to the following convention:

  • a first level, from SSC-01 to SSC-23, corresponding to the 23 original sentences of the solar system abstract
  • a second level (inside each SSC-XX folder) with several sub-folders, named SSC-XX-01, SSC-XX-02, etc., corresponding to several sentences with a similar meaning

For example, SSC-01-01 corresponds to the first sentence of the original English abstract, while SSC-01-02, SSC-01-03, etc. correspond to sentences whose meaning is similar (but not identical) to that first sentence. SSC-02-01 corresponds to the second sentence, and so on.
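The naming convention above can be checked mechanically. The following is a minimal sketch, not part of the repository: the names `SentenceRef`, `parse_sentence_folder` and `is_original` are hypothetical, and the two-digit format and the 1 to 23 range are taken from the convention described above.

```python
import re
from typing import NamedTuple


class SentenceRef(NamedTuple):
    """Hypothetical helper type: position of a sentence folder in the corpus."""
    original: int  # XX: index of the original sentence (1 to 23)
    variant: int   # YY: 1 for the original sentence, 2 and up for similar ones


# Assumption: folder names are exactly "SSC-XX-YY" with two-digit numbers.
_FOLDER_PATTERN = re.compile(r"SSC-(\d{2})-(\d{2})")


def parse_sentence_folder(name: str) -> SentenceRef:
    """Split a folder name such as 'SSC-01-02' into its two levels."""
    match = _FOLDER_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a sentence folder name: {name!r}")
    original, variant = int(match.group(1)), int(match.group(2))
    if not 1 <= original <= 23:
        raise ValueError(f"original sentence index out of range: {original}")
    if variant < 1:
        raise ValueError(f"variant index must start at 01: {variant}")
    return SentenceRef(original, variant)


def is_original(ref: SentenceRef) -> bool:
    """True when the folder holds the sentence of the original abstract."""
    return ref.variant == 1
```

With this sketch, `parse_sentence_folder("SSC-01-02")` yields the first original sentence with variant 2, that is, a sentence similar to the first sentence of the abstract.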

For each sentence (in each sub-folder SSC-XX-YY), there are several files, each corresponding to a different representation of the sentence.

Some of these data were obtained from the sources using the cm-tool project:

  • AMR Graph ('dataRef.stog.amr.graph')
  • AMR Linked Data in Turtle format ('dataRef.amr.ttl')
  • AMR Linked Data in N-Triples format ('dataRef.amr.nt')

Other data were produced by hand:

  • factoids from the sentence as OWL representations ('dataRef.owl.ttl' or 'dataRef.xx.owl.ttl')
  • CTS (Compositional Transduction Schemes) usable to obtain factoids from the AMR Graph ('dataRef.cts.txt')
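The file names listed above can be told apart by their suffix. The following is a minimal sketch under that assumption: the `classify` function and the labels are hypothetical, and only the suffixes come from the lists above.

```python
from typing import Optional

# Suffixes taken from the file names listed above; the labels are illustrative.
# Each file name is matched against these suffixes in the order given.
REPRESENTATIONS = [
    (".stog.amr.graph", "AMR Graph"),
    (".amr.ttl", "AMR Linked Data (Turtle)"),
    (".amr.nt", "AMR Linked Data (N-Triples)"),
    (".owl.ttl", "OWL factoids"),
    (".cts.txt", "CTS"),
]


def classify(filename: str) -> Optional[str]:
    """Return the representation label for a file name, or None if unknown."""
    for suffix, label in REPRESENTATIONS:
        if filename.endswith(suffix):
            return label
    return None
```

Matching on the suffix rather than on the full name means that both 'dataRef.owl.ttl' and 'dataRef.xx.owl.ttl' are recognized as OWL factoids.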