Solar System Corpus
This repository gathers experimental data about the solar system.
Source
The "source" directory contains the source data, which is raw text taken from DBpedia:
- test: a few simple sentences for testing
- solar-system: the English abstract from https://dbpedia.org/page/Solar_System
Data
The "data" directory contains the data in different representations. The data are organized in sub-folders, one folder per sentence, according to the following convention:
- a first level from SSC-01 to SSC-23, corresponding to the 23 original sentences of the Solar System abstract
- a second level (inside each folder SSC-XX) with several sub-folders, denoted SSC-XX-01, SSC-XX-02, etc., each corresponding to a sentence with a similar meaning
For example, SSC-01-01 corresponds to the first sentence of the original English abstract, while SSC-01-02, SSC-01-03, etc. correspond to sentences whose meaning is similar (but not identical) to that first sentence. SSC-02-01 corresponds to the second sentence, and so on.
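The naming convention above can be sketched in Python. This is only an illustration, not a tool shipped with the repository: the function names `parse_sentence_id` and `is_original` are hypothetical, and the two-digit, zero-padded form of both numbers is assumed from the examples given here.

```python
import re

# Pattern assumed from the examples above: SSC-XX-YY with two-digit fields.
SENTENCE_ID = re.compile(r"^SSC-(\d{2})-(\d{2})$")


def parse_sentence_id(folder_name):
    """Return (sentence_number, variant_number) for a name like 'SSC-01-02'."""
    match = SENTENCE_ID.match(folder_name)
    if match is None:
        raise ValueError(f"not a sentence folder name: {folder_name!r}")
    return int(match.group(1)), int(match.group(2))


def is_original(folder_name):
    """True when the folder holds the original abstract sentence (variant 01)."""
    return parse_sentence_id(folder_name)[1] == 1
```

With this sketch, `parse_sentence_id("SSC-01-02")` gives sentence 1, variant 2, and `is_original("SSC-02-01")` is true because variant 01 is the original sentence.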
For each sentence (in each sub-folder SSC-XX-YY), there are several files corresponding to different representations of the sentence.
Some of these data were obtained from the sources using the cm-tool project:
- AMR Graph ('dataRef.stog.amr.graph')
- AMR Linked Data in Turtle format ('dataRef.amr.ttl')
- AMR Linked Data in N-Triples format ('dataRef.amr.nt')
Other data were produced by hand:
- factoids from the sentence as OWL representations ('dataRef.owl.ttl' or 'dataRef.xx.owl.ttl')
- CTS (Compositional Transduction Schemes) usable to obtain factoids from the AMR Graph ('dataRef.cts.txt')
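As a rough illustration of how these files might be told apart, the sketch below groups the files of one sentence folder by representation. It matches on file-name suffixes only, because how the 'dataRef' prefix is filled in is not specified here; the names `SUFFIXES`, `classify_file` and `list_representations` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Suffixes taken from the file names listed above; labels are illustrative.
SUFFIXES = [
    (".stog.amr.graph", "AMR graph"),
    (".amr.ttl", "AMR Linked Data (Turtle)"),
    (".amr.nt", "AMR Linked Data (N-Triples)"),
    (".owl.ttl", "OWL factoids"),
    (".cts.txt", "CTS"),
]


def classify_file(file_name):
    """Return the representation label for a file name, or None if unknown."""
    for suffix, label in SUFFIXES:
        if file_name.endswith(suffix):
            return label
    return None


def list_representations(sentence_dir):
    """Map each representation label to the matching file names in a folder."""
    result = {}
    for path in sorted(Path(sentence_dir).iterdir()):
        label = classify_file(path.name)
        if path.is_file() and label is not None:
            result.setdefault(label, []).append(path.name)
    return result
```

Calling `list_representations` on a sub-folder such as SSC-01-01 would return a dictionary keyed by representation, which makes it easy to check which representations exist for a given sentence.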