Author: Aurélien Lamercerie
Solar System Corpus
This repository gathers experimental data about the solar system.
Source
The "source" directory contains the source data, which is raw text taken from DBpedia:
- test: a few simple sentences for testing
- solar-system: the English abstract from https://dbpedia.org/page/Solar_System
Data
The "data" directory contains the data in different representations. The data are organized in sub-folders, one per sentence, according to the following convention:
- a first level, from SSC-01 to SSC-23, corresponding to the 23 original sentences of the Solar System abstract
- a second level (inside each SSC-XX folder) with several sub-folders, named SSC-XX-01, SSC-XX-02, etc., corresponding to several sentences with similar meaning
For example, SSC-01-01 corresponds to the first sentence of the original English abstract, while SSC-01-02, SSC-01-03, etc. correspond to sentences whose meaning is similar (but not identical) to that first sentence. SSC-02-01 corresponds to the second sentence, and so on.
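The naming convention above can be decoded mechanically. The following is a minimal sketch, not part of the repository: the function name `parse_ssc_id` is hypothetical, and it assumes both numeric fields are always written with exactly two digits, as in the examples.

```python
import re

# Variant folder names look like "SSC-01-02" (two two-digit fields assumed).
SSC_ID = re.compile(r"^SSC-(\d{2})-(\d{2})$")

def parse_ssc_id(name):
    """Split a folder name into (sentence number, variant number, is_original).

    is_original is True for the -01 variant, which by the convention above
    holds the sentence as it appears in the original abstract.
    """
    match = SSC_ID.match(name)
    if match is None:
        raise ValueError(f"not an SSC-XX-YY folder name: {name!r}")
    sentence = int(match.group(1))
    variant = int(match.group(2))
    return sentence, variant, variant == 1
```

For instance, `parse_ssc_id("SSC-02-01")` returns `(2, 1, True)`, while a first-level name such as `"SSC-02"` raises `ValueError`.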
For each sentence (in each sub-folder SSC-XX-YY), there are several files, each corresponding to a different representation of the sentence.
Some of these files were generated from the sources using the cm-tool project:
- AMR Graph ('dataRef.stog.amr.graph')
- AMR Linked Data in Turtle format ('dataRef.amr.ttl')
- AMR Linked Data in N-Triples format ('dataRef.amr.nt')
Other files were produced by hand:
- factoids extracted from the sentence, as OWL representations ('dataRef.owl.ttl' or 'dataRef.xx.owl.ttl')
- CTS (Compositional Transduction Schemes) that can be used to obtain the factoids from the AMR Graph ('dataRef.cts.txt')
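The file-name suffixes listed above are enough to tell the representations apart. The sketch below is illustrative only: `SUFFIXES`, `classify` and `group_by_representation` are hypothetical names, and the code assumes that matching on the suffix is sufficient, since this README does not specify what the `dataRef` prefix stands for.

```python
from pathlib import Path

# Suffix -> representation, taken from the file names listed above.
# Matching is by suffix only, so the "dataRef" prefix is never inspected.
SUFFIXES = [
    (".stog.amr.graph", "AMR graph"),
    (".amr.ttl", "AMR Linked Data (Turtle)"),
    (".amr.nt", "AMR Linked Data (N-Triples)"),
    (".owl.ttl", "OWL factoids"),  # also covers 'dataRef.xx.owl.ttl'
    (".cts.txt", "CTS"),
]

def classify(filename):
    """Return the representation name for a file, or None if unrecognized."""
    for suffix, kind in SUFFIXES:
        if filename.endswith(suffix):
            return kind
    return None

def group_by_representation(folder):
    """Map each representation to the sorted file names found in one folder."""
    groups = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            kind = classify(path.name)
            if kind is not None:
                groups.setdefault(kind, []).append(path.name)
    return groups
```

Calling `group_by_representation` on a sentence folder (a path such as `data/SSC-01/SSC-01-01` is assumed here from the convention described earlier) yields a dictionary keyed by representation; files with other suffixes are ignored.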