You can generate sample RDF data using bsbmtools, the tools for BSBM (the Berlin SPARQL Benchmark).
Prerequisites
Check out and build bsbmtools
This section is an unofficial update of the corresponding section of the blazegraph/database repo, which is out of date. Stardog has no affiliation with that repo.
Run the following commands:
svn checkout svn://svn.code.sf.net/p/bsbmtools/code/trunk bsbmtools-code
cd bsbmtools-code
ant
Before running ant, update the build.xml file in the checkout directory to change the following lines:
<property name="java.source" value="1.6"/>
<property name="java.target" value="1.6"/>
to:
<property name="java.source" value="1.8"/>
<property name="java.target" value="1.8"/>
Without this change, ant will expect Java 6. Your system should have Java 11 installed (it is required for running Stardog), so the change above makes ant expect Java 8 instead, which a Java 11 JDK can build for.
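If you would rather script the build.xml change than edit the file by hand, something like the following should work. It is a minimal sketch that assumes the two property lines appear in build.xml exactly as quoted above. The function name set_java_level is made up for illustration and is not part of bsbmtools.

```shell
# Hypothetical helper: switch the java.source/java.target properties in a
# build.xml from 1.6 to 1.8. Assumes the lines look exactly like the ones
# quoted above. The original file is kept as <file>.bak.
set_java_level() {
  sed -i.bak \
    -e 's/name="java\.source" value="1\.6"/name="java.source" value="1.8"/' \
    -e 's/name="java\.target" value="1\.6"/name="java.target" value="1.8"/' \
    "$1"
}

# Usage, from the directory that contains build.xml:
#   set_java_level build.xml && ant
```

If the properties are written differently in your checkout, the substitutions simply will not match and the file is left unchanged, so check the result before running ant.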
Generate a dataset
This section follows the corresponding section of the blazegraph/database repo.
Run the following commands to generate a (roughly) 100 million triple dataset:
mkdir td_100m
./generate -fc -pc 284826 -fn td_100m/dataset -dir td_100m/td_data
gzip td_100m/dataset.nt
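To check that the generator produced roughly the expected number of triples, you can count the lines of the compressed file. This is a sketch that relies on the N-Triples format placing one triple on each line. The function name count_triples is an illustrative choice.

```shell
# Count the triples in a gzipped N-Triples file. N-Triples puts one triple on
# each line, so counting the non-blank, non-comment lines gives the count.
count_triples() {
  gzip -dc "$1" | grep -c -v -e '^[[:space:]]*$' -e '^[[:space:]]*#'
}

# Usage:
#   count_triples td_100m/dataset.nt.gz
```

On a dataset of this size the count takes a while, since the whole file has to be decompressed.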
100 million triples generated this way is roughly 24GB. To generate a larger or smaller dataset, scale the number passed to the -pc flag by the corresponding factor. For example, to generate a 200M triple dataset, roughly double the number used with -pc:
mkdir td_200m
./generate -fc -pc 566496 -fn td_200m/dataset -dir td_200m/td_data
gzip td_200m/dataset.nt
To generate a 10M triple dataset, divide the number used with -pc by 10:
mkdir td_10m
./generate -fc -pc 28483 -fn td_10m/dataset -dir td_10m/td_data
gzip td_10m/dataset.nt
The size of the generated file changes by roughly the same factor (e.g., a 10M triple dataset will be about 2.5GB).
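The scaling rule can be written as a small helper that estimates the -pc value for any target triple count. This is only a sketch: it assumes triple count grows linearly with -pc, using the 100M baseline of 284826 from above, and the name pc_for_triples is hypothetical. Linear scaling is an approximation. For 200M it gives 569652, which is close to, but not the same as, the 566496 used in the 200M example.

```shell
# Hypothetical helper: estimate the -pc value for a target triple count by
# scaling linearly from the 100M baseline used above (284826), rounding to
# the nearest integer.
pc_for_triples() {
  echo $(( ($1 * 284826 + 50000000) / 100000000 ))
}

# Usage:
#   pc_for_triples 10000000    # prints 28483
#   pc_for_triples 50000000    # prints 142413
```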