NoSQL Databases for RDF: An Empirical Evaluation

Abstract & Authors

Processing large volumes of RDF data requires sophisticated tools. In recent years, much effort has been spent on optimizing native RDF stores and on repurposing relational query engines for large-scale RDF processing. Concurrently, a number of new data management systems, grouped under the NoSQL (for "not only SQL") umbrella, have rapidly risen to prominence and today represent a popular alternative to classical databases. Although NoSQL systems are increasingly used to manage RDF data, it is still difficult to grasp their key advantages and drawbacks in this context. This work is, to the best of our knowledge, the first systematic attempt at characterizing and comparing NoSQL stores for RDF processing. In the following, we describe four different NoSQL stores and compare their key characteristics when running standard RDF benchmarks on a popular cloud infrastructure, using both single-machine and distributed deployments.

Authors (Listed in Alphabetical Order)
Philippe Cudré-Mauroux
Iliya Enchev
Sever Fundatureanu
Paul Groth
Albert Haque
Andreas Harth
Felix Leif Keppmann
Daniel Miranker
Juan Sequeda
Marcin Wylot

From the following institutions:
University of Fribourg
VU University Amsterdam
University of Texas at Austin
Karlsruhe Institute of Technology

Links & Downloads

Spreadsheet With All Benchmark Results
[Google Doc] [xlsx]

Code Repositories, Documentation, & Machine Images

System      | Website / GitHub      | How-To | AMI Image ID (N. Virginia Region)                      | Other Notes
4store      | Website               | How-To | Zookeeper AMI: ami-32542a5b; Worker AMI: ami-00542a69  |
Jena+HBase  | GitHub                | How-To | Use Amazon EMR                                         | HBase Configuration
Hive+HBase  | GitHub                | How-To | Use Amazon EMR                                         | HBase Configuration
CumulusRDF  | Code                  |        |                                                        |
Couchbase   | GitHub 1 / GitHub 2   | How-To | Image ID: ami-6771010e                                 |

Last Modified: Wed, 27 Nov 2013 08:57:02 -0600.