COMMunication
RESearch.NET


First, install the required R packages:

install.packages(c('SPARQL','igraph','network','ergm'),dependencies=TRUE)

Now load the packages by calling:

library(SPARQL)
library(igraph)
library(network)
library(ergm)

Define the endpoint that will provide you with the triples:

endpoint <- "http://live.dbpedia.org/sparql"

Next, state that there are no further options to send to the SPARQL server. Such options are sent as HTTP parameters and differ per endpoint. For example, Jena Fuseki accepts the option “output=xml” to dictate that it should return XML, and SWI-Prolog ClioPatria accepts “entailment=rdfs” or “entailment=none” to state which kind of reasoning to apply.

options <- NULL

For a local Jena Fuseki installation hosting the same data in the LOP graph you can use the following options (uncommented, i.e., without the leading #):

# endpoint <- "http://localhost:3030/movie/sparql"
# options <- "output=xml"
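
To make concrete how such an option travels as an HTTP parameter, here is a minimal sketch that assembles a GET request URL by hand. It is an illustration only: the SPARQL package builds its own request internally, and the helper name build_request_url and the sample query are assumptions introduced here.

```r
# Hypothetical helper (not part of the SPARQL package): combines an endpoint,
# a query, and an optional extra parameter string into one GET URL.
build_request_url <- function(endpoint, query, options = NULL) {
  # Percent-encode the query so it is safe to place inside a URL
  url <- paste0(endpoint, "?query=", utils::URLencode(query, reserved = TRUE))
  # Append the extra option string (e.g. "output=xml") if one is given
  if (!is.null(options)) {
    url <- paste0(url, "&", options)
  }
  url
}

example_url <- build_request_url("http://localhost:3030/movie/sparql",
                                 "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
                                 "output=xml")
example_url
```

With options set to NULL, nothing is appended, which corresponds to the DBpedia setup above.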

To shorten the URIs of the data that we get back, declare some namespaces:

prefix <- c("db","http://dbpedia.org/resource/")
sparql_prefix <- "PREFIX dbp: <http://dbpedia.org/property/>
                  PREFIX dc: <http://purl.org/dc/terms/>
                  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
                  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
"

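The prefix vector lists alternating short names and namespace URIs. As a rough illustration of the shortening this enables, the sketch below rewrites a full URI by hand. The helper shorten_uri is hypothetical and is not the SPARQL package's own implementation; the exact form in which the package returns shortened URIs is an assumption here.

```r
prefix <- c("db", "http://dbpedia.org/resource/")

# Hypothetical helper: replace each namespace URI with its short name.
shorten_uri <- function(uris, ns) {
  short_names <- ns[seq(1, length(ns), by = 2)]  # odd positions: short names
  namespaces  <- ns[seq(2, length(ns), by = 2)]  # even positions: namespace URIs
  # Drop the angle brackets that surround a full URI
  uris <- gsub("^<|>$", "", uris)
  for (i in seq_along(namespaces)) {
    # Only rewrite URIs that begin with this namespace
    hit <- startsWith(uris, namespaces[i])
    uris[hit] <- paste0(short_names[i], ":",
                        substring(uris[hit], nchar(namespaces[i]) + 1))
  }
  uris
}

shorten_uri("<http://dbpedia.org/resource/Big_Money_Hustlas>", prefix)
# [1] "db:Big_Money_Hustlas"
```
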
The data you will now be able to access follows the DBpedia schema. An example of the structure of the graphs in the triple store is shown below.

[Figure: dbpedia_movie_schema]
The queries will match parts of this graph.

Let's write a query that gets all actors, the movies they star in, and the director and release date of those movies. We only want American movies, names in English, and dates that are valid XML Schema dates (ISO dates). If you use the SPARQL function to fire the query, you will get back an R data frame that contains the results. Every variable in the SPARQL query corresponds to a column in the resulting data frame.

q <- paste(sparql_prefix,
  'SELECT ?actor ?movie ?director ?movie_date
   WHERE {
     ?m dc:subject <http://dbpedia.org/resource/Category:American_films> .
     ?m rdfs:label ?movie .
     FILTER(LANG(?movie) = "en")
     ?m dbp:released ?movie_date .
     FILTER(DATATYPE(?movie_date) = xsd:date)
     ?m dbp:starring ?a .
     ?a rdfs:label ?actor .
     FILTER(LANG(?actor) = "en")
     ?m dbp:director ?d .
     ?d rdfs:label ?director .
     FILTER(LANG(?director) = "en")
   }')

Sys.setenv(TZ = "UTC")
res <- SPARQL(endpoint,q,ns=prefix,extra=options)$results

res

# output:
#                   actor                  movie          director movie_date
# 1 "Harland Williams"@en "Big Money Hustlas"@en "John Cafiero"@en  993506400
# 2   "Jamie Spaniolo"@en "Big Money Hustlas"@en "John Cafiero"@en  993506400
# 3     "Paul Methric"@en "Big Money Hustlas"@en "John Cafiero"@en  993506400
# 4    "Joseph Utsler"@en "Big Money Hustlas"@en "John Cafiero"@en  993506400
# 5   "Rudy Ray Moore"@en "Big Money Hustlas"@en "John Cafiero"@en  993506400
# ...
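
The movie_date column above holds plain numbers rather than dates. The values are consistent with seconds elapsed since 1970-01-01 00:00:00 UTC (the Unix epoch). A small worked example with one value from the output shows the interpretation; the variable names secs and stamp are only for illustration.

```r
# One value taken from the output above
secs <- 993506400

# Interpret it as seconds since the Unix epoch, in UTC
stamp <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(stamp, "%Y-%m-%d %H:%M:%S")
# [1] "2001-06-25 22:00:00"

# Keep only the calendar date, again in UTC
day <- as.Date(stamp, tz = "UTC")
format(day)
# [1] "2001-06-25"
```

Setting the time zone explicitly matters: the same number of seconds can fall on a different calendar day in another time zone.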

Convert the numeric movie_date column to R Date objects and print the result again:

res$movie_date <- as.Date(as.POSIXct(res$movie_date,origin="1970-01-01"))
res

# output:
#                   actor                  movie          director movie_date
# 1 "Harland Williams"@en "Big Money Hustlas"@en "John Cafiero"@en 2001-06-25
# 2   "Jamie Spaniolo"@en "Big Money Hustlas"@en "John Cafiero"@en 2001-06-25
# 3     "Paul Methric"@en "Big Money Hustlas"@en "John Cafiero"@en 2001-06-25
# 4    "Joseph Utsler"@en "Big Money Hustlas"@en "John Cafiero"@en 2001-06-25
# 5   "Rudy Ray Moore"@en "Big Money Hustlas"@en "John Cafiero"@en 2001-06-25
# ...
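
The labels in the table still carry their quotation marks and the @en language tag. If plain names are preferred for later processing, a regular expression can strip them. This is a minimal sketch on two sample rows copied from the output above; the helper strip_lang and the name sample_res are hypothetical.

```r
# Sample rows copied from the output above
sample_res <- data.frame(
  actor = c('"Harland Williams"@en', '"Jamie Spaniolo"@en'),
  movie = c('"Big Money Hustlas"@en', '"Big Money Hustlas"@en'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove the surrounding quotes and a trailing language tag.
# Values that do not match the pattern are returned unchanged.
strip_lang <- function(x) {
  sub('^"(.*)"@[A-Za-z-]+$', "\\1", x)
}

sample_res$actor <- strip_lang(sample_res$actor)
sample_res$movie <- strip_lang(sample_res$movie)
sample_res$actor
# [1] "Harland Williams" "Jamie Spaniolo"
```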