install.packages(c('SPARQL','igraph','network','ergm'),dependencies=TRUE)
Now load the packages by calling:
library(SPARQL)
library(igraph)
library(network)
library(ergm)
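Before loading, it can help to check which of these packages are actually installed, because library() stops with an error at the first one that is missing. A minimal sketch using base R's requireNamespace(); the variable names are illustrative only:

```r
# Which of the tutorial's packages can be loaded in this R installation?
pkgs <- c("SPARQL", "igraph", "network", "ergm")

# requireNamespace() returns TRUE/FALSE instead of throwing an error
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

not_installed <- pkgs[!available]
if (length(not_installed) > 0) {
  message("Not installed yet: ", paste(not_installed, collapse = ", "))
} else {
  # All present: attach them, equivalent to the library() calls above
  for (p in pkgs) library(p, character.only = TRUE)
}
```

The named logical vector `available` shows at a glance which install.packages() call still needs to be run.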
Define the SPARQL endpoint that will provide the triples:
endpoint <- "http://live.dbpedia.org/sparql"
State that there are no further options to send to the SPARQL server. These options are sent as HTTP parameters and differ per endpoint. For example, Jena Fuseki accepts “output=xml” to make it return XML, and SWI-Prolog ClioPatria accepts “entailment=rdfs” or “entailment=none” to select which kind of reasoning to apply.
options <- NULL
For a local Jena Fuseki installation hosting the same data in the LOP graph, you can use the following settings instead (uncomment them, i.e., remove the leading #):
# endpoint <- "http://localhost:3030/movie/sparql"
# options <- "output=xml"
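To shorten the URIs in the results, declare some namespaces. The prefix vector is passed to the SPARQL function to abbreviate URIs in the returned data, while sparql_prefix holds the PREFIX declarations that are prepended to every query: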
prefix <- c("db", "http://dbpedia.org/resource/")
sparql_prefix <- "PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
"
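To make the idea of URI shortening concrete, here is a minimal sketch of what abbreviating with a prefix vector amounts to. The shorten_uri() function is hypothetical and is not the SPARQL package's own implementation; it assumes full URIs are written in angle brackets as "<http://...>", and the example.org URI is made up for illustration:

```r
# Same (abbreviation, namespace) pairs as the prefix vector above
prefix <- c("db", "http://dbpedia.org/resource/")

# Hypothetical helper: NOT the SPARQL package's internal code, just the idea.
# Assumes full URIs are written as "<http://...>".
shorten_uri <- function(x, prefix) {
  abbrevs    <- prefix[c(TRUE, FALSE)]  # odd positions: "db", ...
  namespaces <- prefix[c(FALSE, TRUE)]  # even positions: namespace URIs
  for (i in seq_along(abbrevs)) {
    full <- paste0("<", namespaces[i])
    hit  <- !is.na(x) & startsWith(x, full) & endsWith(x, ">")
    if (any(hit)) {
      # keep the part between the namespace and the closing ">"
      local  <- substr(x[hit], nchar(full) + 1, nchar(x[hit]) - 1)
      x[hit] <- paste0(abbrevs[i], ":", local)
    }
  }
  x
}

shorten_uri(c("<http://dbpedia.org/resource/John_Cafiero>",
              "<http://example.org/other>"), prefix)
```

URIs outside every declared namespace are left unchanged, which is why adding more (abbreviation, namespace) pairs to the vector shortens more of the output.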
The data you will now be able to access follows the DBpedia schema. An example of the structure of the graphs in the triple store is shown below.
[Figure: dbpedia_movie_schema, the structure of the DBpedia movie graph]
The queries will match parts of this graph.
Let's write a query that gets all actors, the movies they star in, and the director and release date of those movies. We only want American movies, English labels, and release dates that are valid XML Schema dates (xsd:date, i.e. ISO 8601 dates). If you fire the query with the SPARQL function, you get back an R data frame containing the results: every variable in the SELECT clause corresponds to a column in that data frame.
q <- paste(sparql_prefix,
'SELECT ?actor ?movie ?director ?movie_date WHERE {
  ?m dc:subject <http://dbpedia.org/resource/Category:American_films> .
  ?m rdfs:label ?movie .
  FILTER(LANG(?movie) = "en")
  ?m dbp:released ?movie_date .
  FILTER(DATATYPE(?movie_date) = xsd:date)
  ?m dbp:starring ?a .
  ?a rdfs:label ?actor .
  FILTER(LANG(?actor) = "en")
  ?m dbp:director ?d .
  ?d rdfs:label ?director .
  FILTER(LANG(?director) = "en")
}')
Sys.setenv(TZ = "UTC")
res <- SPARQL(endpoint, q, ns = prefix, extra = options)$results
res
# output:
#                   actor                  movie          director movie_date
# 1 "Harland Williams"@en "Big Money Hustlas"@en "John Cafiero"@en  993506400
# 2   "Jamie Spaniolo"@en "Big Money Hustlas"@en "John Cafiero"@en  993506400
# 3     "Paul Methric"@en "Big Money Hustlas"@en "John Cafiero"@en  993506400
# 4    "Joseph Utsler"@en "Big Money Hustlas"@en "John Cafiero"@en  993506400
# 5   "Rudy Ray Moore"@en "Big Money Hustlas"@en "John Cafiero"@en  993506400
# ...
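The movie_date column above holds numbers rather than dates. Going by the conversion used in the next step, these are POSIXct-style values, i.e. seconds since 1970-01-01 00:00:00 UTC. A worked check on the value from the first row, assuming that interpretation:

```r
# First movie_date value from the output above; assumed to be
# seconds since 1970-01-01 00:00:00 UTC.
secs <- 993506400

# 993506400 = 11498 days * 86400 s + 79200 s (22 hours)
stamp <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
print(stamp)

# Dropping the time of day gives the calendar date
day <- as.Date(stamp)
print(day)
```

The timestamp is 2001-06-25 22:00:00 UTC, so the calendar date is 2001-06-25 in UTC. In a time zone two or more hours ahead of UTC the same instant already falls on the next day, which is why it matters which time zone the conversion uses; pinning it to UTC, as the Sys.setenv(TZ = "UTC") call does for the session, keeps the result stable.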
# movie_date came back as seconds since 1970-01-01; convert it to R Date objects
res$movie_date <- as.Date(as.POSIXct(res$movie_date, origin = "1970-01-01"))
res
# output:
#                   actor                  movie          director movie_date
# 1 "Harland Williams"@en "Big Money Hustlas"@en "John Cafiero"@en 2001-06-25
# 2   "Jamie Spaniolo"@en "Big Money Hustlas"@en "John Cafiero"@en 2001-06-25
# 3     "Paul Methric"@en "Big Money Hustlas"@en "John Cafiero"@en 2001-06-25
# 4    "Joseph Utsler"@en "Big Money Hustlas"@en "John Cafiero"@en 2001-06-25
# 5   "Rudy Ray Moore"@en "Big Money Hustlas"@en "John Cafiero"@en 2001-06-25
# ...
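The text columns still carry the RDF literal syntax: surrounding quotes plus a language tag such as @en. If you want plain names, for example as node labels later on, a minimal sketch is to strip that syntax with a regular expression. The strip_lang_tag() helper is hypothetical (not part of the SPARQL package), and the sketch assumes the columns are character vectors shaped like the output above; the two rows are rebuilt by hand from that output so the example runs without contacting the endpoint:

```r
# Hypothetical helper (not part of the SPARQL package): removes the
# surrounding quotes and a trailing language tag such as @en or @en-US.
strip_lang_tag <- function(x) {
  sub('^"(.*)"@[A-Za-z]+(-[A-Za-z0-9]+)*$', "\\1", as.character(x))
}

# Two rows shaped like the output above, built by hand so this runs offline
res <- data.frame(
  actor      = c('"Harland Williams"@en', '"Jamie Spaniolo"@en'),
  movie      = c('"Big Money Hustlas"@en', '"Big Money Hustlas"@en'),
  director   = c('"John Cafiero"@en', '"John Cafiero"@en'),
  movie_date = as.Date(c("2001-06-25", "2001-06-25")),
  stringsAsFactors = FALSE
)

# Clean only the text columns; movie_date is left untouched
text_cols <- c("actor", "movie", "director")
res[text_cols] <- lapply(res[text_cols], strip_lang_tag)
res
```

Values that do not look like a tagged literal pass through unchanged, so running the helper twice is harmless.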