hollywood_social_network_analysis
First install the packages used in this tutorial:

install.packages(c('SPARQL','igraph','network','ergm'),dependencies=TRUE)

Now load the packages by calling:

library(SPARQL)
library(igraph)
library(network)
library(ergm)

Define the endpoint that will provide you with the triples:

endpoint <- "http://live.dbpedia.org/sparql"

Next, state that there are no further options to send to the SPARQL server. Such options are sent as HTTP parameters and differ per endpoint. For example, Jena Fuseki accepts “output=xml” to make it return XML, and SWI-Prolog ClioPatria accepts “entailment=rdfs” or “entailment=none” to select which kind of reasoning to apply.

options <- NULL

For a local Jena Fuseki installation hosting the same data in the LOP graph you can use the following options (uncommented, i.e., without the leading #):

# endpoint <- "http://localhost:3030/movie/sparql"
# options <- "output=xml"
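
To illustrate how such an option travels as an HTTP parameter, the following minimal sketch builds a GET request URL by hand. The helper build_query_url is hypothetical and is not part of the SPARQL package; the sketch only assumes that the option string is appended to the URL-encoded query after an ampersand.

```r
# Hypothetical helper (not from the SPARQL package): combine an endpoint,
# a query, and an optional extra parameter string into one GET URL.
build_query_url <- function(endpoint, query, extra = NULL) {
  # URL-encode the query so that spaces, braces, etc. are safe in a URL
  url <- paste0(endpoint, "?query=", utils::URLencode(query, reserved = TRUE))
  if (!is.null(extra)) {
    # extra options such as "output=xml" become additional HTTP parameters
    url <- paste0(url, "&", extra)
  }
  url
}

demo_url <- build_query_url("http://localhost:3030/movie/sparql",
                            "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
                            extra = "output=xml")
print(demo_url)
```

With extra = NULL, as in the DBpedia setup above, nothing is appended after the query.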

To shorten the URIs of the data that we get back, declare some namespaces like this:

prefix <- c("db","http://dbpedia.org/resource/")
sparql_prefix <- "PREFIX dbp: <http://dbpedia.org/property/>
                  PREFIX dc: <http://purl.org/dc/terms/>
                  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
                  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
"
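
The prefix vector holds alternating pairs of a short name and a namespace URI. As a rough illustration of what the shortening amounts to, the sketch below replaces a matching namespace with its short name. The function shorten_uri and the variable demo_prefix are hypothetical names written for this example, not the SPARQL package's own implementation.

```r
# Same layout as the prefix vector above: short name, then namespace URI
demo_prefix <- c("db", "http://dbpedia.org/resource/")

# Hypothetical helper: replace a known namespace with its short name
shorten_uri <- function(uri, ns) {
  for (i in seq(1, length(ns), by = 2)) {
    short <- ns[i]
    namespace <- ns[i + 1]
    if (startsWith(uri, namespace)) {
      # keep only the local part that follows the namespace
      return(paste0(short, ":", substring(uri, nchar(namespace) + 1)))
    }
  }
  uri  # no namespace matched: return the URI unchanged
}

print(shorten_uri("http://dbpedia.org/resource/Tom_Mix", demo_prefix))
# prints "db:Tom_Mix"
```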

The data you will now be able to access follows the DBpedia schema. An example of the structure of the graphs in the triple store is shown below.

[Figure: dbpedia_movie_schema]
The queries will match parts of this graph.

Let's write a query that gets all actors, the movies they star in, and the director and release date of those movies. We only want American movies, names in English, and dates that are valid XML Schema dates (ISO dates). Firing the query with the SPARQL function returns an R data frame that contains the results, in which every variable in the SPARQL query corresponds to one column.

q <- paste(sparql_prefix,
  'SELECT ?actor ?movie ?director ?movie_date
   WHERE {
     ?m dc:subject <http://dbpedia.org/resource/Category:American_films> .
     ?m rdfs:label ?movie .
     FILTER(LANG(?movie) = "en")
     ?m dbp:released ?movie_date .
     FILTER(xsd:date(?movie_date) > "1900-01-01"^^xsd:date)
     ?m dbp:starring ?a .
     ?a rdfs:label ?actor .
     FILTER(LANG(?actor) = "en")
     ?m dbp:director ?d .
     ?d rdfs:label ?director .
     FILTER(LANG(?director) = "en")
   }')

res <- SPARQL(endpoint,q,ns=prefix,extra=options)$results
head(res)
                     actor                        movie               director
1 "Pat O'Brien (actor)"@en "Flying High (1931 film)"@en   "Charles Reisner"@en
2             "Tom Mix"@en         "The Feud (film)"@en    "Edward LeSaint"@en
3       "Jaime Pressly"@en      "Ticker (2001 film)"@en       "Albert Pyun"@en
4  "Kevin Gage (actor)"@en      "Ticker (2001 film)"@en       "Albert Pyun"@en
5       "Jimmy Durante"@en        "Roadhouse Nights"@en     "Hobart Henley"@en
6        "Rex Harrison"@en  "Doctor Dolittle (film)"@en "Richard Fleischer"@en
   movie_date
1 -1203379200
2 -1580083200
3  1005609600
4  1005609600
5 -1257724800
6   -64886400
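# movie_date comes back as seconds since 1970-01-01 (Unix time); convert it to a Date.
# tz="UTC" keeps the conversion independent of the local time zone.
res$movie_date <- as.Date(as.POSIXct(res$movie_date,origin="1970-01-01",tz="UTC"))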
head(res)
                     actor                        movie               director
1 "Pat O'Brien (actor)"@en "Flying High (1931 film)"@en   "Charles Reisner"@en
2             "Tom Mix"@en         "The Feud (film)"@en    "Edward LeSaint"@en
3       "Jaime Pressly"@en      "Ticker (2001 film)"@en       "Albert Pyun"@en
4  "Kevin Gage (actor)"@en      "Ticker (2001 film)"@en       "Albert Pyun"@en
5       "Jimmy Durante"@en        "Roadhouse Nights"@en     "Hobart Henley"@en
6        "Rex Harrison"@en  "Doctor Dolittle (film)"@en "Richard Fleischer"@en
  movie_date
1 1931-11-14
2 1919-12-07
3 2001-11-13
4 2001-11-13
5 1930-02-23
6 1967-12-12
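
The labels in the result still carry their quotation marks and the @en language tag. If plain names are preferred, for example as labels in a plot, they could be stripped with a regular expression. This is an optional sketch: strip_lang_tag is a hypothetical helper, and it assumes every label has the form "text"@lang as in the output above.

```r
# Hypothetical helper: turn '"Tom Mix"@en' into 'Tom Mix'.
# Strings that do not match the "text"@lang pattern are returned unchanged.
strip_lang_tag <- function(x) {
  sub('^"(.*)"@[A-Za-z-]+$', "\\1", x)
}

demo_labels <- c('"Tom Mix"@en', '"Pat O\'Brien (actor)"@en')
clean_labels <- strip_lang_tag(demo_labels)
print(clean_labels)
```

On the real data this could be applied per column, e.g. res$actor <- strip_lang_tag(res$actor).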
# save the results to a CSV file in the current working directory
write.csv(res, file="hollywood_movie_sna_data.csv")
# print the working directory to see where the file was written
getwd()
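
The sample that follows indexes an actor-by-movie matrix named actor_movie_matrix, whose construction is not shown on this page. One plausible way to build such a matrix is to cross-tabulate the actor and movie columns with table(); this is an assumption, not necessarily the method used originally. The sketch uses a small toy data frame (toy_res, with rows taken from the output above) so that it runs without querying the endpoint.

```r
# Toy stand-in for res, using three actor/movie pairs from the output above
toy_res <- data.frame(
  actor = c('"Jaime Pressly"@en', '"Kevin Gage (actor)"@en', '"Tom Mix"@en'),
  movie = c('"Ticker (2001 film)"@en', '"Ticker (2001 film)"@en',
            '"The Feud (film)"@en'),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are actors, columns are movies,
# and a cell holds 1 if the actor stars in the movie, 0 otherwise
toy_matrix <- table(toy_res$actor, toy_res$movie)
print(toy_matrix)

# Assumption: on the full results the same call would be
# actor_movie_matrix <- table(res$actor, res$movie)
```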
actor_movie_matrix[849:853,3:7]  # just a little sample of the matrix
                                       
                                        "Mission: Impossible – Ghost Protocol"@en "The Hunger Games: Mockingjay – Part 2"@en
  "William Russell (American actor)"@en                                         0                                          0
  "William Schallert"@en                                                        0                                          0
  "Willow Shields"@en                                                           0                                          1
  "Woody Harrelson"@en                                                          0                                          1
  "Wynne Gibson"@en                                                             0                                          0
                                       
                                        "The Hunger Games: Mockingjay – Part 1"@en "Species – The Awakening"@en "$1,000 Reward"@en
  "William Russell (American actor)"@en                                          0                            0                  0
  "William Schallert"@en                                                         0                            0                  0
  "Willow Shields"@en                                                            1                            0                  0
  "Woody Harrelson"@en                                                           1                            0                  0
  "Wynne Gibson"@en                                                              0                            0                  0
hollywood_social_network_analysis.txt · Last modified: 2019/12/11 07:39 by hkimscil