[[./|Class page]]
multivariate statistics in R
network analysis in R
* A User’s Guide to Network Analysis in R (Use R!)
* Statistical Analysis of Network Data with R (Use R!) 2014th Edition
[[https://lagunita.stanford.edu]]
[[https://campus.datacamp.com/courses/network-analysis-in-r|Network Analysis in R]] using the igraph package, from DataCamp
[[https://campus.datacamp.com/courses/marketing-analytics-in-r-statistical-modeling/|Marketing Analytics in R: Statistical Modeling]] from DataCamp
====== Week01 (Sep 4, 6) ======
===== ideas and concepts =====
Introduction to R and others
- Downloading and Installing R
- [[:the_r_project_for_statistical_computing]]
- [[:r]], [[:r:getting started]]
- Starting R
- Entering Commands
- Exiting from R
- Interrupting R
- Viewing the Supplied Documentation
- Getting Help on a Function
- Searching the Supplied Documentation
- Getting Help on a Package
- Searching the Web for Help
- Finding Relevant Functions and Packages
- Searching the Mailing Lists
- Submitting Questions to the Mailing Lists
Deriving [[http://commres.net/wiki/research_methods_lecture_note#%EC%BB%A4%EB%AE%A4%EB%8B%88%EC%BC%80%EC%9D%B4%EC%85%98_%EC%97%B0%EA%B5%AC%EB%AC%B8%EC%A0%9C_%EC%A0%9C%EA%B8%B0%EC%99%80_%EA%B0%80%EC%84%A4|research questions and hypotheses]] from [[:theories]] and making [[:hypothesis|hypotheses]]
Installing R
===== Assignment =====
====== Week02 (Sep 11, 13) ======
===== Concepts and ideas =====
Some [[:R:basics|basics]]
- Introduction
- Printing Something
- Setting Variables
- Listing Variables
- Deleting Variables
- Creating a Vector
- Computing Basic Statistics
- Creating Sequences
- Comparing Vectors
- Selecting Vector Elements
- Performing Vector Arithmetic
- Getting Operator Precedence Right
- Defining a Function
- Typing Less and Accomplishing More
- Avoiding Some Common Mistakes
* [[:Research Question]]
* [[:Hypothesis]]
* Educated guess (via theories)
* Difference
* Association
* Variables (vs. ideas, concepts, and constructs)
* [[:Operationalization]]
* [[:Variables]], [[:Types of Variables]]
* see [[http://chohongjoong.com/gnu4/bbs/board.php?bo_table=board02&wr_id=311&sfl=&stx=&sst=wr_datetime&sod=desc&sop=and&page=1|this blog]] written in Korean
* [[:Independent Variable|IV]] (independent variable)
* [[:Dependent Variable|DV]] (dependent variable)
* Control variable
* Mediating (intervening) variable
===== Assignment =====
====== Week03 (Sep 18, 20) ======
===== Activities =====
* Grouping. See [[./Group]] page
* Group discussion on group work
===== Concepts and ideas =====
You __should already be knowledgeable__ about [[:research question]] and [[:hypothesis]] building, although we will also deal with the topic in class. Please read those two pages and [[:research_methods_lecture_note#커뮤니케이션_연구문제_제기와_가설|Communication research questions and hypotheses]] on your own. The material will be covered on quizzes.
[[:r:navigating|Navigating]] software
- Introduction
- Getting and Setting the Working Directory
- Saving Your Workspace
- Viewing Your Command History
- Saving the Result of the Previous Command
- Displaying the Search Path
- Accessing the Functions in a Package
- Accessing Built-in Datasets
- Viewing the List of Installed Packages
- Installing Packages from CRAN
- Setting a Default CRAN Mirror
- Suppressing the Startup Message
- Running a Script
- Running a Batch Script
- Getting and Setting Environment Variables
- Locating the R Home Directory
- Customizing R
[[:r:input_output|Input and output]]
- Introduction
- Entering Data from the Keyboard
- Printing Fewer Digits (or More Digits)
- Redirecting Output to a File
- Listing Files
- Dealing with “Cannot Open File” in Windows
- Reading Fixed-Width Records
- Reading Tabular Data Files
- Reading from CSV Files
- Writing to CSV Files
- Reading Tabular or CSV Data from the Web
- Reading Data from HTML Tables
- Reading Files with a Complex Structure
- Reading from MySQL Databases
- Saving and Transporting Objects
===== Assignment =====
Assignment for all
* Read [[:research_methods_lecture_note#커뮤니케이션_연구문제_제기와_가설|Communication research questions and hypotheses]]
* Read [[:research question]]
* Read [[:hypothesis]]
Group assignment
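* Read the paper "제3자 효과이론과 침묵의 나선이론 연계성" (linking third-person effect theory and spiral of silence theory) cited in [[:hypothesis#예_1]] on the Hypothesis page, and state its hypotheses.
* List the independent variables, dependent variables, etc. for each hypothesis.
* Identify and explain the theory used in this paper.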
====== Week04 (Sep 25, 27) ======
===== Class Activity =====
* Practice building hypotheses
* No need to read [[:theories]]
* The third-person effect
* [[:Spiral of Silence]]
* [[:cognitive dissonance]]
* Read [[:hypothesis]]
* [[http://behavioralsciencewriting.blogspot.kr/2011/09/how-to-write-hypothesis.html|How to write a hypothesis]] at Behavioral Science Writing.
* An example hypothesis: [[http://www.socialresearchmethods.net/kb/hypothes.php|Hypothesis]] at www.socialresearchmethods.net
===== Concepts and ideas =====
[[:r:Data Structures]]
- Introduction
- Appending Data to a Vector
- Inserting Data into a Vector
- Understanding the Recycling Rule
- Creating a Factor (Categorical Variable)
- Combining Multiple Vectors into One Vector and a Factor
- Creating a List
- Selecting List Elements by Position
- Selecting List Elements by Name
- Building a Name/Value Association List
- Removing an Element from a List
- Flatten a List into a Vector
- Removing NULL Elements from a List
- Removing List Elements Using a Condition
- Initializing a Matrix
- Performing Matrix Operations
- Giving Descriptive Names to the Rows and Columns of a Matrix
- Selecting One Row or Column from a Matrix
- Initializing a Data Frame from Column Data
- Initializing a Data Frame from Row Data
- Appending Rows to a Data Frame
- Preallocating a Data Frame
- Selecting Data Frame Columns by Position
- Selecting Data Frame Columns by Name
- Selecting Rows and Columns More Easily
- Changing the Names of Data Frame Columns
- Editing a Data Frame
- Removing NAs from a Data Frame
- Excluding Columns by Name
- Combining Two Data Frames
- Merging Data Frames by Common Column
- Accessing Data Frame Contents More Easily
- Converting One Atomic Value into Another
- Converting One Structured Data Type into Another
[[:r:data transformations]]
===== Assignment =====
ga04.making.hypothesis: hypothesis practice (submit on ajoubb)
* First, using the mtcars dataset that comes with R by default (use RStudio), write hypotheses that can be tested with a t-test and with an ANOVA, and analyze them in R.
* For hypotheses, refer to the [[:hypothesis testing]] page.
* For the t-test, refer to [[:t-test]].
* Of the four kinds of t-test, determine which one should be used for the mtcars data.
* For ANOVA, refer to the [[:anova]] page.
* In R, use the t.test and aov functions, respectively.
* Second, look into the margin of error reported with opinion poll results in newspapers.
* Pick two newspaper articles that report opinion poll results.
* [[http://www.realmeter.net/%ea%b3%a0%ec%9c%84%ec%a7%81-%ec%9e%90%eb%85%80-%ec%9e%85%ec%8b%9c%eb%b9%84%eb%a6%ac-%ec%a0%84%ec%88%98%ec%a1%b0%ec%82%ac-%ec%b0%ac%ec%84%b175vs%eb%b0%98%eb%8c%8018/|Example]]
* The standard error of a proportion is usually computed as follows:
* $ \displaystyle \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} , \;\;\; q = 1 - p $
* e.g., $ p = .752 $ (75.2%)
* If you upload a file, name it
* ga04.making.hypothesis.groupname.ext before uploading.
* Replace "groupname" and "ext" above according to your group and file type.
* For group 3, use 03 in place of "groupname".
* If you saved an MS Word file, the file extension will be "docx"; if you saved a text file, it will be "txt".
* So, following the example above, the file name would be
* ga04.making.hypothesis.03.txt
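The standard-error formula in the second part of the assignment can be checked in R. This is a minimal sketch that assumes a hypothetical sample size of n = 1000 (the poll's actual n is not given here) and uses the common 95% margin of error, z × SE with z = qnorm(0.975), about 1.96. The helper name `prop_se` is illustrative.

```r
# Standard error and 95% margin of error for a sample proportion
# p: observed proportion (e.g., 0.752 = 75.2%)
# n: sample size (1000 below is a hypothetical value, not taken from any poll)
prop_se <- function(p, n) {
  q <- 1 - p
  sqrt(p * q / n)
}

p <- 0.752
n <- 1000                      # hypothetical sample size
se <- prop_se(p, n)
moe <- qnorm(0.975) * se       # z for a two-sided 95% interval (about 1.96)

round(se, 4)                   # standard error
round(100 * moe, 1)            # margin of error in percentage points
```

Poll reports often quote the margin of error at p = .5, where p·q is largest, so the published figure may be somewhat larger than the value computed for a specific p.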
====== Week05 (Oct 2, 4) ======
===== ideas and concepts =====
[[:r:probability]]
[[:r:General Statistics]]
==== t.test: mtcars ====
> mdata <- split(mtcars$mpg, mtcars$am)
> mdata
$`0`
[1] 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[13] 10.4 14.7 21.5 15.5 15.2 13.3 19.2
$`1`
[1] 21.0 21.0 22.8 32.4 30.4 33.9 27.3 26.0 30.4 15.8 19.7 15.0
[13] 21.4
> stack(mdata)
values ind
1 21.4 0
2 18.7 0
3 18.1 0
4 14.3 0
5 24.4 0
6 22.8 0
7 19.2 0
8 17.8 0
9 16.4 0
10 17.3 0
11 15.2 0
12 10.4 0
13 10.4 0
14 14.7 0
15 21.5 0
16 15.5 0
17 15.2 0
18 13.3 0
19 19.2 0
20 21.0 1
21 21.0 1
22 22.8 1
23 32.4 1
24 30.4 1
25 33.9 1
26 27.3 1
27 26.0 1
28 30.4 1
29 15.8 1
30 19.7 1
31 15.0 1
32 21.4 1
> t.test(mpg~am, data=mtcars)
Welch Two Sample t-test
data: mpg by am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.280194 -3.209684
sample estimates:
mean in group 0 mean in group 1
17.14737 24.39231
> t.test(mpg~am, data=mtcars, var.equal=T)
Two Sample t-test
data: mpg by am
t = -4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.84837 -3.64151
sample estimates:
mean in group 0 mean in group 1
17.14737 24.39231
> m1 <- mdata[[1]]
> m2 <- mdata[[2]]
> m1
[1] 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[13] 10.4 14.7 21.5 15.5 15.2 13.3 19.2
> m2
[1] 21.0 21.0 22.8 32.4 30.4 33.9 27.3 26.0 30.4 15.8 19.7 15.0
[13] 21.4
> m1.var <- var(m1)
> m2.var <- var(m2)
> m1.n <- length(m1)
> m2.n <- length(m2)
> m1.df <- length(m1)-1
> m2.df <- length(m2)-1
> m1.ss <- m1.var*m1.df
> m2.ss <- m2.var*m2.df
> m1.ss
[1] 264.5874
> m2.ss
[1] 456.3092
> m12.ss <- m1.ss+m2.ss
> m12.ss
[1] 720.8966
> m12.df <- m1.df+m2.df
> pv <- m12.ss/m12.df
> pv
[1] 24.02989
> pv/m1.n
[1] 1.264731
> pv/m2.n
[1] 1.848453
> m.se <- sqrt((pv/m1.n)+(pv/m2.n))
> m.se
[1] 1.764422
> m1.m <- mean(m1)
> m2.m <- mean(m2)
> m.tvalue <- (m1.m-m2.m)/m.se
> m.tvalue
[1] -4.106127
> t.test(mpg~am, data=mtcars, var.equal=T)
Two Sample t-test
data: mpg by am
t = -4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.84837 -3.64151
sample estimates:
mean in group 0 mean in group 1
17.14737 24.39231
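The session above reproduces only the pooled t value by hand. A minimal sketch, in base R, that also computes the pooled p-value and 95% confidence interval and reproduces the default Welch result (unpooled standard error with Welch-Satterthwaite degrees of freedom). Variable names such as `pooled_se` and `welch_df` are illustrative.

```r
# Reproduce both t.test() results for mpg ~ am by hand.
# Group "0" = automatic, group "1" = manual (mtcars coding).
mdata <- split(mtcars$mpg, mtcars$am)
m1 <- mdata[["0"]]
m2 <- mdata[["1"]]
n1 <- length(m1); n2 <- length(m2)
v1 <- var(m1);    v2 <- var(m2)
mean_diff <- mean(m1) - mean(m2)

# Pooled-variance (Student) t, with its two-sided p-value and 95% CI
pooled_var <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
pooled_se  <- sqrt(pooled_var / n1 + pooled_var / n2)
pooled_df  <- n1 + n2 - 2
pooled_t   <- mean_diff / pooled_se
pooled_p   <- 2 * pt(-abs(pooled_t), df = pooled_df)
pooled_ci  <- mean_diff + c(-1, 1) * qt(0.975, df = pooled_df) * pooled_se

# Welch t (the default when var.equal is not set): unpooled SE and
# Welch-Satterthwaite degrees of freedom
a1 <- v1 / n1
a2 <- v2 / n2
welch_se <- sqrt(a1 + a2)
welch_df <- (a1 + a2)^2 / (a1^2 / (n1 - 1) + a2^2 / (n2 - 1))
welch_t  <- mean_diff / welch_se
welch_p  <- 2 * pt(-abs(welch_t), df = welch_df)

c(t = pooled_t, df = pooled_df, p = pooled_p)
pooled_ci
c(t = welch_t, df = welch_df, p = welch_p)
```

The two tests differ only in the standard error and the degrees of freedom; the mean difference in the numerator is the same.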
==== anova: mtcars ====
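# Per-group descriptive statistics used for a one-way ANOVA by hand.
# x: numeric outcome, y: grouping factor.
# Rows: mean, variance, n, df (n - 1), and within-group SS (variance * df).
stats4each <- function(x, y) {
  meani <- tapply(x, y, mean)
  vari <- tapply(x, y, var)
  ni <- tapply(x, y, length)
  dfi <- ni - 1
  ssi <- vari * dfi
  out <- rbind(meani, vari, ni, dfi, ssi)
  return(out)
}
# iris and mtcars are built-in datasets, so no extra package (e.g. MASS) is needed.
# To try another example, replace the three data lines below with one of these:
# tempd <- iris;   x <- tempd$Species; y <- tempd$Sepal.Width
# tempd <- mtcars; x <- tempd$gear;    y <- tempd$mpg
tempd <- mtcars
x <- tempd$am    # grouping variable (0 = automatic, 1 = manual)
y <- tempd$mpg   # outcome variable
x <- factor(x)   # treat the grouping variable as categorical
dfbetween <- nlevels(x) - 1
stats <- stats4each(y, x)
stats
# Sums of squares: SS_total = SS_between + SS_within
sswithin <- sum(stats["ssi", ])
sstotal <- var(y) * (length(y) - 1)
ssbetween <- sstotal - sswithin
round(sswithin, 2)
round(ssbetween, 2)
round(sstotal, 2)
# Degrees of freedom
dfwithin <- sum(stats["dfi", ])
dftotal <- length(y) - 1
dfwithin
dfbetween
dftotal
# Mean squares
mswithin <- sswithin / dfwithin
msbetween <- ssbetween / dfbetween
mstotal <- sstotal / dftotal
round(mswithin, 2)
round(msbetween, 2)
round(mstotal, 2)
# F ratio and p-value (F is kept unrounded so the p-value is not distorted)
fval <- msbetween / mswithin
round(fval, 2)
siglevel <- pf(q = fval, df1 = dfbetween, df2 = dfwithin, lower.tail = FALSE)
siglevel
# The same analysis with aov() for comparison
mod <- aov(y ~ x)
summary(mod)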
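Because the grouping variable in the active example (am) has only two levels, the one-way ANOVA and the pooled-variance t-test are the same test: F equals t squared and the p-values match. A short check on the mtcars mpg ~ am example; `am_f`, `f_value`, and `t_value` are illustrative names.

```r
# For two groups, one-way ANOVA F = (pooled-variance t)^2 and the p-values agree
am_f <- factor(mtcars$am)
fit <- aov(mtcars$mpg ~ am_f)
aov_table <- summary(fit)[[1]]        # ANOVA table as a data frame
f_value <- aov_table[["F value"]][1]
f_p <- aov_table[["Pr(>F)"]][1]

tt <- t.test(mtcars$mpg ~ am_f, var.equal = TRUE)
t_value <- unname(tt$statistic)

c(F = f_value, t_squared = t_value^2)
c(p_anova = f_p, p_ttest = tt$p.value)
```

With three or more groups (e.g. gear) there is no single t-test to compare against, which is why ANOVA is used there.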
==== cor ====
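# Pearson's r computed three ways (columns taken directly instead of attach())
mpg <- mtcars$mpg
hp <- mtcars$hp
cor(mpg, hp)
# r = cov(x, y) / (sd(x) * sd(y))
mycor <- cov(mpg, hp) / (sd(mpg) * sd(hp))
mycor
# r = SP / sqrt(SSx * SSy), where SP is the sum of cross-products
n <- length(mpg)
sp <- cov(mpg, hp) * (n - 1)
ssx <- var(mpg) * (n - 1)
ssy <- var(hp) * (n - 1)
mycor2 <- sp / sqrt(ssx * ssy)
mycor2
# Compare with a tolerance: == on floating-point results can return FALSE
all.equal(mycor2, mycor)
all.equal(mycor, cor(mpg, hp))
all.equal(mycor2, cor(mpg, hp))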
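Once r is computed, its significance can be tested with t = r·sqrt(n − 2) / sqrt(1 − r²) on n − 2 degrees of freedom. A minimal sketch that computes this by hand and compares it with the built-in cor.test(); `t_r` and `p_r` are illustrative names.

```r
# Significance test for Pearson's r: t = r * sqrt(n - 2) / sqrt(1 - r^2), df = n - 2
mpg <- mtcars$mpg
hp <- mtcars$hp
n <- length(mpg)
r <- cor(mpg, hp)
t_r <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_r <- 2 * pt(-abs(t_r), df = n - 2)

c(r = r, t = t_r, df = n - 2, p = p_r)
cor.test(mpg, hp)   # built-in test for comparison
```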
===== Assignment =====
====== Week06 (Oct 9, 11) ======
===== ideas and concepts =====
[[:correlation]]
[[:regression]]
[[:multiple regression]]
* [[:r:correlation|correlation in r]]
* [[:r:multiple regression|multiple regression in r]]
[[:Partial and semipartial correlation]]
[[:using dummy variables]]
[[:Statistical Regression Methods]]
[[:Sequential Regression]]
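Partial and semipartial correlation can be computed either from regression residuals or from the zero-order correlations. A minimal sketch on mtcars; the choice of mpg, hp, and wt (controlling for wt) is only an illustration and is not taken from the linked pages.

```r
# Partial correlation of mpg and hp controlling for wt, computed two ways.
x <- mtcars$mpg
y <- mtcars$hp
z <- mtcars$wt

# 1) Correlate the residuals after regressing each variable on z
res_x <- resid(lm(x ~ z))
res_y <- resid(lm(y ~ z))
partial_resid <- cor(res_x, res_y)

# 2) Formula from the zero-order correlations
rxy <- cor(x, y); rxz <- cor(x, z); ryz <- cor(y, z)
partial_formula <- (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))

# Semipartial (part) correlation: z is removed from y only
semipartial <- cor(x, res_y)
semipartial_formula <- (rxy - rxz * ryz) / sqrt(1 - ryz^2)

c(partial = partial_resid, partial_formula = partial_formula,
  semipartial = semipartial, semipartial_formula = semipartial_formula)
```

The only difference between the two is whether z is removed from both variables (partial) or from one of them (semipartial).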
===== Assignment =====
- Public opinion in online environments ((refer to {{:public.opinion.theories.introduction.pdf}} ))
* [[:Spiral of Silence]]
* [[:Pluralistic Ignorance]]
* [[:The Third Person Effect]]
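* Etc.: find and introduce sociological or social-psychological theories related to public opinion formation (for example, the three above). Discuss and summarize how a recent social phenomenon could be explained. What is needed to gauge public opinion accurately in online environments?
* Or another topic (. . . depending on the group . . .)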
- Hypotheses
* Multiple regression hypotheses.
* Google Survey Questions
====== Week07 (Oct 16, 18) ======
===== ideas and concepts =====
===== Assignment =====
====== Week08 (Oct 23, 25) ======
__**Mid-term period**__
===== First quiz =====
* Lecture materials + textbook
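* Textbook: R Cookbook. Questions on the textbook will be multiple-choice (four options) or short-answer, covering expected output, the commands used to obtain that output, and what the options used in a command (function) mean. You will not be asked to reproduce exact details such as the precise option syntax of a function.
* Examples (o = may be asked, x = will not be asked)
* Write the command to run a one-sample t-test. (x)
* What does mu mean in t.test(sample, mu=100)? (o)
* Which of the following is an appropriate form for the output of sapply? And so on.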
* [[:The r project for statistical computing]]
* [[:r:Getting started]]
* [[:r:Basics]]
* [[:r:Navigating]]
* [[:r:Input output]]
* [[:r:Data structures]]
* [[:r:Data transformations]]
* Lecture content
* [[:Hypothesis]]
* [[:Research question]]
* [[:Research methods lecture note#커뮤니케이션_연구문제_제기와_가설|Raising communication research questions and hypotheses]] section only
* [[:Operationalization]]
* [[:Variables]]
* [[:Types of variables]]
* [[:Hypothesis testing]]
* [[:T-test]]
* You do not need to memorize the exact t-test formulas (they will be provided).
* You may be asked to do a simple t-test calculation.
* The same applies to ANOVA.
* [[:ANOVA]]
====== Week09 (Oct 30, Nov 1) ======
===== ideas and concepts =====
[[:correlation]]
[[:regression]]
[[:multiple regression]]
* [[:r:correlation|correlation in r]]
* [[:r:multiple regression|multiple regression in r]]
[[:Partial and semipartial correlation]]
[[:using dummy variables]]
[[:Statistical Regression Methods]]
[[:Sequential Regression]]
===== Activity =====
[[c/ma/2019/Multiple Regression Exercise]]
===== Assignment =====
====== Week10 (Nov 6, 8) ======
===== ideas and concepts =====
[[:factor analysis]]
===== Assignment =====
====== Week11 (Nov 13, 15) ======
===== ideas and concepts =====
===== Assignment =====
====== Week12 (Nov 20, 22) ======
===== ideas and concepts =====
===== Assignment =====
[[factor analysis assignment]]
====== Week13 (Nov 27, 29) ======
===== ideas and concepts =====
[[:social network analysis]]
[[:r:social network analysis tutorial]]
[[:r:social network analysis|sna in r]]
[[:sna_eg_stanford|Stanford University examples]]
===== announcement =====
Quiz 2 (Friday, Dec. 6) covers:
* [[:correlation]]
* [[:regression]]
* [[:multiple regression]]
* [[:partial and semipartial correlation]]
* [[:using dummy variables]]
* [[:factor analysis]]
Some questions will use R output to ask about the concepts and ideas listed above.
===== Assignment =====
====== Week14 (Dec 4, 6) ======
Group Presentation
====== Week15 (Dec 11, 13) ======
[[./assignment week15]]
====== Week16 (Dec 18, 20) ======
__**Final exam**__ covers:
* [[:correlation]]
* [[:regression]]
* [[:multiple regression]]
* [[:partial and semipartial correlation]]
* [[:using dummy variables]]
* [[:factor analysis]]
* [[:social network analysis]]
* [[:r:social network analysis tutorial|sna tutorial]]
* [[:r:social network analysis|sna in r]]
* [[:sna_eg_stanford:lab06|SNA e.g. lab 06]]
Some questions will use R output to ask about the concepts and ideas listed above.