December 11, 2018

First Step: CSV Importing

Since most of the research tasks I plan to use R for center on spatial point pattern analysis, I will use a small amount of data from my lab for a very basic spatial and statistical analysis.

tfh1 <- read.csv("T1_Tfh_f1_ki67-.csv", stringsAsFactors = FALSE)
bc1 <- read.csv("export_T1_f1_Ki67+.csv", stringsAsFactors = FALSE)
tfh2 <- read.csv("T1_Tfh_f2_ki67-.csv", stringsAsFactors = FALSE)
bc2 <- read.csv("export_T1_f2_Ki67+.csv", stringsAsFactors = FALSE)
tfh3 <- read.csv("T1_Tfh_f3_ki67-.csv", stringsAsFactors = FALSE)
bc3 <- read.csv("export_T1_f3_Ki67+.csv", stringsAsFactors = FALSE)
wranges <- read.csv("fwindowsizes.csv", stringsAsFactors = FALSE)

Background Information

The data sets I import are .csv files. Each pair of files corresponds to a different location (f1, f2, f3) in the same sample. Each file lists the cells belonging to one population and records several variables for each cell; the important ones here are 'Position X' and 'Position Y'. One more file, fwindowsizes.csv, contains the minimum and maximum X/Y values for each observation window.
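The code in this post refers to the coordinate columns as Position.X and Position.Y, even though the files call them 'Position X' and 'Position Y'. That is because read.csv() by default rewrites headers into syntactically valid R names. The sketch below shows this on a toy file; the file contents and the extra ID column are made up for illustration and are not taken from the lab data.

```r
# Toy stand-in for one exported cell list; the values are made up.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"Position X","Position Y","ID"',
  "10.5,20.1,1",
  "33.2,8.7,2"
), toy_path)

# Default behaviour: the space in each header is replaced by a dot.
toy <- read.csv(toy_path, stringsAsFactors = FALSE)
names(toy)      # "Position.X" "Position.Y" "ID"

# With check.names = FALSE the original headers are kept as-is.
toy_raw <- read.csv(toy_path, stringsAsFactors = FALSE, check.names = FALSE)
names(toy_raw)  # "Position X" "Position Y" "ID"
```

Keeping the default is convenient here, since names without spaces can be used in select() without backticks.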

Loading tidyverse/spatstat and Data Preparation

The next step is to reduce each data set to just its X/Y columns, which is easy with tidyverse, and then convert it into a point pattern object usable by the spatstat package.

library(tidyverse)
library(spatstat)
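# Window bounds for each field, taken from columns 2-5 of the window table
w1 <- as.numeric(unlist(wranges[1, 2:5]))
w2 <- as.numeric(unlist(wranges[2, 2:5]))
w3 <- as.numeric(unlist(wranges[3, 2:5]))
# Keep only the coordinates and convert to a point pattern.
# The piped data frame is already the first argument of as.ppp(),
# so only the window needs to be supplied.
tfhp1 <-
  tfh1 %>%
  select(Position.X, Position.Y) %>%
  as.ppp(W = w1)
tfhp2 <-
  tfh2 %>%
  select(Position.X, Position.Y) %>%
  as.ppp(W = w2)
tfhp3 <-
  tfh3 %>%
  select(Position.X, Position.Y) %>%
  as.ppp(W = w3)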

# Same conversion for the second population, using the same windows
bcp1 <-
  bc1 %>%
  select(Position.X, Position.Y) %>%
  as.ppp(W = w1)
bcp2 <-
  bc2 %>%
  select(Position.X, Position.Y) %>%
  as.ppp(W = w2)
bcp3 <-
  bc3 %>%
  select(Position.X, Position.Y) %>%
  as.ppp(W = w3)
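When a window is given to spatstat as a plain numeric vector of length 4, it is read as (xmin, xmax, ymin, ymax), and points that fall outside the window are rejected with a warning. Since the intensity depends on both the point count and the window, it can be worth checking the bounds before converting. The following is a minimal base R sketch of such a check; count_outside() is a hypothetical helper and the coordinates are made up, and it assumes the columns of the window file are in that same order.

```r
# Hypothetical helper: count points that fall outside a rectangular window.
# `bounds` is assumed to be c(xmin, xmax, ymin, ymax); check that this matches
# the column order in your own window file.
count_outside <- function(x, y, bounds) {
  inside <- x >= bounds[1] & x <= bounds[2] &
            y >= bounds[3] & y <= bounds[4]
  sum(!inside)
}

# Toy coordinates (made up): the third point lies to the right of the window.
toy_x <- c(5, 40, 120)
toy_y <- c(10, 60, 30)
toy_window <- c(0, 100, 0, 100)

n_out <- count_outside(toy_x, toy_y, toy_window)
n_out  # 1
```

With the real data, the call would look something like count_outside(tfh1$Position.X, tfh1$Position.Y, w1), and a result of 0 means every cell lies inside its window.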

Manipulating/Processing Data - Calculating General Intensity

Next, I will do an extremely basic density calculation for these point patterns using the spatstat package, whose intensity() function calculates the average intensity (points per unit area) of an observation window. I then combine the results into one table to make graphing easier.

tfh <- c(intensity(tfhp1), intensity(tfhp2), intensity(tfhp3))
bc <- c(intensity(bcp1), intensity(bcp2), intensity(bcp3))
region <- c("f1","f2","f3")
intframe <- data.frame(region, tfh, bc)
# Reshape to long format: one row per region/population combination
intframer <-
  intframe %>%
  gather(key = "population", value = "intensity", -region)
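For an unmarked point pattern, the average intensity used above is simply the number of points divided by the area of the window. The sketch below reproduces that arithmetic in base R for a rectangular window; mean_intensity() is a hypothetical helper, and the point count and bounds are made-up numbers rather than values from the data.

```r
# Hypothetical helper: average intensity of an unmarked pattern in a
# rectangular window given as c(xmin, xmax, ymin, ymax).
mean_intensity <- function(n_points, bounds) {
  area <- (bounds[2] - bounds[1]) * (bounds[4] - bounds[3])
  n_points / area
}

# Toy example: 50 points in a 200 x 250 window.
toy_intensity <- mean_intensity(n_points = 50, bounds = c(0, 200, 0, 250))
toy_intensity  # 50 / 50000 = 0.001
```

This also makes the units explicit: the intensities are in cells per squared coordinate unit, so they are only comparable across windows measured in the same units.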

Graphing The Average Intensities - Bar Graph

ggplot(intframer, aes(x = population, y = intensity, fill=region)) +
  geom_bar(stat="identity",position=position_dodge())

Graphing The Average Intensities - Box Plot

ggplot(intframer, aes(x = population, y = intensity)) + geom_boxplot()

Testing for Normal Distribution

Next, I will check whether the 3 values per population are consistent with a normal distribution using the Shapiro-Wilk test. The null hypothesis is that they are normally distributed. With p-values of 0.9774 and 0.9003, the null cannot be rejected. Failing to reject is not proof of normality, and with only 3 values per group the test has very little power, so going forward I will treat normality as a working assumption.

shapiro.test(bc)
## 
##  Shapiro-Wilk normality test
## 
## data:  bc
## W = 0.99986, p-value = 0.9774
shapiro.test(tfh)
## 
##  Shapiro-Wilk normality test
## 
## data:  tfh
## W = 0.99728, p-value = 0.9003
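To get a feel for how much a Shapiro-Wilk test can tell us with only 3 values, one option is a small simulation: draw many samples of size 3 from a distribution that is clearly not normal and see how often the test rejects. The sketch below does this with an exponential distribution; the distribution, the seed and the number of simulations are arbitrary choices made for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

n_sim <- 2000
# Draw samples of size 3 from an exponential distribution (clearly non-normal)
# and record how often Shapiro-Wilk rejects normality at the 5% level.
p_values <- replicate(n_sim, shapiro.test(rexp(3))$p.value)
reject_rate <- mean(p_values < 0.05)
reject_rate
```

If the rejection rate is low even for data that are known to be non-normal, a large p-value on 3 values says little about the shape of the underlying distribution.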

Statistical Comparison

Now I will do a basic statistical test to determine whether there is a statistically significant difference in population density between B cells and T cells across the recorded locations of this sample. Since I am assuming the intensities are normally distributed, I will use the two-sample t-test. Note that R's t.test() defaults to the Welch version, which does not assume equal variances.

t.test(bc,tfh,alternative="two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  bc and tfh
## t = 0.25856, df = 2.2737, p-value = 0.8175
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.001888279  0.002160729
## sample estimates:
##    mean of x    mean of y 
## 0.0009900843 0.0008538591

As can be seen (p = 0.8175, with a 95 percent confidence interval that includes 0), the null hypothesis cannot be rejected: with this sample size, no statistically significant difference in average cell population density between the two populations can be detected.
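The non-integer degrees of freedom in the output (df = 2.2737) come from the Welch-Satterthwaite approximation, which t.test() uses when the variances are not pooled. The sketch below recomputes the t statistic and the degrees of freedom by hand and compares them with t.test(); the two vectors are made-up toy values, not the intensities from this analysis.

```r
# Toy values (made up) standing in for two sets of three intensities.
x <- c(0.0012, 0.0004, 0.0015)
y <- c(0.0009, 0.0007, 0.0010)

# Squared standard error of each mean, without pooling the variances.
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)
t_manual <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_manual <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

welch <- t.test(x, y)  # var.equal = FALSE is the default
c(t_manual, unname(welch$statistic))   # the two values should agree
c(df_manual, unname(welch$parameter))  # the two values should agree
```

The approximate degrees of freedom always lie between the smaller group size minus 1 and the pooled value, so with 3 values per group they fall between 2 and 4.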