Using the pbc data set from the last assignment, answer the following questions:

library(survival)
data(pbc)
  1. What is the probability of the null hypothesis that the age of the study participants in normally distributed?

    swtest <- shapiro.test(pbc$age)
    swtest$p.value
    ## [1] 0.008918234
  2. Is the proportion of women in the study what would be expected if particpants were selcted randomly from the us population (females = 50.8% of population). What is the p-value associated with that null hypothesis?

    p0 = 0.508
    ptest <- prop.test(x = sum(pbc$sex=='f'),
                                   n = nrow(pbc),
                                   p = p0)
    ptest.alternative <- prop.test(x = rev(table(pbc$sex)), p=p0)
    # Note, we have to reverse the order, since the table tabulates males first, but we need females first
    ptest$p.value
    ## [1] 5.263938e-56
    ptest.alternative$p.value
    ## [1] 5.263938e-56
  3. Is there a statistically significant difference between the cholesterol levels of men versus women?

    ttest <- t.test(pbc$chol[pbc$sex=='m'], pbc$chol[pbc$sex=='f'])
    ttest1 <- t.test(chol ~ sex, data=pbc)
    ttest$p.value
    ## [1] 0.8129404
    ttest1$p.value
    ## [1] 0.8129404
    wtest <- wilcox.test(pbc$chol[pbc$sex=='m'], pbc$chol[pbc$sex=='f'])
    wtest1 <- wilcox.test(chol ~ sex, data=pbc)
    wtest$p.value
    ## [1] 0.945674
    wtest1$p.value
    ## [1] 0.945674
  4. Is there a statistically significant relationship between sex and disease stage? You will need to use the “table” command to put these columns (pbc[,c(“sex”,“stage”)]) into a contingency table first.

    tab1 <- table(pbc[,c('sex','stage')])
    chitst <- chisq.test(tab1)
    ## Warning in chisq.test(tab1): Chi-squared approximation may be incorrect
    fishertst <- fisher.test(tab1)
    chitst$p.value
    ## [1] 0.8307365
    fishertst$p.value
    ## [1] 0.7765807
  5. Is there a statistically significant correlation between serum albumin and serum cholesterol?

    cortst <- cor.test(pbc$albumin, pbc$chol)
    cortst1 <- cor.test(~ albumin + chol, data=pbc)
    cortst$p.value
    ## [1] 0.2414376
    cortst1$p.value
    ## [1] 0.2414376
  6. Is there a statistically significant correlation between age and cholesterol?

    cortst <- cor.test(~age +chol, data=pbc)
    cortst1 <- cor.test(pbc$age, pbc$chol)
    cortst$p.value
    ## [1] 0.007786129
    cortst1$p.value
    ## [1] 0.007786129
  7. Did you use parametric or non-parametric statistical tests to answer the previous questions? If challenged by a reviewer, how would you justify your choice?
    Typically one chooses parametric tests unless there is an obvious deviation from the assumptions based on descriptive statistics and figures. Generally one should not test for normality (i.e. whether a distribution follows a normal distribution), since it will affect your overall Type I error rate which will no longer be 0.05 (if that’s what you planned). Moreover, the tests of normality (Shapiro-Wilks or Kolmogorov-Smirnov) have poor statistical power to detect deviations from normality in small to medium sample sizes.

    Most parametric tests, including the t-test and chi-square test, are fairly robust to deviations from normality. However if you think the normality assumptions are not met, go ahead and perform the non-parametric versions of your tests, with the knowledge that except for substantial deviations from normality, the statistical power of these tests are lower than the statistical power of the corresponding parametric tests. The Wilcox and Fisher tests are commonly used in small sample situations.
  8. When deciding whether your results were statistically significant, what did you use as your threshold p-value? Why?

    A threshold of 0.05 is typically chosen for statistical significance. However, in many cases a different threshold is used. For example, in GWAS studies, a typical threshold of 5 x 10-8 is used to correct for Type I error.

    Generally a p-value should be reported without commenting on statistical significance. The p-value that would meet statistical significance can be decided by the reader based on their understanding of the problem. The traditional thresholds of 0.05, 0.01 and 0.001 are artificial and reflect merely the inability of the founders of statistics to compute critical values of standard distributions for all possible Type I error levels. Current computers make this moot.

Some further notes

  1. The general rubric for hypothesis tests in R is that the p-value can te extracted as testobj$p.value, where the test that is run is stored in testobj. For example,

    library(survival)
    data(pbc)
    ttst <- t.test(age ~ sex, data=pbc)
    ttst$p.value

    There is a package called broom which makes extracting information from hypothesis tests (and models) a lot easier. This package has a function tidy that converts the results of a test into a data.frame. For example,

    if(!('broom' %in% installed.packages()[,'Package'])){
      install.packages('broom')
    }
    library(broom)
    tidy(ttst)
    ## # A tibble: 1 x 10
    ##   estimate estimate1 estimate2 statistic p.value parameter conf.low
    ##      <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>
    ## 1     5.55      55.7      50.2      3.20 0.00236      52.2     2.07
    ## # ... with 3 more variables: conf.high <dbl>, method <chr>,
    ## #   alternative <chr>
  2. Specially for two-sample tests with small sample sizes, the correct test to do is often the permutation test. This test assumes that under the null hypothesis there is no difference in means between the two groups. In that case, under the null hypothesis, we should be able to shuffle the group assignments and see no real difference in the means. To implement this to test, for example, whether in the pbc data, age is different between males and females, we can do the following:

    obs_difference <-  mean(pbc$age[pbc$sex=='m']) - mean(pbc$age[pbc$sex=='f'])
    n <- nrow(pbc)
    n_males <- sum(pbc$sex=='m')
    combined_data <- pbc$age
    perm_means <- rep(0,1000) # initialize storage of permuation null distribution
    set.seed(12) # set seed for the random numbers
    for(i in 1:1000){ # 100 interations to generate the null distribution
      index_male <- sample(1:n, size=n_males, replace = FALSE) # which elemnts are deemed male under H0
      index_female <- setdiff(1:n, index_male) # which elements are deemed female under H0
      perm_means[i] = mean(pbc$age[index_male])-mean(pbc$age[index_female])
    }
    hist(perm_means, main = 'Permutation null distribution')

    p_value = mean(abs(perm_means) > abs(obs_difference)) # Two-sided alternative
    print(p_value)
    ## [1] 0.001

    The permutation test can also be done by the coin package:

    if(!('coin' %in% installed.packages()[,'Package'])){
      install.packages('coin')
    }
    library(coin)
    # Permutation test based on 1000 samples
    wilcox_test(age ~ sex, data=pbc, distribution=approximate(B=1000))
    ## 
    ##  Approximative Wilcoxon-Mann-Whitney Test
    ## 
    ## data:  age by sex (m, f)
    ## Z = 3.0052, p-value = 0.001
    ## alternative hypothesis: true mu is not equal to 0