Regression Exercise #1

The following exercises will help illustrate concepts on how two variables are related, as demonstrated by their covariance and correlation. Feel free to run your own R code on your local machine, or utilize the code blocks on this site. You can use Ctrl / + ↵ Enter to evaluate all the code in a given chunk.

Creating random variables


R let’s you generate your own random variables from common distribution functions like the normal, binomial, poisson, and uniform distributions we discussed in class.

For example, the following code generates 3 realizations/random samples from a normal distribution with mean 45 (mean) and standard deviation (sd) 19. The function rnorm() takes in these arguments, along with the number of draws you want (n) and returns n random draws from the normal distribution you specify.

Next, generate a vector of 100 realizations from your favorite distribution and save the results to a vector (call it whatever you want!). Then, plot a histograms of all 100 realizations. This should resemble the distribution you chose to a degree. Feel free to increase the number of realizations (say from n = 100 to n = 1000) and see what happens to the histogram.

Covariance, correlation, and independence


In R we can calculate the correlation and covariance using the cor()and cov() functions. For example, the code below draws 1000 realizations from a normal distribution with mean 10 and standard deviation 2, and another 1000 realizations from a normal distribution with mean 8 and standard deviation 3. You can think of these as data on 1000 individuals on two variables: x and y. What do you get when you run this code? Why might that be?

What happens if we redefine y as a function of x? In other words, what happens if y and x are related by some given function \(y = 10 \times x\)? Obtain the correlation and covariance between these two variables with this new definition of y.

To generate two random variables with known covariance and/or correlation we can also simulate a joint distribution, in this case the multivariate normal, using a function from the MASS package. Recall that a package in R is just a set of functions and objects that have been compiled into a single collection. One function from MASS is mvrnorm(), which as it suggests, returns values from the multi-variate normal distribution.

Use the mvrnorm() function to generate 1000 samples/realizations from the multivariate normal distribution. Variable \(X\) should have mean 10 and variable \(Y\) should have mean 15. The variance of \(X\) is 30, and the variance of \(Y\) is 60. Let the covariance between \(X\) and \(Y\) be 40. You can read about the documentation for this function here, which will tell you how to specify the arguments mu, Sigma, and n. Also recall that the correlation \(\rho\) between two variables is given by:

\[ \rho(X, Y) = \frac{\text{Cov}(X,Y)}{\sigma(X) \sigma(Y)}\]

Where \(\text{Cov()}\) is the covariance, and \(\sigma()\) is the standard deviation for each variable. The sd() function returns standard deviation values in R. Print the covariance and correlation matrices between \(X\) and \(Y\), and graph the relationship between these two variables on a 2-dimensional plane. What do you see?

This scatter plot tells us that \(X\) and \(Y\) co-occur with greater frequency around their means. If we want to visualize the multi-variate probability distribution, we need to extend our perspective to 3 dimensions. The plotly package allows us to create really neat interactive plots – check out the one below!

library(MASS)
library(plotly)
library(mvtnorm)
library(ggplot2)
library(dplyr)
library(widgetframe)

# generate data from the multivariate normal
mvnorm1000 <- 
  mvrnorm(
    n = 1000, 
    mu = c(10, 15), 
    Sigma = 
      matrix(c(30, 40, 40, 60), 
             nrow=2, ncol=2, byrow=FALSE)
          )

# compute density surface
dens <- kde2d(mvnorm1000[,1], mvnorm1000[,2])

# plot object
p <-
  plot_ly(x = dens$x,
          y = dens$y,
          z = dens$z) %>% 
  add_surface()

# place plot object in iframe
frameWidget(p, height = 500, width = '100%', )