# Part III: Building Predictive Models (30 minutes) {-}

## Introduction to machine learning in R {-}

Brief overview of machine learning: Machine learning uses statistical techniques that let computers improve at a task with experience, that is, by learning from data. Common task types include classification, regression, and clustering. The examples below use the caret package, which provides a common interface for training and evaluating models in R.

```r
# Load necessary libraries
# install.packages("caret")  # run once if caret is not installed
library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice

# Example: Splitting a dataset into training and testing sets
data(iris)
set.seed(123) # Setting seed for reproducibility
trainingIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
trainingData <- iris[trainingIndex, ]
testingData <- iris[-trainingIndex, ]
```
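
For a factor outcome such as `iris$Species`, `createDataPartition` samples within each class, so the split keeps the class proportions. As a rough illustration of that idea (not caret's actual implementation), the base-R sketch below draws 80% of the rows from each species. The names `idx_by_class`, `train_idx`, `train_base`, and `test_base` are made up for this example.

```r
data(iris)
set.seed(123)

# Group row numbers by species (hypothetical helper object)
idx_by_class <- split(seq_len(nrow(iris)), iris$Species)

# Draw 80% of the row numbers within each species
train_idx <- unlist(
  lapply(idx_by_class, function(idx) sample(idx, size = round(0.8 * length(idx)))),
  use.names = FALSE
)

train_base <- iris[train_idx, ]
test_base  <- iris[-train_idx, ]

# Each species contributes 40 training rows and 10 testing rows
table(train_base$Species)
table(test_base$Species)
```

Sampling within each class matters most when classes are unbalanced, because a plain random split could leave a rare class nearly absent from the test set.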

Exercise:

Load a different dataset and partition it into training and testing sets.

## Creating predictive models with caret {-}


```r
# Example: Building a predictive model for the iris dataset
model <- train(Species ~ ., data = trainingData, method = "rpart")
print(model)
#> CART
#>
#> 120 samples
#>   4 predictor
#>   3 classes: 'setosa', 'versicolor', 'virginica'
#>
#> No pre-processing
#> Resampling: Bootstrapped (25 reps)
#> Summary of sample sizes: 120, 120, 120, 120, 120, 120, ...
#> Resampling results across tuning parameters:
#>
#>   cp    Accuracy   Kappa
#>   0.00  0.9398492  0.9086993
#>   0.45  0.7426390  0.6253355
#>   0.50  0.5557896  0.3665192
#>
#> Accuracy was used to select the optimal model using
#>  the largest value.
#> The final value used for the model was cp = 0.
```
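
The output above shows caret's default resampling, 25 bootstrap repetitions. If k-fold cross-validation is preferred, a `trainControl` object can be passed to `train`. The sketch below assumes 10 folds and an explicit grid of five `cp` values. Both choices, and the names `ctrl`, `cp_grid`, and `model_cv`, are illustrative rather than part of the original example.

```r
library(caret)

data(iris)
set.seed(123)
trainingIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
trainingData <- iris[trainingIndex, ]

# 10-fold cross-validation instead of the default bootstrap (assumed setting)
ctrl <- trainControl(method = "cv", number = 10)

# Candidate complexity-parameter values to compare (assumed values)
cp_grid <- data.frame(cp = c(0, 0.01, 0.05, 0.1, 0.5))

model_cv <- train(Species ~ ., data = trainingData, method = "rpart",
                  trControl = ctrl, tuneGrid = cp_grid)

model_cv$results   # resampled accuracy and kappa for each cp value
model_cv$bestTune  # the cp value that was selected
```

Smaller `cp` values allow deeper trees, so comparing several of them under resampling is one way to guard against overfitting.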

```r
# Predicting using the model
predictions <- predict(model, testingData)
confusionMatrix(predictions, testingData$Species)
#> Confusion Matrix and Statistics
#>
#>             Reference
#> Prediction   setosa versicolor virginica
#>   setosa         10          0         0
#>   versicolor      0         10         2
#>   virginica       0          0         8
#>
#> Overall Statistics
#>
#>                Accuracy : 0.9333
#>                  95% CI : (0.7793, 0.9918)
#>     No Information Rate : 0.3333
#>     P-Value [Acc > NIR] : 8.747e-12
#>
#>                   Kappa : 0.9
#>
#>  Mcnemar's Test P-Value : NA
#>
#> Statistics by Class:
#>
#>                      Class: setosa Class: versicolor
#> Sensitivity                 1.0000            1.0000
#> Specificity                 1.0000            0.9000
#> Pos Pred Value              1.0000            0.8333
#> Neg Pred Value              1.0000            1.0000
#> Prevalence                  0.3333            0.3333
#> Detection Rate              0.3333            0.3333
#> Detection Prevalence        0.3333            0.4000
#> Balanced Accuracy           1.0000            0.9500
#>                      Class: virginica
#> Sensitivity                    0.8000
#> Specificity                    1.0000
#> Pos Pred Value                 1.0000
#> Neg Pred Value                 0.9091
#> Prevalence                     0.3333
#> Detection Rate                 0.2667
#> Detection Prevalence           0.2667
#> Balanced Accuracy              0.9000
```
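
The headline figures in the confusion matrix output can be reproduced by hand, which helps when interpreting them. The base-R sketch below re-enters the counts from the table above and computes accuracy, Cohen's kappa, and per-class sensitivity and specificity. The variable names (`cm`, `kappa_stat`, and so on) are chosen for this illustration.

```r
# Counts copied from the confusion matrix above (rows = prediction, columns = reference)
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(
  c(10,  0, 0,
     0, 10, 2,
     0,  0, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes)
)

n <- sum(cm)

# Accuracy: share of predictions on the diagonal
accuracy <- sum(diag(cm)) / n

# Kappa: accuracy corrected for the agreement expected by chance
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_stat <- (accuracy - expected) / (1 - expected)

# Sensitivity: correctly predicted cases of a class / all true cases of that class
sensitivity <- diag(cm) / colSums(cm)

# Specificity: true negatives / all cases that do not belong to the class
specificity <- sapply(seq_along(classes), function(i) {
  tn <- sum(cm[-i, -i])
  fp <- sum(cm[i, -i])
  tn / (tn + fp)
})
names(specificity) <- classes

round(c(accuracy = accuracy, kappa = kappa_stat), 4)  # 0.9333 and 0.9
round(sensitivity, 4)                                 # 1, 1, 0.8
round(specificity, 4)                                 # 1, 0.9, 1
```

The two virginica flowers predicted as versicolor account for every deviation from 1: they lower virginica's sensitivity and versicolor's specificity.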

Exercise:

Build a predictive model for another dataset and evaluate its performance.




# Part IV: Interactive Dashboards with Shiny (30 minutes) {-}
## Introduction to Shiny for building web-based data dashboards {-}

Shiny is an R package for building interactive web applications (apps) straight from R. It lets you turn analyses into interactive apps without requiring HTML, CSS, or JavaScript knowledge.


```r
# Load the Shiny package
# install.packages("shiny")  # run once if shiny is not installed
library(shiny)
```

The basic structure of a Shiny app involves two main parts:

- A user interface (UI) script, which controls the layout and appearance of the app.
- A server script, which contains the instructions to build and rebuild the app based on user input.

## Creating a simple Shiny app {-}

Two examples follow. The first displays a histogram whose number of bins is set by a slider. The second, a data filtering app, is structured as follows.

UI Component: The UI has a `sliderInput` for selecting the mpg range and a `tableOutput` to display the filtered data.

Server Logic: The `reactive` function creates a reactive subset of `mtcars` based on the selected mpg range. The `renderTable` function then renders this filtered data as a table in the main panel.

Running the App: As with any Shiny app, `shinyApp(ui = ui, server = server)` runs the app.
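
To see how a reactive expression tracks its inputs without launching an app, the sketch below uses `reactiveVal` as a stand-in for a slider input and reads the result with `isolate`, which allows a reactive to be evaluated at the console. The names `mpg_range` and `filtered` are hypothetical. Inside a real app the range would come from `input$mpgRange`.

```r
library(shiny)

# Stand-in for input$mpgRange (hypothetical; a real app gets this from the slider)
mpg_range <- reactiveVal(c(20, 25))

# Reactive subset of mtcars that depends on mpg_range
filtered <- reactive({
  rng <- mpg_range()
  mtcars[mtcars$mpg >= rng[1] & mtcars$mpg <= rng[2], ]
})

# isolate() reads a reactive value outside a running app
nrow(isolate(filtered()))

# Changing the range invalidates the reactive, so the next read recomputes it
mpg_range(c(30, 35))
nrow(isolate(filtered()))
```

In an app, no explicit `isolate` call is needed: `renderTable` reads the reactive and Shiny re-runs it whenever the slider moves.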


```r
# Example: A simple Shiny app for displaying a plot

# Define UI
ui <- fluidPage(
  titlePanel("Simple Shiny App"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("num", "Number of bins:",
                  min = 1, max = 50, value = 30)
    ),
    mainPanel(
      plotOutput("distPlot")
    )
  )
)

# Define server logic
server <- function(input, output) {
  output$distPlot <- renderPlot({
    x <- faithful$eruptions
    bins <- seq(min(x), max(x), length.out = input$num + 1)
    hist(x, breaks = bins, col = 'darkgray', border = 'white')
  })
}

# Run the application
shinyApp(ui = ui, server = server)
```



```r
# Example: A data filtering app

# Define UI
ui <- fluidPage(
  titlePanel("Data Filtering App"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("mpgRange", "Miles per Gallon (mpg):",
                  min = min(mtcars$mpg), max = max(mtcars$mpg),
                  value = c(min(mtcars$mpg), max(mtcars$mpg))
      )
    ),
    mainPanel(
      tableOutput("filteredData")
    )
  )
)

# Define server logic
server <- function(input, output) {
  filteredData <- reactive({
    mtcars[mtcars$mpg >= input$mpgRange[1] & mtcars$mpg <= input$mpgRange[2], ]
  })

  output$filteredData <- renderTable({
    filteredData()
  })
}

# Run the application
shinyApp(ui = ui, server = server)
```
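
Because the filtering step is ordinary data-frame subsetting, it can be pulled out into a plain function and checked without running the app. The helper name `filter_by_mpg` below is hypothetical and not part of the original example.

```r
# Hypothetical helper: keep rows whose mpg lies within [range[1], range[2]]
filter_by_mpg <- function(data, range) {
  data[data$mpg >= range[1] & data$mpg <= range[2], ]
}

# A narrow range keeps only the matching cars
subset_20_25 <- filter_by_mpg(mtcars, c(20, 25))
range(subset_20_25$mpg)  # both values fall inside the requested range

# The full slider range returns every row
full <- filter_by_mpg(mtcars, range(mtcars$mpg))
nrow(full) == nrow(mtcars)
```

Inside the server function, the reactive could then be written as `reactive(filter_by_mpg(mtcars, input$mpgRange))`, which keeps the app code short and the filtering logic easy to test.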


Exercise:

1. Modify the example Shiny app to include a dataset of your choice and create a different type of plot.
2. Add additional input options, like checkboxes or dropdown menus, to manipulate the plot.

Part II: Text Data Processing (30 minutes)

Workshop 1