What is data wrangling? Intro, Motivation, Outline, Setup – Pt. 1 Data Wrangling Introduction
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, make data manipulation tasks easier. These videos introduce both tools and show how they keep your R code clean and clear while reducing the cognitive load of common but often complex data science tasks.
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup
https://youtu.be/jOd65mR1zfw
01:44 Intro and what’s covered
Ground Rules
02:40 What’s a tibble
04:50 Use View
05:25 The pipe operator
07:20 What do I mean by data wrangling?

Pt. 2: Tidy Data and tidyr
https://youtu.be/1ELALQlO-yM
00:48 Goal 1: Making your data suitable for R
01:40 tidyr: “Tidy” data introduced and motivated
08:15 tidyr::gather
12:38 tidyr::spread
15:30 tidyr::unite
15:30 tidyr::separate

Pt. 3: Data manipulation tools: dplyr
https://youtu.be/Zc_ufg4uW4U
00:40 Setup
02:00 dplyr::select
03:40 dplyr::filter
05:05 dplyr::mutate
07:05 dplyr::summarise
08:30 dplyr::arrange
09:55 Combining these tools with the pipe (setup for the Grammar of Data Manipulation)
11:45 dplyr::group_by
15:00 dplyr::group_by

Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins
https://youtu.be/AuBgYDCg1Cg
Combining two datasets
00:42 dplyr::bind_cols
01:27 dplyr::bind_rows
01:42 Set operations: dplyr::union, dplyr::intersect, dplyr::setdiff
02:15 Joining data: dplyr::left_join, dplyr::inner_join, dplyr::right_join, dplyr::full_join

Cheatsheets: https://www.rstudio.com/resources/cheatsheets/

Documentation:
tidyr docs: tidyr.tidyverse.org/reference/
tidyr vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
dplyr docs: http://dplyr.tidyverse.org/reference/
dplyr one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
dplyr two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html

New York Times: “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”, by Steve Lohr, Aug. 17, 2014
https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html