class: left, middle, inverse, title-slide # Sharpening the Tools in Your Data Science Toolbox
### Jessica Minnier, PhD
OHSU-PSU School of Public Health
Knight Cancer Institute Biostatistics Shared Resource
Oregon Health & Science University
###
Joint Statistical Meetings
, July 29, 2019
bit.ly/jmin-jsm19
datapointier
--- layout: true <div class="my-footer"><span>bit.ly/jmin-jsm19</span></div> --- class: inverse, middle # Who am I? - Use R every day, git often, unix sometimes, python rarely - Teach intro to R, R markdown, intro to stats for data science, R Shiny (and Math/Stats) - Collaborative statistician (mostly applied work, no PhD students, no Postdocs, some MS students and staff) - Involved in: organizing Cascadia R conf, PDX R Meetup, R Ladies, BioData Club --- # Statistics is changing A non-random sample of buzzwords: - Big Data - Data Science - Machine Learning - Reproducible Coding - Version Control - Interactive Dashboards --- <center><img src="img/asa_statement_datasci.png" width="80%"/></center> > Sustained and substantial collaborative effort with researchers with expertise in data organization and in the flow and distribution of computation. <font color="red">Statisticians must engage them, learn from them</font>, teach them, and work with them. > Statistical <font color="red">education and training must continue to evolve</font>—the next generation of statistical professionals needs a <font color="red">broader skill set</font> and must be more able to engage with database and distributed systems experts. > The next generation must include more researchers with skills that <font color="red">cross the traditional boundaries of statistics</font>, databases, and distributed systems; there will be an ever-increasing demand for such “multi-lingual” experts. --- # Programming itself is changing - Data Science is programming intensive + Big data and complex models (machine learning beyond regression) - Beyond SAS + R is easier to learn than ever before (see: tidyverse) + R packages are easier to make and distribute + R is not "slow" anymore + [FDA allows R](https://blog.revolutionanalytics.com/2012/06/fda-r-ok.html) for clinical trials - Python is most common in data science + but not statistics! --- ## Data Science Jobs: Python, SQL, .... R .pull-left[ <center><img src="img/r4stats_popularity_jobs.png" width="100%"/></center> [R4stats - Robert A. Muenchen](http://r4stats.com/articles/popularity/) ] .pull-right[ <center><img src="img/r4stats_popularity_jobs_pct.png" width="100%"/></center> ] --- ## By contrast, scholarly articles: SPSS, R, SAS .pull-left[ <center><img src="img/r4stats_popularity_articles.png" width="90%"/></center> ] .pull-right[ <center><img src="img/r4stats_popularity_articles_trend.png" width="98%"/></center> [R4stats - Robert A. Muenchen](http://r4stats.com/articles/popularity/) ] --- .pull-left-40[ ## How to keep up? Learning takes time, dedication, courage. ] .pull-right-60[ <center><img src="img/draw_owl.png" width="100%"/></center> ] --- ## Highlights of my education in programming .pull-left[ CS courses in college (C, Java): <img src="img/college_cs2.png" width="100%"align="top"/> My PhD advisor wrote a *ton* of code and functions, shared it all with me. <img src="img/library_gradschool.png" width="100%"align="top"/> ] .pull-right[ Postdoc (learned git/github): <img src="img/softwarecarp.png" width="100%"align="top"/> First year Assistant Professor (R Markdown, Shiny were all new & exciting!) <img src="img/user2014.png" width="100%"align="top"/> ] --- ## Learning while teaching <img src="img/pres1.png" width="35%"align="top"/><img src="img/pres2.png" width="30%"align="top"/><img src="img/pres3.png" width="35%"align="top"/> <img src="img/pres4.png" width="28%"align="top"/><img src="img/pres5.png" width="36%"align="top"/><img src="img/pres6.png" width="36%"align="top"/> [github.com/jminnier/talks_etc](https://github.com/jminnier/talks_etc) --- ## Learning while teaching with new tools (also: find collaborators better than you at all this - thanks [Ted Laderas!]()) <img src="img/teach1.png" width="27%"align="top"/><img src="img/teach2.png" width="33%"align="top"/><img src="img/teach3.png" width="33%"align="top"/> ###### [r-bootcamp.netlify.com](https://r-bootcamp.netlify.com/); [minnier.shinyapps.io/ODSI_categoricalData/](https://minnier.shinyapps.io/ODSI_categoricalData); [minnier.shinyapps.io/ODSI_continuousData](https://minnier.shinyapps.io/ODSI_continuousData/) --- # Software development .pull-left[ - Unix/shell (i.e. for cluster computing) - Programming skills - Version control (i.e. git/github/gitlab, bitbucket) - Unit Testing (does your code work?) - Collaborative coding - Using databases (i.e. SQL) - Automation (i.e. make) ] .pull-right[ <img src="img/softwarecarp.png" width="100%" align="top"/> https://software-carpentry.org/ <img src="img/whattheyforgot.png" width="100%" align="top"/> https://whattheyforgot.org/ ] --- # Improve your coding skills ### Similar to learning a new language - Difficult to do when busy/sporadically, unless you set aside time (make a deadline) or find an immersive experience - Learn more R? SAS/STATA? Find workshops! (JSM, CSP, SDSS, Rstudio::conf, UseR!, local R meetups and conferences) - Learn git/github/software development? Host a [Software Carpentry](https://software-carpentry.org/) workshop <img src="img/sdss2020.png" width="25%"align="top"/><img src="img/rstudioconf2020.png" width="35%"align="top"/> <img src="img/rstudioconfworkshop.png" width="35%"align="top"/> --- layout: false ## Coding in R is more fun these days - R Markdown `\(\to\)` reproducible documents + transformative for collaborative statistical analyses + these slides are made with knitr/r markdown + make textbooks, webpages, much more - Shiny `\(\to\)` webpages to show off methodology/analyses <center><img src="img/horst_rmarkdown_wizards.png" width="47%" height="50%" align="top"><img src="img/horst_ggplot2_masterpiece.png" width="41%" height="50%" align="top"><a href="https://github.com/allisonhorst/stats-illustrations"><br>Allison Horst</a></center> --- # Find a community & learn together .pull-left[ We all have something to learn! - Meetup groups (R Users, R Ladies, Women Who Code) - Hackweeks - Data Science Workshops - Data competitions (DataFest) - Workshops at conferences - Online: [#rstats Twitter](https://twitter.com/hashtag/rstats?src=hash), [Rstudio community](https://community.rstudio.com/), [R 4 Data Science Learning community](https://www.rfordatasci.com/), [Tidy Tuesday](https://github.com/rfordatascience/tidytuesday) - Start your own club! ] .pull-right[ <img src="img/biodataclub1.png" width="70%" align="top"> https://biodata-club.github.io/ ] --- ## It takes time to learn how to learn .pull-left[ #### R specific learning checks from Alison Hill's [Big Magic with R: Creative Learning Beyond Fear](https://alison.rbind.io/talk/2018-cascadia-bigmagic/) - watch the [video](https://www.youtube.com/watch?v=seERGgG6uVs&list=PLxHlPKedTUbL7ckoP5Elrdrw64V7o5_h6&index=3&t=0s)! <img src="img/hill_magic.png" width="95%" align="top"> ] .pull-right[ <img src="img/hill_learn.png" width="100%"> ] --- ## Practice, practice, practice - Learning a tool and then forgetting to use it is the best way to unlearn it - Write down what you've learned (make a list of commands, write a blog post) ## Give talks - Sign up to give talks at local meetups - Teach workshops to students/fellow faculty/coworkers - Journal/book clubs with your co-workers/team --- .pull-left[ ## Resources `\(\rightarrow\)` - Interactive lessons - Online courses - Blog posts ## Ask for help - Stack Overflow - Rstudio Community ] .pull-right[ <center><a href="https://github.com/jminnier/awesome-rstats/"><img src="img/jminnier_learnr.png" width="100%" height="50%">github.com/jminnier</a></center> ] --- .pull-left[ # We don't want to be left behind - Work on projects outside of your comfort zone - Grow as statisticians, grow as a field - Be open to other fields collaborating on "data science" ] .pull-right[ # Don't forget to have fun! <center><img src="img/horst_welcome_to_rstats_twitter.png" width="95%" height="50%"><a href="https://github.com/allisonhorst/stats-illustrations"><br>Allison Horst</a></center> ] --- class: inverse # Thank you! .pull-left-60[ Contact me:
minnier-[at]-ohsu.edu,
[datapointier](https://twitter.com/datapointier),
[jminnier](https://github.com/jminnier/) Slides available: [bit.ly/jmin-jsm19](https://bit.ly/jmin-jsm19) Slide code and files available at: [github.com/jminnier/talks-etc](https://github.com/jminnier/talks_etc) Slides created via the R package [xaringan](https://github.com/yihui/xaringan) by [Yihui Xie](https://twitter.com/xieyihui?lang=en) with the [xaringanthemer](https://github.com/gadenbuie/xaringanthemer) package by [Garrick Aden-Buie](https://github.com/gadenbuie), inspired by slides and slide formatting by [Alison Hill](https://alison.rbind.io/), [Jennifer Thompson](https://jenthompson.me/), and [Chester Ismay](https://ismayc.github.io/) ] .pull-right-40[ <center><img src="img/horst_r_first_then.png" width="100%" height="50%"><a href="https://github.com/allisonhorst/stats-illustrations"><br>Allison Horst</a></center> ]