I’m moving my workflow to the cloud! It’s faster, I don’t need to buy a fancy new computer to crunch big data sets, it’s the wave of the future, etc.
Not sure I did it in the optimal way, but hey, for a first go it went pretty smoothly.
Here are my goals:
You’ll need a web browser and a terminal app (command line app) on your local computer.
On a Mac, use Terminal. On Linux, use your terminal emulator of choice. On Windows, use an SSH client such as PuTTY (google "PuTTY SSH key" for setup).
ssh-keygen -t rsa
When prompted, add a passphrase (you can leave it blank, but that's less secure).
On a Mac, pbcopy < ~/.ssh/id_rsa.pub will copy your public key to the clipboard for use in the next step. (On Linux, cat ~/.ssh/id_rsa.pub and copy the output by hand.)
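Taken together, the key-generation steps above can be scripted. This is just a sketch: the file name id_rsa_digitalocean is my own arbitrary choice, and -N "" gives an empty passphrase so the command runs non-interactively; for real use, drop -N and type a passphrase when prompted.

```shell
# Sketch of the key-generation steps above.
# NOTE: the file name id_rsa_digitalocean is an arbitrary choice.
KEYFILE="$HOME/.ssh/id_rsa_digitalocean"
mkdir -p "$HOME/.ssh"

# -N "" sets an empty passphrase so this runs non-interactively;
# in practice, omit -N and enter a passphrase when prompted.
ssh-keygen -t rsa -b 4096 -f "$KEYFILE" -N "" -q

# Print the public key so you can copy it into Digital Ocean.
# (On a Mac you could pipe this to pbcopy instead.)
cat "$KEYFILE.pub"
```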
Head to https://www.digitalocean.com/ and create an account. You have to provide a credit card, but you won’t be charged anything, and you’ll get $100 credit.
Once you are all logged in to Digital Ocean, go to Account > Security > SSH keys. Create a new key and paste your key info (from the last step) into the box. Give it a meaningful name that describes the user and computer it applies to (e.g., Henry_MacBookProKey).
Go to Manage > Networking > Firewalls > Create Firewall
Add a new inbound rule for TCP and change the Port Range to 8787 (the port RStudio Server will use later in this guide).
You are now ready to safely create your computer!
Select:
- Ubuntu, latest version
- the $10/mo option or greater (the $5 one ran out of memory on me)
Choose to use the SSH key you created earlier (Henry_MacBookPro)
Note: You’ll only pay the full $10/month if you leave your Digital Ocean computer on and running the whole month. Most of the time, you’ll just start it, do work, and then shut it down and only pay a fraction of that. Plus, you’ve got that $100 credit so you are good for a long time.
Important: Note the IP address of your new droplet. You’ll need that in a minute.
I didn’t have much luck accessing the computer terminal through Digital Ocean. It was much easier to use Mac Terminal and do everything from there.
From Mac Terminal or another command line, log into your Digital Ocean computer:
ssh root@the.ip.address.that.digital.ocean.gave.you
You'll need to say you trust the new connection and enter your SSH passphrase from step 1.
sudo apt-get update
sudo apt-get install curl build-essential libcurl4-gnutls-dev libxml2-dev libssl-dev libgdal-dev libproj-dev xorg libx11-dev libglu1-mesa-dev r-cran-rgl
Docker will allow you to install pre-made computer images for a variety of topics. Here we will install RStudio and the tidyverse packages. If you just want vanilla RStudio (without tidyverse), change “tidyverse” into “rstudio” throughout.
sudo snap install docker
sudo docker pull rocker/tidyverse:latest
Replace your_password with a real password of your own choosing. (In the command below, -d runs the container in the background, -p 8787:8787 publishes RStudio's port, and -e PASSWORD sets the login password.)
sudo docker run -d -p 8787:8787 -e PASSWORD=your_password --name rstudio rocker/tidyverse
If the container ever stops (for example, after a reboot), start it again with:
sudo docker start rstudio
RStudio is now running, but invisibly. To see it, open http://the.ip.address.that.digital.ocean.gave.you:8787 in your browser and log in with username rstudio and the password you chose in the last step (whatever you used to replace your_password).
I like the Merbivore or Dracula themes.
For Pane Layout, I love this layout (and so do most folks who see it):
Source  | Console
--------|----------------
History | Everything else
Here’s my list:
install.packages(c("here", "ggthemes", "scales", "ggmap", "plotly", "forecast", "treemapify", "dbscan", "rmarkdown", "flexdashboard", "maptools", "treemap", "ggtern"))
Set up the folder structure how you like it. Here is how I usually set mine up:
MyProject/
├── docs/
├── data/
│   ├── raw/
│   └── processed/
├── cleaning/
├── analysis/
├── output/
└── drafts/
docs/ is where I keep info about the experiment, a data dictionary, etc.;
data/ holds the raw data (if it isn't on the web or a server);
cleaning/ contains R scripts that take raw data and generate processed data;
analysis/ contains R scripts that take processed data and do something with it;
output/ is where figures and completed documents go. If I ever truly get to where my entire document is in R Markdown, I'll probably end there.
drafts/ is where Word documents live.
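The whole skeleton above can be created in one command from the terminal; a sketch, where MyProject stands in for whatever your project is actually called:

```shell
# Create the project skeleton described above.
# "MyProject" is a placeholder name.
mkdir -p MyProject/docs MyProject/data/raw MyProject/data/processed \
         MyProject/cleaning MyProject/analysis MyProject/output MyProject/drafts

# Verify the structure.
find MyProject -type d | sort
```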
If you organize your projects like this, then you may be interested in the here package. It helps make your projects as portable to other systems as possible. For example, you can write write_xlsx(mydata, here("output", "myfile.xlsx")) without needing to know where output actually is. It is just 'here', in your current project, no matter what drive or system it is on. In this case, here("output") means MyProject/output, and here("data", "raw") points to the raw data folder.
Snapshots allow you to jump right back here (almost) in the future, so you can start each new project with things already set up the way you like them.
Back in Digital Ocean, click on your droplet and slide the bar to turn it off. Once it has powered down, go to Snapshots and take a snapshot. I save this with a name like “RStudio-basic-template”.
Once you have saved a snapshot, delete your droplet. This will stop you from being charged your hourly rate for server time. You can always pick up again from your snapshot.
In the future, use this template to launch a brand-new configured droplet.
To use a saved template, navigate to Manage > Images and create a new droplet from your RStudio-basic-template.
Once it is ready, note the IP address. Go to Terminal and log in over SSH as above (ssh root@the.new.ip.address). Then type:
docker start rstudio
Navigate to http://the.new.ip.address:8787
Log in with username: rstudio and password: your previous password
Continue to save snapshots, but this time, instead of naming them templates, name them with meaningful info such as: "2019-03-16_tiger-project_data-cleaning-complete"
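A snapshot name in that date-first shape is easy to generate at the command line; a small sketch, where the project and status strings are placeholders:

```shell
# Build a date-first snapshot name in the format above.
# "tiger-project" and "data-cleaning-complete" are placeholder values.
project="tiger-project"
status="data-cleaning-complete"
snapshot_name="$(date +%F)_${project}_${status}"
echo "$snapshot_name"
```

Date-first names like this sort chronologically in the Digital Ocean snapshot list, which is the point of the convention.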