Originally posted at http://analyticsblog.mecglobal.it/analytics-tools/bashr/
In the world of data analysis, the term “automation” goes hand in hand with the term “scripting”. There is no single best programming language, only the one best suited to the task at hand.
In our case, many data aggregation procedures run on Unix/Linux servers, collecting API data in real time, so it is essential to make sure that the data is correctly formatted and stored for analysis and visualization. Some of these automated procedures run nightly via cron and call multiple R scripts with parameters.
Our challenge was to let the R scripts run or skip certain procedures depending on the parameters passed by the bash script. The question was: how do we pass parameters from a bash script to R at run time?
The answer is quite simple and involves two parts: the bash script that invokes the R script and passes the parameters, and the R script itself, which must read them.
In this example we will create a variable, “fileshort”, holding a filename built from yesterday’s date, and pass it to R, which will use it as the name of the file it saves.
Let’s start with the bash script:
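#!/bin/bash
# yesterday's date as YYYYMMDD (GNU date syntax)
data=$(date --date=-1day +%Y%m%d)
fileshort=test_$data.csv
# pass the filename (quoted, so it stays one argument) to the R script
Rscript /home/file_repo/testfile.R "$fileshort" --save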
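As you can see, the variable fileshort is created and then passed to the R script as an argument. To invoke R you can use either Rscript or R with input redirection (R <). Both run the script, but the vector returned by commandArgs() depends on how R was started, so check the position of your argument if you switch from one method to the other.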
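The same caller can be sketched a little more defensively. This is a minimal sketch, assuming GNU date (BSD/macOS date has no --date option); build_filename and run_testfile are hypothetical helper names, and the R script path is the one from the example above. Taking an optional reference date makes the filename logic easy to check and lets you regenerate the file for a past day.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print "test_YYYYMMDD.csv" for the day before a
# reference date (default: today). Relies on GNU date's --date option.
build_filename() {
  local ref="${1:-today}"
  local day
  day=$(date --date="$ref -1 day" +%Y%m%d)
  printf 'test_%s.csv\n' "$day"
}

# Hypothetical wrapper: build the filename and hand it to the R script.
# "$fileshort" is quoted so R always receives it as a single argument.
run_testfile() {
  local fileshort
  fileshort=$(build_filename "${1:-today}")
  Rscript /home/file_repo/testfile.R "$fileshort" --save
}
```

A nightly cron job would call run_testfile with no argument; calling run_testfile 2015-02-02 would instead produce test_20150201.csv, which can be handy for re-running a missed night.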
Now it’s time to edit our R script. First we need to tell the script to capture the parameters/arguments passed by the shell, and check them with print(), as you can see below:
args <- commandArgs()
print(args)
On the console, R prints the following:
[1] "/usr/lib/R/bin/exec/R"
[2] "--slave"
[3] "--no-restore"
[4] "--file=/home/file_repo/testfile.R"
[5] "--args"
[6] "test_20150201.csv"
[7] "--save"
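In our case the required parameter is the filename, “test_20150201.csv”, which is the sixth element of the vector, args[6]. Alternatively, commandArgs(trailingOnly = TRUE) returns only the arguments that follow "--args", in which case the filename would be the first element regardless of how R was started.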
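The filename sits at a fixed position only because the shell hands it over as a single word. If fileshort contained a space and were left unquoted, R would receive two arguments and every later index would shift. A minimal shell-side sketch for previewing what the R script will receive after "--args" is shown here; show_args is a hypothetical helper, and the filename with a space is made up for illustration.

```shell
#!/bin/bash
# Hypothetical debugging helper: print every argument it receives, one per
# line and numbered from 1, to preview what the R script gets after "--args".
show_args() {
  local i=1 arg
  for arg in "$@"; do
    printf '[%d] %s\n' "$i" "$arg"
    i=$((i + 1))
  done
}

fileshort="test 20150201.csv"   # hypothetical name containing a space
show_args $fileshort --save     # unquoted: prints 3 numbered lines
show_args "$fileshort" --save   # quoted: prints 2 numbered lines
```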
At this point you just need to assign the element you are interested in to a variable:
name <- args[6]
and then use it however you need. In our example, it becomes the name of the output file:
require(lubridate)
# db_final: the data frame to save, built earlier in the script
write.table(db_final, name, append = FALSE, quote = FALSE, sep = ",",
            eol = "\n", na = "NA", dec = ".", row.names = FALSE,
            col.names = FALSE, qmethod = c("escape", "double"),
            fileEncoding = "")
The generated file will be named “test_20150201.csv”.
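Because the job runs unattended under cron, it can help to confirm that the file was actually written. Note that name is a relative path, so R writes the file into its working directory, which is the directory the bash script was started from. A minimal sketch of such a check follows; check_output is a hypothetical helper, not part of the original setup.

```shell
#!/bin/bash
# Hypothetical post-run check for a cron job: succeed only if the expected
# file exists and is non-empty; otherwise report to stderr and fail.
check_output() {
  local file="$1"
  if [[ -s "$file" ]]; then
    echo "OK: $file"
  else
    echo "ERROR: $file missing or empty" >&2
    return 1
  fi
}
```

In the caller it could be chained after the R call, for example: Rscript /home/file_repo/testfile.R "$fileshort" --save && check_output "$fileshort"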
Enjoy!