added script to generate data tables for the ameriflux data #6
areevesman wants to merge 4 commits into NCEAS:master from areevesman:master
dmullen17
left a comment
Overall really solid work!
If you read my comments you'll find that you can generalize a lot of this code so you aren't repeating yourself. Take your time implementing these changes.
When you start making changes, follow these steps so that you're able to commit (basically save) your work: https://github.com/NCEAS/data-processing#making-changes-to-your-contribution
R/Ameriflux_data_tables.R
Outdated
definitions <- read.csv('/home/reevesman/Ameriflux/attribute_function/definitions.csv',
                        stringsAsFactors = F)

data1 <- read.csv('/home/reevesman/Ameriflux/AMF_US-Ivo/AMF_US-Ivo_BASE_HH_2-1.csv',
Rather than reading all of these in one at a time, you can have your function do so for you.
It would read something like:
attribute_definitions <- function(data_path, definitions) {
  data <- read.csv(data_path, skip = 2, stringsAsFactors = FALSE)
  # (the rest of your code)
}
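As a rough sketch of how that could remove the repeated read.csv calls: once the reading step takes a path, a single lapply() call can cover every site file. The helper names read_ameriflux and make_demo_file and the small temporary files below are hypothetical stand-ins so the example runs anywhere; they are not part of the script under review.

```r
# Hypothetical helper: reads one file, skipping the first two lines
# (the same skip = 2 used in the script).
read_ameriflux <- function(data_path) {
  read.csv(data_path, skip = 2, stringsAsFactors = FALSE)
}

# Stand-in files with two leading lines to skip (made-up contents)
make_demo_file <- function(values) {
  path <- tempfile(fileext = ".csv")
  writeLines(c("# Site: demo", "# Version: demo",
               "TIMESTAMP_START,CO2", values), path)
  path
}

data_paths <- c(
  make_demo_file(c("201501010000,400.1", "201501010030,401.2")),
  make_demo_file("201501010000,399.5")
)

# One call reads every file; the result is a list of data frames
data_list <- lapply(data_paths, read_ameriflux)
```

With the real data, data_paths would simply be a character vector of the site file paths, and the rest of the function could be applied to each element of data_list in the same way.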
R/Ameriflux_data_tables.R
Outdated
data2 <- read.csv('/home/reevesman/Ameriflux/AMF_US-ICt/AMF_US-ICt_BASE_HH_2-1.csv',
                  skip = 2,
                  stringsAsFactors = F)
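R is weird in that it lets you abbreviate TRUE and FALSE as T and F. T and F are ordinary variables that can be reassigned, while TRUE and FALSE are reserved words, so it's generally considered best practice to spell them out. It also increases readability.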
When you're making changes you can split up your commits. For instance, in one commit you can just change all the T/Fs to TRUE/FALSE, and give that commit a message like "updated T and F syntax".
R/Ameriflux_data_tables.R
Outdated
if (str_sub(att,-5,-1) == '_PI_F'){
  x <- str_sub(att,-5,-1)
  extra <- paste(definitions[which(definitions$uniqueAttributeLabel == '_PI'), 'uniqueAttributeDefinition'],
This is something I mentioned to sharis as well. At the beginning of your function you could have a line that sets new column names for your attributes table. The column names in the attributes csv are pretty bad and make the R code harder to read.
colnames(attributes) <- c("category", "name", "definition", "units", "SI_units")
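To show the difference this makes, here is a small sketch using a toy table. Only the column names uniqueAttributeLabel and uniqueAttributeDefinition come from the script; the other original column names and all of the values are placeholders I made up for illustration.

```r
# Toy stand-in for the definitions csv (placeholder names and values,
# apart from the two uniqueAttribute... columns seen in the script)
definitions <- data.frame(
  category = "qualifier",
  uniqueAttributeLabel = c("_PI", "_QC"),
  uniqueAttributeDefinition = c("placeholder definition for _PI",
                                "placeholder definition for _QC"),
  unit = NA_character_,
  unitSI = NA_character_,
  stringsAsFactors = FALSE
)

# Rename once, at the top of the function
colnames(definitions) <- c("category", "name", "definition", "units", "SI_units")

# The lookup from the script then becomes much shorter
extra <- definitions$definition[definitions$name == "_PI"]
```

The renaming only has to happen once, and every later lookup reads closer to plain English.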
R/Ameriflux_data_tables.R
Outdated
##############################################################################

attribute_units <- function(data, definitions){
It looks like this function searches for qualifiers in variable names and deletes them. You could probably simplify this by using the gsub function and a regular expression. Take a look at what the code below does and you should be able to simplify this quite a bit. The | character stands for "or" in R regular expressions.
names <- c("var_1", "var_F_2")
gsub("_1|_2|_F", "", names)
https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf
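A minimal sketch of how that could look for this script, assuming the qualifiers are limited to the ones visible in the snippets in this review (_PI, _QC, _IU, _SD, _F, and a single trailing digit). The helper name strip_qualifiers and the example attribute names are hypothetical.

```r
# Hypothetical helper: strips trailing qualifiers from attribute names.
# The qualifier list is an assumption based on the snippets in this review.
strip_qualifiers <- function(attribute_names) {
  # (...)+ matches one or more qualifiers in a row;
  # $ restricts the match to the end of the name
  sub("(_PI|_QC|_IU|_SD|_F|_[1-9])+$", "", attribute_names)
}

base_names <- strip_qualifiers(c("CO2_F_1", "TA_PI_F", "SW_IN_QC"))
base_names
# "CO2" "TA" "SW_IN"
```

Anchoring the pattern with $ is a design choice: it leaves pieces in the middle of a name alone (the _IN in SW_IN_QC stays), which an unanchored pattern might not.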
R/Ameriflux_data_tables.R
Outdated
  QUALIFIERS_EXIST <- TRUE
}

else if (str_sub(att,-3,-1) %in% c('_PI','_QC','_IU','_SD')){
It looks like you're looking for every possible combination of qualifiers and treating each one as a unique case. While this is totally acceptable, it's a good idea to think about how you might scale this if there were too many to write out by hand.
You can reverse the %in% statement so that you're checking whether a qualifier appears in an attribute. Then you can run the rest of the commands that you have after an else-if statement to get the extra definition.
See if you can use this example to simplify your code. You should only need to do this twice, once for the integer qualifiers and once for the character ones. The logic here is a bit tricky, so don't hesitate to ask me about it.
att <- "CO2_F_1"
int_qualifiers <- c('_1','_2','_3','_4','_5','_6','_7','_8','_9')
sapply(int_qualifiers, grepl, x = att) # this applies grepl to each value in int_qualifiers, with the additional argument x = att
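Here is a slightly fuller sketch of the same idea, run once for the integer qualifiers and once for the character ones. The character qualifier list is taken from the %in% check in the script; the variable names found_int, matched_int, found_chr and matched_chr are just illustrative.

```r
att <- "CO2_F_1"

int_qualifiers <- paste0("_", 1:9)              # "_1" ... "_9"
chr_qualifiers <- c("_PI", "_QC", "_IU", "_SD") # from the %in% check in the script

# Named logical vector: TRUE where the qualifier occurs somewhere in att
found_int <- sapply(int_qualifiers, grepl, x = att)
found_chr <- sapply(chr_qualifiers, grepl, x = att)

# Keep only the qualifiers that were found
matched_int <- names(found_int)[found_int]
matched_chr <- names(found_chr)[found_chr]

if (length(matched_int) > 0) {
  # look up and paste together the extra definitions for matched_int here
  message("integer qualifiers found: ", paste(matched_int, collapse = ", "))
}
```

One caveat: grepl() matches anywhere in the string, so "_1" would also be found in a name such as "VAR_10". If that matters for the real attribute names, the pattern may need an anchor such as "_1$".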
Thanks @dmullen17! I will definitely work on all of these. I really enjoyed this project and your feedback! Please let me know about any similar projects!
@areevesman glad to hear it! If you want to focus on making these changes, I'll think of a similar task for you to work on by the time you're done.
Hey @dmullen17, I made all of the edits that I was planning to based on your comments! Thanks for reviewing for me! Jesse said that you've got an intense job interview today, good luck!
@dmullen17 I just added the script for the data tables!