Over a million developers have joined DZone.
Platinum Partner

Taxonomy with R: Exploring the Taxize-Package

· Big Data Zone

The Big Data Zone is presented by Exaptive.  Learn about how to rapidly iterate data applications, while reusing existing code and leveraging open source technologies.

First off, I'd really like to give a shout-out to the brave people who have created and maintain this great package - the fame is yours!

So, while exploring the capabilities of the package some issues with the ITIS-Server arose and with large datasets things weren't working out quite well for me.

I then switched to the NCBI API and saw that the result is much better here (way quicker, on first glance also a higher coverage). 

At the time there is no taxize-function that will pull taxonomic details from a classification returned by NCBI, that's why I plugged together a little wrapper - see here:

# some species data:
spec <- data.frame("Species" = I(c("Bryum schleicheri", "Bryum capillare", "Bryum argentum", "Escherichia coli", "Glis glis")))
spl <- strsplit(spec$Species, " ")
spec$Genus <- as.character(sapply(spl, "[[", 1))
 
# for pulling taxonomic details we'd best submit higher rank taxons
# in this case Genera. Then we'll submit Genus Bryum only once and 
# save some computation time (might be an issue if you deal 
# with large datasets..)
 
gen_uniq <- unique(spec$Genus)
 
# function for pulling classification details ("phylum" in this case)
get_sys_level <- function(x){ require(taxize)
                              a <- classification(get_uid(x))
                              y <- data.frame(a[[1]])                                        # if there are multiple results, take the first..
                              z <- tryCatch(as.character(y[which(y[,2] == "phylum"), 1]),    # in case of any other errors put NA
                                            error = function(e) NA)
                              z <- ifelse(length(z) != 0, z, NA)                             # if the taxonomic detail is not covered return NA
                              return(data.frame(Taxon = x, Syslevel = z))
                             }
 
# call function and rbind the returned values 
result <- do.call(rbind, lapply(gen_uniq, get_sys_level))
print(result)
#         Taxon       Syslevel
# 1       Bryum   Streptophyta
# 2 Escherichia Proteobacteria
# 3        Glis       Chordata
 
# now merge back to the original data frame
spec_new <- merge(spec, result, by.x = "Genus", by.y = "Taxon")
print(spec_new)
#         Genus           Species       Syslevel
# 1       Bryum Bryum schleicheri   Streptophyta
# 2       Bryum   Bryum capillare   Streptophyta
# 3       Bryum    Bryum argentum   Streptophyta
# 4 Escherichia  Escherichia coli Proteobacteria
# 5        Glis         Glis glis       Chordata
#

The Big Data Zone is presented by Exaptive.  Learn how rapid data application development can address the data science shortage.

Topics:

Published at DZone with permission of Kay Cichini , DZone MVB .

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}