Extracting Datasets from Excel Files in a Zipped Folder
Join the DZone community and get the full member experience.Join For Free
the title of the post is a bit long, but that’s the problem i was facing this morning: importing datasets from files, online. i mean, it was not a “problem” (since i can always download and extract the files manually), more a challenge (i should be able to do it in r, directly). the files are located on ressources-actuarielles.net , in a zip file. those are mortality tables used in french-speaking african countries, and i guess that one problem came from special characters, such as “é” or “è”… when you open the zip file, you see a folder:
and in that folder, several files that i would like to import:
my first code was quite standard, unfortunately, it did not work
> library(xlsx) > loc.url <- "http://ressources-a...cima.zip" > temp <- tempfile() > download.file(loc.url,temp) trying url 'http://ressources-a...cima.zip' content type 'application/x-zip' length 503786 bytes (491 kb) opened url downloaded 491 kb > list.files <- unzip(temp,list=true) > data <- read.xlsx(unz(temp, + list.files$name),sheetindex=1) error in path.expand(file) : invalid 'path' argument
indeed, there was a problem with the path,
> list.files$name  "tables de mortalit‚ cima/cima f.xlsx"
thankfully, as usual, @ 3wen came to rescue me, and suggested the following,
> sys.setlocale("lc_all", "c")  "c" > loc.url <- "http://ressources-a...cima.zip" > td <- tempdir() > tf <- tempfile(tmpdir=td, fileext=".zip") > download.file(loc.url, tf) trying url 'http://ressources-a...cima.zip' content type 'application/x-zip' length 503786 bytes (491 kb) opened url ============================================= downloaded 491 kb > fname <- unzip(tf, list=true)$name > unzip(tf, files=fname, exdir=td, + overwrite=true) > fpath <- file.path(td, fname)
that was it… then @ 3wen suggested another package to read the xlsx file (that works well on my linux laptop, not on windows)
> data <- read.xls(fpath, sheet = 1) perl: warning: setting locale failed. perl: warning: please check that your locale settings: language = "fr_ca:fr", lc_all = (unset), lang = "fr_ca.utf-8" are supported and installed on your system. perl: warning: falling back to the standard locale ("c"). > sys.setlocale()  "" > head(data) age lx..cima.f. qx..cima.f. dx..cima.f. 1 0 1000000.0 0.2849% 2848.5034 2 1 997151.5 0.0387% 385.8644 3 2 996765.6 0.0296% 295.2717 4 3 996470.4 0.0252% 251.5163 5 4 996218.8 0.0216% 214.9381 6 5 996003.9 0.0197% 195.9929
that was more subtle than expected…
Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.