Removing Uncited References in a Tex File (with R)

by Arthur Charpentier · Oct. 27, 14

Last week, with @3wen, we were working on the revised version of our work on smoothing densities of spatial processes (with edge correction). Usually, once you have revised a paper, some references have been added and others dropped, so you need to spend some time checking that every reference in the list is actually mentioned in the paper. For instance, consider a compiled tex file whose reference list has six entries.

Only three of them are actually cited in the document, so the reference list needs to be updated (by removing the first three). With a bib file and BibTeX this would be automatic, since only cited references appear in the list. The problem here is that we used \bibitem entries, typed directly in a thebibliography environment.

I wanted to do that by hand this weekend, but @3wen suggested writing a simple R function that scans the tex file (as well as the aux file) and removes uncited references. The idea is the following. First, let us scan the two files:

> library(stringr)
> setwd("/home/tex/")
> file_tex <- scan("file_test.tex", what = "character", sep = "\n")
Read 15 items
> file_aux <- scan("file_test.aux", what = "character", sep = "\n")
Read 21 items
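By default, scan() skips blank lines (its blank.lines.skip argument defaults to TRUE). That is harmless here, but it means element indices no longer match line numbers in the file. If you want that correspondence, base R's readLines() is an alternative. A minimal sketch, assuming a small made-up tex file written to a temporary path (tex_path and its content are illustrative, not the authors' file):

```r
# Hypothetical miniature tex file, written to a temporary location
tex_path <- tempfile(fileext = ".tex")
writeLines(c("\\documentclass{article}",
             "\\begin{document}",
             "",
             "See \\cite{Scott}.",
             "\\end{document}"), tex_path)

# readLines() returns one element per line and keeps the blank line,
# so file_tex[k] is line k of the file
file_tex <- readLines(tex_path)
length(file_tex)
```

Unlike scan(), readLines() does not print a "Read N items" message.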

Then we locate the part of the tex file related to the bibliography, that is, the lines that open and close the thebibliography environment:

> beg_file <- which(str_detect(string = file_tex, pattern = "\\\\begin\\{thebibliography\\}"))
> end_file <- which(str_detect(string = file_tex, pattern = "\\\\end\\{thebibliography\\}"))

The references are the lines in between:

> biblio <- file_tex[seq(beg_file+1, end_file-1)]
> biblio
[1] "\\bibitem[Cressie(1991)]{Cressie} Cressie, N. (1991). Statistics for Spatial Data. New York: John Wiley \\& Sons"                                  
[2] "\\bibitem[Diggle (2002)]{Diggle} Diggle, P., Heagerty, P., Liang, K.Y. \\& Zeger, S. 2002. Analysis of Longitudinal Data. Oxford University Press."
[3] "\\bibitem[Ripley(1981)]{Ripley} Ripley, B. 1981. Spatial Statistics, Wiley, New York."                                             
[4] "\\bibitem[Scott(1992)]{Scott} Scott, D W 1992 Multivariate Density Estimation: Theory, Practice, and Visualization. New York, John Wiley and Sons."
[5] "\\bibitem[Silverman(2004)]{Silverman} Silverman B W 1986 Density Estimation for Statistics and Data Analysis."
[6] "London, Chapman \\& Hall."
[7] "\\bibitem[Wand \\& Jones(1995)]{Wand} Wand, M.P; Jones, M.C. (1995). Kernel Smoothing. London: Chapman \\& Hall/CRC. "

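If you look carefully at the output, you can see that the fifth reference spans two lines, which may happen frequently. So we need to determine precisely where each reference starts and where it ends: an entry starts on a line containing \bibitem and ends on the line just before the next \bibitem (or on the last line of the bibliography).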

> beg_bibitem <- which(str_detect(string = biblio, pattern = "\\\\bibitem"))
> go_through <- cbind(beg_bibitem, c(beg_bibitem[-1]-1,length(biblio)))
> go_through
     beg_bibitem  
[1,]           1 1
[2,]           2 2
[3,]           3 3
[4,]           4 4
[5,]           5 6
[6,]           7 7
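The start and end bookkeeping can also be done with cumsum(): each line containing \bibitem opens a new entry, and continuation lines inherit the running count of the line above. A base-R sketch, on three entries copied from the article's biblio output (the variable names are illustrative). It assumes no lines precede the first \bibitem; such lines would get entry number 0.

```r
# Three entries from the article's output; the second spans two lines
biblio <- c("\\bibitem[Ripley(1981)]{Ripley} Ripley, B. 1981. Spatial Statistics, Wiley, New York.",
            "\\bibitem[Silverman(2004)]{Silverman} Silverman B W 1986 Density Estimation for Statistics and Data Analysis.",
            "London, Chapman \\& Hall.",
            "\\bibitem[Wand \\& Jones(1995)]{Wand} Wand, M.P; Jones, M.C. (1995). Kernel Smoothing. London: Chapman \\& Hall/CRC. ")

# TRUE on every line that opens a new entry
starts <- grepl("\\\\bibitem", biblio)
# Running count of entry starts: continuation lines keep the previous number
entry_id <- cumsum(starts)
# One string per entry, continuation lines joined with newlines
entries <- vapply(split(biblio, entry_id), paste, character(1), collapse = "\n")
# Same beg/end table as go_through
go_through <- data.frame(beg = which(starts),
                         end = c(which(starts)[-1] - 1, length(biblio)))
go_through
```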

Let us rebuild that table as a data frame and add a column recording whether a reference has already been commented out, since sometimes a \bibitem line starts with the comment sign %.

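> go_through <- data.frame(beg = beg_bibitem, end = rep(NA, length(beg_bibitem)))
> # each entry ends on the line just before the next \bibitem
> for(i in seq_len(length(beg_bibitem) - 1)){
+   go_through[i, 2] <- beg_bibitem[i+1] - 1
+ }
> # the last entry ends on the last line of the bibliography
> go_through[nrow(go_through), 2] <- length(biblio)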
> go_through$comment <- str_detect(biblio[beg_bibitem], "^%")
> go_through
  beg end comment
1   1   1   FALSE
2   2   2   FALSE
3   3   3   FALSE
4   4   4   FALSE
5   5   6   FALSE
6   7   7   FALSE
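The test ^% only catches a % in the very first column. If some commented entries are indented, a slightly more tolerant pattern also allows leading whitespace. A minimal base-R sketch on hypothetical first lines of entries:

```r
# Hypothetical first lines: one active entry, two commented-out ones
first_lines <- c("\\bibitem[Scott(1992)]{Scott} Scott, D W 1992",
                 "%\\bibitem[Cressie(1991)]{Cressie} Cressie, N. (1991).",
                 "  %\\bibitem[Ripley(1981)]{Ripley} Ripley, B. 1981.")

# Strict test: % must be the very first character
strict <- grepl("^%", first_lines)
# Tolerant test: allow spaces or tabs before the %
tolerant <- grepl("^[[:space:]]*%", first_lines)
```

An escaped \% at the start of a line is not matched by either pattern, because the backslash comes before the %.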

Let us now extract the labels (citation keys) of all the references.

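> extract_ref_cite <- function(bibitem, file){
+   entree <- file[bibitem]
+   # \bibitem[label]{key}: take the braces right after the closing ]
+   if(str_detect(entree, "bibitem\\[.*\\]\\{")){
+     nom_citation <- str_extract(entree, "]\\{(.*?)\\}")
+   }else{
+     # \bibitem{key} or \citation{key}: take the first pair of braces
+     nom_citation <- str_extract(entree, "\\{(.*?)\\}")
+   }
+   # drop the braces and the bracket, keeping only the key
+   str_replace_all(string = nom_citation, pattern = "\\{|\\}|]", replacement = "")
+ }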
> bibitems_ref <- unlist(lapply(beg_bibitem, extract_ref_cite, biblio))
> bibitems_ref
[1] "Cressie"   "Diggle"    "Ripley"    "Scott"     "Silverman" "Wand"

We have six references, with the expected labels.
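The two-branch extraction can also be written as a single regular expression in which the [label] argument is optional. A minimal sketch in base R with perl = TRUE. extract_key is a hypothetical name, and the second sample line (a \bibitem without a label) is made up. The helper returns NA when a line holds no \bibitem, for example a continuation line.

```r
# Hypothetical helper: key of a \bibitem line, with or without [label]
extract_key <- function(line) {
  # optional [...] argument, then capture the content of the first {...}
  pattern <- "^.*?\\\\bibitem(?:\\[[^]]*\\])?\\{([^}]*)\\}.*$"
  if (!grepl(pattern, line, perl = TRUE)) return(NA_character_)
  sub(pattern, "\\1", line, perl = TRUE)
}

items <- c("\\bibitem[Wand \\& Jones(1995)]{Wand} Wand, M.P; Jones, M.C. (1995). Kernel Smoothing.",
           "\\bibitem{Scott} Scott, D W 1992")
keys <- vapply(items, extract_key, character(1), USE.NAMES = FALSE)
keys
```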

Now let us look at the aux file to see which references are actually cited in the text:

> ind_cite <- which(str_detect(string = file_aux, pattern = "\\\\citation"))
> bibitems_cite_names <- unlist(lapply(ind_cite, extract_ref_cite, file_aux))
> bibitems_cite_names
[1] "Scott"     "Scott"     "Silverman" "Silverman" "Wand"      "Wand"     
[7] "Scott"     "Scott"

Note that each reference is mentioned at least twice: once for the author’s name and once for the year of publication. Since we only need to know which keys appear in the aux file, we can drop the duplicates:

> bibitems_cite_names <- unique(bibitems_cite_names)
> bibitems_cite_names
[1] "Scott"     "Silverman" "Wand"
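Depending on the citation commands and packages in use, a single \citation line in the aux file may hold several comma-separated keys, such as \citation{Silverman,Wand}. extract_ref_cite would then return the single string "Silverman,Wand", which matches none of the labels. A base-R sketch that splits such lists, on hypothetical aux content:

```r
# Hypothetical aux content; the third line holds two keys
file_aux <- c("\\relax",
              "\\citation{Scott}",
              "\\citation{Silverman,Wand}",
              "\\citation{Scott}")

# Keep only the \citation lines
cite_lines <- grep("^\\\\citation\\{", file_aux, value = TRUE)
# Everything between the outer braces
cite_args <- sub("^\\\\citation\\{(.*)\\}$", "\\1", cite_lines)
# Split comma-separated lists, trim stray spaces, drop duplicates
bibitems_cite_names <- unique(trimws(unlist(strsplit(cite_args, ",", fixed = TRUE))))
bibitems_cite_names
```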

Now we can flag which references are cited:

> go_through$keep <- bibitems_ref %in% bibitems_cite_names
> go_through
  beg end comment  keep
1   1   1   FALSE FALSE
2   2   2   FALSE FALSE
3   3   3   FALSE FALSE
4   4   4   FALSE  TRUE
5   5   6   FALSE  TRUE
6   7   7   FALSE  TRUE
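The same two vectors also support a quick consistency check with setdiff(). It lists keys defined in the reference list but never cited (the ones to drop). It also lists the reverse: keys cited in the text with no matching \bibitem, which LaTeX reports as undefined citations. A small sketch using the labels obtained in the article:

```r
# Labels and cited keys as found in the article
bibitems_ref <- c("Cressie", "Diggle", "Ripley", "Scott", "Silverman", "Wand")
bibitems_cite_names <- c("Scott", "Silverman", "Wand")

# In the list but never cited: candidates for removal
uncited <- setdiff(bibitems_ref, bibitems_cite_names)
# Cited but missing from the list: undefined citations
undefined <- setdiff(bibitems_cite_names, bibitems_ref)
uncited
undefined
```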

Based on the go_through table, a short function can rebuild each entry. References we do not need are turned into comments (prefixed with %), while those that are cited stay in the reference list.

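> return_cite <- function(one_ligne){
+   # all lines of the entry, joined back together
+   citation <- str_c(biblio[one_ligne[1,"beg"]:one_ligne[1,"end"]], collapse = "\n")
+   # comment out uncited entries that are not already commented
+   if(!one_ligne[1,"keep"] && !str_detect(citation, "^%")){
+     # prefix the first line and every continuation line with %
+     citation <- str_c("%", str_replace_all(citation, pattern = "\n", replacement = "\n%"))
+   }
+   citation
+ }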

For instance,

> return_cite(go_through[1,])
[1] "%\\bibitem[Cressie(1991)]{Cressie} Cressie, N. (1991). Statistics for Spatial Data. New York: John Wiley \\& Sons"

since the first reference does not appear in the text, while

> return_cite(go_through[4,])
[1] "\\bibitem[Scott(1992)]{Scott} Scott, D W 1992 Multivariate Density Estimation: Theory, Practice, and Visualization. New York, John Wiley and Sons."
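In LaTeX, % only comments out the rest of its own line. An uncited entry that spans several lines (like the Silverman one) is therefore only fully removed if each of its lines starts with %. A minimal base-R sketch of that step; comment_out is a hypothetical helper name, not part of the article's code:

```r
# Hypothetical helper: prefix every physical line of an entry with %
comment_out <- function(entry) {
  lines <- strsplit(entry, "\n", fixed = TRUE)[[1]]
  paste0("%", lines, collapse = "\n")
}

# Two-line entry copied from the article's biblio output
entry <- paste("\\bibitem[Silverman(2004)]{Silverman} Silverman B W 1986 Density Estimation for Statistics and Data Analysis.",
               "London, Chapman \\& Hall.", sep = "\n")
commented <- comment_out(entry)
cat(commented, "\n")
```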

Now we can generate the whole bibliography in LaTeX:

> cat(unlist(lapply(1:nrow(go_through), function(x) return_cite(go_through[x,]))), sep = "\n\n")
%\bibitem[Cressie(1991)]{Cressie} Cressie, N. (1991). Statistics for Spatial Data. New York: John Wiley \& Sons

%\bibitem[Diggle (2002)]{Diggle} Diggle, P., Heagerty, P., Liang, K.Y. \& Zeger, S. 2002. Analysis of Longitudinal Data. Oxford University Press.

%\bibitem[Ripley(1981)]{Ripley} Ripley, B. 1981. Spatial Statistics, Wiley, New York.

\bibitem[Scott(1992)]{Scott} Scott, D W 1992 Multivariate Density Estimation: Theory, Practice, and Visualization. New York, John Wiley and Sons.

\bibitem[Silverman(2004)]{Silverman} Silverman B W 1986 Density Estimation for Statistics and Data Analysis. 
London, Chapman \& Hall.

\bibitem[Wand \& Jones(1995)]{Wand} Wand, M.P; Jones, M.C. (1995). Kernel Smoothing. London: Chapman \& Hall/CRC.

We simply need to copy that list and paste it into our LaTeX file. Nice, isn’t it?
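Instead of copying and pasting, the rewritten lines could also be written straight back into a tex file. The sketch below strings the steps together in base R (no stringr) under a few assumptions: a single thebibliography environment, each entry starting on a line containing \bibitem, keys without braces, and comma-separated lists allowed inside \citation. prune_bibliography and the sample tex and aux vectors are hypothetical, not the authors' code:

```r
# Hypothetical helper: comment out uncited \bibitem entries inside
# \begin{thebibliography} ... \end{thebibliography}, given tex and aux lines
prune_bibliography <- function(tex, aux) {
  beg <- grep("\\\\begin\\{thebibliography\\}", tex)[1]
  end <- grep("\\\\end\\{thebibliography\\}", tex)[1]
  # Nothing to do if the environment is missing or empty
  if (is.na(beg) || is.na(end) || end - beg < 2) return(tex)
  body_idx <- seq(beg + 1, end - 1)
  body <- tex[body_idx]

  # Cited keys from the aux file, splitting comma-separated lists
  cite_lines <- grep("^\\\\citation\\{", aux, value = TRUE)
  cite_args <- sub("^\\\\citation\\{(.*)\\}$", "\\1", cite_lines)
  cited <- unique(trimws(unlist(strsplit(cite_args, ",", fixed = TRUE))))

  # Entry number of each line: every \bibitem line starts a new entry
  entry_id <- cumsum(grepl("\\\\bibitem", body))
  key_pattern <- "^.*?\\\\bibitem(?:\\[[^]]*\\])?\\{([^}]*)\\}.*$"

  for (id in setdiff(unique(entry_id), 0)) {
    rows <- which(entry_id == id)
    first <- body[rows[1]]
    already_commented <- grepl("^[[:space:]]*%", first)
    key <- sub(key_pattern, "\\1", first, perl = TRUE)
    # Prefix every line of an uncited, still-active entry with %
    if (!already_commented && !(key %in% cited)) {
      body[rows] <- paste0("%", body[rows])
    }
  }
  tex[body_idx] <- body
  tex
}

# Hypothetical input
tex <- c("\\documentclass{article}",
         "\\begin{document}",
         "See \\cite{Scott}.",
         "\\begin{thebibliography}{9}",
         "\\bibitem[Cressie(1991)]{Cressie} Cressie, N. (1991).",
         "\\bibitem[Scott(1992)]{Scott} Scott, D W 1992",
         "Multivariate Density Estimation.",
         "\\end{thebibliography}",
         "\\end{document}")
aux <- c("\\relax", "\\citation{Scott}")

pruned <- prune_bibliography(tex, aux)
writeLines(pruned)
```

Writing the result with writeLines(pruned, ...) to a new file, rather than over the original, keeps the source intact in case a key is misparsed.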

Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.
