An Automatic Code to Extract Tweets (and Produce the "Somewhere Else" Review)


A few weeks ago, I asked in a post the (simple) question "dear reader, who are you?", just to know more about the readers of my blog. I found the answers extremely interesting (even if, to be honest, I was expecting more of them, to start a more serious sociological study of my readers). One interesting point was that a lot of readers come for the "somewhere else" posts, which are reviews of interesting posts and articles found on the internet. The links I share there actually come from my tweets. I keep a backup of my tweets on my blog, and that is usually where I go when I want to find some article, graph, or map that I have in mind and have seen somewhere (but usually cannot remember where). But most of the time, I find it boring, because there is nothing new: it is simply a copy and paste from my tweets.

And this afternoon, @tomroud asked how those posts were written: was there an automatic procedure, or was I doing it manually? Until tonight, I was doing it manually. But because it was some kind of stupid challenge, I tried to write some code that generates a simple list of my tweets, which I can then use to produce a post.

Nevertheless, there are still two problems I cannot fix with code:

  • in my "somewhere else" posts, there was a language distinction, with posts and articles in English first, and then those in French. Unfortunately, I could not find a function that detects the language of a tweet. I remember that @3wen and I tried to write such code, but I could not find it... I guess @3wen had a first draft, so if we can find it, I will upload it on my blog (or he will upload it on his)
  • in my posts, I include the picture, if any. This part will still be done manually because it is much more difficult (but I guess it is possible...)
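
For the language distinction, one naive possibility is to count common stop words of each language in the tweet. The sketch below is only an illustration of that idea: the helper name `guess_language` and both word lists are assumptions made up for this example, not the draft written with @3wen, and very short tweets will often remain undecided.

```r
# Hypothetical helper: guess "fr" or "en" by counting common stop words.
# The word lists are illustrative assumptions, not a vetted lexicon.
stop_fr <- c("le", "la", "les", "des", "une", "un", "est", "et", "du", "pour", "sur", "dans")
stop_en <- c("the", "of", "and", "is", "to", "in", "for", "on", "with", "a", "an")

guess_language <- function(tweet){
  # Lowercase, then split on anything that is not a letter
  words <- strsplit(tolower(tweet), "[^[:alpha:]]+")[[1]]
  n_fr <- sum(words %in% stop_fr)
  n_en <- sum(words %in% stop_en)
  # Undecided tweets (same count on both sides) get NA
  if(n_fr > n_en) "fr" else if(n_en > n_fr) "en" else NA_character_
}

guess_language("La carte des pluies est sur le site")
guess_language("A map of the results is on the website")
```

Tweets returning NA would still have to be sorted by hand.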

Before starting, we need two functions from an old post, to convert Twitter's shortened URLs into real ones. They rely on getURL, from the RCurl package,

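library(RCurl)  # provides getURL()

# Return the first capture group of motif in entree, or NA if there is no match
extraire <- function(entree,motif){
  res <- regexec(motif,entree)
  if(length(res[[1]])==2){
    debut <- (res[[1]])[2]
    fin <- debut+(attr(res[[1]],"match.length"))[2]-1
    return(substr(entree,debut,fin))
  }else return(NA)
}

# Ask only for the headers of a shortened URL, without following the redirect,
# and read the real address from the "location" header
unshorten <- function(url){
  uri <- getURL(url, header=TRUE, nobody=TRUE, followlocation=FALSE,
                cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
  res <- try(extraire(uri,"\r\nlocation: (.*?)\r\nserver"))
  return(res)
}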
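
The pattern used in unshorten expects a lowercase "location" header immediately followed by a "server" header, so it returns NA for servers that format their headers differently. The sketch below shows a more tolerant extraction on a made-up header block, without any network call. The helper name `location_of` and the sample headers are assumptions for this example only.

```r
# Made-up header block, in the shape returned by a header-only request on a short link
headers <- paste("HTTP/1.1 301 Moved Permanently",
                 "Location: http://example.com/some/long/path",
                 "Server: example",
                 "", sep = "\r\n")

# Hypothetical variant of the extraction step: the header name is matched
# whatever its case, and the value stops at the end of the line
location_of <- function(h){
  m <- regmatches(h, regexec("\r\nlocation: ([^\r\n]*)", h, ignore.case = TRUE))[[1]]
  # m holds the full match and the capture group, or nothing if no match
  if(length(m) == 2) m[2] else NA_character_
}

location_of(headers)
```

With this sample, the function returns the address found on the Location line, and NA when no such header is present.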

Now, let us consider the following code. The first step, of course, is to run some lines that will allow me to use Twitter's API,

require(twitteR)
reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
apiKey <- "yourAPIkey"
apiSecret <- "yourAPIsecret"

twitCred <- OAuthFactory$new(consumerKey=apiKey,consumerSecret=apiSecret,requestURL=reqURL,accessURL=accessURL,authURL=authURL)

twitCred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))

registerTwitterOAuth(twitCred)

Then, I need to be cautious because some of my tweets are in French, and some weird symbols might appear,

Sys.setlocale("LC_CTYPE","fr_FR.UTF-8")

Now I can write my function,

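somewhere_else <- function(){

  # Fetch my most recent tweets and keep a local backup
  tweets_freak <- searchTwitter("from:@freakonometrics", n = 500)
  save(tweets_freak, file="somewhere_else.RData")

  # Keep only the text of each tweet
  tweets_freak_df <- do.call("rbind", lapply(tweets_freak, as.data.frame))
  text_tweets_freak <- tweets_freak_df$text

  # Drop replies, i.e. tweets starting with "@"
  tweets_freak_message <- text_tweets_freak[which(substr(text_tweets_freak,1,1)!="@")]

  # Keep the tweets that come before the most recent "Somewhere else" announcement
  # (all of them if no announcement is found)
  SE <- which(substr(tweets_freak_message,1,15)=="\"Somewhere else")
  first_SE <- SE[1]
  if(is.na(first_SE)) first_SE <- length(tweets_freak_message)+1
  tweets_freak <- tweets_freak_message[seq_len(first_SE-1)]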

  # Turn one tweet into an HTML list item where mentions and URLs are links
  substitute_id <- function(x){
    # Rewrite "@name" as "http://twitter.com/name", so that mentions are handled like URLs
    split_x <- strsplit(x,"@")[[1]]
    x_id <- paste(split_x,collapse="http://twitter.com/",sep="")
    # Split on "http": every piece after the first one starts with a URL
    split_x_id <- strsplit(x_id,"http")
    n <- length(split_x_id[[1]])
    tweet_x <- strsplit(split_x_id[[1]]," ")

    if(n==1) rt <- x_id
    if(n>1){
      for(i in 2:n){
        url <- tweet_x[[i]][1]
        # Is there a punctuation mark glued to the end of the URL?
        split <- FALSE
        if(substr(url,nchar(url),nchar(url))%in%c(":",",",";",")","(")) split <- TRUE
        if(split==FALSE) unshort_url <- unshorten(paste("http",url,sep=""))
        if(split==TRUE) unshort_url <- unshorten(paste("http",substr(url,1,nchar(url)-1),sep=""))
        # Links to twitter.com are mentions, displayed as @name
        tweet <- FALSE
        if(substr(url,4,10)=="twitter") tweet <- TRUE
        # Build the link, and put the trailing punctuation back after it
        if((split==FALSE)&(tweet==FALSE)) tweet_x_2 <- c("<a href=\"",unshort_url,"\">",unshort_url,"</a>")
        if((split==TRUE)&(tweet==FALSE)) tweet_x_2 <- c("<a href=\"",unshort_url,"\">",unshort_url,"</a>",substr(url,nchar(url),nchar(url)))
        if((split==FALSE)&(tweet==TRUE)) tweet_x_2 <- c("<a href=\"",unshort_url,"\">@",substr(unshort_url,21,nchar(unshort_url)),"</a>")
        if((split==TRUE)&(tweet==TRUE)) tweet_x_2 <- c("<a href=\"",unshort_url,"\">@",substr(unshort_url,21,nchar(unshort_url)),"</a>",substr(url,nchar(url),nchar(url)))
        tweet_x[[i]] <- c(tweet_x_2,tweet_x[[i]][-1])
      }
      rt <- paste("<li>",paste(unlist(tweet_x),collapse=" "),"</li>",sep="")
    }
    return(rt)
  }

tweets_freak_sub <- lapply(tweets_freak, substitute_id)
write.table(unlist(tweets_freak_sub),file="tweets_somewhere_else.txt",quote=FALSE,row.names=FALSE)

cat("Number of tweets.....",length(tweets_freak_sub),"\n")
cat("File.................",paste(getwd(),"tweets_somewhere_else.txt",sep="/"),"\n")
cat("Done\n")
}
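
As a side note, the substitution of mentions and links can also be sketched with regular expressions. This is a simplified alternative to substitute_id, not the function used above: the helper name `linkify` is hypothetical, short links are kept as they are (no call to unshorten, so no network access), and a URL ending with a punctuation mark such as "/" loses that last character.

```r
# Hypothetical helper: wrap the URLs and @mentions of one tweet in HTML links
linkify <- function(x){
  # URLs first, so that the twitter.com links created below are not matched again.
  # The last character of a URL must not be punctuation, which leaves a
  # trailing "," or ":" outside of the link.
  x <- gsub("(https?://[^[:space:]]+[^[:space:][:punct:]])", "<a href=\"\\1\">\\1</a>", x)
  # Then mentions: "@" followed by letters, digits or underscores
  x <- gsub("@([[:alnum:]_]+)", "<a href=\"http://twitter.com/\\1\">@\\1</a>", x)
  paste("<li>", x, "</li>", sep = "")
}

linkify("RT @tomroud: nice map http://t.co/abc123, via @3wen")
```

The result is a single list item in which both names and the link are clickable, with the colon and the comma left outside of the anchors.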

The first tricky part was to recognize names mentioned in my tweets (since some of them are retweets). The second one was to create an HTML link each time there is a link (I did not take hashtags into account here). If I run it, I get

> somewhere_else()
Number of tweets..... 72 
File.... /home/arthur/tweets_somewhere_else.txt 
Done
Warning message:
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit,  :
  500 tweets were requested but the API can only return 191
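
The number of tweets reported is what remains after the two filters in somewhere_else(): replies are dropped, and only the tweets listed before the most recent "Somewhere else" announcement are kept. Here is a small sketch of that selection on made-up tweet texts, assuming (as the function does) that the most recent tweets come first. All the texts and variable names below are invented for the illustration.

```r
# Made-up tweet texts, most recent first
texts <- c("A nice map http://t.co/aaa",
           "@someone thanks!",
           "RT @3wen: a graph http://t.co/bbb",
           "\"Somewhere else, part 100\" http://t.co/ccc",
           "An older link http://t.co/ddd")

# Drop replies (tweets starting with "@")
messages <- texts[substr(texts, 1, 1) != "@"]

# Position of the most recent "Somewhere else" announcement, if any
marker <- which(startsWith(messages, "\"Somewhere else"))
cutoff <- if(length(marker) > 0) marker[1] else length(messages) + 1

# Keep what was posted after that announcement
recent <- messages[seq_len(cutoff - 1)]
recent
```

Here, only the first and the third texts are kept: the reply is dropped, and everything from the announcement onwards has already been reviewed.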

If I copy and paste from the text file, I get a list which makes sense, because those are indeed my most recent posts.

I will have to spend some time to include pictures, graphs, maps, videos, etc., but that function should save me some time!

Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.
