Traffic Flow of Kota Kinabalu (With R)

Let's see how to visualize data based on traffic flow using R. We'll visualize the data on maps and analyze it.

Recently, we had our first practicals on network flows, using an example from papers published by Noraini Abdullah and Ting Kien Hua. From the roads mentioned in the articles, I did my best to locate the nodes on a map:

# One column per node: id, latitude, longitude (3 rows, 12 columns)
m=matrix(c(0,5.995910, 116.105520,
1,5.992737, 116.093718,
2,5.992066, 116.109883,
3,5.976947, 116.095760,
4,5.985766, 116.091580,
5,5.988940, 116.080112,
6,5.968318, 116.080764,
7,5.977454, 116.075460,
8,5.974226, 116.073604,
9,5.969651, 116.073753,
10,5.972341, 116.069270,
11,5.978818, 116.072880),3,12)

We can visualize this below:

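library(OpenStreetMap)
# Bounding box: upper-left corner, then lower-right corner
map = openmap(c(lat= 6.000, lon= 116.06),
c(lat= 5.960, lon= 116.12))
# Reproject the map to longitude/latitude coordinates
map=openproj(map)
plot(map)
# Rows 3 and 2 of m are longitude and latitude, i.e. (x, y)
points(t(m[3:2,]),col="black", pch=19, cex=3 )
text(t(m[3:2,]),c("s",1:10,"t"),col="white")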
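The `m[3:2,]` indexing above relies on the matrix layout: row 2 holds latitudes and row 3 longitudes, while `points()` expects (x, y), that is (longitude, latitude). A minimal sketch with named rows makes this explicit; `m_demo` and `xy` are hypothetical names, and only the first three nodes are reproduced:

```r
# Same layout as in the article: 3 rows (id, latitude, longitude),
# one column per node. Only the first three nodes are copied here.
m_demo <- matrix(c(0, 5.995910, 116.105520,
                   1, 5.992737, 116.093718,
                   2, 5.992066, 116.109883), nrow = 3)
rownames(m_demo) <- c("id", "lat", "lon")

# Plotting functions take (x, y) = (longitude, latitude),
# so longitude is selected first, then the result is transposed.
xy <- t(m_demo[c("lon", "lat"), ])
print(xy)
```

Here `xy` has one row per node and two columns, which is the shape `points()` and `text()` accept.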

The location of the source (up north) looks realistic, but I am less comfortable with the location of the sink (to the west). Let's pretend it's fine (to do the math, at least).

To get the edge capacities of that network, the following code extracts the three tables from the paper:

library(devtools)
install_github("ropensci/tabulizer")
library(tabulizer)
location <- 'http://www.jistm.com/PDF/JISTM-2017-04-06-02.pdf'
out <- extract_tables(location)

On Windows, it seems to be necessary to install another package first:

library(devtools)
install_github("ropensci/tabulizerjars")
install_github("ropensci/tabulizer")
library(tabulizer)
location <- 'http://www.jistm.com/PDF/JISTM-2017-04-06-02.pdf'
out <- extract_tables(location)

Now we can build a data frame of edges with their capacities:

B1=as.data.frame(out[[2]])
B2=as.data.frame(out[[3]])
E=data.frame(from=B1[3:20,"V3"],
to=B1[3:20,"V4"])
E=E[-c(6,8),]
capacity=as.character(B2$V3[-1])
capacity[6]="843"
capacity[4]="2913"
E$capacity=as.numeric(capacity)
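The `as.character()` step before `as.numeric()` matters when the extracted column is a factor, which is presumably the case here (in versions of R before 4.0, character columns became factors by default). The sketch below, on made-up values, shows the difference, and also one way to spot cells that did not parse as numbers and may need a manual fix like the two above. The vectors `f` and `raw` are hypothetical.

```r
# Hypothetical capacities stored as a factor
f <- factor(c("843", "2913", "1422"))
codes  <- as.numeric(f)                # internal level codes, not the values
values <- as.numeric(as.character(f))  # the actual numbers

# Hypothetical raw cells, some of which did not extract cleanly
raw <- c("1149", "1 422", "")
parsed <- suppressWarnings(as.numeric(raw))
bad <- which(is.na(parsed))            # positions to inspect by hand

print(values)
print(bad)
```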

We can add those edges to our map (without arrows to indicate the direction, since they would make it too hard to read):

plot(map)
points(t(m[3:2,]),col="black", pch=19, cex=3 )
B=data.frame(i=as.character(c("s",paste("V",1:10,sep=""),"t")),
x=m[3,],y=m[2,])
for(i in 1:nrow(E)){
i1=which(B$i==as.character(E$from[i]))
i2=which(B$i==as.character(E$to[i]))
segments(B[i1,"x"],B[i1,"y"],B[i2,"x"],B[i2,"y"],lwd=3)
}
text(t(m[3:2,]),c("s",1:10,"t"),col="white")

To get the graph with capacities, an alternative is to use:

library(igraph)
g=graph_from_data_frame(E)
E(g)$label=E$capacity
plot(g)

But this layout does not respect the geographical locations of the nodes. That can be fixed by passing the coordinates as the layout:

plot(g, layout=as.matrix(B[,c("x","y")]))

To get a better sense of the road capacities, make the edge widths proportional to capacity:

plot(g, layout=as.matrix(B[,c("x","y")]),
edge.width=E$capacity/200)
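The divisor 200 is tuned to this particular range of capacities. A possible alternative, sketched here with a hypothetical helper `scale_width()` and made-up values, is to rescale so that the widest edge always has a chosen width:

```r
# Hypothetical helper: rescale values so the largest maps to max_width
scale_width <- function(x, max_width = 8) {
  max_width * x / max(x)
}

capacity_demo <- c(843, 2913, 1422)   # made-up capacities
w <- scale_width(capacity_demo)
print(round(w, 2))
```

The result could then be passed as `edge.width` in the `plot()` call.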

Given that network with capacities, the goal is to determine the maximum flow from the source to the sink. This can be done in R using:

> (m=max_flow(graph=g, source="s", target="t"))
$value
[1] 2571

$flow
[1] 1191 1380 1422 1380 231 0 231 0 1149 1422 1149 0 0 1149 1422
[16] 1149

Our maximum flow here is 2,571, which differs from what is claimed in both papers, "max flow min cut theorem to..." and "application of the shortest path..." ("the maximum flow for the capacitated network with 12 nodes and 16 edges of the selected scope in this study was 2,598 vehicles per hour"). There are clearly typos in the papers, since the values in the table and on the graph are different. Here, I used the ones from the tables.
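One way to sanity-check a maximum flow value is the max-flow min-cut theorem: the value of a maximum flow equals the capacity of a minimum cut separating the source from the sink. The sketch below illustrates this on a small made-up network (the data frame `E_demo` and its capacities are hypothetical, not taken from the papers), using igraph's `max_flow()` and `min_cut()`:

```r
library(igraph)

# Made-up network with 4 nodes and 5 directed edges
E_demo <- data.frame(from     = c("s", "s", "a", "a", "b"),
                     to       = c("a", "b", "b", "t", "t"),
                     capacity = c(3, 3, 1, 2, 2))
# The 'capacity' column becomes an edge attribute, used by both functions
g_demo <- graph_from_data_frame(E_demo)

mf <- max_flow(g_demo, source = "s", target = "t")
mc <- min_cut(g_demo, source = "s", target = "t", value.only = TRUE)

print(mf$value)   # value of the maximum flow
print(mc)         # capacity of the minimum cut, equal to the flow value
```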

E$flux1=m$flow
E(g)$label=E$flux1
plot(g, layout=as.matrix(B[,c("x","y")]),
edge.width=E$flux1/200)

That is nice, but rather odd. Actually, a much simpler flow can be considered, with the same total value:

E$flux2=c(1422,1149,1422,1149,0,0,0,0,
1149,1422,1149,0,0,1149,1422,1149)
E(g)$label=E$flux2
plot(g, layout=as.matrix(B[,c("x","y")]),
edge.width=E$flux2/200)
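That two different flows can share the same total value is easy to check by hand: a flow is feasible when every edge carries between 0 and its capacity, and when inflow equals outflow at every node other than the source and the sink. A minimal sketch of such a check, with a hypothetical helper `check_flow()` and a made-up network (not the one from the papers):

```r
# Hypothetical helper: is 'flow' a feasible s-t flow on 'edges',
# and what is its total value?
check_flow <- function(edges, flow, source = "s", sink = "t") {
  stopifnot(length(flow) == nrow(edges))
  from <- as.character(edges$from)
  to   <- as.character(edges$to)
  # Capacity constraint on every edge
  within_capacity <- all(flow >= 0 & flow <= edges$capacity)
  # Conservation at every intermediate node
  inner <- setdiff(unique(c(from, to)), c(source, sink))
  balanced <- all(vapply(inner, function(v) {
    sum(flow[to == v]) == sum(flow[from == v])
  }, logical(1)))
  # Net flow leaving the source
  value <- sum(flow[from == source]) - sum(flow[to == source])
  list(feasible = within_capacity && balanced, value = value)
}

# Made-up network: one row per directed edge
E_demo <- data.frame(from     = c("s", "s", "a", "a", "b"),
                     to       = c("a", "b", "b", "t", "t"),
                     capacity = c(3, 3, 1, 2, 2))

r1 <- check_flow(E_demo, c(3, 1, 1, 2, 2))  # routes one unit through a -> b
r2 <- check_flow(E_demo, c(2, 2, 0, 2, 2))  # simpler, leaves a -> b unused
print(c(r1$value, r2$value))
```

With the article's data, the same helper could be applied to `E` with `E$flux1` and `E$flux2`, assuming the node labels in `E$from` and `E$to` include "s" and "t".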

Nice, isn't it? It is actually possible to do exactly the same with another paper by the same authors, on the same city.

location <- 'http://www.worldresearchlibrary.org/up_proc/pdf/999-150486366625-30.pdf'
out <- extract_tables(location)
dim(out[[3]])
B1=as.data.frame(out[[3]])
E=data.frame(from=B1[2:61,"V2"],
to=B1[2:61,"V3"],
capacity=B1[2:61,"V4"])
E$capacity=as.numeric(
as.character(E$capacity))
library(igraph)
g=graph_from_data_frame(E)
m=max_flow(graph=g,
source="S",
target="T")
E$flux1=m$flow
E(g)$label=E$flux1
plot(g,
edge.width=E$flux1/200,
edge.arrow.size=0.15)

Here, the value of the maximum flow is 4,017, just as they found in the original paper.

And that's it!

Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.