
Extracting Information from a Picture, Round 1


Let's try to extract information from a map of France using a simple R function.


This week, I wanted to extract more detailed information from the nice map below. I could not get access to the original dataset (per zip code), so I wondered whether, assuming the map has a high enough resolution, it was possible to extract the information with a simple R function.

As we can see, there is red and green on the map, and I would love to know which cities in France are green and which are red. One important issue is the background. Here it is nice and white, but white is a strange color: achromatic and very light. More specifically, in the RGB decomposition a white pixel has every channel at its maximum, so if I search for red areas, the background comes out very red, and very green too. To avoid this, I used GIMP to change the background to black. Conversely, where the picture is black, it is neither red nor green!
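To see why the white background is a problem, here is a minimal sketch on three hand-made pixels. The values are toy numbers, not taken from the map, and use the 0 to 1 channel scale that `readPNG` returns by default.

```r
# Toy pixels as (red, green, blue) triples on a 0-1 scale
pixels <- rbind(white = c(1, 1, 1),
                black = c(0, 0, 0),
                red   = c(1, 0, 0))
colnames(pixels) <- c("R", "G", "B")

# The red channel alone cannot tell a white pixel from a pure red one,
# whereas a black pixel contributes nothing to it
red_channel <- pixels[, "R"]
print(red_channel)
```

On a black background, only the pixels that really carry some red show up in the red channel.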

Let us get the map and extract information from the file:

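url="https://f.hypotheses.org/wp-content/blogs.dir/253/files/2018/12/inondation3.png"
# mode="wb" keeps the binary file intact on Windows
download.file(url,"inondation3.png",mode="wb")
image="inondation3.png"
library(pixmap)
library(png)
# readPNG returns an array with values between 0 and 1
IMG=readPNG(image)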
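The object returned by `readPNG` for a color picture is a three-dimensional array indexed by height, width and channel. As a minimal sketch, the hand-made array below (the name `toy` and its values are invented for illustration) shows how one channel is pulled out as an ordinary matrix.

```r
# Hypothetical 2 x 3 picture (height 2, width 3) with three channels
toy <- array(0, dim = c(2, 3, 3))
toy[1, 2, 1] <- 1   # row 1, column 2 is pure red
toy[2, 3, 2] <- 1   # row 2, column 3 is pure green

print(dim(toy))     # height, width, number of channels

# Leaving the first two indices empty extracts a whole channel
red_layer   <- toy[, , 1]
green_layer <- toy[, , 2]
```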

The information is stored in a three-dimensional array. Dimension 1 is the height of the picture (in pixels), dimension 2 is the width, and the third index selects the channel in the RGB decomposition of each pixel: 1 (red), 2 (green) or 3 (blue). Then, I try to find the border of the map:

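nl=dim(IMG)[1]   # number of lines (height)
nc=dim(IMG)[2]   # number of columns (width)
MAT=(IMG[,,1]+IMG[,,2])/2   # average of the red and green channels
x=apply(MAT,2,max)   # maximum intensity in each column
plot(x,type="l")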

When the value is zero, there is no color in that column of the matrix, i.e. it is completely black. Initially, I used the mean function, but the maximum really behaves like a step function. The same is done along the rows:

y=apply(MAT,1,max)
plot(y,type="l")
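The difference between the mean and the maximum can be checked on a small example. The sketch below uses an invented matrix (`toy`) in which one column contains a single bright pixel. With the 0.2 threshold used later in the text, the column mean misses that column while the column maximum catches it.

```r
# Toy intensity matrix: 4 rows, 6 columns, black everywhere at first
toy <- matrix(0, nrow = 4, ncol = 6)
toy[2, 3]  <- 0.6   # a single bright pixel in column 3
toy[, 4:5] <- 0.8   # two fully colored columns

col_mean <- apply(toy, 2, mean)
col_max  <- apply(toy, 2, max)

# First column above the threshold, according to each summary
first_by_mean <- min(which(col_mean > 0.2))
first_by_max  <- min(which(col_max  > 0.2))
print(c(mean = first_by_mean, max = first_by_max))
```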

Let us find cutoff values, on the left and on the right, on top and on the bottom:

image(1:nc,1:nl,t(MAT))
abline(v=min(which(x>.2)),col="blue")
abline(v=max(which(x>.2)),col="blue")
abline(h=min(which(y>.2)),col="blue")
abline(h=max(which(y>.2)),col="blue")

We obtain the following (never mind that France is upside-down: image() draws the first row of the matrix at the bottom, while the first row of the picture is its top):

We can zoom in, just to make sure that our border is fine:

par(mfrow=c(1,2))
image(min(which(x>.2))+(-5):5,1:nl,t(MAT)[min(which(x>.2))+(-5):5,])
abline(v=min(which(x>.2))+(-5):5,col="white")
abline(v=min(which(x>.2)),col="blue")
x1=min(which(x>.2))-1

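and on the right-hand side:

image(max(which(x>.2))+(-5):5,1:nl,t(MAT)[max(which(x>.2))+(-5):5,])
abline(v=max(which(x>.2))+(-5):5,col="white")
abline(v=max(which(x>.2)),col="blue")
x2=max(which(x>.2))+1
# y1 and y2 are used below but never defined in the original listing;
# they are assumed here by analogy with x1 and x2
y1=min(which(y>.2))-1
y2=max(which(y>.2))+1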
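The threshold rule used for the borders can be wrapped in a small helper. This is only a sketch: `find_bounds` is a hypothetical name, and the profile below is invented rather than computed from the map.

```r
# Hypothetical helper: first and last index where a profile exceeds a threshold
find_bounds <- function(profile, threshold = 0.2) {
  idx <- which(profile > threshold)
  if (length(idx) == 0) stop("no value above the threshold")
  c(first = min(idx), last = max(idx))
}

# Toy profile: black margins on both sides of a colored band
toy_profile <- c(0, 0, 0.05, 0.9, 1, 0.7, 0.1, 0)
bounds <- find_bounds(toy_profile)
print(bounds)
```

Applied to `x` and to `y`, the same function would return the column and row limits in one call each, before the one-pixel margin is added.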

So far so good. Let us keep the subpart of the picture:

image(x1:x2,y1:y2,t(MAT)[x1:x2,y1:y2])

Now, let us focus on the red part/component of that picture:

ROUGE=t(IMG[,,1])[x1:x2,]   # red channel, cropped horizontally
ROUGE=ROUGE[,y2:y1]   # crop vertically and flip so that north is up
library(scales)
image(x1:x2,y1:y2,ROUGE,col=alpha(colour=rgb(1,0,0,1), alpha = seq(0,1,by=.01)))

Not bad, right? And you can get a similar graph for the green part:

VERT=t(IMG[,,2])[x1:x2,]
VERT=VERT[,y2:y1]
image(x1:x2,y1:y2,VERT,col=alpha(colour=rgb(0,1,0,1), alpha = seq(0,1,by=.01)))
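With both layers isolated, the share of red versus green inside any zone of the picture is a simple ratio. The sketch below uses invented 4 x 4 layers and a hypothetical logical mask `zone`. How such a mask would be built for a real area is not covered here.

```r
# Toy red and green layers on a 4 x 4 grid (values in [0, 1])
red <- matrix(0, 4, 4)
red[1:2, 1:3] <- 1
green <- matrix(0, 4, 4)
green[1:2, 4] <- 1
green[3:4, ]  <- 1

# Hypothetical zone of interest: the first two rows of the grid
zone <- matrix(FALSE, 4, 4)
zone[1:2, ] <- TRUE

# Share of red among the colored intensity inside the zone
share_red <- sum(red[zone]) / (sum(red[zone]) + sum(green[zone]))
print(share_red)
```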

Now, I wanted to overlay a map of France on that one. Using shapefiles of administrative regions (départements, cantons, and so on), it would be possible to get the proportion of red and green in each of them. As a starting point (before going to 'départements'), let us use a standard shapefile for France:

library(maptools)
library(PBSmapping)
url="http://biogeo.ucdavis.edu/data/gadm2.8/rds/FRA_adm0.rds"
download.file(url,"FRA_adm0.rds",mode="wb")
FR=readRDS("FRA_adm0.rds")
PP = SpatialPolygons2PolySet(FR)
# keep mainland France only (this drops Corsica)
PP=PP[(PP$X<=8.25)&(PP$Y>=42.2),]
# pixel coordinates of the cropped picture, starting at 0
u=(x1:x2)-x1
v=(y1:y2)-y1
# linear rescaling of longitude and latitude onto the pixel ranges
ax=min(PP$X)
bx=max(PP$X)-min(PP$X)
ay=min(PP$Y)
by=max(PP$Y)-min(PP$Y)
PP$X=(PP$X-ax)/bx*max(u)
PP$Y=(PP$Y-ay)/by*max(v)
image(u,v,ROUGE,col=alpha(colour=rgb(1,0,0,1), alpha = seq(0,1,by=.01)))
points(PP$X,PP$Y)

Here we try to rescale the shapefile: its left edge should be aligned with the left edge of the picture, and likewise for the right edge, the top and the bottom.
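The rescaling is a plain linear map from the coordinate range onto the pixel range. A minimal sketch, with a hypothetical helper `rescale_to`, invented longitudes and an invented picture width of 500:

```r
# Hypothetical helper: send min(z) to 0 and max(z) to target_max, linearly
rescale_to <- function(z, target_max) {
  (z - min(z)) / (max(z) - min(z)) * target_max
}

# Toy longitudes (roughly west, centre and east of mainland France)
lon <- c(-4.8, 2.35, 8.2)
px  <- rescale_to(lon, target_max = 500)
print(px)
```

By construction the extreme points land exactly on the edges, so any mismatch can only appear in between.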

Unfortunately, even after changing the projection technique, I could not perfectly match the contour of France. I am quite sure that it is a projection problem! I tried a dozen popular tools, with no success. So if anyone has a clever idea...
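One way to see why a linear rescaling cannot absorb a projection mismatch is to compare raw latitudes with projected ones. The projection of the source map is not known. The sketch below uses the spherical Mercator formula purely as an illustration because it is short, and the latitudes are toy values.

```r
# Toy latitudes in degrees; the middle one is exactly halfway
lat <- c(42.3, 46.7, 51.1)

# Spherical Mercator ordinate on a unit sphere (illustration only)
mercator_y <- function(lat_deg) {
  phi <- lat_deg * pi / 180
  log(tan(pi / 4 + phi / 2))
}

# Rescale a vector to the unit interval, as done for the shapefile
unit <- function(z) (z - min(z)) / (max(z) - min(z))

rel_raw  <- unit(lat)               # midpoint stays at one half
rel_merc <- unit(mercator_y(lat))   # midpoint moves below one half
print(rbind(raw = rel_raw, mercator = rel_merc))
```

The end points coincide in both cases, but the interior points do not, which is the kind of mismatch seen in the overlay.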

We'll continue this tomorrow in Round 2.



Published at DZone with permission of
