Sunday, February 28, 2016

Spatial reporting with R

Many times we need to plot geo-spatial data in analytics. Information like sales per region, income distribution makes more sense when they are plotted on a map. We can do this quite easily in
R. Let us see it in action.

First of we need data about the map. There are many libraries from where we can download this data
for our personal use. Here we will use data from http://gadm.org/  Here there are data at different levels of details available for most countries. Let us use data for India. To load this downloaded data into R, first open R in R-Commander and change your working directory to where you saved the file

setwd("/home/soumyanath/Downloads/R_Maps")
and then read the data into a variable with

ind1 <- readRDS("IND_adm1.rds")

Let us check what kind of data has been loaded with

class(ind1)

It will show

[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"

As it is SpatialPolygon, let us load library(sp)

library(sp, pos=4)
library(methods, pos=4)

Now, the question is, how do we see this data? There is a function to plot spatial data, we use that

spplot(ind1, "NAME_1", scales=list(draw=T), colorkey=F, main="India")

will show a map. Actually I do not like it, it shows a truncated view of Kashmir, but then we are
using data from an USA repository and I have no means to influence them. We shall revisit this part at a later stage on how to correct the maps, but for now, let us make use of what we have. To manipulate data we need to know properties of the data that we have. We can look into the loaded data with names function. It shows:

> names(ind1)
 [1] "OBJECTID"  "ID_0"      "ISO"       "NAME_0"    "ID_1"      "NAME_1"    "HASC_1"    "CCN_1"     "CCA_1"    
[10] "TYPE_1"    "ENGTYPE_1" "NL_NAME_1" "VARNAME_1"

We can also check property of the data by using
summary(ind1)

This will show various properties of data loaded. Right now we are interested in knowing ID for the states so that we can use it to color the maps with our data. We can user print(ind1) to view complete data, but in this case it will be a huge print. In this example we will use state ID "HASC_1" to plot our data. We can see the values with:

print(ind1$HASC_1)


Right now we do not have any data so we populate a excel sheet and fill data with state ID, fill some sales data and assign a color value based on sales amount. In reality we will probably use a data base to get this data. We save data into csv format and read it in R by:
pdata = read.csv("filename")

confirm data has been read correctly

We add a new property color.data into the dataframe ind1 based on color values taken from csv file

ind1$color.data =pdata[pdata[1]==ind1$HASC_1,3]

Now we plot the map with these color. The command is

spplot(ind1,"NAME_1",  col.regions=ind1$color.data, colorkey=T, main="Indian States")

We have the result here
Same concept can be extended to district level for more granular analysis.

In our next blog, we shall see how to link this map projection with database to get real time analysis