
Calculating the Difference Between Ordered Factor Variables


In my continued exploration of Wimbledon data, I wanted to work out whether a player had done as well as their seeding suggested they should.

I therefore wanted to calculate the difference between the round a player reached and the round they were expected to reach. A 'round' in the dataset is an ordered factor variable.

These are all the possible values:

```r
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32", "Round of 16",
           "Quarter-Finals", "Semi-Finals", "Finals", "Winner")
```

If we want to turn a couple of strings into this ordered factor, we can do it like this:

```r
round = factor("Finals", levels = rounds, ordered = TRUE)
expected = factor("Winner", levels = rounds, ordered = TRUE)
```

```
> round
[1] Finals
9 Levels: Did not enter < Round of 128 < Round of 64 < Round of 32 < Round of 16 < Quarter-Finals < ... < Winner

> expected
[1] Winner
9 Levels: Did not enter < Round of 128 < Round of 64 < Round of 32 < Round of 16 < Quarter-Finals < ... < Winner
```
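Because the factor is ordered, R also lets us compare the two values directly with relational operators, which makes a quick sanity check before computing a difference. The snippet below is a minimal sketch that recreates the `rounds`, `round` and `expected` objects from above so it runs on its own; it uses `as.integer()` only to show the position of each level.

```r
rounds <- c("Did not enter", "Round of 128", "Round of 64", "Round of 32",
            "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")

round <- factor("Finals", levels = rounds, ordered = TRUE)
expected <- factor("Winner", levels = rounds, ordered = TRUE)

# Relational operators on ordered factors compare the positions of the levels
round < expected    # TRUE: reaching the final ranks below winning the title
round >= expected   # FALSE

# as.integer() exposes the underlying position of each level
as.integer(round)     # 8
as.integer(expected)  # 9
```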

In this case the difference between the actual round and the expected round should be -1: the player was expected to win the tournament but lost in the final. We can calculate that difference by calling the `unclass` function on each variable, which removes the factor class and exposes the integer code underlying each level:
```
> unclass(round) - unclass(expected)
[1] -1
attr(,"levels")
[1] "Did not enter"  "Round of 128"   "Round of 64"    "Round of 32"    "Round of 16"    "Quarter-Finals"
[7] "Semi-Finals"    "Finals"         "Winner"
```

The result still carries the `levels` attribute left over from the factor, so to get rid of it we can convert it to a plain numeric value:

```
> as.numeric(unclass(round) - unclass(expected))
[1] -1
```
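An alternative, not the approach used above, is to call `as.integer()` on each factor. It returns the level positions with no attributes attached, so the subtraction yields a plain integer in a single step. A minimal sketch, assuming the same `rounds` levels:

```r
rounds <- c("Did not enter", "Round of 128", "Round of 64", "Round of 32",
            "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")

round <- factor("Finals", levels = rounds, ordered = TRUE)
expected <- factor("Winner", levels = rounds, ordered = TRUE)

# as.integer() drops the levels attribute, so no cleanup step is needed
difference <- as.integer(round) - as.integer(expected)
difference              # -1
attributes(difference)  # NULL
```

The result is an integer rather than a double, which is usually what we want when counting rounds.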

And that’s it! We can now go and apply this calculation to all seeds to see how they got on.
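Since both conversions are vectorised, the same calculation extends to a whole column of seeds at once. The sketch below assumes a data frame with hypothetical columns `player`, `round` and `expected_round`; the players and results are made up for illustration and are not taken from the Wimbledon dataset.

```r
rounds <- c("Did not enter", "Round of 128", "Round of 64", "Round of 32",
            "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")

# Hypothetical example data: invented players and results, for illustration only
seeds <- data.frame(
  player = c("Player A", "Player B", "Player C"),
  round = c("Finals", "Winner", "Round of 32"),
  expected_round = c("Winner", "Semi-Finals", "Round of 16"),
  stringsAsFactors = FALSE
)

# Convert both columns to ordered factors that share the same levels
seeds$round <- factor(seeds$round, levels = rounds, ordered = TRUE)
seeds$expected_round <- factor(seeds$expected_round, levels = rounds, ordered = TRUE)

# Positive: went further than the seeding suggested; negative: fell short
seeds$difference <- as.integer(seeds$round) - as.integer(seeds$expected_round)
seeds
```

Sharing one `levels` vector across both columns matters: if the two factors had different levels, the integer codes would not be comparable.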
