Michael Jordan Is at the Top of the Machine Learning All-Stars List… But There Is a Twist

Sometimes you have to go through hoops to get the computer science right.

No one is surprised to hear that Michael Jordan is at the top of the list. A recent study has deemed him to be the most important player in his chosen field...

Okay, before you wonder if you went to the wrong site, you didn't. I'm talking about Michael I. Jordan, this guy:

Image title

He is currently the Pehong Chen Distinguished Professor at the University of California, Berkeley. And he contributes to all of these groups:

So what list are we talking about, and how did he get to the top? Well, those of you who have dipped into academia as a career already know that your reputation is established by the connections expressed in your published work. Whom you reference in your papers and who references you in theirs define your socio-scientific status. Many of you are aware that Google Scholar has been around for more than a decade and contains extensive data about what papers people have published. It is a gigantic, centrally located database for this information, but late last year an interesting new service was introduced. It is called Semantic Scholar, and it was created by the Allen Institute for Artificial Intelligence in Seattle, Washington. Note: It is not associated with Google Scholar.

Semantic Scholar purports to do much more, using artificial intelligence techniques to extract meaning from the publications and using that meaning to more fully understand the connection networks between them. There are a lot of data visualizations that reflect the normal stuff you might expect, like number of citations and number of papers, but there are other features, such as acceleration, velocity, and recency, that can be explored. One of the most interesting features is an interactive graph that connects the scholar to the authors who most influenced the scholar, as well as to the authors who were most influenced by the scholar. It is very cool how you can wander through the cascades of influence and see at a glance who and what papers were involved in the connections. Here is a static image of one of the pages, but you must go to the site, because everything you hover over or click on leads you to more information.

Image title
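
To make the idea of an influence graph concrete, here is a minimal Python sketch. It is not how Semantic Scholar works (the service applies AI techniques that go beyond raw counts). It simply uses citation counts as a stand-in for influence, and the author names, the `citations` list, and the `influence_neighbors` function are all made up for illustration.

```python
from collections import Counter

# Hypothetical toy data: one (citing_author, cited_author) pair per citation.
citations = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("alice", "carol"),
    ("dave", "alice"),
    ("erin", "alice"),
    ("erin", "alice"),
    ("erin", "bob"),
]


def influence_neighbors(author, edges, top_n=3):
    """Return (influenced_by, influenced) for one author.

    influenced_by: the authors this author cites most often.
    influenced:    the authors who cite this author most often.
    Assumption: raw citation counts approximate influence.
    """
    cites = Counter(cited for citing, cited in edges if citing == author)
    cited_by = Counter(citing for citing, cited in edges if cited == author)
    return cites.most_common(top_n), cited_by.most_common(top_n)


influenced_by, influenced = influence_neighbors("alice", citations)
print("Influenced by:", influenced_by)
print("Influenced:", influenced)
```

For the toy data, `alice` turns out to be most influenced by `bob` and to have most influenced `erin`.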

Here are some of the people at the top of this list:

Sadly, the system does not cover all of academia just yet. But happily (for us), the corpus includes all the major computer science publications! Perhaps that's not unfair, because we do get to eat our own dog food. Plus, who would be better suited to test and improve the system than the software geniuses that we are? #amiright The corpus currently represents about 4 million papers in computer science. The next major domain on the drawing board is neuroscience. This new corpus will cover the influence of brain researchers and will debut in San Diego, California, at the Society for Neuroscience meeting on 12 November 2016.

There is some hope that this measurement of "influence" will improve on the rather primitive "publish or perish" paradigm. If confidence can be established in the ratings from this new system, then it seems likely that they will be used in hiring and tenure evaluations. And this new system rates more than just the people: it can also rate the influence of institutions. (Here is a list of the top 50 domains. Can you guess number one?)

Jeff Clune, a computer scientist at the University of Wyoming in Laramie, was one of the first to use it upon its initial release. Once he had poked around his own influence graph, he commented: "It is extremely fun, ... I can see which scientists have most influenced my own career, which scientists I have inspired the most, and the same for any other scientist." And while he admitted it was fun to play with, Clune also suggested it might have value in the academic hiring and promotion process.

Finally, we have the statistics to figure out who is the Michael Jordan of machine learning. It turns out it's Michael Jordan. Go figure.
