Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Football Transfers Graph

DZone 's Guide to

Football Transfers Graph

The Football Transfer window is currently open, which means players are moving around the globe for increasingly exorbitant fees.

· Database Zone ·
Free Resource

The Football Transfer window is currently open, which means players are moving around the globe for increasingly exorbitant fees.

The best place to keep track of what's going on is the Latest Transfers page of the transfermarkt website, which captures details of the players, clubs, and fees involved.

Irfan and I were trying to work out where the money was flowing based on the transfers, so we decided to create a Neo4j Graph to help us out.

The scraping code to get the data from the transfermarkt website is in the mneedham/football-transfers repository. The transfers.json file contains JSON documents for all the transfers that happened since June 2019.

Loading the data

One line of the transfers.json file looks like this:

{
   "season":"2019/2020",
   "player":{
      "href":"/antoine-griezmann/profil/spieler/125781",
      "name":"Antoine Griezmann",
      "position":"Centre-Forward",
      "age":"28",
      "image":"https://tmssl.akamaized.net//images/portrait/medium/125781-1533626871.jpg?lm=1533626889",
      "nationality":"France"
   },
   "from":{
      "href":"/atletico-madrid/startseite/verein/13",
      "name":"Atl\u00e9tico Madrid",
      "country":"Spain",
      "league":"LaLiga",
      "leagueHref":"/primera-division/transfers/wettbewerb/ES1",
      "image":"https://tmssl.akamaized.net//images/wappen/tiny/13.png?lm=1519120744"
   },
   "to":{
      "href":"/fc-barcelona/startseite/verein/131",
      "name":"FC Barcelona",
      "country":"Spain",
      "league":"LaLiga",
      "leagueHref":"/primera-division/transfers/wettbewerb/ES1",
      "image":"https://tmssl.akamaized.net//images/wappen/tiny/131.png?lm=1406739548"
   },
   "transfer":{
      "href":"/jumplist/transfers/spieler/125781/transfer_id/2552096",
      "value":"\u00a3108.00m",
      "timestamp":1563058800
   }
}

We have players, from and to clubs, and the transfer itself. We'll import that data into the following graph model:

Image title

We can execute the following query that uses APOC's Load JSON procedure to create players, transfers, leagues, and clubs:

CALL apoc.load.json("https://github.com/mneedham/football-transfers/raw/master/data/transfers.json")
YIELD value
WITH value, apoc.text.replace(value.transfer.value, "£", "") AS transferValue
WHERE transferValue <> "?" AND transferValue <> "-"
MERGE (p:Player {id: value.player.href})
SET p.name = value.player.name
MERGE (from:Club {id: value.from.href})
SET from.name = value.from.name
FOREACH(ignoreMe IN CASE WHEN value.from.leagueHref = "" THEN [] ELSE [1] END |
  MERGE (fromLeague:League {id: value.from.leagueHref})
  SET fromLeague.name = value.from.league
  MERGE (from)-[:IN_LEAGUE]->(fromLeague)
)
MERGE (to:Club {id: value.to.href})
SET to.name = value.to.name
FOREACH(ignoreMe IN CASE WHEN value.to.leagueHref = "" THEN [] ELSE [1] END |
  MERGE (toLeague:League {id: value.to.leagueHref})
  SET toLeague.name = value.to.league
  MERGE (to)-[:IN_LEAGUE]->(toLeague)
)
MERGE (t:Transfer {id: value.transfer.href})
SET t.value = CASE
       WHEN transferValue contains "k" 
       THEN toFloat(apoc.text.replace(transferValue, "k", "")) 
            * 1000
       WHEN transferValue contains "m" 
       THEN toFloat(apoc.text.replace(transferValue, "m", "")) 
            * 1000000
       ELSE 0.0 END
SET t.date = date(datetime({epochseconds:value.transfer.timestamp}))
MERGE (t)-[:OF_PLAYER]->(p)
MERGE (t)-[:FROM_CLUB]->(from)
MERGE (t)-[:TO_CLUB]->(to);

After we've done that, we'll run the following query to create a relationship from a league to the country that it belongs to:

CALL apoc.load.json("https://github.com/mneedham/football-transfers/raw/master/data/leagues.json")
YIELD value
MATCH (l:League {id: value.league})
MERGE (c:Country {name: value.country})
MERGE (l)-[:IN_COUNTRY]->(c);

Image title

Graph of the top 10 transfers

Now we're ready to query the graph.

Top 10 transfers

The following query finds the top 10 transfers by value and returns the player and clubs involved:

MATCH (t:Transfer)-[:OF_PLAYER]->(player), 
      (from)<-[:FROM_CLUB]-(t)-[:TO_CLUB]->(to)
RETURN player.name, from.name, to.name,
       apoc.number.format(t.value) AS price
ORDER BY t.value DESC
LIMIT 10

Image title

Top 10 transfers

Atlético Madrid and Real Madrid appear on several of these transfers.

We can aggregate the queries involving these teams to see how much money has been involved in their transfers.

Money in, money out

The following query finds the clubs that have spent and received the most money during the transfer window:

MATCH (club:Club)
WITH club,
     apoc.coll.sumLongs(
       [(club)<-[:FROM_CLUB]-(t) | t.value]) AS moneyIn,
     apoc.coll.sumLongs(
       [(club)<-[:TO_CLUB]-(t) | t.value]) AS moneyOut
RETURN club.name, 
       apoc.number.format(moneyIn) AS in, 
       apoc.number.format(moneyOut) AS out
ORDER BY moneyIn + moneyOut DESC
LIMIT 10

Image title

As we guessed, the Madrid clubs are at the top of the list. Surprisingly there aren't any English clubs in the top 10.

What about if we only look at the money spent on transfers?

Money out

The following query finds the clubs that have spent and received the most money during the transfer window:

MATCH (club:Club)-[*2]->(country:Country)
WITH club, country,
     apoc.coll.sumLongs(
       [(club)<-[:TO_CLUB]-(t) | t.value]) AS moneyOut
RETURN club.name, country.name,  apoc.number.format(moneyOut) AS out
ORDER BY moneyOut  DESC
LIMIT 10

Image title

The Spanish teams still dominate the top 3 positions, but interestingly, Aston Villa has spent the most money of the English teams so far. Presumably, that will change by the end of the summer.

Next, let's go a level up and see which countries money is flowing between.

Money flow by country

The following query finds the total fees spent moving players from teams in one country to another, excluding transfers between clubs that play in the same country.

We also return the most expensive transfer between those countries:

MATCH (t:Transfer)-[:OF_PLAYER]->(player),
      (fromCountry)<-[:IN_COUNTRY]-(fromLeague),
      (fromLeague)<-[:IN_LEAGUE]-(from)<-[:FROM_CLUB]-(t),
      (t)-[:TO_CLUB]->(to)-[:IN_LEAGUE]->(toLeague),
      (toLeague)-[:IN_COUNTRY]->(toCountry)
WITH *
ORDER BY fromLeague, toLeague, t.value DESC
WITH fromLeague, toLeague, sum(t.value) AS totalFees, 
     fromCountry, toCountry, 
     collect({player: player.name, fee: t.value}) AS transfers
WHERE fromCountry <> toCountry
RETURN fromCountry.name, toCountry.name, 
       apoc.number.format(totalFees) AS total, 
       transfers[0].player AS player,
       apoc.number.format(transfers[0].fee) AS fee, 
       size(transfers) AS numberOfTransfers
ORDER By totalFees DESC
LIMIT 10

Image title

The most money was transferred from Portugal to Spain, although this is a bit skewed by the transfer of João Felix, which accounts for almost 60% of the money flow.

Next Steps

We hope you enjoy the dataset, and if you have any questions or suggestions on what we should do next, let us know in the comments.

Topics:
database ,football transfers graph ,tutorial ,football transfer window ,top ten football transfers

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}