Fake news is a term we hear constantly these days, especially in connection with the election of a certain U.S. president. Incorrect information on the web is nothing new, so why all the excitement now? Social media has become so deeply integrated into our lives that it is now the primary interface between ourselves and the (digital) world. It is how we chat with family and friends, read the news, shop, and (soon) pay. This poses risks, and one of them is misinformation. On any given day, we might read an article by a New York Times reporter and an article by an anonymous blogger and fall into the trap of assuming they are equally valid, simply because we consume them through the same social media platform. Combined with our human tendency to accept the information we receive, especially when it agrees with our own worldview, this leads people to believe things that are, to say the least, obviously debatable.
This is a serious matter, and we have to find defenses against it, because fake news has already produced very real results. Technology (like social media) enabled this situation, and technology must provide the defenses against it. The good (and truthful) news is that the tools for recognizing fake stories already exist.
I have read that Facebook plans to fight back by sending stories that many users report as fake to fact-checkers. I do not know the details of their implementation, but relying solely on human fact-checkers is not going to cut it. The reason is obvious: there are far more people writing fake stories than there are fact-checkers, and it takes longer to verify or disprove a story than to write a fake one. With the help of technology, however, fact-checking can become a very viable solution: specifically, an approach similar to Blockchain, complemented by Artificial Intelligence.
The main idea behind the solution is simple: instead of trying to determine for each story whether it is factual, focus on examining its source and its distributors. Additionally, we need to automatically recognize when the content of a story closely resembles the content of stories that have proved to be fake.
To take a simplified example: if fact-checkers have proven that Dennis said many fake things, we should treat any of his future stories with caution. Does this remind you of the boy who cried wolf? Well, no one is blaming the villagers. In fact, Google is already taking a similar approach. Google APAC CMO Simon Kahn, in a talk at the British Chamber of Commerce, said:
“The issue that everyone in the industry is very focused on is how to curb it while still allowing people to have open information and networks. … One way of tackling it is looking at sites that are purveyors of fake news and basically stopping advertising — cutting off oxygen so that they aren’t making money off it.”
Cutting their advertising profits may decrease their motivation to create fake news, but it is not enough. If we can strong-arm them financially, then others can fund them and push their own agendas through fake stories. It is more honest to simply hold them accountable for the content they produce. It is very important to identify these purveyors of fake news and to be able to prove that they are exactly that.
That sets two requirements: knowing the source of a story and knowing who disseminated it. It turns out there is already a technology that can do a pretty good job at both: Blockchain!
In case Blockchain does not ring a bell, I am pretty sure Bitcoin does. Bitcoin is a popular peer-to-peer cryptocurrency and payment system that is already in wide use. Blockchain is the underlying technology that makes such a system possible. I will not go very deep into how Blockchain works, but I do have to sketch its workflow briefly so you can see the benefits it provides and how they relate to our defense against fake news.
Imagine Blockchain as a ledger with entries like in the table below. In this ledger, all transactions between users in the network are recorded, and everyone involved (i.e. Nia, Mary, etc.) maintains their own exact copy of it. Each row of the metaphorical ledger describes a simple transaction between two people, such as Transaction 2, where Mary sends $5 to Helen. In Blockchain, instead of rows, we have blocks, and each block/transaction is linked to the previous one, forming a chain. The data in each block is cryptographically hashed. Each block contains a reference to the previous block, the details of the transaction, a timestamp, and a proof of work that secures the block.
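To make the chaining concrete, here is a minimal sketch in Python of how a block could bundle a transaction, a timestamp, and a reference to the previous block's hash. Proof of work is left out for brevity, and the function and field names are my own; the transactions reuse the Nia/Martin/Doug example from this article:

```python
import hashlib
import json
import time

def make_block(prev_hash, sender, receiver, amount):
    """Build one block: a transaction plus a link to the previous block.
    (Proof of work is omitted here for brevity.)"""
    block = {
        "prev_hash": prev_hash,  # reference to the previous block's hash
        "tx": {"from": sender, "to": receiver, "amount": amount},
        "timestamp": time.time(),
    }
    # Hash the block's contents so any later change is detectable.
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

# A tiny two-block chain: Nia pays Martin $10, Martin pays Doug $9.
genesis = make_block("0" * 64, "Nia", "Martin", 10)
block2 = make_block(genesis["hash"], "Martin", "Doug", 9)
```

Because `block2` stores the hash of `genesis`, rewriting the earlier transaction would change its hash and visibly break the link.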
But how are the blocks chained together? Every time someone attempts a transaction, that transaction is published. At this point it is not yet part of the chain, because it has not been verified. Verification is performed by the so-called miners, who race against each other to validate the transaction, i.e. to check that there are enough funds for it and which parties are involved. The first miner to verify it adds the block to the chain and notifies the rest of the network so everyone can update their ledgers. This requires a lot of computational power, and miners do it because they collect a fee for every transaction they verify. Another important detail is that Blockchain lets us trace the "story" of everyone's money: when Martin takes $10 from Nia and gives $9 of it to Doug, we know that the 9 dollars Doug now owns came originally from Nia.
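The "computational power" the miners spend typically goes into a proof-of-work puzzle: finding a nonce that makes the block's hash start with a required number of zeros. Here is a toy version (a difficulty of 4 leading zeros is trivial compared with Bitcoin's real difficulty, and the block data string is just an illustration):

```python
import hashlib

def proof_of_work(block_data: str, difficulty: int = 4) -> int:
    """Find a nonce such that the hash of block_data + nonce starts with
    `difficulty` hex zeros. This is the expensive step miners race to finish."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

nonce = proof_of_work("Nia pays Martin $10", difficulty=4)
```

Finding the nonce takes many hash attempts, but anyone can check the result with a single hash, which is what makes the race verifiable.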
So what does all this give us?
- A complete history of the transactions (or whatever information we care about) and their relations.
- This history is publicly available and transparent, and it cannot be doctored. To be more precise, it is extremely difficult to modify the information on the chain unnoticed, because of the hashing and because everyone holds a synchronized copy: an attacker would have to corrupt the data in many places at once.
- It is not controlled by any single party. It becomes a public record for all of us, so we know who said what and, eventually, whether their claims proved to be misleading.
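The tamper-resistance claimed above is easy to demonstrate: if any block's contents change, its stored hash no longer matches, and the next block's link to it breaks. A small sketch, with block fields simplified from the description above and the same illustrative transactions:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents (everything except its stored hash)."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def chain_is_valid(chain: list) -> bool:
    """A chain is valid if every block's stored hash matches its contents
    and every block points at the hash of its predecessor."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

# Build a two-block chain, then tamper with the first transaction.
b1 = {"prev_hash": "0" * 64, "tx": {"from": "Nia", "to": "Martin", "amount": 10}}
b1["hash"] = block_hash(b1)
b2 = {"prev_hash": b1["hash"], "tx": {"from": "Martin", "to": "Doug", "amount": 9}}
b2["hash"] = block_hash(b2)

untouched_ok = chain_is_valid([b1, b2])  # True: the honest chain checks out
b1["tx"]["amount"] = 1000                # doctor the record...
tampered_ok = chain_is_valid([b1, b2])   # False: the chain exposes it
```

To hide the edit, an attacker would have to recompute every later hash on most of the network's copies at once, which is what makes doctoring impractical.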
Applying such technology in the context of fake news lets us know who created a story and who disseminated it. At any given point, if a news article has been fact-checked and found to be false, we can trace its origin and its distributors.
It is important to note that we would not "cut them off" by removing their stories; instead, we would hold them accountable for the validity of their content. If they prove to be vendors of fake news, we assign a deception score to any information they distribute: a higher score for the source, but also a score for the subsequent distributors. When a story is published in the future, we can then estimate its potential falsehood from the deception scores of its source and distributors (the exact algorithm remains to be defined).
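Since the algorithm is left open, here is one way it could look, purely as an illustration: a weighted combination of the source's track record and the distributors' average. The weights, the scale, and the formula are placeholders of mine, not a proposal:

```python
def estimated_deceit(source_score: float, distributor_scores: list,
                     source_weight: float = 0.7) -> float:
    """Hypothetical scoring rule: weight the source's track record most
    heavily, then blend in the distributors' average. Scores live in [0, 1],
    where 1 means every checked story from that party was fake. The 0.7
    weight is an arbitrary placeholder."""
    if not distributor_scores:
        return source_score
    dist_avg = sum(distributor_scores) / len(distributor_scores)
    return source_weight * source_score + (1 - source_weight) * dist_avg

# A story from a heavily flagged source, passed on by mixed distributors:
score = estimated_deceit(0.9, [0.2, 0.8])  # 0.7*0.9 + 0.3*0.5 = 0.78
```

Whatever the real formula ends up being, the key property is that it is cheap to evaluate per story, unlike manual fact-checking.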
Now we have a way to quickly compute the estimated deceitfulness of a story, but we need to scale even further. For this, we need to automatically recognize fake stories that are similar to each other. What I mean is that different stories may refer to the same fake event. If you think about it, a single fake story or article rarely gains traction on its own. It is when different articles and sources mention the same fake events that this false "verification" occurs and convinces people the story must be true. When one story is fact-checked and proved to be fake, all related stories making the same false claims can be classified as fake too. Computers can already perform this automated process with a decent degree of precision, using Machine Learning and Natural Language Processing. Most of us have used, or at least heard of, such applications. A good example is software that identifies plagiarism in research papers; a more naive (but more common) one is an email client classifying messages as spam "because emails with similar contents were found to be spam."
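As a toy stand-in for that NLP machinery, Jaccard similarity over word sets already captures the flag-stories-with-similar-content idea; a production system would use TF-IDF, embeddings, or a trained classifier instead. The headlines and the 0.5 threshold below are invented for the example:

```python
def jaccard(text_a: str, text_b: str) -> float:
    """Jaccard similarity over word sets: |A intersect B| / |A union B|.
    A crude proxy for 'these stories describe the same event'."""
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

known_fake = "celebrity endorses miracle cure doctors hate"
new_story = "doctors hate this miracle cure the celebrity endorses"
unrelated = "city council approves new budget for road repairs"

# Flag stories whose overlap with a known fake exceeds a threshold.
SIMILAR = 0.5
flag_new = jaccard(known_fake, new_story) >= SIMILAR    # True: same claim
flag_other = jaccard(unrelated, known_fake) >= SIMILAR  # False: no overlap
```

A real classifier would be far more robust to paraphrasing, but the pipeline is the same: once one story is proven fake, anything sufficiently similar inherits the flag.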
Automatically identifying stories that contain content known to be fake is by no means an easy task, and such applications will not perform perfectly. In fact, it will be more of a fuzzy approach, but we can use it to automatically rate the stories the classifier categorizes as fake with a high level of confidence. The sources and distributors of those articles are then assigned the appropriate deception score.
I said earlier that human fact-checkers alone cannot keep up with fake-news purveyors. An application that rates stories based on their sources and disseminators, and that automatically recognizes stories with the same content, will be much faster. Start feeding it data, and it will quickly be able to produce accurate estimations.
The results of the application should be used by the platforms through which the stories are consumed. For instance, next to a story, Facebook could show its estimated deceitfulness score. A crude sample of the idea is shown in the picture below:
But wait a minute! What if the analysis is wrong? What if the story is not fake after all? The source will have the option to dispute the evaluation and bring forward evidence that it is true. After all, those claiming something to be true should be responsible for proving that indeed it is.