Hadoop 1.x has some well-known problems, for example the lack of high availability (HA) and poor handling of too many small files.
Hadoop 2.x with YARN solves the HA problem, but only one master can be active and serve clients at a time; the other masters are standbys. I think this is a waste of masters.
In my proposal, the master nodes do not store metadata themselves; they only add locks to the files HDFS is writing.
The metadata is stored in a separate role called the Metadata Node cluster, which all the masters can access.
The Metadata Node cluster can be implemented with ZooKeeper, using the ZooKeeper tree as the file system tree. If a znode represents a file, its own data describes the file itself (file length, access information, and so on), and its children describe the file's data blocks.
ZooKeeper is clustered and replicated by nature, so we don't need to worry about HA.
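To make the layout concrete, here is a minimal in-memory sketch of the proposed znode tree. It is not a real ZooKeeper client; the paths, block names, and metadata fields are assumptions for illustration only.

```python
import json

# In-memory stand-in for the ZooKeeper tree: path -> serialized znode data.
# (Hypothetical layout; a real system would call a ZooKeeper client instead.)
tree = {}

def create(path, data):
    """Create a znode with JSON data, loosely mimicking ZooKeeper's create()."""
    tree[path] = json.dumps(data)

# The file znode's own data describes the file itself.
create("/fs/user/data.txt",
       {"length": 268435456, "owner": "alice", "write_lock": False})

# Child znodes describe the file's data blocks and their locations.
create("/fs/user/data.txt/block-0000",
       {"block_id": 1, "datanodes": ["dn1", "dn2", "dn3"]})
create("/fs/user/data.txt/block-0001",
       {"block_id": 2, "datanodes": ["dn2", "dn4", "dn5"]})

meta = json.loads(tree["/fs/user/data.txt"])
print(meta["length"])  # file-level metadata lives on the file znode itself
```

A master resolves a file by reading the file znode for file-level metadata, then listing its children for the block map.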
If there are too many small files, we can run several Metadata Node clusters and use a shard rule to decide which cluster a file's metadata is stored in. It should be possible to add new Metadata Node clusters to the system transparently.
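A simple shard rule could hash the file path and take it modulo the number of clusters. The cluster addresses below are made up for illustration:

```python
import hashlib

# Hypothetical Metadata Node cluster endpoints (illustrative only).
CLUSTERS = ["zk-meta-0:2181", "zk-meta-1:2181", "zk-meta-2:2181"]

def cluster_for(path: str) -> str:
    """Pick the Metadata Node cluster that owns this file's metadata."""
    # A stable hash of the path, so every master maps the same file
    # to the same cluster.
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return CLUSTERS[int(digest, 16) % len(CLUSTERS)]

print(cluster_for("/fs/user/data.txt"))
```

Note that plain modulo hashing remaps most files when a cluster is added; for truly transparent cluster addition, a consistent-hashing ring would be a better shard rule.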
When a client wants to write a file, the master checks whether there is a write-lock flag in the file's metadata.
If there is no write-lock flag, the master adds one to the file znode's data and the client writes the file as usual. After the client is done, the master removes the write-lock flag.
If there is already a write-lock flag, the master should either refuse the client's request, or delay it by putting it in a queue and making the client wait until the write lock is removed.
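The master's write-lock protocol above can be sketched like this. The metadata dict is an in-memory stand-in for the file znode's data, and the field names are assumptions:

```python
from collections import deque

# In-memory stand-in for the file znode data (hypothetical fields).
metadata = {"/fs/a.txt": {"write_lock": False}}
wait_queue = deque()  # clients delayed while the lock is held

def request_write(path, client):
    """Master-side check before a client may write the file."""
    meta = metadata[path]
    if not meta["write_lock"]:
        meta["write_lock"] = True          # set the flag; client writes as usual
        return "granted"
    wait_queue.append((path, client))      # or refuse the request outright
    return "queued"

def finish_write(path):
    """Called after the client is done: clear the flag, wake a waiter."""
    metadata[path]["write_lock"] = False
    if wait_queue and wait_queue[0][0] == path:
        _, client = wait_queue.popleft()
        request_write(path, client)        # next waiting client gets the lock

print(request_write("/fs/a.txt", "client-1"))  # granted
print(request_write("/fs/a.txt", "client-2"))  # queued
finish_write("/fs/a.txt")                      # client-2 now holds the lock
```

In a real deployment the flag would need to be set atomically (for example with a ZooKeeper versioned write or an ephemeral lock znode), since several masters may race on the same file.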
With a pessimistic strategy for reads, the master checks whether a write-lock flag exists on the file before every read. If no write-lock flag exists, the client reads the file as usual; otherwise the master warns the client that the file is locked for writing.
With an optimistic strategy for reads, the master ignores the write-lock flag and lets the client read as usual.
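The difference between the two read strategies is small enough to show side by side. Again the metadata dict stands in for the file znode's data:

```python
# In-memory stand-in for the file znode data (hypothetical fields).
metadata = {"/fs/a.txt": {"write_lock": True}}

def read(path, strategy="optimistic"):
    """Master-side read check under the two strategies."""
    meta = metadata[path]
    if strategy == "pessimistic" and meta["write_lock"]:
        return "locked-for-writing"   # warn the client, do not read
    return "data"                     # read the file as usual

print(read("/fs/a.txt", "pessimistic"))  # locked-for-writing
print(read("/fs/a.txt", "optimistic"))   # data
```

The optimistic reader may observe a file mid-write, so it trades consistency for availability; the pessimistic reader never does, at the cost of blocking on every active write.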
After this refactoring, all the master nodes can serve clients at the same time, and the small file problem is solved as well.
What do you think?