Apache Hadoop Code Quality: Production VS Test
Join the DZone community and get the full member experience.
Join For FreeIn order to get high-quality production code, it's not enough just to ensure maximum coverage with tests. No doubt, great results require the main project code and tests to work efficiently together. Therefore, tests have to be paid as much attention to as source code. A decent test is a key success factor, as it will catch regression in production. Let's take a look at PVS-Studio static analyzer warnings to see the importance of the fact that errors in tests are no worse than the ones in production. Today's focus: Apache Hadoop.
About the Project
Those who were formerly interested in Big Data have probably heard or worked with the Apache Hadoop project. In a nutshell, Hadoop is a framework that can be used as a basis for building and working with Big Data systems.
Hadoop consists of four main modules; each of them performs a specific task required for a Big Data analytics system:
About the Check
As shown in the documentation, PVS-Studio can be integrated into a project in a variety of ways:
- Using the Maven plugin.
- Using the Gradle plugin.
- Using the Gradle IntellJ IDEA.
- Directly using the analyzer.
Hadoop is based on the Maven build system; therefore, there were no obstacles with the check.
After I integrated the script from the documentation and edited one of the pom.xml files (there were modules in dependencies that were not available), the analysis started!
After the analysis was completed, I chose the most interesting warnings and noticed that I had the same number of warnings in production code and in tests. Normally, I don't consider analyzer warnings from tests. But when I divided them, I couldn't leave 'tests' warnings unattended. "Why not take a look at them," I thought, "because bugs in tests might also have adverse consequences." They can lead to incorrect or partial testing, or even to mishmash.
After I selected the most intriguing warnings, I divided them by the following groups: production, test and the four main Hadoop modules. And now I'm glad to offer the review of analyzer warnings.
You may also like: A Kafka Tutorial for Everyone, no Matter Your Stage in Development.
Production Code
Hadoop Common
V6033 An item with the same key 'KDC_BIND_ADDRESS' has already been added. MiniKdc.java(163), MiniKdc.java(162)
xxxxxxxxxx
public class MiniKdc {
....
private static final Set<String> PROPERTIES = new HashSet<String>();
....
static {
PROPERTIES.add(ORG_NAME);
PROPERTIES.add(ORG_DOMAIN);
PROPERTIES.add(KDC_BIND_ADDRESS);
PROPERTIES.add(KDC_BIND_ADDRESS); // <=
PROPERTIES.add(KDC_PORT);
PROPERTIES.add(INSTANCE);
....
}
....
}
A value added twice in a HashSet
is a very common defect when checking projects. The second addition will be ignored. I hope this duplication is just a needless tragedy. What if another value was meant to be added?
MapReduce
V6072 Two similar code fragments were found. Perhaps, this is a typo and the localFiles
variable should be used instead of localArchives
.
- LocalDistributedCacheManager.java(183).
- LocalDistributedCacheManager.java(178).
- LocalDistributedCacheManager.java(176).
- LocalDistributedCacheManager.java(181).
xxxxxxxxxx
public synchronized void setup(JobConf conf, JobID jobId) throws IOException {
....
// Update the configuration object with localized data.
if (!localArchives.isEmpty()) {
conf.set(MRJobConfig.CACHE_LOCALARCHIVES, StringUtils
.arrayToString(localArchives.toArray(new String[localArchives // <=
.size()])));
}
if (!localFiles.isEmpty()) {
conf.set(MRJobConfig.CACHE_LOCALFILES, StringUtils
.arrayToString(localFiles.toArray(new String[localArchives // <=
.size()])));
}
....
}
The V6072 diagnostic sometimes yields some interesting findings. The purpose of this diagnostic is to detect the same type of code fragments resulted from copy-paste and replacement of one-two variables. In this case, some variables have even been left "underchanged."
The code above demonstrates this. In the first block, the localArchives
variable is used in the second similar fragment — localFiles
. If you study this code with due diligence, rather then quickly run through it, as it often happens while code reviewing, you'll notice the fragment, where the author forgot to replace the localArchives
variable.
This gaffe can lead to the following scenario:
- Suppose, we have localArchives (size = 4) and localFiles (size = 2).
- When creating the array,
localFiles.toArray(new String[localArchives.size()])
, the last two elements will be null (["pathToFile1", "pathToFile2", null, null]
). - Then,
org.apache.hadoop.util.StringUtils.arrayToString
will return the string representation of our array, in which the last files names will be presented as "null" ("pathToFile1, pathToFile2, null, null"). - All this will be passed further, and God only knows what kind of checks there are for such cases =).
xxxxxxxxxx
boolean isHierarchySameAs(Queue newState) {
....
if (children == null || children.size() == 0) {
....
}
else if(children.size() > 0)
{
....
}
....
}
Due to the fact that the number of elements is checked for 0 separately, the further check children.size() > 0 will always be true.
HDFS
V6001 There are identical sub-expressions 'this.bucketSize' to the left and to the right of the '%' operator. RollingWindow.java(79).
xxxxxxxxxx
RollingWindow(int windowLenMs, int numBuckets) {
buckets = new Bucket[numBuckets];
for (int i = 0; i < numBuckets; i++) {
buckets[i] = new Bucket();
}
this.windowLenMs = windowLenMs;
this.bucketSize = windowLenMs / numBuckets;
if (this.bucketSize % bucketSize != 0) { // <=
throw new IllegalArgumentException(
"The bucket size in the rolling window is not integer: windowLenMs= "
+ windowLenMs + " numBuckets= " + numBuckets);
}
}
YARN
V6067 Two or more case-branches perform the same actions. TimelineEntityV2Converter.java(386), TimelineEntityV2Converter.java(389).xxxxxxxxxx
public static ApplicationReport
convertToApplicationReport(TimelineEntity entity)
{
....
if (metrics != null) {
long vcoreSeconds = 0;
long memorySeconds = 0;
long preemptedVcoreSeconds = 0;
long preemptedMemorySeconds = 0;
for (TimelineMetric metric : metrics) {
switch (metric.getId()) {
case ApplicationMetricsConstants.APP_CPU_METRICS:
vcoreSeconds = getAverageValue(metric.getValues().values());
break;
case ApplicationMetricsConstants.APP_MEM_METRICS:
memorySeconds = ....;
break;
case ApplicationMetricsConstants.APP_MEM_PREEMPT_METRICS:
preemptedVcoreSeconds = ....; // <=
break;
case ApplicationMetricsConstants.APP_CPU_PREEMPT_METRICS:
preemptedVcoreSeconds = ....; // <=
break;
default:
// Should not happen..
break;
}
}
....
}
....
}
The same code fragments are in two case branches. It's just all over the place! In the prevailing number of cases, this is not a real error, but just a reason to think about refactoring the switch statement. This is not true for the case at hand. Repeated code fragments set the value of the variable preemptedVcoreSeconds
. If you look closely at the names of all variables and constants, you will probably conclude that in the case, if metric.getId() == APP_MEM_PREEMPT_METRICS
, the value must be set for the preemptedMemorySeconds
variable, not for preemptedVcoreSeconds
. In this regard, after the switch statement, preemptedMemorySeconds
will always remain 0, while the value of preemptedVcoreSeconds
might be incorrect.
xxxxxxxxxx
public synchronized void synchronizePlan(Plan plan, boolean shouldReplan)
{
....
try
{
setQueueEntitlement(planQueueName, ....);
}
catch (YarnException e)
{
LOG.warn("Exception while trying to size reservation for plan: {}",
currResId,
planQueueName,
e);
}
....
}
The planQueueName
variable isn't used when logging. In this case, either too much is copied or the format string isn't finished. But I still blame the good old copy-paste, which in some cases is just great to shoot yourself in the foot.
Test Code
Hadoop Common
V6072 Two similar code fragments were found. Perhaps, this is a typo and 'allSecretsB' variable should be used instead of 'allSecretsA'.
TestZKSignerSecretProvider.java(316), TestZKSignerSecretProvider.java(309), TestZKSignerSecretProvider.java(306), TestZKSignerSecretProvider.java(313).
xxxxxxxxxx
public void testMultiple(int order) throws Exception {
....
currentSecretA = secretProviderA.getCurrentSecret();
allSecretsA = secretProviderA.getAllSecrets();
Assert.assertArrayEquals(secretA2, currentSecretA);
Assert.assertEquals(2, allSecretsA.length); // <=
Assert.assertArrayEquals(secretA2, allSecretsA[0]);
Assert.assertArrayEquals(secretA1, allSecretsA[1]);
currentSecretB = secretProviderB.getCurrentSecret();
allSecretsB = secretProviderB.getAllSecrets();
Assert.assertArrayEquals(secretA2, currentSecretB);
Assert.assertEquals(2, allSecretsA.length); // <=
Assert.assertArrayEquals(secretA2, allSecretsB[0]);
Assert.assertArrayEquals(secretA1, allSecretsB[1]);
....
}
And again the V6072. Look closely at variables allSecretsA
and allSecretsB
.
V6043 Consider inspecting the 'for' operator. Initial and final values of the iterator are the same. TestTFile.java(235).
xxxxxxxxxx
private int readPrepWithUnknownLength(Scanner scanner, int start, int n)
throws IOException {
for (int i = start; i < start; i++) {
String key = String.format(localFormatter, i);
byte[] read = readKey(scanner);
assertTrue("keys not equal", Arrays.equals(key.getBytes(), read));
try {
read = readValue(scanner);
assertTrue(false);
}
catch (IOException ie) {
// should have thrown exception
}
String value = "value" + key;
read = readLongValue(scanner, value.getBytes().length);
assertTrue("values nto equal", Arrays.equals(read, value.getBytes()));
scanner.advance();
}
return (start + n);
}
A test that's always green? =). A part of the loop, which is a part of the test itself, will never get executed. This is due to the fact that the initial and final counter values are equal in the for
statement. As a result, the condition i < start
will immediately become false, leading to such behavior. I ran through the test file and lept to the conclusion that i < (start + n)
had to be written in the loop condition.
MapReduce
V6007 Expression 'byteAm < 0' is always false. DataWriter.java(322)
xxxxxxxxxx
GenerateOutput writeSegment(long byteAm, OutputStream out)
throws IOException {
long headerLen = getHeaderLength();
if (byteAm < headerLen) {
// not enough bytes to write even the header
return new GenerateOutput(0, 0);
}
// adjust for header length
byteAm -= headerLen;
if (byteAm < 0) { // <=
byteAm = 0;
}
....
}
The condition byteAm < 0
is always false. To figure it out, let's give the code above another look. If the test execution reaches the operation byteAm -= headerLen
, it means that byteAm >= headerLen
. From here, after subtraction, the byteAm
value will never be negative. That's what we had to prove.
HDFS
V6072 Two similar code fragments were found. Perhaps, this is a typo and the normalFile
variable should be used instead of normalDir
. TestWebHDFS.java(625), TestWebHDFS.java(615), TestWebHDFS.java(614), TestWebHDFS.java(624)
xxxxxxxxxx
public void testWebHdfsErasureCodingFiles() throws Exception {
....
final Path normalDir = new Path("/dir");
dfs.mkdirs(normalDir);
final Path normalFile = new Path(normalDir, "file.log");
....
// logic block #1
FileStatus expectedNormalDirStatus = dfs.getFileStatus(normalDir);
FileStatus actualNormalDirStatus = webHdfs.getFileStatus(normalDir); // <=
Assert.assertEquals(expectedNormalDirStatus.isErasureCoded(),
actualNormalDirStatus.isErasureCoded());
ContractTestUtils.assertNotErasureCoded(dfs, normalDir);
assertTrue(normalDir + " should have erasure coding unset in " + ....);
// logic block #2
FileStatus expectedNormalFileStatus = dfs.getFileStatus(normalFile);
FileStatus actualNormalFileStatus = webHdfs.getFileStatus(normalDir); // <=
Assert.assertEquals(expectedNormalFileStatus.isErasureCoded(),
actualNormalFileStatus.isErasureCoded());
ContractTestUtils.assertNotErasureCoded(dfs, normalFile);
assertTrue( normalFile + " should have erasure coding unset in " + ....);
}
Believe it or not, it's a V6072 again! Just follow the variable normalDir
and normalFile
.
V6027 Variables are initialized through the call to the same function. It's probably an error or un-optimized code. TestDFSAdmin.java(883), TestDFSAdmin.java(879).
xxxxxxxxxx
private void verifyNodesAndCorruptBlocks(
final int numDn,
final int numLiveDn,
final int numCorruptBlocks,
final int numCorruptECBlockGroups,
final DFSClient client,
final Long highestPriorityLowRedundancyReplicatedBlocks,
final Long highestPriorityLowRedundancyECBlocks)
throws IOException
{
/* init vars */
....
final String expectedCorruptedECBlockGroupsStr = String.format(
"Block groups with corrupt internal blocks: %d",
numCorruptECBlockGroups);
final String highestPriorityLowRedundancyReplicatedBlocksStr
= String.format(
"\tLow redundancy blocks with highest priority " +
"to recover: %d",
highestPriorityLowRedundancyReplicatedBlocks);
final String highestPriorityLowRedundancyECBlocksStr = String.format(
"\tLow redundancy blocks with highest priority " +
"to recover: %d",
highestPriorityLowRedundancyReplicatedBlocks);
....
}
In this fragment, highestPriorityLowRedundancyReplicatedBlocksStr
and highestPriorityLowRedundancyECBlocksStr
are initialized by the same values. Often, it should be so, but this is not the case. Variables' names are long and similar, so it's not surprising that the copy-pasted fragment wasn't changed in any way. To fix it, when initializing the highestPriorityLowRedundancyECBlocksStr
variable, the author has to use the input parameter highestPriorityLowRedundancyECBlocks
. In addition, most likely, they still need to correct the format line.
V6019 Unreachable code detected. It is possible that an error is present. TestReplaceDatanodeFailureReplication.java(222).
xxxxxxxxxx
private void
verifyFileContent(...., SlowWriter[] slowwriters) throws IOException
{
LOG.info("Verify the file");
for (int i = 0; i < slowwriters.length; i++) {
LOG.info(slowwriters[i].filepath + ....);
FSDataInputStream in = null;
try {
in = fs.open(slowwriters[i].filepath);
for (int j = 0, x;; j++) {
x = in.read();
if ((x) != -1) {
Assert.assertEquals(j, x);
} else {
return;
}
}
} finally {
IOUtils.closeStream(in);
}
}
}
The analyzer complains that the i++
counter in the loop can't be changed. This means that in the for (int i = 0; i < slowwriters.length; i++) {....}
loop no more than one iteration will execute. Let's find out why. In the first iteration, we link the thread with the file that corresponds to slowwriters[0]
for further reading. Next, we read the file contents via the loop for (int j = 0, x;; j++)
.
- If we read something relevant, we compare the read byte with the current value of the j counter though
assertEquals
(if the check is not successful, we fail out of the test). - If the file is checked successfully, and we get to the end of the file (we read -1), the method exits.
Therefore, whatever happens during the check of slowwriters[0]
, it won't get to checking subsequent elements. Most likely, a break
has to be used instead of return
.
YARN
V6019 Unreachable code detected. It is possible that an error is present. TestNodeManager.java(176)
xxxxxxxxxx
public void
testCreationOfNodeLabelsProviderService() throws InterruptedException {
try {
....
} catch (Exception e) {
Assert.fail("Exception caught");
e.printStackTrace();
}
}
In this situation, the Assert.fail
method will interrupt the test and the stack trace won't be printed in case of an exception. If the message about the caught exception is enough here, it's better to remove printing the stack trace to avoid confusion. If printing is necessary, you just need to swap them.
Many similar fragments have been found:
- V6019 Unreachable code detected. It is possible that an error is present. TestResourceTrackerService.java(928).
- V6019 Unreachable code detected. It is possible that an error is present. TestResourceTrackerService.java(737).
- V6019 Unreachable code detected. It is possible that an error is present. TestResourceTrackerService.java(685).
V6072 Two similar code fragments were found. Perhaps, this is a typo and the publicCache
variable should be used instead of usercache
.
TestResourceLocalizationService.java(315),
TestResourceLocalizationService.java(309),
TestResourceLocalizationService.java(307),
TestResourceLocalizationService.java(313)
xxxxxxxxxx
public void testDirectoryCleanupOnNewlyCreatedStateStore()
throws IOException, URISyntaxException
{
....
// verify directory creation
for (Path p : localDirs) {
p = new Path((new URI(p.toString())).getPath());
// logic block #1
Path usercache = new Path(p, ContainerLocalizer.USERCACHE);
verify(spylfs).rename(eq(usercache), any(Path.class), any()); // <=
verify(spylfs).mkdir(eq(usercache), ....);
// logic block #2
Path publicCache = new Path(p, ContainerLocalizer.FILECACHE);
verify(spylfs).rename(eq(usercache), any(Path.class), any()); // <=
verify(spylfs).mkdir(eq(publicCache), ....);
....
}
....
}
And finally, V6072 again =). Variables to view the suspicious fragment: usercache
and publicCache
.
Conclusion
Hundreds of thousands of lines of code are written in development. Production code is usually kept clean from bugs, defects, and flaws (developers test their code, the code is reviewed, and so on). Tests are definitely inferior in this regard. Defects in tests can easily hide behind a "green tick." As you've probably got from today's warnings review, a green test is not always a successful check.Further Reading
Published at DZone with permission of Maxim Stefanov. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments