Halstead Metrics: For When Program Size Matters
Halstead metrics help you make decisions about working with a program, such as how hard the work will be and how long it may take.
The most common metric for measuring program size is Source Lines of Code (SLOC). This metric has remained with us because of its simplicity. But what does it really tell us about the program and the application? Because of all the variables involved (no pun intended), it can’t give us a real comparison with other programs and applications. Straight or “physical” SLOC can include all the comments, blank lines, and definitions.
For a better comparison, there are basically two things you need to know about a program: how many statements it has and how many variables. “Logical” SLOC is often used to address the first of these. It measures the number of executable statements. Because it tells you how much of the program is actual logic, it gets you closer to the true size of a program and makes comparisons more meaningful. With this metric, you know how many statements a developer will have to review to make a change, which helps you estimate the time it will take to evaluate and carry out the task.
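To make the difference between physical and logical size concrete, here is a minimal sketch that splits physical lines into blank, comment-only, and code lines. The function name `count_lines` and the `comment_prefix` parameter are assumptions made for illustration. The "code" count is only a rough stand-in for logical SLOC, since a real tool counts statements by parsing the language rather than counting lines.

```python
def count_lines(source: str, comment_prefix: str = "#") -> dict:
    """Classify each physical line as blank, comment-only, or code.

    Assumption: a comment is any line whose first non-blank characters
    are `comment_prefix`. Real languages need a proper parser.
    """
    counts = {"physical": 0, "blank": 0, "comment": 0, "code": 0}
    for line in source.splitlines():
        counts["physical"] += 1
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith(comment_prefix):
            counts["comment"] += 1
        else:
            counts["code"] += 1
    return counts


# Hypothetical five-line sample: two comments, one blank line, two statements.
sample = """# add two numbers
total = a + b

# report
print(total)
"""

line_counts = count_lines(sample)
print(line_counts)
```

In this sample the physical count is more than double the code count, which is the kind of gap that makes raw line counts hard to compare between programs.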
Maurice Halstead introduced a new set of metrics in 1977. Instead of just counting lines, you look at the actual verbs (operators) and variables (operands) used in the program. The calculation begins with a count of the unique operators and operands, that is, all the distinct verbs and variables used in the program. This gives you an idea of how many unique verbs there are but, perhaps more importantly, how many variables the program works with. These two numbers are added together to give the vocabulary. Next, you count the total number of verbs and variables used, so a variable that is referenced multiple times is counted at each occurrence. This gives you the total operator and operand counts, which are added together to give the length.
Halstead Metrics values are:
- Unique operators (n1): The unique or distinct number of verbs and elements other than data elements occurring in your program. Operators are syntactic elements such as +, -, <, >.
- Unique operands (n2): The unique or distinct number of data elements occurring in your program. Operands consist of literal expressions, constants, and variables.
- Total operators (N1): The total number of verbs and elements other than data elements occurring in your program. Paired operators such as BEGIN .. END, DO .. UNTIL, FOR .. NEXT are treated as a single operator.
- Total operands (N2): The total number of data elements occurring in your program.
- Vocabulary (n): The number of unique operators and operands in your program, computed as n1 + n2. This is an estimate of the size of the program’s vocabulary (the number of things that must be known to understand the program).
- Length (N): The length of your program, computed as N1 + N2.
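The definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real analyzer: the statement is tokenized by hand, and the function name `halstead_counts` is made up for the example. A production tool would use a language-aware tokenizer and its own rules for what counts as an operator.

```python
from collections import Counter


def halstead_counts(operators, operands):
    """Compute the basic Halstead values from two token sequences."""
    operator_counts = Counter(operators)
    operand_counts = Counter(operands)
    n1 = len(operator_counts)           # unique operators
    n2 = len(operand_counts)            # unique operands
    N1 = sum(operator_counts.values())  # total operators
    N2 = sum(operand_counts.values())   # total operands
    return {
        "n1": n1,
        "n2": n2,
        "N1": N1,
        "N2": N2,
        "vocabulary": n1 + n2,
        "length": N1 + N2,
    }


# Hand-tokenized statement (an assumption for illustration):
#     total = price * qty + price * tax
operators = ["=", "*", "+", "*"]
operands = ["total", "price", "qty", "price", "tax"]

metrics = halstead_counts(operators, operands)
print(metrics)
```

Here `*` and `price` each occur twice, so they add to the totals (N1, N2) more than once but to the unique counts (n1, n2) only once.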
This is the core, but there are many other metrics based on these building blocks. Before we get to those, let’s fully understand how these can be used to compare programs.
| | TRIMAIN | CWXTCOB | PP110 | PDA008 |
|---|---|---|---|---|
| Lines of Code | 56 | 701 | 2655 | 3043 |
| Comment Lines | 0 | 164 | 813 | 786 |
| Statements | 15 | 200 | 452 | 685 |
| Unique Operators [n1] | 10 | 14 | 16 | 34 |
| Unique Operands [n2] | 23 | 204 | 563 | 472 |
| Total Operators [N1] | 16 | 201 | 471 | 665 |
| Total Operands [N2] | 39 | 522 | 1194 | 1717 |
| Vocabulary | 33 | 218 | 579 | 506 |
| Length | 55 | 723 | 1665 | 2382 |
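The Vocabulary and Length rows follow directly from the four counts above them. As a quick check, this sketch recomputes both from the table's values. The dictionary layout and the helper name `derived` are assumptions for illustration; the numbers are copied from the table.

```python
# Counts copied from the table: (n1, n2, N1, N2, vocabulary, length)
table = {
    "TRIMAIN": (10, 23, 16, 39, 33, 55),
    "CWXTCOB": (14, 204, 201, 522, 218, 723),
    "PP110": (16, 563, 471, 1194, 579, 1665),
    "PDA008": (34, 472, 665, 1717, 506, 2382),
}


def derived(n1, n2, N1, N2):
    """Return (vocabulary, length) as n1 + n2 and N1 + N2."""
    return n1 + n2, N1 + N2


# Compare the recomputed values with the ones reported in the table.
mismatches = []
for program, (n1, n2, N1, N2, vocabulary, length) in table.items():
    calc_vocabulary, calc_length = derived(n1, n2, N1, N2)
    print(f"{program}: vocabulary={calc_vocabulary}, length={calc_length}")
    if (calc_vocabulary, calc_length) != (vocabulary, length):
        mismatches.append(program)
```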
These four programs (TRIMAIN, CWXTCOB, PP110, PDA008) progress in size from a very small one to a relatively large one. Straight Lines of Code shows PDA008 as the largest, but let’s keep looking. The next metric is comment lines, which shows how well documented the programs are. No big surprises here: broadly, the larger the program, the more comments it has, although PP110 carries slightly more comment lines than the larger PDA008. To get a real feel for how well a program is documented, look at the ratio of comments to statements. The count of statements gives us a better feel for size, and here we can see that PDA008 is still the largest.
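The comment-to-statement ratio mentioned above is simple to compute. The sketch below uses the Comment Lines and Statements values from the table; the function name `comment_ratio` is an assumption for illustration, and what counts as a "good" ratio is left open because the article does not give a threshold.

```python
# (comment_lines, statements) copied from the table
counts = {
    "TRIMAIN": (0, 15),
    "CWXTCOB": (164, 200),
    "PP110": (813, 452),
    "PDA008": (786, 685),
}


def comment_ratio(comment_lines, statements):
    """Comment lines per statement; 0.0 when there are no statements."""
    return comment_lines / statements if statements else 0.0


ratios = {name: comment_ratio(c, s) for name, (c, s) in counts.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} comment lines per statement")
```

By this measure PP110 is the most heavily commented of the four, and TRIMAIN has no comments at all.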
Now, we can look at the Halstead Metrics. The unique operators are not much different for the first three programs, but the count is much larger for the last. But when we look at unique operands, which is a count of the data elements actually used in the program, not merely defined, we see that PP110 has the most variables to understand. The unique counts give us a base of what is in the program, but not its actual size. To see how often these elements are used, we look at the total counts. Here, we can start to see the complexity grow, but the important metrics, vocabulary and length, are next.
It is often said that vocabulary is the number of things that must be known to understand the program. I think of it as the number of things I have to keep in my head to know what is going on in the program. This is the base number, built from the unique operators and operands. For TRIMAIN, it is 33 things. I can handle that. CWXTCOB has 218, so it is more challenging to handle, but the next two are interesting. Because more variables are used in PP110, I need to know more to understand it (579 things) than to understand PDA008 (506). This is something that is not revealed in LOC or Statements. Lastly, there is length, where we see the usual progression in the programs.
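To see how the choice of metric changes the comparison, this sketch ranks the four programs from smallest to largest by different measures. The values are copied from the table; the dictionary layout and the helper name `ranked` are assumptions for illustration.

```python
# Values copied from the table
programs = {
    "TRIMAIN": {"loc": 56, "statements": 15, "vocabulary": 33, "length": 55},
    "CWXTCOB": {"loc": 701, "statements": 200, "vocabulary": 218, "length": 723},
    "PP110": {"loc": 2655, "statements": 452, "vocabulary": 579, "length": 1665},
    "PDA008": {"loc": 3043, "statements": 685, "vocabulary": 506, "length": 2382},
}


def ranked(metric):
    """Program names ordered from smallest to largest by `metric`."""
    return sorted(programs, key=lambda name: programs[name][metric])


for metric in ("loc", "statements", "vocabulary", "length"):
    print(f"{metric}: {' < '.join(ranked(metric))}")
```

Lines of code, statements, and length all put PDA008 last, while vocabulary swaps the final two and puts PP110 last.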
So, looking at these metrics, what have we learned? Hopefully, that there is more to understanding and comparing programs than just their apparent size. Halstead vocabulary and length are more reliable measures of what is important: how much you need to understand to work on a program. From this, you can start to make decisions on working with a program, such as how hard it will be and how long it may take.
Published at DZone with permission of , DZone MVB. See the original article here.