COM Files and Why They Matter
Learn what COM files are and why they are essential to understanding viruses and malware today.
COM files were the first executable format used on DOS, the operating system that early versions of Windows ran on top of, and they were among the first targets for malware.
Early writers of file-infecting viruses could really only use COM files as targets. The focus later shifted to DOS EXE files and then to the PE format we're familiar with today. Many of the techniques used by virus writers were established in the early COM days and persist today.
COM files were very simple. They were, essentially, images of what a program would look like in memory when executed. The format had no header, so when you built a COM file, you were generating the exact memory image (data and code) that would be loaded and run. As a result, they were limited to a single 64KB segment, and DOS set up the registers before handing over control. EXE files were a bit more complex: they had a header with a relocation table, specified their own initial stack pointer, and could be larger than 64KB.
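To make the "file is the memory image" idea concrete, here is a minimal sketch that assembles a tiny COM program by hand. The function name `build_hello_com` and the message are illustrative assumptions; the opcodes are standard 16-bit x86 encodings and the two INT 21h services used (09h to print a `$`-terminated string, 4Ch to exit) are standard DOS calls. The size ceiling is the commonly cited figure, and the practical limit is slightly lower because the stack shares the same segment.

```python
import struct

LOAD_OFFSET = 0x100                    # DOS places the image at offset 0x100 of its segment
MAX_COM_SIZE = 0x10000 - LOAD_OFFSET   # image and the 256 bytes before it share one 64KB segment


def build_hello_com(message: bytes = b"Hello, DOS!") -> bytes:
    """Return the raw bytes of a COM program that prints `message` and exits."""
    text = message + b"$"              # INT 21h / AH=09h prints up to a '$' terminator
    code_len = 12                      # combined size of the five instructions below
    text_addr = LOAD_OFFSET + code_len # runtime address of the string, not its file offset

    code = (
        b"\xB4\x09"                                 # mov ah, 09h    (print string)
        + b"\xBA" + struct.pack("<H", text_addr)    # mov dx, text_addr
        + b"\xCD\x21"                               # int 21h
        + b"\xB8\x00\x4C"                           # mov ax, 4C00h  (exit, return code 0)
        + b"\xCD\x21"                               # int 21h
    )
    assert len(code) == code_len

    image = code + text                # no header, no relocation table: just code and data
    if len(image) > MAX_COM_SIZE:
        raise ValueError("image does not fit in a single segment")
    return image


if __name__ == "__main__":
    with open("hello.com", "wb") as handle:
        handle.write(build_hello_com())
```

The resulting file should run under a DOS emulator such as DOSBox. Note that the only place the load address appears is in the operand of `mov dx`, which is why a COM program must be built knowing it will sit at 0x100.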
COM files also start execution from a known address in the image: 0x100. This is the location immediately following the program segment prefix (PSP), a data structure DOS uses to store program state, which occupies bytes 0x00 through 0xFF of the segment of any running program.
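Because the image is loaded unchanged right after the 256-byte prefix, converting between a position in the file and an address at run time is a fixed shift. A minimal sketch, with hypothetical helper names chosen for illustration:

```python
PSP_SIZE = 0x100        # bytes 0x00-0xFF of the segment hold the program segment prefix
SEGMENT_SIZE = 0x10000  # a COM program lives in one 64KB segment


def file_offset_to_address(file_offset: int) -> int:
    """Map a byte position in the COM file to its offset within the segment."""
    if not 0 <= file_offset < SEGMENT_SIZE - PSP_SIZE:
        raise ValueError("offset is outside a COM image")
    return PSP_SIZE + file_offset


def address_to_file_offset(address: int) -> int:
    """Map a run-time offset within the segment back to a position in the file."""
    if not PSP_SIZE <= address < SEGMENT_SIZE:
        raise ValueError("address is inside the PSP or outside the segment")
    return address - PSP_SIZE


if __name__ == "__main__":
    print(hex(file_offset_to_address(0)))   # first byte of the file is the entry point
    print(address_to_file_offset(0x10C))    # twelve bytes into the file
```

This shift is the reason disassemblers need to be told a load base of 0x100 when opening a COM file; without it, every absolute address in the code appears to point 256 bytes too far.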
This predictability makes it much easier to create computer viruses. Because a virus writer knows exactly where program execution will begin, one approach is to copy the instructions from the front of a COM file to the back of the file, place the viral instructions at the front, and, once they have finished running, copy the original bytes back. Alternatively, the virus can save the bytes at 0x100 to a different location, overwrite them with a jump to the viral code (wherever it is), and, when that code has finished, restore the original bytes at 0x100 and jump back there to begin execution of the original program.
The bottom line is that a simple executable format with a known initial instruction pointer address makes writing COM file-infecting viruses much easier.
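The same predictability helps the analyst. Since execution always begins at 0x100, the first few bytes of a COM file show immediately where control goes. The sketch below decodes a leading near jump (opcode E9) or short jump (opcode EB) and reports whether it lands in the last part of the file. The function names and the 25% tail threshold are assumptions made for illustration, and the result is only a hint: plenty of legitimate COM programs begin with a jump over their data.

```python
import struct
from typing import Optional

ENTRY_POINT = 0x100  # run-time address of the first byte of a COM file


def entry_jump_target(image: bytes) -> Optional[int]:
    """Return the run-time address targeted by a jump at the entry point, if any."""
    if len(image) >= 3 and image[0] == 0xE9:            # jmp rel16
        (disp,) = struct.unpack_from("<h", image, 1)
        return (ENTRY_POINT + 3 + disp) & 0xFFFF        # relative to the next instruction
    if len(image) >= 2 and image[0] == 0xEB:            # jmp rel8
        (disp,) = struct.unpack_from("<b", image, 1)
        return (ENTRY_POINT + 2 + disp) & 0xFFFF
    return None


def jumps_into_tail(image: bytes, tail_fraction: float = 0.25) -> bool:
    """Heuristic: does the file start by jumping into its final `tail_fraction`?"""
    target = entry_jump_target(image)
    if target is None:
        return False
    offset = target - ENTRY_POINT                       # back to a file offset
    return len(image) * (1 - tail_fraction) <= offset < len(image)


if __name__ == "__main__":
    # Synthetic image: a jump at the start, filler bytes, then a 256-byte tail.
    sample = b"\xE9\xFD\x03" + b"\x90" * 0x3FD + b"\xC3" * 0x100
    print(hex(entry_jump_target(sample)), jumps_into_tail(sample))
```

A real triage tool would combine a check like this with other signals, such as comparing the file against a known-good copy, rather than relying on the jump alone.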
COM files still exist today, though they're no longer executed in the same way. Nevertheless, the approaches developed to deal with this simple format still work today, and the basic approaches have changed very little over the years. And by working with this old executable file format, we can learn why and how malware has developed in the ways it has and why we've developed the defenses we've built.
It's very difficult to jump into virus or malware analysis today, because the defenses we've built and the approaches malware authors have developed to circumvent them are complex and difficult to grasp. By starting with formats that lack these kinds of defenses, we can more easily understand what virus writers were thinking and then see how their approaches have evolved over time, giving us more insight into these techniques.