Let's Talk ASM - String Concatenation
Not many developers today know assembly, yet it is a good skill to have regardless of your line of work. Assembly teaches you to think at a much lower level, below the abstractions that high-level languages provide. Today we're going to look at one way to implement a string concatenation function.
Specifically, I want to build the final result with the following procedure:
- Ask the user for input
- Append a crLf (carriage return + line feed) to the entered string
- Append the entered string to the existing composite string
- Go back to the first step until the user enters a terminator character
- Display the composite string
    .586
    .MODEL FLAT
    INCLUDE io.h             ; header file for input/output

    cr      equ 0dh          ; carriage return character
    Lf      equ 0ah          ; line feed

    .STACK 4096

    .DATA
    prompt    BYTE cr, Lf, "Original string? ", 0
    resTitle  BYTE "Final Result", 0
    stringIn  BYTE 1024 DUP (?)
    stringOut BYTE 1024 DUP (?)
    linefeed  BYTE cr, Lf, 0 ; null-terminated so strcopy knows where to stop
Notice the reference to io.h. The program needs a way to receive user input and display output through standard WinAPI channels, and io.h provides exactly that. Some ASM experts might argue that, for educational purposes, a "pure" assembly program should not rely on WinAPI hooks, but here the focus is on the inner workings of a different function.
NOTE: The program is written for the scenario where string concatenation is its sole purpose. Once you get the hang of the execution flow, you can easily adapt it to a scenario where some of the registers have to be re-used.
Let's start by clearing the ECX and EDX registers:
    .CODE
    _MainProc PROC
        ; clear the ECX and EDX registers because these will
        ; be used for length counters and sequential increments.
        xor ECX, ECX
        xor EDX, EDX
Once the user has entered a string, I will need its length in order to compute the correct memory address for the next append. First, I need to get the user input:
    INPUT_DATA:
        ; prompt the user to enter the string he ultimately
        ; wants appended to the main string buffer.
        input prompt, stringIn, 40    ; read ASCII characters

        ; make sure that the string doesn't start with the $ character
        ; which would automatically mean that we need to terminate the
        ; reading process
        cmp stringIn, '$'
        je DONE

        lea EAX, [stringOut + EDX]    ; destination address
        push EAX                      ; push the destination on the stack
        lea EAX, [stringIn]           ; source address
        push EAX                      ; push the source on the stack
        call strcopy                  ; call the string copy procedure
Once the string is entered, I can check whether the terminator character, "$", was used.
Because stringIn is declared as a BYTE array, cmp stringIn, '$' compares only the byte stored at the start of the buffer, that is, the first character entered. A single-character comparison is therefore enough.
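To make the operand size visible, the same check can be spelled out in two equivalent ways. This is only a sketch: it assumes the stringIn buffer and the DONE label from the listing above, and the second form overwrites AL, which the original check does not.

```asm
; Equivalent, more explicit forms of the terminator check.
; Assumes stringIn (a BYTE buffer) and the DONE label from the listing above.

    ; 1. Memory operand with an explicit size override.
    cmp   BYTE PTR [stringIn], '$'   ; compare only the first byte with '$' (24h)
    je    DONE

    ; 2. Load the first byte into a register, then compare.
    ;    Note: this clobbers AL.
    mov   AL, BYTE PTR [stringIn]    ; AL = first character entered
    cmp   AL, '$'
    je    DONE
```

Only one of the two forms is needed. Either way, a string such as "$abc" also terminates the loop, because only the first byte is inspected.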
If the terminator is found, execution jumps to DONE, where the output is displayed:
    DONE:
        ; output the new data.
        output resTitle, stringOut
        mov EAX, 0
        ret
strcopy is an internal procedure that simply copies a null-terminated string from one memory address to another:
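    strcopy PROC NEAR32
        push EBP
        mov  EBP, ESP
        push EDI
        push ESI
        pushf
        mov  ESI, [EBP+8]          ; source address
        mov  EDI, [EBP+12]         ; destination address
        cld                        ; copy forward
    whileNoNull:
        cmp  BYTE PTR [ESI], 0     ; end of the source string?
        je   endWhileNoNull
        movsb                      ; copy one byte, advance ESI and EDI
        jmp  whileNoNull
    endWhileNoNull:
        mov  BYTE PTR [EDI], 0     ; terminate the destination string
        popf
        pop  ESI
        pop  EDI
        pop  EBP
        ret  8                     ; return and discard both parameters
    strcopy ENDP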
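The [EBP+8] and [EBP+12] offsets follow from the order in which the caller pushes its parameters. The sketch below annotates the stack as strcopy sees it and repeats the calling pattern. It assumes the 32-bit flat model and the stringIn and stringOut buffers from the data section.

```asm
; Stack as seen inside strcopy right after "push EBP / mov EBP, ESP"
; (32-bit flat model, each slot above EBP is 4 bytes):
;
;   [EBP+12]  destination address   (pushed first by the caller)
;   [EBP+8]   source address        (pushed second by the caller)
;   [EBP+4]   return address        (pushed by call)
;   [EBP]     caller's EBP          (pushed by strcopy itself)
;   below EBP: saved EDI, saved ESI, saved flags
;
; Calling pattern that produces this layout:
    lea   EAX, stringOut     ; destination
    push  EAX
    lea   EAX, stringIn      ; source
    push  EAX
    call  strcopy            ; "ret 8" removes both 4-byte parameters
```

Because strcopy saves and restores every register it touches, and never writes to EAX, the caller still finds the source address in EAX after the call returns.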
Back in the main procedure, right after the call to strcopy, I need to find out the length of the string that was just copied, so that the next string is appended at the correct memory address offset:
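    ; let's get the length of the current string - move it
    ; to the proper register so that we can perform the measurement
    mov EDI, EAX      ; EAX still holds the address of stringIn

    ; find the length of the string that was just entered
    sub ECX, ECX      ; ECX = 0
    sub AL, AL        ; AL = 0, the byte to search for
    not ECX           ; ECX = 0FFFFFFFFh, the largest possible count
    cld               ; scan forward
    repne scasb       ; scan until the null terminator is found
    not ECX           ; ECX = bytes scanned, terminator included
    dec ECX           ; ECX = string length
    add EDX, ECX      ; advance the offset into stringOut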
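REPNE SCASB performs the search for the null terminator: it compares AL (here 0) with the byte at [EDI], advances EDI, and decrements ECX, repeating until a match is found or ECX reaches zero. ECX is therefore decremented once for every byte examined, including the terminator itself, which is why the result is inverted and then decremented by one to obtain the length.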
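Written as a plain loop, the same measurement may be easier to follow. The procedure below is a minimal sketch, not part of the original program or of io.h: strLength and its labels are hypothetical names, and it takes its single parameter on the stack in the same style as strcopy.

```asm
; strLength: hypothetical helper, not part of the original program.
; In:  one stack parameter, the address of a null-terminated string
; Out: EAX = number of bytes before the terminator
strLength PROC NEAR32
    push  EBP
    mov   EBP, ESP
    push  EDI
    mov   EDI, [EBP+8]            ; EDI = start of the string
    xor   EAX, EAX                ; EAX = running length
countLoop:
    cmp   BYTE PTR [EDI+EAX], 0   ; reached the terminator?
    je    countDone
    inc   EAX                     ; one more character counted
    jmp   countLoop
countDone:
    pop   EDI
    pop   EBP
    ret   4                       ; return and discard the parameter
strLength ENDP
```

For comparison, with the string "abc" the REPNE SCASB version examines four bytes, so ECX goes from 0FFFFFFFFh to 0FFFFFFFBh. NOT turns that into 4, and DEC leaves the length, 3.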
    ; we need to append the linefeed (crLf) to the string so we apply
    ; the same string concatenation procedure for that sequence.
    lea EAX, [stringOut + EDX]    ; destination address
    push EAX                      ; first parameter
    lea EAX, [linefeed]           ; source
    push EAX                      ; second parameter
    call strcopy                  ; call string copy procedure
    mov EDI, EAX

    ; we know that the crLf characters are 2 entities, therefore
    ; increment the overall counter by 2.
    add EDX, 2

    ; ask for more input because no terminator character was used.
    jmp INPUT_DATA
Once the input is processed, I append the crLf sequence and increase EDX by 2 to keep the offset correct, after which the program jumps back to INPUT_DATA, where the user enters the next string.
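As a worked example of the offset bookkeeping, suppose the user enters "abc", then "de", then "$". The trace below is an illustration only: the inputs are made up, the expected label is hypothetical, and it assumes the crLf source is followed by a null byte so that strcopy stops after two characters.

```asm
; Illustration only: how EDX and stringOut evolve for the inputs
; "abc", "de", "$". Each copy overwrites the previous null terminator.
;
;   step           EDX before   bytes written      EDX after
;   copy "abc"         0        'a' 'b' 'c' 0          3
;   copy crLf          3        0dh 0ah 0              5
;   copy "de"          5        'd' 'e' 0              7
;   copy crLf          7        0dh 0ah 0              9
;
; Resulting contents of stringOut ("expected" is a hypothetical label):
expected BYTE "abc", 0dh, 0ah, "de", 0dh, 0ah, 0
```

After the last step EDX is 9, which is exactly the offset of the final null terminator in the composite string.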