Malware and Hashing: Hiding Functionality


Introduction

There is no doubt that malware coders are very good programmers; the sophistication and capabilities of modern malware make that clear. However, we must also acknowledge their creativity and imagination, as they employ techniques to disguise what they are doing from the prying eyes of the malware analyst, intrusion detection system, or anti-virus program scanning their code.

What are some of the goals of the malware coder? Consider the following:

Spread their malware to as many systems as possible.

Remain undetected.

Make sure the malware runs every time the system boots.

Extract valuable information from the infected system.

Enable remote access / control of the infected system.

Write the malware code in a way that disguises its functionality.

In this article we will take a look at one way the malware functionality is disguised.

Windows DLLs

The Windows operating system makes heavy use of Dynamic Link Libraries (DLLs). These are essentially pre-written subroutines that provide many different kinds of functionality, from file system navigation, to file manipulation, process creation, communication, and calls into the kernel of the operating system to do almost anything.

DLLs are designed to be loaded into memory as needed and discarded from memory when no longer needed, as a way of managing memory. They evolved from the old days of scarce memory as a way to make the most out of limited RAM.

Once a DLL is loaded into memory, the functions that it exports (makes available to external programs) are located by searching through a list of exported function names until the desired name is found. The memory address where the function code actually resides, now that the DLL has been loaded into RAM, is then read from the corresponding entry in a different table. Figure 1 is a flowchart of this search, showing how a function’s code address is obtained.

Figure 1: Accessing an exported DLL function
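The search in Figure 1 can be sketched in a few lines of Python. This is only a toy model: the tables, addresses, and the resolve_export name below are made-up illustration values, not data from a real DLL. Real PE export directories place an ordinal table between the name table and the address table, and the sketch assumes that layout.

```python
from typing import Optional

# Toy export directory. Every value here is hypothetical.
EXPORT_NAMES = ["CreateFileA", "MoveFileA", "ReadFile", "WinExec"]  # name table
NAME_ORDINALS = [2, 0, 3, 1]      # per name: index into the address table
FUNCTION_RVAS = [0x2340, 0x5A10, 0x1120, 0x3F80]  # offsets from the DLL base
IMAGE_BASE = 0x7C800000           # assumed address where the DLL was loaded


def resolve_export(wanted: str) -> Optional[int]:
    """Return the in-memory address of an exported function, or None."""
    for index, name in enumerate(EXPORT_NAMES):
        if name == wanted:                     # string compare against each name
            ordinal = NAME_ORDINALS[index]     # which address-table slot to read
            return IMAGE_BASE + FUNCTION_RVAS[ordinal]
    return None                                # name is not exported


address = resolve_export("MoveFileA")
print(hex(address) if address is not None else "not found")
```

Note that the lookup is driven entirely by the function name, which is why the name normally has to appear somewhere in the calling program.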

Windows itself uses DLLs to perform its various operating system chores, and there are all sorts of powerful functions available in different DLLs. In particular, KERNEL32.DLL provides functions for allocating memory and even executing processes. In short, Windows itself provides the very DLLs the malware coder needs in order to download malware and execute it.

Since DLLs are so important to the malware coder, simply following the standard programming practice of using the exported function name when calling the function would give away the functionality of the malware. These function names could be easily discovered by running the code of the malware through a program used to display any printable strings that exist within the code (which the DLL function names clearly are).
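To see how little effort this discovery takes, here is a minimal sketch of what a strings-style tool does. The function name printable_strings and the sample bytes are assumptions made for illustration; the sample is not taken from any real malware.

```python
import re
from typing import List


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return every run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical code bytes with two function names embedded as plain text.
sample = b"\x55\x8b\xec\x00CreateFileA\x00\x90\x90WinExec\x00\x01\x02"
print(printable_strings(sample))
```

Any function name stored as plain text falls straight out of the scan, while raw binary values such as opcodes and numeric constants rarely form a printable run long enough to be reported.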

Even though the malware code could be packed / encrypted until it is loaded into RAM, and then unpacked / decrypted, a memory image of the malware’s process would still show the function name strings. So the clever malware coder must think of a different way of hiding the function names.

Strange PUSHes

Anyone familiar with the organization of a run-time stack knows that many different kinds of things are pushed onto the stack when a function is called. Consider these instructions:

PUSH EAX

CALL SUB_11C

When these instructions execute, the run-time stack receives a parameter (from PUSH EAX) and a return address (from the CALL instruction). Once we enter the subroutine code, EBP is also pushed and then reassigned to point to the base address of the stack frame.

The stack frame now looks like this:

Figure 2: Run-time stack frame

So, within the subroutine that was called, an instruction like

MOV EBX, [EBP+8]

copies the EAX parameter from the stack frame into EBX inside the subroutine. This is a common way of using the stack to pass parameters.
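The bookkeeping described above can be replayed with a small Python model of a 32-bit stack. The register contents and the return address are arbitrary values chosen for illustration, and the model ignores everything except the pushes and the final read.

```python
# Toy model of the stack around "PUSH EAX / CALL SUB_11C".
# All addresses and register values are hypothetical.
memory = {}   # address -> 32-bit value
regs = {"ESP": 0x0012FF80, "EBP": 0x0012FFB0, "EAX": 0xDEADBEEF, "EBX": 0}
RETURN_ADDRESS = 0x00401005   # assumed address of the instruction after CALL


def push(value: int) -> None:
    """The stack grows downward: decrement ESP, then store the value."""
    regs["ESP"] -= 4
    memory[regs["ESP"]] = value & 0xFFFFFFFF


push(regs["EAX"])            # PUSH EAX       (the parameter)
push(RETURN_ADDRESS)         # CALL SUB_11C   (pushes the return address)
push(regs["EBP"])            # PUSH EBP       (subroutine entry saves old EBP)
regs["EBP"] = regs["ESP"]    # MOV EBP, ESP   (EBP now marks the frame base)

regs["EBX"] = memory[regs["EBP"] + 8]   # MOV EBX, [EBP+8]
print(hex(regs["EBX"]))
```

The saved EBP sits at [EBP], the return address at [EBP+4], and the parameter at [EBP+8], which is why the +8 offset picks up the value that was in EAX.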

So, knowing this, a malware analyst might look at the disassembled code shown in Figure 3 and wonder what all the strange PUSHes are for.

Figure 3: A series of strange PUSH values

Now, many of the pushed values do not appear to be anything useful, although two of the PUSHes have values that look like ASCII codes. In fact, they are ASCII codes, and they represent the string “URLMON” once you account for the order in which byte values get PUSHed in the Intel 80x86 architecture (the least significant byte lands at the lowest address). This is quite interesting, and possibly a clue as to what the other strange values represent.
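Decoding such values by hand is error-prone, so a short helper is useful. The two constants below are hypothetical: they show one way the string could be split across two PUSH instructions, not necessarily the exact values in Figure 3. The helper name decode_pushed_string is likewise an assumption.

```python
import struct
from typing import Sequence


def decode_pushed_string(pushed_values: Sequence[int]) -> str:
    """Rebuild an ASCII string from 32-bit values given in push order."""
    # The last value pushed ends up at the lowest address, so memory order
    # is the reverse of push order. Each value is stored little-endian.
    raw = b"".join(struct.pack("<I", value) for value in reversed(pushed_values))
    return raw.split(b"\x00", 1)[0].decode("ascii")   # stop at the terminator


# Hypothetical push sequence: "ON" plus padding first, then "URLM".
pushes = [0x00004E4F, 0x4D4C5255]
print(decode_pushed_string(pushes))
```

After the second PUSH, the stack pointer addresses the first character of the string, so the malware can hand that address to a function expecting a library name.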

Remember that the malware coder wants to disguise the functionality of the malicious code. Could it be that the strange hex values PUSHed onto the stack have some relation to the DLL function names being utilized? Let’s keep that thought in mind as we look at an interesting subroutine located somewhere else in the malware.

An Even Stranger Subroutine

While the mystery of the strange hexadecimal values continues, an even stranger subroutine shows up in the malware. It is not obvious what the purpose of the subroutine is, but there are some features of the code that hint at its purpose. Take a look at Figure 4 and see if you can determine anything useful from the subroutine instructions.

Figure 4: Subroutine with an unknown purpose

Here are some things to consider:

The entry and exit instructions manipulate the run-time stack.
The EDX register is utilized to generate some kind of 32-bit result.
Values read from memory pointed to by EAX are used to alter EDX via an XOR operation.
The subroutine loops until the value read from memory is a zero.

An experienced programmer may look at these characteristics and not see the overall purpose of the code. But sometimes intuition jumps in and creates that leap of understanding that ties everything together: this subroutine is creating a 32-bit hash value.

Upon entry, EAX points to a 0-terminated string in memory that holds the name of an exported DLL function. For each byte of the string, EDX is first rotated 3 bits and the byte is then XORed into EDX. (Rotating to the left reproduces the hash value for MoveFileA quoted later in this article.) When the terminating zero is reached, the 32-bit value in EDX is the hash of the string.

Figure 5: Mystery revealed: 32-bit hash generator
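The subroutine is short enough to re-create in Python. This is a sketch under two assumptions: EDX starts at zero, and the rotation is to the left. With those choices the function yields 410E2A69 for MoveFileA, the value quoted later in this article. The names rol32 and export_name_hash are ours, not the malware's.

```python
def rol32(value: int, count: int) -> int:
    """Rotate a 32-bit value left by count bits."""
    value &= 0xFFFFFFFF
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def export_name_hash(name: str) -> int:
    """Hash a function name: rotate EDX left 3 bits, then XOR in each byte."""
    edx = 0                           # assumed starting value
    for byte in name.encode("ascii"):
        edx = rol32(edx, 3)           # rotate first ...
        edx ^= byte                   # ... then fold in the next character
    return edx                        # the loop ends at the string terminator


print(f"{export_name_hash('MoveFileA'):08X}")   # 410E2A69
```

Because the rotation keeps moving earlier characters toward the high bits, names that differ by a single letter still produce very different 32-bit values.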

What is the purpose of this code? It is used to build a hash value that represents the exported function name from a DLL. The purpose of the hash value is to hide the name of the exported function from anyone performing reverse engineering or malware analysis on the code. For example, running the code of a typical WIN32 program through the Strings program will reveal the text-based exported function list, an example of which is shown here:

ReadFile

CreateFileA

GetProcessHeap

FreeLibrary

GetCPInfo

GetACP

GetOEMCP

VirtualQuery

InterlockedExchange

MultiByteToWideChar

GetStringTypeA

GetStringTypeW

To disguise the exported function name the malware writer uses the hash value in place of the function name. To test this theory, a simple C program was written that duplicates the functionality of the hashing subroutine, using the same XOR and 3-bit rotate operations. A file containing all of the exported function names from KERNEL32.DLL is passed to the C program, which generates the 32-bit hash values for each function name. Here is a small portion of the results:

Figure 6: Hash values for KERNEL32.DLL exported function names
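The same experiment can be sketched in Python. The original test program was written in C and processed the full KERNEL32.DLL export list; here a handful of names stand in for that file, and the helper names are assumptions. The hash assumes a left rotation and a zero starting value, which reproduces the 410E2A69 value for MoveFileA.

```python
from typing import Dict, Iterable, List


def export_name_hash(name: str) -> int:
    """Rotate the running 32-bit value left 3 bits, then XOR in each byte."""
    edx = 0
    for byte in name.encode("ascii"):
        edx = ((edx << 3) | (edx >> 29)) & 0xFFFFFFFF
        edx ^= byte
    return edx


def build_hash_table(names: Iterable[str]) -> Dict[int, List[str]]:
    """Map each hash to the names producing it (collisions are possible)."""
    table: Dict[int, List[str]] = {}
    for name in names:
        table.setdefault(export_name_hash(name), []).append(name)
    return table


# Small stand-in for the full list of exported function names.
EXPORTS = ["ReadFile", "CreateFileA", "GetProcessHeap", "FreeLibrary",
           "VirtualQuery", "MoveFileA", "WinExec"]
table = build_hash_table(EXPORTS)

for name in EXPORTS:
    print(f"{export_name_hash(name):08X}  {name}")

# Reverse lookup of a value PUSHed by the malware.
print(table.get(0x410E2A69, ["<unknown>"]))
```

A value that is missing from the table simply means the function lives in a DLL whose exports have not been hashed yet, which is how the search widens to other libraries.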

The hex value 410E2A69 is the malware writer’s way of hiding the name of the KERNEL32.DLL exported function MoveFileA being called. Going through the malicious assembly language, all the hash values being PUSHed were located and looked up. Some values did not match the list generated from KERNEL32.DLL, so exported function names from URLMON.DLL and NTDLL.DLL were used as well.

Here are the corresponding DLL functions, in the order they are called from the code:

Just seeing this sequence of DLL calls reveals the overall intent of the malicious code. A file is downloaded from the Internet (via URLDownloadToCacheFileA) and executed (with WinExec).

Mystery Revealed

Having discovered all these secrets and tricks, we can now determine what the main portion of the machine language payload does:

Figure 7: Strange PUSH values finally revealed

So, with a little effort and inspiration, the malware coder’s efforts to disguise what was really going on in the malicious code were defeated and its functionality revealed. The fascinating thing to think about here is that the malware coder took the technique of hashing and re-purposed it for something else. This is why creativity and imagination are important, both to the malware coder and to the malware analyst.

Author Bio

James L. Antonakos is a SUNY Distinguished Teaching Professor of Computer Science at Broome Community College, in Binghamton, NY, where he has taught since 1984. James teaches both in the classroom and online in classes covering electricity and electronics, computer networking, computer security and forensics, information management, and computer graphics and simulation. James is the designer and director of the new 2-year AAS Degree in Computer Security and Forensics at Broome Community College. James is also an IT security consultant for Excelsior College and an online instructor for Champlain College and Excelsior College. James has extensive industrial work experience as well in electronic manufacturing for both commercial and military products, particularly in flight control computer technology for Navy aircraft. James also consults with many local companies in the areas of computer networking and information security. James is the author or co-author of over 40 books on computers, networking, electronics, and technology. He is also A+, Network+, and Security+ certified by CompTIA and ACE certified in computer forensics by AccessData. James is a frequent presenter at the annual New York State Cyber Security Conference, the founder of WhiteHat Forensics, and an NCI Fellow for the National Cybersecurity Institute in Washington, DC.
