Oski Stealer: unpacking and string decryption


Introduction

As part of the Zero2Auto malware analysis course, Daniel @0verfl0w_ provides biweekly malware analysis challenges for practice and more hands-on experience. This time, the challenge was to unpack the initial .NET layer of Oski Stealer, then identify the string encryption in the sample and develop a script to automate string decryption.

Sample: 707adf85c61f5029e14aa27791010f2959e70c0fee182fe968d2eb7f2991797b

Unpacking .NET layer

After identifying the first layer as a .NET PE binary, we can use dnSpy for debugging and disassembly. Debugging the binary with the dnSpy debugger is fairly straightforward, although finding good breakpoints is challenging because the code is heavily obfuscated. At first I was disappointed to find a PE header in memory that turned out to be just another module loaded at runtime. Module breakpoints helped a lot here: they break whenever a module is loaded, so there is no need to step through the module loading manually. The initial layer loads three additional modules from memory.

Modules loaded

After some additional debugging and stepping through the program, always keeping an eye on the local variables, a long byte array shows up. Byte arrays are always interesting because they may contain additional modules or even the payload. This one starts with the bytes 4D 5A, the magic number (MZ) of PE files, which makes it even more interesting.

Byte array

Saving the file to disk and inspecting it with pestudio shows that it is in fact a 32-bit Microsoft Visual C++ executable.

Payload

SHA256: 101D608D893ED193835CD04B4D06A79960032FB703C5268E910DD4880C53A992
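The check that pestudio performs on the dumped array can be approximated in a few lines. The following is a minimal sketch, not what pestudio does internally: the function name describe_pe is made up for illustration, and it only looks at the MZ magic, the PE signature and the Machine field of the COFF header.

```python
import struct

def describe_pe(data: bytes) -> str:
    """Return a short description of a buffer that may hold a PE image.

    describe_pe is a hypothetical helper name used for this sketch only.
    """
    # DOS header: magic "MZ" at offset 0, e_lfanew (PE header offset) at 0x3C
    if len(data) < 0x40 or data[:2] != b"MZ":
        return "not a PE file (no MZ magic)"
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The PE signature is followed by the 2-byte Machine field
    if e_lfanew + 6 > len(data) or data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        return "MZ magic present, but no PE signature"
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
    names = {0x014C: "32-bit x86", 0x8664: "64-bit x64"}
    return "PE file, " + names.get(machine, "machine 0x%04x" % machine)
```

Called on the bytes of the dumped file, for example describe_pe(open(path, "rb").read()), this should report a 32-bit x86 PE file if the dump is intact. It says nothing about the compiler, which is where a tool like pestudio is still needed.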

String decryption

For analysing string decryption in the payload binary I used Cutter. The first thing we can notice while looking at the strings in the payload executable is a very large number of base64 encoded strings.

Encoded strings
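Outside of a disassembler, such strings can also be spotted with a simple pattern scan over the raw file. This is only a rough sketch under my own assumptions: the regular expression, the minimum length of 8 characters and the name base64_candidates are illustrative choices and not part of the original analysis.

```python
import base64
import re

# Runs of at least 8 characters from the base64 alphabet, optionally
# followed by one padded group (assumed minimum length, adjust as needed)
B64_RE = re.compile(
    rb"(?:[A-Za-z0-9+/]{4}){2,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)

def base64_candidates(blob: bytes):
    """Yield (offset, text) for runs in blob that decode as base64."""
    for match in B64_RE.finditer(blob):
        text = match.group()
        try:
            # Defensive check: skip anything the decoder rejects
            base64.b64decode(text, validate=True)
        except ValueError:
            continue
        yield match.start(), text.decode("ascii")
```

A scan like this produces false positives, because ordinary identifiers such as API names also consist of base64 alphabet characters. It is useful for a first impression, while the references to the decryption function are the more reliable source.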

Unfortunately, the strings are not just base64 encoded, but on top of that also encrypted. Ben Cohen at CyberArk published an Oski Stealer analysis in 2021 (link) and reported base64 encoding in combination with RC4 encryption. This previous analysis helps us a lot with forming a hypothesis for our own sample, which we have to confirm before writing a script to automatically decrypt all strings.

When looking at the first function which is called in the main function we already notice a pattern: base64 strings are being pushed on the stack and then passed as an argument to another function.

String setup

We will therefore check whether this function is the decryption function, and it already looks very much like it. The function returns a memory address in eax, and when we put a breakpoint at the end of the function, we can see that this address points to the decrypted string:

String decrypted

Of course, for an RC4 string decryption the program needs a key. Unless the program gets the key from a command and control server, the key has to be included in the binary or at least be generated at runtime. At the beginning of the string setup routine, before all the strings are passed to the actual decryption function, we can see two memory locations being accessed.

String access

One of these is loading the string “alazlfa.cf”. What looks like an encrypted string or maybe a key (because it’s not followed by a decryption function call) is in fact a domain (VirusTotal) used by the malware to download additional libraries. The other string is indeed the password as we will verify later.

Key

From our research, we suspect the actual decryption algorithm to be RC4. Having a suspicion about both the key and the algorithm, we could verify that theory immediately, but first I wanted to check whether I could at least roughly identify the RC4 function and confirm the suspicion without testing it in an external tool. Knowing that the string decryption function does the base64 decoding and the decryption before returning a pointer to the decrypted string, going bottom up through the decryption function seemed like a promising approach to identify the decryption subroutine. And indeed, in one of the functions we can see a pattern which is very typical for RC4: two loops running to 256 (0x100). RC4 doesn't have any characteristic constants that give away the algorithm, but it uses a 256-byte substitution box, and both initialising it and mixing in the key take a loop over all 256 entries.

RC4 loops
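To make the pattern easier to recognise, here is a plain Python sketch of textbook RC4 with the two 256-iteration loops marked, followed by the base64 step. This is the standard algorithm and not a transcription of the sample's code, and the helper name decrypt_string is made up for illustration.

```python
import base64

def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4; encryption and decryption are the same operation."""
    # Loop 1 (256 iterations): fill the substitution box with 0..255
    sbox = [0] * 256
    for i in range(256):
        sbox[i] = i

    # Loop 2 (256 iterations): key scheduling, mixes the key into the box
    j = 0
    for i in range(256):
        j = (j + sbox[i] + key[i % len(key)]) & 0xFF
        sbox[i], sbox[j] = sbox[j], sbox[i]

    # Keystream generation: one keystream byte per input byte, XORed in
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + sbox[i]) & 0xFF
        sbox[i], sbox[j] = sbox[j], sbox[i]
        out.append(byte ^ sbox[(sbox[i] + sbox[j]) & 0xFF])
    return bytes(out)

def decrypt_string(encoded: str, key: str) -> str:
    """Hypothetical helper: base64-decode, then RC4-decrypt with key."""
    plain = rc4(key.encode("utf-8"), base64.b64decode(encoded))
    return plain.decode("utf-8", errors="replace")
```

In a disassembler the first two loops are the recognisable part: both count up to 0x100 and both index into the same 256-byte buffer, and the second one additionally indexes into the key.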

Using CyberChef we can now confirm our suspicion about the key and algorithm by successfully decrypting one of the strings:

Cyberchef decryption

String decryption with Rizin and rzpipe

Of course nobody wants to copy and paste all the strings into CyberChef. It is much more convenient to have the decrypted strings available in one place, ideally in the context of a disassembler. Therefore, part of the challenge was to write such a script. I'll quickly discuss my solution in this section.

My script is based on Rizin and rzpipe, which allows a script to interact with Rizin and use all of its functionality. First, it uses two hardcoded values for the address of the decryption function and the address of the key (both could technically also be detected by the script at runtime):

decryption_function = 0x00422f70
key_addr = 0x0042a074

Next, rzpipe is used to open the file in Rizin and analyse it. If a filename is given as a command-line argument, the script opens that file. Otherwise it calls “open” without an argument, which is what is needed when the script is executed interactively from within Rizin or Cutter.

if len(sys.argv) == 2:
    pipe = rzpipe.open(sys.argv[1])
else:
    pipe = rzpipe.open()
pipe.cmd('aa')

Afterwards it reads the key from the hardcoded address:

key = pipe.cmdj('pszj @ %s' % key_addr)['string']

Then, the script iterates over all the references to the decryption function, searches for the argument in the instruction right before the function call, and then decodes and decrypts said argument. For every decrypted string a comment is set at the decryption function call.
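The argument lookup relies on the instruction before the call being a push of an immediate address. A slightly more defensive way to extract that address is sketched below. It assumes that the opcode text has the form "push 0x...", which is my assumption about the disassembler output, and push_operand is a name made up for this example.

```python
import re

# Assumed opcode format: "push 0x42a100", optionally with a "dword" size hint
PUSH_RE = re.compile(r"^push\s+(?:dword\s+)?(0x[0-9a-fA-F]+)$")

def push_operand(opcode: str):
    """Return the immediate of a 'push 0x...' opcode string, or None."""
    match = PUSH_RE.match(opcode.strip())
    return int(match.group(1), 16) if match else None
```

Returning None for anything that is not an immediate push lets a caller skip call sites where the argument is set up differently, for example through a register, instead of passing a wrong address on to the string reader.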

In total, the script looks like this:

import sys
import rzpipe
import base64
from Crypto.Cipher import ARC4

decryption_function = 0x00422f70
key_addr = 0x0042a074

if len(sys.argv) == 2:
    pipe = rzpipe.open(sys.argv[1])
else:
    pipe = rzpipe.open()

pipe.cmd('aa')

# Read key from memory (pszj = print string zeroterminated and parse json)
key = pipe.cmdj('pszj @ %s' % key_addr)['string']
print("Key: " + key)
print("--------------------------------------------------")

# Iterate references to the decryption function
for xref in pipe.cmdj('axtj %d' % decryption_function):

    # Find argument push before function call xref (print opcode before opcode @ address)
    argument_push = pipe.cmdj('pdj -1 @ %d' % (xref['from']))

    # Get address of encrypted string (everything after the leading "push ")
    encrypted_string_addr = argument_push[0]['opcode'][5:]

    # Read encrypted string from memory (pszj = print string zeroterminated and parse json)
    encrypted_string = pipe.cmdj('pszj @ %s' % encrypted_string_addr)['string']

    # Decode and decrypt (new cipher object per string, because RC4 is stateful)
    rc4 = ARC4.new(key.encode("utf8"))
    decrypted_string = rc4.decrypt(base64.b64decode(encrypted_string)).decode("utf-8")

    print(encrypted_string + " @ " + encrypted_string_addr + ": " + decrypted_string)

    # Set the decrypted text (not the bytes repr) as comment at the call.
    # cmd() instead of cmdj(), as the command is not expected to return JSON.
    pipe.cmd('CCu %s @ %d' % (decrypted_string, xref['from']))

This way the script can be run standalone, within Rizin or within Cutter.

Standalone:

Standalone string decryption

Rizin:

rizin.exe file.exe
. rzpipe_oski_decrypt.py

Rizin string decryption

Cutter:

. rzpipe_oski_decrypt.py

Cutter string decryption

The script is available on GitHub: oski_string_decrypt.py

A full list of extracted strings can be found here: oski_strings.txt
