I've been doing some research on building a self-encrypting/decrypting injected DLL.
I've been wondering if it would be possible to encrypt/decrypt the injected binary before and after every call into the injected code. At first I thought the process would be cumbersome and slow, but my tests are showing otherwise. Even on a slow, single-core machine, a simple rotating XOR cipher (hardly "encryption," but I'm assuming Blizzard isn't going to subject my machine to forensic analysis) can encrypt/decrypt roughly a billion bytes per second. That's fast enough to do on every single call -- every single frame, too -- without a noticeable client slowdown, unless your injected code gets huge.
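For the curious, the kind of rotating XOR I'm talking about is about this simple (a minimal sketch; the key and rotation scheme here are just placeholders, the real ones get generated at injection time):

```cpp
#include <cstdint>
#include <cstddef>

// In-place rotating XOR over a block of memory. Since XOR is its own inverse,
// the same routine both encrypts and decrypts as long as the same key and
// rotation are used. Key generation (e.g. from GetTickCount()) is assumed to
// happen once, at injection time.
void RotXorBlock(uint8_t* data, size_t len, uint32_t key)
{
    for (size_t i = 0; i < len; ++i)
    {
        data[i] ^= static_cast<uint8_t>(key);
        // Rotate the 32-bit key left by one bit so the byte stream
        // doesn't just repeat every four bytes.
        key = (key << 1) | (key >> 31);
    }
}
```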
If you put a simple call gate into a non-standard section (obviously the encryptor/decryptor itself needs to stay unencrypted; self-decrypting functions are a bitch), routed everything through that call gate, and had the call gate do the encrypt/decrypt on entry to and exit from the injected DLL, you'd have a basically undetectable binary. It would be obvious that SOMETHING was there, but not what it was.
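To make that concrete, here's roughly what the call gate looks like expressed in C++ -- the real thing would be hand-rolled ASM sitting in its own unencrypted section, and the globals here are placeholders the injector would fill in:

```cpp
#include <cstdint>
#include <cstddef>

// Filled in by the injector at runtime; names and layout are placeholders for
// this sketch. The encrypted region covers everything except the section
// holding the call gate itself.
static uint8_t* g_EncryptedBase = nullptr;
static size_t   g_EncryptedSize = 0;
static uint32_t g_Key           = 0;

void RotXorBlock(uint8_t* data, size_t len, uint32_t key);  // from the sketch above

typedef void (*HookFn)(void* context);

// Decrypt the body of the DLL, run the real hook, then re-encrypt before
// handing control back to WoW. Not thread-safe yet -- see weakness 3) below.
void CallGate(HookFn realFn, void* context)
{
    RotXorBlock(g_EncryptedBase, g_EncryptedSize, g_Key);   // decrypt
    realFn(context);                                        // the "real" injected code
    RotXorBlock(g_EncryptedBase, g_EncryptedSize, g_Key);   // re-encrypt
}
```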
The weaknesses of the approach are:
1) Some slowdown in the client due to encrypting/decrypting large blocks of memory on a regular basis. This can be mitigated by reducing the amount of memory processed (for instance, we can use the file alignment instead of the section alignment when sizing the sections, trimming up to 0xE00 bytes per section). Also, the XOR operation is insanely fast, so I don't think this will really be a problem.
2) The encryptor/decryptor is hashable. However, I anticipate implementing this in two parts, both using raw ASM (which implies limited portability; I know Cypher will complain about that). In the end, the ASM shouldn't look much different from a million similar blocks of ASM, so it will be extremely hard to hash without inducing false positives. Also, since the encryption/decryption will essentially make the entire REST of the binary unhashable, ultimately this will significantly decrease the hashable footprint of the app. You might even be able to skip hiding the module VA since (unless you do something really stupid like name it WoWHack.dll) it will just look like a DLL with garbage in it.
3) It will take some care and work to make this thread-safe. There are two primary ways to do it. One is to use a spinlock to essentially make the injected DLL single-threaded (additional threads would just spin until the first call exits). This is relatively simple to implement, but it has some limitations -- if your detoured thread took a long time to execute, a second thread could wait for a long time. If the threads were dependent on each other in some way (or worse, if you had a re-entrant call), you could deadlock. The other alternative is to implement a "use count" mechanism, again guarded by a spinlock (see the sketch after this list). Essentially each thread would increment the use count on entry and decrement it on exit. Whichever thread increments the use count to 1 knows it needs to decrypt the rest of the PE; likewise, whichever thread decrements it back to 0 re-encrypts the PE. This is a little more performant (no long waits just because a second thread showed up), and it's also safe against re-entrancy (a re-entrant call would just increment, and later decrement, the use count).
3a) The thread-safety mechanisms above could themselves be hashable if not implemented carefully. However, I'm guessing there are at least a few lock cmpxchg's in WoW already.
4) Although I say this makes the module "unhashable," it's not 100%. There's always the possibility that a second thread could run Warden or something while the first thread is in a hooked routine (and thus the code is plain and hashable). However, this is a very, very small chance, and I don't even know if Warden is capable of running on other threads.
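Here's a rough sketch of the use-count variant from 3), with the spinlock built on an interlocked compare-exchange. Again, the names are placeholders and the real version would be ASM living in the unencrypted section:

```cpp
#include <windows.h>
#include <cstdint>
#include <cstddef>

// Same placeholders as in the call-gate sketch; filled in by the injector.
static uint8_t* g_EncryptedBase = nullptr;
static size_t   g_EncryptedSize = 0;
static uint32_t g_Key           = 0;

void RotXorBlock(uint8_t* data, size_t len, uint32_t key);  // rotating XOR from earlier

static volatile LONG g_Lock     = 0;   // spinlock guarding the use count
static LONG          g_UseCount = 0;   // how many threads are inside the DLL

static void SpinAcquire()
{
    // lock cmpxchg under the hood -- spin until we swap 0 -> 1.
    while (InterlockedCompareExchange(&g_Lock, 1, 0) != 0)
        ;
}

static void SpinRelease()
{
    InterlockedExchange(&g_Lock, 0);
}

// Trampolines call GateEnter() before jumping into the injected code and
// GateExit() after it returns.
void GateEnter()
{
    SpinAcquire();
    if (++g_UseCount == 1)                                    // first one in decrypts
        RotXorBlock(g_EncryptedBase, g_EncryptedSize, g_Key);
    SpinRelease();
}

void GateExit()
{
    SpinAcquire();
    if (--g_UseCount == 0)                                    // last one out re-encrypts
        RotXorBlock(g_EncryptedBase, g_EncryptedSize, g_Key);
    SpinRelease();
}
```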
So, this is a fair amount of work. Here's what needs to happen:
1) Injector code (simple, and done)
2) After injection, the injector process needs to:
2a) Open the remote module and read its section table (see the first sketch after this list).
2b) Turn each injected call into a trampoline that does the encryption/decryption (this implies that the injector process needs to know the list of injected calls somehow; I was thinking that __declspec(dllexport) would work, since the exports will all be encrypted too, so you're not any more detectable -- see the export-walking sketch after this list).
2c) Manually build the encryptor/decryptor via injected ASM. This sounds hard, but it's not really; you can cook up a random key with something like GetTickCount() (thus preventing key-based hashing by Warden), then inject the calls to do the thread-barrier stuff, the decryption, the actual call, and the re-encryption.
2d) Unlink the module from the loaded-module list (this is extra stealth, and it also prevents the app from crashing when something walks the PEB LDR lists and hits your encrypted DLL header -- see the unlink sketch after this list). For extra security, wipe out the PE header and the import/export tables (now that you're done with them).
2e) Encrypt the sections (pretty much everything except your decryption code, if you decide to keep it in a custom code section).
2f) CreateRemoteThread on the decryptor call gateway, passing in a pointer to the "real" init code as well as the table of encryption entry trampolines built up in 2b.
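To give an idea of what 2a looks like, here's a sketch that reads the remote section table. It assumes the injector already has a handle with PROCESS_VM_READ, knows the remote module base, and is the same bitness as the client:

```cpp
#include <windows.h>
#include <vector>
#include <cstdint>

// Step 2a: read the PE section table of the already-injected module out of the
// remote process. hProc and remoteBase are assumed to come from the injector.
std::vector<IMAGE_SECTION_HEADER> ReadRemoteSections(HANDLE hProc, uint8_t* remoteBase)
{
    IMAGE_DOS_HEADER dos = {};
    IMAGE_NT_HEADERS nt  = {};
    std::vector<IMAGE_SECTION_HEADER> sections;

    if (!ReadProcessMemory(hProc, remoteBase, &dos, sizeof(dos), nullptr) ||
        dos.e_magic != IMAGE_DOS_SIGNATURE)
        return sections;

    if (!ReadProcessMemory(hProc, remoteBase + dos.e_lfanew, &nt, sizeof(nt), nullptr) ||
        nt.Signature != IMAGE_NT_SIGNATURE)
        return sections;

    // The section headers immediately follow the optional header.
    uint8_t* firstSection = remoteBase + dos.e_lfanew
                          + FIELD_OFFSET(IMAGE_NT_HEADERS, OptionalHeader)
                          + nt.FileHeader.SizeOfOptionalHeader;

    sections.resize(nt.FileHeader.NumberOfSections);
    ReadProcessMemory(hProc, firstSection, sections.data(),
                      sections.size() * sizeof(IMAGE_SECTION_HEADER), nullptr);
    return sections;
}
```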
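For 2b, the export-walking side could look something like this. It maps a local copy of the DLL (without running DllMain) and pulls out every exported name and RVA, on the assumption that every hook handler is dllexport'ed; add the remote module base to each RVA and you have the target address for each trampoline:

```cpp
#include <windows.h>
#include <cstdint>
#include <string>
#include <vector>

// Step 2b helper: the export list *is* the trampoline list.
struct ExportEntry { std::string name; uint32_t rva; };

std::vector<ExportEntry> ListExports(const char* dllPath)
{
    std::vector<ExportEntry> out;

    // Map the image without calling DllMain or resolving imports.
    HMODULE mod = LoadLibraryExA(dllPath, nullptr, DONT_RESOLVE_DLL_REFERENCES);
    if (!mod) return out;

    uint8_t* base = reinterpret_cast<uint8_t*>(mod);
    auto dos = reinterpret_cast<IMAGE_DOS_HEADER*>(base);
    auto nt  = reinterpret_cast<IMAGE_NT_HEADERS*>(base + dos->e_lfanew);

    DWORD expRva = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
    if (expRva)
    {
        auto exp   = reinterpret_cast<IMAGE_EXPORT_DIRECTORY*>(base + expRva);
        auto names = reinterpret_cast<DWORD*>(base + exp->AddressOfNames);
        auto ords  = reinterpret_cast<WORD*> (base + exp->AddressOfNameOrdinals);
        auto funcs = reinterpret_cast<DWORD*>(base + exp->AddressOfFunctions);

        for (DWORD i = 0; i < exp->NumberOfNames; ++i)
            out.push_back({ reinterpret_cast<char*>(base + names[i]), funcs[ords[i]] });
    }

    FreeLibrary(mod);
    return out;
}
```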
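And for 2d, the unlink itself is the well-known PEB walk. This is the in-process version (which the init code from 2f could run before the first re-encryption); the injector-side version is the same walk done with ReadProcessMemory/WriteProcessMemory. Only the in-memory-order list is shown since that's the one winternl.h exposes publicly; the load-order and init-order lists are spliced the same way:

```cpp
#include <windows.h>
#include <winternl.h>

// Step 2d: unlink our module from the PEB loader list so module walks never
// touch the (now encrypted) DLL header.
void UnlinkModule(HMODULE hModule)
{
    PEB* peb = NtCurrentTeb()->ProcessEnvironmentBlock;
    LIST_ENTRY* head = &peb->Ldr->InMemoryOrderModuleList;

    for (LIST_ENTRY* cur = head->Flink; cur != head; cur = cur->Flink)
    {
        LDR_DATA_TABLE_ENTRY* entry =
            CONTAINING_RECORD(cur, LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);
        if (entry->DllBase == (PVOID)hModule)
        {
            // Splice this entry out of the doubly linked list.
            cur->Blink->Flink = cur->Flink;
            cur->Flink->Blink = cur->Blink;
            break;
        }
    }
}
```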
Once this is done, the remote DLL is injected, encrypted, and running. The init code (after the initial decryption) will set up hooks as usual (IAT, detours -- they should all work), but instead of pointing directly at the functions in your injected module, they should point at the appropriate entries in the trampoline table from 2b.
Et voila, a self-encrypting/decrypting injected DLL.
If I can make it work, it will be nigh impossible for Warden to signature-hash your encrypted code. Better yet, it shouldn't be tightly coupled to any individual codebase; it will be reusable, and it will make Blizzard's work much, much harder. Warden will be able to detect that SOMETHING is in your process (unless you detour/hook VirtualQuery etc.), but not what, and even a memory snapshot -- as long as the code was in the encrypted state -- will just look like a virus. Plausible deniability at its best.
So right now most of this is at the POC stage, but I'm hoping to have something working within a week (sooner if I can get some of the heavy hitters to help me with important chunks, like the mechanism for listing the hook functions that need call-gate trampolines, or the thread-safety mechanisms, etc.).
One of the reasons I'm even posting about this is to ask for help; unlike the usual "security through obscurity" techniques, if I can make this work, it will make the code theoretically unhashable (since I'm using polymorphic virus techniques), except for the encrypt/decrypt call gate (and I have a few tricks up my sleeve to tweak that, including metamorphic techniques, again from VX tech). This means that even if Warden devs read this thread, they won't be able to counter the technique without obtaining a memory dump and running forensic analysis. Any real reverse engineer could counter it in minutes, but the point is that Blizzard doesn't have the time and money to do forensic analysis on a bunch of random boxes, especially if nothing is tripping the hash-signature detection routines in Warden to tip Blizz off that THIS box is special. It becomes an essentially unbreakable Catch-22 for them: either they go super hardcore on their detection -- probably banning lots of innocents, probably crashing a lot, and probably running SUPER slow due to network transfers of HUGE amounts of memory data for analysis -- OR they simply won't be able to counter it.
By the way, if anyone steals this idea and implements it before I can, at least give credit to AMM for the idea.
I'm open to anyone pointing out flaws I've missed in my gedankenversuch here.