Reading Game Memory is Slow! Looking for advice.

**Torpedoes** · 12-17-2013

I've recently started writing hacks for games, in particular Team Fortress 2 and CS:GO. Using ReadProcessMemory, I managed to write fully external wall hacks, aimbots and so on. Over the past several days, however, I've been experimenting with World of Warcraft; specifically, I'm attempting to write an external radar. The core portion that reads the game is already implemented, but there's a bottleneck. When exploring densely populated regions such as cities, reading becomes exceptionally slow (upwards of 100-200 ms per read), which is worse than analyzing screenshots. As you can imagine, this was never a problem in FPS games, since there were never more than about 32 entities to keep track of.

My question: what techniques do you guys use for your external hacks? Please try to be specific. I'm looking for a solution that would allow me to read everything in real time (let's say 15-30 ms for densely populated regions). In case you're wondering, I'm experimenting with real-time screen projection, similar to AVR. It works, but I need to speed things up. (I'm using C++ and do all my reading in a separate thread.) Any tips will be appreciated (and hopefully rewarded with a great app).

**DrakeFish** · 12-17-2013

Calling RPM (ReadProcessMemory) is usually something you want to do as infrequently as possible. I have no idea what your current code looks like, and your application could be slow for other reasons, but one tip would be to optimize it so you call RPM less often. For example, you could read an entire "object" at once and then access its fields directly in your own process's memory instead of reading each field with a separate 4-byte RPM. While doing that, given that only one thread is doing the reading, you should also reuse the same buffer instead of allocating and freeing one every time.

These are pretty obvious tips but your question doesn't really go into details concerning what you're currently doing.
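A minimal sketch of the "read the whole object once" tip, under stated assumptions: `ReadFn` is a hypothetical injected callback standing in for a ReadProcessMemory wrapper, and the object size and field offsets are invented for illustration, not taken from any real client. The buffer is allocated once and reused for every object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical reader: copies `size` bytes from the target's `address` into
// `out` and returns true on success. On Windows it would wrap
// ReadProcessMemory; it is injected here so the sketch stays portable.
using ReadFn = std::function<bool(std::uint32_t address, void* out, std::size_t size)>;

// Hypothetical layout for illustration only; real sizes and offsets differ.
constexpr std::size_t kObjectSize = 0x40;
constexpr std::size_t kOffType = 0x10;
constexpr std::size_t kOffX = 0x20;
constexpr std::size_t kOffY = 0x24;
constexpr std::size_t kOffZ = 0x28;

struct EntitySnapshot {
    std::uint32_t type;
    float x, y, z;
};

class ObjectReader {
public:
    explicit ObjectReader(ReadFn read)
        : read_(std::move(read)), buffer_(kObjectSize) {}

    // One remote read per object; the fields are decoded from the local copy.
    bool read_entity(std::uint32_t address, EntitySnapshot& out) {
        if (!read_(address, buffer_.data(), buffer_.size())) return false;
        out.type = field<std::uint32_t>(kOffType);
        out.x = field<float>(kOffX);
        out.y = field<float>(kOffY);
        out.z = field<float>(kOffZ);
        return true;
    }

private:
    // memcpy avoids alignment and aliasing problems when decoding a field.
    template <typename T>
    T field(std::size_t offset) const {
        assert(offset + sizeof(T) <= buffer_.size());
        T value;
        std::memcpy(&value, buffer_.data() + offset, sizeof(T));
        return value;
    }

    ReadFn read_;
    std::vector<std::uint8_t> buffer_;  // allocated once, reused for every read
};
```

With a layout like this, four per-field reads collapse into one. Fields reached through pointers (such as a name behind a two-level pointer) would still need their own reads.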

**namreeb** · 12-17-2013

Is there any particular reason why you are not injecting a dll?

**Torpedoes** · 12-17-2013

Originally Posted by DrakeFish

Calling RPM (ReadProcessMemory) is usually something you want to do as infrequently as possible. I have no idea what your current code looks like, and your application could be slow for other reasons, but one tip would be to optimize it so you call RPM less often. For example, you could read an entire "object" at once and then access its fields directly in your own process's memory instead of reading each field with a separate 4-byte RPM. While doing that, given that only one thread is doing the reading, you should also reuse the same buffer instead of allocating and freeing one every time.

These are pretty obvious tips but your question doesn't really go into details concerning what you're currently doing.

You're right, please let me elaborate. As of right now my update loop looks like this:

Code:

RPM GameState
if (not in game)
    return;

RPM EntityList (2)
RPM FirstEntity
RPM LocalGuid

RPM Camera (2)
RPM Camera Fields (3)

while (nextEntity != 0 and (nextEntity & 1) == 0)
{
    RPM EntityType
    switch (Type)
    {
        case Npc:
            RPM fields (9)
            break;
        case Player:
            RPM fields (6)
            break;
        case Object:
            RPM fields (7)
            break;
    }
    RPM nextEntity
}

Thank you for the tips. In response to your observations: I have identified the bottleneck in the update loop, and there don't seem to be any anomalies in other parts of the program. Regarding memory optimizations, yes, I'm aware of those techniques and will be incorporating them later on. I am a bit confused though about reading the "entire" object. I have thought about this but can't understand how big the buffers should be, especially when fields like name are two-level pointers.

Here are my current ideas:

- Object caching (as you suggested), once I figure out the specifics such as the question above about buffer size.
- Partitioning entities across multiple threads in a thread pool. Each thread would read a percentage of the entities.
- Refreshing certain values faster than others. For instance, the camera would get refreshed faster than the entities. That way, lag when moving the camera would be significantly reduced, since old entity locations can be reused.
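The third idea (refreshing some values faster than others) can be sketched as a small scheduler. This is an illustration under assumptions rather than the poster's code: `RefreshScheduler` is a hypothetical name, the 5 ms and 30 ms periods in the usage note are arbitrary, and the current time is passed in so the behavior is easy to test.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Runs each registered task at its own period, e.g. the camera every few
// milliseconds and the (much more expensive) entity walk less often.
class RefreshScheduler {
public:
    void add(Clock::duration period, std::function<void()> task) {
        assert(period > Clock::duration::zero());
        tasks_.push_back(Task{period, Clock::time_point{}, false, std::move(task)});
    }

    // Runs every task that is due at `now`; returns how many tasks ran.
    int tick(Clock::time_point now) {
        int ran = 0;
        for (auto& t : tasks_) {
            if (!t.has_run || now - t.last_run >= t.period) {
                t.run();
                t.last_run = now;
                t.has_run = true;
                ++ran;
            }
        }
        return ran;
    }

private:
    struct Task {
        Clock::duration period;
        Clock::time_point last_run;
        bool has_run;  // the first tick always runs the task
        std::function<void()> run;
    };
    std::vector<Task> tasks_;
};
```

In a read thread this might be used as `scheduler.add(std::chrono::milliseconds(5), read_camera)` and `scheduler.add(std::chrono::milliseconds(30), read_entities)`, followed by a loop calling `scheduler.tick(Clock::now())`; `read_camera` and `read_entities` are placeholders for whatever the update loop already does.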

**Torpedoes** · 12-17-2013

Originally Posted by namreeb

Is there any particular reason why you are not injecting a dll?

Not a fan. From what I've seen, it takes way more effort to stay undetected, and I'm in no way good enough to decompile Warden to understand how it works, let alone hide injected DLLs. Furthermore, it's a bit clunky; I'd rather just have the user double-click an exe and have it work within seconds, like magic and without any extra steps. I also don't want to write a custom UI framework (not sure how that whole business works with injected DLLs).

**Valediction** · 12-18-2013

Originally Posted by Torpedoes

I am a bit confused though about reading the "entire" object. I have thought about this but can't understand how big the buffers should be, especially when fields like name are two-level pointers.

That's one of the problems. Another problem with RPM is that it can fail, either seemingly at random (partial failures happen often, particularly when calling it at a high rate) or because a read crosses a memory page boundary and attempts to read non-paged memory, for example. For reading an object, if you have an idea of its theoretical size, you can tune your memory read to that size, which allows efficient lookup of that object's values. For multilevel pointers, you have to bite the bullet and perform several calls.

But you could try caching memory ranges based on some memory proximity you notice when you're iterating lists and such, or pages as a whole (to simplify the boundary checks). You take snapshots and discard them after each iteration, but at least you save up a good number of calls.
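One way to keep the boundary checks simple, sketched under assumptions: split every request at 4 KB page boundaries and read each piece separately, so a failure on one page does not discard the pages that were readable. `ReadFn` is a hypothetical stand-in for a ReadProcessMemory wrapper, and zero-filling the failed pieces is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

constexpr std::uint32_t kPageSize = 4096;

// Hypothetical reader standing in for a ReadProcessMemory wrapper.
using ReadFn = std::function<bool(std::uint32_t address, void* out, std::size_t size)>;

struct Chunk {
    std::uint32_t address;
    std::size_t size;
};

// Splits [address, address + size) into pieces that never cross a page boundary.
std::vector<Chunk> split_by_page(std::uint32_t address, std::size_t size) {
    std::vector<Chunk> chunks;
    std::uint64_t cur = address;
    const std::uint64_t end = cur + size;
    while (cur < end) {
        const std::uint64_t page_end = (cur & ~std::uint64_t{kPageSize - 1}) + kPageSize;
        const std::uint64_t stop = page_end < end ? page_end : end;
        chunks.push_back({static_cast<std::uint32_t>(cur),
                          static_cast<std::size_t>(stop - cur)});
        cur = stop;
    }
    return chunks;
}

// Reads page by page. Pieces that fail are zero-filled instead of aborting
// the whole request. Returns the number of bytes actually read.
std::size_t read_by_page(const ReadFn& read, std::uint32_t address,
                         void* out, std::size_t size) {
    assert(out != nullptr || size == 0);
    auto* dst = static_cast<std::uint8_t*>(out);
    std::size_t ok = 0;
    for (const Chunk& c : split_by_page(address, size)) {
        if (read(c.address, dst, c.size)) {
            ok += c.size;
        } else {
            std::memset(dst, 0, c.size);
        }
        dst += c.size;
    }
    return ok;
}
```

A caller can compare the returned byte count with the requested size to detect a partial result and decide whether to retry on the next iteration.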

**ioctl** · 12-19-2013

I do aggressive caching of memory in 4 KB chunks: whenever I need to do a memory read, I zero out the low 12 bits of the address, fetch the whole 4 KB page, and store it in a cache. The cache gets purged after I grab a complete snapshot of the game state. It cut down the number of reads I do by over 90%.

I'm running a 64-bit bot against a 32-bit Wow.exe, so that opens up an interesting caching scheme:
I allocate 4 GB of anonymous, unreserved memory. This creates a region exactly the size of Wow.exe's VM that uses no memory until you write into it and start dirtying pages. Adding pages to the cache involves just copying them to the corresponding offset in the cache region. A giant bitmap keeps track of which pages are valid, and purging the cache is just clearing the bitmap. Reading from the cache is super simple -- you just make sure all the pages covered by your read are valid, and use the memory directly. No gluing fragments from different pages together if your read spans a boundary.

**Torpedoes** · 12-19-2013

This is very helpful, thanks. I'll attempt to combine it with some of my previous optimization techniques and see where it gets me. Just a question though: assuming this is done in C++ for Windows, what do you mean by "allocate anonymous, unreserved memory"? I'm not sure I understand how you do this.

**ioctl** · 12-19-2013

I was being vague about it because I'm not using Windows =)

In the Unixes, you do this with mmap() and the MAP_ANONYMOUS and MAP_NORESERVE flags. Looks like VirtualAlloc() with MEM_COMMIT and MEM_RESERVE does something similar, but is limited to 32-bit sizes? You can probably get away with allocating 2G - the upper 2G of each process is used by the system IIRC.

**Torpedoes** · 12-19-2013

Yeah I was a bit confused there since I haven't come across this technique yet. Also I seem to recall that only 600 megs or something was used by the system, but I may be wrong. Anyways, thanks again!

**kosacid** · 12-19-2013

I use a lock when I read through the objects: when I find what I'm looking for, I lock onto it until it changes. For example, say I want to read info on myself. I go through the object manager once, and once I've found my player, it stops looking through the object manager and locks onto that section. What you use to release it is up to you. Most of my programs work this way; it's the fastest way I have found. I can post a C++ example if you like.
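A hedged reading of this "lock on" idea, not kosacid's actual code: remember the address where the object was last found, and walk the object manager again only when that address no longer holds the expected GUID. `FindFn` and `ReadGuidFn` are hypothetical callbacks, and using the GUID as the release condition is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical callbacks, injected so the sketch stays portable:
// FindFn walks the whole object manager (expensive) and returns the address
// of the object with the given GUID, or 0 if it is not found.
using FindFn = std::function<std::uint32_t(std::uint64_t guid)>;
// ReadGuidFn performs one small read of the GUID stored at an object address.
using ReadGuidFn = std::function<bool(std::uint32_t address, std::uint64_t& guid)>;

class LockedObject {
public:
    LockedObject(std::uint64_t guid, FindFn find, ReadGuidFn read_guid)
        : guid_(guid), find_(std::move(find)), read_guid_(std::move(read_guid)) {
        assert(find_ && read_guid_);
    }

    // Returns the object's address. The full walk happens only when the
    // cached address no longer holds the expected GUID.
    std::uint32_t address() {
        if (cached_ != 0) {
            std::uint64_t current = 0;
            if (read_guid_(cached_, current) && current == guid_) return cached_;
        }
        cached_ = find_(guid_);
        return cached_;
    }

    // Drops the lock so the next call performs a fresh search.
    void release() { cached_ = 0; }

private:
    std::uint64_t guid_;
    FindFn find_;
    ReadGuidFn read_guid_;
    std::uint32_t cached_ = 0;
};
```

The steady-state cost per update is then one small read to revalidate plus whatever fields are wanted, instead of a full list walk.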

**ioctl** · 12-19-2013

Originally Posted by Torpedoes

Yeah I was a bit confused there since I haven't come across this technique yet. Also I seem to recall that only 600 megs or something was used by the system, but I may be wrong. Anyways, thanks again!

The split is either 2 GB/2 GB or 3 GB/1 GB according to this: Virtual Address Space (Windows)

Let me know if you get this scheme to work with VirtualAlloc(). Very curious! Worst case, you can cache pages in a hash table or something, and either stitch a string together for large lookups from multiple pages, or make some kind of magic view object that proxies through to the cached data without copying.
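The hash-table fallback mentioned here could look roughly like the sketch below. It assumes a hypothetical `ReadFn` callback in place of a ReadProcessMemory wrapper, fetches whole 4 KB pages by clearing the low 12 bits of the address, and stitches reads that span several pages by copying from each cached page in turn.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_map>
#include <utility>

constexpr std::uint32_t kPageSize = 4096;

// Hypothetical reader standing in for a ReadProcessMemory wrapper.
using ReadFn = std::function<bool(std::uint32_t address, void* out, std::size_t size)>;

class PageCache {
public:
    explicit PageCache(ReadFn read) : read_(std::move(read)) {}

    // Serves the request from cached pages, fetching each missing page once.
    bool read(std::uint32_t address, void* out, std::size_t size) {
        assert(out != nullptr || size == 0);
        auto* dst = static_cast<std::uint8_t*>(out);
        std::uint64_t cur = address;
        const std::uint64_t end = cur + size;
        if (end > (std::uint64_t{1} << 32)) return false;  // past 32-bit space
        while (cur < end) {
            // Zero the low 12 bits to get the base of the containing page.
            const std::uint32_t base = static_cast<std::uint32_t>(cur) & ~(kPageSize - 1);
            const Page* page = fetch(base);
            if (page == nullptr) return false;
            const std::size_t offset = static_cast<std::size_t>(cur - base);
            std::size_t n = kPageSize - offset;
            if (n > end - cur) n = static_cast<std::size_t>(end - cur);
            std::memcpy(dst, page->data() + offset, n);  // stitch across pages
            dst += n;
            cur += n;
        }
        return true;
    }

    // Call after each complete snapshot so stale pages are never reused.
    void purge() { pages_.clear(); }

private:
    using Page = std::array<std::uint8_t, kPageSize>;

    const Page* fetch(std::uint32_t base) {
        auto it = pages_.find(base);
        if (it != pages_.end()) return &it->second;
        Page page;
        if (!read_(base, page.data(), page.size())) return nullptr;
        return &pages_.emplace(base, page).first->second;
    }

    ReadFn read_;
    std::unordered_map<std::uint32_t, Page> pages_;
};
```

Compared with the sparse-region scheme, this costs one copy per lookup but works in a 32-bit process and needs no large address-space reservation.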

**Cypher** · 12-22-2013

Originally Posted by ioctl

I was being vague about it because I'm not using Windows =)

In the Unixes, you do this with mmap() and the MAP_ANONYMOUS and MAP_NORESERVE flags. Looks like VirtualAlloc() with MEM_COMMIT and MEM_RESERVE does something similar, but is limited to 32-bit sizes? You can probably get away with allocating 2G - the upper 2G of each process is used by the system IIRC.

Yes, you can use VirtualAlloc with MEM_RESERVE to reserve address space without actually allocating anything. Then you can call VirtualAlloc again with the MEM_COMMIT flag when you're actually ready to allocate and use some memory.

Not sure what you mean it's limited to 32-bit sizes... The function takes a SIZE_T which is 32-bit on 32-bit and 64-bit on 64-bit, so in the case where you're running a 64-bit bot against a 32-bit WoW you can certainly do what you're suggesting (and allocate the full 4GB np).

You just need to remember to commit the page before you decide to write to it (and decommit it afterwards, if applicable).
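A rough sketch of the reserve-then-commit scheme combined with the validity bitmap described earlier in the thread. `SparseMirror` is a hypothetical name, and the sketch assumes a 64-bit build when mirroring a full 4 GB. On Windows it reserves address space with `MEM_RESERVE` and commits a page with `MEM_COMMIT` just before writing to it; elsewhere it falls back to `mmap` with `MAP_NORESERVE`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

constexpr std::size_t kPageSize = 4096;

// Local mirror of a target address space: page N of the target is copied to
// offset N * kPageSize, and a bitmap records which pages are currently valid.
class SparseMirror {
public:
    explicit SparseMirror(std::size_t size)
        : size_(size), valid_(size / kPageSize, false) {
        assert(size % kPageSize == 0);
#ifdef _WIN32
        // Reserve address space only; nothing is committed yet.
        base_ = static_cast<std::uint8_t*>(
            VirtualAlloc(nullptr, size_, MEM_RESERVE, PAGE_NOACCESS));
#else
        void* p = mmap(nullptr, size_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        base_ = (p == MAP_FAILED) ? nullptr : static_cast<std::uint8_t*>(p);
#endif
    }

    ~SparseMirror() {
        if (base_ == nullptr) return;
#ifdef _WIN32
        VirtualFree(base_, 0, MEM_RELEASE);
#else
        munmap(base_, size_);
#endif
    }

    SparseMirror(const SparseMirror&) = delete;
    SparseMirror& operator=(const SparseMirror&) = delete;

    bool ok() const { return base_ != nullptr; }

    // Copies one page of target memory into place and marks it valid.
    bool store_page(std::size_t page_index, const void* data) {
        assert(base_ != nullptr && page_index < valid_.size());
        std::uint8_t* dst = base_ + page_index * kPageSize;
#ifdef _WIN32
        // Commit before writing; committing an already committed page is fine.
        if (!VirtualAlloc(dst, kPageSize, MEM_COMMIT, PAGE_READWRITE)) return false;
#endif
        std::memcpy(dst, data, kPageSize);
        valid_[page_index] = true;
        return true;
    }

    // Returns a direct pointer if every covered page is valid, else nullptr.
    const std::uint8_t* view(std::size_t address, std::size_t size) const {
        if (size == 0 || address > size_ || size > size_ - address) return nullptr;
        const std::size_t last = (address + size - 1) / kPageSize;
        for (std::size_t p = address / kPageSize; p <= last; ++p)
            if (!valid_[p]) return nullptr;
        return base_ + address;
    }

    // Clears only the bitmap; pages that were written stay backed by memory.
    void purge() { valid_.assign(valid_.size(), false); }

private:
    std::size_t size_;
    std::vector<bool> valid_;
    std::uint8_t* base_ = nullptr;
};
```

Because `purge()` only clears the bitmap, memory use grows with the set of pages ever touched. Decommitting pages, as suggested above, would be the way to hand that memory back.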

Originally Posted by ioctl

The split is either 2 GB/2 GB or 3 GB/1 GB according to this: Virtual Address Space (Windows)

Let me know if you get this scheme to work with VirtualAlloc(). Very curious! Worst case, you can cache pages in a hash table or something, and either stitch a string together for large lookups from multiple pages, or make some kind of magic view object that proxies through to the cached data without copying.

That only really applies on 32-bit versions of Windows, and anyone still running a 32-bit version of Windows is bad and should feel bad.

Under 64-bit versions of Windows, 32-bit software has full access to the 4GB of address space (though I think it may need to 'opt in' with the /LARGEADDRESSAWARE flag).

**ioctl** · 12-22-2013

Originally Posted by Cypher

Not sure what you mean it's limited to 32-bit sizes... The function takes a SIZE_T which is 32-bit on 32-bit and 64-bit on 64-bit,

My bad. Misread the API.

Originally Posted by Cypher

That only really applies on 32-bit versions of Windows, and anyone still running a 32-bit version of Windows is bad and should feel bad.

Under 64-bit versions of Windows, 32-bit software has full access to the 4GB of address space (though I think it may need to 'opt in' with the /LARGEADDRESSAWARE flag).

Also applies to 32-bit apps running on 64-bit Windows. I'm intentionally running 32-bit Wow.exe because many more people are digging through it for offsets, etc., not to mention the fact that it makes the simple and fast caching scheme described above possible.

**Torpedoes** · 12-23-2013

Originally Posted by ioctl

Let me know if you get this scheme to work with VirtualAlloc(). Very curious! Worst case, you can cache pages in a hash table or something, and either stitch a string together for large lookups from multiple pages, or make some kind of magic view object that proxies through to the cached data without copying.

I'll have to re-architect my system a bit and test out a few other tricks but I'm confident I'll be able to improve my overall system with this. Thanks again!

Originally Posted by ioctl

Also applies to 32-bit apps running on 64-bit Windows. I'm intentionally running 32-bit Wow.exe because many more people are digging through it for offsets, etc., not to mention the fact that it makes the simple and fast caching scheme described above possible.

I'm still running 32-bit only because everyone can, but not everyone can run 64-bit. Unfortunately.
