Introduction: Basic automation on X11 / GNU / Linux
This is a brief introduction on how to automate & control WoW on Linux. That includes:
- Finding the window handle with Xlib
- Taking screenshots of the WINE window
- Faking keyboard & mouse input
To use and work with the code snippets, some basic understanding of the X Window System, POSIX and C certainly won't hurt.
Don't forget to Read The Fine Manual (pages) either.
* Finding the window, its size and its position
To find the window, we need Xlib. Xlib is the client library of the X Window System. See the "References, reading material" section near the end of this article for more information.
Code:
/*
Linux automation example #1, written by Sednogmah for MMOwned.
- Find the X11 window handle by the window name
- Print the window's location and size
Copy & paste this to example--find-win-by-name.c and compile it with:
CFLAGS="-std=c99 -Wall"
LDFLAGS="-lX11"
SRC="example--find-win-by-name.c"
BIN="${SRC%%.c}"
gcc $CFLAGS -o $BIN $SRC $LDFLAGS
(the libraries go after the source file, otherwise some linkers drop them)
Example output:
Name: World of Warcraft | ID: 4000007 | Size: 1920x1200 | Location: 0, 0
*/
#include <X11/Xutil.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/* Recursively search the X11 window tree for a window name.
 * Returns 1 on success and 0 on failure.
 */
int x_find_window_by_name(Display* disp, Window root, const char* name, Window* result)
{
    Window *children = NULL, queryroot, queryparent;
    unsigned int n_children = 0;
    int found = 0;

    if(!XQueryTree(
        disp, root, &queryroot, &queryparent, &children, &n_children))
    {
        return 0;
    }
    for(unsigned int i = 0; i < n_children && !found; i++)
    {
        char *win_name = NULL;
        if(XFetchName(disp, children[i], &win_name))
        {
            if(strcmp(win_name, name) == 0)
            {
                *result = children[i];
                found = 1;
            }
            XFree(win_name); // free the name on every path, match or not
        }
        if(!found)
            found = x_find_window_by_name(disp, children[i], name, result);
    }
    // children stays NULL if the window has no children
    if(children != NULL)
        XFree(children);
    return found;
}
// example usage
int main(void) {
    char targetwin[] = "World of Warcraft";
    Display *disp; // connection to the X server
    Window win;    // ID of the target window
    Window root;   // root window (desktop)
    // Fetch the current display's name from the environment variable DISPLAY.
    // Alternatively, you can set this to work with a remote X11 server.
    const char *display_name = getenv("DISPLAY");
    // Try to open the DISPLAY
    disp = XOpenDisplay(display_name);
    if(disp == NULL)
    {
        printf("Could not open display.\n");
        return 1;
    }
    root = DefaultRootWindow(disp);
    // Let's find the window
    int status;
    status = x_find_window_by_name(disp, root, targetwin, &win);
    if(!status)
    {
        printf("Could not find a window called '%s'.\n", targetwin);
        XCloseDisplay(disp);
        return 1;
    }
    // Query & print some information about our window.
    // Note: x and y are relative to the window's parent.
    Window qroot;
    int x,y;
    unsigned int w,h,bw,d;
    XGetGeometry(disp, win, &qroot, &x, &y, &w, &h, &bw, &d);
    printf(
        "Name: %s | ID: %lx | Size: %ux%u | Location: %d, %d\n",
        targetwin,
        (unsigned long) win,
        w, h, x, y
    );
    // Fin! :)
    XCloseDisplay(disp);
    return 0;
}
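The search above only accepts an exact match, so a window whose title has extra text around "World of Warcraft" would be missed. Here is a small sketch of a more forgiving comparison that could take the place of the strcmp() call. The helper name title_contains_nocase is made up for this example and is not part of Xlib, and it assumes plain ASCII titles.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of Xlib): returns 1 if `needle` occurs
 * anywhere in `title`, ignoring ASCII case, and 0 otherwise.
 * An empty needle matches every title. */
int title_contains_nocase(const char *title, const char *needle)
{
    assert(title != NULL && needle != NULL);

    for(const char *start = title; ; start++)
    {
        size_t i = 0;
        // Compare the needle against the title beginning at `start`
        while(needle[i] != '\0' && start[i] != '\0' &&
              tolower((unsigned char) start[i]) ==
              tolower((unsigned char) needle[i]))
        {
            i++;
        }
        if(needle[i] == '\0')
            return 1; // the whole needle matched
        if(*start == '\0')
            return 0; // reached the end of the title without a match
    }
}
```

Inside x_find_window_by_name() the test would then read if(title_contains_nocase(win_name, name)) instead of the strcmp() comparison.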
Good. Now we can find WoW's window, get its location & size. The next thing we might want to try is to capture the contents of the window.
* Capturing the contents of a window
Now that we have the window handle, capturing and processing images is a piece of cake, thanks to a lightweight image processing library called "imlib2". Among many things, imlib2 allows you to:
- Save images to disk in one of many formats
- Apply filters to images
- Scale images
- Render truetype anti-aliased text
- Composite, render and manipulate images quickly
The online documentation of imlib2 is outdated. Use the docs that come with the library.
Code:
/*
This snippet assumes that we already have:
- X11 Display structure (disp)
- Window handle (win)
- The window width w, and its height h
It also needs #include <Imlib2.h> and linking with -lImlib2.
*/
Visual *vis;
Drawable draw;
Imlib_Image im;
// Prepare imlib
vis = DefaultVisual(disp, DefaultScreen(disp));
imlib_context_set_display(disp);
imlib_context_set_visual(vis);
draw = win;
// This sets the X drawable to which images will be rendered when you call a
// render call in Imlib2. This may be either a pixmap or a window. You must set
// this to render anything. Screenshots are grabbed from this drawable too.
imlib_context_set_drawable(draw);
// Take screenshot (no mask, grab the pixels from the X server)
im = imlib_create_image_from_drawable(None, 0, 0, w, h, 1);
// The grab returns NULL on failure
if(im != NULL)
{
    // Sets the current image Imlib2 will be using with its function calls.
    imlib_context_set_image(im);
    // Do something with it, for example save it to a file
    imlib_image_set_format("png");
    imlib_save_image("screenshot.png");
    // Frees the image that is set as the current image in Imlib2's context.
    imlib_free_image();
}
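A screenshot on disk is rarely the goal for a bot; usually we want to look at a few pixels. As far as I know, imlib2 exposes the current image as an array of 32-bit ARGB pixels through imlib_image_get_data_for_reading_only(). The sketch below works on such a buffer without calling imlib2 itself, so the pixel layout (0xAARRGGBB, row by row, no padding) is an assumption you should check against the imlib2 docs. region_average_rgb and struct rgb are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rgb { unsigned int r, g, b; };

/* Hypothetical helper: average the colour of a rectangle inside a buffer of
 * 32-bit ARGB pixels (assumed 0xAARRGGBB), stored row by row with `img_w`
 * pixels per row. Returns 1 on success and 0 if the rectangle is empty or
 * does not fit inside the image. */
int region_average_rgb(const uint32_t *pixels, int img_w, int img_h,
                       int x, int y, int w, int h, struct rgb *out)
{
    assert(pixels != NULL && out != NULL);
    if(x < 0 || y < 0 || w <= 0 || h <= 0 || x > img_w - w || y > img_h - h)
        return 0;

    uint64_t sum_r = 0, sum_g = 0, sum_b = 0;
    for(int row = y; row < y + h; row++)
    {
        for(int col = x; col < x + w; col++)
        {
            uint32_t p = pixels[(size_t) row * (size_t) img_w + (size_t) col];
            sum_r += (p >> 16) & 0xff; // red channel
            sum_g += (p >> 8) & 0xff;  // green channel
            sum_b += p & 0xff;         // blue channel
        }
    }
    uint64_t count = (uint64_t) w * (uint64_t) h;
    out->r = (unsigned int) (sum_r / count);
    out->g = (unsigned int) (sum_g / count);
    out->b = (unsigned int) (sum_b / count);
    return 1;
}
```

With imlib2 this would be fed the pointer from imlib_image_get_data_for_reading_only() together with imlib_image_get_width() and imlib_image_get_height(), before the image is freed.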
* Sending fake keyboard and mouse actions
Now that we can gather some information, you probably wonder how we can act on it.
First, we want to set the focus to our target window. That's easy:
Code:
// Focus window
XRaiseWindow(disp, win);
XSetInputFocus(disp, win, RevertToNone, CurrentTime);
Note that some window managers, such as GNOME's metacity or KDE's kwin, have "focus stealing prevention" and ignore XRaiseWindow. To get around that, you either have to talk to the window manager itself or turn the feature off. Also, WINE only accepts keyboard input once the window has been clicked at least once, which is a lesser problem of course.
To send fake keyboard and mouse events, we use the XTest extension. Its client library is libXtst, so add -lXtst when linking.
Code:
// add this include to your app:
#include <X11/extensions/XTest.h>
Mouse:
Code:
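////// Mouse
// usleep() needs #include <unistd.h>; 'milliseconds' is how long to hold the button
// Move the mouse pointer to (x, y), relative to the window's origin
XWarpPointer(disp, None, win, 0, 0, 0, 0, x, y);
// Press right mouse button (Button #3)
// int XTestFakeButtonEvent(display, button, is_press, delay);
// Note: CurrentTime is a constant and means: press immediately
XTestFakeButtonEvent(disp, 3, 1, CurrentTime);
// Send the press now, otherwise it waits in Xlib's output buffer
// and arrives together with the release
XFlush(disp);
// sleep for a while
usleep(1000 * milliseconds);
// Release mouse button
XTestFakeButtonEvent(disp, 3, 0, CurrentTime);
XFlush(disp);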
Keyboard:
Code:
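////// Keyboard
// Press a key
// int XTestFakeKeyEvent(display, keycode, is_press, delay);
// Note: 20 is a raw keycode and depends on your keyboard mapping.
// XKeysymToKeycode() can look up the keycode for a keysym instead.
XTestFakeKeyEvent(disp, 20, 1, CurrentTime);
// Send the press now, otherwise it arrives together with the release
XFlush(disp);
// sleep for a while
usleep(1000 * milliseconds);
// Release a key (is_press must be 0 here)
XTestFakeKeyEvent(disp, 20, 0, CurrentTime);
XFlush(disp);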
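The snippets above pause with usleep(), which POSIX.1-2008 no longer specifies and which is allowed to reject values of one second or more. A possible replacement built on nanosleep() is sketched below. sleep_ms is a made-up name, and the retry loop assumes we simply want to finish the pause when a signal interrupts it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* makes nanosleep() visible with -std=c99 */
#endif
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for `milliseconds`, resuming after signals.
 * Returns 0 on success and -1 on error (for example a negative duration). */
int sleep_ms(long milliseconds)
{
    if(milliseconds < 0)
        return -1;

    struct timespec remaining;
    remaining.tv_sec = milliseconds / 1000;
    remaining.tv_nsec = (milliseconds % 1000) * 1000000L;

    // When a signal interrupts nanosleep(), it stores the time that is still
    // left in its second argument, so we call it again until nothing is left.
    while(nanosleep(&remaining, &remaining) == -1)
    {
        if(errno != EINTR)
            return -1;
    }
    return 0;
}
```

In the mouse and keyboard snippets, usleep(1000 * milliseconds) would then become sleep_ms(milliseconds).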
* Advantages of multiple X servers
Thanks to the architecture of X11, it's possible to run & control multiple X servers on your local machine or even somewhere in your network.
For example, you can run two X servers on your local machine. You can work, play other games or watch movies on the first one, while the second X session runs a simple window manager like fluxbox, WoW and your bot. If WoW.exe asks WINE whether it's the active foreground window, WINE will happily reply with "Yes!" even though you're actually watching a movie. All without any hacks or hooks.
Due to the network transparency of the X11 protocol, you can control the 2nd X session from the first one too.
Why do you want WoW to be the foreground window? As mentioned above, WoW could always check whether it's actually running in the foreground. Playing in the background is certainly not an offense by itself, since multiboxers do it too, but WoW could at least flag you for inspection by a GM. If there's only one game session coming from a certain IP address and its only window is in the background, that would be a good reason to flag you.
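To control the second X server from the first session, the only thing that changes in the code is the display name passed to XOpenDisplay(). X display names have the form host:display.screen, and an empty host means the local machine. A small sketch of building such a name follows; format_display_name is a hypothetical helper, not an Xlib function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write an X display name of the form
 * "host:display.screen" into `buf`. Pass NULL or "" as host for the local
 * machine. Returns 1 on success and 0 if a number is negative or the
 * buffer is too small. */
int format_display_name(char *buf, size_t len, const char *host,
                        int display, int screen)
{
    assert(buf != NULL);
    if(display < 0 || screen < 0)
        return 0;

    int n = snprintf(buf, len, "%s:%d.%d", host ? host : "", display, screen);
    // snprintf() reports the length it needed; n >= len means truncation
    return n >= 0 && (size_t) n < len;
}
```

For a second local server started on display 1 this produces ":1.0", which can be handed to XOpenDisplay() in place of getenv("DISPLAY").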
* References, reading material
- man pages (also available online as "Linux man pages"):
  man XWarpPointer
  man XTestFakeButtonEvent
  man XTestFakeKeyEvent
- Xlib - Wikipedia, the free encyclopedia
- Xlib programming manual: function index
- /usr/share/doc/libimlib2-dev/html/
* Disclaimer
I'm not a native speaker. Excuse my occasionally odd grammar and spelling errors. Constructive criticism, ideas & comments are very welcome.
* Edits
- Note about focus stealing prevention
- use XFlush() after fake input events