Comtypes: How Dropbox learned to stop worrying and love the COM

// By Alicia Chen • Oct 04, 2012

Here at Dropbox, we often use Python in uncommon ways. Today, I’ll be writing about a module that few Python users have even heard of before—comtypes. Comtypes is built on top of ctypes and allows access to low level Windows APIs that use COM. This module allows you to write COM-compatible code using only Python. For example, the Dropbox desktop client feature that allows you to upload photos from a camera uses comtypes to access Windows Autoplay. But before we talk about comtypes, we have to talk about COM.

What is COM?

The Component Object Model is a standard introduced by Microsoft back in 1993. It allows two software components to interact without either one having knowledge of how the other is implemented, even if the components are written in different languages, running in different processes, or running on different machines and different platforms. Many Windows APIs still rely on COM, and occasionally, we have to work with one of them. The camera upload feature we released this year runs on Windows XP, an OS from 10 years ago, as well as Windows 8, an OS that hasn't been released yet. And it does all this using a standard that was created almost 20 years ago.

On Windows, COM is both a standard and a service. It provides all the systems and utilities necessary to make inter-component compatibility possible. The standard requires interfaces to be compiled into a binary format that is language agnostic. For this purpose, it includes a specification for its own interface language—the Microsoft Interface Definition Language, aka MIDL—which is compiled into a binary called a type library that is then included inside a runnable such as a .dll or .exe. COM also allows run-time querying of supported interfaces, so that two objects can agree on an interface much like strangers meeting in a foreign country—"Do you speak IHardwareEventHandler version 2? No? Well, parlez-vous version 1?" On top of that, it also provides for object reference counting, inter-process marshalling, thread handling, and much more. Without this functionality, a component implementer would have to make sure objects used by a different process are cleaned up eventually, but not while they're still being referenced. She’d have to serialize arguments to pass between components, and figure out how to control access from multi-threaded components into objects that may or may not be thread-safe.

COM handles these things for you, but this functionality comes at a cost. It involves a fair amount of complexity, which is unfortunately necessary, and a lot of syntactic convolution, which is just plain unfortunate. Writing an object that uses a COM component, aka a COM client, is difficult. Writing a COM object that other components can use, aka a COM server, can be downright devilish.

If you think COM seems like magic, you’re right—it is definitely some sort of black magic. COM requires incantations such as 

CoRegisterClassObject(
    {0x005A3A96, 0xBAC4, 0x4B0A, {0x94, 0xEA, 0xC0, 0xCE, 0x10, 0x0E, 0xA7, 0x36}},
    NULL,
    CLSCTX_INPROC_SERVER | CLSCTX_LOCAL_SERVER,
    REGCLS_MULTIPLEUSE,
    &lpdwRegister
    );

and the use of strange ritual equipment like MIDL compilers. To ensure unambiguity in class and interface identification, everything is referenced by GUIDs, which are undescriptive and unwieldy at best. And when creating a COM server, the sheer number of configuration options at every step of the way can be paralyzing. You have to answer questions such as “Are my threads running in Single Threaded Apartments or Multithreaded Apartments?” and “What does it mean to set my ThreadingModel to Both instead of Free?” Understanding these questions requires a lot of COM-specific background knowledge, and most articles about these choices are pages long, and often involve charts, diagrams, and sample code.

I came to Dropbox with enough knowledge of COM to squeak by, and still consider myself no more than an advanced novice. If one were to attempt to write pure Python code that used or, heaven forbid, implemented a COM object, one would need to generate and parse the binary type library files that specify COM interfaces, perform all the complex Windows registry rituals, track all the reference counts to COM objects, as well as correctly write the endless syntactical mumbo jumbo. Fortunately for us, the comtypes module exists to abstract (almost) all of this horribleness away from us.

Comtypes

If COM is black magic, then comtypes is the mysterious witch doctor service that you contract to perform the black magic for you. For simple tasks, everything likely works fine. Unfortunately, if you need to do anything very complex, you run the risk of being left in the dark as to what sort of invocations were performed, only to find the demon knocking on your door.

Still, comtypes makes life much easier. When it works well, you can simply feed comtypes the path to the dll or exe of the object you’re trying to use, and then write pretty straightforward code like

device_obj = CreateObject("PortableDevice.PortableDevice", IPortableDevice)
contents = device_obj.Content()
for item in contents:
    print item

This (slightly simplified) sample code allows access to the contents of a camera attached to the computer. The deviceobj is a Python wrapper around a COM object that is actually implemented elsewhere on the system, one that represents a camera we can interface with. Underneath, comtypes will be busy CoCreateInstancing, QueryInterfacing, and wrapping ctypes objects with Python objects for your ease of use. There’s usually no need for you to worry about the million things that are going on underneath. But unfortunately, things don’t work smoothly all the time, so what kind of hackers would we be if we didn’t open it up to see how it works?

Automatic code generation: GetModule

The magic begins in the comtypes.client module. The handy helper function GetModule will take a binary file like a .tlb or .exe, extract the binary data, and automatically generate Python code that, like a header file, specifies all the interfaces, methods, and structs that you need to use a particular COM object. Anyone’s who’s worked with the Windows API might be familiar with the rabbit-hole of struct and type declarations. A FORMATETC struct, for example, is one that is used in drag and drop APIs. It is declared as two mystery types followed by three 32 bit ints. Further digging will reveal that one of the unknown types is an enum, but the other is another struct involving more unknown types. For it to be usable in Python, you have to break things down into known types without the benefit of importing hundreds of Windows headers. GetModule will do all of these things for you, but the generated code hides the interesting part, the wrapper classes that actually proxy to the real COM objects underneath. So it’s time to dig a little deeper.

A metaclass in the wild

This brings us to the comtypes class IUnknown, the root of all evil (no joke—check __init__.py:979). In COM, IUnknown is the grandfather of all interfaces, the interface which all other interfaces inherit from. It contains only three functions: 

// QueryInterface returns a pointer to the interface you are querying for,
// or an error if the object does not implement it
int QueryInterface(InterfaceID refiid, void** ppObjectOut);
 
// These methods are used for reference counting
int AddRef();
int Release();

In comtypes land, IUnknown is actually a base class. For any COM interface that you intend to call into—IPortableDevice for example—you must create a class that inherits from IUnknown. All the methods in the interface are declared in the variable _methods_ as tuples, specifying function name, types and names of args and return values.

class IPortableDevice(IUnknown):
   _iid_ = GUID('{625e2df8-6392-4cf0-9ad1-3cfa5f17775c}')
   _methods_ = [
       COMMETHOD([], HRESULT, 'Open',
           ( ['in'], LPWSTR, 'pszPnpDeviceID' ),
           ( ['in'], POINTER(IPortableDeviceValues), 'pClientInfo')),
       COMMETHOD([], HRESULT, 'Content',
           ( ['out'], POINTER(POINTER(IPortableDeviceContent)), 'ppContent'))
   ]

The class IUnknown itself is actually pretty simple. The magic happens in its metaclass _cominterface_meta, which turns these tuples into bound methods. For the uninitiated, all Python classes are actually objects, and metaclasses are things that make classes. When you declare a class like IPortableDevice that inherits from IUnknown, the metaclass of IUnknown takes COMMETHODs declared above and creates two bound methods: IPortableDevice.Open, which takes in two parameters and returns nothing; and IPortableDevice.Content, which takes no parameters and returns one. These wrapper methods check that calls are made with the appropriate number and type of inputs, a necessity when communicating between untyped, flying-by-the-seat-of-your-pants Python and statically typed, compiled languages like C++. The wrappers then proxy the method calls into an actual COM object that was instantiated under the covers, wrap the return values in Python types, and return them to you, transforming returned error codes into Python exceptions along the way. It’s wonderful, except when it doesn’t work exactly as intended.

The most painful such incident brought development to a dead halt for two days. The only symptom was that the program would occasionally crash after reading a bunch of images from a camera. The bug was non-deterministic and no exception was generated. After endless hours of printing and prodding, I finally found the root cause of the problem. In COM, the implementer of an interface typically does AddRef on the object when he creates it so that it is “born” with a reference which is passed to the caller of CreateObject, while the user of the interface is responsible for calling Release when he is done with it. Additional calls to AddRef and Release are only necessary if the user makes copies of the reference to the object. So in comtypes, the __init__ method of a comtypes object does not call AddRef on the COM interface, but deletion does call Release. This in itself is only passingly strange, because it usually works.

However, the clever wrapping of COM objects sometimes results in comtypes objects being created unexpectedly, and then also deleted unexpectedly. For example, in the following code

# The following is equivalent to the C code
#   IDeviceItem* idevice_item_array = IDeviceItem[10];
#   device->GetItems(10, &idevice_item_array);
idevice_item_array = (pointer(IDeviceItem) * 10)()
device_obj.GetItems(10, idevice_item_array)
device_item = idevice_item_array[0]

you would expect device_item to be a pointer to an IDeviceItem. Normally, you would have to “dereference” the pointer to get to the item itself, as in

# In ctypes, pointer.contents refers to the target of the pointer
device_item = idevice_item_array[0].contents
device_item.do_something()

However, when you index the array, comtypes helpfully transforms idevice_item_array[0] from type pointer(IDeviceItem) to type IDeviceItem, so instead we have 

# idevice_item_array[0] is already of type IDeviceItem??
idevice_item_array[0].do_something()

In the process, it unexpectedly creates an instance of IDeviceItem. More importantly, it unexpectedly destroys an instance of IDeviceItem. So, the following code

for i in range(100):
    print idevice_item_array[0]

actually crashes Python because it creates and destroys a hundred IDeviceItem objects, resulting in a hundred calls to Release on the real COM object. After the first Release call, the COM object is considered deleted. Whenever the garbage collector for that COM object is triggered, everything explodes.

The workaround? Save your reference into a Python object and keep it around. Don’t index your COM object arrays more than once. 

device_item = idevice_item_array[0]
for i in xrange(100):
    print device_item

This results in exactly one call to Release, which occurs when the Python object deviceitem is destroyed.

All of this happened within my first few months at Dropbox, and I barely spoke Python at the time. I learned what a metaclass was before I had fully mastered list slicing syntax. Meanwhile, my counterpart on the Mac camera uploads side was not having the easiest time either. Without the benefit of a compatibility enforcer like COM, he was trying to ferret out why an OS X library was trying to execute Dropbox code as PowerPC assembly on an X86 machine (it’s complicated—the explanatory comment is about fifty lines long). It made me feel a little bit better about dealing with incorrect vtable pointers and bad reference counting.

Discovering comtypes was an integral part of the development of the photo feature, and it certainly presented enough excitement to be considered an adventure. In the end, for all the problems I encountered, having comtypes made it much easier to access COM APIs. Reference counting bugs are the price you pay when you work with low-level code, but it certainly would have been much more work to write our own Python wrappers around COM. COM may be difficult to use, and comtypes occasionally frustrating, but with a working knowledge of how to use the first and how to work around unexpected pitfalls in the second, we can plow ahead with future Windows features. In fact, not long after we released the camera feature, we found ourselves again needing to interact with a COM component. With all this knowledge under my hat, I made myself a COM client in no time. It was a glorious victory.


// Copy link