Creating a DOCX/Python Polyglot

May 7, 2021
Written by: Eivind Utnes

Last year, I ran into this tweet from John Gordon (@indiecom):

After some research, I found that Python has been able to execute zip files containing a __main__.py since version 2.6. Since Office documents (docx, xlsx, etc.) are also zip files, I wanted to know whether this would work with them as well, and to use the opportunity to look into the file format behind Office files, the Open Packaging Conventions (OPC) standard. My goal was to create a file that was simultaneously a valid Word document and a working Python script.

First, renaming a simple Word file from .docx to .zip and opening the archive reveals the following structure:

> _rels
    .rels
> docProps
    app.xml
    core.xml
> word
    > _rels
        document.xml.rels
    > theme
        theme1.xml
    document.xml
    fontTable.xml
    settings.xml
    styles.xml
    webSettings.xml
[Content_Types].xml

The important file here is the [Content_Types].xml file, but I will get back to that.
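
The same listing can be produced without renaming anything, since Python's zipfile module does not care about the file extension. The following is a minimal sketch: list_package_parts is a hypothetical helper name, and the in-memory archive is only a stand-in for a real .docx, not a valid Word file.

```python
import io
import zipfile


def list_package_parts(package):
    """Return the member names of a zip-based package (path or file-like object)."""
    with zipfile.ZipFile(package) as archive:
        return archive.namelist()


# Tiny in-memory stand-in for a real .docx (assumption: two members only).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<document/>")

parts = list_package_parts(demo)
print(parts)
```

Passing the path of a real document, for example list_package_parts("Pythondoc.docx"), should return the entries shown in the tree above.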

For Python to execute a zip file, a __main__.py file needs to be located in the root directory of the archive.

I added a __main__.py containing a single print() call to the root, which makes the structure look like this:

> _rels
    .rels
> docProps
    app.xml
    core.xml
> word
    > _rels
        document.xml.rels
    > theme
        theme1.xml
    document.xml
    fontTable.xml
    settings.xml
    styles.xml
    webSettings.xml
__main__.py
[Content_Types].xml
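
Adding the file by hand works, but the step can also be scripted. The sketch below is not the method used for this post; add_main_module is a hypothetical helper that copies every member of an existing archive into a new one and then appends a root-level __main__.py. The in-memory archive again stands in for a real document.

```python
import io
import zipfile


def add_main_module(src, dst, source_code):
    """Copy all members of zip `src` into zip `dst`, then add a root __main__.py."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            # Reusing the ZipInfo keeps each member's name and compression type.
            zout.writestr(info, zin.read(info.filename))
        zout.writestr("__main__.py", source_code)


# Stand-in for a real .docx (assumption: not a valid Word file).
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<document/>")

polyglot = io.BytesIO()
add_main_module(original, polyglot,
                'print("Snakes. Why did it have to be snakes?")\n')

with zipfile.ZipFile(polyglot) as archive:
    names = archive.namelist()
    payload = archive.read("__main__.py")
print(names)
```

With real files, src and dst would be paths such as the original document and a new .docx next to it.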

After renaming it back to .docx and running it as a Python script, we get the following output:

prime@dev $ python Pythondoc.docx
Snakes. Why did it have to be snakes?

As it turns out, turning a Word document into a Python script is fairly simple. However, Microsoft Office does not enjoy archive meddling shenanigans, and the resulting file throws an error when opened in Microsoft Word.

For this to be useful, I wanted the file to open as a Word document as well. This is where [Content_Types].xml comes in: it declares the content types of the parts in the package, which is presumably why Word complained about the undeclared __main__.py. I originally attempted to add an Override entry for the new file to [Content_Types].xml, but Microsoft Office did not accept it as valid. After referring to Microsoft's documentation, I instead registered “.py” as a known extension by adding the following snippet to the [Content_Types].xml file:

<Default Extension="py" ContentType="application/octet-stream"/>
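
This edit can also be applied programmatically. The sketch below uses only the standard library; register_py_extension is a hypothetical helper, the sample XML is a cut-down stand-in for a real [Content_Types].xml, and placing the new entry first is a choice made here rather than something the post prescribes. Whether Word accepts the re-serialized XML has not been verified here.

```python
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in OPC packages.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def register_py_extension(content_types_xml):
    """Return the XML (bytes) with a Default entry for the 'py' extension."""
    ET.register_namespace("", CT_NS)  # serialize with a default namespace
    root = ET.fromstring(content_types_xml)
    tag = f"{{{CT_NS}}}Default"
    known = any(d.get("Extension", "").lower() == "py"
                for d in root.findall(tag))
    if not known:
        entry = ET.Element(tag, {"Extension": "py",
                                 "ContentType": "application/octet-stream"})
        root.insert(0, entry)  # keep Default entries ahead of Override entries
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


# Cut-down sample (assumption: a real file lists more types).
sample = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    b'<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    b'<Default Extension="xml" ContentType="application/xml"/>'
    b'</Types>'
)
patched = register_py_extension(sample)
print(patched.decode("utf-8"))
```

The function is idempotent: running it on already patched XML does not add a second entry. It needs Python 3.8 or later for the xml_declaration argument.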

This provides us with a file that is both a valid Python script:

prime@dev $ python Pythondoc.docx
Snakes. Why did it have to be snakes?

And a valid Microsoft Office document:

I have also tested it with Microsoft PowerPoint and Microsoft Excel. The method should work with any file that follows the OPC standard, so a more accurate title for this article would have been “Creating an OPC/Python polyglot”.

Note that when the file is opened in Office and then saved, Office rewrites the package, which removes the __main__.py file and resets the [Content_Types].xml file.

For added fun, I extended the Python script a little and added a macro to the docx file (making it a docm). Now the document executes itself using Python when it is opened (if macros are enabled), and writes a file to disk.

__main__.py:

message = "Snakes. Why did it have to be snakes?"
print(message)
# Append the same line to snake.txt in the current working directory.
with open('snake.txt', 'a') as file_object:
    file_object.write(message)

Pythondoc.docm macro:

Sub AutoOpen()
    Shell ("python Pythondoc.docm")
End Sub

Result:

Uses

For a Red Team, this trick can be used to execute code on a machine without relying on PowerShell, which can be useful in environments where PowerShell is locked down or monitored. However, the usefulness is limited, as it relies on Python being installed on the target, which is not common in the average workplace environment.

Detection

Because the Python module needs to be named __main__.py for Python to recognize it, detection is straightforward. The following simple YARA rule checks whether a file is an OPC file with a Python __main__.py file hidden inside it:

rule OPCPythonPolyglot
{
        meta:
                author = "0xPrime"
                comment = "OPC/Python polyglot"

        strings:
                $opcidentifier1 = ".xml.relsPK"
                $opcidentifier2 = "[Content_Types].xmlPK"
                $pythonfile = "__main__.pyPK"

        condition:
                all of them
} 
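
Where YARA is not available, a similar check can be approximated in Python. This is a rough sketch rather than a port of the rule: looks_like_opc_python_polyglot is a hypothetical helper that reads the archive's file list instead of matching raw bytes, and it does not check for .rels entries the way the rule does.

```python
import io
import zipfile


def looks_like_opc_python_polyglot(candidate):
    """True if `candidate` is a zip with [Content_Types].xml and a root __main__.py."""
    if not zipfile.is_zipfile(candidate):
        return False
    with zipfile.ZipFile(candidate) as archive:
        names = set(archive.namelist())
    return "[Content_Types].xml" in names and "__main__.py" in names


def make_archive(member_names):
    """Build a small in-memory zip for the demo (stand-in for real documents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in member_names:
            archive.writestr(name, "")
    return buffer


clean = make_archive(["[Content_Types].xml", "word/document.xml"])
suspicious = make_archive(["[Content_Types].xml", "word/document.xml",
                           "__main__.py"])

results = (
    looks_like_opc_python_polyglot(clean),
    looks_like_opc_python_polyglot(suspicious),
    looks_like_opc_python_polyglot(io.BytesIO(b"not a zip")),
)
print(results)
```

The function accepts a path as well as a file-like object, so it can be pointed at documents on disk.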

The polyglot and YARA file can be found at this repository:

GitHub

The Word file may be flagged by Microsoft Defender because of the “downloaded from internet” flag (Mark of the Web).

If you have comments or questions, please contact me on Twitter @0xprime. Alternatively, you can reach me through the contact form.

Header picture by David Clode on Unsplash