On a daily basis we collect tons of Spam emails, which we analyze for malicious content. Of course, this is not done manually by our thousands of minions, but automated using some Python-fu. Python is a programming language that comes with many libraries, making it easy for us to quickly perform such tasks.
Python’s email library deals with, well, emails. And it does it well. But on October 3rd, we encountered an attachment that wasn’t there – at least according to Python’s email library.
Now how could that happen?
Emails do have a certain structure, which is described nicely in RFC #822, RFC #2822, RFC #5322, RFC #2045, RFC #2046, RFC #2047, RFC #2049, RFC #2231, RFC #4288 and RFC #4289. Even though these RFC’s are clear in their own way, an illustration might help (we focus on multipart emails only) to understand why Python’s email library got fooled.