This is a personal blog. My other stuff: book | home page | Twitter | prepping | CNC robotics | electronics

January 27, 2015

Technical analysis of Qualys' GHOST

This morning, a leaked note from Qualys' external PR agency made us aware of GHOST. In this blog entry, our crack team of analysts examines the technical details of GHOST and makes a series of recommendations to better protect your enterprise from mishaps of this sort.

Figure 1: The logo of GHOST, courtesy of Qualys PR.

Internally, GHOST appears to be implemented as a lossy representation of a two-dimensional raster image, combining YCbCr chroma subsampling and DCT quantization techniques to achieve high compression rates; among security professionals, this technique is known as JPEG/JFIF. This compressed datastream maps to an underlying array of 8-bpp RGB pixels, arranged sequentially into a rectangular shape that is 300 pixels wide and 320 pixels high. The image is not accompanied by an embedded color profile; we must note that this poses a considerable risk that on some devices, the picture may not be rendered faithfully and that crucial information may be lost.

In addition to the compressed image data, the file also contains APP12, EXIF, and XMP sections totaling 818 bytes. This metadata tells us that the image has been created with Photoshop CC on Macintosh. Our security personnel notes that Photoshop CC is an obsolete version of the application, superseded last year by Photoshop CC 2014. In line with industry best practices and OWASP guidelines, we recommend all users to urgently upgrade their copy of Photoshop to avoid exposure to potential security risks.

The image file modification date returned by the HTTP server at is Thu, 02 Oct 2014 02:40:27 GMT (Last-Modified, link). The roughly 90-day delay between the creation of the image and the release of the advisory probably corresponds to the industry-standard period needed to test the materials with appropriate focus groups.

Removal of the metadata allows the JPEG image to be shrunk from 22,049 to 21,192 bytes (-4%) without any loss of image quality; enterprises wishing to conserve vulnerability-disclosure-related bandwidth may want to consider running jhead -purejpg to accomplish this goal.

Of course, all this mundane technical detail about JPEG images distracts us from the broader issue highlighted by the GHOST report. We're talking here about the fact that the JPEG compression is not particularly suitable for non-photographic content such as logos, especially when the graphics need to be reproduced with high fidelity or repeatedly incorporated into other work. To illustrate the ringing artifacts introduced by the lossy compression algorithm used by the JPEG file format, our investigative team prepared this enhanced visualization:

Figure 2: A critical flaw in GHOST: ringing artifacts.

Artifacts aside, our research has conclusively showed that the JPEG formats offers an inferior compression rate compared to some of the alternatives. In particular, when converted to a 12-color PNG and processed with pngcrush, the same image can be shrunk to 4,229 bytes (-80%):

Figure 3: Optimized GHOST after conversion to PNG.

PS. Tavis also points out that ">_" is not a standard unix shell prompt. We believe that such design errors can be automatically prevented with commercially-available static logo analysis tools.

PPS. On a more serious note, check out this message to get a sense of the risk your server may be at. Either way, it's smart to upgrade.

January 09, 2015

afl-fuzz: making up grammar with a dictionary in hand

One of the most significant limitations of afl-fuzz is that its mutation engine is syntax-blind and optimized for compact data formats, such as binary files (e.g., archives, multimedia) or terse human-readable languages (RTF, shell scripts). Any general-purpose fuzzer will have a harder time dealing with more verbose dialects, such as SQL or HTTP. You can improve your odds in a variety of ways, and the results can be surprisingly good - but ultimately, it's never easy to get from Set-Cookie: FOO=BAR to Content-Length: -1 by randomly flipping bits.

The common wisdom is that if you want to fuzz data formats with such ornate grammars, you need to build an one-off, protocol-specific mutation engine with the appropriate syntax templates baked in. Of course, writing such code isn't easy. In essence, you need to manually build a model precise enough so that the generated test cases almost always make sense to the targeted parser - but creative enough to trigger unintended behaviors in that codebase. It takes considerable experience and a fair amount of time to get it just right.

I was thinking about using afl-fuzz to reach some middle ground between the two worlds. I quickly realized that if you give the fuzzer a list of basic syntax tokens - say, the set of reserved keywords defined in the spec - the instrumentation-guided nature of the tool means that even if we just mindlessly clobber the tokens together, we will be able to distinguish between combinations that are nonsensical and ones that actually follow the rules of the underlying grammar and therefore trigger new states in the instrumented binary. By discarding that first class of inputs and refining the other, we could progressively construct more complex and meaningful syntax as we go.

Ideas are cheap, but when I implemented this one, it turned out to be a good bet. For example, I tried it against sqlite, with the fuzzer fed a collection of keywords grabbed from the project's docs (-x testcases/_extras/sql/). Equipped with this knowledge, afl-fuzz quickly spewed out a range of valid if unusual statements, such as:

select sum(1)LIMIT(select sum(1)LIMIT -1,1); select round( -1)````; select group_concat(DISTINCT+1) |1; select length(?)in( hex(1)+++1,1); select abs(+0+ hex(1)-NOT+1) t1; select DISTINCT "Y","b",(1)"Y","b",(1); select - (1)AND"a","b"; select ?1in(CURRENT_DATE,1,1); select - "a"LIMIT- /* */ /* */- /* */ /* */-1; select strftime(1, sqlite_source_id());

(It also found a couple of crashing bugs.)

All right, all right: grabbing keywords is much easier than specifying the underlying grammar, but it still takes some work. I've been wondering how to scratch that itch, too - and came up with a fairly simple algorithm that can help those who do not have the time or the inclination to construct a proper dictionary.

To explain the approach, it's useful to rely on the example of a PNG file. The PNG format uses four-byte, human-readable magic values to indicate the beginning of a section, say:

89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 | .PNG........IHDR
00 00 00 20 00 00 00 20 02 03 00 00 00 0e 14 92 | ................

The algorithm in question can identify "IHDR" as a syntax token by piggybacking on top of the deterministic, sequential bit flips that are already being performed by afl-fuzz across the entire file. It works by identifying runs of bytes that satisfy a simple property: that flipping them triggers an execution path that is distinct from the product of flipping stuff in the neighboring regions, yet consistent across the entire sequence of bytes.

This signal strongly implies that touching any of the affected bytes causes the failure of an underlying atomic check, such as header.magic_value == 0xDEADBEEF or strcmp(name, "Set-Cookie"). When such a behavior is detected, the entire blob of data is added to the dictionary, to be randomly recombined with other dictionary tokens later on.

This second trick is not a substitute for a proper, hand-crafted list of keywords; for one, it will only know about the syntax tokens that were present in the input files, or could be synthesized easily. It will also not do much when pitted against optimized, tree-based parsers that do not perform atomic string comparisons. (The fuzzer itself can often clear that last obstacle anyway, but the process will be slow.)

Well, that's it. If you want to try out the new features, click here and let me know how it goes!