Friday, June 15, 2012

Tech Check: 1st Circuit errs in description of file hashing

In United States v. Farlow, 2012 U.S. App. LEXIS 11121 (1st Cir. Jun. 1, 2012), the 1st Circuit erred in its description of how changing a file affects its hash value. Judge Thompson stated:

The problem for Farlow is that we have rejected the idea that government agents should so narrowly restrict their searches of digital devices. "When searching digital media for 'chats' and other evidence of enticement" -- like the bodybuilder image -- "government agents cannot simply search certain folders or types of files for keywords." Crespo-Rios, 645 F.3d at 43 (emphasis added). The same goes for other specific identifying information -- like hash values. This is because computer files are highly manipulable. Id. at 43-44. A file can be mislabeled; its extension (a sort of suffix indicating the type of file) can be changed; it can actually be converted to a different filetype (just as a chat transcript can be captured as an image file, so can an image be inserted into a word-processing file and saved as such). See id. Any of these manipulations could change a document's hash value. And in any event a limited hash-value search would not have turned up any chat transcripts (which, again, can be saved as image files) or other evidence of Farlow's New York crimes. The government therefore reasonably executed a broad search that fell within the scope authorized by the valid warrant it obtained.

The statement that any of these manipulations could change a document's hash value is not, in fact, completely true. It is true that capturing a chat transcript as an image, or placing it in a different document, does change the hash value. But merely renaming a file, or changing its extension using regular file operations, does not change that file's hash value. A friendly example of that on OS X:

And for clarity's sake, a duplicate test on Windows, using "hashtest2.txt" from the OS X machine as a starting point. I have copied the file and renamed it, as well as copied it and changed the extension:


Notice that the hash never changes, from OS X to Windows. It remains eb1a3227cdc3fedbaec2fe38bf6c044a.
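
The same test can be reproduced with a short script. This is only a sketch: the file names and contents are placeholders rather than the files from my test, and MD5 is assumed because the hash quoted above is 32 hex characters long. A file hash is computed from the file's bytes alone, so a renamed copy and a copy with a different extension should both report the same value as the original.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of(path):
    """Return the MD5 hex digest of a file's contents (the name is never read)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rename_experiment():
    """Hash a file, a renamed copy, and a copy with a different extension."""
    workdir = tempfile.mkdtemp()
    try:
        # Placeholder file standing in for the test file; contents are arbitrary.
        original = os.path.join(workdir, "hashtest.txt")
        with open(original, "wb") as handle:
            handle.write(b"placeholder contents for the hash test\n")

        renamed = os.path.join(workdir, "renamed.txt")
        new_extension = os.path.join(workdir, "hashtest.jpg")
        shutil.copyfile(original, renamed)
        shutil.copyfile(original, new_extension)

        paths = (original, renamed, new_extension)
        return {os.path.basename(p): md5_of(p) for p in paths}
    finally:
        shutil.rmtree(workdir)


if __name__ == "__main__":
    for name, digest in rename_experiment().items():
        print(name, digest)
```

All three lines of output should show an identical digest, whatever the operating system.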

I point this out merely to prevent this erroneous statement from being perpetuated. I do not think, on the whole, that it really makes too much difference in the case itself. I'm open to opinions otherwise.

As a caveat, let me also note that a file extension can also change through a program (e.g. MS Paint) that converts one file format to another (PNG to JPG, for example), and that would change the hash value, because the contents of the file are rewritten. I think the words in this decision are just a little unclear and ambiguous.
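
To illustrate the other side of the point, here is a minimal sketch of what happens when the contents change. It does not perform a real image conversion; it flips a single bit in some placeholder bytes, which is the smallest content change possible, and again assumes MD5.

```python
import hashlib


def md5_hex(data):
    """Return the MD5 hex digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def flip_last_bit(data):
    """Return a copy of data whose final byte differs by exactly one bit."""
    return data[:-1] + bytes([data[-1] ^ 0x01])


# Placeholder bytes standing in for a file's contents.
original = b"placeholder bytes standing in for an image file"
modified = flip_last_bit(original)

print(md5_hex(original))
print(md5_hex(modified))
```

The two digests printed should differ, even though the inputs are the same length and differ by only one bit. An actual format conversion rewrites far more of the file than that.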

3 comments:

  1. Doesn't it seem disingenuous to not search for the hash FIRST, being as particular as the 4A demands, and only if you don't find the picture then move on?

    Searching images for chats seems like a stretch too.

  2. I agree that searching for the hash first would be ideal, but the underlying point is that changing a file by a single byte is enough to defeat such a search. Since a hash search ends up being impractical, why add the extra step when they can just search by file type (typically carving for a specific header)?

  3. As stated in other cases, if something puts a burden on the investigators to be particular, it's the Fourth Amendment. It seems that in this case they knew specifically which file they were looking for, and a hash search would have been perfect for finding that particular file. That very much seems in line with Fourth Amendment law.

    The file type argument seems circular. If it is an image search, and the law enforcement agent is claiming the potential for modification and image conversion, why would they then search all images of that type?

    I think I see where you're going here, in that any small modification can significantly change the hash. There are tools to deal with "homologous files," though, using a forensic technique called fuzzy hashing. Look at tools called ssdeep and sdhash. They will catch files where just a small change has been made as 90+% similar. Plus, these types of searches have been built into tools like FTK for years now.

    I just don't understand how agents can argue they must use gallery view, which is by definition a general search, to look for pictures whose hashes they could already know. It just seems dishonest. Use the hashes first, then do a homologous-file search, then do keyword searches, then, if all that fails, do "gallery". It's rare that forensic agents can know exactly what they're looking for, but when they do, they are obligated under the Fourth Amendment to be as particular as possible. This is not the last time we will see an argument like this. Who knows how good the defense's expert was.
