They say Twitter has a 280-character limit but one undergrad exceeded it — by a gazillion.
Look at the image above. Sure, Shakespeare looks dashing, but that’s not the point. Look closely: what do you see? What if I told you it holds the man’s entire works inside it? David Buchanan, a computer science undergraduate, managed to embed a ZIP archive of all of Shakespeare’s works in a small version of the image and then posted it in a tweet.
It worked.
Assuming this all works out, the image in this tweet is also a valid ZIP archive, containing a multipart RAR archive, containing the complete works of Shakespeare.
This technique also survives twitter's thumbnailer :P pic.twitter.com/P0Owq9abRC
— David Buchanan (@David3141593) October 29, 2018
So how does this work?
[panel style=”panel-default” title=”Shakespeare’s works” footer=””]There are 884,421 total words in Shakespeare’s 43 works. The average length of an English word is 4.5 letters, which adds up to around 4,000,000 characters.
Since 1 character = 1 byte in plain text and 1 megabyte = 1,000,000 bytes, Shakespeare’s works fit in about 4 MB uncompressed. This is a rough floor for the raw text, since the count ignores spaces and punctuation.[/panel]
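The back-of-the-envelope estimate in the panel can be reproduced in a few lines. This is only a sketch of the arithmetic: the word count and average word length are the figures quoted above, and the variable names are made up for illustration.

```python
# Figures quoted in the panel above (not independently verified here).
TOTAL_WORDS = 884_421
AVG_LETTERS_PER_WORD = 4.5
BYTES_PER_CHAR = 1          # assumes plain single-byte text
BYTES_PER_MB = 1_000_000

total_chars = TOTAL_WORDS * AVG_LETTERS_PER_WORD
size_mb = total_chars * BYTES_PER_CHAR / BYTES_PER_MB

print(f"{total_chars:,.0f} characters, roughly {size_mb:.2f} MB")
```

The result lands just under 4 MB, which matches the rounded figure in the panel.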
Speaking to Motherboard, Buchanan explained:
“So basically, I wrote a script which parses a JPG file and inserts a big blob of ICC metadata,” he said. “The metadata is carefully crafted so that all the required ZIP headers are in the right place.”
“I was just testing to see how much raw data I could cram into a tweet and then a while later I had the idea to embed a ZIP file,” Buchanan added.
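For the less computer-savvy, what Buchanan is saying is that he wrote a script that reads the internal structure of an image (parses a JPG file) and inserts a large block of metadata into it. JPG files can carry metadata, which is data that describes the image itself (and potentially other things); an ICC profile, normally used for color information, is one such block. By packing the archive (ZIP) into that metadata and arranging the bytes so the ZIP headers land where a ZIP reader expects them, he was able to hide the whole archive inside the image.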
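To make the "parse a JPG and insert ICC metadata" step more concrete, here is a minimal Python sketch. It is not Buchanan's script: the function name `insert_icc_blob` is hypothetical, and the sketch only shows how arbitrary bytes can be wrapped in ICC profile segments (JPEG APP2 markers). It makes no attempt to position ZIP headers, which is the careful part he describes.

```python
import struct

ICC_TAG = b"ICC_PROFILE\x00"
# A JPEG segment length field is 2 bytes and counts itself, so one APP2
# segment holds at most 65535 - 2 (length) - 12 (tag) - 2 (chunk index/count).
MAX_CHUNK = 65535 - 2 - len(ICC_TAG) - 2


def insert_icc_blob(jpeg: bytes, blob: bytes) -> bytes:
    """Return a copy of `jpeg` with `blob` stored as ICC profile segments."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")

    pos = 2
    # If a JFIF APP0 segment follows the SOI marker, keep it first.
    if jpeg[pos:pos + 2] == b"\xff\xe0":
        (seg_len,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + seg_len

    chunks = [blob[i:i + MAX_CHUNK] for i in range(0, len(blob), MAX_CHUNK)]
    if not 1 <= len(chunks) <= 255:
        raise ValueError("blob must split into 1..255 chunks")

    segments = b"".join(
        b"\xff\xe2"                                        # APP2 marker
        + struct.pack(">H", 2 + len(ICC_TAG) + 2 + len(chunk))
        + ICC_TAG
        + bytes([index, len(chunks)])                      # chunk n of total
        + chunk
        for index, chunk in enumerate(chunks, start=1)
    )
    return jpeg[:pos] + segments + jpeg[pos:]
```

Because the chunk counter is a single byte, this layout tops out at 255 segments of about 64 KB each, or roughly 16 MB of payload, which is comfortably more than the compressed text would need.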
It’s not the first time something like this has been done — especially on Twitter. There’s actually a name for this technique: it’s called steganography. Generally speaking, steganography is the practice of concealing a file, message, image, or video within another file, message, image, or video. A recent paper describes the practice and some case studies, particularly the potential for malicious usage.
Funnily enough, Buchanan thought this was a bug and reported it to Twitter, but they replied that it’s “not a bug”. So, at least for now, Twitter steganography is here to stay.