Press-Republican

September 8, 2013

Metadata = Data? You bet

BY STEWART DENENBERG, Technology and Society

As I mentioned in last month’s column, the story of the National Security Agency’s domestic surveillance made public by Edward Snowden has legs.

In fact, if the story were an insect, it would be a millipede. Now before you send me a nasty correction, let the record show that I know that, by definition, an insect is limited to six legs, but “millipede” sounds so much better than “arthropod.” It would seem that in this brave, new digital age there should be not only millipedes but mega, giga, tera and even petapedes. No matter. Suffice it to say that this story shows no signs of ending well or soon.

As of Aug. 21, the latest twist in this thriller was the revelation that, two years ago, the Foreign Intelligence Surveillance Act Court strongly admonished the NSA for sweeping up domestic communications along with foreign ones in its intelligence gathering. The crux of the issue was that, without a warrant, the NSA had no authority to spy on U.S. citizens and, in fact, was violating the Fourth Amendment, which protects citizens from unreasonable searches.

I have spent some weeks researching the method that the NSA must have used to intercept U.S. citizens’ phone calls, emails and other Internet transactions, and could find only the political and economic aspects: how the agency pressured Internet providers such as Verizon and AT&T to “share” their data unbeknownst to U.S. users. There was very little information about the actual techniques applied to the data once the NSA had it in its hands. So I decided to abandon the empirical approach and apply deduction instead. After all, I taught the database-management course during my career as a computer-science professor, so why not put to use what I had learned? Here’s the way I think it went:

Once the NSA had all of this data safely stored on its collection of disks, it could make the first pass over the data to create its database. The three main functions of a database system are: create, update and interrogate. In the create phase, the raw data is usually indexed for rapid retrieval during the update and interrogate phases. Indexing is a fairly straightforward operation; if you’re of a certain age, you remember thumb-indexed dictionaries, which facilitate the interrogate function. For example, if you needed the definition of “mendacious,” you could start your search immediately in the “M” section of the dictionary thanks to the handy thumb indentation, rather than begin on page one and search sequentially from there. Techniques similar to this are embodied in computer programs whose job it is to update and interrogate large databases, similar in kind but not in degree. These programs not only allow for multiple indexes as links to the database, but are orders of magnitude faster than manual methods.
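For readers who like to see such things spelled out, here is a minimal Python sketch of the thumb-index idea. The six-word dictionary and the function names are made up for illustration; the sketch only contrasts starting on page one with jumping straight to the right letter section, and makes no claim about how any real database system is built.

```python
# Hypothetical miniature dictionary, already in alphabetical order.
WORDS = ["aardvark", "bomb", "mend", "mendacious", "mercy", "zygote"]


def build_thumb_index(words):
    """Map each first letter to the position where its section starts."""
    index = {}
    for position, word in enumerate(words):
        # setdefault keeps only the FIRST position seen for each letter.
        index.setdefault(word[0], position)
    return index


def sequential_search(words, target):
    """Start on 'page one'; return how many entries were examined."""
    for steps, word in enumerate(words, start=1):
        if word == target:
            return steps
    return None  # word is not in the dictionary


def indexed_search(words, thumb_index, target):
    """Jump to the target's letter section, then scan only that section."""
    start = thumb_index.get(target[0])
    if start is None:
        return None  # no section for this letter at all
    for steps, word in enumerate(words[start:], start=1):
        if word[0] != target[0]:
            break  # we have run past the end of the section
        if word == target:
            return steps
    return None


THUMB_INDEX = build_thumb_index(WORDS)
```

With this toy list, `sequential_search(WORDS, "mendacious")` examines four entries, while `indexed_search(WORDS, THUMB_INDEX, "mendacious")` examines two. The gap widens as the dictionary grows.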

For example, if I am the program looking at one of your emails, I can record the time and date it was sent, your and the recipient’s email addresses, as well as any keywords that have been deemed important, such as “bomb,” “Syria,” “China” ... you get the idea. Next, I determine the location in disk memory where this email will be stored, but before I store it, I make a note in the form of a list that associates each of the keywords with that disk location. This process is repeated for all of the emails in the database, and when it’s finished, we have created a table listing each keyword alongside the disk locations of the emails that contain it:

Keyword: aardvark; location: 636542

Keyword: bomb; location: 124679, 001489, 789325

Keyword: zygote; location: 987654, 123321
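A table like this could be built along the following lines. This is only a sketch under stated assumptions: the watch list, the sample emails and the name `build_keyword_index` are all invented for illustration, and the disk locations are kept as text so that leading zeros such as “001489” survive.

```python
import re
from collections import defaultdict

# Assumed watch list of keywords "deemed important" (illustrative only).
WATCHED_KEYWORDS = {"bomb", "syria", "china"}

# Hypothetical emails, keyed by the disk location where each is stored.
EMAILS = {
    "124679": "The bomb squad was called to the airport.",
    "001489": "That movie was a bomb; skip it.",
    "636542": "Our trip to China starts Monday.",
}


def build_keyword_index(emails):
    """Return a table mapping each watched keyword to email locations."""
    index = defaultdict(list)
    for location, text in emails.items():
        # Break the email into lowercase words, ignoring punctuation.
        words = set(re.findall(r"[a-z]+", text.lower()))
        # Note the location under every watched keyword the email contains.
        for keyword in sorted(words & WATCHED_KEYWORDS):
            index[keyword].append(location)
    return dict(index)


KEYWORD_INDEX = build_keyword_index(EMAILS)
```

Note that the second sample email is a movie review; the table records it under “bomb” all the same, because the program looks only at the words, not at what they mean.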

Now imagine that I’m the interrogate program and my human NSA agent wants to look at all emails that contain the word “bomb.” All I have to do to make him happy is consult my table of associations between keywords and disk locations, go to each location (124679, 001489, 789325), and display the full email located there.
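The interrogate step can be sketched in a few lines of Python. The table below copies the sample entries shown earlier; the `DISK` contents and the function name `interrogate` are placeholders invented for illustration, standing in for whatever storage a real system would use.

```python
# The keyword table from the example above (locations kept as text).
KEYWORD_TABLE = {
    "aardvark": ["636542"],
    "bomb": ["124679", "001489", "789325"],
    "zygote": ["987654", "123321"],
}

# Placeholder "disk": each location holds a stand-in for a full email.
DISK = {
    location: f"(full text of the email stored at location {location})"
    for locations in KEYWORD_TABLE.values()
    for location in locations
}


def interrogate(keyword):
    """Look the keyword up in the table, then fetch each email it points to."""
    locations = KEYWORD_TABLE.get(keyword.lower(), [])
    return [DISK[location] for location in locations]


# Example: display every email filed under "bomb".
for email in interrogate("bomb"):
    print(email)
```

The program never reads through the whole collection; it consults the table once and goes directly to the three locations listed there. A keyword that is not in the table simply returns nothing.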

By this time, dear reader, you may have surmised that these keywords that link to and allow rapid access to individual emails are the metadata the NSA originally claimed to be outside the purview of the Fourth Amendment, as they are not the actual data itself. If you believe that, I have a lottery prize for you to claim.

Dr. Stewart A. Denenberg is an emeritus professor of computer science at Plattsburgh State, retiring recently after 30 years there. Before that, he worked as a technical writer, programmer and consultant to the U.S. Navy and private industry. Send comments and suggestions to his blog at www.tec-soc.blogspot.com, where there is additional text and links. He can also be reached at denenbsa@gmail.com.