On the day of the Twitter (NYSE: TWTR) IPO, Paul Ford has a great piece about the metadata attached to every single tweet. There are 31 publicly documented data fields (number of favorites, number of retweets, and many more you probably aren’t familiar with):
This is the sort of combinatorial work that defines the modern Web: There’s so much data that there’s a very good chance that you will be the first of all humankind to find something interesting or unexpected. Whether you find something valuable is another question. But it’s surprisingly easy to become an expert in a very tiny niche as a developer—to become a world-leading expert in Android video or a specialist in Twitter geography—and charge accordingly for your services.
For all the possibilities of APIs, there are also limits. Another tweet field, “withheld_copyright,” if set to “true,” lets you know that a tweet is in trouble—that its content has raised flags and hackles over copyright. The text of the tweet, in that case, may be suppressed. The “withheld_in_countries” field can provide a list of the nations in which the tweet is banned. Another field has a telling name: “possibly_sensitive.” It’s set to either true or false. The field indicates whether a tweet links to potentially offensive things such as “nudity, violence, or medical procedures.” (If ever you wanted a snapshot of our world’s anxieties in three terms, there you have it.) As a user you can check a box on your profile so that the media you link to is automatically flagged this way. If you don’t, you risk having your pictures of your medical procedure marked as objectionable by an offended reader and thus put “in review,” the Twitter version of limbo.
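To make those three fields a bit more concrete, here is a minimal Python sketch of how a developer might read them. It assumes the tweet has already been parsed from JSON into a plain dict and that the keys are spelled exactly as quoted above; the `moderation_status` helper and the `sample_tweet` values are hypothetical, invented purely for illustration.

```python
def moderation_status(tweet):
    """Summarize the moderation-related fields of a tweet dict.

    Assumption: the keys match the field names quoted in the article.
    Any of them may be missing, so each lookup falls back to a default.
    """
    return {
        "copyright_withheld": bool(tweet.get("withheld_copyright", False)),
        "banned_in": list(tweet.get("withheld_in_countries", [])),
        "possibly_sensitive": bool(tweet.get("possibly_sensitive", False)),
    }


# Hypothetical tweet payload; the values are made up for this example.
sample_tweet = {
    "text": "Post-op photos from this morning",
    "possibly_sensitive": True,
    "withheld_in_countries": ["XX", "YY"],
}

status = moderation_status(sample_tweet)
print(status)
```

Note that a missing field is treated the same as a false one here, which is a design choice of the sketch rather than anything the article specifies.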
A field like this indicates the inherent difficulty of managing an enormous platform like Twitter. The only way the company survives is if it can safely ignore most of what’s said on Twitter. If it had to use employees to monitor tweets, it wouldn’t last a day. But in order to attract as many users as possible, it must find ways to avoid horrifying them.
There’s a great deal of hedging in both the words “possibly” and “sensitive.” The end result is that Twitter is putting the moral burden on the user. One person’s art is another person’s smut, and Twitter is not going to decide which is which—nor is it going to force you to look at the stuff. This position is both somewhat noble for its acceptance of the range of human expression and also highly expedient, putting the responsibility back on the user: We told you the picture was “possibly sensitive,” so why did you look at it?
Much of the rest of the metadata contained in a tweet is familiar: The number of times people have starred, or “fav’d,” a tweet, the number of times it’s been retweeted. The value for “user” contains a whole huge bundle of stuff—the user’s name, a link to an avatar image, the user’s number of followers, the number of people the user follows, and whether the user is “verified” and deserves one of those blue checkmarks. It’s a pretty full portrait of an individual, given that this portrait is attached to every tweet.
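Here is a rough Python sketch of pulling that bundle out of a single parsed tweet. Only the `user` key is named in the piece; the other key names (`favorite_count`, `retweet_count`, `followers_count`, `friends_count`, `verified`, `profile_image_url`) are my recollection of Twitter's REST API at the time and should be treated as assumptions. The `describe_author` helper and the sample values are hypothetical.

```python
def describe_author(tweet):
    """Collect the engagement counts and author details from one tweet dict.

    Assumption: key names follow Twitter's REST API as I remember it;
    missing keys fall back to None, 0, or False.
    """
    user = tweet.get("user", {})
    favorites = tweet.get("favorite_count", 0)
    retweets = tweet.get("retweet_count", 0)
    return {
        "name": user.get("name"),
        "avatar": user.get("profile_image_url"),
        "followers": user.get("followers_count", 0),
        "following": user.get("friends_count", 0),
        "verified": bool(user.get("verified", False)),
        "favorites": favorites,
        "retweets": retweets,
        # Crude engagement total: favorites plus retweets for this tweet.
        "engagements": favorites + retweets,
    }


# Hypothetical tweet payload; every value is made up for this example.
sample_tweet = {
    "text": "Hello, world",
    "favorite_count": 12,
    "retweet_count": 3,
    "user": {
        "name": "Example Person",
        "profile_image_url": "http://example.com/avatar.png",
        "followers_count": 1500,
        "friends_count": 200,
        "verified": False,
    },
}

portrait = describe_author(sample_tweet)
print(portrait)
```

The "engagements" total is just one simple way to combine the counts; nothing in the article prescribes it.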
From a single tweet and with no other information, you can extract a sense of social influence—how big a voice an individual has, the number of people they reach, the number of people who engaged with this particular tweet. Tweets themselves are just regular text (although text on a computer is anything but regular; there are dozens of abstractions that make it possible for an “a” to appear on a screen—but it’s safe to gloss over that). Here it is, 140 characters, a plain little beastie. You might be fooled into thinking there’s hardly anything there.
Read the rest here.