Two Structures: Times Square and #Teaparty

The car bomb fizzled, and an intense search for the incompetent bomber ensued. He turned out to be as incompetent at escape as he was at making bombs. So after two days he was whisked off to the top secret 'place' where terrorist bombers are interrogated.

Equally intense was the commentary. The media was reporting 24/7. However, since the search was not being carried out in public there was very little to report. There were occasional rumors, a statement or two from 'sources,' and a lot of filling the hours with speculation and comment.

And 'the world' said, move over media. We can do our own commentary. So Twitter was very busy as people commented on the events as they learned them. The figure gives the shape of that commentary in time for May 1 through May 5.

Figure 1. Twitter messages over time from Trendistic May 1 through May 5

There was a sharp spike Saturday night and Sunday morning. Then the messaging declined until Tuesday morning when the announcement came that the culprit had been captured. The number of messages declined again, spiked again and then fell to only a few hundred an hour. The total number of messages exceeded 115,000, and each was transmitted to, on average, 1,000 followers [Boynton, March 16, 2010]. If three zeros are added to the number of messages sent you get a big number, 115,000,000, for the reach of this messaging.
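The reach figure is a simple product, and a minimal sketch makes the assumption behind it explicit. The function name `estimated_reach` is introduced here for illustration only; the two inputs are the figures quoted above. The result counts deliveries rather than distinct readers, since the followers of different senders can overlap.

```python
def estimated_reach(messages: int, avg_followers: int) -> int:
    """Upper-bound reach: each message delivered to the average number of followers."""
    return messages * avg_followers


# Figures from the text: 115,000 messages, about 1,000 followers each.
reach = estimated_reach(115_000, 1_000)
print(f"{reach:,}")  # 115,000,000
```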

On Twitter the words "times square" constituted a domain of communication. It was a domain with a particular structure, and I am going to compare that structure with the domain constituted by #teaparty, which is very different in structure. These structures, and others that might be investigated, are important because of how they help us understand what is going on in this new medium called microblogging.

Times Square

What did they say as they were following events? To answer that question I captured 1,500 tweets containing the words times square, usually as a phrase, between 1:18 p.m. and 3:20 p.m. on May 5, 2010. That was 750 tweets an hour, which is a pretty fast flow of messages.

Pakistan menahan orang-orang terkait bom Times Square [elshinta]: Elshinta Newsroom, Pihak berwenang Pakistan dila... [Indonesian: "Pakistan detains people linked to the Times Square bomb ..."]

Latest: Bloomberg lobbies for tighter gun control laws following Times Square bomber bust http://bit.ly/9dKnMr

New blog posting, U.S. Attorney General Holder: Times Square Bomber Faisal Shahzad Fully Cooperating with Authorities -

Man Who Alerted Police To Failed Times Square Bombing Is A Muslim Immigrant http://bit.ly/bIuL89 (via @RayBeckerman)

The four messages above can be used to make three points about the messages. One, it was global communication; it was not all English. Two, there were regular reports about actions of the authorities. Often the reports came via the news media, and sometimes directly from the authorities. Three, most tweets came from citizens pointing out and commenting on some bit of information about the ongoing efforts connected with the attempted bombing.

The content of the messages covered everything one might imagine being commented on. There are so many elements of the events covered that there is not much point in summarizing the content of 1,500 messages much less 115,000. However, there is comparative information about the structure of Twitter communication that can help specify the nature of the domain that became tweeting about Times Square.

There are two sets of results from other research that can help clarify the structure of the communication about Times Square. In the spring of 2009 Microsoft drew a sample of 700,000 Twitter messages covering a 6 month period [Boyd, 2010]. In addition, I have collected and reported on 125 streams of Twitter messages concerned with political matters [Boynton, April 22, 2010].

A first question about this domain is its size. Is 115,000 messages a 'big deal' or is it a very ordinary stream of messages? For the 125 streams I studied, the distributions were highly skewed, so I divided them into quintiles. The top quintile for messages in a stream ranged from 44,000 to 585,000, and there were very few at the top end of the range. So one can say that the stream about Times Square was big relative to other streams about political matters. The unsuccessful bombing of Times Square and subsequent events got a lot of attention from tweeters.

The users of Twitter have invented conventions -- procedures not supplied by Twitter -- to facilitate communication. Three of the conventions will be helpful in assessing the structure of this domain of communication.

This is a domain of pointing to and commenting on the rapidly changing scene in New York. One way to justify that claim is to note the percentage of messages containing a url, or http://, which is a reference to a file not on Twitter. The file might be a document or an image or music or video or any other type of document on the web, that is, a document which has a url. The file might have been produced by news media, government agencies, or individuals. In every case the url in the message is used to point beyond the tweet to that other file. One can understand the tweet as the gist of a larger point made in the file that is referenced. The Microsoft research team found that 22% of tweets contained a url. In the political streams I investigated the range was from 29% to 98% containing a url. The political streams started above the global standard and ranged far above standard practice. Eighty-five percent of the messages about Times Square contained a url, which put them at the bottom of the top quintile. Three of the messages quoted above were 'pointing to' without much in the way of commentary. The message below, by contrast, is an example of reflection on the 'outside source' identified by its url.

Media Ignores Fact That a Muslim Man First Alerted Police to Failed Times Square Bombing http://ow.ly/1HoCm #Islam #tcot

This particular comment 'took off' and was repeated frequently.

The second convention reinforces the point that this should be understood as a domain of pointing to and commenting on. That convention is the retweet, which is most often symbolized by RT @[username]. The Microsoft researchers found that 3% of tweets were likely to be retweets. The range for the 125 political streams was 4% to 72%. People doing politics are doing much more retweeting than standard. For the Times Square stream 37% of the messages contained RT @[username], which put them in the middle quintile for political streams. From the point of view of the individual, retweeting says: I ran into something I found interesting, and I want to pass it on to people who may not encounter the tweet I saw but will see mine. So it is definitely an act of pointing to, where the tweet pointed to might be either a bit of information or a comment. In terms of the operation of the system, retweeting is important because redundancy is essential for information to flow through a loosely connected network [Boynton, March 16, 2010]. Twitter is definitely a loosely connected network, and without the redundancy of many similar messages any single message would die very quickly. One version of the tweet about the media ignoring the faith of the man who discovered the bomb was retweeted 195 times in this stream of 1,500 messages, which shows the potential reach of retweeting.
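The url and retweet percentages can be understood as simple pattern counts over a stream. What follows is a minimal sketch, not the procedure actually used for this paper: the function `share_matching` and both patterns are assumptions introduced for illustration. Three of the sample messages are quoted from the text above; the fourth is a hypothetical retweet with an invented username.

```python
import re

# Assumed markers for the two conventions: a web address, and "RT @username".
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)
RETWEET_PATTERN = re.compile(r"\bRT @\w+")


def share_matching(messages, pattern):
    """Percentage of messages in which the pattern occurs at least once."""
    if not messages:
        return 0.0
    hits = sum(1 for message in messages if pattern.search(message))
    return 100.0 * hits / len(messages)


sample = [
    "Latest: Bloomberg lobbies for tighter gun control laws following "
    "Times Square bomber bust http://bit.ly/9dKnMr",
    "New blog posting, U.S. Attorney General Holder: Times Square Bomber "
    "Faisal Shahzad Fully Cooperating with Authorities -",
    "Man Who Alerted Police To Failed Times Square Bombing Is A Muslim "
    "Immigrant http://bit.ly/bIuL89 (via @RayBeckerman)",
    # Hypothetical retweet; @example_user is not a real account from the data.
    "RT @example_user Media Ignores Fact That a Muslim Man First Alerted "
    "Police to Failed Times Square Bombing http://ow.ly/1HoCm #Islam #tcot",
]

url_share = share_matching(sample, URL_PATTERN)
retweet_share = share_matching(sample, RETWEET_PATTERN)
print(url_share, retweet_share)  # 75.0 25.0
```

The "(via @RayBeckerman)" form is not counted as a retweet here because the sketch looks only for the RT @[username] convention named in the text.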

The third convention is the hashtag, which is symbolized with #[some characters]. Like any convention the use is use in practice. In political messages hashtags are used for two overlapping purposes. Hashtags both identify a subject and persons with whom you may wish to communicate. What makes this go is that hashtags are easily searched in the giant bundle of text that is Twitter. If you search for "hcr" you will get many words that contain those characters in that order as well as the messages that are identified by using #hcr as a reference to health care reform. Very few words begin with # so sticking a # on the front of a set of characters improves search considerably. If you wanted to communicate with people interested in health care reform #hcr became the way to do it. As a result there are, in early May 2010, 615,000 tweets that contain that hashtag, and it is still going. You find messages about health care reform by searching for #hcr, and other people find your tweet about health care reform when they find the #hcr you included in your message. In addition to using hashtags to identify a subject they are also used to identify a political disposition. #Teaparty, which I will analyze next, is a good example of a political disposition, but so is #p2 for 'progressives' or #gop or #dems. I use disposition because you cannot do ideology in 140 characters; it is disposition rather than developed argument. What you get is only the gist with, very often, an emotional encasing. It is important to note that anyone can include #hcr or #teaparty in a tweet. And you can put both in. So individuals with different views of health care reform can find each other to argue and individuals with different dispositions can find each other knowing something of the disposition that is being used to join the communication.
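The search advantage of the # prefix can be shown with a small sketch. It assumes a plain substring search for the untagged case and a whole-tag regular expression for the hashtag case; neither is a description of how Twitter search is implemented, and both sample messages are hypothetical.

```python
import re


def hashtag_pattern(tag: str):
    """Match '#tag' as a whole token, ignoring case (tag is given without '#')."""
    return re.compile(r"(?<!\w)#" + re.escape(tag) + r"(?!\w)", re.IGNORECASE)


# Hypothetical messages for illustration only.
messages = [
    "The Senate votes on #hcr this week",
    "A Beechcraft landed at the county airport",  # 'hcr' hides inside a word
]

plain_hits = [m for m in messages if "hcr" in m.lower()]
tagged_hits = [m for m in messages if hashtag_pattern("hcr").search(m)]

print(len(plain_hits), len(tagged_hits))  # 2 1
```

The substring search returns both messages, while the hashtag search returns only the one whose author chose to join the health care reform stream.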

The goal in both uses of a hashtag is finding people with whom you want to communicate. If all you wanted was information you would be better served by going to Google or Bing or any of the other search engines that will provide you with sources that are carefully selected for reliability. And if you were in information retrieval you would approach Twitter like this:

Micro-blogging services such as Twitter allow anyone to publish anything, anytime. Nonetheless [needless, sic] to say, many of the available contents can be diminished as babble or spam. However, given the number and diversity of users, some valuable pieces of information should arise from the stream of tweets. Thus, such services can develop into valuable sources of up-to-date information (the so-called real-time web) provided a way to find the most relevant/trustworthy/authoritative users is available. [Gayo-Avello, 2010]

You have to distil the 'true information' from the "babble and spam" if information retrieval is what you are after. While individuals will certainly learn from Twitter messages addressed to a subject or a disposition, it is communication that makes Twitter go. This is the same drive that led email to become nearly universal where access to the internet is available or that leads individuals to go through their day with a mobile phone firmly attached to their ear. Human beings communicate with other human beings. Twitter facilitates communication.

The Microsoft researchers found that 5% of the messages in their sample contained a hashtag. In the 125 streams of messages I studied the range was from 1% containing at least one hashtag to 100% containing at least one hashtag, and the more prevalent the use of hashtags in a stream of messages the more likely the messages are to contain more than one hashtag. In this stream about Times Square only 13% of the messages contained a hashtag. While this is well above the standard practice of 5% it is on the edge between the lowest and next to lowest quintile for political streams. Apparently, "times square" was sufficient to locate the messages being posted about the incident in Times Square and subsequent actions taken by the U.S. government. No hashtag seemed to be necessary to participate in this stream of communication. Moreover, this appears to have been a self contained domain of communication. There are very few cross references, which is one of the functions hashtags play. The analysis of #teaparty will show that stream to be full of cross references. The two streams are very different in this way.

After the events of May 1 "Times Square" became a domain of communication with a particular structure. Like other streams of political messages it used the conventions of communication much more fully than standard practice. It is particularly high in use of urls. It is in the middle with retweets. And it is low in hashtags. As a stream it was easily constituted without the use of a tag invented for the occasion. It was about pointing to and commenting on the events. And it was self contained as a stream.

#teaparty

Welcome to #teaparty

Three points are illustrated by the quotations. One, this is a partisan crowd. "Progressive" is clearly a designation of disapprobation. Two, they were reaching out for communication. "Check this out" one says. Twenty-seven hashtags were used in the 5 messages. Five of the 27 were #teaparty because that was the search term, but the other 22 were additional references to dispositions. Each identifies a way one can be found by individuals with whom one would like to communicate. In these five tweets most of the hashtags refer to conservative dispositions, but #p2, #tlot, and #p3, which are progressive designations, are also included. Three, they are strongly argumentative with phrases like "defeat truth with loudness" and "traitor media" and "vote em out in Nov." There is no doubt where the authors stand.

Teaparty is both an historical allusion and an acronym. The acronym comes from Taxed Enough Already Party, but it seems obvious that it is used because of the allusion to the Boston Tea Party in which the people asserted that they were wresting control of their country from foreign powers. In the current usage the 'foreign powers' are the Democrats and the RINOs [Republicans in name only]. It has a brief history, having started early in 2009 [Wikipedia]. I have captured tweets that included the hashtag #teaparty beginning with December 9, 2009. The figure below gives the average tweets a day per month from December 9 through May 10, 2010.

 

Figure 2. Number of messages per day containing #teaparty, by month, December through May

There is a huge surge in the use of Twitter by people including #teaparty in their messages. It started at 917 per day in December, climbed to 7448 in April, and then receded a bit to 5918 in early May. #teaparty is the movement on the web, and it is flourishing.

As a roiling, rambunctious social movement with a remarkable presence in microblogging it is very interesting. But I want to use it here as a contrast with the communication domain that was constituted by messaging about Times Square. I selected 1500 tweets containing #teaparty between 10:46 a.m. and 3:17 p.m. on May 5, 2010. That was the same day and approximately the same time that I selected the tweets containing Times Square. The stream was 333 messages per hour instead of 750 per hour.
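The two flow rates can be checked from the capture windows given in the text. This is a minimal sketch; the helper `tweets_per_hour` is introduced here for illustration, and the clock times are written in 24-hour form. The exact windows give about 738 and 332 messages per hour, which the text rounds to 750 and 333.

```python
from datetime import datetime


def tweets_per_hour(count: int, start: str, end: str) -> float:
    """Average messages per hour between two same-day clock times (HH:MM, 24-hour)."""
    fmt = "%H:%M"
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return count / (elapsed.total_seconds() / 3600)


# Capture windows on May 5, 2010, as reported in the text.
times_square_rate = tweets_per_hour(1500, "13:18", "15:20")  # 1:18 p.m. to 3:20 p.m.
teaparty_rate = tweets_per_hour(1500, "10:46", "15:17")      # 10:46 a.m. to 3:17 p.m.

print(round(times_square_rate, 1), round(teaparty_rate, 1))  # 737.7 332.1
```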

I will use the conventions developed by tweeters to make the comparison and draw the contrast. The Times Square domain was most distinctive in pointing to and commenting on external documents which they did using a url. Eighty-five percent of the messages containing Times Square also contained a url. While the #teaparty messages are far above standard practice only 63.4% contained a url. That is substantially lower than in the Times Square communication. The two domains were similar in terms of retweeting. Forty percent of the #teaparty communications contained RT @ compared with 37% for the Times Square communications. Where they diverge the most is the use of hashtags. Only 13% of the Times Square stream contained a hashtag. The average number of hashtags per message containing #teaparty was 4.5. One might discount #teaparty since that was the search term used to select the tweets. But even if you subtract #teaparty from the total there are still 3.5 hashtags per message.
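The hashtags-per-message figure, with and without the search term, can be sketched as follows. The function `mean_hashtags` and the two sample messages are hypothetical illustrations, not the data or code behind the 4.5 and 3.5 reported above. When every message contains the search term exactly once, excluding it lowers the mean by exactly one, which is the relation between those two figures.

```python
import re

# A hashtag is '#' followed by word characters, not glued to a preceding word.
HASHTAG = re.compile(r"(?<!\w)#\w+")


def mean_hashtags(messages, exclude=None):
    """Mean hashtags per message, optionally ignoring one tag such as the search term."""
    if not messages:
        return 0.0
    excluded = exclude.lower() if exclude else None
    total = 0
    for message in messages:
        tags = [tag.lower() for tag in HASHTAG.findall(message)]
        total += sum(1 for tag in tags if tag != excluded)
    return total / len(messages)


# Hypothetical messages for illustration only.
sample = [
    "Vote em out in Nov #teaparty #tcot #gop",
    "Check this out #teaparty #912 #p2 #tlot #sgp",
]

with_search_term = mean_hashtags(sample)
without_search_term = mean_hashtags(sample, exclude="#teaparty")
print(with_search_term, without_search_term)  # 4.0 3.0
```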

What does this indicate about the domain being constituted by #teaparty? First, the assumption is that individuals are searching Twitter messages to read those they find interesting. If that was not the assumption then there would be no point in including the hashtag except, possibly, as a way to file your own messages. If people are searching Twitter messages then each of the hashtags is a way that people searching can find your message. The more hashtags the more ways there are to find your message. So the individual uses the hashtags that have become vocabulary in use, by convention, to invite communication with people also interested in messages containing the hashtag. This is a domain inviting interaction much more fully than the Times Square domain. Each of the hashtags has a somewhat different reference. In this stream some are dispositionally close together and some are dispositionally far apart. This then is a domain with a much more elaborate structure than the Times Square stream.

To track this more elaborate structure I chose the 9 hashtags that were used most frequently in the original 1500 messages and did a search to obtain 1500 messages of each. The result was ten sets of 1500 messages drawn at approximately the same time. Since each message can contain more than one hashtag there is some duplication in the sets. However, the overlap and the lack of overlap in the messages is just what I want to find. The extent of duplication is the elaborate structure of the domain.

Table 1. Number of messages using each of the other hashtags

The table shows how many times messages with one hashtag contain the other hashtags. Since each set contains 1,500 messages all of the numbers in the table are directly comparable. The table is not symmetric. Of the messages that contained #912, the first row, 683 contained the hashtag #p2. But of the 1500 messages that contained #p2 only 42 contained the hashtag #912. This asymmetry is possible because the sets were selected in independent searches.
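The construction of such a table, and the reason it need not be symmetric, can be sketched in a few lines. This is an illustration under stated assumptions: the function names are introduced here, the messages are hypothetical, and each set has three messages rather than the 1,500 used in the paper. Each row counts how many messages in that row's independently searched set also contain the column's tag.

```python
import re


def contains_tag(message: str, tag: str) -> bool:
    """True if the tag appears as a whole token in the message, ignoring case."""
    pattern = r"(?<!\w)" + re.escape(tag) + r"(?!\w)"
    return re.search(pattern, message, re.IGNORECASE) is not None


def cross_reference_counts(sets):
    """For each row set, count its messages that contain each other set's tag."""
    tags = list(sets)
    return {
        row: {
            col: sum(contains_tag(message, col) for message in sets[row])
            for col in tags
            if col != row
        }
        for row in tags
    }


# Hypothetical sets, each standing in for one independent search.
sets = {
    "#912": ["rally Saturday #912 #p2", "read this #912 #p2 #tcot", "#912 meeting tonight"],
    "#p2": ["jobs report #p2", "new poll #p2 #912", "climate bill #p2 #dems"],
}

matrix = cross_reference_counts(sets)
print(matrix["#912"]["#p2"], matrix["#p2"]["#912"])  # 2 1
```

Because the two sets come from separate searches, the count in one direction places no constraint on the count in the other.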

First, look at the independence of the two domains. Of the 1500 messages containing Times Square -- the bottom row -- 0 contain #912, 0 contain #beck or #glennbeck, 1 has #gop, 1 has #iamthemob, 0 contain Obama, and the lack of cross reference continues across the columns. These are two very distinct streams of communication.

The most frequently used hashtags in messages containing #teaparty are all dispositional if you understand references to Glenn Beck and Palin as dispositional. The exception is Obama who is usually the noun in a brief statement. The subject matter can be the trends of the day, and these are messages that give an interpretation of the events of the day from the point of view of a particular disposition.

There are many interesting individual comparisons in the matrix, but the important point is the overall structure of referencing behavior. The densest part of the matrix is the top right hand corner, but there are connections everywhere. The closest to isolates are Obama and #iamthemob, which is an interesting pair. But that is only in terms of hashtags that point to them. #iamthemob has hashtags pointing out at the median rate, and the messages containing Obama have three times as many hashtags with the other streams as the other streams have with them. The messages containing Obama are referring to other streams in a way that makes it hard to consider them isolates. So, the first thing to note about the structure is that there are connections everywhere.

A second feature of the matrix is the cross referencing of different dispositions in the same messages. One-third of the messages that contain #gop also contain the #p2 hashtag. The same message refers to both the grand old party and to the chief identification of progressives on Twitter. The persons who wrote those messages are inviting both Republicans and progressives to read the message. The progressive hashtags are #p2 and #tlot and Obama. Of the messages containing #912, 683 also have #p2 and 949 have #tlot. #beck messages have fewer cross references. #gop has roughly 500 of each. #iamthemob has only 228 #p2 hashtags but has 538 #tlot hashtags. And you find much the same reading through all rows of the matrix. There is also a great deal of cross referencing within dispositions, but that is more expected than referencing across dispositions. A study of linking among 1000 conservative and liberal blogs found that 91% of the links were within disposition and only 9% across [Adamic and Glance, 2005]. It appears the communication structure of the conservative-liberal world on Twitter is quite different from that of blogs.

This is a domain with a high potential for interaction. It is not the interaction of conversation, however. There is a way to address a specific person, which would indicate something like conversation, but that occurs in only 160 of the 1500 #teaparty messages. Instead it is interaction as in public discussion. Twitter is constituted as a public space in which messages flow to and from individuals wanting to communicate about a topic. It opens up the public space that has been dominated by television and some newspapers for fifty years to individuals interested in having their say and attending to the say of others.

Conclusion

This is the conclusion of the earlier paper. It is the conclusion of this paper as well.

As political scientists what we have to see in this messaging is an augmented public domain. The public domain has been monopolized by the few -- the media and whoever they were willing to attend to -- because of technology. There are still countries, China and Iran are two good illustrations, that want to close down any opening of a public domain to citizens. But what has been opened is very unlikely to be closed. 'Leaders' are going to have to move over as the not-leaders enter the public domain in ways that were not possible in the past. This public domain is even messier than the domain dominated by TV in the US, which is itself taking on a very different form than the practices of the past. If you like neat you are not going to like this new world very much [Boynton, April 22, 2010].

What I did in this paper that adds to this conclusion is to look specifically at two cases of the opening up of the public domain. They were chosen because I was confident they would be dramatically different in structure. Though in different ways, each is an opening of the public domain. Times Square is a domain of pointing to and commenting on incidents of great interest as they happen. News media have had a monopoly on pointing to and commenting on, but that monopoly is broken as individuals add their voices to the voices of the news media. #teaparty is a domain of public discussion in which the interpretation of the events of the day is argued out with agreement and disagreement rampant in the discussion. And when it comes to 'not neat' it is a domain filled with very different views of the facts as well as interpretation of events.

References

L. A. Adamic and N. Glance, 'The Political Blogosphere and the 2004 U.S. Election: Divided They Blog', Annual Workshop on the Weblogging Ecosystem, WWW2005, Japan, 2005.

Danah Boyd, Scott Golder, Gilad Lotan, "Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter," 43rd Hawaii International Conference on System Sciences (HICSS), pp. 1-10, 2010.

G. R. Boynton [March 16, 2010] Sarah Palin did what? The Importance of Redundancy

G. R. Boynton [April 22, 2010] Politics Moves to Twitter: How Big is Big and Other Such Distributions, paper presented at the annual meeting of the Midwest Political Science Association.

Daniel Gayo-Avello (2010) Nepotistic Relationships in Twitter and their Impact on Rank Prestige Algorithms, arXiv:1004.0816v1

Wikipedia [May 12, 2010] Tea Party Movement, but subject to change as all Wikipedia entries are.

© G. R. Boynton, 2010
May 13, 2010