The Reach of Politics via Twitter -- Can that be real?

G. R. Boynton, Andrew Bates, Edward Bettis, Matthew Bopes, Richard Brandt, Derek Fohrman, Jeremy Hahn, Tressa Hart, Caleb Headley, Jory Kopish, Robert Maharry, Joseph Matson, Kierstin Mohoff, Rose Mraz, Matthew Palmer, Laur Pena, Brittany Phillips, Anne Rhodes, Hanna Rosman, Clint Sievers, Daniel Tate, Sean Tyrrell, Javin Villarreal, Philip Wiese, Alden Wignall

We know that Twitter has become an important venue for political communication. For example, the first presidential debate of 2012 produced 10.3 million messages posted to Twitter (Twitter Blog, 10/04/2012). That was 77% of the online communication about the debate. (Fitzpatrick, 10/04/2012) Not only is it the venue of choice for political communication, it is also a particularly good venue for trolling for campaign contributions. Twitter users who see political ads are 97% more likely to visit a campaign donation page. (Dugan, 10/12/2012) If that is not enough there is also the two-step flow of communication called following.

Two numbers set the stage.

During ten days of March 2012, Twitter messages posted by ThinkProgress could have been viewed 47 million times. (See Boynton, 5/17/2012 for the details.) That was 4.7 million potential views a day. Compare 4.7 million to 2.8 million, which is the audience for The O'Reilly Factor on Fox News, and 2.3 million, which is the total viewership of The Daily Show with Jon Stewart -- the two TV personalities with the largest audiences.

At the Republican Convention Clint Eastwood spoke, addressing most of his remarks to an empty chair, the chair from which the president was missing. That was followed by the Obama campaign's Twitter quip "This seat's taken." According to Twitter, "This seat's taken" was the most retweeted Twitter message of the convention. A rough estimate of the reach of that tweet was 68 million. (See Boynton, 9/6/2012 for the details.) Nielsen reported that the audience for the Romney speech that evening was 30.3 million. (Fouhy, 9/5/2012)

The viewership of tweets by ThinkProgress was twice that of O'Reilly and Jon Stewart. "This seat's taken" was seen by twice the number of people who watched the Romney speech. These numbers challenge our tacit assumptions about mass media and Twitter. Yes, Twitter is 400 million messages a day, but it must still take a 'back seat' to the mass medium known as television. That is the general assumption.

I could pile up more very big numbers, but in this report I will look at two challenges to the big numbers, challenges that say there is something badly misleading about them.

Estimates of Twitter use

Pew provides an estimate of Twitter use in the U.S. based on their surveys. The most recent report was for users as of February of 2012. (Smith, Brenner 5/31/2012) They estimate that 15% of online adults use Twitter, and that 80% of adults are internet users. To compare this with the numbers above the percentages have to be converted into numbers.

US population 18+ = 308,155,000

80% on line = 246,524,000

15% of online use Twitter = 36,978,600

37 million is fewer than 47 million, and considerably fewer than 68 million, which raises the question about the counts of Twitter reach.
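The conversion from Pew's percentages to a count can be checked with a few lines of Python. This is only a sketch of the arithmetic above. The population base and percentages are the ones quoted in the text, and the function and variable names are illustrative assumptions rather than anything from Pew.

```python
# Figures are those quoted in the text; names are illustrative.
POPULATION_BASE = 308_155_000      # population figure used in the text
PERCENT_ONLINE = 80                # Pew: share of adults who use the internet
PERCENT_ONLINE_ON_TWITTER = 15     # Pew: share of online adults who use Twitter


def pew_twitter_estimate(population, pct_online, pct_twitter):
    """Return (people online, Twitter users) using integer percent arithmetic."""
    online = population * pct_online // 100
    twitter_users = online * pct_twitter // 100
    return online, twitter_users


online, twitter_users = pew_twitter_estimate(
    POPULATION_BASE, PERCENT_ONLINE, PERCENT_ONLINE_ON_TWITTER
)
print(f"Online: {online:,}")
print(f"Twitter users: {twitter_users:,}")
```

Integer percentages are used so the result matches the figures in the text exactly, without floating-point rounding.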

However, there is another way to estimate Twitter use in the U.S. Semiocast, a French social media analysis company, estimates that the number of Twitter accounts in the U.S. grew from 108 million at the beginning of 2012 to 142 million by the end of June, so that as of July 1, 2012 there were more than 140 million. (Semiocast, 7/30/2012) They used a combination of checking profiles and a sample of 1 billion tweets in June to make their estimate.

The estimates are wildly different. There are three things to say about the difference. First, they are counting different users. Pew estimates number of persons with a Twitter account. Semiocast estimates number of accounts. There are many accounts that are institutional in addition to accounts of individuals. And institutional accounts are important in political communication via Twitter. The New York Times has 6.1 million followers and 90,046 tweets as of September 18, 2012, for example. Their tweets are not all about politics, but many are. And The New York Times is only one. There are an uncounted number of institutional accounts that are doing political communication via Twitter.

Second, they are also wildly different in their estimates of growth of Twitter use. Pew estimates that the percentage of online users with Twitter accounts grew from 13% in 2011 to 15% in 2012. That seems very short of the explosive growth found by almost all other counts. Semiocast, for example, finds much greater growth in the first half of 2012 alone -- 142 million is 1.31 times 108 million. The Pew growth rate seems well below what others have found.

Third, both methods used for making estimates have flaws. Semiocast based their estimate on examining 500 million profiles on Twitter. But most profiles do not include location. With the 1 billion sample of Twitter messages they can get time zones. They can check on language. But they are constructing an estimate from shards of evidence. It is not a count. Pew surveys have something of the same problem. Pew has now announced that the response rate for their surveys is 9%. (Pew Research Center, 5/15/2012) That does not mean we should ignore their results. It does mean that, just as with the Semiocast estimates, one would also want other evidence to supplement the Pew survey results.

Where does that leave us? I suggest a safe position is to assume that the number of Twitter accounts being used for political communication is somewhere between the Pew estimate and the Semiocast estimate. That leaves room for some very big numbers.

Fake Accounts

We all know there are many fake accounts on Twitter. We do know that. We just do not know how many nor how many show up in communication about politics. So, the existence of an unknown number of fake accounts is a challenge to any count of Twitter messages about politics. How many are fake? How many are not?

In July, 2012 Status People, a small British firm, made available a tool that estimates the number of fake accounts on the basis of the activity associated with the account. (Status People, 8/22/2012) They draw a sample of 1,000 accounts from the most recent 100,000 accounts that have added themselves as followers. "On a very basic level spam accounts tend to have few or no followers and few or no tweets. But in contrast they tend to follow a lot of other accounts." This, plus some other adjustments, is their specification of fake accounts. They do not specify how they determine inactive accounts, but they would be accounts that do not fall into the definition of fake but have little activity. Good accounts follow, have followers, and post messages. Most of the published reports on the use of this tool have looked at stars: Lady Gaga, Obama and the rest. My favorite is from the Irregular Times that checked on the U.S. presidential candidates. (Cook, 8/27/2012) He found that the presidential candidate with the fewest fake accounts was Jill Stein, the Green Party candidate, who had 2% fake accounts, 22% inactive, and 76% good. And 28% of the followers of President Obama are fake. Even in this fashion he is 'ahead' of his challenger since only 16% of Romney's followers are fake. Of course the president has 20 million followers as of October 2, 2012, and growing every day, compared with 1 million for Romney.

To estimate the extent of fake followers in political communication using Twitter two different data collections are employed. One collection involved a high profile political event. Twitter messages during the evening of the Romney acceptance speech were generously gathered for me by Mike Jensen. The entire political media was focused on this event. It is very different from the other collection. The other analysis is based on eight searches with a sampling of messages every day for two months. The eight searches cover a diverse array of subjects, but none is like the high profile event of the evening of the Republican National Convention.

Fake Accounts at the Republican Convention

The evening Romney gave his acceptance speech to the Republican Convention Mike Jensen collected a sample of Twitter messages that mentioned Romney, in one or more forms, and Obama in one or more forms. He accessed the Twitter streaming API and collected a sample of 591,462 tweets. This is campaign communication at a specific point in the campaign. It is not simply communication about the Republican Convention, which is evident since 361,507 messages posted to Twitter mentioned Romney and 309,395 mentioned Obama. A subset mentioned both since the total exceeds the number in the sample. Along with other information the number of followers of the person posting each tweet was collected.
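The claim that some messages mentioned both candidates follows from simple counting: if the two mention counts add up to more than the sample, at least the excess must mention both. A minimal sketch of that check, using the counts given above (the function name is an illustrative assumption):

```python
# Counts are those reported in the text for the convention-night sample.
SAMPLE_SIZE = 591_462
MENTION_ROMNEY = 361_507
MENTION_OBAMA = 309_395


def minimum_overlap(count_a, count_b, total):
    """Smallest possible number of items counted in both groups.

    If the two counts sum to more than the total, the excess must have
    been counted twice (inclusion-exclusion).
    """
    return max(0, count_a + count_b - total)


overlap = minimum_overlap(MENTION_ROMNEY, MENTION_OBAMA, SAMPLE_SIZE)
print(f"At least {overlap:,} messages mentioned both candidates")
```

If every message in the sample mentioned at least one of the two candidates, this lower bound is the exact size of the overlap.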

There is a tremendous range in number of followers. The fewest are zero since a person without followers can still post a message. At the top is President Obama with 19,068,078, as of that evening. There are two ways to get a sense of the range. One is to look at the people at the very top.

Follower Count for Top Twitter Accounts

Account                             Number of Followers
Obama - barackobama                 19,068,078
CNN Breaking News - cnnbrk          8,619,387
New York Times - nytimes            6,003,374
CNN - cnn                           5,778,905
Perez Hilton - PerezHilton          5,501,293
Breaking News - breakingnews        4,555,200
will.i.am - iamwill                 4,222,143
Eva Longoria - evalongoria          4,158,080
Time.com - time                     3,831,731
Peter Cashmore - mashable           2,987,684
Anderson Cooper - andersoncooper    2,962,324
Total                               67,688,199

Two features of the table are noteworthy. One, these are user accounts with a great many followers -- millions all. Two, with the exception of Obama, Perez Hilton, will.i.am and Eva Longoria they are accounts from the news media. Peter Cashmore and Anderson Cooper are the two news persons on the list, and the rest are news organizations that have Twitter accounts.

These are at the top of a tremendously skewed distribution. Neither a mean nor a figure is much help in grasping the distribution. One way to get a better feel for the distribution is to use quintiles. The Twitter messages were sorted by number of followers and then divided into quintiles. The mean number of followers for each quintile is presented in the next table.

Romney night mean followers by quintile

Messages (ranked, fewest followers first)    Mean followers
0 to 120,000                                 32
120,001 to 240,000                           80
240,001 to 360,000                           300
360,001 to 480,000                           733
480,001 to 591,464                           16,227

Among the 120,000 with the fewest followers 3,168 had zero followers. The average number of followers for that fifth of the Twitter messages was 32. There is a gradual increase in the mean from 32 to 80 to 300 to 733. Then there is a huge jump to a mean of 16,227. Since the quintile with the highest mean contains all of the accounts with multiple million followers it is clear that the distribution in the top quintile is itself severely skewed.

What about fake followers? I examined fake followers in two sub-distributions. The top 200 Twitter messages were examined for fake followers. And 100 user accounts with approximately 1,000 followers each were examined. You give the information to the Status People software one account at a time. That makes a complete inventory of 590 thousand accounts infeasible.

The top 200 tweets were chosen because these are the messages that are going out extremely widely. If you sum all followers for all Twitter messages and compare this with the sum for the top 200, the top 200 messages have 32% of the total followers for the entire sample. What happens with these Twitter posts makes a very big difference in the flow of messages on this occasion. The 200 messages were posted by 56 user accounts, which is an average of just under four posts for each during the evening. Forty of the accounts are related to the news industry, either as an institutional account or as an account of a person known for working in the news industry.
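The share of followers held by the top messages is the sum of the largest follower counts divided by the sum of all follower counts. The sketch below shows that calculation on a tiny made-up list. The numbers in `example_counts` are hypothetical and are not drawn from the convention sample, and the function name is an illustrative assumption.

```python
def top_k_share(follower_counts, k):
    """Fraction of all followers attached to the k messages with the most followers."""
    ranked = sorted(follower_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:k]) / total


# Hypothetical follower counts for seven messages, for illustration only.
example_counts = [1000, 500, 300, 100, 50, 30, 20]
share = top_k_share(example_counts, 2)
print(f"Top 2 messages hold {share:.0%} of followers")
```

Applied to the full sample with k = 200, the same calculation would give the share reported in the text.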

What could one expect for the Twitter messages produced by these high visibility accounts? First, these are the kind of accounts bots attach followers to. Being associated with people who are well known seems a road to respectability in a world where there is otherwise very little information. So, the expectation is that there would be a high level of fake followers for these accounts. Second, this is the best of the news industry. Many Twitter users are likely to follow these sources in much the same way they 'follow' them on TV. With TV they watch but do not respond. With Twitter, reading and not responding should be a big element of their large following. So, one would expect inactive followers to be as high or higher than fake followers.

Top 200 Twitter Messages Romney Night

                   Fake           Inactive       Good
Top 200            20.75%         43.93%         36.32%
Number messages    130,253,663    275,761,129    227,990,990

Twenty percent of the followers of these messages are fake followers. The percentage of inactive followers is 44%. In writing about the followers of 'stars' authors have frequently come close to characterizing the inactive followers as also not real; accounts that were opened and then the persons never returned. But that seems a somewhat less likely interpretation in this case. We know that many people use Twitter as a source of very fast news. (Pew, 9/27/2012) Since these 200 messages are largely from sources of the news media it does not seem unexpected that 44% of the followers are inactive. This interpretation is also consistent with the report of the CEO of Twitter. Dick Costolo announced that 40% of the people with Twitter accounts do not tweet, but use Twitter to follow others. (Long, 10/12/2012) And 36% of the followers are good; they follow, are followed and post messages to Twitter. The raw numbers are taken from a sample. Comparing the size of this sample with the report about number of messages from Twitter this looks like roughly a one-third sample. (Boynton, 9/20/2012) In this sample the 500 million that are inactive or good is a very big communication stream. It is not the number of people receiving a message, but it is a measure of the flow of communication that evening. Messages were going out at a very high volume to followers of the high profile user accounts.

User accounts with 1,000 followers are not Twitter stars. Unlike the previous analysis these are user accounts, and the numbers refer to accounts. While they are not stars they are much more active on Twitter than usual. In terms of numbers of followers they are at the top of the fourth quintile. So they have more followers than about 70% of Twitter messages.

What should one expect for these accounts? They are not famous enough to attract bots and other devices for assigning them fake followers. So fake followers should be low. If we understand the Twitter experience of these accounts as primarily news reporting then we should expect a large number of inactive followers as with the stars. If we think of their Twitter experience as very active participation in communication with both reading and writing then the expectation would be for fewer inactive followers and more active followers.

Accounts with 1,000 Followers Twitter Messages

                   Fake      Inactive    Good
Accounts 1,000     5.84%     13.15%      82.66%

The percentage of fake followers drops from 21% to 6%. The percentage of inactive followers drops from 44% to 13%. And the percentage of good accounts increases from 36% to 83%. These are very big changes as you move from the star accounts to people of middle range activity. They have many fewer fake followers and far more good accounts following them. Listening and having something to say is the way to attract followers who also want to listen and have something to say. The user names suggest that these are primarily individuals rather than institutions. However, the task is building a basis for estimating the incidence of fake accounts in political communication using Twitter, so investigating these accounts further would sidetrack the primary task.

What does this analysis suggest about fake followers in political communication? This was a special event and they do not happen very often in politics. The next events like this in the presidential campaign will be the debates between Obama and Romney. This kind of spike occurred with all of the debates in the Republican nomination campaign. (Boynton, 3/23/2012) Many more people are following and commenting on politics when such an event occurs. And the 'leading figures' are out. The number of fake followers is certainly high for the stars, but one could imagine it higher than 21%. Losing one-fifth reduces the estimate of the reach of these messages, but that is only for the leading figures. Once down to the activist level the reduction of reach has fallen to about six percent. In this case it is a 21% loss for one third of the messages and a good deal smaller loss for the other two-thirds.

Ongoing Streams of Communication

The next analysis is based on ongoing streams collected by TweetTronics during July and August of 2012. They take a small sample of messages every day, but over two months the collections become rather large. The search terms for the eight streams are: barackobama, Romney, Palin, #P2, #Teaparty, RT @ThinkProgress, RT @nytimes, RT @MoveOn. The choice of streams was intended to produce a diverse set. Three are politicians: barackobama, which is the official user name on the Twitter account, Romney, and Palin who still has a substantial following using Twitter. #P2 is the progressive hashtag and #Teaparty is a hashtag for that movement. These are two political 'dispositions,' which are different streams than the ones about the politicians. Finally, there are the Twitter messages retweeting three different 'political media' organizations: ThinkProgress 'runs' on social media for liberal causes. MoveOn has been a very successful liberal organization effectively using email. And then the New York Times. If general statements are appropriate across this diverse set of collections that would be impressive evidence for the general statements.

The analyses of the 200 user accounts with the most followers for each of the eight streams were done by students who share authorship. I added the counts for the 100 user accounts that had approximately 1,000 followers.

The collections are samples, so the raw numbers are very partial totals. However, comparisons between the collections do give an indication of their relative size.

Stream               Total user accounts    Total followers    Average followers
barackobama          149,889                249,924,216        1,667
RT @nytimes          134,339                186,646,099        1,389
romney               122,081                318,012,582        2,604
palin                119,236                201,333,569        1,691
RT @thinkprogress    51,615                 51,445,535         996
#p2                  39,948                 52,668,719         1,318
#teaparty            28,249                 39,232,240         1,389
RT @moveon           8,585                  11,928,535         1,389

The search for Twitter messages containing barackobama found the largest number of individual user accounts, at 149,889, with retweets of nytimes messages and the searches for Romney and for Palin close behind. Many fewer user accounts were involved in the messages that retweeted ThinkProgress messages and in the searches for #p2 and #teaparty, and even farther down were retweets of MoveOn messages with only 8,585 user accounts. The total followers is the sum of followers of each of the user accounts that posted a message to Twitter mentioning the search term. While there are substantial differences in the number of user accounts, there is much less variation in the average number of followers per user account. The user accounts found in the search for Romney had substantially more than any of the others with 2,604 followers per account. Then there were two in the 1,600s, barackobama and palin. The rest were in the 1,300s, except for the user accounts that retweeted ThinkProgress messages, with an average of 996 followers.
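The average followers column is simply total followers divided by total user accounts. The sketch below recomputes it from the totals in the table. The dictionary layout and names are illustrative assumptions, and the published averages are included only for comparison.

```python
# (total user accounts, total followers, average as printed in the table)
STREAMS = {
    "barackobama": (149_889, 249_924_216, 1667),
    "RT @nytimes": (134_339, 186_646_099, 1389),
    "romney": (122_081, 318_012_582, 2604),
    "palin": (119_236, 201_333_569, 1691),
    "RT @thinkprogress": (51_615, 51_445_535, 996),
    "#p2": (39_948, 52_668_719, 1318),
    "#teaparty": (28_249, 39_232_240, 1389),
    "RT @moveon": (8_585, 11_928_535, 1389),
}

# Recompute followers per account for each stream.
recomputed = {
    name: followers / accounts
    for name, (accounts, followers, _printed) in STREAMS.items()
}

for name, (_accounts, _followers, printed) in STREAMS.items():
    print(f"{name:<18} recomputed {recomputed[name]:8.1f}   table {printed:>5,}")
```

The recomputed values agree with the printed averages to within about three followers per account.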

Not only is there variation in the focus of the streams of communication, there is also considerable variation in the size of the streams. It is variation across two important dimensions for characterizing streams of communication.

The number of followers per user account is highly skewed in each of these distributions.

Stream               Followers of top 200
barackobama          53%
Romney               52%
RT @moveon           51%
Palin                47%
RT @nytimes          40%
#p2                  38%
#teaparty            38%
RT @thinkprogress    38%

The table gives the percentage of all followers in each stream who follow the top 200 accounts. The extremes in the table are readily explicable. Obama and Romney streams are most likely to include messages from the major news organizations such as CNN, CBSNews, BBCWorld, and others with large followings. But they are also more likely to be mentioned by celebrities such as PerezHilton, EvaLongoria and DanielTosh, who also have very large followings. #p2, #teaparty, and thinkprogress are fringe elements of US politics. Hence they are less likely to have posts from Obama or major news organizations or major celebrities. But even for the 'fringe' streams the top 200 still get 38% of all followers of posts in those streams.

The challenge is interpreting the very large number of recipients of Twitter messages about politics from these streams. To what extent are those numbers produced by large number of fake accounts? The table shows the number of fake accounts for each of the streams of messages.

Fake Accounts

Stream               Top 200    1,000
barackobama          26%        2%
Romney               23%        3%
RT @moveon           5%         2%
Palin                19%        2%
RT @nytimes          28%        2%
#p2                  16%        1%
#teaparty            16%        1%
RT @thinkprogress    18%        2%

The three that are over 20% are RT @nytimes, with 28% fakes, barackobama, with 26% fakes, and Romney, with 23% fakes. These are the most prominent subjects of the eight. They are the ones most likely to have been mentioned by user accounts that attract bots that set up fake accounts. It is easy to illustrate how this happens. The table below gives the number of followers and the percentage of fake followers for the 10 user accounts with the most followers that mentioned Obama in a tweet. BarackObama is clearly at the top with 19 million followers, and that was to increase substantially during the election campaign.

User Account       Followers     Fake Followers
BarackObama        19,027,066    36%
danieltosh         6,216,847     18%
CNN                5,657,090     39%
PerezHilton        5,358,086     22%
EvaLongoria        4,101,633     29%
BBCBreaking        3,855,283     27%
SarahKSilverman    3,126,763     14%
mashable           2,981,526     9%
CNNEE              2,972,160     27%
carmeloanthony     2,450,160     15%

obama2012          190,529       7%

Like Obama, the rest are stars. CNN, BBCBreaking, CNNEE, and mashable are media 'stars.' The rest are entertainment stars, including Carmelo Anthony, an NBA entertainment star. Stars attract fake accounts.

The other way to illustrate this is by comparing the number of followers and the percentage of followers that are fake for BarackObama and Obama2012. Nineteen million followers for barackobama is accompanied by 36% fake followers. One hundred and ninety thousand for obama2012 attracts only 7% fake accounts.

The streams mentioning the less famous -- Palin, #p2, #teaparty and RT @thinkprogress -- are included in Twitter messages with fewer fake followers. For them it is between 16% and 19%. So fake followers are still a big part of the reach of messages mentioning them. But it is a smaller part than for the stars. RT @moveon is the interesting deviant case. One might suggest that this is what little social media fame gets you. They have, by far, the fewest user accounts retweeting their messages, and those accounts have, by far, the lowest percentage of fake followers.

The contrast between the top 200 accounts and the accounts that have approximately 1,000 followers is striking. These accounts, drawn from the same streams, have one or two percent fakes. Only the accounts from the Romney stream have as many as three percent fakes. This is very similar to the findings from the Romney speech. By the time you get to accounts with a thousand followers you are in a different world from the stars. They have no attraction for fakes.

What about good accounts; the accounts that follow, are followed and post messages? The table gives the percentage of followers receiving messages mentioning each of the eight subjects that are good accounts. It gives the percentages for the top 200 accounts mentioning each of the eight and for 100 for each subject with approximately 1,000 followers.

Good Accounts

Stream         Top 200    1,000
barackobama    44%        87%
Romney         38%        83%
Palin          44%        86%
RT @nytimes    40%        85%
#p2            59%        93%
#teaparty      66%        94%

The pattern is straightforward. The 'stars' -- barackobama, Romney, Palin and RT @nytimes -- are mentioned in Twitter messages that go to only about 40% good user accounts. That is balanced by roughly 25% of user accounts receiving messages mentioning them that are fake accounts. That leaves roughly 30% as inactive accounts, which is in the neighborhood of what Costolo tells us we should expect. The 'fringe' -- #p2, #teaparty, and RT @thinkprogress -- are mentioned in messages that go to followers who are considerably more likely to be good accounts. The messages mentioning #teaparty go to a high of 66% good accounts.

The user accounts with approximately 1,000 followers have almost all good followers. The accounts mentioning the stars have followers who are good accounts in the 80% range. The fringe have followers who are 90% good. They have almost no fake followers. They are overwhelmingly connected with other good user accounts.

Conclusion

The reason to examine the validity of the big numbers is that they portend a dramatic reorganization of the public domain. We are moving from a broadcast-audience public domain to a much more elaborate co-motion in which broadcast media and social media interact in very new ways. (Boynton, 10/15/2012) If the numbers are as big as they appear to be then 'public' is going to become something new. One writer quipped that post-debate TV anchors are now basically for telling people what was said on Twitter during the debate. But people are not going to need this service much longer because they are engaged in the communication during the debate. They will not need someone to tell them what they have seen first hand.

So, what should be the conclusion?

What about the number of accounts? There are two estimates: Pew estimates 37 million and Semiocast estimates 142 million. Both are rough approximations. Since it is easy to find followers of messages that add up to numbers in between the two, a figure in between seems a safe estimate.

What about fake accounts? For high profile communication it seems reasonable to estimate that roughly 50% of the total number of followers are following the top 200 users who post to Twitter, and half are following people with less stardom. The top 200 seem to have about 25% fake followers and 30% inactive followers. But the rest seem to have very different numbers. They are followed by very few fake accounts, and their followers are overwhelmingly good, or active, user accounts. If you count only the fake followers of the top 200 then the total would be reduced by 12.5%. That could be increased to 15% or slightly more due to the fake followers of the not top 200. It looks like no more than 20% fake followers in those large numbers. So you could reduce the number who saw "This seat's taken" from 68 million to 55 million. That is still substantially more than saw the speech on TV. RT @thinkprogress is mentioned in messages by user accounts that have 18% fake followers. That would suggest 3.9 million receiving retweets of ThinkProgress tweets instead of 4.7 million. But that is still greater than the leading lights on TV.
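The arithmetic behind these adjustments is short enough to write out. The sketch below uses only the percentages and reach figures stated in this report, and the function and variable names are illustrative assumptions.

```python
def discounted_reach(reach, fake_rate):
    """Reach remaining after removing the estimated share of fake followers."""
    return reach * (1 - fake_rate)


# Share of all followers attached to the top 200, and their fake rate.
TOP_200_SHARE = 0.50
TOP_200_FAKE_RATE = 0.25

# Fake followers of the top 200 as a share of all followers.
overall_from_top_200 = TOP_200_SHARE * TOP_200_FAKE_RATE

# "This seat's taken": 68 million, discounted by the 20% upper bound.
seat_taken = discounted_reach(68_000_000, 0.20)

# ThinkProgress: 4.7 million potential views a day, 18% fake followers.
thinkprogress = discounted_reach(4_700_000, 0.18)

print(f"Fake share from top 200 alone: {overall_from_top_200:.1%}")
print(f"'This seat's taken' after discount: {seat_taken / 1e6:.1f} million")
print(f"ThinkProgress after discount: {thinkprogress / 1e6:.2f} million")
```

Discounting by the full 20% gives a little over 54 million for "This seat's taken." Because 20% is treated as an upper bound, the roughly 55 million used above corresponds to a slightly smaller discount.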

The principal conclusion is that fake accounts are found in communication about politics. They exaggerate the number of recipients of Twitter messages because fake accounts do not read. Only the not-fakes read. However, it appears that the most one would expect is about 20%. That is a substantial number, but it is not so large that the large number of recipients of Twitter messages now looks small. The public domain is becoming something new.

References

Boynton, G. R. (10/15/2012) Voice and the Reconstruction of the Public Domain

Boynton, G. R. (9/20/2012) Really big Twitter numbers: the flow of commotion/communication

Boynton, G. R. (9/6/2012) The reach of "This seat's taken"

Boynton, G. R. (5/17/2012) Political Organizations and the Magic of Retweeting

Boynton, G. R. (3/23/2012) Triggers and Surges

Cook, Jim (8/27/2012) Jill Stein is the Presidential Candidate with the Fewest Fake Followers on Twitter, Irregular Times

Dugan, Lauren (10/12/2012) Twitter Users Who See Political Ads Are 97% More Likely To Visit a Campaign Donation Page, AllTwitter

Fitzpatrick, Alex (10/04/2012) Twitter Dominated Online Chatter About the Presidential Debate, Mashable

Fouhy, Beth (9/5/2012) Republican Convention Ratings Plummet from 2008, Huff Post Media

Long, Mary C. (10/12/2012) Twitter CEO Dispels Biggest Misconception About Twitter And About Fake Followers, AllTwitter

Pew Research Center for the People & the Press (9/27/2012) In Changing News Landscape, Even Television is Vulnerable

Pew Research Center (5/15/2012) Assessing the Representativeness of Public Opinion Surveys

Politicususa (6/4/2011) Jon Stewart's Ratings Are Now Higher Than All Of Fox News

Semiocast (7/30/2012) Twitter reaches half a billion accounts; More than 140 millions in the U.S.

Smith, Aaron and Joanna Brenner (5/31/2012) Twitter Use 2012, Pew Research Center

Status People (8/22/2012) Find Out More

Twitter Blog (10/04/2012) Dispatch from the Denver debate

© G. R. Boynton, October 25, 2012