Getting historical Twitter data since the beginning of the social network is a complex, yet not impossible task. Several options are available, some of them offering just the raw data, others offering important Twitter analytics about all the historical information collected.
So, check this article out to get all the data you need about tweets from the past and how to best access them.
- What Is Twitter Historical Data
- Twitter Historical Data - Why Is It So Important?
- Does Twitter Save Its History?
- Buying Twitter Data
- How to Get Access to Twitter Search History Since 2006
- How to Get Historical Twitter Data?
- The Easiest Way to Get Access to Old Tweets and Twitter Historical Data
- Twitter Historical Data Case Study
- The Twitter Metrics to Track within a Twitter Historical Report
- What Historical Twitter Data Do You Need?
- Is Twitter Still A Relevant Social Media Platform?
- Conclusions - Twitter Historical Data Without Limits
What Is Twitter Historical Data
Twitter historical data includes all the tweets and retweets that have ever been published, together with valuable insights about them. While Twitter offers APIs that can help you get all the historical tweets in raw data, tools like BrandMentions offer you not only all the tweets ever published but also metrics and insights about them.
According to a recent statistic, roughly half a billion tweets are broadcast every day. That sure is a lot. But should we care? Very much so. Not just because of the sheer volume of information published on Twitter, but also because of the cultural, economic, and social impact it has had since its creation.
Usually, tweets show up in reverse chronology, with the newest tweets at the top of your Twitter feed, and the older ones pushed towards the bottom.
Everything is about being in the now, which has made Twitter not only a hugely influential source of news but has often proven to be the tool of choice of newsmakers themselves.
Twitter Historical Data - Why Is It So Important?
For many agencies and experts, analyzing historical data from Twitter (and other social networks) has become very important in understanding the markets and making decisions based on it.
So, what kind of insight can you get out of historical data? Here are a few possibilities:
- you can get a better understanding of how Twitter and its algorithm work and what can be effective (or not) as a marketing strategy;
- you can find examples of successful or viral campaigns from the past that you can use as inspiration for your own;
- you can understand the evolution of a campaign or company in a wider context;
- you can study communities or networks;
- you can identify the various levels of influence of tweets or people;
- officials can use it as an alternative means of communication and information during natural disasters;
- researchers can study how political or social trends develop and change over time, etc.
It’s not just news that can benefit from Twitter, though. It can also be useful for marketing and PR purposes, giving your audience valuable content before they even become customers. The character limit can also be a blessing in disguise, as it forces you to create short and memorable ads, like a shout-out to a webinar your business is conducting, or a free e-book.
Twitter can also help you quickly figure out what your client's competitors are discussing, to ensure your client is up to date on industry trends, and participating in the larger conversation.
Does Twitter Save Its History?
According to Twitter’s own blog, Twitter data gets stored on an intricate infrastructure made up of:
- Apache Hadoop
- Manhattan (the backend for Tweets, Direct Messages, Twitter accounts)
- Graph and Flock (for graphs)
- Blobstore (for images, video, and large files)
- Redis and Twemcache (for caching users, timelines, and tweets)
- MySQL/PostgreSQL (for managing the advertising side)
The simple, straightforward approach allows users to post a lot more often than they would on other platforms, and Twitter has grown exponentially since its launch in 2006. Just imagine at what speed information is shared over Twitter!
Tweets and hashtags become trends every day and the amount of information generated is staggering. Which might make you wonder: just where does all the data go?
This is all to say that while the tool itself offers a seamless (and some would say addictive) experience, there’s a lot of work behind it.
For research purposes, getting your hands on large chunks of historical data doesn’t come so easy.
Buying Twitter Data
The easiest, most carefree way to get Twitter data in raw format, with analytics alongside, is to use a dedicated platform like BrandMentions. This way, you can access the full data and easily understand it, regardless of whether you're a developer, a digital marketer or a business owner.
If you are technically savvy and know a thing or two about extracting data via APIs, you can choose to retrieve data from the Twitter public API. API is short for “Application Programming Interface” and, in this case, is a way for your software to access the Twitter platform (as opposed to the Twitter website, which is how humans access Twitter).
For academic research mostly, you can also try to find a dataset that has already been collected and satisfies your research requirements.
You can also purchase historical Twitter data directly from Twitter, using the Historical PowerTrack enterprise product. The cost depends on both the length of the time period and the number of tweets. Often, the cost is driven by the length of the time period, so shorter periods might be more affordable. If you are interested in years of historical data, costs will increase accordingly.
Therefore, when it comes to getting old tweets data and insights, BrandMentions is probably the most cost-efficient platform for absolutely anyone. If you're tech savvy with some programming skills, you can try out Twitter's APIs as well.
How to Get Access to Twitter Search History Since 2006
If your interest is in very specific instances, Twitter made its search available to everyone years ago. You can search the tweet archive easily and for free by using the advanced operators “since:” and “until:” in the search box, with dates in the YYYY-MM-DD format.
You can also combine both operators and use them together with hashtags. So, for example, if you wanted to find tweets tagged #pandemic in the years before the start of the current COVID one, you could use Twitter advanced search and type something to the effect of: #pandemic since:2015-01-01 until:2019-08-31
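If you run many such searches, it can help to assemble the query string programmatically. The snippet below is only a minimal sketch: the helper name `build_search_query` is made up for illustration, and the only thing it relies on from Twitter is the `since:`/`until:` operator syntax described above.

```python
from datetime import date
from typing import Optional


def build_search_query(term: str,
                       since: Optional[date] = None,
                       until: Optional[date] = None) -> str:
    """Build an advanced-search string such as '#tag since:YYYY-MM-DD until:YYYY-MM-DD'.

    Hypothetical helper for illustration; it only formats text and
    does not contact Twitter in any way.
    """
    if since is not None and until is not None and since > until:
        raise ValueError("'since' must not be later than 'until'")

    parts = [term]
    if since is not None:
        parts.append(f"since:{since.isoformat()}")  # isoformat() yields YYYY-MM-DD
    if until is not None:
        parts.append(f"until:{until.isoformat()}")
    return " ".join(parts)


query = build_search_query("#pandemic", date(2015, 1, 1), date(2019, 8, 31))
print(query)  # #pandemic since:2015-01-01 until:2019-08-31
```

The resulting string can be pasted straight into the Twitter search box.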
But what if you are interested in a significantly more popular tag or topic and wanted more flexibility in handling the resulting data?
Luckily, there are several options nowadays, both free and paid. Options from Twitter itself include Historical PowerTrack APIs and Full-Archive Search, 2 Twitter APIs which provide access to any publicly available Tweet, starting with the very first Tweet that went out, in March 2006. Both products show historical data and scan the full archive of tweets, and generate a set of tweets matching your query.
However, the two search APIs are built on different architectures, which leads to significant differences in their results. So, keep on reading to better understand the Twitter data extraction process.
How to Get Historical Twitter Data?
If your interest is in historical data on a larger scale, there are nowadays a variety of options. Twitter itself offers two alternatives: Historical PowerTrack and Full-Archive Search.
Historical PowerTrack (HPT) works on a very large scale. It allows for up to 1,000 rules and generates a separate file for each 10-minute interval (assuming there is at least one tweet of interest in that time interval). This means, in effect, that a single day’s worth of tweets could yield up to 144 separate files. If you’re looking at an entire year, you’re potentially looking at more than 50,000 files. Understandably then, jobs can take hours or even days to generate (depending on the length of the time interval you look at).
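To get a feel for those numbers before launching a job, you can estimate the upper bound on output files yourself. This is a back-of-the-envelope sketch based only on the 10-minute interval mentioned above; the function name `max_hpt_files` is hypothetical, and the real count will be lower whenever an interval contains no matching tweets.

```python
from datetime import date

MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 10  # one file per 10-minute interval, as described above


def max_hpt_files(start: date, end: date) -> int:
    """Upper bound on the number of files for the days start..end (inclusive)."""
    if end < start:
        raise ValueError("'end' must not be earlier than 'start'")
    days = (end - start).days + 1
    return days * (MINUTES_PER_DAY // INTERVAL_MINUTES)


print(max_hpt_files(date(2019, 1, 1), date(2019, 1, 1)))    # 144
print(max_hpt_files(date(2019, 1, 1), date(2019, 12, 31)))  # 52560
```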
Full-Archive Search (FAS), meanwhile, returns its results much like a Google search does. This means that you don’t get the full list of results all at once, but rather a small fraction of results per page, and you can then page through the rest at your own pace. The maximum number of tweets you can see per page is 500, after which you need to make another request. So, if you made a search for a particular hashtag over a period of 30 days and you get back 5,000 results, you’ll have to make a total of 10 requests. The trade-off with FAS, compared to HPT, is that it only takes a single rule per request.
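The paging logic described above can be sketched in a few lines. Note that this is an illustration, not Twitter's actual client: `fetch_page` is a hypothetical stand-in for whatever function performs one request, and the in-memory "fake" fetcher below exists only so the example runs without network access or credentials. The 500-per-page cap is the figure quoted above.

```python
import math
from typing import Callable, Iterator, List, Optional, Tuple

MAX_RESULTS_PER_PAGE = 500  # per-page cap quoted in the article

# One page of results plus a token pointing at the next page (None when done).
Page = Tuple[List[str], Optional[int]]


def requests_needed(total_results: int, page_size: int = MAX_RESULTS_PER_PAGE) -> int:
    """Number of requests required to page through all results."""
    return math.ceil(total_results / page_size)


def fetch_all(fetch_page: Callable[[Optional[int]], Page]) -> Iterator[str]:
    """Keep requesting pages until no 'next' token is returned."""
    token: Optional[int] = None
    while True:
        tweets, token = fetch_page(token)
        yield from tweets
        if token is None:
            break


def make_fake_fetcher(all_tweets: List[str], page_size: int = MAX_RESULTS_PER_PAGE):
    """Hypothetical in-memory replacement for a real search request."""
    calls = {"count": 0}

    def fetch_page(token: Optional[int]) -> Page:
        calls["count"] += 1
        start = token or 0
        end = start + page_size
        next_token = end if end < len(all_tweets) else None
        return all_tweets[start:end], next_token

    return fetch_page, calls


fake_tweets = [f"tweet {i}" for i in range(5000)]
fetch_page, calls = make_fake_fetcher(fake_tweets)
collected = list(fetch_all(fetch_page))

print(requests_needed(5000))  # 10
print(calls["count"])         # 10
print(len(collected))         # 5000
```

In a real integration you would swap the fake fetcher for a function that calls the search endpoint and returns its results along with the pagination token.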
There are also third-party alternatives which go a step further and offer an in-depth analysis of the data.
BrandMentions is a social listening platform that offers Twitter historical data insights and reports that can go back all the way to the first tweet. Therefore, you get not just the raw Twitter dataset but also an analysis of that data, dashboards that can be shared, as well as Excel and PDF reports with “deeper insights”. You'll get the possibility to search old tweets, regardless of the time stamp, as well as get access to real-time tweets data. The tool covers both past and future mentions, so it should be a comprehensive solution for any entity interested in the social network's historical data.
There are also other paid tools offering this possibility. Crimson Hexagon has a different approach, as it offers data sets that were built by querying against an already existing set of historical tweets.
Other providers, like DiscoverText or Soda Analytics have a more academic focus and include features such as measuring inter-coder reliability, which are particularly relevant to researchers.
The Easiest Way to Get Access to Old Tweets and Twitter Historical Data
By now, you might be thinking that getting access to Twitter historical data is very hard as it implies knowledge of APIs. But actually, there is an easy way of extracting all the relevant historical Twitter info.
You can use BrandMentions for this job, as it will basically do all the work for you.
With BrandMentions, literally all the Twitter historical data is a few clicks away.
So, here are the steps you need to take:
Step 1. Get a BrandMentions free trial (the free trial pulls historical tweets for the past 30 days, while the paid version can offer you data going back more than 10 years).
Step 2. Add the brand/term/hashtag you are interested in getting the historical data for. You can add one or several. The app will automatically extract the data from the past 30 days.
Step 3. Click on the Extract Historical Data button to access Twitter data from more than 10 years ago.
You will not only get all the old tweets, but you'll also get valuable insights about them. You'll get to analyze their evolution over time and see KPIs like reach, interaction, the sentiment of the tweets and many other metrics. And, without sounding too cheesy, you get all of these with just a few clicks.
Getting the data is the first step. Organizing it within reports and analyses is the next step and maybe the most important one. BrandMentions' ability to offer such valuable insights on Twitter historical data is probably one of the greatest features it has to offer.
Twitter Historical Data Case Study
There are a lot of reasons why access to insights on historical tweets is a gold mine. Here are just three of them:
- Understand the past
- Understand the evolution
- Enable forecasting
And what better way of understanding this than by looking at a case study that aggregates real historical data.
Using BrandMentions, we've extracted all the historical Twitter data for "#taycan".
For those of you who don't know, Taycan is an all-electric car made by German automobile manufacturer Porsche since 2019.
By extracting the full set of data for #taycan we can easily understand the history of the brand, its evolution and we can make some predictions about its future.
All the data has been extracted, from the very first tweet containing the Taycan hashtag, on 8 June 2018, until 27 January 2021.
As you can see in the screenshot below, taken from the BrandMentions app, every single tweet containing #taycan has been extracted.
To have a snippet of the type of insights you'll get along with the raw set of historical tweets, you can check this publicly shared dashboard that aggregates data for the #taycan. The dashboard can be accessed by anyone, anytime and it's constantly updating so you can see the historical evolution as well as the real time data.
From the numbers of interaction to reach, top influencers, country and language distribution or the tone of the tweets, you can find it all there.
Let's dig a bit deeper into the data to see what else we can find.
First of all, we can access the very first tweet that used our analyzed hashtag, dating from June 8, 2018, 8:58 PM, Belgium time.
— Vroom.be (@Vroom_be) June 8, 2018
Shortly after this very first tweet saw the light of day, many others followed. Within a couple of minutes, more tweets and retweets appeared. It was already news and things were getting big.
In the very first week after the first tweet, 8-15 June 2018, no fewer than 567 tweets appeared. This correlates very well with the amount of data that appeared on other social networks, news sites and blogs.
After this initial period, things began to slow down, and a decrease in the number of tweets can be seen. Yet, what is very interesting to analyze is a spike that took place on September 4, 2019. It's super easy to spot when you look at the full set of data. It would surely have been much harder without access to the bigger context.
September 4, 2019 was Taycan's big launch. Depending on the time zone you are in, you might see the tweets reported on September 3, 2019.
The brand new electric car debuted worldwide with three simultaneous events in Germany, China, and Canada. Below is the very first tweet from that day from the Porsche team.
Watch professional racer, Shea Holbrook, floor the soon to be released Taycan from 0 to 90mph and back to 0 in just 10.17s, on none other than the USS Hornet. #Taycan #Porsche pic.twitter.com/GgLO1Soq5R
— Porsche (@Porsche) September 3, 2019
For the launch day alone, you can get lots of insights, like the country distribution of tweets or other related hashtags. All this data is super important for any marketer, product manager or business owner, as it offers a better understanding of the whole context.
The most recent spike was on 20 January 2021. With literally one click, BrandMentions allows you to check the tweets from that day, export them or get insights on them.
A few clicks later, you figure out that in January 2021 the German automaker announced that it would release a lighter-weight, rear-wheel-drive model with improved performance. The ease with which you get all this data in such a fast-moving industry is indisputable.
This case study is just a brief example of how you can benefit from having access to the full historical record of tweets. You can try it for yourself and get convinced, if you aren't yet, that sometimes "reality" is not a function of the event as event, but of the relationship of that event to past, and future, events.
The Twitter Metrics to Track within a Twitter Historical Report
Oftentimes, evaluating and keeping track of your results turns out to be one of the most important steps in the process of running a business. The same goes for Twitter data. Any Twitter historical report should aggregate data that can give you enough insights to understand what happened in the past so you can craft a better strategy in the future.
BrandMentions offers not only all the Twitter historical data you need (way back to the very first tweet), but also comprehensive reports that include:
- Total number of tweets
- Total reach for the collected tweets
- Total number of interactions
- Total number of retweets
- Total number of positive tweets
- Total number of negative tweets
- The tweets and retweets distribution
- Top Twitter Influencers
- Overall and individual tweets' performance
- The Location of the tweets
- Number of tweets and retweets by WeekDay
- Language representation
- Trending hashtags
- Context of mention
- Month to month growth
- Evolution of tweets
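If you work with a raw export instead of a ready-made report, a few of the metrics above are straightforward to compute yourself. The snippet below is a minimal sketch under loud assumptions: the `Tweet` record and its fields (`created_at`, `is_retweet`, `sentiment`) are hypothetical and do not mirror any particular export format, and the five sample tweets are invented purely to make the example runnable.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


@dataclass
class Tweet:
    """Hypothetical minimal tweet record; field names are assumptions."""
    created_at: datetime
    is_retweet: bool
    sentiment: str  # "positive", "negative" or "neutral"


def summarize(tweets: List[Tweet]) -> Dict[str, object]:
    """Compute a handful of the report metrics listed above."""
    by_weekday = Counter(WEEKDAYS[t.created_at.weekday()] for t in tweets)
    by_month = Counter(t.created_at.strftime("%Y-%m") for t in tweets)

    # Growth relative to the previous month that has data
    # (months without any tweets are simply skipped in this sketch).
    months = sorted(by_month)
    growth = {
        cur: (by_month[cur] - by_month[prev]) / by_month[prev]
        for prev, cur in zip(months, months[1:])
    }

    return {
        "total_tweets": len(tweets),
        "total_retweets": sum(t.is_retweet for t in tweets),
        "positive_tweets": sum(t.sentiment == "positive" for t in tweets),
        "negative_tweets": sum(t.sentiment == "negative" for t in tweets),
        "by_weekday": dict(by_weekday),
        "month_to_month_growth": growth,
    }


# Invented sample data, for illustration only.
sample = [
    Tweet(datetime(2021, 3, 1, 9, 0), False, "positive"),
    Tweet(datetime(2021, 3, 2, 10, 30), True, "neutral"),
    Tweet(datetime(2021, 4, 5, 12, 0), False, "negative"),
    Tweet(datetime(2021, 4, 6, 15, 45), False, "positive"),
    Tweet(datetime(2021, 4, 7, 18, 20), True, "positive"),
]

summary = summarize(sample)
print(summary["total_tweets"])           # 5
print(summary["total_retweets"])         # 2
print(summary["month_to_month_growth"])  # {'2021-04': 0.5}
```

Metrics such as reach, influencers or context of mention need data (follower counts, author profiles, full text) that this simplified record leaves out.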
What Historical Twitter Data Do You Need?
Of course, before you deep dive into a sea of tweets, it’s worth clarifying first why you want to do that, and to what extent. So, ask yourself questions like:
- Do I need current/recent tweets or older/historical tweets?
- Do I need a certain amount of them, or will a few examples do?
- Will I need to cover a certain, specific time interval, or do my tweets need to reflect a variety of time periods?
- Do I need the full “population” of tweets on a topic or is a sample enough?
- What budget do I have to acquire Twitter data?
- To what extent will I have to share the tweets used for the analysis with my audience?
Based on the answers to these questions, you should have a pretty clear idea of how to proceed.
Is Twitter Still A Relevant Social Media Platform?
Of all the social media networks that tend to get lumped in together, Twitter may have stayed truest to its original design.
Twitter started off as and continues to be a “microblogging” platform which allows you to post and view others’ posts.
The catch is that these posts, called tweets, can be a maximum of 280 characters long. It used to be 140, but it changed in 2017. That is, in fact, one of the only significant changes that Twitter ever made to its platform. Despite the character limit, you can still share links, photos, GIFs, or videos, making it very easy to share media content, news or research. Another, perhaps more significant, difference between Twitter and its social media “siblings” is that everything is, by default, public. There is still an option to make your account private, but you have to actively do so.
Twitter users are 38% more likely to post opinions about brands and products than other social media users.
There were a few other changes over the years, but most were just fine-tuning around what you could see and how you could personalize your content. In 2014, Twitter added recommendations on tweets, topics, and accounts.
In 2015, they added a section titled “While you were away”, which recapped select tweets that were published while a user was not active. This section was retitled “In Case You Missed It” two years later, along with an explanation of how tweets were scored on a relevance model. The algorithm was revealed to take into account recency, relevance (based on keywords), engagement (based on number of retweets and favorites) and other factors related to followers, location and media usage. Most recently, in 2019, Twitter introduced Topics to allow users to follow bigger conversations. Following a Topic adds related Tweets, users, events, and ads to the feed.
71% of Twitter users say they use the network to get their news.
Several protests in the past years were successful in no small part due to the use of Twitter among protest-goers. Clearly, though, it’s not always this serious with Twitter, and the platform has a known history of being the place of origin of many a meme.
But even beyond that, it has provided everyone with a unique platform where anyone can be a reporter or a cultural critic, leading to a universe of diverse viewpoints, all amplified organically. When a US Airways plane crash-landed in the Hudson River in 2009, it was a regular Twitter user that started spreading the word, before many media outlets even became aware.
Twitter Historical Data Extraction Limitations
Regardless of the solution you choose, it’s useful to keep some potential limitations in mind when deciding on the provider:
- Not all providers might allow you to export raw data sets from their platform. Alternatively, some may place a limit on the amount of data you can export per day. Make sure to check this beforehand if raw data manipulation is important to you.
- The solution offered by some providers is essentially a “black box”, meaning you have no insight into the algorithm that was used or how certain filters are applied or defined.
- Cost is often driven by the length of the time period being queried or the amount of data being retrieved. If cost is a consideration, make sure you first identify your needs very clearly and only perform the searches that are strictly relevant for you (rather than casting a wider “net”).
There is also the option of finding an already existing dataset that matches your interests. Academic and research websites in particular might host libraries that hold, and continuously build, datasets on a number of public-interest topics (including those related to state institutions and news organizations).
Conclusions - Twitter Historical Data Without Limits
One of the most important assets when we talk about data is probably the context. Some might even argue that context is everything. It shapes the meaning in all communication. Without context you can’t communicate effectively. When your message is delivered in one context, but received in another, it likely leads to miscommunication. And this is maybe one of the most important advantages that historical data brings: a clear and comprehensive image of the context.
Getting historical data with no time limit is not an easy task. Yet, it becomes way easier if you use the right resources or tools. You can use any of the methods described above to get your Twitter historical data, yet we recommend a tool like BrandMentions that would offer valuable insights on the data as well, and not just raw information.
We hope that this article sheds some light on how to extract and analyze Twitter historical data. Choose the right option for you, but don't forget the importance of context and insight. Happy monitoring!