Data visualization is essential to comprehending complex datasets. Learn how to efficiently pull numbers using Google Sheets so you can create visualizations that communicate a story.
TfL faced the challenge of modernising how it collected, managed and analysed terabytes of data. Today, apps built using TfL data reach millions of London transport users and deliver monetised time savings worth millions to both passengers and TfL.
Why Ridership Data Matters
Transportation agencies seeking to attract riders back must first understand who is using their services and where. That requires robust, high-quality ridership data; without it, agencies risk operating blind.
TfL initially took a cautious approach to open data. Since then, it has made great strides by opening key datasets for real-time release, and hundreds of apps built on this data now reach millions of London transport users and deliver monetised time savings to them. TfL also uses insights gained through its open data initiatives in strategic planning to meet future demand more effectively.
TfL uses predictive models based on historical boarding and alighting data to assess what effect new Tube lines would have on overall passenger numbers, to forecast capacity requirements for fixed routes, and to measure the effect on existing capacity (via elasticities with respect to factors such as fares, span of service, frequency or wait time).
TfL has also applied ridership data analysis beyond planning to address specific business problems. For instance, it identified patterns of fare evasion among passengers and now targets them with specific enforcement campaigns. TfL also uses ridership data to track passenger flows and station queues, gaining insight into popular destinations and travel times.
As TfL explores new uses for its data, it must assess how doing so might affect passenger trust and privacy. Keeping that trust requires proactive communication with stakeholders and reassurance about how TfL uses their data. For TfL, this includes implementing robust data protection measures and regularly communicating its commitment to the ethical use of passenger information.
TfL faces the difficult challenge of merging data from disparate sources that reside in silos. To address this, it has employed graph databases to reveal hidden relationships and patterns across billions of data connections. A modern data engineering pipeline automates the process and feeds dashboards built on Neo4j’s graph solution, which provide insight into passenger movements through the network.
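To make the elasticity idea concrete, here is a minimal sketch of a constant-elasticity demand forecast. It is not TfL's model: the function name, the elasticity of -0.3 and the ridership figure are all hypothetical values chosen for illustration.

```python
def forecast_ridership(base_ridership: float, elasticity: float, pct_change: float) -> float:
    """Constant-elasticity forecast (an assumed functional form).

    new_ridership = base_ridership * (1 + pct_change) ** elasticity

    pct_change is the fractional change in the driver (e.g. 0.10 for a
    10% fare rise); elasticity is the assumed demand elasticity with
    respect to that driver (negative for fares and wait time).
    """
    if pct_change <= -1.0:
        raise ValueError("pct_change must be greater than -1.0 (-100%)")
    return base_ridership * (1.0 + pct_change) ** elasticity


if __name__ == "__main__":
    # Hypothetical inputs: 100,000 daily riders, fare elasticity of -0.3,
    # and a 10% fare increase.
    print(round(forecast_ridership(100_000, -0.3, 0.10)))
```

With a negative fare elasticity, a fare rise yields a forecast below the baseline; a change of zero returns the baseline unchanged.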
Methodology: Pulling the Numbers
Transport is an integral component of London’s urban infrastructure, serving some nine million residents as well as millions of visitors. London’s network of roads, railways and underground lines is among the world’s most complex, so managing it efficiently requires considerable logistics expertise. Transport for London relied heavily on real-time data during the COVID-19 pandemic to track travel patterns, inform public health measures and support the recovery of its vital services.
Lauren Sager Weinstein, TfL’s chief data officer, spoke at the recent DataIQ 100 Summit about the organisation’s journey towards becoming data-driven. She described several defining moments in that journey, such as the fare evasion analysis that has significantly improved revenue collection.
She highlighted TfL’s commitment to open data as an essential step on that journey. TfL began publishing open data in 2007 with the goal of encouraging existing and new businesses to develop apps that serve its customers; this was particularly important given the rapid growth in smartphone adoption among transport users.
TfL faced several challenges in working with its datasets. Although the organization amasses vast quantities of data every week, it was difficult to gain insight from individual datasets that were stored and analyzed separately.
Using Neo4j’s graph solution, TfL was able to create a digital twin of its transport network, integrating multiple datasets into a single connected model that links its different data sets. With such a model, TfL could rapidly answer complex queries and quickly address incidents on the network.
TfL’s digital twin was able to identify patterns in traffic flows and congestion that would be difficult to detect in a standard spreadsheet. For instance, by looking at time series of station entries and exits, TfL could use the digital twin to predict when each station would reach capacity on specific days, and it sent relevant notifications via WhatsApp to passengers who had signed up for the service.
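As a rough illustration of why a graph model suits this kind of question, the sketch below stores a tiny hand-typed fragment of the network as an adjacency list and uses breadth-first search to find a route with the fewest stops, optionally avoiding a closed station. It uses plain Python rather than Neo4j, and the link list, function names and `closed` parameter are assumptions made for the example, not TfL's schema or queries.

```python
from collections import deque

# Toy fragment of the network, typed by hand for illustration only:
# (station_a, station_b, line).
LINKS = [
    ("Bond Street", "Oxford Circus", "Central"),
    ("Bond Street", "Green Park", "Jubilee"),
    ("Green Park", "Oxford Circus", "Victoria"),
    ("Green Park", "Piccadilly Circus", "Piccadilly"),
    ("Oxford Circus", "Piccadilly Circus", "Bakerloo"),
]


def build_graph(links):
    """Build an undirected adjacency list: station -> [(neighbour, line), ...]."""
    graph = {}
    for a, b, line in links:
        graph.setdefault(a, []).append((b, line))
        graph.setdefault(b, []).append((a, line))
    return graph


def shortest_route(graph, start, goal, closed=frozenset()):
    """Breadth-first search for a fewest-stops route that skips closed stations.

    Returns the route as a list of station names, or None if no route exists.
    """
    if start not in graph or goal not in graph:
        return None
    if start in closed or goal in closed:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour, _line in graph[path[-1]]:
            if neighbour not in seen and neighbour not in closed:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


if __name__ == "__main__":
    network = build_graph(LINKS)
    print(shortest_route(network, "Bond Street", "Piccadilly Circus"))
    # Simulate an incident that closes one interchange.
    print(shortest_route(network, "Bond Street", "Piccadilly Circus",
                         closed={"Oxford Circus"}))
```

A graph database answers the same kind of connectivity question declaratively and at far larger scale; the point here is only that incidents become simple constraints on a traversal.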
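The capacity check described above can be sketched very simply. The example below flags the time intervals in which a station's combined gate throughput (entries plus exits) exceeds a threshold. The function name, the 15-minute intervals, the counts and the threshold of 2,000 are all hypothetical, and real capacity modelling would be considerably more involved.

```python
def intervals_over_capacity(counts, capacity):
    """Return the labels of intervals whose entries + exits exceed capacity.

    counts is a list of (interval_label, entries, exits) tuples; capacity is
    an assumed maximum gate throughput per interval.
    """
    return [label for label, entries, exits in counts if entries + exits > capacity]


if __name__ == "__main__":
    # Made-up 15-minute gate counts for one station on one morning.
    morning = [
        ("08:00", 900, 400),
        ("08:15", 1400, 700),
        ("08:30", 1600, 900),
        ("08:45", 1000, 500),
    ]
    print(intervals_over_capacity(morning, capacity=2000))
```

The flagged intervals are the kind of signal that could feed a passenger notification, such as the WhatsApp alerts mentioned above.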
Winners & Laggards by Line
London is a city that thrives on public transit and relies heavily on the Tube to carry its economic and social life. After the nationwide lockdown in March 2020, ridership plummeted to around four per cent of pre-pandemic levels; many stores, hairdressers and restaurants remained closed, while people were advised either to work from home or to take only necessary trips on the Tube.
TfL staff, including hundreds of frontline transport workers, did their best to keep the system operating while still protecting and informing the public. COVID has claimed the lives of 89 TfL employees to date, a loss felt across the whole workforce. The death rate was more than double that seen nationwide, attributed to people in public-facing roles, such as bus drivers who tend to work alone, lacking proper protection against infection.
TfL anticipates ridership will rebound within 18 months of the pandemic ending; the exact timeframe depends on how quickly commuters feel comfortable riding trains and buses again.
TfL’s data team used Kestra to collect usage information for each of the 272 London Underground and London Overground stations between January 2007 and 2021, recording entry and exit counts over this period, which services call at the station (London Underground, Elizabeth Line, DLR or London Overground), whether Night Tube service exists, and more.
The visualizations produced from this research show which stations and routes on each line were most in demand on weekdays, along with a graph that lets users compare usage in the current travel week with that from 2019. This comparison sets current usage against past levels without taking Elizabeth Line effects into account, helping us see which lines have bounced back fastest and how quickly TfL will need to increase capacity to meet demand when the Tube opens fully.
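The winners-and-laggards comparison comes down to a recovery ratio: current usage divided by the 2019 baseline for the same line. The sketch below ranks lines by that ratio. The function name and the figures for "Line A" to "Line C" are invented placeholders, not TfL data.

```python
def recovery_by_line(current, baseline):
    """Rank lines by recovery ratio (current usage / baseline usage).

    current and baseline map line name -> weekly entries + exits.
    Lines missing from current, or with a non-positive baseline, are skipped.
    Returns a list of (line, ratio) pairs, highest ratio first.
    """
    ratios = {}
    for line, base in baseline.items():
        if base <= 0 or line not in current:
            continue
        ratios[line] = current[line] / base
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder weekly totals; replace with real counts per line.
    baseline_2019 = {"Line A": 1000, "Line B": 2000, "Line C": 500}
    current_week = {"Line A": 800, "Line B": 1200, "Line C": 450}
    for line, ratio in recovery_by_line(current_week, baseline_2019):
        print(f"{line}: {ratio:.0%} of 2019 usage")
```

A ratio close to 1.0 means a line is back near its 2019 level; the same calculation works per station if the inputs are keyed by station instead of line.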
What the Trends Mean for 2025
With so many software applications widely available online, businesses have never found it easier to gather, organize, analyze and display their data. Data visualization has become an indispensable asset for organizations looking to stay competitive, increase revenues, enter new markets or improve employee productivity and satisfaction.
Transport for London (TfL) entered open data with caution, opting to release 62 distinct datasets as an experiment. Five years on from the launch of its policy, however, thousands of apps created using TfL data reach millions of people and deliver millions in monetised time savings to TfL’s customers.
While TfL has made significant strides toward becoming a fully data-driven organisation, considerable challenges remain. Many are shared with other organisations: data sources may reside on multiple platforms, and connecting them is often an uphill struggle.
Sager Weinstein gave the example of TfL’s journey data, which provides valuable information about how passengers travel to, from and between stations. TfL collects terabytes of this information weekly, but historically each dataset was analysed separately and offered no insight into the whole.
By integrating journey data with other sources of information, TfL has been able to identify patterns of fare evasion and enhance its enforcement efforts. Data analysis also enables TfL to predict Tube line disruptions during peak travel times and to better inform its decision-making.
TfL’s success with data transformation initiatives has demonstrated the need to build trust among its stakeholders. This requires proactive communication and transparency about data collection, processing and use, as well as continuous learning initiatives. It also requires a culture in which employees are free to act on and innovate with data without being held back by outdated processes or resistance to change.