This thesis focuses on improving both the Operational Planning and the Control of Public Road Transportation (PT) Networks (i.e. buses and taxis) using location-based data gathered through the Global Positioning System (GPS). Its aim is to monitor the operations of these vehicular networks in order to infer useful information about their future status on both short-term and long-term horizons. To do so, we took an exploratory approach, surveying the data-driven methods on this topic in order to identify research opportunities worthy of further study. The main idea is to provide frameworks that are sustainable (from a computational point of view) to handle these massive sources of data. Ultimately, we want to extract information useful for improving Human Mobility in major urban areas.
As a result of the abovementioned survey, three concrete problems were addressed in this thesis: (1) Automatic Evaluation of the Schedule Plan's Coverage; (2) Real-Time Mitigation of Bus Bunching occurrences; (3) Real-Time Smart Recommendations about the most adequate taxi stand to head to at each moment, according to the current network status. To do so, we developed Machine Learning (ML) frameworks in order to advance the State-of-the-Art on these problems.
The first problem (1) concerns the days that are covered by the same schedule. This definition is usually made during the design of the network plan and is based on the relationship between the demand profiles generated and the resources available to meet such demand. To the best of our knowledge, there is no research work addressing this topic using GPS data. All the days covered by the same timetable have exactly the same planned daily profile, since they share the same departure/arrival times. However, the real values of such times may differ from the scheduled ones (causing an undesired gap between the defined timetables and the real ones). To overcome this issue, we propose to evaluate whether such coverage still matches the network behavior using an ML framework. It explores such differences by grouping each of the available days into one of the possible coverage sets. This grouping is made according to a distance measured between each pair of days, based on their profiles. As output, rules about which days should be covered by the same timetables are provided. Such rules can be used by operational transportation planners to perform the abovementioned evaluation. These rules also provide insights on how the current coverage can be changed to better match the observed network behavior.
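The grouping step described above can be illustrated with a minimal sketch. It is not the framework developed in the thesis: the daily profiles, the Euclidean distance, the single-linkage rule and the threshold value are all assumptions chosen only to show how a pairwise distance between days can yield candidate coverage sets.

```python
from itertools import combinations
from math import dist

# Hypothetical daily profiles: mean round-trip time (minutes) observed
# in four periods of the day. The values are illustrative only.
profiles = {
    "Mon": [52.0, 47.0, 49.0, 55.0],
    "Tue": [51.0, 46.5, 49.5, 54.0],
    "Fri": [53.0, 47.5, 50.0, 61.0],
    "Sat": [41.0, 40.0, 42.0, 43.0],
    "Sun": [39.5, 39.0, 41.0, 42.0],
}

def group_days(profiles, threshold):
    """Single-linkage grouping: days whose profiles lie within
    `threshold` of each other end up in the same coverage set."""
    parent = {day: day for day in profiles}

    def find(day):
        # Follow parent links to the representative of the set.
        while parent[day] != day:
            parent[day] = parent[parent[day]]
            day = parent[day]
        return day

    for a, b in combinations(profiles, 2):
        # Assumed distance: Euclidean distance between the two profiles.
        if dist(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for day in profiles:
        groups.setdefault(find(day), []).append(day)
    return sorted(sorted(group) for group in groups.values())

coverage_sets = group_days(profiles, threshold=3.0)
print(coverage_sets)
```

In this toy example Friday separates from the other weekdays because of its slower last period, which is the kind of evidence a planner could use to question the current coverage.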
The prevalence of (2) Bus Bunching (BB) is one of the most visible characteristics of an unreliable service. Two (or more) buses running together on the same route is an undeniable sign that something is going terribly wrong with the company's service. Most of the state-of-the-art on this topic starts from the assumption that the probability of BB events is minimized by maximizing headway stability. Notwithstanding its validity, this approach requires multiple control actions (e.g. speed modification, bus holding, etc.), which may impose a high mental workload on drivers and result in low compliance rates. Here, we propose a proactive rather than a reactive operational control framework. The basic idea is to estimate the likelihood of a BB event occurring further downstream and then let an event detection threshold trigger the deployment of a corrective control strategy. To do so, we propose a Supervised Online Learning framework. It explores both historical and real-time Automatic Vehicle Location (AVL) data to build automatic control strategies, which can mitigate BB occurrences while reducing the human workload required to make these decisions. State-of-the-art tools and methodologies such as Regression Analysis, Probabilistic Reasoning and the Perceptron constitute the building blocks of this predictive methodology.
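The threshold-triggered idea can be sketched as follows. This is a simplified illustration, not the thesis's Supervised Online Learning framework: it assumes that headway predictions for the downstream stops are already available, that the prediction error is Gaussian with a known standard deviation, and that all numeric values (headways, bunching headway, alarm threshold) are hypothetical.

```python
from math import erf, sqrt

def bunching_probability(predicted_headway, residual_std, bunching_headway):
    """P(headway < bunching_headway), assuming a Gaussian prediction error
    with standard deviation `residual_std` (an assumption of this sketch)."""
    z = (bunching_headway - predicted_headway) / residual_std
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def first_alarm(predicted_headways, residual_std, bunching_headway, alarm_threshold):
    """Return the index of the first downstream stop whose estimated BB
    likelihood reaches `alarm_threshold`, or None if no stop does."""
    for stop, headway in enumerate(predicted_headways):
        p = bunching_probability(headway, residual_std, bunching_headway)
        if p >= alarm_threshold:
            return stop
    return None

# Hypothetical predicted headways (seconds) at the next five stops.
predicted = [540.0, 420.0, 300.0, 180.0, 90.0]
alarm_stop = first_alarm(
    predicted, residual_std=60.0, bunching_headway=150.0, alarm_threshold=0.3
)
print(alarm_stop)
```

A corrective action would then be deployed before the bus reaches the flagged stop, so that a single targeted intervention replaces continuous headway control.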
Taxi driver mobility intelligence (3) is an important factor in maximizing both profit and reliability in every possible scenario. Knowledge of where the services (transporting a passenger from a pick-up to a drop-off location) will actually emerge can be an advantage for the driver - especially when adopting random cruising strategies to find passengers is not economically viable. The stand-choice problem is based on four key variables: (i) the expected revenue for a service over time, (ii) the distance/cost relation with each stand, (iii) the number of taxis already waiting at each stand and (iv) the passenger demand for each stand over time. However, to the best of our knowledge, there is no work handling this recommendation problem by using these four variables simultaneously. Variable (iii) can be directly computed from the vehicles' real-time positions - however, the remaining three need to be estimated for a short-term time horizon.
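To make the role of the four variables concrete, the sketch below combines them into a single score per stand. The scoring rule is purely hypothetical and is not taken from the thesis; it only shows one way in which (i) to (iv) could jointly rank the stands. The class, function names and values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class StandState:
    name: str
    expected_fare: float     # (i) expected revenue per service, in euros
    travel_cost: float       # (ii) cost of driving to the stand, in euros
    taxis_waiting: int       # (iii) taxis already queued at the stand
    expected_pickups: float  # (iv) services expected in the next period

def stand_score(stand):
    """Hypothetical utility: expected fare weighted by a rough chance of
    being served in the next period, minus the cost of getting there."""
    chance_of_service = min(1.0, stand.expected_pickups / (stand.taxis_waiting + 1))
    return stand.expected_fare * chance_of_service - stand.travel_cost

def recommend(stands):
    """Return the name of the stand with the highest score."""
    return max(stands, key=stand_score).name

stands = [
    StandState("A", expected_fare=8.0, travel_cost=1.0, taxis_waiting=5, expected_pickups=3.0),
    StandState("B", expected_fare=6.0, travel_cost=2.0, taxis_waiting=0, expected_pickups=2.0),
    StandState("C", expected_fare=12.0, travel_cost=4.0, taxis_waiting=3, expected_pickups=1.0),
]
best = recommend(stands)
print(best)
```

Here stand B is preferred even though its fare is the lowest, because it has no queue and is cheap to reach; dropping any one of the four variables would change the ranking.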
Estimating the short-term demand that will emerge at a given taxi stand is a complex problem. Such demand can be decomposed into two axes: (iv) the pick-up quantity (i.e. an integer representing the number of services to be demanded) and (i) the expected revenue for a service over time (i.e. a fare-based category). To do so, we propose a framework based on both time series analysis and discretization techniques, which is able to perform this supervised learning task incrementally.
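A minimal sketch of the two axes follows, under stated assumptions: simple exponential smoothing stands in for the time series component, and fixed bin edges stand in for the discretization step. Neither is claimed to be the technique used in the thesis, and the smoothing factor, bin edges and labels are hypothetical.

```python
from bisect import bisect_right

class IncrementalDemandEstimator:
    """Exponentially smoothed pick-up count for one stand, updated one
    period at a time without storing the history (assumed model)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # smoothing factor in (0, 1]
        self.level = None   # current demand estimate

    def update(self, observed_pickups):
        if self.level is None:
            self.level = float(observed_pickups)
        else:
            self.level = self.alpha * observed_pickups + (1 - self.alpha) * self.level
        return self.level

# Hypothetical fare bins (euros) used to discretize revenue into categories.
FARE_EDGES = [5.0, 10.0]
FARE_LABELS = ["low", "medium", "high"]

def fare_category(fare):
    """Map a fare to its category using the fixed bin edges above."""
    return FARE_LABELS[bisect_right(FARE_EDGES, fare)]

estimator = IncrementalDemandEstimator(alpha=0.5)
for count in [4, 6, 10]:  # pick-ups observed in three consecutive periods
    forecast = estimator.update(count)
print(forecast, fare_category(7.2))
```

Each update costs constant time and memory, which is what allows such an estimate to be maintained per stand over a continuous stream of services.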
Variable (ii) is related to how much time it will take to get to a given urban area/taxi stand where there are favorable service demand conditions (e.g. high service demand in terms of passenger quantity or revenue). Consequently, it is focused on a priori Travel Time Estimation. This problem is widely covered in the literature - namely, by using Regression analysis. However, we propose a more general technique to address this problem. There are two motivations for doing so: (ii-1) to provide a sustainable way to handle this large amount of data in order to extract usable information from it independently of the problem we want to solve (namely, its variable of interest); (ii-2) to be able to include multiple data sources in order to improve the penetration rate (i.e. the ratio of ground truth information) of our framework. To carry out this task, we propose incremental discretization techniques to maintain accurate statistics of interest over a time-evolving Origin-Destination matrix. These techniques include spatial clustering and incremental ML algorithms.
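The idea of maintaining statistics over an Origin-Destination matrix incrementally can be sketched as below. The sketch assumes that trips have already been mapped to zones (for instance by a spatial clustering step, not shown) and uses Welford's online algorithm for the running mean and variance; the zone identifiers and travel times are hypothetical, and the thesis's own discretization techniques may differ.

```python
from collections import defaultdict

class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# One statistics object per (origin zone, destination zone) cell.
od_matrix = defaultdict(RunningStats)

def observe_trip(origin_zone, destination_zone, travel_time_min):
    """Fold one completed trip into its OD cell; no trip history is kept."""
    od_matrix[(origin_zone, destination_zone)].update(travel_time_min)

# Hypothetical stream of trips: (origin zone, destination zone, minutes).
for trip in [("Z1", "Z2", 10.0), ("Z1", "Z2", 14.0), ("Z1", "Z2", 12.0), ("Z2", "Z3", 7.0)]:
    observe_trip(*trip)

cell = od_matrix[("Z1", "Z2")]
print(cell.n, cell.mean, cell.variance)
```

Because each cell stores only three numbers, the same structure can track any variable of interest (travel time, fare, demand) and can absorb trips from several data sources.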
All these problems were addressed using real-world data collected from two major public road transportation companies operating in Porto, Portugal. The frameworks achieved promising results in the experiments conducted to validate them. This work resulted in sixteen high-quality peer-reviewed publications in internationally known venues and journals.