Factors influencing charter flight departure delay

2019, Research in Transportation Business & Management

https://doi.org/10.1016/J.RTBM.2019.100413

Abstract

This study aims to identify the main factors leading to charter flight departure delay through data mining. The data sample analysed consists of 5,484 flights operated by a European airline between 2014 and 2017. The tuned dataset of 33 features was used for modelling departure delay (e.g., whether the flight was delayed by more than 15 minutes). The results demonstrated the value of the proposed approach, with an area under the receiver operating characteristic curve of 0.831, and supported knowledge extraction through the data-based sensitivity analysis. The features related to previous flight delay information were the most influential in determining whether the current flight was delayed, which is consistent with the propagating effect of flight delays. However, neither the reason for the previous delay nor its duration accounted for the most relevance. Instead, a computed feature indicating whether there were two or more registered reasons accounted for 33% of relevance. The contributions also include a broader data mining approach, supported by an extensive data understanding and preparation stage that used both proprietary and open access data sources to build a comprehensive dataset.
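As a rough illustration of three ideas mentioned in the abstract (the binary delay target with a 15-minute threshold, the computed "two or more registered reasons" feature, and the area under the ROC curve), the sketch below uses toy data. The field names `delay_minutes` and `prev_delay_reasons`, the `Flight` class, and all values are hypothetical; the abstract does not describe the study's actual schema or tooling, so this is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

DELAY_THRESHOLD_MIN = 15  # threshold stated in the abstract


@dataclass
class Flight:
    # Hypothetical fields chosen for illustration only.
    delay_minutes: float
    prev_delay_reasons: List[str] = field(default_factory=list)


def is_delayed(flight: Flight) -> int:
    """Binary target: 1 if the departure delay exceeds 15 minutes, else 0."""
    return int(flight.delay_minutes > DELAY_THRESHOLD_MIN)


def has_multiple_prev_reasons(flight: Flight) -> int:
    """Computed feature: 1 if the previous flight's delay has two or more registered reasons."""
    return int(len(flight.prev_delay_reasons) >= 2)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy flights, invented for this example.
flights = [
    Flight(5, []),
    Flight(20, ["weather", "crew"]),
    Flight(16, ["technical"]),
    Flight(15, ["weather", "technical"]),
]
labels = [is_delayed(f) for f in flights]
multi_reason = [has_multiple_prev_reasons(f) for f in flights]
```

Any classifier's predicted scores could be passed to `auc` together with `labels`; the pairwise formulation is quadratic in the number of flights, which is acceptable for a sample of a few thousand rows.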


… Spring=1481; Summer=1424; Autumn=1143; Winter=1436
dt.weekend: Weekend=1591; Working day=3893
dt.year: 2014=3713; 2015=1243; 2016=427; 2017=101
fl.arr.altitude: {Min; Q1; Med; Q3; Max}={-12.2; 15.0; 53.9; 119.5; 2580.4}; Avg=179.0; SD=333.2
fl.arr.hour: Distribution of flights arriving between 125 at 4 am, and 382 at 12 am
fl.arr.latitude: {Min; Q1; Med; Q3; Max}={-51.8; 22.0; 31.5; 43.7; 69.1}; Avg=28.9; SD=21.4
fl.arr.longitude: {Min; Q1; Med; Q3; Max}={-149.6; 2.0; 11.8; 39.7; 174.8}; Avg=21.1; SD=47.6
fl.arr.night.office: No=2622; Yes=2862
fl.dep.airport.type: Medium=1165; Large=4319
fl.dep.altitude: {Min; Q1; Med; Q3; Max}={-12.2; 15.0; 41.1; 116.4; 2580.4}; Avg=152.9; SD=297.3
fl.dep.country.flights.rank: {Min; Q1; Med; Q3; Max}={1; 13; 39; 53; 171}; Avg=41.5; SD=41.1
fl.dep.hour: Distribution of flights departing between 101 at 1 am, and 406 at 8 am
fl.dep.is.capital: No=3629; Yes=1855