Sequence Data Classification using Decision Trees with Novel Data Splitting Techniques

Authors

  • Cheruku Sudarsana Reddy, Research Scholar, Department of Computer Science and Engineering, Acharya Nagarjuna University (ANU), Guntur (Dist), Andhra Pradesh, India
  • Dr. O. Nagaraju, BOS Chairman-UG, Computer Science, ANU, Guntur (Dist), Andhra Pradesh, India

DOI:

https://doi.org/10.32628/CSEIT24102145

Keywords:

complex sequence data, sequence decision tree, Global-Energy-Technique (GET), Maximum-Minus-Minimum-Energy-Technique (MMET), Maximum-Energy-Technique (MET), Square-Maximum-Energy-Technique (SMET), Vectorization-Based Approach, splitting attribute, sub-sequence, Sequence_Decision_Tree

Abstract

This paper proposes new methods for classifying sequence data, a fundamental and challenging task in many contemporary real-time applications. Traditional classification approaches, such as the Classification and Regression Trees (CART) and C4.5 decision tree algorithms, are widely used in real-time applications because of their classification accuracy, scalability, simplicity, interpretability, and general applicability. These properties have established decision trees as benchmark models in data mining and machine learning. However, standard C4.5 and CART are limited to fixed-length vector data and cannot directly handle more complex data structures such as sequences, trees, graphs, and other special structures. Efficient and effective classification of sequence data therefore calls for new, autonomous, intelligent models capable of real-time analysis. This study introduces four novel techniques for decision tree-based sequence data classification: Global Energy Technique-1 (GET-1), Maximum Minus Minimum Energy Technique-2 (MMET-2), Maximum Energy Technique-3 (MET-3), and Square Maximum Energy Technique-4 (SMET-4). These techniques leverage the concept of data energy, which is analogous to the informational value of the data, to improve classification accuracy and efficiency.
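The abstract does not define GET, MMET, MET, or SMET, so the sketch below does not implement any of them. It only illustrates, under stated assumptions, the general vectorization idea named in the keywords: each sequence is mapped to a fixed-length 0/1 vector recording whether each candidate subsequence (gaps allowed) occurs in it, and an entropy-based criterion picks the splitting subsequence. The function names (`is_subsequence`, `vectorize`, `info_gain`, `best_split`) are hypothetical, and plain information gain (the ID3 criterion) is used for simplicity; C4.5 normally uses gain ratio.

```python
from collections import Counter
from math import log2


def is_subsequence(pattern, seq):
    """Return True if `pattern` occurs in `seq` in order, gaps allowed."""
    it = iter(seq)
    # `symbol in it` advances the shared iterator, so symbol order is enforced.
    return all(symbol in it for symbol in pattern)


def vectorize(seqs, patterns):
    """Map each sequence to a fixed-length 0/1 vector, one entry per pattern."""
    return [[int(is_subsequence(p, s)) for p in patterns] for s in seqs]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(seqs, labels, pattern):
    """Information gain of the binary split 'pattern present' vs 'absent'."""
    n = len(labels)
    if n == 0:
        return 0.0
    present = [y for s, y in zip(seqs, labels) if is_subsequence(pattern, s)]
    absent = [y for s, y in zip(seqs, labels) if not is_subsequence(pattern, s)]
    remainder = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - remainder


def best_split(seqs, labels, candidates):
    """Candidate subsequence with the highest gain; the first one wins ties."""
    return max(candidates, key=lambda p: info_gain(seqs, labels, p))
```

For example, with sequences `["abc", "abd", "cba", "dba"]` labelled `["pos", "pos", "neg", "neg"]`, `best_split(..., ["ab", "ba", "c"])` returns `"ab"`, which separates the two classes perfectly, while `"c"` yields zero gain. Recursing on each side of the chosen split would grow a tree; an energy-based criterion of the kind the paper proposes would replace `info_gain` at that step.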

Published

29-04-2024

Section

Research Articles

How to Cite

Cheruku Sudarsana Reddy and Dr. O. Nagaraju, “Sequence Data Classification using Decision Trees with Novel Data Splitting Techniques”, Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol, vol. 10, no. 2, pp. 957–978, Apr. 2024, doi: 10.32628/CSEIT24102145.