OptIForest: Optimal Isolation Forest for Anomaly Detection. (arXiv:2306.12703v1 [cs.LG])

By: <a href="http://arxiv.org/find/cs/1/au:+Xiang_H/0/1/0/all/0/1">Haolong Xiang</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_X/0/1/0/all/0/1">Xuyun Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Hu_H/0/1/0/all/0/1">Hongsheng Hu</a>, <a href="http://arxiv.org/find/cs/1/au:+Qi_L/0/1/0/all/0/1">Lianyong Qi</a>, <a href="http://arxiv.org/find/cs/1/au:+Dou_W/0/1/0/all/0/1">Wanchun Dou</a>, <a href="http://arxiv.org/find/cs/1/au:+Dras_M/0/1/0/all/0/1">Mark Dras</a>, <a href="http://arxiv.org/find/cs/1/au:+Beheshti_A/0/1/0/all/0/1">Amin Beheshti</a>, <a href="http://arxiv.org/find/cs/1/au:+Xu_X/0/1/0/all/0/1">Xiaolong Xu</a>
Posted: June 23, 2023

Anomaly detection plays an increasingly important role in various fields for
critical tasks such as intrusion detection in cybersecurity, financial risk
detection, and human health monitoring. A variety of anomaly detection methods
have been proposed, and the family based on the isolation forest mechanism
stands out for its simplicity, effectiveness, and efficiency; for example, iForest
is often employed as a state-of-the-art detector in real deployments. While the
majority of isolation forests use a binary tree structure, the LSHiForest framework
has demonstrated that the multi-fork isolation tree structure can lead to
better detection performance. However, no theoretical work has answered
the fundamental and practically important question of which tree
structure is optimal for an isolation forest with respect to the branching factor. In this
paper, we establish a theory on isolation efficiency to answer the question and
determine the optimal branching factor for an isolation tree. Based on the
theoretical underpinning, we design a practical optimal isolation forest,
OptIForest, incorporating clustering-based learning to hash, which enables more
information to be learned from data for better isolation quality. The rationale
for our approach is a better bias-variance trade-off, achieved through bias
reduction in OptIForest. Extensive experiments on a series of benchmark
datasets for comparative and ablation studies demonstrate that our approach can
efficiently and robustly achieve better detection performance in general than
state-of-the-art methods, including deep-learning-based ones.
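
To make the role of the branching factor concrete, the following is a minimal
sketch of isolation with a configurable number of children per node. It is not
the OptIForest algorithm: it uses uniformly random cut points rather than
clustering-based learning to hash, and the names make_cluster, isolation_depth,
and mean_depth, as well as all parameter defaults, are hypothetical choices made
for illustration. It only shows the shared isolation idea, namely that an
anomaly tends to be separated from the rest of the data in fewer splits than a
typical point, for a binary tree and for a multi-fork tree alike.

```python
import random
from bisect import bisect_right


def make_cluster(n, seed=0):
    """Toy data: n two-dimensional points drawn from a standard Gaussian."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def isolation_depth(query, sample, branching, rng, max_depth=32):
    """Number of random splits until no point of `sample` shares a cell
    with `query`, using `branching` children per split."""
    depth = 0
    current = list(sample)
    while current and depth < max_depth:
        # Only split on dimensions where the query and the points differ.
        dims = []
        for d in range(len(query)):
            values = [p[d] for p in current] + [query[d]]
            if min(values) < max(values):
                dims.append((d, min(values), max(values)))
        if not dims:
            break  # remaining points are identical to the query
        d, lo, hi = rng.choice(dims)
        # branching - 1 random cut points create `branching` intervals.
        cuts = sorted(rng.uniform(lo, hi) for _ in range(branching - 1))
        cell = bisect_right(cuts, query[d])
        # Keep only the points that fall into the same interval as the query.
        current = [p for p in current if bisect_right(cuts, p[d]) == cell]
        depth += 1
    return depth


def mean_depth(query, data, branching, n_trees=100, sample_size=128, seed=1):
    """Average isolation depth of `query` over independently sampled paths."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += isolation_depth(query, sample, branching, rng)
    return total / n_trees


if __name__ == "__main__":
    data = make_cluster(256)
    for b in (2, 4):
        inlier = mean_depth((0.0, 0.0), data, b)
        outlier = mean_depth((8.0, 8.0), data, b)
        print(f"branching={b}: inlier depth={inlier:.2f}, outlier depth={outlier:.2f}")
```

Under these assumptions, a shorter mean depth indicates a more anomalous point.
The sketch says nothing about which branching factor is optimal; that is the
question the paper's isolation efficiency theory addresses.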
