
Neural Networks in Statistical Anomaly Intrusion Detection

ZHENG ZHANG, JUN LI, C. N. MANIKOPOULOS, JAY JORGENSON and JOSE UCLES
ECE Department, New Jersey Inst. of Tech., University Heights, Newark, NJ 07102, USA
Department of Mathematics, CUNY, Convent Ave. at 138 ST., New York, NY 10031, USA
Network Security Solutions, 15 Independence Blvd. 3rd FL., Warren, NJ 07059, USA

Abstract: - In this paper, we report on experiments in which we used neural networks for statistical anomaly intrusion detection systems. The five types of neural networks that we studied were: Perceptron; Backpropagation (BP); Perceptron-Backpropagation-Hybrid (PBH); Fuzzy ARTMAP; and Radial-Basis Function (RBF). We collected four separate data sets from different simulation scenarios, and these data sets were used to test the neural networks with different numbers of hidden neurons. Our results showed that the classification capabilities of BP and PBH outperform those of the other neural networks.

Key-Words: - Security, Intrusion Detection, Statistical Anomaly Detection, Neural Network Classification, Perceptron, Backpropagation, Perceptron-Backpropagation-Hybrid, Fuzzy ARTMAP, Radial-Basis Function

1 Introduction
The ubiquity of the Internet raises serious concerns about the security of computer infrastructures and the integrity of sensitive data. Network intrusion detection is an efficient approach to protecting networks and computers from malicious network-based attacks. The basic assumption of intrusion detection is that an intruder's behavior will be noticeably different from that of legitimate users.

Intrusion detection techniques can be partitioned into two complementary trends: misuse detection and anomaly detection. Misuse detection systems, such as [1][2], model the known attacks and scan the system for occurrences of these patterns. Anomaly detection systems, such as [3][7], flag intrusions by observing significant deviations from the typical or expected behavior of the systems or users.

Statistical modeling and neural networks are widely applied in building anomaly intrusion detection systems. For example, NIDES [3] represents user or system behaviors by a set of statistical variables and detects the deviation between the observed and the standard activities. A system that identifies intrusions using packet filtering and neural networks was introduced in [4]. The work of Ghosh et al. [7] studied the use of neural networks to detect anomalous and unknown intrusions against a software system. In [8], we presented the prototype of a hierarchical anomaly network intrusion detection system that uses statistical models and neural networks to detect attacks.

As the kernels of many anomaly IDS, neural networks have a profound impact on system performance and efficiency, but little research has compared the outputs of different neural networks as applied to IDS problems. In this paper, we present our experiments concerning the performances of five different types of neural networks. Section 2 introduces the statistical model that we are using. Section 3 describes the neural networks we tested. In Section 4, we report the test bed and the attack schemes we simulated. Some experimental results are also presented in that section. Section 5 draws some conclusions and outlines future work.

2 Statistical Model
Statistics have been used in anomaly intrusion detection systems [3]; however, most of these systems simply measure the means and the variances of some variables and detect whether certain thresholds are exceeded. SRI's NIDES [5][3] developed a more sophisticated statistical algorithm by using a $\chi^2$-like test to measure the similarity between short-term and long-term profiles. Our current statistical model uses an algorithm similar to that of NIDES, but with major modifications. Therefore, we will first briefly introduce some basic information about the NIDES statistical algorithm.

In NIDES, user profiles are represented by a number of probability density functions. Let $S$ be the sample space of a random variable and let the events $E_1, E_2, \ldots, E_k$ be a mutually exclusive partition of $S$. Assume $p_i$ is the expected probability of the occurrence of the event $E_i$, and let $p_i'$ be the frequency of the occurrence of $E_i$ during a given time interval. Let $N$ denote the total number of occurrences. The NIDES statistical algorithm used a $\chi^2$-like test to determine the similarity between the expected and actual distributions through the statistic:

$$Q = N \sum_{i=1}^{k} \frac{(p_i' - p_i)^2}{p_i}$$

When $N$ is large and the events $E_1, E_2, \ldots, E_k$ are independent, $Q$ approximately follows a $\chi^2$ distribution with $(k - 1)$ degrees of freedom. However, in a real-time application the above two assumptions generally cannot be guaranteed, thus empirically $Q$ may not follow a $\chi^2$ distribution. NIDES solved this problem by building an empirical probability distribution for $Q$, which is updated daily in real-time operation.

In our system, since we are using neural networks to identify possible intrusions, we are not so concerned with the actual distribution of $Q$. However, because network traffic is not stationary and network-based attacks may have different time durations, varying from a couple of seconds to several hours, we need an algorithm which is capable of efficiently monitoring network traffic with different time windows. Based on the above observations, we used a layer-window statistical model, Fig. 1, with each layer-window corresponding to one different detection time slice.

[Fig. 1 Statistical Model. Labels: Event; Layer-Window 1, Layer-Window 2, ..., Layer-Window M, each with an Event Buffer and a Reference Model; Report.]

The newly arrived events will first be stored in the event buffer of layer 1. The stored events will be compared with the reference model of that layer and the results are fed into neural networks to detect the network status during that time window. The event buffer will be emptied once it becomes full, and the stored events will be averaged and forwarded to the event buffer of layer 2. The same process will be repeated recursively until it arrives at the top level, where the events will simply be dropped after processing.

The similarity-measuring algorithm that we are using is shown below:

$$Q = f(N) \cdot \left[ \sum_{i=1}^{k} |p_i' - p_i| + \max_{i=1}^{k} |p_i' - p_i| \right]$$

where $f(N)$ is a function that takes into account the total number of occurrences during a time window.

Besides similarity measurements, we also designed an algorithm for the real-time updating of the reference model. Let $p_{old}$ be the reference model before updating, $p_{new}$ be the reference model after updating, and $p_{obs}$ be the observed user activity within a time window. The formula to update the reference model is

$$p_{new} = s \cdot \alpha \cdot p_{obs} + (1 - s \cdot \alpha) \cdot p_{old}$$

in which $\alpha$ is the predefined adaptation rate and $s$ is the value generated by the output of the neural network. Assume that the output of the neural network is a continuous variable $t$ between $-1$ and $1$, where $1$ means intrusion with absolute certainty and $-1$ means no intrusion, again with complete confidence. In between, the values of $t$ indicate proportionate levels of certainty. The function for calculating $s$ is

$$s = \begin{cases} -t, & \text{if } t \le 0 \\ 0, & \text{otherwise} \end{cases}$$

Through the above equations, we ensured that the reference model would be updated actively for normal traffic while being kept unchanged when attacks occurred. The attack events will be diverted and stored, for use as attack scripts, in neural network learning.

3 Neural Networks
Neural networks are widely considered an efficient approach to adaptively classifying patterns, but the high computation intensity and the long training cycles greatly hinder their applications. In [4][7], BP neural networks were used to detect anomalous user activities. In [8], we deployed a hybrid neural network paradigm [6], called the perceptron-backpropagation-hybrid (or PBH) network, which is a superposition of a perceptron and a small backpropagation network. In order to comprehensively investigate the performances of neural networks, we examined five different types of neural networks: Perceptron, BP, PBH, Fuzzy ARTMAP and RBF.

The perceptron [9], Fig. 2, is the simplest form of a neural network used for the classification of linearly separable patterns. It consists of a single neuron with adjustable synapses and threshold. Although our data sets will not, in general, be linearly separable, we are using the perceptron as a baseline to measure the performances of the other neural networks.

[Fig. 2 Perceptron architecture. Labels: Inputs x1, x2, ..., xN-1, xN; Threshold; Output y; Error Signal.]

The Backpropagation network [9], or BP, Fig. 3, is a multilayer feedforward network, which contains an input layer, one or more hidden layers, and an output layer. BPs have strong generalization capabilities and have been applied successfully to solve some difficult and diverse problems. We tested BP networks with the number of hidden neurons ranging from 2 to 8.

[Fig. 3 BP architecture. Labels: Input Layer, Hidden Layer, Output Layer.]

The perceptron-backpropagation hybrid network [6], or PBH, Fig. 4, is a superposition of a perceptron and a small backpropagation network. PBH networks are capable of exploring both linear and nonlinear correlations between the input stimulus vectors and the output values. We tested PBH networks with the number of hidden neurons ranging from 1 to 8.

[Fig. 4 PBH architecture. Labels: Input Layer, Hidden Layer, Output Layer.]

Fuzzy ARTMAP [10] in its most general form is a system of two Fuzzy ART networks, ARTa and ARTb, whose F2 layers are connected by a subsystem referred to as a match tracking system. We are using a simplified version of Fuzzy ARTMAP [11], Fig. 5, which is implemented for classification problems. We tested ARTMAP networks with the number of category neurons ranging from 2 to 8.

[Fig. 5 Fuzzy ARTMAP architecture. Labels: Fuzzy ART; x1, x2, ..., xP-1, xP; C1, C2, ..., C2P; Input Layer, Complement Layer, Category Layer, Output Layer.]

The radial-basis function network [9], or RBF, Fig. 6, involves three entirely different layers. The input layer is made up of source nodes. The second layer is a hidden layer of high enough dimension, which serves a different purpose from that in a BP network. The output layer supplies the response of the network to the activation patterns applied to the input layer. We tested RBF networks with the number of hidden neurons ranging from 2 to 8.
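As an illustration of the two similarity statistics from Section 2, the following is a minimal Python sketch, not the authors' implementation. The function names and the sample probabilities are made up for this example. The paper does not specify f(N), so the default `f` below (a constant 1) is only a placeholder assumption, and the absolute values in the layer-window statistic follow one reading of the formula.

```python
from typing import Callable, Sequence


def chi_square_like_q(expected: Sequence[float],
                      observed: Sequence[float],
                      n: int) -> float:
    """NIDES-style statistic: Q = N * sum((p_i' - p_i)^2 / p_i).

    expected holds the reference probabilities p_i (all must be non-zero),
    observed holds the frequencies p_i' seen in the current time interval.
    """
    return n * sum((po - pe) ** 2 / pe for pe, po in zip(expected, observed))


def layer_window_q(expected: Sequence[float],
                   observed: Sequence[float],
                   n: int,
                   f: Callable[[int], float] = lambda n: 1.0) -> float:
    """Layer-window similarity: Q = f(N) * [sum|p_i' - p_i| + max|p_i' - p_i|].

    f is a hypothetical stand-in for the paper's unspecified f(N).
    """
    deviations = [abs(po - pe) for pe, po in zip(expected, observed)]
    return f(n) * (sum(deviations) + max(deviations))


# Made-up three-event example: the observed window deviates from the reference.
reference = [0.5, 0.3, 0.2]
window = [0.2, 0.3, 0.5]
print(chi_square_like_q(reference, window, n=100))
print(layer_window_q(reference, window, n=100))
```

Both functions return 0 when the observed frequencies match the reference exactly and grow as the window drifts away from it. Unlike the first statistic, the second does not divide by p_i, so it stays defined for events whose expected probability is zero.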

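The event-buffer cascade and the reference-model update described in Section 2 might be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the names `LayerWindow`, `push`, `emitted`, `capacity`, `adaptation_weight` and `update_reference` are hypothetical, `alpha` stands for the predefined adaptation rate, and the comparison of each buffer with its reference model (and the neural network call) is left out.

```python
from typing import List, Optional, Sequence


def adaptation_weight(t: float) -> float:
    """s = -t when the network output t is at most 0 (normal traffic), else 0.

    Written as max(0, -t), which is equivalent for t in [-1, 1].
    """
    return max(0.0, -t)


def update_reference(p_old: Sequence[float], p_obs: Sequence[float],
                     t: float, alpha: float) -> List[float]:
    """p_new = s*alpha*p_obs + (1 - s*alpha)*p_old, applied element by element."""
    w = adaptation_weight(t) * alpha
    return [w * obs + (1.0 - w) * old for old, obs in zip(p_old, p_obs)]


class LayerWindow:
    """One layer-window: buffers event vectors and forwards their average when full."""

    def __init__(self, capacity: int, upper: Optional["LayerWindow"] = None):
        self.capacity = capacity
        self.upper = upper  # next layer-window, or None at the top level
        self.buffer: List[Sequence[float]] = []
        self.emitted: List[List[float]] = []  # averages produced by this layer

    def push(self, event: Sequence[float]) -> None:
        self.buffer.append(event)
        if len(self.buffer) == self.capacity:
            # Average the stored events component-wise, then empty the buffer.
            avg = [sum(col) / self.capacity for col in zip(*self.buffer)]
            self.buffer.clear()
            self.emitted.append(avg)
            if self.upper is not None:  # at the top level the events are dropped
                self.upper.push(avg)


# Two layers with made-up capacities: layer 1 feeds its averages to layer 2.
top = LayerWindow(capacity=2)
bottom = LayerWindow(capacity=2, upper=top)
for event in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 1.0]):
    bottom.push(event)
print(top.emitted)

# Normal traffic (t = -1) moves the reference; an attack (t = 1) leaves it alone.
print(update_reference([0.5, 0.5], [1.0, 0.0], t=-1.0, alpha=0.1))
print(update_reference([0.5, 0.5], [1.0, 0.0], t=1.0, alpha=0.1))
```

Because each layer only forwards averages, a higher layer sees one event per `capacity` events of the layer below, which is how the cascade covers progressively longer time windows.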
[Fig. 6 RBF architecture. Labels: x1, x2, ..., xP-1, xP; G; Input Layer, Hidden Layer of Green's Functions, Output Layer.]

In our experiments, we used NeuralWorks Professional II/PLUS to build all of the neural networks depicted above.

4 Experimental Results
In this section, we present our simulation approach and the results of applying our statistical models and the different neural networks to detect network-based attacks. First, the testbed configuration and the simulation specifications are introduced in subsection 4.1, and then subsection 4.2 reports the testing results.

4.1 Testbed
We used a virtual network, built with simulation tools, to generate attack scenarios. The experimental testbed that we built using OPNET, a powerful network simulation facility, is shown in Fig. 7. The testbed is a 10-BaseX LAN that consists of 11 workstations and 1 server.

[Fig. 7 Simulation Testbed]

We simulated the UDP flooding attack within the testbed. To extensively test the performances of the neural networks, we ran four independent scenarios with different typical traffic loads and attack traffic. Table 1 lists the traffic loads of the simulation scenarios.

             Typical Traffic   Attack Traffic
Scenario 1   600 kbps          50 kbps
Scenario 2   600 kbps          100 kbps
Scenario 3   2 Mbps            50 kbps
Scenario 4   2 Mbps            100 kbps
Table 1 Traffic Loads of the Four Simulation Scenarios

4.2 Results
For each simulation scenario, we collected 10,000 records of network traffic. We divided these data into two separate sets, one set of 6000 records for training and the other of 4000 records for testing. In each scenario, the system was trained for 100 epochs. We evaluated the performances of the neural networks based on the mean squared root (MSR) errors and the misclassification rates of the outputs. The misclassification rate is defined as the fraction of the inputs that are misclassified by the neural networks during one epoch, which includes both false positive and false negative misclassifications.

In the rest of this section, we present and analyze the simulation results of the neural networks one by one.

4.2.1 Perceptron
The mean squared root errors and the misclassification rates of the perceptrons within the four simulation scenarios are tabulated in Table 2.

             MSR Error   Misclass. rate
Scenario 1   0.685641    0.16725
Scenario 2   0.715895    0.202
Scenario 3   0.738548    0.233889
Scenario 4   0.635356    0.119444
Table 2 The simulation results of perceptrons

We can see that the perceptrons performed poorly in all four scenarios: the mean squared root errors range from about 0.64 to 0.74, and the misclassification rates range from about 0.12 to 0.23. Both the MSR errors and the misclassification rates are unacceptably high for an IDS.

4.2.2 Fuzzy ARTMAP and RBF
The results of the Fuzzy ARTMAP and RBF nets are shown in Fig. 8 to Fig. 11. The x-axes of the figures represent the number of category neurons in Fuzzy ARTMAP and the number of hidden neurons in RBF. The y-axes represent the lowest Mean Squared Root Errors and the lowest Misclassification Rates that these neural nets achieved within the 100 epochs.
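The two evaluation measures defined in Section 4.2 can be sketched in Python as below. This is only an illustration under assumptions that the paper does not state: the "mean squared root error" is taken to be the square root of the mean squared difference between target and output, targets are coded as +1 (attack) and -1 (normal), and an output is counted as "attack" when it exceeds a threshold of 0. The function names and the sample values are made up.

```python
import math
from typing import Sequence


def msr_error(targets: Sequence[float], outputs: Sequence[float]) -> float:
    """Square root of the mean squared difference between targets and outputs."""
    squared = [(o - t) ** 2 for t, o in zip(targets, outputs)]
    return math.sqrt(sum(squared) / len(squared))


def misclassification_rate(targets: Sequence[float],
                           outputs: Sequence[float],
                           threshold: float = 0.0) -> float:
    """Fraction of inputs whose thresholded output disagrees with the target.

    Counts false positives and false negatives alike.
    """
    wrong = sum((o > threshold) != (t > threshold)
                for t, o in zip(targets, outputs))
    return wrong / len(targets)


# Made-up example: the third record is a false positive, the fourth a false negative.
targets = [1.0, -1.0, -1.0, 1.0]
outputs = [0.8, -0.6, 0.2, -0.4]
print(msr_error(targets, outputs))
print(misclassification_rate(targets, outputs))
```

The two measures can disagree: an output of 0.1 for an attack record is classified correctly under this threshold but still contributes a large squared error, which is one reason to report both.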

[Fig. 8 MSR errors of Fuzzy ARTMAP. x-axis: # of category neurons (2 to 8); y-axis: MSR Error (0 to 1); curves for scenarios 1 to 4.]

[Fig. 9 Misclassification rates of Fuzzy ARTMAP. x-axis: # of category neurons (2 to 8); y-axis: Misclassification Rate (0 to 0.5); curves for scenarios 1 to 4.]

[Fig. 10 MSR errors of RBF. x-axis: # of hidden neurons (2 to 8); y-axis: MSR Error (0 to 1); curves for scenarios 1 to 4.]

[Fig. 11 Misclassification rates of RBF. x-axis: # of hidden neurons (2 to 8); y-axis: Misclassification Rate (0 to 0.5); curves for scenarios 1 to 4.]

From the above figures, we can see that, as the number of category or hidden neurons increases, the performances of both ARTMAP and RBF networks improve. In most cases, both of them outperformed the perceptrons.

4.2.3 BP and PBH
The results of the BP and PBH nets are illustrated in Fig. 12 to Fig. 15.

[Fig. 12 MSR errors of BP. x-axis: # of hidden neurons (2 to 8); y-axis: MSR Error (0 to 0.2); curves for scenarios 1 to 4.]

[Fig. 13 Misclassification rates of BP. x-axis: # of hidden neurons (2 to 8); y-axis: Misclassification Rate (0 to 0.09); curves for scenarios 1 to 4.]

[Fig. 14 MSR errors of PBH. x-axis: # of hidden neurons (1 to 8); y-axis: MSR Error (0 to 0.2); curves for scenarios 1 to 4.]

[Fig. 15 Misclassification rates of PBH. x-axis: # of hidden neurons (1 to 8); y-axis: Misclassification Rate (0 to 0.09); curves for scenarios 1 to 4.]

The figures indicate that BP and PBH networks have similar performances, and that both neural networks consistently perform better than the other three types of neural networks. The curves in these figures are flat: the MSR errors and misclassification rates do not decrease as the number of hidden neurons increases. We believe the reason is that, because we deployed only one attacking technique, the UDP flooding attack, in our simulations, our data sets are too simple for BP and PBH. In the future, we will incorporate more Denial-of-Service attacking techniques into our simulation, thus providing additional tests, and possibly greater challenges, for the neural networks under consideration.

5 Conclusions
In this paper, we described our experiments testing different neural networks for statistical anomaly intrusion detection. The results showed that BP and PBH nets outperform Perceptron, Fuzzy ARTMAP and RBF. Thus, the classification capabilities of BP and PBH are more desirable for statistical anomaly intrusion detection systems.

Acknowledgements
Our research was partially supported by a Phase I SBIR contract with the US Army.

We would also like to thank OPNET Technologies, Inc.™, for providing the OPNET simulation software.

References:
[1] G. Vigna, R. A. Kemmerer, NetSTAT: a network-based Intrusion Detection Approach, Proceedings of 14th Annual Computer Security Applications Conference, 1998, pp. 25-34.
[2] W. Lee, S. J. Stolfo, K. Mok, A Data Mining Framework for Building Intrusion Detection Models, Proceedings of 1999 IEEE Symposium of Security and Privacy, pp. 120-132.
[3] A. Valdes, D. Anderson, Statistical Methods for Computer Usage Anomaly Detection Using NIDES, Technical report, SRI International, January 1995.
[4] J. M. Bonifacio, et al., Neural Networks Applied in Intrusion Detection System, IEEE, 1998, pp. 205-210.
[5] H. S. Javitz, A. Valdes, The NIDES Statistical Component: Description and Justification, Technical report, SRI International, March 1993.
[6] R. M. Dillon, C. N. Manikopoulos, Neural Net Nonlinear Prediction for Speech Data, IEEE Electronics Letters, Vol. 27, Issue 10, May 1991, pp. 824-826.
[7] A. K. Ghosh, J. Wanken, F. Charron, Detecting Anomalous and Unknown Intrusions Against Programs, Proceedings of IEEE 14th Annual Computer Security Applications Conference, 1998, pp. 259-267.
[8] Z. Zhang, et al., A Hierarchical Anomaly Network Intrusion Detection System Using Neural Network Classification, to appear in Proceedings of 2001 WSES International Conference on: Neural Networks and Applications (NNA '01), Feb. 2001.
[9] Simon Haykin, Neural Networks: A Comprehensive Foundation, Macmillan College Publishing Company, 1994.
[10] G. A. Carpenter, et al., Fuzzy ARTMAP: An adaptive resonance architecture for incremental learning of analog maps, International Joint Conference on Neural Networks, June 1992.
[11] NeuralWare Inc., Neural Computing: A Technology Handbook for NeuralWorks Professional II/PLUS and NeuralWorks Explorer, NeuralWare Inc., 1998.
