S.M.A.R.T. System: Behavior-based machine learning for cybersecurity
American Military University
October 22nd, 2017
Research paper for the class “IT Security: Attack and Defense”
As the internet grew throughout the 1980’s and into the 1990’s, becoming more extensive and more international, so too did the data. The internet had come a long way from the nineteen nodes of its early days as ARPANET, which began in the late 1960’s (Kleinrock, 2010). Around the same time as ARPANET, or slightly before, Richard Bellman was developing ideas that would shape artificial intelligence (Arel, Rose, & Karnowski, 2010).
Artificial intelligence can be described simply as a non-human form of intelligence. Many could argue that even a simple computer is an artificial intelligence in the way it processes information and makes decisions. Either way, from Bellman’s early work in the 1950’s through the modernization of the computing and networking world, A.I., as it came to be called, kept progressing.
During the technology revolution of the late 1990’s and into the 2000’s, many companies began using some form of artificial intelligence in their products. Apple’s iPhone and iPad, for example, include “Siri,” an artificial-intelligence-based assistant. When the user speaks, “Siri” translates what was said into text and then converts it into a search or a command on the phone.
For consumers, artificial intelligence is almost everywhere, in products ranging from cameras to televisions to children’s toys. However, these products are not quite at the robotic stage that Hollywood often portrays.
The question becomes: why isn’t this technology being used more for good? Is law enforcement using artificial intelligence? Do militaries use it? What about healthcare? This paper addresses how a subset of artificial intelligence called machine learning could be used to secure a network from intrusion.
Network and cyber security are significant problems around the world today. Over 60% of the American economy is made up of assets that are vulnerable to adversaries over a network (Slate, 2009). The technology has become so capable that it is already possible, or will be within five years, to use artificial intelligence and machine learning to harden one’s system. As readers will see, machine learning is already used in some forms of cybersecurity, and it would be a good bet that the government also uses machine learning to process the massive amounts of data it receives. The key question, however, is how machine learning could protect a local area network from intrusion, and what becomes possible when machine learning and security are combined.
Part 1. Machine Learning
What is Machine Learning?
As explained earlier, machine learning is a subset of artificial intelligence (Arel et al., 2010). Machine learning is nothing new; it has been around since the 1970’s, when computer scientists and mathematicians first started using algorithms to make predictions from data (Louridas & Ebert, 2016). Today’s machine learning is not much different than it was several decades ago; the main difference is the data (Louridas & Ebert, 2016). Training a machine typically takes a great deal of data, much more than was used in the 1970’s. Because of the massive amounts of data needed, processing power must be greater, memory (RAM) and cache must be larger, and the entire system has to be bigger and much faster.
In the simplest terms, machine learning is close to how humans learn, with a little extra mathematics involved. Humans learn by being shown how to do something, and repetition leads to memory. Someone who wants to learn how to edit a photo in Adobe Photoshop can go to YouTube and watch as many videos as it takes. At the same time, he or she practices editing the photo and evaluates the results, continuing until his or her skills are sharp enough to succeed without further practice.
Another example is behavioral analysis in forensics, which can be tied to the “process” of machine learning. Whether one calls it forensic psychology, profiling, or behavior analysis, each of these mines data, processes it, learns from it, and produces a prediction based on the information studied.
In machine learning, the process is similar, except that algorithms compute over an enormous amount of data. The YouTube videos in the human example correspond to the ‘data’ in the machine example, and the machine goes through the same process via code. A standard machine learning session typically has seven steps: gathering data, preparing the data, choosing a model, training, evaluation, tuning, and prediction (Guo & Google, 2017).
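To make the seven steps concrete, the following is a minimal sketch, not the SMART system itself. It assumes a hypothetical one-feature dataset generated from the line y = 3x + 2 plus noise, a straight-line model fitted by gradient descent, and the learning rate as the single tuned parameter. For brevity it tunes against the test split; in practice a separate validation set would be held out for tuning.

```python
import random

def gather_data(n=200, seed=0):
    """Step 1: gather data. Hypothetical samples from y = 3x + 2 plus Gaussian noise."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        points.append((x, 3 * x + 2 + rng.gauss(0, 0.5)))
    return points

def prepare_data(points, test_fraction=0.25, seed=1):
    """Step 2: prepare the data by shuffling and splitting into train/test sets."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def train(train_set, learning_rate, epochs=500):
    """Steps 3-4: choose a model (the line y = w*x + b) and train it by gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(train_set)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train_set) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train_set) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def evaluate(model, test_set):
    """Step 5: evaluate with mean squared error on held-out data."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in test_set) / len(test_set)

def tune(train_set, test_set, learning_rates=(0.001, 0.01, 0.02)):
    """Step 6: tune the learning rate, keeping the model with the lowest error."""
    models = {lr: train(train_set, lr) for lr in learning_rates}
    best_lr = min(models, key=lambda lr: evaluate(models[lr], test_set))
    return best_lr, models[best_lr]

def predict(model, x):
    """Step 7: use the trained model to predict a new value."""
    w, b = model
    return w * x + b

train_set, test_set = prepare_data(gather_data())
best_lr, model = tune(train_set, test_set)
print(f"learning rate={best_lr}, w={model[0]:.2f}, b={model[1]:.2f}, "
      f"test MSE={evaluate(model, test_set):.3f}, prediction at x=5: {predict(model, 5):.2f}")
```

The learned slope and intercept land close to 3 and 2, which is the sense in which the machine has "learned" the pattern hidden in the data.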
There are several models to choose from when training a machine, but for network security a numerically based model would fit best, since the machine-level (binary) data describing hosts and traffic is numerical. Training the machine is the most important step, but it is also the most difficult. It involves computer scientists and individuals who are experts in data and data compression.
Applying Machine Learning to IT Security
Information security already uses some form of artificial intelligence or machine learning in some of its technology; that is nothing new. Most of this technology comes from virus protection software that scans and analyzes programs, software, apps, files, and so on. These programs keep a virus library of known viruses and compare every file on the computer against it (Kaur, Khalsa, & Dhesian, 2016).
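As a simplified sketch of library-based scanning, the code below treats the "virus library" as a set of SHA-256 digests of known-bad files and flags any file whose digest is in that set. The library contents and file names are hypothetical; commercial products use vendor-maintained signature databases with richer patterns than whole-file hashes.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical virus library: SHA-256 digests of known-malicious files.
KNOWN_BAD_SHA256 = {
    hashlib.sha256(b"pretend this is a known virus").hexdigest(),
}

def sha256_of(path):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def scan(directory):
    """Return every file under directory whose hash appears in the virus library."""
    return [p for p in Path(directory).rglob("*")
            if p.is_file() and sha256_of(p) in KNOWN_BAD_SHA256]

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "notes.txt").write_bytes(b"meeting at noon")
    (Path(tmp) / "game.exe").write_bytes(b"pretend this is a known virus")
    flagged = [p.name for p in scan(tmp)]

print(flagged)  # ['game.exe']
```

The weakness is visible in the design: changing even one byte of the malicious file produces a different hash, so an unseen variant passes the scan. That limitation is what motivates the behavioral approach.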
As technology improves, antivirus programs are moving in the direction of behavioral modeling (Zhang, Raghunathan, & Jha, 2014), and some are even using machine learning basics to detect viruses. Behavioral modeling is one of the foundations of the SMART system. Rather than only matching files against a library, newer antivirus technology learns how programs usually act, their behavior (Kamesh & Sakthi Priya, 2014), and when it detects a change, it alerts the software that something may be wrong.
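A minimal sketch of behavioral modeling, assuming a hypothetical trace format of (program, action) pairs recorded during a clean training period: each program's profile is simply the set of actions it was seen performing, and any new action outside that profile is reported as a change. The program and action names are illustrative only.

```python
from collections import defaultdict

def learn_profiles(traces):
    """traces: iterable of (program, action) pairs observed while the host was known to be clean."""
    profiles = defaultdict(set)
    for program, action in traces:
        profiles[program].add(action)
    return profiles

def check(profiles, program, actions):
    """Return the actions this program has never been seen performing (all of them if unknown)."""
    return sorted(set(actions) - profiles.get(program, set()))

training = [
    ("notepad.exe", "read_file"), ("notepad.exe", "write_file"),
    ("browser.exe", "open_socket"), ("browser.exe", "read_file"),
]
profiles = learn_profiles(training)

print(check(profiles, "notepad.exe", ["read_file", "open_socket", "modify_registry"]))
# ['modify_registry', 'open_socket']
print(check(profiles, "browser.exe", ["open_socket"]))  # []
```

A text editor opening network sockets is not in any signature library, but it is a clear departure from how that program normally behaves.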
This is the concept of the SMART system, but on a much larger scale. Consider IBM Watson, which ran on roughly twenty-five hundred computing cores (Ferrucci et al., 2010). Watson and its many nodes can be preloaded with massive amounts of data and trained on just about anything; using its algorithms, it went on to beat Jeopardy’s record-holding champion.
A system built for a single local area network does not need twenty-five hundred computing cores. However, Watson gives the reader an idea of what the SMART system is about: Watson knew everything about, well, everything (Ferrucci et al., 2010). This is the power that one part of the SMART system will harness over all the nodes inside the LAN. The other two sections of the system will work like a typical antivirus system, with a database and an incident response center. Both will use much smaller hardware to detect and contain malicious traffic. This paper focuses more on the machine learning side, and how it mitigates security threats, than on the antivirus side.
Without getting into the math, as mentioned in the first half of Part 1, machine learning is about teaching the machine everything about an object. As an example, consider a desktop workstation labeled PC1. The SMART system needs data from PC1 to conduct its hierarchical learning process over every piece of information PC1 produces: the processor’s name, clock speed, temperature, DRAM usage, SRAM usage, cache usage, all open and closed ports, installed programs, files, keystrokes, accounts, and so on. The entire goal of the training and evaluation phase is to know PC1 inside and out, and to have the data to recognize its pattern of behavior when it opens an application or a particular file (Arel et al., 2010).
All of this will be stored, processed, and sifted through to produce a profile of that workstation, so the SMART system knows how PC1 is supposed to act at all times. Beforehand, weighted measurements were defined to capture conditions such as DRAM overuse, abnormal cache activity, or a scan arriving at a port. The SMART system is therefore trained on two outcomes, normal and abnormal, and knows what to do when an abnormal one occurs because it has been programmed to. For example, suppose a port scan is conducted against ports 43, 77, and 1134 on PC1. Under normal behavior, the SMART system might record activity on those ports at about 0.0002 Hz; once the scan occurs, all three ports show a bump of 0.0003 Hz. Even this slight change causes the SMART system to immediately send a response to the Incident Response Center, which handles the incident.
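The paper does not specify how the weighted measurements are computed, so the sketch below swaps in a common, simple technique: a per-metric z-score against a learned baseline. The metric names, the five-sample history, the 3-standard-deviation threshold, and the `notify_incident_response` placeholder are all hypothetical. The port rates reuse the 0.0002 Hz baseline and 0.0003 Hz bump from the example above.

```python
from statistics import mean, stdev

def build_profile(history):
    """history maps a metric name to observations recorded during normal operation."""
    return {metric: (mean(values), stdev(values)) for metric, values in history.items()}

def find_deviations(profile, current, z_threshold=3.0):
    """Return {metric: z} for metrics more than z_threshold standard deviations from normal."""
    flagged = {}
    for metric, value in current.items():
        mu, sigma = profile[metric]
        if sigma == 0:
            z = 0.0 if value == mu else float("inf")
        else:
            z = abs(value - mu) / sigma
        if z > z_threshold:
            flagged[metric] = z
    return flagged

def notify_incident_response(host, flagged):
    """Hypothetical placeholder for handing the incident to the Incident Response Center."""
    for metric, z in sorted(flagged.items()):
        print(f"ALERT {host}: {metric} is {z:.1f} standard deviations from normal")

normal_port_rate = [0.00019, 0.00020, 0.00021, 0.00020, 0.00020]  # Hz, hypothetical history
history = {
    "port43_hz": normal_port_rate,
    "port77_hz": normal_port_rate,
    "port1134_hz": normal_port_rate,
    "cpu_temp_c": [55, 56, 54, 55, 55],
}
profile = build_profile(history)

# During the scan each port rises by 0.0003 Hz; CPU temperature stays normal.
current = {"port43_hz": 0.0005, "port77_hz": 0.0005, "port1134_hz": 0.0005, "cpu_temp_c": 55.5}
flagged = find_deviations(profile, current)
notify_incident_response("PC1", flagged)
```

Because the baseline variation on these ports is tiny, a change of a fraction of a hertz is dozens of standard deviations away, which is how a "slight" change can still trigger an alert.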
This is the purpose of training the SMART system: to learn everything about the entire network and how it usually operates. Because a typical network handles more or less data at different times, that variation also has to be accounted for during training. This is why the mathematics of the training is not linear: instead of one straight line, there are multiple lines with multiple slopes (Guo & Google, 2017). Figs. 1-3 show the example from Guo and Google (2017) as they walk through the first phase of training.
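One way to read "multiple lines with multiple slopes" is a piecewise linear model: the day is split into windows, and each window gets its own least-squares line. The sketch below assumes hypothetical windows (night, workday, evening) and hypothetical hourly traffic figures in megabytes; it illustrates the idea rather than the method used by Guo and Google (2017).

```python
# Hypothetical time windows; each one gets its own line.
WINDOWS = {"night": range(0, 8), "workday": range(8, 18), "evening": range(18, 24)}

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def window_of(hour):
    return next(name for name, hours in WINDOWS.items() if hour in hours)

def fit_piecewise(samples):
    """samples: list of (hour, megabytes). Returns {window: (slope, intercept)}."""
    grouped = {name: [] for name in WINDOWS}
    for hour, mb in samples:
        grouped[window_of(hour)].append((hour, mb))
    return {name: fit_line(points) for name, points in grouped.items()}

def expected_traffic(models, hour):
    slope, intercept = models[window_of(hour)]
    return slope * hour + intercept

# Hypothetical training data: flat at night, rising during the workday, falling in the evening.
samples = [(h, 5.0) for h in range(0, 8)]
samples += [(h, 10.0 * h - 50) for h in range(8, 18)]
samples += [(h, 250.0 - 10.0 * h) for h in range(18, 24)]
models = fit_piecewise(samples)

print(models["workday"])             # (10.0, -50.0)
print(expected_traffic(models, 12))  # 70.0
print(expected_traffic(models, 3))   # 5.0
```

With a single line for the whole day, 40 MB at 3 a.m. might look unremarkable; against the night window's expected 5 MB it stands out immediately.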
Once one part of the system knows exactly how each machine responds when every program is opened, every port is used, and every connection is made, it compares this knowledge with the current information it receives. This is how the process works. As in the port example above, there are many other cases in which a host’s behavior and patterns change, alerting the system that it needs to send the incident to the Incident Response Center.
Incident Response Center
The Incident Response Center is the next stage of the ecosystem: once the machine learning component has identified an incident, the incident is examined further here. The incident response center is run by a computer separate from the learning section. Its goal is to identify the incident and contain it immediately. A typical incident-handling process includes identification, analysis of the scene and the damage, and containment of the incident if it is malicious (Oriyano, 2012). The process in this incident response center is much the same; however, its priority is gaining knowledge about the incident itself.
If the incident is deemed malicious, the incident response center will contain the file or seal the port or area so nothing malicious can occur. It may automatically disconnect the workstation from the internet by disabling its network interface card. Once everything is contained, the event is evaluated to determine what happened, and the user or administrator is notified. All of this would be done in under five seconds. The administrator could then log in to the SMART system and examine the malicious content as the SMART system runs it through a preprogrammed and continuously updated database.
Meanwhile, the SMART system knows precisely where the malicious material came from and logs it for the report, along with preloaded instructions from the database on how to fix the issue. This final report is printed or emailed to the administrator so the firm or person can debrief others and investigate the malicious material as they wish. The malicious content is then removed from the system, logged into the national database, and shredded. Alternatively, if the incident were just a scan, nothing further would be done except to recommend that the administrator close that port permanently, since it could be a vulnerability.
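The triage flow described above (contain first, then report; for a scan, only recommend closing the port) can be sketched as a small decision function. Everything here is hypothetical: the incident kinds, the `MALICIOUS_KINDS` set, and the action strings stand in for real quarantine and network-card operations, which this code does not perform. The five-second budget from the text is recorded in the report.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Verdict(Enum):
    MALICIOUS = "malicious"
    SCAN = "scan"
    BENIGN = "benign"

@dataclass
class Incident:
    host: str
    kind: str                    # hypothetical labels, e.g. "trojan", "port_scan"
    source: str
    port: Optional[int] = None
    actions: list = field(default_factory=list)

MALICIOUS_KINDS = {"trojan", "buffer_overflow", "sql_injection"}  # hypothetical

def classify(incident):
    if incident.kind in MALICIOUS_KINDS:
        return Verdict.MALICIOUS
    if incident.kind == "port_scan":
        return Verdict.SCAN
    return Verdict.BENIGN

def respond(incident, time_budget_s=5.0):
    """Contain first, then report. Action strings stand in for real containment steps."""
    start = time.monotonic()
    verdict = classify(incident)
    if verdict is Verdict.MALICIOUS:
        incident.actions.append(f"quarantine content from {incident.source}")
        incident.actions.append(f"disable network interface on {incident.host}")
        incident.actions.append("notify administrator")
    elif verdict is Verdict.SCAN:
        incident.actions.append(f"recommend closing port {incident.port} on {incident.host}")
    elapsed = time.monotonic() - start
    return {"host": incident.host, "verdict": verdict.value, "source": incident.source,
            "actions": list(incident.actions), "within_budget": elapsed < time_budget_s}

trojan_report = respond(Incident("PC1", "trojan", source="203.0.113.7"))
scan_report = respond(Incident("PC1", "port_scan", source="198.51.100.9", port=1134))
print(trojan_report)
print(scan_report)
```

Ordering containment before reporting mirrors the priority in the text: the host is isolated before anyone reads the report.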
Part 2. Vulnerabilities and Theoretical Application
Websites can have many vulnerabilities. Distributed denial-of-service (DDoS) attacks occur when many sources send traffic to a web server at once, overwhelming the server’s ability to handle that much traffic (Stewart, 2014). Applications on websites or web servers can also have open-port vulnerabilities, but one of the most common vulnerabilities is the developers themselves. An adversary has a slew of exploits available against a web server, such as SYN flooding, in which the attacker sends a constant stream of connection requests (SYN packets) with spoofed return addresses. The server’s replies go to addresses that never answer, so half-open connections pile up until the server is flooded.
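A simplified sketch of how SYN flooding shows up as a behavior change: within one time window, count handshakes that were started (SYN) but never completed (ACK). Because the return addresses are spoofed, the total number of half-open connections is a better signal than any single source. The packet tuples, addresses, and the threshold of 100 are hypothetical; a real threshold would be learned from the server's normal baseline.

```python
def half_open_count(packets):
    """packets: (src_ip, src_port, flag) tuples seen in one window. Counts SYNs never followed by ACK."""
    started, completed = set(), set()
    for src_ip, src_port, flag in packets:
        key = (src_ip, src_port)
        if flag == "SYN":
            started.add(key)
        elif flag == "ACK":
            completed.add(key)
    return len(started - completed)

def looks_like_syn_flood(packets, threshold=100):
    return half_open_count(packets) > threshold

# Normal window: every client completes its handshake.
normal = []
for i in range(50):
    client = (f"192.0.2.{i}", 40000 + i)
    normal += [(*client, "SYN"), (*client, "ACK")]

# Attack window: spoofed sources send SYNs and never answer the server.
attack = [(f"198.51.100.{i % 250}", 1024 + i, "SYN") for i in range(500)]

print(half_open_count(normal), looks_like_syn_flood(normal))  # 0 False
print(half_open_count(attack), looks_like_syn_flood(attack))  # 500 True
```

Detection complements rather than replaces server-side mitigations such as SYN cookies, which let a server avoid storing state for half-open connections.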
Another common attack on web servers, and on PCs as well, is the buffer overflow. The exploit sends the host, whether server or PC, more input than a fixed-size buffer in memory can hold; the excess spills past the end of the buffer on the stack and overwrites adjacent memory, such as the saved return address, just enough to inject and run malicious code (Chien & Szor, 2002). Once the correct amount of overflow is found, the code is injected, and it can do a great deal with the privileges of the compromised program (Chien & Szor, 2002). Large-scale malware such as the ‘Code Red’ virus in 2001 used this exploit. However, it is important to point out that to perform this exploit, the adversary either has to know how large the buffer is or has to test to find how much input is needed, so there would be a behavior change in the host’s memory (DRAM). See Fig. 3 for an example of a buffer overflow.
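Python itself does not allow buffer overflows, so the sketch below only simulates one. It models a stack frame as a `bytearray` holding a 16-byte buffer followed by an 8-byte saved return address. Both sizes and the addresses are hypothetical. An unchecked copy (comparable to C's `strcpy`) lets the attacker overwrite the return address, while a bounds-checked copy rejects the oversized input.

```python
BUFFER_SIZE = 16      # hypothetical fixed-size local buffer
RETURN_ADDR_SIZE = 8  # hypothetical 64-bit saved return address

def make_frame(return_address):
    """Model a stack frame as [buffer bytes][saved return address]."""
    frame = bytearray(BUFFER_SIZE + RETURN_ADDR_SIZE)
    frame[BUFFER_SIZE:] = return_address.to_bytes(RETURN_ADDR_SIZE, "little")
    return frame

def unsafe_copy(frame, data):
    """No bounds check on the buffer: bytes past BUFFER_SIZE overwrite the return address.
    (The simulation stops at the end of the modeled frame.)"""
    for i, byte in enumerate(data[:len(frame)]):
        frame[i] = byte

def safe_copy(frame, data):
    """Bounds-checked copy: refuses input longer than the buffer."""
    if len(data) > BUFFER_SIZE:
        raise ValueError("input longer than buffer")
    frame[0:len(data)] = data

def saved_return_address(frame):
    return int.from_bytes(frame[BUFFER_SIZE:], "little")

# The attacker must know the offset (16 bytes here) to place the new address exactly.
payload = b"A" * BUFFER_SIZE + (0xDEADBEEF).to_bytes(RETURN_ADDR_SIZE, "little")

frame = make_frame(0x401000)
unsafe_copy(frame, payload)
print(hex(saved_return_address(frame)))   # 0xdeadbeef

frame2 = make_frame(0x401000)
try:
    safe_copy(frame2, payload)
except ValueError as err:
    print("rejected:", err)
print(hex(saved_return_address(frame2)))  # 0x401000
```

The need to find the exact offset is the probing behavior the text points to: repeated, oversized inputs that a behavior-based monitor could notice.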
Database exploits are widespread. Cross-site scripting and SQL injection are two common ways adversaries get into databases. Databases are lucrative targets for every kind of hacker: whether a hacktivist or a crime ring, an attacker who reaches a database and achieves root can get the sensitive data.
Cross-site scripting works at the level of website forms, and the vulnerability comes from website developers who forgot or failed to write code that handles script input. When someone types a less-than or greater-than sign and no error page appears saying the form did not accept that character, the form may be vulnerable. There are several scripts hackers could use to gain access to the database, but they all contain characters not generally used by people on the site. Those characters would carry a different weight when the SMART system makes its comparisons.
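A minimal sketch of the "different weight" idea, assuming a hypothetical set of normal inputs for one form field: learn how often each character appears in legitimate input, then report characters in a new submission whose learned frequency falls below a cutoff. The 0.001 cutoff is an assumption. The last line shows the standard developer-side fix the text says is missing, escaping input with Python's `html.escape` before it is written into a page.

```python
import html
from collections import Counter

def learn_char_frequencies(normal_inputs):
    """Relative frequency of each character across legitimate form submissions."""
    counts = Counter("".join(normal_inputs))
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def unusual_characters(freqs, text, min_freq=0.001):
    """Characters that are rare or unseen in normal input for this field."""
    return sorted({ch for ch in text if freqs.get(ch, 0.0) < min_freq})

normal_inputs = ["John Smith", "jane.doe@example.com", "Please call me back", "Order 12345"]
freqs = learn_char_frequencies(normal_inputs)

print(unusual_characters(freqs, "Jane Smith"))                 # []
print(unusual_characters(freqs, "<script>alert(1)</script>"))  # ['(', ')', '/', '<', '>']

# Developer-side defense: escape user input before rendering it as HTML.
print(html.escape("<script>alert(1)</script>"))  # &lt;script&gt;alert(1)&lt;/script&gt;
```

Learning the alphabet per field, rather than hard-coding a blocklist, matches the paper's behavioral framing: a character is suspicious because this form has never seen it, not because it is on a list.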
SQL injection works the same way, except that a more in-depth script is used to try to breach the entire database, either to inject malicious material or to extract data from it. The SMART system would see the ‘weird’ characters and know within milliseconds that something is going on.
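To show what such a script does, here is a small demonstration using Python's built-in `sqlite3` module with a hypothetical in-memory `users` table. The vulnerable lookup pastes user input into the SQL text, so the classic payload `x' OR '1'='1` turns the query into one that matches every row. The parameterized lookup passes the input as data, and the same payload matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "secret-a"), ("bob", "secret-b")])

def lookup_unsafe(name):
    # Vulnerable: user input becomes part of the SQL statement itself.
    return conn.execute(f"SELECT name, secret FROM users WHERE name = '{name}'").fetchall()

def lookup_safe(name):
    # Parameterized: the driver treats name strictly as a value, never as SQL.
    return conn.execute("SELECT name, secret FROM users WHERE name = ?", (name,)).fetchall()

payload = "x' OR '1'='1"
print(lookup_unsafe(payload))  # every row leaks
print(lookup_safe(payload))    # []
print(lookup_safe("alice"))    # [('alice', 'secret-a')]
```

The quote and equals characters in the payload are exactly the kind of "weird" characters the paragraph expects a behavioral monitor to weigh differently from a normal name.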
In a phishing attack, an adversary usually sends an email to a user in the hope that the user clicks a link in it and gives up personal information (Wright & Marett, 2010). These emails are typically well designed to look exactly like the real company or organization and are made to scam the user. Unfortunately, technology has lagged in this arena. Spam filters try to check the linked websites against a database or use their own algorithms to identify spam; however, in the majority of cases, email is a user-based vulnerability (Wright & Marett, 2010).
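A simplified sketch of the filtering approach described above: extract the links from an email body, flag hosts on a blocklist, and flag hosts that closely resemble a trusted domain without being that domain. The trusted and blocked domains and the 0.8 similarity cutoff are hypothetical. `difflib.SequenceMatcher` stands in for the more sophisticated lookalike detection real filters use.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

TRUSTED_DOMAINS = {"mybank.example"}    # hypothetical allow-list
KNOWN_BAD_DOMAINS = {"evil.example"}    # hypothetical blocklist

def is_trusted(host):
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)

def assess_links(email_body, similarity=0.8):
    """Return (host, reason) for each suspicious link in the email body."""
    findings = []
    for url in re.findall(r"https?://[^\s<>\"']+", email_body):
        host = urlparse(url).hostname
        if host is None or is_trusted(host):
            continue
        if host in KNOWN_BAD_DOMAINS:
            findings.append((host, "on blocklist"))
            continue
        for trusted in TRUSTED_DOMAINS:
            if SequenceMatcher(None, host, trusted).ratio() >= similarity:
                findings.append((host, f"looks like {trusted}"))
                break
    return findings

body = ("Your account is locked. Verify at http://rnybank.example/login "
        "or read http://news.example.org/tips. Also http://evil.example/x "
        "and https://www.mybank.example/help")
print(assess_links(body))
# [('rnybank.example', 'looks like mybank.example'), ('evil.example', 'on blocklist')]
```

As the paragraph notes, a filter like this only reduces exposure; a convincing email with a never-before-seen domain still depends on the user not clicking.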
After the fact, once the user may have downloaded a virus, the SMART system would step in and be able to trace it. However, having a robust security plan in place, with straightforward policies and consequences for breaking them, is extremely important for addressing user-based vulnerabilities.
Every host is vulnerable to viruses, and a virus can come in almost any form: a downloaded file, an injection, a website. Security is never going to be 100%. However, when a virus enters a host, it will affect the host in some way, even if only slightly.
It will cause that host’s behavior to change in one form or another. A trojan horse that a user downloaded as a ‘fun game’ but that opened a backdoor for a hacker will change the behavior: “While every day this host has been asleep at 2300 hours, now it is wide awake, and it is downloading administration files” – SMART system. Malicious content that attacks root systems will affect the host’s normal behavior and patterns. There might be an uptick in the CPU clock that is traced back to the root directory or a port.
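The "asleep at 2300 hours" example can be sketched as an hour-of-day profile: an hour counts as usual only if the host was active in it on at least a few distinct days during training, and any later event outside the usual hours is reported. The two weeks of 08:00-17:00 history, the `min_days=3` rule, and the event descriptions are hypothetical.

```python
from datetime import datetime

def usual_hours(event_times, min_days=3):
    """Hours of the day in which the host was active on at least min_days distinct days."""
    days_seen = {}
    for t in event_times:
        days_seen.setdefault(t.hour, set()).add(t.date())
    return {hour for hour, days in days_seen.items() if len(days) >= min_days}

def off_hours_events(usual, events):
    """events: (timestamp, description) pairs. Returns those outside the learned hours."""
    return [(t.isoformat(timespec="minutes"), what) for t, what in events if t.hour not in usual]

# Hypothetical training history: two weeks of activity between 08:00 and 17:59.
history = [datetime(2017, 10, day, hour, 15) for day in range(1, 15) for hour in range(8, 18)]
usual = usual_hours(history)

tonight = [
    (datetime(2017, 10, 20, 16, 40), "opened spreadsheet"),
    (datetime(2017, 10, 20, 23, 5), "downloaded admin files"),
]
print(off_hours_events(usual, tonight))
# [('2017-10-20T23:05', 'downloaded admin files')]
```

Requiring several distinct days keeps one late night of legitimate work from permanently teaching the model that 23:00 is normal.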
No matter the virus, there will be a change. The perfect example is the ‘Code Red’ virus mentioned earlier, which used a buffer overflow to inject itself (Chien & Szor, 2002). According to Chien and Szor (2002), IT security professionals at the time, especially antivirus professionals, did not know how to stop it through detection. Times have changed, and technology with them; since we know viruses affect a host, antivirus specialists should concentrate on that effect as a detection mechanism.
Human behavior is a fascinating topic; there are college degrees in it, and some people make a good living trying to understand and monitor it. The U.S. has behavioral analysts, Behavioral Detection Officers, whose job is to watch people at airports and look for specific behavioral indicators (Wigginton, Jensen, Graves, & Vinson, 2014).
As humans, we look at behavior and patterns all the time to determine threats. Criminal intelligence units, for example, often use pattern analysis to predict where an offender might strike next (Carter, 2004). Given how much humans rely on this approach to notice threats, targets, and vulnerabilities in other humans, why hasn’t the IT security field taken the same approach to protecting a network, studying network behavior and identifying or predicting changes?
To this writer’s knowledge, machine learning at this level does not yet exist. The entire unit would need a minimum of three separate computers, the first being the learning computer, with processors totaling in the hundreds of cores and hundreds of TB of cache. It would more than likely need no less than TB of DRAM, or 2 petabytes of DRAM. The internal storage would have to be flash storage, as hard disks would be too slow, and it is unknown to this author how much internal storage would be needed. The other two sections of the SMART system would not have to be this large, although they would still have to be reasonably large compared to typical computer systems.
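The author leaves the storage requirement open. One way to start reasoning about it is a back-of-the-envelope estimate of raw telemetry volume. Every parameter below is a hypothetical assumption: 50 hosts, 200 numeric metrics per host, 8 bytes per value, one sample per second, and one year of retention. The estimate ignores compression, indexes, and the files and keystroke logs the paper also mentions.

```python
def telemetry_storage_bytes(hosts, metrics_per_host, bytes_per_value,
                            samples_per_second, retention_days):
    """Raw, uncompressed bytes needed to keep every sample for the retention period."""
    seconds = retention_days * 24 * 60 * 60
    return hosts * metrics_per_host * bytes_per_value * samples_per_second * seconds

estimate = telemetry_storage_bytes(hosts=50, metrics_per_host=200, bytes_per_value=8,
                                   samples_per_second=1, retention_days=365)
print(f"{estimate:,} bytes ≈ {estimate / 1e12:.2f} TB")  # 2,522,880,000,000 bytes ≈ 2.52 TB
```

Because the result scales linearly with each factor, doubling the sample rate or the number of hosts doubles the storage, which makes it easy to size the system for a particular LAN.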
It is possible to build such a system; IBM has created one. However, this remains a futuristic device because of the knowledge it must obtain. For now, more behavior-based software or hardware should be implemented so that network security is proactive, instead of reacting to a database and scanning each file as it comes in.
References
Arel, I., Rose, D., & Karnowski, T. (2010). Deep machine learning-A new frontier in artificial intelligence research. IEEE Computational Intelligence Magazine, 5(4), 13–18. https://doi.org/10.1109/MCI.2010.938364
Carter, D. L. (2004). Law Enforcement Intelligence: A Guide for State, Local, and Tribal Law Enforcement Agencies (1st ed.). Washington, D.C. Retrieved from https://ric-zai-inc.com/Publications/cops-w0277-pub.pdf
Chien, E., & Szor, P. (2002). Blended attacks exploits, vulnerabilities and buffer-overflow techniques in computer viruses. Virus Bulletin Conference, (September), 1–36.
Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A. A., … Welty, C. (2010). Building Watson: An Overview of the DeepQA Project. AI Magazine, 31(3), 59–79. https://doi.org/10.1609/aimag.v31i3.2303
Guo, Y., & Google. (2017). The 7 Steps of Machine Learning. USA: YouTube. Retrieved from https://www.youtube.com/watch?v=nKW8Ndu7Mjw
Kamesh, & Sakthi Priya, N. (2014). Security enhancement of authenticated RFID generation. International Journal of Applied Engineering Research, 9(22), 5968–5974. https://doi.org/10.1002/sec
Kaur, G., Khalsa, G. N., & Dhesian, B. S. (2016). Network security: Anti-virus. International Journal of Advanced Research in Computer Science, 7(6), 79–85.
Kleinrock, L. (2010). An early history of the internet. IEEE Communications Magazine, 48(8), 26–36. https://doi.org/10.1109/MCOM.2010.5534584
Louridas, P., & Ebert, C. (2016). Machine Learning. IEEE Software, 33(5), 110–115. https://doi.org/10.1109/MS.2016.114
Oriyano, S.-P. (2012). Hacker techniques, tools, and incident handling (2nd ed.). Burlington, MA: Jones & Bartlett Learning, LLC Publications.
Pound, M., & Computer Science at the University of Nottingham. (2016). Buffer Overflow Attack. YouTube. Retrieved from https://www.youtube.com/watch?v=1S0aBV-Waeo
Slate, R. (2009). Competing with intelligence: New directions in China’s quest for intangible property and implications for homeland security. Homeland Security Affairs, 5(1), 29. Retrieved from http://search.proquest.com.ezproxy2.apus.edu/docview/1266213070/fulltextPDF/3BA31AD0F0634D73PQ/1?accountid=8289
Stewart, J. (2014). Network security, firewalls, and VPNs (2nd ed.). Burlington, Vermont: Jones & Bartlett Learning. Retrieved from https://online.vitalsource.com/#/books/9781284107715/cfi/6/2!/4/2/2@0:0
Wigginton, M., Jensen, C. J., Graves, M., & Vinson, J. (2014). What Is the Role of Behavioral Analysis in a Multilayered Approach to Aviation Security? Journal of Applied Security Research, 9(4), 393–417. https://doi.org/10.1080/19361610.2014.942828
Wright, R. T., & Marett, K. (2010). The Influence of Experiential and Dispositional Factors in Phishing: An Empirical Investigation of the Deceived. Journal of Management Information Systems, 27(1), 273–303. https://doi.org/10.2753/MIS0742-1222270111
Zhang, M., Raghunathan, A., & Jha, N. K. (2014). A defense framework against malware and vulnerability exploits. International Journal of Information Security, 13(5), 439–452. https://doi.org/10.1007/s10207-014-0233-1