Design Concept for a Strictly Anomaly-Based Web Intrusion Detection System

posted on January 14, 2002

by hitbsecnews

----| 1. Introduction

Intrusion detection is one of the hottest security technologies around. It is
also one of the fastest-growing technologies in terms of research and
development. There are many commercial IDS products available today. These
products have their strengths and weaknesses, but most of them rely on a single
detection mechanism: misuse detection. These products can detect various
attacks, but none (as far as I know) is specifically designed for monitoring
web intrusions.

This paper outlines, in very general detail, the concept for a strictly
anomaly-based web intrusion detection system. I haven't done much research
into the area of web intrusion detection, so some of you may find that this
paper is too broad or lacks direction. My purpose here is to introduce the
concept and the rationale behind it, and hopefully to generate enough
interest from the security community to collaborate on further research and
development of such a product.

----| 2. Anomaly Detection

Anomaly detection is a process of identifying abnormal or unusual events or
behaviors on a network or host. Anomaly detection works on the assumption that
attacks are different from "normal" (legitimate) activities and can be detected
by anomaly detectors.

In anomaly detection, the rule base is defined as a set of allowable system
or user behaviors, and anything that falls outside the set is anomalous.
One of the techniques used to detect anomalous behavior is statistical
analysis.

----| 3. Misuse Detection

Misuse detection, on the other hand, is a process of detecting attacks based on
a defined set of attack rules or signatures. This method is basically pattern
matching: input streams (such as TCP/IP packets) are analyzed and their
features (TCP header, payload, etc.) are compared to a signature database
to detect intrusions. Thus, the IDS engine works on the premise of "I saw an
attack, I did not see an attack".

In misuse detection the rule base is a set of attack patterns, while in
anomaly detection, the rule base is a set of normal usage patterns.
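
The contrast between the two rule bases can be sketched in a few lines of
Python. This is only an illustration: the signatures and the normal patterns
below are hypothetical and are not taken from any real IDS.

```python
import re

# Hypothetical misuse rule base: patterns that describe known attacks.
ATTACK_SIGNATURES = [
    re.compile(r"\.\./"),            # directory traversal
    re.compile(r"(?i)\bor\b\s+'"),   # crude SQL injection fragment
]

# Hypothetical anomaly rule base: patterns that describe normal requests.
NORMAL_PATTERNS = [
    re.compile(r"/(index\.html)?"),
    re.compile(r"/article\.php\?sid=\d+"),
]


def misuse_alert(request: str) -> bool:
    """Alert only when the request matches a known attack signature."""
    return any(sig.search(request) for sig in ATTACK_SIGNATURES)


def anomaly_alert(request: str) -> bool:
    """Alert whenever the request matches no known-normal pattern."""
    return not any(p.fullmatch(request) for p in NORMAL_PATTERNS)
```

A request that is neither a known attack nor a known-normal pattern passes the
misuse check silently but is flagged by the anomaly check.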

----| 4. Design Concept

How do anomaly detectors distinguish between normal and abnormal behaviors?
Several techniques have been developed and implemented over the years. However,
no available IDS (either commercial or open source) is strictly anomaly-based,
or even contains some anomaly detection capabilities. Most anomaly detectors
remain in research institutions, academia, and military and government
organizations.

In designing the concept for a strictly anomaly-based web IDS, I will rely on
the request and response mechanism of HTTP sessions. This means that anomaly
detection depends on the web server's activities: the request from the client,
and its expected output.

The anomaly detection will be implemented in two detection components,
Request Detection and Response Detection.

----| 5. Request Detection

In a client-server model such as HTTP, each request to the server must generate
a certain response. For example, the expected response to a GET / HTTP/1.0
request should be the index file in the document root of the web server. The
header of the response will be of the form

HTTP/1.1 200 OK
Date: Tue, 08 Jan 2002 08:24:52 GMT
Server: Apache
Last-Modified: Fri, 14 Sep 2001 11:13:19 GMT
ETag: "8699f-a17-3ba1e64f"
Accept-Ranges: bytes
Content-Length: 2583
Connection: close
Content-Type: text/html

The rest of the data consists of the HTML document, in this case the main page
of the web site.
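
Before a response like this can be compared against a rule base, it has to be
broken into its parts. The following is a minimal sketch of that step; the
function name is my own, and a real analyzer would also have to handle folded
headers and malformed input.

```python
def parse_response_header(raw: str):
    """Split a raw HTTP response header into (status_code, reason, headers)."""
    lines = raw.strip().splitlines()
    # The status line looks like "HTTP/1.1 200 OK".
    _version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers


# The response header shown above, used as sample input.
SAMPLE_HEADER = """HTTP/1.1 200 OK
Date: Tue, 08 Jan 2002 08:24:52 GMT
Server: Apache
Last-Modified: Fri, 14 Sep 2001 11:13:19 GMT
ETag: "8699f-a17-3ba1e64f"
Accept-Ranges: bytes
Content-Length: 2583
Connection: close
Content-Type: text/html
"""

status, reason, headers = parse_response_header(SAMPLE_HEADER)
```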

The rule base for anomaly detection is generated from the request and
response data of the HTTP session. However, generating every possible
combination of requests and responses would be impractical.

The best source for the rule base is the web server's log file. For Apache,
access_log contains the exact request issued by the client to the server.
Normal or legitimate requests can be extracted from the log file to create
a rule base for Request Detection.
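
As a rough sketch of that extraction step, the code below reads lines in
Apache's Common Log Format and collects, for each script and parameter, the
values seen in successful requests. The sample log lines and the choice to
trust only requests answered with status 200 are my assumptions; attacks can
also return 200, so the log would have to come from a period known to be clean.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

# Matches the quoted request line and the status code of a log entry.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) '
)


def build_request_rules(log_lines):
    """Map (path, parameter) -> set of values seen in successful requests."""
    rules = defaultdict(set)
    for line in log_lines:
        m = LOG_LINE.search(line)
        if not m or m.group("status") != "200":
            continue  # assumption: only 200 responses count as "normal"
        parts = urlsplit(m.group("target"))
        for name, value in parse_qsl(parts.query, keep_blank_values=True):
            rules[(parts.path, name)].add(value)
    return rules


# Invented sample entries (addresses are from the documentation range).
SAMPLE_LOG = [
    '192.0.2.10 - - [08/Jan/2002:08:24:52 +0000] "GET /article.php?sid=5555 HTTP/1.0" 200 2583',
    '192.0.2.11 - - [08/Jan/2002:08:25:10 +0000] "GET /article.php?sid=4654 HTTP/1.0" 200 3012',
    '192.0.2.12 - - [08/Jan/2002:08:26:01 +0000] "GET /article.php?sid=../../etc/passwd HTTP/1.0" 404 210',
]

rules = build_request_rules(SAMPLE_LOG)
```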

The IDS will then detect anomalous requests based on the rule base that was
created. How do we do this?

5.1 Request Rule Base

Example: The content of a website like http://www.hackinthebox.org is
dynamically generated. To read an article, a user sends an HTTP request like
this:

http://www.hackinthebox.org/article.php?sid=5555

The sid field can be any number. The rule base can be designed in such a way
that if the string following "sid=" is not a sequence of digits, the
request is considered anomalous.

For example:

article.php?sid='='' OR '''
article.php?sid=""

The same technique can be applied to query strings that contain more than
one parameter.

For example:

http://victim/users/login/index.php?url=/archive/download.php?cat=Windo…

An anomalous request can be detected if the variable "url" or "file" is
anything other than "/windows/cracking" or "/archive/download.php",
for example "/../../etc/passwd".
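
For parameters that take only a handful of legitimate values, the rule can be
an explicit set of allowed values per parameter. In the sketch below, pairing
"file" with "/windows/cracking" is my assumption, and the query strings used
to exercise it are hypothetical rather than a reconstruction of the truncated
URL above.

```python
from urllib.parse import parse_qsl

# Allowed values per parameter (the "file" pairing is assumed).
ALLOWED_VALUES = {
    "url": {"/archive/download.php"},
    "file": {"/windows/cracking"},
}


def anomalous_parameters(query: str):
    """Return the (name, value) pairs that fall outside the allowed sets."""
    return [
        (name, value)
        for name, value in parse_qsl(query, keep_blank_values=True)
        if value not in ALLOWED_VALUES.get(name, set())
    ]
```

An empty result means every parameter carried an allowed value; anything else
lists exactly which parameters made the request anomalous.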

5.2 Allowable Deviation in Request Rule Base

The example above led to another thought: detection can also be done
using an "allowable" deviation in the query strings. This deviation measurement
can be defined in the rule base as well.

In detecting intrusions, a measure of deviation is calculated based on the
rule base. If the deviation falls outside a certain threshold value, the
request is anomalous. This technique, however, requires statistical analysis.
The advantage of this technique is that deviations in web requests are not
common. This makes the measurement easy.
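
I have not settled on a statistical method, so the following is only one
possible measure, chosen for illustration: how far the length of a parameter
value lies from the lengths seen in normal requests, in standard deviations.
The threshold of 3.0 is an arbitrary assumption.

```python
from statistics import mean, pstdev


def length_deviation(value: str, normal_values) -> float:
    """Distance of len(value) from the normal mean, in standard deviations."""
    lengths = [len(v) for v in normal_values]
    mu = mean(lengths)
    sigma = pstdev(lengths) or 1.0  # avoid dividing by zero
    return abs(len(value) - mu) / sigma


def exceeds_allowable_deviation(value, normal_values, threshold=3.0) -> bool:
    """True if the value deviates from normal by more than the threshold."""
    return length_deviation(value, normal_values) > threshold
```

A value of ordinary length stays within the threshold, while a long injected
string such as the sid example in section 5.1 falls well outside it. Length
alone is a weak feature; character classes or value frequencies could be
measured in the same way.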

----| 6. Response Detection

Another information source for the anomaly detector is the HTTP response.
The HTTP response tells a lot about the "intent" of the request. Errors such
as "Forbidden", "File not found", or "Directory listing of ..." can be good
indicators of anomalous intent.

The rule base generation for Response Detection will be similar to that for
Request Detection.

There are a few advantages to this approach. First, it helps the webmaster
fix problem areas in the web server. Too many "Forbidden" messages may
indicate malicious intent. This will prompt the webmaster to redirect such
requests to a default "Not Found" page, thus limiting the information
available to the attacker.
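
Counting such error responses per client is straightforward. The sketch below
assumes the responses are available as (client address, status code) pairs;
the set of suspicious status codes and the threshold are placeholders.

```python
from collections import Counter

# Status codes treated as signs of probing (assumed: Forbidden, Not Found).
SUSPICIOUS_STATUSES = {403, 404}


def clients_over_threshold(events, threshold=5):
    """Return clients that received more than `threshold` suspicious responses.

    events: iterable of (client_address, status_code) pairs.
    """
    errors = Counter(
        client for client, status in events if status in SUSPICIOUS_STATUSES
    )
    return {client for client, count in errors.items() if count > threshold}
```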

Second, it helps the webmaster monitor what data is actually sent by the
server to the requesting party. Here, the approach of "allowable" response
data is applicable. For example, the format of the page that is sent may
include banner ads, a navigation bar, and some body of text. If the banner
or navigation bar is missing from the response data, then the original request
is anomalous. The "allowable response" can also cater to changes in content,
such as the dynamic content generated by websites like hackinthebox.org.
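
One simple way to express an "allowable response" is a list of fragments that
every legitimate page must contain. The marker strings below are hypothetical;
a real site would use whatever its own page template always emits.

```python
# Hypothetical fragments present in every legitimate page of the site.
REQUIRED_MARKERS = ('<div id="banner">', '<div id="nav">')


def missing_markers(body: str, required=REQUIRED_MARKERS):
    """Return the required fragments that are absent from the response body."""
    return [marker for marker in required if marker not in body]


def response_is_anomalous(body: str, required=REQUIRED_MARKERS) -> bool:
    """A response is anomalous if any required fragment is missing."""
    return bool(missing_markers(body, required))
```

Because only the fixed parts of the template are checked, the article text
itself can change freely without triggering an alert.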

----| 7. Advantages of Strictly Anomaly-Based Web Intrusion Detection System

7.1 Very low false positives

One of the prominent problems that arise in IDS implementation, especially with
misuse detection, is a high rate of false positives. Misuse detection relies on
strict pattern matching. Snort, for example, uses well-known anomalous packet
fingerprints (signatures) to detect known attack patterns. This means that any
packet that matches a signature will trigger an alert.

In the concept that I've explained, where rules are inverted and "allowable"
deviation is introduced, false positives can be suppressed to a very low
level. The explanation is simple: in misuse (signature-based) detection,
the detector relies on a set of events that are not permitted, i.e. those that
represent attacks. Anomaly-based detection involves defining the set of events
that are permitted. Because of this, the number of attacks in the misuse set
can never be greater than the number in the "not use" set (everything outside
the permitted set). This means that all attacks reside in the "not use" set,
even those that are unknown.

7.2 Efficient and Accurate

If one were to use misuse detection, one would need to update the signature
database constantly. Furthermore, the signature database needs to include
all possible attacks. This makes the analyzer slower, since it has to
compare the input data (in our case, HTTP query strings) to each signature
until it finds a match. With anomaly-based detection, the rule set for normal
behavior can be kept small. Statistical analysis allows the IDS to use a
smaller rule set for detection.

7.3 Separation from other functions of IDS

IDS products nowadays tend to do everything at once: detecting port scans,
buffer overflows, backdoors, etc. Separating out the duty of detecting web
intrusions becomes essential as more and more attacks are directed against
web servers. This makes management, analysis and response to intrusion data
easier, taking a lot of the burden off the security analyst. It
also removes the complexity of monitoring intrusions in a heterogeneous
environment.

7.4 Easy maintenance of rule base

In anomaly detection, the rule base can be generated from log files with
minimal analysis. This eliminates the need for a custom-made rule base
describing normal behavior, since the data is already supplied by the web
server itself.

----| 8. Limitations of Strictly Anomaly-Based Web Intrusion Detection System

One of the inherent problems that I found in this concept is that the web
server is still susceptible to denial of service attacks. A simple threaded
process such as

lynx -traversal -crawl http://www.pasarborong.com

(pasarborong.com, I really mean no harm here :)

when run from several client machines for a long period of time, will slow down
the web server. Furthermore, since the requests fall inside the "normal"
rule base, this kind of attack will be difficult to detect.

However, the system is not susceptible to the "blinding the operator" attack
discussed in my previous article [2], since the attacks can be easily detected
(they fall outside the request rule base) and verified by the Response
Detection mechanism of the system.

----| 9. Summary

This paper is a rough design concept for strictly anomaly-based web intrusion
detection. In this paper, I outlined two techniques, Request Detection and
Response Detection. The rule base for Request Detection is generated from
web server log files, such as access_log and error_log for the Apache web
server. This rule base contains normal requests. Any request that does not
match the rules in the rule base is anomalous. A related technique is the
"allowable" query string, in which deviations from the rule base are allowed,
based on statistical analysis (I have yet to define a method for the
statistical analysis).

The second technique is Response Detection, which is similar to Request
Detection.

----| 10. Future Work

I welcome further discussion and collaboration if anyone is interested in
pursuing this project. There is a lot more thought and research that needs to
be done. For example, I haven't made it explicit where the data source for
Response Detection might come from. This could be an analyzer designed to
capture HTTP response packets, parse and analyze them, and generate
the Response Detection rule base. The technique for "allowable" requests and
responses has also not been researched yet. Finally, techniques to detect and
avoid denial of service attacks on the web server need to be researched as
well.

----| 11. References

[1] Rebecca Bace, "Intrusion Detection", Macmillan Technical Publishing, 2000

[2] spoonfork, "Exploiting Weaknesses in Intrusion Detection System
Implementation", http://www.hackinthebox.org/article.php?sid=4654

[3] RFC2616, Hypertext Transfer Protocol, ftp://ftp.isi.edu/in-notes/rfc2616.txt

1.) Design Concept for a Strictly Anomally Based Web Intrusion Detection System - spoonfork