Tracking
and Tracing Spoofed IP Packets to Their Sources
1. INTRODUCTION
Although
access control technologies such as firewalls, are commonly used to prevent
network attacks, they
cannot
prevent some specific attacks, including TCP SYN flooding. Consequently, more
companies are
deploying
intrusion detection systems (IDS). The IDSs detect network attacks; however,
they don't let us
identify
the attack source. This is especially problematic with Denial of Service (DoS)
attacks, for example,
because
the attacker doesn't need to receive packets from the target host and thus can
remain hidden.
Several
efforts are in progress in many different research and business places around
the world to develop
source-identification
technologies to trace packets even when an attacker fakes its IP address.
The
purpose of IP traceback is to identify the true IP address of a host
originating attack packets. Normally,
we
can do this by checking the source IP address field of an IP packet. Because of
a sender can easily fake
this
information, however, it can hide its identity. If we can identify the true IP
address of the attack host,
we
can also get information about the organization, such as its name, and the network's
administrator email
address,
from which the attack originated. Existing IP traceback methods can be
categorized as proactive or
reactive
tracing. The proactive tracing detects attacks when packets are in transit
while the reactive tracing
starts
after an attack is detected.
Existing
IP traceback methods can be categorized as proactive or reactive tracing. The
proactive tracing
prepares
information for tracing when packets are in transit. If packets tracing is
required, the attack victim
(target)
can refer to this information to identify the attack source. Two proactive
methods – packet marking
[1]
and messaging [2] – have been studied and reviewed. In packet matching [1],
packets store information
about
each router they pass as they travel through the network. The recipient of the
marked packet can use
this
router information to follow the packet's path to its source. Routers must be
able to mark packets,
however,
without disturbing normal packet processing. In messaging approaches [2],
routers create and
send
messages containing information about the forwarding nodes a packet travels
through. The approach
relies
on the Internet control message protocol (ICMP).
The
reactive tracing starts tracing after an attack is detected. Most of the
methods trace the attack path from
the
target to its source (origin). The challenges are to develop effective
traceback algorithms and packetmatching
techniques.
Various proposals attempt to solve these problems. Among those studied
techniques
are
hop-by-hop tracing, hop -by-hop tracing with an overlay network [3], IPsec
authentication [4], and
traffic
pattern matching [5]. In hop-by-hop tracing, a tracing tool logs into the
router closest to the attached
host
and monitoring the incoming packets. If the tool detects the spoofed packet, it
logs into upstream
routers
and monitors packets. If the spoofed flooding attack is still occurring, the
tool can detect the spoofedpacket again on one of the upstream routers. This
procedure is repeated recursively on the upstream routers
until
the tool reaches the attack's actual source IP address.
In
hop-by-hop tracing, the more hops there are, the more tracing processes will
likely be required. To
decrease
the number of hops required for tracing, hop -by-hop tracing with an overlay
network is being used
[3].
With the IPsec authentication [4], when the IDS detects an attack, the Internet
key exchange (IKE)
protocol
establishes IPsec security associations (SAS) between the target host and some
routers in the
administrative
domain. The last technique being surveyed is the traffic pattern matching in
which the trace
is
done by comparing traffic patterns observed at the entry and exit points of the
network with the Internet
map
[5]. A survey has been done to investigate the DDoS vulnerabilities and IP
spoofing as mentioned in
[6,
7, 8, 9, 10].
In
this paper, we will develop our own approach to trace suspected packets to
their sources. In our
approach,
routers log data about traversing packets as well as information about ot her
nodes in the packet's
path.
A distributed management approach will be developed to enable tracing across
networks with
different
access polices. Our approach is a reactive and it relies on hop-by-hop tracing.
In our reactive
approach,
forwarding nodes such as routers log information about traversing packets on
the Internet and
then
use the log data to trace each packet from its final destination to its source,
hop-by-hop. Information
about
the packets remains in forwarding nodes as packets traverse allowing us to
trace even a single attack packet
to its source.
2. METHODS OF IP TRACEBACK
The
purpose of IP traceback is to identify the true IP address of a host
originating attack packets. Normally,
we
can do this by checking the source IP address field of an IP packet. Because a
sender can easily forge
this
information, however, it can hide its identity. If we can identify the true IP
address of the attack host,
we
can also get information about the organization, such as its name and the
network administrat or's e-mail
address,
from which the attack originated. With IP traceback technology, which traces an
IP packet's path
through
the network, we can find the true IP address of the host originating the
packet. To implement IP
traceback
in a system, a network administrator updates the firmware on the existing
routers to the traceback
support
version, or deploys special tracing equipment at some point in the network.
Existing
IP traceback methods can be categorized as proactive or reactive tracing.
2.1 Hop-by-Hop IP Traceback
The
most common and basic method in use today for tracking and tracing attacks is
hop -by-hop traceback.
This
method is only suitable for tracing large, continuous packet flows that are
currently in progress, such
as
those generated by ongoing denial-of-service (DoS) packet flood attacks. In a
DoS flood attack, the
source
IP addresses are typically spoofed (i.e., they are forged addresses inserted
into the source address
field
of a packet to disguise the true IP address of the machine that originated the
packets), so tracing is
required
to find the true origin of the attack.
For
example, assume that the victim of a flood attack has just reported the attack
to their ISP. First, an ISP
administrator
identifies the ISP’s router that is closest to the victim’s machine. Using the
diagnostic,
debugging,
or logging features available on many routers, the administrator can
characterize the nature of
the
traffic and determine the input (ingress) link on which the attack is arriving.
The administrator then
moves
on to the upstream router (i.e., the router one previous hop away that is
carrying attack packets
toward
the victim). The administrator repeats the diagnostic procedure on this
upstream router, and
continues
to trace backwards, hop -by-hop, until the source of the attack is found inside
the ISP’s
administrative
domain of control (such as the IP address of a customer of the ISP) or, more
likely, until the
entry
point of the attack into the ISP’s network is identified. The entry point is
typically an input link on a
router
that borders another provider’s network. Once the entry point into the ISP’s
network is identified, the
bordering
provider carrying the attack traffic must be notified and asked to continue the
hop-by-hop
traceback.
Often there is little or no economic incentive for such cooperation.
2.2 Ingress Filtering
Much
of the attacks on the Internet by attackers is accomplished using attack
packets with spoofed source
addresses.
The occurrence of packets with spoofed source addresses, and their ability to
transit the Internet, can be greatly limited through cooperative efforts by
ISPs, using a basic packet filtering approach called
network
ingress filtering.
For
example, assume that an ISP provides Internet connectivity to a customer
network and assigns the
customer
a fixed set of IP addresses. Assume that the connectivity is provided via the
ISP’s router R. To
limit
IP source address spoofing, the ISP places an ingress (input) filter on the
input link of router R, which
carries
packets from the customer network into the ISP’s network and onto the Internet.
The ingress filter is
set
to forward along all packets with source addresses that belong to the known set
of IP addresses assigned
to
the customer network by the ISP, but the filter discards (and optionally logs
as suspicious) all packets
that
contain source IP addresses that do not match the valid range of the customer’s
known IP addresses.
Hence,
packets with source addresses that could not have legitimately originated from
within the customer
network
will be dropped at the entry point to the ISP’s network.
The
widespread use of ingress filtering by all service providers would greatly
limit the ability of an attacker
to
generate attack packets utilizing a broad range of spoofed source addresses,
making tracking, and tracing
the
attacker a much easier task. Any attacker located within the customer network,
in our example above,
would
either have to generate packets that carry the attacker’s legitimate source
address or (at worst) spoof
a
source address that lies within the set of IP addresses assigned to the
customer network. So, even in the
worst
case, an attack originating within the customer network in our example can be
traced to some
machine
in that customer network, simply by reading the source address on the attack
packet. With the help
of
the administrator of the customer network, the search for the attacker can then
proceed in a greatly
narrowed
search space.
3. SPOOFED PACKETS DETECTION METHODS
Detection
methods can be classified as those requiring router support, active host-based
methods, passive
host-based
methods, and administrative methods. Administrative methods are the most
commonly used
methods
today. When an attack is observed, security personnel at the attacked site
contact the security
personnel
at the supposed attack site and ask for corroboration. This is extremely
inefficient and generally
fruitless.
An automated method of determining the whether packets are likely to have been
spoofed is
clearly
needed. This section describes a number of such methods.
3.1 Routing methods
Because
routers (or IP level switches) can know which IP addresses originate with which
network interface,
it
is possible for them to identify packets that should not have been received by
a particular interface. For
example,
a border router or gateway will know whether addresses are internal to the
network or external. If
the
router receives IP packets with external IP addresses on an internal interface,
or it receives IP packets
with
an internal IP address on an ext ernal interface, the packet source is most
likely spoofed. In the wake of
recent
denial-of-service attacks involving spoofed attack packets, ISPs and other
network operators have
been
urged to filter packets using the above-described method. Filtering inbound
packets, known as ingress
filtering,
protects the organization from outside attacks. Similarly, filtering outbound
packets prevents
internal
computers from being involved in spoofing attacks. Such filtering is known as
egress filtering. It is
interesting
to note that if all routers were configured to use ingress and/or egress
filtering, attacks would be
limited
to those staged within an organization or require an attacker to subvert a
router. Internal routers with
a
strong notion of inside/outside can also detect spoofed packets. However,
certain network topologies may
contain
redundant routes making this distinction unclear. In these cases, host based
methods (discussed in
section
4.2) can be used at the router. A number of IP addresses are reserved by the
IANA for special
purposes.
These are listed in table 1. The addresses in the first group are private
addresses and should not be
routed
beyond a local network. Seeing these on an outside interface may indicate
spoofed packets.
Depending
on the particular site, seeing these on an internal address would also be
suspicious. The other
addresses
in table 1 are special purpose, local only addresses and should never be seen
on an outer interface.
Many
firewalls look for the packets described in this section. Typically they are
dropped when received.
Because
firewalls have been a popular security product, research into routing methods
has been active.
Most
all research has been in this area. Routers can also take a more active role in
detecting spoofed
packets.
A number of advanced router projects have dealt with this and spoofed packet
traceback.
These
are discussed in section 6. We have proposed a number of proactive methods that
can be used to
detect
and prevent spoofed packets.
One
limitation of routing met hods is that they are effective only when packets
pass through them. An
attacker
on the same subnet as the target could still spoof packets. When the attacker
is on the sameEthernet subnet as the target, both the source IP address and the
Ethernet MAC would be spoofed. If the
spoofed
source address was an external address, the MAC would be that of the router.
This implies that
other
techniques are required.
3.2 Non-routing methods
Computers
receiving a packet can determine if the packet is spoofed by a number of active
and passive
ways.
We use the term active to mean the host must perform some network action to
verify that the packet
was
sent from the claimed source. Passive methods require no such action, however
an active method may
be
used to validate cases where the passive method indicates the packet was
spoofed.
3.3 Active Methods
Active
methods either make queries to determine the true source of the packet
(reactive), or affect protocol
specific
commands for the sender to act upon (proactive). These methods have an
advantage over routing
methods
in that they do not require cooperation between ISPs and can be effective even
when the attacker is
on
the same subnet as the target. Active methods require a response from the
claimed source. Only if the
spoofed
host is active (i.e. connected to the network and receiving and processing
packets) can it be probed.
A
host that is heavy firewalled and cannot respond to probes is effectively
inactive. Because inactive hosts
are
commonly used as source addresses in spoofed packets, if these packets are seen
in an attack, it is likely
they
are spoofed. When hosts will not respond to any probes, passive methods will be
required for
corroboration.
TTL methods
As
IP packets are routed across the Internet, the time-to-live (TTL) field is
decremented. This field in the IP
packet
header is used to prevent packets from being routed endlessly when the destination
host can not be
located
in a fixed number of hops. It is also used by some networked devices to prevent
packets from being
sent
beyond a host’s network subnet. The TTL is a useful value for detecting spoofed
packets. Its use is
based
on several assumptions, which, from our network observations, appear to be
true.?
IP Identification Number
As
discussed in the section on Bounce Scanning, the sending host increments the
Identification Number
(ID)
in the IP header with each packet sent. Because this is a value that is easily
probed and changes in its
value
are predictable, we can use it to determine if a packet is spoofed. Unlike TTL
values, IP ID numbers
can
be used to detect spoofed packets even when the attacker and the target are on
the same subnet.
If
we send probe packets to the claimed source and we receive a reply, the ID
values should be near the
value
of questionable packets recently received from the host. Also, the ID values
observed in the probe
should
be greater than the ID values in the questionable packets. If not the packets
were likely not sent by
the
claimed source. If the host associated with the claimed source is very active,
the ID values may change
rapidly.
To be effective, the probes must be done very close in time to receipt of the questionable
packets.
OS Fingerprinting
The
above techniques illustrate aspects of the more general task of OS
fingerprinting where a series of
various
probes are used to identify the operating system of a particular host. Active
fingerprinting refers to
direct
probing of a computer, while passive fingerprinting refers to monitoring
traffic and comparing it to
expected
norms for different OSs. We can perform a limited passive fingerprint as we
observe network
traffic
from a particular host, then by comparing this to an active OS fingerprint, we
can determine if the
two
are likely to be the same OS. If not we can infer the packets are spoofed.
TCP Specific Methods
Flow Control
The
TCP header includes a window size field. This is used to communicate the maximum
amount of data
the
recipient can currently receive. This can also be interpreted as the maximum
amount of data the sender
can
transmit without an acknowledgement from the recipient. This is the TCP flow
control method. If the
window
size is set to zero, the sender should not send more data. If the packets we
are receiving are
spoofed,
then the sender will never see the recipient’s ACK-packets. This implies that
the sender will not
respond
to flow control. If the recipient does not send any ACK-packets, the sender
should stop after the
initial
window size is exhausted. If it does not, it is likely the packets are spoofed.
One way of
implementing
this check is to always send an initial window size that is extremely small. If
packets received
exceed
this threshold, we can infer the packets are spoofed. Because spoofing replies
with the correct
sequence
number to multiple TCP packets may be challenging, most spoofed TCP connections
do not
progress
past the first ACK-packet. This implies that the best chance to detect spoofed
packets requires it be done in the handshake. Fortunately the TCP handshake
requires the host sending the initial SYN wait for
the
returned SYN -ACK prior to sending its first ACK packet. By setting the window
size in the SYN-ACK
to
zero, we can we can determine if the sender is receiving (and responding to)
our packets. If the sender
sends
an ACK-packet with any data, we know the true source is not responding to our
packets, and were
likely
a spoofed packet.
Packet Retransmission
TCP
uses sequence numbers to determine which packets have been acknowledged. An
ACK-packet
communicates
to the recipient that all packets it has sent, up to and including the packet
with the sequence
number
in the packet have been successfully received. When a packet is received with
an ACK-number that
is
less than the minimum expected, or greater than the max expected, the packet is
dropped and as a way to
resynchronize
the connection, a reply with the minimum expected ACK-number is sent. We can
exploit
these
replies to probe for spoofed packets. By sending a probe packet, spoofed to be
from the internal host,
with
an ACK number greater than the minimum expected, we can induce a
resynchronization ACK from
the
host being probed. If the probe receives a RST in reply, we can infer the
connection was spoofed. A
concern
with this method is that it may lead to an ACK-storm as both sides attempt to
resynchronize. This
method
is best performed on a firewall where the probe reply could be captured. This
will prevent the
internal
host from seeing the reply, and will prevent an ACK-storm.
Traceroute
Traceroute
is a widely used network tool to discover the route from the site traceroute is
executed on to
another.
When used to detect spoofed packets, it may tell you the number of hops to the
true source.
Unfortunately
it is very slow and generally fails when the site being checked is behind a
firewall. If the
firewall
blocks the probing UDP packets (or the ICMP replies), the traceroute program
will know only the
number
of hops to the firewall. However, when the firewall is more hops away from the
monitored site than
the
true site, traceroute will return a hop count greater than expected of the
questionable packet. In this case,
traceroute
can be useful as a detector. Because of its performance, traceroute is a poor
general technique for
spoofed
packet detection. However, in cases where the attacker is nearer the target
than the true source
site’s
firewalls, and the firewall will not allow probes to succeed, traceroute or
similar techniques should be
considered.
The
issues with traceroute introduce a different method of spoofed packet detection
base only on previously
observed
packets. Because the TTL and ID fields are set by the true source, we can learn
the expected
values
for a particular host. Such passive methods are discussed in the next section.
3.4 Passive Methods
Passive
methods are a logical extension of the reactive methods discussed earlier.
Where observed data will
have
a predictable value, not relative to some prior packet, we can learn what
values are to be expected and
consider
packets with unexpected values suspicious. Because TTL values are a function of
a host’s OS, the
packet’s
protocol, and the network topology, all which are reasonably static, TTLs can
be used as a basis
for
passive detection. Conversely, IP ID numbers, which generally have a strong
relation to prior packets,
do
not make good candidates for the basis of a passive system. The next section
describes several different
passive
methods and how they could be used to detect spoofed packets.
Passive TTL Methods
By
recording, over a period of time, the TTL values of distinct source IP
address/protocols we can learn
which
values are expected from particular hosts. We believe that these are reliable,
predictable values of a
given
IP address/protocol. (See section 7 for experimental validation of this.) This
will give us a reasonable
basis
for identifying suspicious packets from previously observed hosts. Our
implementation of this
compares
observed packets to the expected TTL values for that packet. If the values were
anomalous, the
packet
would be flagged as suspicious. In many cases , we will receive packets from
hosts not previously
encountered.
These will have no entry in the table. Without further information we will not
be able to know
if
the packet’s TTL values are suspicious. How to flag such packets should be left
up to the particular
application.
However,
by taking advantage of the fact that similar IP addresses are commonly the same
number of hops
away
from a monitoring point, we can expand the above method to predict values for
previously unseen
packets.
In addition to learning IP address/protocol to TTL relations we can also learn
IP subnet to TTL
relations.
The predictability based on subnets is not expected to be as high as specific
IP address/protocols,
but
will provide additional information. Rather than use passive methods alone, by
using them in combination with reactive methods we can construct an efficient
spoofed packet detection system. The
reactive
method can be initiated only when the packet seems suspicious. This minimizes
the amount of
probing
required, and allows us to test packets using a number of methods. The
specifics or our
implementation
are described in sections 5 and 7. One of the strengths of passive TTL methods
is that they
are
resistant to network routing attacks. These occur when packets intended for a
particular host are routed
to
another host posing as the first. Such an attack is not strictly packet
spoofing because the packets are
coming
from the effective IP address of the sender. However, if the network distance
between the two hosts
has
changed, we will identify these packets as spoofed. This allows passive spoofed
packet detection to also
act
as a routing change detector.
OS Idiosyncrasies
We
have identified a number of other features that can be used to find suspicious
(possibly spoofed)
packets.
These include the expected source port for a TCP or UDP communication, expected
ID values for
certain
packets, and type of service (ToS) or differential service code point (DSCP)
values. The TCP
window
size has also been observed to be highly predictable given the source. Other
useful features are
likely.
Basically, any that is specific to a particular host, OS, NIC, etc. is a
potential identifier for that host.
How
useful a particular feature is depends on how predictable a particular feature
is and how likely another
computer
will generate the same value as the claimed source. Features with values common
to many
computers
will tend to generate false negatives while those that vary significantly will
tend to generate false
positives.
4. THE PROPOSED APPROACH
Denial-of-service
(DOS) attacks are a pressing problem in today’s Internet. Their impact is often
more
serious
than network congestion due to their targeted and concentrated nature. In a
distributed DOS
(DDOS)
attack, the attacker uses a number of compromised slaves to increase the
transmission power and
orchestrate
a coordinated flooding attack. Particularly, DDOS attacks with hundreds or
thousands of
compromised
hosts, often residing on different networks, may lead to the target system
overload and crash.
Because
the current Internet routing infrastructure has few capabilities to defend
against IP spoofing and
DDoS
attacks, we need to design a new defense mechanism against these attacks. In
particular, our
proposed
approach is to defend against these attacks and should satisfy the following
properties:
Fast response: The proposed approach should be able to
rapidly respond and defend against
attacks.
Every second of Internet service disruption causes economic damage. We would
like to
immediately
block the attack.
Scalable: Some attacks, such as TCP hijacking, involve only
a small amount of packets. However,
many
DDoS attacks are large scale and involve thousands of distributed attackers and
an even
larger
number of attack packets. A good defense mechanism must be effective against
low packet
count
attacks but scalable to handle much larger ones.
Victim filtering: Almost all DDoS defense schemes assume
that once the attack path is revealed,
upstream
routers will install filters in the network to drop attack traffic. This is a
weak assumption
because
such a procedure may be slow, since the upstream ISPs have no motivation to
offer this
service
to non-customer hosts and networks.
Efficient: The proposed approach should have very low
processing and state overhead for both the
routers
in the Internet and, to a lesser degree, the victims of the attacks.
Support incremental deployment: The proposed approach is
only useful and practical if it provides
a
benefit when only a subset of routers implement it. As an increasing number of
routers deploy
the
scheme, there should be a corresponding increase in performance.
Also,
the deployment of the solution should not leak proprietary information about an
ISP’s internal
network,
as some ISPs keep their network topology secret to retain a competitive
advantage.
Basic method of the traceback approach. Forwarding nodes, or
tracers, store data from an incoming packet as well as its
datalink-level
identifier in the packet information area, and they identify the adjacent
forwarding node.
No comments:
Post a Comment