BACKGROUND
[0001] A scripted language is a high-level programming language that is interpreted at runtime
rather than compiled. For example, scripting languages may be embedded within hypertext
markup language ("HTML") and, when interpreted by a client and/or server, may add
functionality to a web page. Scripted languages may also contain malicious code which,
when interpreted, may compromise the security and/or integrity of the client and/or
server.
US 2005/0108554 A1 relates to a method and a system for adaptive rule-based content scanners.
SUMMARY
[0002] Particular aspects and embodiments are set out in the claims.
[0003] According to some possible implementations, a device may receive data, identify a
context associated with the data, and identify a script, within the data, associated
with the context. The device may parse the script to identify tokens, form nodes based
on the tokens, and assemble a syntax tree using the nodes. The device may rename one
or more identifiers associated with the nodes and generate a normalized text, associated
with the script, based on the syntax tree after renaming the one or more identifiers.
The device may determine whether the normalized text matches a regular expression
signature and process the data based on determining whether the normalized text matches
the regular expression signature. The device may process the data by a first process
when the normalized text matches the regular expression signature or by a second process
when the normalized text does not match the regular expression signature. The first
process is different from the second process.
[0004] According to some possible implementations, a method may include receiving, by a
device, data from an end-point device, extracting, by the device, scripted code from
the data, and generating, by the device, a normalized script, having whitespace and
identifiers, corresponding to the scripted code. The method may include formatting,
by the device, the whitespace of the normalized script according to a normalizing
scheme and renaming, by the device, the identifiers of the normalized script according
to the normalizing scheme. The method may include determining, by the device, whether
the normalized script matches a signature, and selectively sending, by the device,
the data to another end-point device, associated with the data, based on determining,
by the device, whether the normalized script matches the signature. The method may
include sending, by the device, the data to the other end-point device based on determining
that the normalized script does not match the signature or not sending, by the device,
the data to the other end-point device based on determining that the normalized script
matches the signature.
[0005] According to some possible implementations, a computer-readable medium storing instructions
may include multiple instructions that, when executed by one or more processors associated
with a device, cause the one or more processors to receive data, identify executable
code contained in the data, execute the executable code within an environment, and
generate a trace based on the execution of the executable code. The multiple instructions
may further cause the one or more processors to generate, based on the trace, a list
identifying one or more functions executed by the executable code and an order, execution
count, argument, or return value associated with executing a function of the one or
more functions. The multiple instructions may further cause the one or more processors
to generate, based on the trace, an optimized code by replacing a portion of the executable
code with an alternative portion, the portion and the alternative portion being equivalent
with respect to interaction with the environment during execution. The multiple instructions
may further cause the one or more processors to modify whitespace contained in the
optimized code based on a whitespace normalizing scheme, modify identifiers contained
in the optimized code based on an identifier normalizing scheme, and provide the optimized
code, having the whitespace and the identifiers modified according to respective whitespace
and identifier normalizing schemes, for matching against a code signature to determine
whether the executable code is malicious. The code signature may be associated with
determining whether the executable code is malicious. The multiple instructions may
further cause the one or more processors to provide the list for matching against
a list signature to determine whether the executable code is malicious. The list signature
may be associated with determining whether the executable code is malicious.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006]
Fig. 1 is a diagram of an overview of an example implementation described herein;
Fig. 2 is a diagram of an example environment in which systems and/or methods, described
herein, may be implemented;
Fig. 3 is a diagram of example components of one or more devices of Fig. 2;
Fig. 4 is a flow chart of an example process for de-obfuscating scripted language
for network intrusion detection using a regular expression signature; and
Figs. 5A-5H are diagrams of an example implementation relating to the example process
shown in Fig. 4.
DETAILED DESCRIPTION
[0007] The following detailed description of example implementations refers to the accompanying
drawings. The same reference numbers in different drawings may identify the same or
similar elements.
[0008] Numerous computing environments employ scripted languages, such as JavaScript, ECMAScript,
VBScript, Python, Perl, PHP, etc., to provide functionality. In particular, scripted
languages form a key component of dynamic hypertext markup language ("DHTML") and
asynchronous JavaScript + XML ("AJAX"), which enable a user to enjoy an interactive
and dynamic web browsing experience. Developers of mobile applications (e.g., for
smart phones, tablets, and the like) may include scripts in their products. Unfortunately,
the popularity of executable script has prompted certain individuals and actors to
create and distribute malicious scripts whose purpose is to disrupt, hijack, and/or
profit from a device which executes the script. These scripts may exploit a vulnerability
in a device's software and/or hardware to automatically execute during a seemingly
innocuous activity (e.g., by infecting a server trusted by the device and/or trusted
by a user of the device), and/or the author(s) of the malicious script may trick the
user into executing the malicious script, e.g., through social engineering. A malicious
script may include, for example, a virus, malware, spyware, ransomware, code for information
theft (e.g., theft of sensitive session data associated with a connection between two devices),
or the like.
[0009] To further complicate matters, authors of malicious scripts ("script writers") commonly
employ one or more obfuscation techniques to prevent detection by malware/antivirus
scanners. For example, script writers may randomize whitespace and/or identifiers
(e.g., names of private functions and/or variables), add bloat (e.g., a portion of
the script which does not affect meaningful script behavior, such as a comment or
a logic switch based on a known value, e.g., x=1; if (x==1) then {..}), use alternative
means to represent constant values (e.g., replacing a number with a mathematical operation),
use one function as a substitute for another (e.g., using a local function as an alias
for a suspicious remote function or object method), split a string and concatenate
the string at runtime, and/or encode a portion of the script and decode the portion
at runtime. More than one obfuscation technique may be used in a particular script,
and some techniques may be layered (e.g., encrypted content may be encrypted again
using a different key). Additionally, script writers may use tools to randomize the
use of obfuscation techniques so that a piece of malicious script may be presented
in numerous different obfuscated forms.
[0010] Implementations described herein may assist a user in mitigating the threat posed
by a malicious script by providing a method and device for intercepting script-containing
data sent over a network to an end-point device associated with the user, normalizing
and de-obfuscating the script contained within the data, comparing the normalized/de-obfuscated
script to a regular expression signature associated with a malicious script, and preventing
the script from being delivered to the end-point device if the script matches the
signature. In this way, the user's end-point device, as well as other end-point devices
associated with the network, may avoid infection by scripts already identified as
malicious. Additionally, the method and device may assist a user in analyzing a script
in order to determine whether the script is malicious and, if so, determine how to
create a regular expression signature to direct the device to detect and detain the
script.
[0011] Fig. 1 is a diagram of an overview of an example implementation 100 described herein.
Assume, for example, that a first end-point sends data (e.g., using hypertext transfer
protocol (HTTP)) to a second end-point. Prior to reaching the second end-point, the
data passes through a network device (e.g., an intrusion detection and prevention
system, a firewall, a proxy server, or the like). The network device identifies and
extracts a script from within the data. The network device further normalizes, de-obfuscates,
and traces the script in an emulated environment to generate a normalized / de-obfuscated
script and a list of the script's calls to system/global/application programming interface
(API) methods (e.g., functions which are not defined by the script but are provided
by the environment to allow the script to interact with the environment). The network
device then matches the normalized/de-obfuscated script to a signature (e.g., a regular
expression) associated with a script and/or matches a list of the script's method
calls to a signature associated with a list of method calls. If there is no match,
then the network device may assume that the data is benign and route the data to the
second end-point. If there is a match, the network device may identify the script as a malicious
script, may refuse to route the data to the second end-point, and/or may take some
action in response to the detection of the malicious script.
[0012] In this way, the network device may protect the second end-point from a script which
has previously been identified as malicious. Furthermore, the network device may prevent
the further spread of the malicious script by refusing to route data which contains
the malicious script and possibly alerting a user to the detection (e.g., through
a log or email). By tracing the script and generating a list of the script's method
calls, the network device may simplify the task of generating signatures for new threats.
[0013] Fig. 2 is a diagram of an example environment 200 in which systems and/or methods,
described herein, may be implemented. As shown in Fig. 2, environment 200 may include
a group of sending end-point devices 210-1, ..., 210-M (M ≥ 1) (hereinafter referred
to collectively as "sending end-point devices 210" and individually as "sending end-point
device 210"); a group of receiving end-point devices 220-1, ..., 220-N (N ≥ 1) (hereinafter
referred to collectively as "receiving end-point devices 220" and individually as
"receiving end-point device 220"); a network device 230; a sending network 240; and/or
a receiving network 250. Devices of environment 200 may interconnect via wired connections,
wireless connections, or a combination of wired and wireless connections.
[0014] Sending end-point device 210 and/or receiving end-point device 220 may include one
or more devices capable of receiving and/or providing information over a network (e.g.,
sending network 240 and/or receiving network 250), and/or capable of generating, storing,
and/or processing information received and/or provided over the network. For example,
sending end-point device 210 and/or receiving end-point device 220 may include a computing
device, such as a laptop computer, a tablet computer, a handheld computer, a desktop
computer, a mobile phone (e.g., a smart phone, a radiotelephone, etc.), a network
device (e.g., a router, a gateway, a firewall, a hub, a bridge, etc.), a server device,
or a similar device. Sending end-point device 210 and/or receiving end-point device
220 may act as an end-point (e.g., a source and/or a destination) for a communication
with another sending end-point device 210 and/or receiving end-point device 220. For
example, a first sending end-point device 210 and/or receiving end-point device 220
may provide information to a second sending end-point device 210 and/or receiving
end-point device 220 (e.g., via network device 230, sending network 240, and/or receiving
network 250). For example, sending end-point device 210 may send, to receiving end-point
device 220, data which contains one or more scripts.
[0015] Network device 230 may include one or more devices (e.g., one or more traffic transfer
devices) capable of processing and/or transferring data between sending end-point
devices 210 and/or receiving end-point devices 220. For example, network device 230
may include a firewall, a router, a gateway, a switch, a hub, a bridge, a reverse
proxy, a server (e.g., a proxy server), a security device, an intrusion detection
device, a load balancer, or a similar device. For example, network device 230 may
selectively route data between sending end-point devices 210 and receiving end-point
devices 220 based on determining whether script within the data matches a signature.
[0016] Sending network 240 and/or receiving network 250 may include one or more wired and/or
wireless networks. For example, sending network 240 and/or receiving network 250 may
include a cellular network (e.g., a long-term evolution (LTE) network, a 3G network,
a code division multiple access (CDMA) network, etc.), a public land mobile network
(PLMN), a local area network (LAN), a wide area network (WAN), a metropolitan area
network (MAN), a telephone network (e.g., the Public Switched Telephone Network (PSTN)),
a private network, an ad hoc network, an intranet, the Internet, a fiber optic-based
network, a cloud computing network, and/or a combination of these or another type
of network. While shown as separate networks, sending network 240 and receiving network
250 may be implemented as a single network.
[0017] The number and arrangement of devices and networks shown in Fig. 2 are provided as
an example. In practice, there may be additional devices and/or networks, fewer devices
and/or networks, different devices and/or networks, or differently arranged devices
and/or networks than those shown in Fig. 2. Furthermore, two or more devices shown
in Fig. 2 may be implemented within a single device, or a single device shown in Fig.
2 may be implemented as multiple, distributed devices. Additionally, or alternatively,
a set of devices (e.g., one or more devices) of environment 200 may perform one or
more functions described as being performed by another set of devices of environment
200.
[0018] Fig. 3 is a diagram of example components of a device 300. Device 300 may correspond
to sending end-point device 210, receiving end-point device 220, and/or a network
device 230. In some implementations, sending end-point device 210, receiving end-point
device 220, and/or a network device 230 may include one or more devices 300 and/or
one or more components of device 300. As shown in Fig. 3, device 300 may include a
bus 310, a processor 320, a memory 330, a storage component 340, an input component
350, an output component 360, and a communication interface 370.
[0019] Bus 310 may include a component that permits communication among the components of
device 300. Processor 320 may include a processor (e.g., a central processing unit
(CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), etc.),
a microprocessor, and/or any processing component (e.g., a field-programmable gate
array (FPGA), an application-specific integrated circuit (ASIC), etc.) that interprets
and/or executes instructions. Memory 330 may include a random access memory (RAM),
a read only memory (ROM), and/or another type of dynamic or static storage device
(e.g., a flash memory, a magnetic memory, an optical memory, etc.) that stores information
and/or instructions for use by processor 320.
[0020] Storage component 340 may store information and/or software related to the operation
and use of device 300. For example, storage component 340 may include a hard disk
(e.g., a magnetic disk, an optical disk, a magneto-optic disk, a solid state disk,
etc.), a compact disc (CD), a digital versatile disc (DVD), a floppy disk, a cartridge,
a magnetic tape, and/or another type of computer-readable medium, along with a corresponding
drive.
[0021] Input component 350 may include a component that permits device 300 to receive information,
such as via user input (e.g., a touch screen display, a keyboard, a keypad, a mouse,
a button, a switch, a microphone, etc.). Additionally, or alternatively, input component
350 may include a sensor for sensing information (e.g., a global positioning system
(GPS) component, an accelerometer, a gyroscope, an actuator, etc.). Output component
360 may include a component that provides output information from device 300 (e.g.,
a display, a speaker, one or more light-emitting diodes (LEDs), etc.).
[0022] Communication interface 370 may include a transceiver-like component (e.g., a transceiver,
a separate receiver and transmitter, etc.) that enables device 300 to communicate
with other devices, such as via a wired connection, a wireless connection, or a combination
of wired and wireless connections. Communication interface 370 may permit device 300
to receive information from another device and/or provide information to another device.
For example, communication interface 370 may include an Ethernet interface, an optical
interface, a coaxial interface, an infrared interface, a radio frequency (RF) interface,
a universal serial bus (USB) interface, a Wi-Fi interface, a cellular network interface,
or the like.
[0023] Device 300 may perform one or more processes described herein. Device 300 may perform
these processes in response to processor 320 executing software instructions stored
by a computer-readable medium, such as memory 330 and/or storage component 340. A
computer-readable medium is defined herein as a non-transitory memory device. A memory
device includes memory space within a single physical storage device or memory space
spread across multiple physical storage devices.
[0024] Software instructions may be read into memory 330 and/or storage component 340 from
another computer-readable medium or from another device via communication interface
370. When executed, software instructions stored in memory 330 and/or storage component
340 may cause processor 320 to perform one or more processes described herein. Additionally,
or alternatively, hardwired circuitry may be used in place of or in combination with
software instructions to perform one or more processes described herein. Thus, implementations
described herein are not limited to any specific combination of hardware circuitry
and software.
[0025] The number and arrangement of components shown in Fig. 3 are provided as an example.
In practice, device 300 may include additional components, fewer components, different
components, or differently arranged components than those shown in Fig. 3. Additionally,
or alternatively, a set of components (e.g., one or more components) of device 300
may perform one or more functions described as being performed by another set of components
of device 300.
[0026] Fig. 4 is a flow chart of an example process 400 for de-obfuscating scripted language
for network intrusion detection using a regular expression signature. In some implementations,
one or more process blocks of Fig. 4 may be performed by network device 230. In some
implementations, one or more process blocks of Fig. 4 may be performed by another
device or a group of devices separate from or including network device 230, such as
sending end-point device 210 and/or receiving end-point device 220.
[0027] As shown in Fig. 4, process 400 may include receiving data (block 405). For example,
network device 230 may receive data from sending end-point device 210 and/or receiving
end-point device 220. Network device 230 may buffer the received data while performing
further analysis. In this way, network device 230 may obtain data from which scripts
may be extracted and analyzed.
[0028] As further shown in Fig. 4, process 400 may include identifying, within the data,
a context and a script associated with the context (block 410). For example, a context
may include data associated with a document (e.g., HTML, XML, PDF, E-MAIL), an application
(e.g., a mobile / embedded application, web browser, or other script engine), or the
like. The script may include embedded script, linked script from another source, or
the like. In one example, the context may be an HTML web page having HTML elements,
and the script may be a JavaScript element defined within a <script></script> tag.
In this way, network device 230 may determine whether and/or how a script is to be
matched to a signature.
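As one non-limiting illustration of block 410, the following Python sketch treats an HTML page as the context and collects the scripts found within it. The class name `ScriptExtractor` and the function name `extract_scripts` are hypothetical, and the sketch relies on Python's standard `html.parser` module rather than on any parser named in this description.

```python
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collects inline <script> bodies and the URLs of linked scripts."""

    def __init__(self):
        super().__init__()
        self.inline_scripts = []   # text found between <script> and </script>
        self.linked_scripts = []   # values of <script src="...">
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.linked_scripts.append(src)
            self._in_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so it is buffered.
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            body = "".join(self._buffer).strip()
            if body:
                self.inline_scripts.append(body)
            self._in_script = False


def extract_scripts(html_text):
    """Returns (inline script bodies, linked script URLs) for an HTML context."""
    parser = ScriptExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.inline_scripts, parser.linked_scripts
```

Other contexts (e.g., PDF or e-mail) would call for a different extractor; the sketch covers only the HTML example given above.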
[0029] As further shown in Fig. 4, process 400 may include generating and normalizing a
syntax tree based on the script (block 415). For example, network device 230 may parse
the script, identify valid tokens, and populate an abstract syntax tree ("AST") with
nodes based on the identified tokens. Network device 230 may further normalize the
syntax tree by renaming local identifiers (i.e., identifiers which may be changed
without affecting the interpretation of the script) according to a normalized naming
scheme based on the node structure (e.g., by numbering/ranking the nodes and replacing
identifiers, associated with the nodes, with new identifiers having a normalized string
pattern based on a number associated with the node and/or the type of node). Network
device 230 may also remove whitespace while creating the syntax tree. For example,
network device 230 may identify and delete any whitespace which is not necessary for
parsing the script.
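The identifier renaming of block 415 may be sketched as follows. The sketch is an assumption-laden illustration: it uses Python source text and Python's standard `ast` module (Python 3.9 or later) as a stand-in for the scripted language and its parser, and the `v0`, `v1`, ... naming pattern and all class and function names are hypothetical. Names that the script binds itself are treated as local identifiers; names supplied by the environment (such as `send` in the usage note) are left unchanged.

```python
import ast


class IdentifierNormalizer(ast.NodeTransformer):
    """Renames script-defined names to v0, v1, ... in order of first appearance."""

    def __init__(self, local_names):
        self.local_names = local_names
        self.mapping = {}

    def _rename(self, name):
        if name not in self.local_names:
            return name                      # environment-provided name: keep
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._rename(node.name)
        self.generic_visit(node)             # parameters and body
        return node

    def visit_arg(self, node):
        node.arg = self._rename(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._rename(node.id)
        return node


def collect_local_names(tree):
    """Names the script binds itself: assignment targets, functions, parameters."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
        elif isinstance(node, ast.FunctionDef):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
    return names


def normalize_identifiers(source):
    """Parses the script, renames local identifiers, and regenerates text."""
    tree = ast.parse(source)
    tree = IdentifierNormalizer(collect_local_names(tree)).visit(tree)
    return ast.unparse(tree)
```

Under these assumptions, the inputs `qz9 = 'pay' + 'load'` followed by `send(qz9)` and `tmp   =   'pay' + 'load'` followed by `send( tmp )` regenerate to the same text, because both the identifier and the original spacing are discarded when the tree is built.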
[0030] As further shown in Fig. 4, process 400 may include generating a normalized script
based on the syntax tree (block 420). For example, network device 230 may traverse
each node of the normalized syntax tree generated from the script and recode the script,
as a normalized script, using the normalized naming scheme. Network device 230 may
normalize the whitespace of the normalized script by using a particular whitespace
insertion scheme (e.g., inserting a minimal amount of whitespace that preserves the
syntactic structure of the script). By comparing the normalized script to similarly
normalized script signatures, network device 230 may perform more accurate, identifier-independent
matching to speed execution of a regular expression based signature matching algorithm.
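A whitespace insertion scheme of the kind mentioned for block 420 may, for example, be approximated at the token level. The following Python sketch is a simplification that works on a small JavaScript-like subset (it does not recognize regular expression literals or template strings); the token pattern and the function name `normalize_whitespace` are hypothetical. Whitespace and comments are dropped, and a single space is re-inserted only where two adjacent tokens would otherwise merge into a different token.

```python
import re

# Toy token pattern for a JavaScript-like subset (an assumption of this sketch).
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<space>\s+)
  | (?P<punct>.)
""", re.VERBOSE | re.DOTALL)


def normalize_whitespace(script):
    """Re-emits the tokens of `script` with a minimal amount of whitespace."""
    out = []
    prev_kind, prev_text = None, ""
    gap = False                        # whitespace/comment seen since last token
    for match in TOKEN_RE.finditer(script):
        kind, text = match.lastgroup, match.group()
        if kind in ("space", "comment"):
            gap = True
            continue
        both_wordlike = (kind in ("word", "number")
                         and prev_kind in ("word", "number"))
        # "a + +b" must not collapse into "a++b".
        same_sign = kind == "punct" and text in "+-" and text == prev_text
        if gap and (both_wordlike or same_sign):
            out.append(" ")
        out.append(text)
        prev_kind, prev_text, gap = kind, text, False
    return "".join(out)
```

For instance, `var  x =` / ` 1 ;   // note` / `return x` is re-emitted as `var x=1;return x`.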
[0031] As further shown in Fig. 4, process 400 may include determining whether to perform
dynamic de-obfuscation (block 425). For example, network device 230 may determine
whether to perform dynamic de-obfuscation based on configuration information identifying
a condition or a set of conditions associated with performing dynamic de-obfuscation.
For example, a condition may include a state of a Boolean flag which may be set to
true or false, e.g., based on user input (e.g., in the case of analyzing new scripts
to determine whether the new scripts are malicious). Additionally, or alternatively,
a condition may include a level and/or indicia of obfuscation, such as satisfying an
obfuscation level threshold. Network device 230 may, for example, determine the level
and/or indicia of obfuscation based on a heuristic algorithm (e.g., by determining
whether portions of script are likely to be encoded) or based on a preliminary signature
matching (e.g., by determining whether the script matches one or more regular expression
signatures associated with an obfuscation technique). Additionally, or alternatively,
a condition may include availability of resources, such as processing power, network
throughput/latency, and/or memory space. Additionally, or alternatively, a condition
may be based on information included in and/or associated with the data, such as a
sender, recipient, context, or the like. Alternatively, or additionally, network device
230 may determine to perform dynamic de-obfuscation of a script based on unsuccessfully
attempting to match the script to a signature (e.g., as described in block 455) prior
to performing dynamic de-obfuscation. Dynamic de-obfuscation may improve a level of
de-obfuscation and therefore a likelihood of malicious script detection.
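The conditions of block 425 may be combined in many ways; the following Python sketch shows one possibility. The configuration fields, the threshold values, and the list of obfuscation indicators are all hypothetical and are chosen only to illustrate a Boolean flag, a preliminary regular expression match for obfuscation techniques, a resource check, and a prior unsuccessful signature match.

```python
import re
from dataclasses import dataclass


@dataclass
class DeobfuscationConfig:
    force_dynamic: bool = False          # Boolean flag, e.g., set by user input
    obfuscation_threshold: float = 0.5   # hypothetical score threshold
    min_free_memory_mb: int = 64         # hypothetical resource floor


# Hypothetical indicators of obfuscation techniques.
OBFUSCATION_HINTS = [
    re.compile(r"\beval\s*\("),
    re.compile(r"\bunescape\s*\("),
    re.compile(r"String\.fromCharCode"),
    re.compile(r"(?:%[0-9A-Fa-f]{2}){8,}"),     # long percent-encoded run
    re.compile(r"(?:\\x[0-9A-Fa-f]{2}){8,}"),   # long \xNN escape run
]


def obfuscation_score(script):
    """Fraction of the indicators that appear at least once in the script."""
    hits = sum(1 for pattern in OBFUSCATION_HINTS if pattern.search(script))
    return hits / len(OBFUSCATION_HINTS)


def should_trace(script, config, free_memory_mb, static_match_failed=False):
    """Decides whether dynamic de-obfuscation is to be performed."""
    if free_memory_mb < config.min_free_memory_mb:
        return False                     # insufficient resources
    if config.force_dynamic or static_match_failed:
        return True
    return obfuscation_score(script) >= config.obfuscation_threshold
```

In this sketch the resource check takes precedence over every other condition; an implementation could equally weigh the conditions differently.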
[0032] As further shown in Fig. 4, if dynamic de-obfuscation is to be performed (block 425
= YES), then process 400 may include generating an object model based on the context
(block 430). For example, network device 230 may identify an API and/or object model,
which specifies how a script is to be interpreted and how the script may interact
with its environment. Additionally, or alternatively, network device 230 may identify
information contained within the data, such as elements, environment variables, or
the like, with which the script may interact at runtime. For example, the object model
may be an HTML document object model ("DOM") which includes various methods, properties,
and rules associated with, e.g., a web browser environment. The DOM may include a
DOM tree which maps data, e.g., HTML and/or scripting elements and values associated
with the elements, contained within one or more HTML/XML/CSS files associated with
the script. By generating an object model, network device 230 may prepare the emulator
environment to trace the normalized script.
[0033] As further shown in Fig. 4, process 400 may include tracing an interpretation of
the normalized script in an emulated object model environment (block 435). For example,
network device 230 may employ a script interpreting/debugging engine which emulates
the environment associated with the object model generated. If the object model is
a DOM, then network device 230 may emulate, for example, a web browser environment.
Alternatively, or additionally, the object model may correspond to a different context
(e.g., PDF, e-mail, embedded/mobile application, or the like) and network device 230
may likewise emulate an object model associated with that environment. Alternatively,
network device 230 may employ an emulation model which does not correspond to a specific
environment but still uses local elements within the object model.
[0034] Network device 230 may substitute or implement alternative functions in place of
system / global / potentially dangerous methods/functions. Alternatively, network
device 230 may interpret / execute the script in a sandboxed environment to observe the
results of the script's execution and/or skip interpretation of portions of the script.
Based on interpreting the script, network device 230 may generate a trace of events
(e.g., function entry/exit points, arguments passed, values returned, calling of object
methods/functions, interacting/modifying elements of the model, or changing values
associated with script variables / object elements). The output from the trace provides
network device 230 with information which would be unavailable or difficult to obtain
by static analysis, and this information may be useful for de-obfuscating script,
matching against a signature, and identifying behavior of the script when executed.
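The substitution of alternative functions and the resulting trace may be illustrated by the following Python sketch. It is a simplified assumption rather than a description of any particular emulator: the script is written in Python syntax as a stand-in for the scripted language, the names `Tracer`, `run_traced`, `unescape`, and `document_write` are hypothetical, and emptying `__builtins__` hides Python's built-in functions but does not constitute a secure sandbox.

```python
from urllib.parse import unquote


class Tracer:
    """Records every call made through a substituted function."""

    def __init__(self):
        self.events = []

    def wrap(self, name, impl=lambda *args: None):
        # `impl` is the harmless alternative executed instead of the real method.
        def replacement(*args):
            result = impl(*args)
            self.events.append({"call": name, "args": args, "returned": result})
            return result
        return replacement


def run_traced(script_source, tracer):
    """Interprets the script against substituted environment functions."""
    env = {
        "__builtins__": {},   # hides built-ins; NOT a real sandbox
        "unescape": tracer.wrap("unescape", unquote),
        "document_write": tracer.wrap("document.write"),   # records, writes nothing
    }
    exec(script_source, env)
    return tracer.events
```

Tracing the one-line script `document_write(unescape('%3Cb%3E') + 'hi')` with this sketch records a call to `unescape` that returned `<b>`, followed by a call to `document.write` with the fully assembled argument `<b>hi`, which is information that the split and encoded source text does not show directly.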
[0035] As further shown in Fig. 4, process 400 may include generating a method call list
based on the trace (block 440). For example, network device 230 may identify, based
on the trace, all calls/references to functions which interact with the object model
(e.g., object methods). Network device 230 may further generate a method call list
comprising an ordered list of all calls to those functions, a quantity of calls to
those functions, as well as arguments passed to those functions. In this way, network
device 230 may characterize the script in terms of the script's interaction with its
environment.
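A method call list of the kind generated in block 440 may, for example, be derived from trace events as in the following Python sketch. The event layout (a dictionary with `call` and `args` keys), the function names, and the one-call-per-line text form are assumptions made for illustration.

```python
from collections import Counter


def build_call_list(trace_events, api_names):
    """Keeps, in order, only calls to functions provided by the environment."""
    ordered = [(event["call"], event["args"])
               for event in trace_events
               if event["call"] in api_names]
    counts = Counter(name for name, _ in ordered)   # quantity of calls per function
    return ordered, counts


def serialize_call_list(ordered):
    """Renders the ordered call list as text, one call per line."""
    return "\n".join(
        f"{name}({', '.join(repr(arg) for arg in args)})"
        for name, args in ordered
    )
```

Rendering the list as text is one way to make it suitable for matching against a regular expression signature; calls to functions defined by the script itself (such as a hypothetical `helper`) are omitted because they do not interact with the object model.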
[0036] As further shown in Fig. 4, process 400 may include determining if the call list
matches a signature (block 445). For example, network device 230 may select a regular
expression signature, from a list of regular expression signatures determined to match
call lists associated with one or more malicious scripts, and determine whether the
call list matches the regular expression signature. Network device 230 may repeat
this matching process (e.g., iteratively) for one or more additional regular expression
signatures to determine whether the call list matches a call list associated with
one of the one or more additional regular expression signatures.
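Matching in block 445 may be sketched as follows, assuming the call list has been rendered as text with one call per line. The signature shown is hypothetical: it matches any call list in which a call to `unescape` is followed, possibly after other calls, by a call to `eval`.

```python
import re

# Hypothetical call-list signature: unescape(...) on one line and eval(...)
# on some later line, with any number of calls in between.
CALL_LIST_SIGNATURES = [
    re.compile(r"^unescape\(.*\)$(?:\n.*)*?\n^eval\(.*\)$", re.MULTILINE),
]


def call_list_matches(call_list_text, signatures):
    """Tries each signature in turn and reports whether any of them matches."""
    return any(signature.search(call_list_text) for signature in signatures)
```

Because the signature is written against the order of calls rather than against the script text, it is unaffected by how the script author named or arranged the script's own functions.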
[0037] As further shown in Fig. 4, if the call list does not match a signature (block 445
= NO), then process 400 may include optimizing the normalized script based on the
trace (block 450). For example, network device 230 may identify and remove, from the
normalized script, a portion of the normalized script which does not affect meaningful
script behavior (e.g., dead code). Additionally, or alternatively, network device
230 may identify and remove superfluous logic-switches (e.g., loops based on a constant
expression which will always, or never, be interpreted at runtime), alternatively
represented constant values, functions which return constant values, split strings,
or the like. Based on the trace, network device 230 may also decrypt encrypted portions
of the script and identify substitute/alias functions. Network device 230 may perform
this optimization in an iterative manner. Alternatively, or additionally, network
device 230 may perform the optimization on the syntax tree which represents the script
and create the optimized script from the syntax tree after optimizing the syntax tree.
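Some of the optimizations of block 450 may be performed directly on the syntax tree. The following Python sketch is a limited illustration that again uses Python source and the standard `ast` module (Python 3.9 or later) as a stand-in for the scripted language; the names `Simplifier` and `simplify` are hypothetical. It rejoins split strings, evaluates alternatively represented numeric constants, and removes a branch whose condition is a constant expression. It does not use trace information, which an implementation could use to resolve values that are not constant in the source text.

```python
import ast


def _fold(op, left, right):
    """Returns the folded value, or None when the operands are not foldable."""
    if isinstance(op, ast.Add) and isinstance(left, str) and isinstance(right, str):
        return left + right                       # rejoin a split string
    numeric = (int, float)
    if type(left) in numeric and type(right) in numeric:
        if isinstance(op, ast.Add):
            return left + right
        if isinstance(op, ast.Sub):
            return left - right
        if isinstance(op, ast.Mult):
            return left * right
    return None


class Simplifier(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)                  # fold inner expressions first
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            value = _fold(node.op, node.left.value, node.right.value)
            if value is not None:
                return ast.copy_location(ast.Constant(value=value, kind=None), node)
        return node

    def visit_Compare(self, node):
        self.generic_visit(node)
        if (len(node.ops) == 1
                and isinstance(node.ops[0], (ast.Eq, ast.NotEq))
                and isinstance(node.left, ast.Constant)
                and isinstance(node.comparators[0], ast.Constant)):
            equal = node.left.value == node.comparators[0].value
            result = equal if isinstance(node.ops[0], ast.Eq) else not equal
            return ast.copy_location(ast.Constant(value=result, kind=None), node)
        return node

    def visit_If(self, node):
        self.generic_visit(node)
        if isinstance(node.test, ast.Constant):
            # Keep only the branch that can actually be interpreted.
            return node.body if node.test.value else node.orelse
        return node


def simplify(source):
    tree = Simplifier().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

Applied to a script consisting of `if 1 == 1:` with the body `send('ev' + 'il', 40 + 2)` and an `else` branch calling `noop()`, the sketch yields the single statement `send('evil', 42)`.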
[0038] As further shown in Fig. 4, process 400 may include determining whether the normalized
script (with or without additional trace-based optimization) matches a signature (block
455). For example, network device 230 may select a regular expression signature, from
a list of regular expression signatures determined to match one or more normalized
malicious scripts, and determine whether the normalized script matches the regular
expression signature. Network device 230 may repeat this matching process (e.g., iteratively)
for one or more additional regular expression signatures to determine whether the
normalized script matches a malicious script associated with one of the one or more
additional regular expression signatures.
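The iterative matching of block 455 may be sketched as follows. The signature name, the pattern, and the `v<number>` identifier form are hypothetical and assume that local identifiers have already been renamed to a normalized pattern and that whitespace has been minimized.

```python
import re

# Hypothetical signature list keyed by a descriptive name. The back-reference
# \1 requires that the decoded value is the same identifier passed to eval().
SCRIPT_SIGNATURES = {
    "decode-and-eval": re.compile(r"(v\d+)=unescape\([^)]*\);eval\(\1\)"),
}


def first_matching_signature(normalized_script, signatures):
    """Returns the name of the first signature that matches, or None."""
    for name, pattern in signatures.items():
        if pattern.search(normalized_script):
            return name
    return None
```

Because the signature refers to the normalized identifier form rather than to a literal name, the same pattern matches the script regardless of the identifiers chosen by the script writer.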
[0039] As further shown in Fig. 4, if the normalized script does not match a signature (block
455 = NO), then process 400 may include determining whether the script behaves maliciously
(block 460). For example, network device 230 may determine (e.g., based on user input,
the trace, the call list, the optimized/normalized script, and/or a condition identified
in the configuration information) whether the script behaves maliciously despite not
having matched a known signature. For example, network device 230 may identify, based
on the trace, one or more interactions between the script and the emulated object
model environment which are associated with one or more malicious behaviors (e.g.,
based on a heuristic model).
[0040] As further shown in Fig. 4, if the normalized script does not behave maliciously
(block 460 = NO), then process 400 may include processing the data as having a benign
script (block 465). For example, network device 230 may implement/execute/store the
received data and/or route the data to its intended recipient.
[0041] As further shown in Fig. 4, if the normalized script behaves maliciously (block 460
= YES), then process 400 may include modifying and/or generating one or more signatures
to match the normalized script and/or the call list (block 470). For example, network
device 230 may assist a user in developing a new signature, and/or modifying an existing
signature, for a new form of malicious script by providing the user with a portion
of the data containing the script, the normalized/optimized script, the trace, and/or
the call list. Additionally, or alternatively, network device 230 may automatically
(e.g., without requiring input from a user) generate a new signature, and/or modify
an existing signature, to identify the script as malicious. In this way, network device
230 may update existing signatures and/or create new signatures to identify scripts
as being malicious.
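Automatic signature generation for block 470 could, as one assumption among many possible approaches, start from the normalized script itself. The following Python sketch escapes the normalized script so that it is matched literally and then generalizes the sequential numbers of the normalized identifiers. The function name generate_signature and the generalization rule are hypothetical.

```python
import re


def generate_signature(normalized_script):
    """Build a regular expression matching the script for any v<N>/f<N> numbering."""
    escaped = re.escape(normalized_script)
    # Normalized identifiers consist of letters and digits only, so re.escape
    # leaves them intact and they can be generalized after escaping.
    return re.sub(r"\b([vf])[0-9]+\b", r"\1[0-9]+", escaped)
```

A signature generated in this way remains specific to the structure of the script while tolerating a different numbering of the renamed identifiers.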
[0042] As further shown in Fig. 4, if the normalized script or the call list matches a signature
(block 445 = YES or block 455 = YES) or the script behaves maliciously (block 460 =
YES), then process 400 may include processing the data as having a malicious script
(block 475). For example, network device 230 may delete, quarantine, or otherwise
prevent the execution, access, and/or transfer of the data to another device (e.g.,
to a receiving end-point device 220 associated with analyzing and/or processing a
malicious script). Additionally, or alternatively, network device 230 may notify a
user about the detection of a malicious script (e.g., through an e-mail, a log entry,
and/or another message intended to be accessed by, or displayed to, a user or device). Additionally,
or alternatively, network device 230 may generate one or more additional signatures
for matching subsequent data received by network device 230. For example, network
device 230 may generate a signature for matching data associated with a particular
network, device, protocol, content, and/or user (e.g., to isolate the receiving network
250 from a particular sending end-point device 210).
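The branching among blocks 455 through 475 described above may be summarized by the following Python sketch. The function name next_actions and the action labels are hypothetical; the block numbers in the comments refer to Fig. 4.

```python
def next_actions(matches_signature, behaves_maliciously):
    """Return the ordered actions taken once the signature and behavior checks are done."""
    if matches_signature:
        return ["process_as_malicious"]                       # block 475
    if behaves_maliciously:
        return ["update_signatures", "process_as_malicious"]  # blocks 470 and 475
    return ["process_as_benign"]                              # block 465
```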
[0043] Although Fig. 4 shows example blocks of process 400, in some implementations, process
400 may include additional blocks, fewer blocks, different blocks, or differently
arranged blocks than those depicted in Fig. 4. Additionally, or alternatively, two
or more of the blocks of process 400 may be performed in parallel.
[0044] Figs. 5A-5H are diagrams of an example implementation 500 relating to example process
400 shown in Fig. 4. Figs. 5A-5H show an example of a process for de-obfuscating scripted
language for network intrusion detection using a regular expression signature.
[0045] As shown in Fig. 5A, assume, for example implementation 500, that a first end-point
(e.g., sending end-point device 210) sends, to a second end-point (e.g., receiving
end-point device 220), HTTP data ("HTTP/1.0 200 OK | Content-Type: text/html" and
additional content) in response to an HTTP request from receiving end-point device
220 ("GET data.html HTTP/1.0 | From: endpoint2@network2 | User-Agent: HTTPTool/1.0").
Assume further that a network device (e.g., network device 230) is positioned at a
network location between the first end-point and the second end-point. Assume further
that the network device intercepts and analyzes data passing between the first end-point
and the second end-point as part of an intrusion detection and prevention system.
In this way, the network device may receive data passing between the first end-point
and the second end-point to determine whether the data contains malicious scripting
language.
[0046] As shown in Fig. 5B, assume, for example implementation 500, that the network device
identifies a script (JavaScript) and a context associated with the script (an HTML web
page) contained within the data (the HTTP response received from the first end-point
device). Assume further that the network device uses a suitable parser to identify
valid script tokens (keywords, identifiers, punctuators, literals, and the like).
In this way, the network device may begin to normalize the script for identification.
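The extraction and tokenization steps may be illustrated by the following Python sketch, which uses the standard html.parser module to collect the content of script elements and a small regular expression tokenizer to classify tokens. The class name ScriptExtractor, the function name tokenize, the keyword set, and the simplified token grammar are assumptions; a suitable parser for a real scripted language would handle many more token forms (comments, regular expression literals, multi-character operators, and the like).

```python
import re
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collects the text content of each <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.scripts[-1] += data


KEYWORDS = {"var", "function", "return", "new", "if", "else", "for", "while"}

TOKEN_RE = re.compile(r"""
    (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<punct>[{}()\[\];,.=+\-*/<>!&|?:])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(script):
    """Return (kind, text) pairs for the script, discarding whitespace."""
    tokens = []
    pos = 0
    while pos < len(script):
        match = TOKEN_RE.match(script, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "space":
            continue
        text = match.group()
        if kind == "word":
            kind = "keyword" if text in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens
```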
[0047] As shown in Fig. 5C, assume, for example implementation 500, that the network device
generates an abstract syntax tree (AST) by creating nodes based on the tokens identified
from the script. Assume further that the network device renames the local variables
/ functions based on a standardized naming scheme (e.g., 'v' plus a sequential number
for variables and 'f' plus a sequential number for functions) and the location of
a node associated with a variable / function (numbering in order of traversal from
root to branches). In this way, the network device may use the AST to reduce storage
size and processing load while mitigating obfuscation by variable/function renaming.
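The renaming step may be sketched in Python as follows, assuming a toy syntax tree made of nested dictionaries and lists. The node types ("var_decl", "func_decl", "identifier") and the function name rename_identifiers are hypothetical, and the sketch deliberately ignores scoping and uses that precede a declaration; it only illustrates numbering in order of traversal from root to branches.

```python
def rename_identifiers(node, names=None, counters=None):
    """Rename declared variables to v<N> and functions to f<N>, in pre-order.

    Names that are never declared in the script (e.g., "document") are kept.
    """
    if names is None:
        names, counters = {}, {"v": 0, "f": 0}
    if isinstance(node, dict):
        kind = node.get("type")
        if kind in ("var_decl", "func_decl"):
            prefix = "v" if kind == "var_decl" else "f"
            old_name = node["name"]
            counters[prefix] += 1
            names[old_name] = f"{prefix}{counters[prefix]}"
            node["name"] = names[old_name]
        elif kind == "identifier" and node["name"] in names:
            node["name"] = names[node["name"]]
        for child in node.values():
            rename_identifiers(child, names, counters)
    elif isinstance(node, list):
        for child in node:
            rename_identifiers(child, names, counters)
    return node
```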
[0048] As shown in Fig. 5D, assume, for example implementation 500, that the network device
generates, by traversing the nodes of the AST, a normalized script having normalized
identifiers (as described with respect to Fig. 5C). Assume further that the network
device, when generating the normalized script, inserts whitespace according to a particular
scheme for inserting whitespace (e.g., a scheme directing the network device to insert
only the minimum whitespace required to maintain the script's interpretation), resulting
in a normalized script having normalized whitespace. In this way, the network device
may match the normalized script to a regular expression signature using less memory
and fewer processor resources than if the network device were to use the original
script for matching, while standardizing the naming scheme to defeat renaming obfuscation
techniques.
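A scheme that inserts only the minimum whitespace may be sketched in Python as follows, assuming the normalized script is emitted from a flat sequence of non-empty token strings. The function name emit_minimal and the two separation rules shown are assumptions; a complete emitter for a real scripted language would need further rules.

```python
import re

WORD_CHAR = re.compile(r"[A-Za-z0-9_$]")


def emit_minimal(tokens):
    """Join tokens, adding a space only where adjacent tokens would otherwise merge."""
    out = []
    prev = ""
    for text in tokens:
        if prev:
            both_words = WORD_CHAR.match(prev[-1]) and WORD_CHAR.match(text[0])
            # "+ +" and "- -" must not collapse into "++" or "--".
            same_sign = prev[-1] in "+-" and text[0] == prev[-1]
            if both_words or same_sign:
                out.append(" ")
        out.append(text)
        prev = text
    return "".join(out)
```

For instance, the tokens of a declaration such as var v1 = document . createElement ( v2 ) ; would be emitted with a single space, between "var" and "v1".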
[0049] As also shown in Fig. 5D, assume, for example implementation 500, that the network
device has determined that dynamic de-obfuscation is to be performed. Assume further
that the network device identifies an API and, based on the API, generates a DOM tree
which represents the HTML context and includes the normalized JavaScript. In this
way, the network device may initialize an emulator with an appropriate DOM for running
a traced interpretation of the script.
[0050] As shown in Fig. 5E, assume, for example implementation 500, that the network device
interprets the normalized script, using the emulator, within an emulated environment
based on the DOM. Assume further that network device 230 tracks function calls, arguments,
returned values, the DOM tree, variable values, and/or other information in a trace.
Using the emulator, the network device may determine: that the local variables represent
split literal strings, and may evaluate the concatenation of the split strings as "object",
"classid", and "clsid:55963676-2F5E-4BAF-AC28-CF26AA587566"; that the fromCharCode
function operates on literal numeric arguments, and may evaluate the returned
value of the function as "http://badsite"; and that the JavaScript script creates
a new object in the DOM tree having the aforementioned classid and URL as parameters.
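The traced interpretation may be illustrated, under heavy simplification, by the following Python sketch. The Trace class, the flat list standing in for the DOM tree, and the Python statements standing in for the interpreted JavaScript are all assumptions made for illustration; how the decoded URL is attached to the new object is not modeled here.

```python
class Trace:
    """Records each traced call with its arguments and returned value."""

    def __init__(self):
        self.events = []

    def wrap(self, name, func):
        def traced(*args):
            result = func(*args)
            self.events.append({"call": name, "args": args, "result": result})
            return result
        return traced


def run_traced_example():
    trace = Trace()
    dom = []  # emulated DOM tree, reduced to a flat list of elements

    def create_element(tag):
        element = {"tag": tag, "attributes": {}}
        dom.append(element)
        return element

    def set_attribute(element, name, value):
        element["attributes"][name] = value

    createElement = trace.wrap("createElement", create_element)
    setAttribute = trace.wrap("setAttribute", set_attribute)
    fromCharCode = trace.wrap(
        "fromCharCode", lambda *codes: "".join(map(chr, codes)))

    # Python stand-in for the obfuscated script: split strings and an encoded URL.
    v1 = "ob" + "ject"
    v2 = "class" + "id"
    v3 = "clsid:55963676-2F5E" + "-4BAF-AC28-CF26AA587566"
    v4 = fromCharCode(104, 116, 116, 112, 58, 47, 47, 98, 97, 100, 115, 105, 116, 101)
    element = createElement(v1)
    setAttribute(element, v2, v3)
    return trace, dom, v4
```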
[0051] As shown in Fig. 5F, assume that, based on the trace, the network device optimizes
the AST to concatenate the identified split literal strings, replace constant functions
with literal equivalents, propagate constants, and substitute called functions for
local alias functions which call the called functions. Assume that the network device
generates an optimized normalized script based on the optimized normalized AST. In
this way, the network device may further reduce the memory and/or processor requirements
of the signature matching process as well as revert certain forms of data obfuscation
in the script to reduce the odds of false negatives (i.e., not matching an obfuscated
malicious script to a signature).
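Two of the optimizations named above (concatenating split literal strings and replacing a constant function call with its literal equivalent) may be sketched in Python as follows. Note that this sketch folds constants statically, bottom-up, rather than consulting a trace, and that the toy node types ("concat", "call", "string", "number") and the function name fold are hypothetical.

```python
def fold(node):
    """Return a copy of the tree with constant sub-expressions replaced by literals."""
    if isinstance(node, list):
        return [fold(child) for child in node]
    if not isinstance(node, dict):
        return node
    # Fold children first so that nested expressions collapse in one pass.
    node = {key: fold(value) for key, value in node.items()}
    if node.get("type") == "concat":
        left, right = node["left"], node["right"]
        if left["type"] == "string" and right["type"] == "string":
            return {"type": "string", "value": left["value"] + right["value"]}
    if node.get("type") == "call" and node.get("callee") == "String.fromCharCode":
        if all(arg["type"] == "number" for arg in node["args"]):
            text = "".join(chr(arg["value"]) for arg in node["args"])
            return {"type": "string", "value": text}
    return node
```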
[0052] Assume further that the network device generates a method call list based on the
trace. Assume further that the method call list identifies three external methods
as having been called by the script: "createElement", "setAttribute",
and "fromCharCode", as well as a call count associated with each (1 for all three)
and the arguments passed to each (inline / no arguments for "createElement" and "setAttribute";
14 numeric values for "fromCharCode"). By generating the method call list, the network
device may provide another context for which a signature may be generated/matched
to identify a malicious script.
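The derivation of a method call list from a trace may be sketched in Python as follows. The textual layout of each entry (method name, call count, and an argument summary in angle brackets) is an assumption chosen so that the entries are easy to match with a regular expression; the function names and the "<inline>" label are likewise hypothetical.

```python
from collections import Counter


def summarize_args(args):
    """Summarize arguments as a count of numeric values, or as inline otherwise."""
    if args and all(isinstance(arg, (int, float)) for arg in args):
        return f"<{len(args)} numeric values>"
    return "<inline>"


def build_call_list(events):
    """Return one 'name count summary' line per traced method, in first-call order."""
    counts = Counter()
    summaries = {}
    for event in events:
        name = event["call"]
        counts[name] += 1
        # Keep the argument summary of the first call to each method.
        summaries.setdefault(name, summarize_args(event["args"]))
    return [f"{name} {counts[name]} {summaries[name]}" for name in counts]
```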
[0053] As shown in Fig. 5G, assume, for example implementation 500, that the network device
determines whether the normalized script, without dynamic de-obfuscation, matches
a first regular expression signature ("REGEX1=var f1=document\.createElement\(\(function\(\){
var var1=["]"). Assume further that the network device matches the first regular expression
signature at the beginning of the normalized JavaScript and thereby determines that
the data is to be treated as having a malicious script.
[0054] As further shown in Fig. 5G, assume that the network device determines whether the
normalized script, with dynamic de-obfuscation, matches a second regular expression
signature ("REGEX2=55963676-2F5E-4BAF-AC28-CF26AA587566") or a third regular expression
signature ("REGEX3=badsite"). Assume further that the network device matches the second
and third regular expression signatures to the normalized dynamically de-obfuscated
script and thereby confirms that the data is to be treated as having a malicious script.
[0055] As further shown in Fig. 5G, assume that the network device determines whether the
method call list matches a fourth regular expression signature ("REGEX4=fromCharCode
[0-9]+ \<[0-9]+ numeric value"). Assume further that the network device matches the
fourth regular expression signature to the method call list and thereby confirms that
the data is to be treated as having a malicious script.
[0056] In this way, the network device may identify and/or confirm the identification of
a malicious script based on one or more regular expression signatures matching one
or more scripts and/or a method call list. As shown in Fig. 5G, the network device
may select a particular regular expression signature or a particular set of regular
expression signatures based on whether the signature is to be matched to a non-dynamically
de-obfuscated normalized script, a dynamically de-obfuscated normalized script, or
to a method call list.
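The selection of signatures according to the kind of text being matched may be sketched in Python as follows. The grouping keys, the function name matching_signatures, and the sample inputs are assumptions; the patterns are the example signatures REGEX1 through REGEX4 described with respect to Fig. 5G, with a single space assumed after the opening brace of REGEX1.

```python
import re

# Signatures grouped by what they are matched against; the keys are hypothetical.
SIGNATURES_BY_TARGET = {
    "normalized_script": [
        r'var f1=document\.createElement\(\(function\(\){ var var1=["]',
    ],
    "deobfuscated_script": [
        r"55963676-2F5E-4BAF-AC28-CF26AA587566",
        r"badsite",
    ],
    "call_list": [
        r"fromCharCode [0-9]+ \<[0-9]+ numeric value",
    ],
}


def matching_signatures(target_kind, text):
    """Return the patterns registered for target_kind that match text."""
    return [
        pattern
        for pattern in SIGNATURES_BY_TARGET[target_kind]
        if re.search(pattern, text)
    ]
```

Keeping a separate list per target avoids testing, for example, a call list signature against script text that it was never meant to describe.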
[0057] As shown in Fig. 5H, network device 230 may, based on determining whether the script
matches a signature for a malicious script, delete/quarantine the data received from
sending end-point device 210 (if the script matches a signature) or deliver the data
to the receiving end-point device 220 (if the script does not match a signature).
In this way, network device 230 may enable receiving end-point device 220 to seamlessly
send and receive data to and from sending end-point device 210 (as well as other potentially
dangerous/untrusted sources) while mitigating the risk of transferring a malicious
script.
[0058] As indicated above, Figs. 5A-5H are provided merely as an example. Other examples
are possible and may differ from what was described with regard to Figs. 5A-5H.
[0059] Implementations described herein provide an intrusion detection and prevention system
which inspects data passing through a device, identifies a script and a context associated
with the script, normalizes whitespace and identifier naming, de-obfuscates the script,
traces the interpretation of the script in an emulated environment, and matches the
normalized / de-obfuscated script (and other trace information) to a regular expression
signature for a script that has been identified as being malicious. Implementations
described herein may also be used to assist a user in generating a regular expression
signature to be matched against a script.
[0060] Thus from one perspective, there has been described a device that receives data,
identifies a context associated with the data, and identifies a script, within the
data, associated with the context. The device parses the script to identify tokens,
forms nodes based on the tokens, and assembles a syntax tree using the nodes. The
device renames one or more identifiers associated with the nodes and generates a normalized
text, associated with the script, based on the syntax tree after renaming the one
or more identifiers. The device determines whether the normalized text matches a regular
expression signature and processes the data based on determining whether the normalized
text matches the regular expression signature. The device processes the data by a
first process when the normalized text matches the regular expression signature or
by a second process, different from the first process, when the normalized text does
not match the regular expression signature.
[0061] The foregoing disclosure provides illustration and description, but is not intended
to be exhaustive or to limit the implementations to the precise form disclosed. Modifications
and variations are possible in light of the above disclosure or may be acquired from
practice of the implementations.
[0062] As used herein, the term component is intended to be broadly construed as hardware,
firmware, and/or a combination of hardware and software.
[0063] Some implementations are described herein in connection with thresholds. As used
herein, satisfying a threshold may refer to a value being greater than the threshold,
more than the threshold, higher than the threshold, greater than or equal to the threshold,
less than the threshold, fewer than the threshold, lower than the threshold, less
than or equal to the threshold, equal to the threshold, etc.
[0064] It will be apparent that systems and/or methods, described herein, may be implemented
in different forms of hardware, firmware, or a combination of hardware and software.
The actual specialized control hardware or software code used to implement these systems
and/or methods is not limiting of the implementations. Thus, the operation and behavior
of the systems and/or methods were described herein without reference to specific
software code, it being understood that software and hardware can be designed to implement
the systems and/or methods based on the description herein.
[0065] Even though particular combinations of features are recited in the claims and/or
disclosed in the specification, these combinations are not intended to limit the disclosure
of possible implementations. In fact, many of these features may be combined in ways
not specifically recited in the claims and/or disclosed in the specification. Although
each dependent claim listed below may directly depend on only one claim, the disclosure
of possible implementations includes each dependent claim in combination with every
other claim in the claim set.
[0066] No element, act, or instruction used herein should be construed as critical or essential
unless explicitly described as such. Also, as used herein, the articles "a" and "an"
are intended to include one or more items, and may be used interchangeably with "one
or more." Furthermore, as used herein, the term "set" is intended to include one or
more items, and may be used interchangeably with "one or more." Where only one item
is intended, the term "one" or similar language is used. Also, as used herein, the
terms "has," "have," "having," or the like are intended to be open-ended terms. Further,
the phrase "based on" is intended to mean "based, at least in part, on" unless explicitly
stated otherwise.
1. A device, comprising:
one or more processors to:
receive data (405);
identify a context associated with the data;
identify a script, within the data, associated with the context (410);
parse the script to identify tokens;
form nodes based on the tokens;
assemble a syntax tree using the nodes (415);
rename one or more identifiers associated with the nodes;
generate a normalized text, associated with the script, based on the syntax tree after
renaming the one or more identifiers (420);
generate, based on tracing an interpretation of the normalized script, a call list
that includes a list of calls to functions which interact with the object model, thereby
allowing the device to characterize the script in terms of the script's interaction
with its environment (440), the functions being defined outside of the script;
determine whether the call list matches a regular expression signature; and
process the data based on determining whether the call list matches the regular expression
signature (445),
the data being processed by a first process when the call list matches the regular
expression signature (475),
the data being processed by a second process when the call list does not match the
regular expression signature (450),
the first process being different from the second process.
2. The device of claim 1, where the one or more processors, when generating the normalized
text based on the syntax tree, are further to:
generate a normalized script, corresponding to the script, by traversing the nodes
of the syntax tree,
the normalized text being the normalized script.
3. The device of claim 1 or 2, where the one or more processors, when generating the
normalized text based on the syntax tree, are further to:
generate a normalized script, corresponding to the script, by traversing the nodes
of the syntax tree;
interpret a portion of the normalized script;
generate a trace based on information obtained by interpreting the portion of the
normalized script; and
generate the normalized text based on the trace.
4. The device of claim 3, where the one or more processors, when generating the normalized
text based on the trace, are further to:
identify a function called by the script during interpretation;
determine an order and a quantity associated with the function called;
determine one or more parameters associated with the functions called; and
generate the normalized text based on the function, the order, the quantity, and the
one or more parameters.
5. The device of claim 3 or 4, where the one or more processors, when generating the
normalized text based on the trace, are further to:
identify, from the normalized script, a reducible expression;
determine, based on the trace, a reduced expression corresponding to the reducible
expression; and
substitute the reduced expression in place of the reducible expression,
the normalized text being the normalized script having the reduced expression substituted
in place of the reducible expression.
6. The device of claim 5, where the reducible expression is an encoded expression, and
the one or more processors, when determining the reduced expression, are further to:
decode the encoded expression to generate a decoded expression,
the reduced expression being the decoded expression.
7. The device of any preceding claim, where the one or more processors are further to
receive the data from a sending end-point and, when processing the data based on determining
whether the call list matches the regular expression signature, are further to:
identify a receiving end-point associated with the data;
send the data to the receiving end-point based on determining that the call list does
not match the regular expression signature; and
drop the data, without sending the data to the receiving end-point, based on determining
that the call list matches the regular expression signature.
8. A method comprising:
receiving, by a device, data from an end-point device (405);
extracting, by the device, scripted code from the data (410);
generating, by the device, a normalized script corresponding to the scripted code
and having whitespace and identifiers (420),
the whitespace of the normalized script being formatted, by the device, according
to a normalizing scheme, and
the identifiers of the normalized script being renamed, by the device, according to
the normalizing scheme;
generating, by the device and based on tracing an interpretation of the normalized
script, a call list that includes a list of calls to functions which interact with
the object model, thereby allowing the device to characterize the script in terms of
the script's interaction with its environment (440), the functions being defined outside
of the scripted code;
determining, by the device, whether the call list matches a signature (445); and
selectively sending, by the device, the data to another end-point device, associated
with the data, based on determining whether the call list matches the signature,
the data being sent to the other end-point device based on determining that the call
list does not match the signature, and
the data not being sent to the other end-point device based on determining that the
call list matches the signature.
9. The method of claim 8, where generating the normalized script further comprises:
parsing the scripted code to generate nodes associated with the identifiers;
populating an abstract syntax tree with the nodes;
renaming the identifiers based on the normalizing scheme and based on node positions
associated with the identifiers;
converting the nodes of the abstract syntax tree into scripted expressions; and
generating the normalized script by assembling the scripted expressions according
to the normalizing scheme.
10. The method of claim 8 or 9, where generating the normalized script further comprises:
determining an environment associated with interpreting the scripted code;
emulating the environment;
interpreting a portion of the scripted code in the emulated environment;
observing an interaction between objects of the interpreted portion of the scripted
code and between the portion of the scripted code and the emulated environment; and
modifying the portion of the scripted code based on the observed interaction, the
normalized script being based on the modified portion of the scripted code.
11. The method of claim 10, where modifying the portion of the scripted code further comprises:
concatenating one or more split strings identified within the portion of the scripted
code.
12. The method of claim 10 or 11, where modifying the portion of scripted code further
comprises:
iteratively modifying the portion of the scripted code.
13. The method of claim 10, 11 or 12, where modifying the portion of the scripted code
further comprises:
decoding the portion of the scripted code.
14. A computer-readable medium storing instructions that, when executed by a device, cause
the device to perform the method of claims 8-13.
1. A device, comprising:
one or more processors configured to:
receive data (405);
identify a context associated with the data;
identify a script, within the data, associated with the context (410);
parse the script to identify tokens;
form nodes based on the tokens;
assemble a syntax tree using the nodes (415);
rename one or more identifiers associated with the nodes;
generate, after renaming the one or more identifiers and based on the syntax tree,
a normalized text associated with the script (420);
generate, based on tracing an interpretation of the normalized script, a call list
that includes a list of calls to functions which interact with the object model,
thereby allowing the device to characterize the script in terms of the script's
interaction with its environment (440),
the functions being defined outside of the script;
determine whether the call list matches a regular expression signature; and
process the data based on determining whether the call list matches the regular
expression signature (445),
the data being processed by a first process when the call list matches the regular
expression signature (475),
the data being processed by a second process when the call list does not match the
regular expression signature (450),
the first process being different from the second process.
2. The device of claim 1, where the one or more processors, when generating the normalized
text based on the syntax tree, are further configured to:
generate a normalized script, corresponding to the script, by traversing the nodes
of the syntax tree,
the normalized text being the normalized script.
3. The device of claim 1 or 2, where the one or more processors, when generating the
normalized text based on the syntax tree, are further configured to:
generate a normalized script, corresponding to the script, by traversing the nodes
of the syntax tree;
interpret a portion of the normalized script;
generate a trace based on information obtained by interpreting the portion of the
normalized script; and
generate the normalized text based on the trace.
4. The device of claim 3, where the one or more processors, when generating the normalized
text based on the trace, are further configured to:
interpret a function called by the script during an interpretation;
determine an order and a quantity associated with the function called;
determine one or more parameters associated with the functions called; and
generate the normalized text based on the function, the order, the quantity, and the
one or more parameters.
5. The device of claim 3 or 4, where the one or more processors, when generating the
normalized text based on the trace, are further configured to:
identify, from the normalized script, a reducible expression;
determine, based on the trace, a reduced expression corresponding to the reducible
expression; and
replace the reducible expression with the reduced expression,
the normalized text being the normalized script in which the reducible expression
has been replaced with the reduced expression.
6. The device of claim 5, where the reducible expression is an encoded expression, and
the one or more processors, when determining the reduced expression, are further
configured to:
decode the encoded expression to generate a decoded expression,
the reduced expression being the decoded expression.
7. The device of any preceding claim, where the one or more processors further receive
the data from a sending end-point and, when processing the data based on determining
whether the call list matches the regular expression signature, are further configured to:
identify a receiving end-point associated with the data;
send the data to the receiving end-point based on determining that the call list does
not match the regular expression signature; and
drop the data, without sending the data to the receiving end-point, based on determining
that the call list matches the regular expression signature.
8. A method comprising:
receiving, by a device, data from an end-point device (405);
extracting, by the device, scripted code from the data (410);
generating, by the device, a normalized script corresponding to the scripted code
and having whitespace and identifiers (420),
the whitespace of the normalized script being formatted, by the device, according
to a normalizing scheme, and
the identifiers of the normalized script being renamed, by the device, according to
the normalizing scheme;
generating, by the device and based on tracing an interpretation of the normalized
script, a call list that includes a list of calls to functions which interact with
the object model, thereby allowing the device to characterize the script in terms of
the script's interaction with its environment (440),
the functions being defined outside of the scripted code;
determining, by the device, whether the call list matches a signature (445); and
selectively sending, by the device and based on determining whether the call list
matches the signature, the data to another end-point device associated with the data,
the data being sent to the other end-point device based on determining that the call
list does not match the signature, and
the data not being sent to the other end-point device based on determining that the
call list matches the signature.
9. The method of claim 8, where generating the normalized script further comprises:
parsing the scripted code to generate nodes associated with the identifiers;
populating an abstract syntax tree with the nodes;
renaming the identifiers based on the normalizing scheme and based on node positions
associated with the identifiers;
converting the nodes of the abstract syntax tree into scripted expressions; and
generating the normalized script by assembling the scripted expressions according
to the normalizing scheme.
10. The method of claim 8 or 9, where generating the normalized script further comprises:
determining an environment associated with interpreting the scripted code;
emulating the environment;
interpreting a portion of the scripted code in the emulated environment;
observing an interaction between objects of the interpreted portion of the scripted
code and between the portion of the scripted code and the emulated environment; and
modifying the portion of the scripted code based on the observed interaction,
the normalized script being based on the modified portion of the scripted code.
11. The method of claim 10, where modifying the portion of the scripted code further comprises:
concatenating one or more split strings identified within the portion of the scripted
code.
12. The method of claim 10 or 11, where modifying the portion of the scripted code further
comprises:
repeatedly modifying the portion of the scripted code.
13. The method of claim 10, 11 or 12, where modifying the portion of the scripted code
further comprises:
decoding the portion of the scripted code.
14. A computer-readable medium storing instructions that, when executed by a device, cause
the device to perform the method of claims 8-13.
1. A device, comprising:
one or more processors to:
receive data (405);
identify a context associated with the data;
identify a script, within the data, associated with the context (410);
parse the script to identify tokens;
form nodes based on the tokens;
assemble a syntax tree using the nodes (415);
rename one or more identifiers associated with the nodes;
generate a normalized text, associated with the script, based on the syntax tree after
renaming the one or more identifiers (420);
generate, based on tracing an interpretation of the normalized script, a call list
that includes a list of calls to functions that interact with the
object model, thereby allowing the device to characterize the script in terms
of the interaction of the script with its environment (440),
the functions being defined outside of the script;
determine whether the call list matches a regular expression signature;
and
process the data based on determining whether the call list matches
the regular expression signature (445),
the data being processed by a first process when the call list matches
the regular expression signature (475),
the data being processed by a second process when the call list does not
match the regular expression signature (450),
the first process being different from the second process.
2. The device of claim 1, where the one or more processors,
when generating the normalized text based on the syntax tree, are
further to:
generate a normalized script, corresponding to the script, by traversing the nodes of
the syntax tree,
the normalized text being the normalized script.
3. The device of claim 1 or 2, where the one or more processors,
when generating the normalized text based on the syntax tree, are
further to:
generate a normalized script, corresponding to the script, by traversing the nodes of
the syntax tree;
interpret a portion of the normalized script;
generate a trace based on information obtained by interpreting the portion
of the normalized script; and
generate the normalized text based on the trace.
4. The device of claim 3, where the one or more processors,
when generating the normalized text based on the trace, are further
to:
identify a function called by the script during an interpretation;
determine an order and a quantity associated with the called function;
determine one or more parameters associated with the called function; and
generate the normalized text based on the function, the order, the quantity,
and the one or more parameters.
5. The device of claim 3 or 4, where the one or more processors,
when generating the normalized text based on the trace, are further
to:
identify, from the normalized script, a reducible expression;
determine, based on the trace, a reduced expression corresponding to the reducible
expression; and
substitute the reduced expression for the reducible expression,
the normalized text being the normalized script in which the reduced expression has been
substituted for the reducible expression.
6. The device of claim 5, where the reducible expression is an encoded
expression, and the one or more processors, when determining the reduced
expression, are further to:
decode the encoded expression to generate a decoded expression,
the reduced expression being the decoded expression.
7. The device of any one of the preceding claims, where the
one or more processors are further to receive the data from
a sending end-point and, when processing the data based on
determining whether the call list matches the regular expression
signature, are further to:
identify a receiving end-point associated with the data;
send the data to the receiving end-point based on determining
that the call list does not match the regular expression signature; and
drop the data, without sending the data to the receiving end-point, based on
determining that the call list matches the regular expression
signature.
8. A method, comprising:
receiving, by a device, data from an end-point device
(405);
extracting, by the device, scripted code from the data (410);
generating, by the device, a normalized script corresponding to the scripted code and having
whitespace and identifiers (420),
the whitespace of the normalized script being formatted, by the device, according to a
normalizing scheme, and
the identifiers of the normalized script being renamed, by the device, according to the
normalizing scheme;
generating, by the device and based on tracing an interpretation of the normalized
script, a call list that includes a list of calls to functions that
interact with the object model, thereby allowing the device to characterize
the script in terms of the interaction of the script with its environment (440),
the functions being defined outside of the scripted code;
determining, by the device, whether the call list matches a signature (445);
and
selectively sending, by the device, the data to another end-point
device, associated with the data, based on determining whether the call list
matches the signature,
the data being sent to the other end-point device based on
determining that the call list does not match the signature, and
the data not being sent to the other end-point device based on
determining that the call list matches the signature.
9. The method of claim 8, where generating the normalized
script further comprises:
parsing the scripted code to generate nodes associated with the identifiers;
populating an abstract syntax tree with the nodes;
renaming the identifiers based on the normalizing scheme and based on
node positions associated with the identifiers;
converting the nodes of the abstract syntax tree into scripted expressions; and
generating the normalized script by assembling the scripted expressions according to the
normalizing scheme.
10. The method of claim 8 or 9, where generating the
normalized script further comprises:
determining an environment associated with interpreting the scripted code;
emulating the environment;
interpreting a portion of the scripted code in the emulated environment;
observing an interaction between objects of the interpreted portion of the scripted code
and between the portion of the scripted code and the emulated environment; and
modifying the portion of the scripted code based on the observed interaction,
the normalized script being based on the modified portion of the scripted code.
11. The method of claim 10, where modifying the portion
of the scripted code further comprises:
concatenating one or more split strings identified within the portion
of the scripted code.
12. The method of claim 10 or 11, where modifying
the portion of the scripted code further comprises:
iteratively modifying the portion of the scripted code.
13. The method of claim 10, 11 or 12, where modifying
the portion of the scripted code further comprises:
decoding the portion of the scripted code.
14. A computer-readable medium storing instructions that, when executed
by a device, cause the device to perform the method of any one of claims
8-13.
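By way of illustration only, the sequence recited in claims 1 and 8 (renaming identifiers on a syntax tree, regenerating normalized text, tracing calls to externally defined functions, and matching the resulting call list against a regular expression signature) may be sketched as follows. This is a sketch under stated assumptions, not the claimed implementation: Python source stands in for the scripted code, Python's `ast` module (3.9 or later for `ast.unparse`) stands in for the parser, and `write` and `fetch` are hypothetical stand-ins for functions of the object model. The names `normalize`, `trace_calls`, and `process` are likewise hypothetical, and `exec` with empty builtins is not a security sandbox.

```python
import ast
import re


class _Renamer(ast.NodeTransformer):
    """Rename script-defined names to v0, v1, ... in order of first appearance."""

    def __init__(self, external):
        self.external = external  # names defined outside of the script
        self.mapping = {}

    def visit_Name(self, node):
        if node.id not in self.external:
            node.id = self.mapping.setdefault(node.id, f"v{len(self.mapping)}")
        return node


def normalize(source, external):
    """Parse the script, rename identifiers, and regenerate normalized text."""
    tree = _Renamer(set(external)).visit(ast.parse(source))
    return ast.unparse(tree)  # unparse also yields uniform whitespace


def trace_calls(normalized, external):
    """Interpret the script in an emulated environment and record external calls."""
    calls = []

    def recorder(name):
        def fn(*args):
            calls.append(f"{name}({', '.join(map(repr, args))})")
        return fn

    env = {"__builtins__": {}}  # assumption: a toy environment, not a sandbox
    env.update({name: recorder(name) for name in external})
    exec(normalized, env)
    return calls


def process(data, signature, external=("write", "fetch")):
    """Return 'drop' when the call list matches the signature, else 'forward'."""
    normalized = normalize(data, external)
    call_list = "\n".join(trace_calls(normalized, external))
    return "drop" if signature.search(call_list) else "forward"


# Hypothetical signature: a write call followed by a fetch call.
SIGNATURE = re.compile(r"write\(.*\)\nfetch\(")
SAMPLE = 'payload = "ev" + "il"\nwrite(payload)\nfetch("x/" + payload)\n'
```

Because the signature is matched against the call list rather than the source text, two scripts that differ only in variable names or string splitting produce the same call list and receive the same treatment.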