TECHNICAL FIELD
[0001] This disclosure relates to fraud detection and more specifically to systems and methods
for e-commerce fraud detection.
BACKGROUND OF THE INVENTION
[0002] E-commerce systems exist where members of the general public, using an Internet-accessible
website, can obtain sensitive information pertaining to individuals. Such information,
by way of example, takes the form of credit histories and other credit sensitive data.
These types of websites are prone to users trying to obtain (by fraudulent means)
private information about others. Often, such attempts are made by imposters who have
some, but not all, of the identification needed to identify a target. These imposters
are trying to steal the target's identity.
[0003] In a typical scenario, the fraudster has obtained some piece of the target's personal
information. Typically, this would be the target's name and perhaps his/her address.
The fraudster then obtains a (typically stolen) credit card belonging to someone other
than the target. The object then for the fraudster is to steal the full identity of
the target. To do this the fraudster will make use of a website that provides access
to a full range of credit history data pertaining to individuals. The fraudster will
issue a query in the form of a credit report request.
[0004] Using this scenario, the fraudster creates an account on the website and then attempts
to purchase a credit report belonging to the target using the stolen credit card number.
In this scenario the fraudster is trying to pass himself or herself off as the target. In
order to obtain the report, the fraudster must go through an identity authentication
process administered by one of the credit bureaus. In this process the fraudster engages
in a computer-generated interview where a small number of questions are posed about
some of the items that the real target would know about the credit report. Since the
fraudster usually does not yet have access to sufficient information about the target
and past credit transactions, the fraudster often fails the interview. Fraudsters,
being what they are, do not give up at this point.
[0005] The foiled fraudster then creates another account and tries again. Often the fraudster
will use similar (but not identical) information to create each new account. This
similar information can be, for example, password, security answer, e-mail address,
credit card number, and the like. Once in a while, the imposter will succeed and obtain
a target's credit report containing sensitive data that then facilitates the imposter's
desire to trade off of the credit of the target.
[0006] The occurrence of clusters of many accounts that are similar enough to have possibly
been created by the same individual is a strong indicator of potential fraud. Currently,
trying to identify collections of similar accounts is a laborious and time-consuming
process which involves repeatedly querying the database for information and patterns.
BRIEF SUMMARY OF THE INVENTION
[0007] In the foregoing example, one wishes to identify clusters of entities (accounts)
that are similar in nature. The presence of tightly connected clusters is indicative
of fraud. While the example here (and subsequently in this document) is oriented around
the clustering of accounts in an eCommerce database, the clusters could just as easily
be collections of similar debit card transactions, similar insurance claims, similar
credit card transactions, similar credit card applications, similar student loan applications,
etc., or any other entities where the occurrence of tight clusters of similar entities
is indicative of fraud. Fraud detection is facilitated by using matching rules to
uncover clusters of entities, by then generating cluster membership rules and converting
those rules to database queries. The cluster membership rules are based upon an accumulation
of links of various types and strengths between entities. In one embodiment, the entities
are website accounts, clusters are identified, and the system then constructs cluster
membership rules for identifying subsequent accounts that match the attributes of
those clusters. The cluster membership rules are designed to define the parameters
of the identified clusters. When the rules are deployed in a transaction blocking
system, for example, when a rule that describes an identified cluster is triggered,
the transaction blocking system blocks the transaction with respect to new users who
enter the website.
[0008] The foregoing has outlined rather broadly the features and technical advantages of
the present invention in order that the detailed description of the invention that
follows may be better understood. Additional features and advantages of the invention
will be described hereinafter which form the subject of the claims of the invention.
It should be appreciated by those skilled in the art that the conception and specific
embodiment disclosed may be readily utilized as a basis for modifying or designing
other structures for carrying out the same purposes of the present invention. It should
also be realized by those skilled in the art that such equivalent constructions do
not depart from the spirit and scope of the invention as set forth in the appended
claims. The novel features which are believed to be characteristic of the invention,
both as to its organization and method of operation, together with further objects
and advantages will be better understood from the following description when considered
in connection with the accompanying figures. It is to be expressly understood, however,
that each of the figures is provided for the purpose of illustration and description
only and is not intended as a definition of the limits of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] For a more complete understanding of the present invention, reference is now made
to the following descriptions taken in conjunction with the accompanying drawing,
in which:
[0010] FIGURE 1 shows one embodiment of a system for establishing rules for the detection
of possible fraudulent transactions in accordance with concepts of this invention;
[0011] FIGURES 2 through 12 show typical screen shots as a user works through the various
aspects of the invention;
[0012] FIGURE 13 shows one embodiment of the operation of a pattern matcher generation system;
and
[0013] FIGURE 14 shows one embodiment of the use of a fraud rule to block, in real-time,
fraudulent activity with respect to an imposter attempting to obtain private data
belonging to a target, from a database of such information.
DETAILED DESCRIPTION OF THE INVENTION
[0014] Turning now to FIGURE 1, there is shown one embodiment 10 for practicing the invention.
In operation, the user (system administrator) formulates and issues an SQL query against
database 108 which is the database that stores entities of interest. In this particular
embodiment, the entities correspond to customer accounts on a website, but the database
could just as easily store entities corresponding to insurance claims, debit card
transactions and the like. In this context, a query would, for example, allow the
user to "select all of the entities created within the past 30 days, and extract all
of the following fields, A through G, but exclude fields C and F." Basically, the
query allows the user to extract the data for all of the entities created within the
past 30 days, but to exclude some of this information because it is not important
at that point in time.
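By way of illustration only, the query of paragraph [0014] might be sketched as follows, here issued from Python against an in-memory SQLite database. The table name `accounts`, the column names `a` through `g`, the `created` column and its ISO date format are all assumptions made for the sketch and are not taken from database 108.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema: fields a..g stand in for real account attributes.
ALL_FIELDS = ["a", "b", "c", "d", "e", "f", "g"]
EXCLUDED = {"c", "f"}  # fields the user chooses to leave out for now


def build_query(fields, excluded, days=30):
    """Build a SELECT for entities created within the past `days` days."""
    wanted = [f for f in fields if f not in excluded]
    sql = f"SELECT id, {', '.join(wanted)} FROM accounts WHERE created >= ?"
    cutoff = (date.today() - timedelta(days=days)).isoformat()
    return sql, (cutoff,)


def demo():
    """Load two sample rows and return those the query selects."""
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f"{f} TEXT" for f in ALL_FIELDS)
    conn.execute(
        f"CREATE TABLE accounts (id INTEGER PRIMARY KEY, created TEXT, {cols})"
    )
    recent = (date.today() - timedelta(days=5)).isoformat()
    old = (date.today() - timedelta(days=90)).isoformat()
    conn.execute(
        "INSERT INTO accounts VALUES (1, ?, 'a1','b1','c1','d1','e1','f1','g1')",
        (recent,),
    )
    conn.execute(
        "INSERT INTO accounts VALUES (2, ?, 'a2','b2','c2','d2','e2','f2','g2')",
        (old,),
    )
    sql, params = build_query(ALL_FIELDS, EXCLUDED)
    return conn.execute(sql, params).fetchall()
```

Only the account created within the window is returned, and fields C and F are absent from the result.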
[0015] The results of the query, namely the entities matching the SQL query, are then loaded
into pattern matcher 101. The pattern matcher takes as its input previously established
pattern matching rules. For example, a very simple pattern matching rule would say
"in order for two credit card numbers to match, they must be identical." Another rule
might say "in order for two e-mail addresses to match at least four letters or numbers
of the information before the @ must match." A more sophisticated rule might reflect
that "for two passwords to match, they must have a pair of identical substrings."
That means that if you had one password that said "dogdog," and another password that
said "catcat," those two passwords would match even though they are composed of different
letters.
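The three example rules of paragraph [0015] could be sketched as follows. This is one possible reading, not a definitive implementation: the e-mail rule is interpreted here as "the parts before the @ share a run of at least four letters or numbers," and the password rule as "each password contains an adjacent repeated substring of at least two characters." The function names and the two-character minimum are assumptions.

```python
import re


def cards_match(a: str, b: str) -> bool:
    """Credit card numbers match only when identical."""
    return a == b


def emails_match(a: str, b: str, min_len: int = 4) -> bool:
    """True when the local parts share a run of min_len letters or digits."""
    la = a.split("@", 1)[0].lower()
    lb = b.split("@", 1)[0].lower()
    # Every alphanumeric substring of length min_len in the first local part.
    grams = {la[i:i + min_len] for i in range(len(la) - min_len + 1)}
    grams = {g for g in grams if g.isalnum()}
    return any(lb[i:i + min_len] in grams for i in range(len(lb) - min_len + 1))


def has_repeated_pair(password: str) -> bool:
    """True when the password contains a substring immediately repeated."""
    return re.search(r"(.{2,})\1", password) is not None


def passwords_match(a: str, b: str) -> bool:
    """Two passwords match when both exhibit the repeated-substring shape."""
    return has_repeated_pair(a) and has_repeated_pair(b)
```

Under this reading, "dogdog" and "catcat" match because both have the same shape, even though they share no letters.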
[0016] One can imagine many types of sophisticated matches, such as, for example, two passwords
can match if they start with the first initial of the account holder's first name,
followed by a two digit number, followed by the account holder's last name. Thus,
"J12smith" would match with "d15jones." A pattern match generator, such as will be
discussed with respect to FIGURE 13, can be used when a user identifies a hitherto
unknown pattern and wishes to construct a matcher able to match this new pattern.
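A minimal sketch of the matcher described in paragraph [0016] is given below, assuming each account record exposes the holder's first and last names. The field names `first_name`, `last_name` and `password`, and the choice to ignore letter case, are assumptions for illustration.

```python
import re


def fits_name_pattern(password: str, first_name: str, last_name: str) -> bool:
    """True when password is: first initial, two digits, then the last name."""
    if not first_name or not last_name:
        return False
    pattern = re.escape(first_name[0]) + r"\d{2}" + re.escape(last_name)
    return re.fullmatch(pattern, password, flags=re.IGNORECASE) is not None


def passwords_share_pattern(acct_a: dict, acct_b: dict) -> bool:
    """Two accounts match when each password fits the pattern for its own holder."""
    return all(
        fits_name_pattern(a["password"], a["first_name"], a["last_name"])
        for a in (acct_a, acct_b)
    )
```

For example, an account for a hypothetical John Smith with password "J12smith" would match an account for a hypothetical Dave Jones with password "d15jones", since each password follows the same construction relative to its own account holder.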
[0017] Matching rules, such as discussed above, are utilized by the pattern matcher and
all of the entities that match these rules are collected and linked together. For
example, all of the accounts found by a rule that defines matching e-mail addresses
could be linked. Also, all the accounts that are found by a rule that defines matching
passwords could be linked, as could all the accounts that are found based on a rule
that specifies matching credit card numbers. All of the accounts that are linked to
other accounts on the basis of the matching rules are then written to the link dataset.
The link dataset basically lists those accounts that are connected to other accounts
by which types of links and at what strength.
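One way the link dataset of paragraph [0017] might be assembled is sketched below. The tuple layout `(account, account, link type, strength)`, the convention that a matcher returns a strength of zero for "no match," and the two sample matchers are assumptions; the disclosure does not prescribe a storage format.

```python
from itertools import combinations


def exact(a: str, b: str) -> float:
    """Strength 1.0 for identical values, otherwise no link."""
    return 1.0 if a == b else 0.0


def prefix_overlap(a: str, b: str, min_len: int = 4) -> float:
    """Strength is the shared-prefix length as a fraction of the longer value."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b)) if n >= min_len else 0.0


def build_links(accounts, matchers):
    """Compare every pair of accounts on every link type and keep the matches."""
    links = []
    for a, b in combinations(accounts, 2):
        for link_type, matcher in matchers.items():
            strength = matcher(a[link_type], b[link_type])
            if strength > 0:
                links.append((a["id"], b["id"], link_type, strength))
    return links
```

The pairwise comparison is quadratic in the number of accounts, which is acceptable for a sketch but would likely need indexing or blocking for a large query result.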
[0018] Links have certain types. In the customer account example, some of the link types
are: credit card number, password, e-mail address, etc. In addition, each link type
has a numerical strength, indicating the degree to which a pattern associated with
the particular link type is matched. Each link type corresponds to a "layer" which
is simply the way by which connected accounts for a particular link type are represented
to the user. The link dataset is loaded into layer builder 103 which creates an internal
data structure representing the way that those accounts are connected on each layer.
Again, a layer means a type of link. For example, an e-mail address is a layer, a
security answer is a layer, a password is a layer, and credit card number is a layer.
Layer builder 103 builds the layers and describes the way in which the accounts are
connected within each layer.
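The internal data structure created by layer builder 103 is not detailed in the text. A plausible sketch, assuming the link dataset is a list of `(account, account, link type, strength)` tuples, is a per-layer adjacency map:

```python
from collections import defaultdict


def build_layers(links):
    """Group links by type; each layer maps an account to its linked accounts."""
    layers = defaultdict(lambda: defaultdict(dict))
    for a, b, link_type, strength in links:
        # Links are undirected, so record the connection in both directions.
        layers[link_type][a][b] = strength
        layers[link_type][b][a] = strength
    return layers
```

With this layout, `layers["email"][1]` would give every account linked to account 1 on the e-mail layer, together with the strength of each link.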
[0019] The layer information is then run through graph renderer 104 which generates a visual
display so that the user, as will be discussed, can visualize the various links. Different
colors assist in this visualization. The links are also shown with different width
connectors representing the relative strength of the association. The user then can
expand out on a layer-by-layer basis as will be discussed.
[0020] At a certain point, the user begins to identify what might be a cluster and then
the user can add or remove accounts from the cluster as desired using cluster editor
105. When the user is satisfied with a cluster, the cluster can be automatically characterized
by cluster explainer 106, with that characterization being represented by a decision
tree. That decision tree can then be transformed to a corresponding SQL expression
which can be applied to the database for later retrieval of additional matching accounts.
[0021] Cluster explainer 106 is used to automatically induce a set of cluster membership
rules that identify the parameters that caused an account to be part of the identified
cluster. For example, the rules might indicate that "to be a member of the cluster,
the e-mail address must follow a certain pattern and the security answer must follow
another pattern, and the account holder must be a resident in Bakersfield, and so
on and so forth." These membership rules can be modified, if desired, by the user
via rule editor 107.
[0022] The user can then transform a set of cluster membership rules into a SQL query and
apply that query against customer database 108 effectively asking "see whether any
accounts in the entire history of the database match the particular cluster membership
rule set corresponding to the current cluster." What the user is effectively saying
is "in this last month of data, a cluster of accounts has been identified that is
suspicious. The suspicious account activity is defined by a set of rules that describe
the attributes of accounts that are members of the cluster. Every account in the database
is searched (via the cluster membership rule set expressed as a SQL query) in order
to identify any other accounts that match the pattern described by the cluster membership
rules." If found, those accounts are loaded, run through the pattern matcher and then
displayed on the screen as were the previously loaded accounts. Then the user can
once again enter into the exploratory state and perhaps further refine the cluster.
This iteration can go on as long as the user desires.
[0023] Returning now to cluster editor 105, in addition to simply assigning accounts to
clusters via the use of previously defined account matching rules (as discussed above)
the user can use pattern editor 109 to create new pattern matching rule(s) based on
patterns of data that have been hitherto unseen. For example, the user may notice
a password that is characterized by a pattern of: the first letter of the account
holder's first name, followed by the number 99, followed by the last letter of the
account holder's first name, followed by 99, followed by the remainder of the account
holder's surname. The user determines that this is an "interesting" password pattern.
The user might then want to find out if there are any other accounts in the entire
database that have password patterns that match that one.
[0024] FIGURES 2 through 12 show one embodiment of typical screen shots encountered as a
user works through the various aspects of the inventive concepts as taught herein.
[0025] FIGURE 2 shows a common usage scenario which, in this view 20, is a screen shot indicating
the initiation of a charge-back analysis. A charge-back occurs when a person calls
the customer service system of the eCommerce website from which credit reports are
purchased. That person is typically directed to the system by a credit card company
when the person calls to complain that a charge on their credit card does not belong
to them. In this example, the charge is for the purchase of a credit report that the
caller did not knowingly make. This is typically (but not exclusively) how a search
for a fraudster begins. In more general terms, the search for clusters of fraudulent
activity typically begins with a "seed" entity that is somehow suspicious. Starting
with that seed entity, the user interactively follows links that connect to other
similar entities. In this particular embodiment, the seed entity is an account associated
with the suspicious usage of a credit card. But the scenario could just as easily
be one in which the seed entity is a debit card transaction or an insurance claim.
[0026] The search begins in this scenario with the user knowing the account which is associated
with the credit card transaction in question (since each account is associated with
one or more credit cards). The user also knows the true identity of the person whose
credit report was purchased since the purchased credit report information is stored
in association with the account.
[0027] In our example, the fraudulently purchased credit report belongs to a person named
Jones as shown in line 201 of screen section 21. Screen section 21 contains the true
names and credit card numbers (as well as other information) of a large number of
persons. The system user then types "jones" in jump-to field 202 which then brings
up an e-mail address 203 of, for example,
[email protected]. The user can then right-click on screen 20 to show expand-on box 204. The user then
selects "credit card" for further expansion. In this context, the process of expansion
corresponds to displaying additional accounts linked to the currently displayed account
by virtue of a credit card number that matches the credit card number for the currently
selected account according to the matching rules for credit card numbers.
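The expansion step might be sketched as follows, assuming a single layer is held as a mapping from each account to the accounts it is linked to on that layer. The function name `expand` and the data layout are illustrative assumptions.

```python
def expand(visible, layer):
    """Return accounts linked to any visible account on the given layer.

    `layer` maps an account to a dict of {linked account: strength}.
    Accounts already on display are not returned again.
    """
    found = set()
    for acct in visible:
        found.update(layer.get(acct, {}))
    return found - set(visible)
```

Calling `expand` repeatedly, one layer at a time, corresponds to the user selecting successive link types in the expand-on box.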
[0028] FIGURE 3 shows the results of the expansion. In this case, there are shown three nodes
301, 302 and 303 each of which represents an account that matches the currently selected
account based upon the matching rule for credit card numbers. Note that while "matching"
in the context of credit cards means "exact match", "matching" is generally determined
by matching rules specific to the layer (link type) being considered. They need not
be exact matches and, in fact, are often "fuzzy" matches. As shown in FIGURE 2, the
nodes are interconnected (linked) by a line which is color coded according to the
link type being matched. In addition, the thickness of the link is drawn in proportion
to the strength of the match.
[0029] The user inspects the display, looking for similarities across the three nodes being
displayed, and notices that the e-mail addresses for all of these nodes are similar.
The user brings up expand-on box 304 and checks the "e-mail" box. This instructs the
system to link to additional accounts that have email addresses that match any of
the email addresses of the three visible nodes according to the matching rules that
have been established for email addresses.
[0030] FIGURE 4 shows several nodes interconnected by different colors (shown in the drawing
as different line types), corresponding to the different match types. In particular,
we see accounts linked by credit card and email matches. The user then can inspect
the details of each account by, for example, rolling the mouse over the node corresponding
to the account. The result of placing the mouse pointer over node 404 is shown
in FIGURE 5 by box 510. This then shows the credit card holder's name, address, e-mail
address, login, password, security answer, credit card number (which is encrypted
in the drawing) and a variety of other data.
[0031] The lines of section 512 indicate how this particular account is connected to other
accounts. In this example, this node is connected to groups (unlabeled clusters),
of matching credit cards, groups of matching email addresses, groups of matching IP
addresses, etc.
[0032] The user can select all of the accounts displayed, and request that the characteristics
of those accounts alone be displayed in a table below the graph display. By looking
at this table, it can be observed that the selected accounts have similar passwords.
By right-clicking "similar password" in expand-on box 503, the user can then expand
the graph to show those accounts with similar passwords.
[0033] FIGURE 6 shows a total of 14 accounts that are connected via similar passwords, credit
card numbers and email addresses. By further investigation (via the table mechanism
described above) it can be observed that they also have similarities in terms of their
respective security answers. The user then uses expand-on box 603 to enable the display
(as shown in FIGURE 7) of accounts linked on the basis of security answers.
[0034] As shown in FIGURE 7, the interconnecting links have now expanded to a point where
it is difficult to focus on anything of value since it is all mostly hidden from view
by the clutter. However, there are a number of different links that have some things
in common. Because the links are colored, the overlapping colors intensify where many
links of the same color intersect. Thus, the links that have the most in common have
the most intense color and the links with the weakest interconnections have much less
color intensity.
[0035] Saying this another way, when the color is intense, there are a number of common
attributes, such as common passwords, common e-mail addresses, etc. Where
the color intensity is less, there are fewer common attributes. Accordingly,
it is possible to selectively remove links with less intense color from the screen
by drawing a box around the undesired (for now) links, right-clicking and responding
to a prompt to remove the links within the box.
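In the described embodiment the user removes weak links by eye. Purely as an illustrative approximation, the visual notion of "intensity" could be modeled by counting how many link types connect each pair of accounts, and flagging accounts whose best connection falls below a threshold. The threshold, the function names, and the assumption that each pair appears at most once per link type are not part of the disclosure.

```python
from collections import Counter


def pair_intensity(links):
    """Count how many link types connect each unordered pair of accounts."""
    counts = Counter()
    for a, b, _link_type in links:
        counts[frozenset((a, b))] += 1
    return counts


def weakly_linked(links, threshold):
    """Return accounts whose strongest pairing has fewer than `threshold` links."""
    best = Counter()  # highest pair intensity seen for each account
    for pair, n in pair_intensity(links).items():
        for acct in pair:
            best[acct] = max(best[acct], n)
    return {acct for acct, n in best.items() if n < threshold}
```

Accounts returned by `weakly_linked` are candidates for removal from the display; the decision remains with the user.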
[0036] FIGURE 8 is a screen of what remains after removing the loosely connected (less intense
colored) sub-clusters. This screen shows e-mail addresses for the remaining accounts
with a high number of interconnected links in the background. There are so many links
on the security answer layer that it is difficult to see any other link types.
[0037] FIGURE 9 then shows what remains after temporarily hiding the security answer connections
for these accounts (so as to allow the user to see the links that were obscured by
the preponderance of links on the security answer layer). There is presented a set
of nodes 901 that are not connected at all, or do not appear to be connected. The
set of nodes 901 are actually connected based upon the security answer, but the display
of those links has been temporarily disabled. There is another group of connected
accounts 902 that are nicely connected. By placing the cursor on each of them, the
attributes of each of those accounts can be determined.
[0038] It is then determined that every one of the accounts in list 902 has Bakersfield
as the home address. By then observing the accounts in list 901, it can be observed
that they are from cities all over the country. The only common connection is that
one account exhibits a Bakersfield address. Then, by removing all of the accounts
that do not list Bakersfield as a home address, the display can be reconfigured as
shown in FIGURE 10.
[0039] FIGURE 10 now shows all of the accounts belonging to the potentially fraudulent cluster.
By re-enabling the security answer layer, the display reveals that they are all connected.
This display is then labeled as cluster 1010. Cluster 1010 can then be expanded to
show all the interconnections.
[0040] In FIGURE 11, cluster 1010 has been expanded and given a name by cluster creator
1101. In the example shown, box 1102 is labeled "My Potentially Fraudulent Cluster."
Once created, this cluster is then run through cluster explainer 106 (FIGURE 1) which
applies a commonly used machine learning algorithm (Classification and Regression
Trees) to generate a decision tree.
[0041] FIGURE 12 shows one portion of the generated decision tree that says "if the security
answer is 'barkyt' or 'barky,' and the AVS check is failed or not performed, then
the transaction is deemed to be fraudulent." Otherwise, if the AVS check is okay or
not required, then the rest of the decision tree would indicate, "if the transaction
is in the following set of zip codes, then it is deemed to be fraudulent." At this
point, the decision tree can be translated into a simple SQL expression that can be
applied to the entire database of known accounts, in order to identify accounts that
have the same attributes as the cluster of accounts that has just now been identified
as potentially fraudulent.
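The translation from decision tree to SQL expression can be sketched as follows. The sketch covers only the translation step, not the tree induction itself. The tuple representation of the tree, the column names `security_answer`, `avs_result` and `zip_code`, and the AVS value labels are assumptions; the zip code `'00000'` is a placeholder, since the actual set of zip codes is not given in the text.

```python
def fraud_paths(node, path=()):
    """Collect the condition lists of every root-to-leaf path ending in fraud."""
    if node[0] == "leaf":
        return [path] if node[1] else []
    _, cond, yes_branch, no_branch = node
    return (
        fraud_paths(yes_branch, path + (cond,))
        + fraud_paths(no_branch, path + (f"NOT ({cond})",))
    )


def tree_to_where(node):
    """OR together the fraud paths, each path being an AND of its conditions."""
    clauses = []
    for path in fraud_paths(node):
        clauses.append("(" + (" AND ".join(path) or "1=1") + ")")
    return " OR ".join(clauses)


# Hypothetical encoding of the portion of the tree described for FIGURE 12.
TREE = (
    "split", "security_answer IN ('barkyt', 'barky')",
    (
        "split", "avs_result IN ('failed', 'not_performed')",
        ("leaf", True),
        ("split", "zip_code IN ('00000')", ("leaf", True), ("leaf", False)),
    ),
    ("leaf", False),
)
```

The string returned by `tree_to_where(TREE)` is suitable for use as the WHERE clause of a query against the account table.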
[0042] Note that the database that the fraud rule is run against can be the same database,
for example database 108 (FIGURE 1) that was used to begin the drill-down process,
as discussed above, and/or the fraud rule can be sent to one or more databases (not
shown) remote from the originating database via communication device 110 (FIGURE 1).
This then allows for fraud detection rules to be circulated among different databases,
perhaps at different credit monitoring facilities.
[0043] FIGURE 13 shows one embodiment of a method for creating a pattern for use in pattern
matcher 101, as shown in FIGURE 1. Assume that the user, who has been studying the
screen and looking at various items such as passwords, notices a pattern. For example,
the user notices that there are several passwords that have one character from the
target's first name, then two digits, which could be two random digits, then the target's
last name. Another pattern that the user, for example, has noticed is that the password
could have one character from the target's first name followed by a specific string
of digits followed by the target's last name.
[0044] The user then brings up pattern match generator 1300 as shown in FIGURE 13 and begins
to create a pattern matcher. In this example, the user prepares an expression consisting
of two compound phrases connected by an OR condition. The user begins by using box
1301 and selecting what the first part of the pattern will be, in this case the user
selects the word "first." Then using box 1302, the user selects N (which would mean
the first N characters) and another box pops up to allow the user to select the specific
value for N. In our case, the user selects "1." The user would then go to box 1303
and select where those characters are from. In this case, the user would select "First
Name Field" and then using box 1304 would select the "followed by" notation. The user
would then press the "Next Phrase" button and then would repeat back at box 1301 to
select the word "exactly" followed by the "2" from box 1302, followed by "the integers"
from box 1303. Then the user would select "followed by" from box 1304, then press
the "Next Phrase" button again, then would repeat back at box 1301 and select the
words "all" from box 1301, and then "Last Name Field" from box 1303.
[0045] The user would then press "OR" then "(" then repeat the process described in paragraph [0044]
to prepare the second compound phrase as shown in 1312. The two compound phrases are
shown in screen 1310 as the user is creating them, for example, the phrase that was
just created is shown as field 1312. Assuming that the user wants to save the phrase,
then box 1306 is used. If the user desires to generate sample strings that match the
current expression, the user can use box 1330 which generates sample matches which
correspond to the matching rules 1311 and 1312 and the user can therefore see on the
screen if, after a number of samples have been created, the pattern matcher has been
defined properly.
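The phrase-by-phrase construction described above might be compiled into a regular expression along the following lines. The internal syntax actually shown in screen 1320 is not reproduced in the text, so the tuple encoding of phrases, the phrase kinds (including `literal`), and the digit string "1234" used in the second compound phrase are hypothetical.

```python
import re


def phrase_to_regex(phrase, account):
    """Translate one (kind, count, source) phrase into a regex fragment."""
    kind, n, source = phrase
    if kind == "first":      # first N characters of an account field
        return re.escape(account[source][:n])
    if kind == "exactly":    # exactly N characters of a character class
        if source != "digits":
            raise ValueError(f"unsupported character class: {source}")
        return rf"\d{{{n}}}"
    if kind == "all":        # the whole of an account field
        return re.escape(account[source])
    if kind == "literal":    # a fixed string, here carried in `source`
        return re.escape(source)
    raise ValueError(f"unknown phrase kind: {kind}")


def compile_matcher(alternatives, account):
    """Join each compound phrase with 'followed by'; join alternatives with OR."""
    parts = [
        "".join(phrase_to_regex(p, account) for p in compound)
        for compound in alternatives
    ]
    return re.compile("|".join(f"(?:{p})" for p in parts), re.IGNORECASE)


# Two compound phrases connected by an OR condition.
RULE = [
    [("first", 1, "first_name"), ("exactly", 2, "digits"), ("all", None, "last_name")],
    [("first", 1, "first_name"), ("literal", None, "1234"), ("all", None, "last_name")],
]
```

A matcher compiled for a given account can then be applied with `fullmatch` to that account's password.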
[0046] The user can create example matches using box 1330 and if the user desires to edit
the phrase, that can be done via screen 1320 where the syntax for controlling the
pattern matcher on the machine process is shown. If the user wants to edit the phrase,
then the user can do so at this point; or if, after editing, the user wants to check
the syntax to be sure that the syntax is still correct, then box 1322 can perform
that function. When the user is finished defining a pattern matcher, then the user
can create the pattern matcher using 1331. Sometimes the user may want to create a
phrase, name it, and then reuse the named phrase in another pattern or in another
portion of the same pattern. This action is accomplished by creating the pattern,
such as pattern 1311 and then enabling the save phrase box 1306. The save-phrase box
1306 then allows the user to name that phrase and then, if desired, to create a new
pattern matcher using that saved phrase as a building block.
[0047] FIGURE 14 shows one embodiment 1400 of the use of a fraud rule to block, in real-time,
fraudulent activity with respect to an imposter attempting to obtain credit history
data from a database of credit information. Process 1401 controls the logon access
to a credit database. This access can be, for example, so that the individual can
access his/her credit history. As is well-known, before such access will be granted
a process, such as process 1402, queries the accessing user for some combination of
attributes uniquely pertaining to that user's data file. Some of the possible attributes
are shown in process 1402, but any number and any combination can be required, and
the combination can change depending upon security levels, or depending upon previous
query answers.
[0048] Process 1403 reviews the answers, either one at a time or in bulk, and process 1404
compares the answers against one or more fraud cluster membership rules that have
been generated, as discussed above. If one or more answers, such as the answer to
the password or the answer to the e-mail address, etc., match a fraud cluster membership
rule, then process 1405 acts to take whatever action is required by the system administrator,
such as recording the machine identity of the user or blocking further access for
this user, or invoking any other action defined by the system.
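A minimal sketch of processes 1404 and 1405 follows, assuming each fraud cluster membership rule is available as a predicate over the user's answers. The rule name, the answer keys, and the "block"/"continue" return values are hypothetical; the action taken on a match is whatever the system administrator configures.

```python
def triggered_rules(answers, rules):
    """Return the names of the cluster membership rules the answers satisfy."""
    return [name for name, rule in rules.items() if rule(answers)]


def screen_answers(answers, rules):
    """Block when any fraud rule fires; otherwise let authentication proceed."""
    hits = triggered_rules(answers, rules)
    return ("block", hits) if hits else ("continue", [])


# Hypothetical rule mirroring the decision-tree portion described for FIGURE 12.
RULES = {
    "cluster_1010": lambda a: (
        a.get("security_answer") in {"barkyt", "barky"}
        and a.get("avs_result") in {"failed", "not_performed"}
    ),
}
```

In this sketch a result of "continue" does not grant access; it only indicates that no fraud rule was triggered and that the ordinary checks may go ahead.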
[0049] Process 1406, either acting concurrently with process 1404 or serial thereto, will
either grant access to the credit information if all the queries are answered correctly
or deny access in problem situations as is well-known. Note that the operation of
process 1400 can be within the same processor (not shown) that controls the operation
of the processes described for FIGURES 1 through 13 or can be in a processor remote
from the processor that generated the fraud query rule.
[0050] Although the present invention and its advantages have been described in detail,
it should be understood that various changes, substitutions and alterations can be
made herein without departing from the spirit and scope of the invention as defined
by the appended claims. Moreover, the scope of the present application is not intended
to be limited to the particular embodiments of the process, machine, manufacture,
composition of matter, means, methods and steps described in the specification. As
one of ordinary skill in the art will readily appreciate from the disclosure of the
present invention, processes, machines, manufacture, compositions of matter, means,
methods, or steps, presently existing or later to be developed that perform substantially
the same function or achieve substantially the same result as the corresponding embodiments
described herein may be utilized according to the present invention. Accordingly,
the appended claims are intended to include within their scope such processes, machines,
manufacture, compositions of matter, means, methods, or steps.
1. A method of determining fraudulent use of a database, said method comprising:
examining a database of information pertaining to entities, said database containing,
for each entity, at least one descriptive attribute, said examination calculated to
find linkages between similar attributes used for different entities in said database;
and
generating at least one rule pertaining to membership in a cluster of potentially
fraudulent entities, said generated rule based upon certain combinations of letters
and numbers constituting at least one of said attributes.
2. The method of claim 1 further comprising:
running one or more of said generated cluster membership rules against one or more
databases of entities to identify which entities in said database have a high probability
of membership in the cluster of potentially fraudulent entities and wherein said last-mentioned
databases contain, for each entity, at least one attribute.
3. The method of claim 1 further comprising:
sending one or more of said cluster membership rules to a database manager so as to
allow said database manager to run said rules against one or more databases under
control of said manager to identify which entities in said databases have a high probability
of belonging to a cluster of potentially fraudulent entities.
4. The method of claim 1 further comprising:
using at least one of said cluster membership rules in real-time to detect credit
history transactions that have a high probability of being fraudulent.
5. The method of claim 1 wherein said database comprises:
information pertaining to a credit history of individuals.
6. The method of claim 1 wherein said database comprises:
information pertaining to insurance claims.
7. The method of claim 1 wherein said database comprises:
information pertaining to debit card transactions.
8. The method of claim 1 wherein said at least one descriptive attribute is selected
from the list of: credit card identification, home address, phone number, password,
e-mail address, answers to security questions, or a portion of a social security number.
9. The method of claim 4 wherein said examining comprises:
selecting a starting point based on a known anomaly for a particular entity, said
anomaly arising with respect to at least one particular attribute of said entity;
searching said database for linkages to other entities in said database, said search
based on said particular attributes;
determining linkages between attributes of said particular entity and attributes of
other entities based upon said database search; and
drilling down on said displayed linkages to generate a rule pertaining to membership
in a cluster of potentially fraudulent entities.
10. A method of tracking clusters of potentially fraudulent entities, said method comprising:
establishing rules defining parameters for various match operations;
selecting a starting point based on a known anomaly for a particular entity;
searching a first database of a plurality of entities for linkages to other entities,
said database having all of said attributes for said entities, and said search based
on a selected one of said attributes;
displaying a linkage between said particular entity and other entities based upon
said database search; and
drilling down on said displayed linkage to generate a cluster membership rule pertaining
to a cluster of potentially fraudulent entities.
11. The method of claim 10 further comprising:
running one or more of said generated cluster membership rules against one or more
databases of information sets pertaining to entities to identify which entities in
said database have a high probability of belonging to a cluster of potentially fraudulent
entities.
12. The method of claim 10 further comprising:
sending one or more of said generated cluster membership rules to a second database
remote from said first database so as to allow said generated cluster membership rule
to be run with respect to said second database so as to identify which information sets
in said second database have a high probability of belonging to a cluster of potentially
fraudulent entities.
13. The method of claim 10 further comprising:
using at least one of said generated cluster membership rules in real-time to detect
first database related transactions having a high probability of being fraudulent.
14. The method of claim 10 further comprising:
using at least one of said generated cluster membership rules in real-time to detect
second database related transactions having a high probability of being fraudulent,
said second database being at a location remote from said first database.
15. The method of claim 10 wherein said starting point has at least one attribute selected
from the list of: credit card identification, home address, phone number, password,
e-mail address, answers to security questions, or a portion of a social security number.
16. A system for fraud detection, said system comprising:
a database containing, for each individual, at least one attribute;
means for examining said database to find linkages between similar attributes used
for different entities; and
means for generating at least one cluster membership rule pertaining to a cluster
of potentially fraudulent entities, said generated rule based upon certain combinations
of letters and numbers constituting at least one of said attributes.
17. The system of claim 16 wherein said linkages are based, at least in part, upon certain
information pertaining to entities.
18. The system of claim 16 further comprising:
means for running one or more of said generated rules against at least one database
containing information pertaining to entities to identify which entities in said database
have a high probability of belonging to a cluster of potentially fraudulent entities.
19. The system of claim 18 wherein said last-mentioned database contains, for each entity,
at least one attribute.
20. The system of claim 16 further comprising:
means for sending one or more of said rules to a second database remote from said
database so as to identify which entities in said second database have a high probability
of belonging to a cluster of potentially fraudulent entities.
21. The system of claim 17 further comprising:
means for using at least one of said generated rules in real-time to detect entities
that have a high probability of belonging to a cluster of potentially fraudulent entities.
22. The system of claim 17 wherein said examining means comprises:
means for selecting a starting point based on a known anomaly for a particular entity,
said anomaly arising with respect to at least one particular attribute of said entity;
means for searching said database for linkages to other entities in said database,
said search based on said particular attributes;
means for determining linkages between attributes of said particular entity and attributes
of other entities based upon said database search; and
means for drilling down on said displayed linkages to generate a cluster membership
rule pertaining to a cluster of potentially fraudulent entities.
23. A system for detecting clusters of potentially fraudulent entities, said system comprising:
a database of information pertaining to entities; said database accepting both user
generated and system generated queries;
a pattern matcher for executing rules for attribute matching operations;
a link generator for creating linkages between similar entities in said database,
said similar entities based, at least in part, on results from said pattern matcher;
a cluster editor for allowing a user to modify selected aspects of generated ones
of said links; and
a rule editor for establishing at least one cluster membership rule based, at least
in part, on information determined from said user drilling down on said links; said
rule editor producing said system generated queries.
24. The system of claim 23 further comprising:
means for communicating generated ones of said cluster membership rules to at least
one database containing information of a plurality of entities, said information comprising
a plurality of items selected from a list of attributes.
25. A method for identifying a likeness between data sets, said method comprising:
selecting a plurality of data sets, each of said plurality of data sets comprising
a plurality of characters;
creating an expression specifying a pattern of said characters, said creating comprising:
populating a plurality of data fields; and
defining a relationship between said plurality of data fields;
testing said expression to determine whether said expression has been properly defined;
and
searching said data sets for said expression.
26. The method of claim 25 wherein said populating comprises:
selecting at least one quality from a predetermined set of available qualities, said
available qualities corresponding to a descriptive attribute of an entity.
27. The method of claim 25 wherein said populating comprises:
selecting at least one quality from a predetermined set of available qualities, said
available qualities corresponding to a descriptive attribute of an entity.
28. The method of claim 25 wherein said data sets comprise a descriptive attribute relating
to an entity.
29. The method of claim 28 wherein said descriptive attribute is a first name.
30. The method of claim 28 wherein said descriptive attribute is a last name.
31. The method of claim 28 wherein said descriptive attribute is a password.
32. The method of claim 25 further comprising:
selecting at least a portion of said created expression;
saving said selected portion; and
using said saved selected portion in a subsequent search.
33. The method of claim 25 wherein said testing comprises determining whether the syntax
of said expression is correct.
34. The method of claim 25 wherein said testing comprises executing a preliminary search
and examining the returned result from said search.
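By way of illustration only, and without limiting the claims, the steps recited above (finding linkages between entities that share an attribute, expressing a cluster membership rule as a pattern of letters and numbers, testing the syntax of that expression, and searching with it) might be sketched as follows. The entity records, field names, and the functions find_linkages, compile_rule, and run_rule are hypothetical and do not appear in the disclosure; a regular expression is assumed here as one possible form of the claimed expression.

```python
import re
from collections import defaultdict

# Hypothetical entity records; every field name and value is invented for illustration.
ENTITIES = [
    {"id": 1, "email": "jsmith01@example.com", "password": "tiger77"},
    {"id": 2, "email": "jsmith02@example.com", "password": "tiger77"},
    {"id": 3, "email": "mary.jones@example.org", "password": "s3cret!"},
    {"id": 4, "email": "jsmith17@example.com", "password": "blue42"},
]


def find_linkages(entities, attribute):
    """Group entity ids by the value of one attribute; keep values shared by 2+ entities."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity[attribute]].append(entity["id"])
    return {value: ids for value, ids in groups.items() if len(ids) > 1}


def compile_rule(attribute, pattern):
    """Build a rule as (attribute, compiled regex); return None if the syntax is invalid."""
    try:
        return attribute, re.compile(pattern)
    except re.error:
        return None


def run_rule(rule, entities):
    """Return ids of entities whose attribute value matches the rule's pattern in full."""
    attribute, regex = rule
    return [e["id"] for e in entities if regex.fullmatch(e[attribute])]


# Step 1: a shared password links entities 1 and 2.
linked = find_linkages(ENTITIES, "password")

# Step 2: an analyst generalizes the linked e-mail addresses into a letter/digit pattern.
rule = compile_rule("email", r"jsmith\d{2}@example\.com")

# Step 3: running the rule also surfaces entity 4, which the password link alone missed.
flagged = run_rule(rule, ENTITIES)
```

In this sketch the syntax test corresponds loosely to claim 33, and running the rule against the sample records and inspecting the returned ids corresponds loosely to the preliminary search of claim 34. A production system would likely use fuzzy matching rather than exact attribute equality when forming linkages.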