About this sample
Words: 1185 |
Pages: 3|
6 min read
Published: Apr 11, 2019
The paper “Crime Data Mining: An Overview and Case Studies” by Chen et al. shares the results of several small studies that explore applying data mining to fighting crime. Law enforcement seems an obvious area for data mining to assist, especially since data mining holds “the promise of making it easy, convenient, and practical to explore very large databases”. The paper cites increased concern about national security after the 9/11 terrorist attacks, as well as “information overload”, as drivers for the project and its associated studies. Referring to both problems, Chen et al. note that data mining for “law enforcement and intelligence analysis holds the promise of alleviating such problems.”
Chen et al. note that “It is useful to review crime data mining in two dimensions: crime types and security concerns and crime data mining approaches and techniques.” This framing is useful to anyone interested in crime data mining, because the case studies suggest that different types of criminal activity are better served by different techniques. Chen et al. describe multiple techniques that may be used in crime data mining:
Entity extraction has been used to automatically identify person, address, vehicle, narcotic drug, and personal properties from police narrative reports (Chau et al., 2002). Clustering techniques such as “concept space” have been used to automatically associate different objects (such as persons, organizations, vehicles) in crime records (Hauck et al., 2002). Deviation detection has been applied in fraud detection, network intrusion detection, and other crime analyses that involve tracing abnormal activities. Classification has been used to detect email spamming and find authors who send out unsolicited emails (de Vel et al., 2001). String comparator has been used to detect deceptive information in criminal records (Wang et al., 2002). Social network analysis has been used to analyze criminals’ roles and associations among entities in a criminal network.
The paper presents four case studies, describing how each was performed and what it found.
The first case study proposes a neural-network-based approach to extracting entities from police reports, built from three parts. The first part is “Noun Phrasing”, which “extracts noun phrases as named entities from documents based on syntactical analysis”. The second part is “Finite State Machine and Lexical Lookup”, which uses a finite state machine to check the words in the phrase, and in the phrases immediately before and after it, for matches against reference lexicons. The third part is a neural network using “feedforward/backpropagation” to predict the most likely type of entity (e.g. name, address, etc.). Chen et al. found that their technique “achieved encouraging precision and recall rates for person names and narcotic drugs (74 – 85%), but did not perform as well for addresses and personal properties (47 – 60%) (Chau et al., 2002).”
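To make the lexical lookup step more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the lexicon entries, the feature names, and the function `context_features` are all hypothetical, and the noun phrasing and neural network parts are left out. It only illustrates how a candidate phrase and its neighboring words might be turned into binary match features that a classifier could later consume.

```python
# Hypothetical lexicons; the paper's actual reference lists are not given here.
LEXICON = {
    "person_prefix": {"mr", "mrs", "ms", "officer", "suspect"},
    "address_suffix": {"st", "street", "ave", "avenue", "rd", "road"},
    "narcotic": {"cocaine", "heroin", "marijuana"},
}


def normalize(token: str) -> str:
    """Lowercase a token and drop trailing punctuation such as 'Mr.' -> 'mr'."""
    return token.lower().rstrip(".,;:")


def context_features(tokens: list[str], start: int, end: int) -> dict[str, bool]:
    """Look up the candidate phrase tokens[start:end] and the word before it.

    Returns binary features indicating which lexicons matched.
    """
    prev_word = normalize(tokens[start - 1]) if start > 0 else ""
    last_word = normalize(tokens[end - 1])
    phrase_words = {normalize(t) for t in tokens[start:end]}
    return {
        "preceded_by_person_prefix": prev_word in LEXICON["person_prefix"],
        "ends_with_address_suffix": last_word in LEXICON["address_suffix"],
        "contains_narcotic": bool(phrase_words & LEXICON["narcotic"]),
    }


tokens = "Officer stopped Mr. John Smith near 4th Ave".split()
name_features = context_features(tokens, 3, 5)     # candidate phrase "John Smith"
address_features = context_features(tokens, 6, 8)  # candidate phrase "4th Ave"
```

In this sketch the candidate spans are supplied by hand; in the approach described above they would come from the noun phrasing step.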
The next approach was designed to detect deceptive identity data that criminals give to law enforcement. Using a Tucson Police Department database, the Chen et al. research team built a taxonomy of deceptive identity information “that consisted of name deceptions, address deceptions, date-of-birth deceptions, and identity number deceptions.” The taxonomy revealed that criminals typically altered their real identity information through minor spelling variations and/or out-of-sequence digits. To identify this fraud, the team developed an algorithm that compares corresponding fields between records “by calculating the Euclidean Distance of disagreement measures over all attribute fields.” The resulting distance is then compared against a predetermined threshold to flag deceptive records. Using a sample from the Tucson Police Department, Chen et al. showed that their algorithm was 94% accurate in detecting deceptive identity information.
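The record comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the per-field disagreement measure is assumed to be a length-normalized edit distance, the combined distance is assumed to be divided by the number of fields, and the field names and the 0.3 threshold are hypothetical.

```python
import math

# Hypothetical field names, following the four deception types in the taxonomy.
FIELDS = ("name", "address", "dob", "id_number")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def disagreement(a: str, b: str) -> float:
    """Assumed disagreement measure: edit distance scaled into [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def record_distance(r1: dict, r2: dict) -> float:
    """Euclidean distance over per-field disagreements, normalized to [0, 1]."""
    total = sum(disagreement(r1[f], r2[f]) ** 2 for f in FIELDS)
    return math.sqrt(total / len(FIELDS))


def is_possible_match(r1: dict, r2: dict, threshold: float = 0.3) -> bool:
    """Flag two records as likely the same person if they are close enough."""
    return record_distance(r1, r2) <= threshold


real = {"name": "John Smith", "address": "12 Oak St", "dob": "19700102", "id_number": "123456789"}
alias = {"name": "Jon Smith", "address": "12 Oak St", "dob": "19700120", "id_number": "123456789"}
print(is_possible_match(real, alias))
```

The example pair differs by one dropped letter and two swapped digits, the kind of minor variation the taxonomy describes. In practice the threshold would have to be tuned on labeled records.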
The third approach was an effort to automatically identify the authors of messages posted online. The authors noted that the anonymous nature of online activity makes cybercrime very difficult to investigate, so a tool to assist would be useful. Chen et al. developed a framework consisting of “three types of message features, including style markers, structural features, and content-specific features.” The framework was then tested on experimental data sets of email and online messages. During testing, three algorithms were deployed, including “decision trees, backpropagation neural networks, and Support Vector Machines”, in an attempt to determine the authorship of online material. They were able to predict authors with accuracy ranging from 70% to 97%, depending on the type of online message. Chen et al. found that the Support Vector Machine algorithm performed best in their analysis.
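As a rough illustration of the three feature types, the sketch below turns a message into a small feature dictionary. The specific features, the keyword list, and the function name `extract_features` are assumptions made for this example; the paper's actual feature set is not reproduced here. The resulting vectors would then be passed to a classifier such as a Support Vector Machine, which is omitted.

```python
import re

# Hypothetical content-specific keywords; a real list would depend on the domain.
CONTENT_KEYWORDS = ("sale", "price", "offer")


def extract_features(message: str) -> dict[str, float]:
    """Compute a few illustrative authorship features for one message."""
    words = re.findall(r"[A-Za-z']+", message.lower())
    lines = [ln for ln in message.splitlines() if ln.strip()]
    n_words = len(words)
    greeting = bool(lines) and re.match(r"\s*(hi|hello|dear)\b", lines[0], re.IGNORECASE)

    features = {
        # Style markers: how the author writes regardless of topic.
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "vocabulary_richness": len(set(words)) / n_words if n_words else 0.0,
        # Structural features: how the message is laid out.
        "line_count": float(len(lines)),
        "has_greeting": 1.0 if greeting else 0.0,
    }
    # Content-specific features: counts of topic keywords.
    for keyword in CONTENT_KEYWORDS:
        features[f"kw_{keyword}"] = float(words.count(keyword))
    return features


sample = extract_features("Hello all\nBest price offer offer")
```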
The fourth topic was Criminal Network Analysis. It rests on the premise that organized criminal groups form networks to carry out their illegal activities, so analyzing those networks may reveal structural relationships and/or hierarchies. To decipher the underlying structure of a network, Chen et al. used Social Network Analysis, broken into four parts. The first part was Network Extraction, which used existing Tucson Police Department records to form networks “because criminals who committed crimes together usually were related.” [1] The next part was Subgroup Detection, which was designed to detect hierarchical subgroups based on the strength of the identified relationships. The third part was Interaction Pattern Discovery, which was used to “reveal patterns of between-group interaction.” [1] The final part was Central Member Identification, which identified central members of the criminal organization by computing measures over the relationships found in the earlier steps. The authors also show a figure depicting a network of 60 criminals, which mostly appears as a large web. A second figure shows the structure derived from that network through the above steps, which has a strong chain-like form.
Chen et al. close with a brief conclusion noting their belief that crime data mining has a promising future and that many other applications of data mining could be explored further.
Overall, I felt that the paper was very informative and promising for the potential use of data mining to fight crime. My original interest in this topic was the potential to detect previously unknown criminal activity, such as various types of fraud, from large data sets. This paper went further, showing promising results from several different methods that may be deployed for different purposes and different types of crime. The main thing I learned from the paper is that different algorithms work better for different purposes and/or different data types, so different data mining tasks will require different approaches. Knowledge of data mining algorithms, tools, and the data formats they perform best with would be a great asset for any data mining task. Outcomes would also improve if data scientists worked directly with their customers and/or those entering the data. For example, if this team were to work with the Tucson Police Department to improve how officers file police reports, it might improve the results or efficiency of the data mining methods.
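The network extraction and central member steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: incidents are assumed to be plain lists of person identifiers, link strength is assumed to be the number of shared incidents, and degree centrality is used as a stand-in for whatever measures the authors computed. Subgroup detection and interaction pattern discovery are not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_network(incidents: list[list[str]]) -> dict[tuple[str, str], int]:
    """Link every pair of people who appear in the same incident record.

    The weight of a link is the number of incidents the pair shares.
    """
    weights: dict[tuple[str, str], int] = defaultdict(int)
    for people in incidents:
        for a, b in combinations(sorted(set(people)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def degree_centrality(weights: dict[tuple[str, str], int]) -> dict[str, float]:
    """Fraction of the other network members each person is directly linked to."""
    neighbors: dict[str, set[str]] = defaultdict(set)
    for a, b in weights:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {person: 0.0 for person in neighbors}
    return {person: len(linked) / (n - 1) for person, linked in neighbors.items()}


# Hypothetical incident records: each inner list names the people involved.
incidents = [["A", "B"], ["A", "C"], ["A", "B", "D"]]
network = build_network(incidents)
centrality = degree_centrality(network)
most_central = max(centrality, key=centrality.get)
```

Here person A is linked to everyone else and so comes out as the most central member, while the A-B link carries the highest weight because the pair appears together twice.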