Feature Selection Technique in The Network Traffic Dataset

Security is now a major concern in the digital world. The use of the internet, computers, mobile phones, and tablets has become ubiquitous, and cyber-attacks have grown rapidly. There are various kinds of cyber-attacks, such as spoofing, sniffing, denial-of-service, phishing, evil twins, pharming, click fraud, and malware. Malicious software is harmful to both computers and networks. Cyber-attacks have increased drastically; they compromise systems, steal valuable information, and destroy important infrastructure, producing vast losses that average $345 per incident.

The growth in internet use is not the only cause; the number of new malware samples is another source of digital threat. More than 317 million new pieces of malware were created in 2014. Conventional anti-virus and intrusion detection systems cannot detect zero-day attacks. According to the Symantec Internet Security Threat Report 2010, more than 5 million pieces of malware were circulating on the internet. As a result, security specialists are devoted to developing efficient malware detection methods. In this work we describe several feature selection techniques for detecting malware in a network traffic dataset using machine learning algorithms, because feature selection is a very important task in malware detection. Malware can be detected through static and dynamic features. Although anti-virus software is built on malware signatures, it fails when a zero-day malware attack occurs. A malware detection system captures a network traffic dataset to distinguish between malware and goodware (suspicious and normal activity).

The network traffic dataset contains many packets with a very large number of features. Some features may be very important, but others may not be relevant for making a decision. Irrelevant features increase processing time and decrease the efficiency of the malware detection system. That is why the main purpose of a feature selection technique is to reduce the dimensionality of the feature space by removing redundant and irrelevant features from the network traffic dataset.
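As a rough illustration of what removing irrelevant and redundant features can look like, the sketch below applies a simple filter: it drops features that never vary and features that are almost perfectly correlated with one already kept. The essay does not specify a particular filter, so the thresholds, the function names, and the toy flow features (`duration`, `bytes`, `bytes_kb`, `protocol_version`) are all assumptions made for this example.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if undefined)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(columns, min_variance=1e-9, max_corr=0.95):
    """Return the names of features that survive two simple filters.

    columns: dict mapping feature name -> list of values (one per flow).
    A feature is dropped if it is (nearly) constant, or if it is highly
    correlated with a feature that has already been kept.
    """
    kept = []
    for name, values in columns.items():
        if pvariance(values) <= min_variance:
            continue  # irrelevant: carries no information
        if any(abs(pearson(values, columns[k])) >= max_corr for k in kept):
            continue  # redundant: duplicates a kept feature
        kept.append(name)
    return kept


# Hypothetical toy data: four flows, four candidate features.
flows = {
    "duration": [1.0, 2.0, 3.0, 4.0],
    "bytes": [100.0, 250.0, 180.0, 400.0],
    "bytes_kb": [0.1, 0.25, 0.18, 0.4],          # same signal as "bytes"
    "protocol_version": [4.0, 4.0, 4.0, 4.0],    # constant
}
selected = select_features(flows)
print(selected)
```

With this toy data the constant column and the rescaled copy of `bytes` are discarded, which shrinks the feature space without losing information a classifier could use.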

Many approaches have been developed to cope with the proliferating number of malware samples that emerge every day. Hansen et al. introduced an approach based on a Random Forests classifier for detecting and classifying the vast amount of malware that comes from known or unknown malware families. This approach reduces the feature space significantly. The Cuckoo sandbox was also used to collect behavioral traces of the analyzed samples in order to achieve a high malware detection rate and family classification.

Tian et al. used logs of API calls to distinguish malware from cleanware by scrutinizing behavioral features. This work was proposed for both malware family classification and detection, applying pattern recognition algorithms in a virtual environment. They achieved approximately 97% accuracy using a dataset of 1,368 malware and 456 cleanware samples. Another study discussed the applicability of a sandbox environment for obtaining the run-time behavior of malware. That work differentiates malware using a heuristic method termed N-gram analysis and adopts the Information Gain feature selection technique to choose the best features for classification. The Cuckoo sandbox examines the behavior of malware running on a virtual machine. The authors found that SPegasos achieved the highest accuracy and a better detection rate across different feature lengths, such as 200, 400, and 600.
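To make the combination of N-gram analysis and Information Gain more concrete, here is a minimal sketch that extracts call bigrams from a few traces and scores each bigram by how much knowing its presence reduces the entropy of the class label. It is not the cited study's implementation; the traces, the call names, and the helper functions are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def ngrams(sequence, n):
    """All contiguous n-grams of a sequence, as tuples."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical behavior traces (call sequences) with their class labels.
traces = [
    (["open", "write", "connect", "send"], "malware"),
    (["open", "write", "connect", "close"], "malware"),
    (["open", "read", "close"], "clean"),
    (["open", "read", "write", "close"], "clean"),
]
labels = [label for _, label in traces]

# One binary feature per bigram: 1 if the bigram occurs in the trace.
vocabulary = sorted({g for calls, _ in traces for g in ngrams(calls, 2)})
scores = {
    gram: information_gain(
        [1 if gram in ngrams(calls, 2) else 0 for calls, _ in traces], labels
    )
    for gram in vocabulary
}

# Keep the k highest-scoring bigrams as the selected feature set.
top_k = sorted(scores, key=scores.get, reverse=True)[:3]
print(top_k)
```

A bigram that appears only in one class (such as the made-up `("write", "connect")` here) receives the maximum score, while one that appears in a single trace scores lower; a feature length such as 200, 400, or 600 would correspond to the value of `k`.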

Other authors proposed a bilayer abstraction method based on the dynamic analysis of API sequences for malware detection. Behavioral features are abstracted into low-layer and high-layer behavior. They also propose an enriched support vector machine named OC-SVM Neg, which makes use of the available benign software samples to achieve a better false alarm rate. In total, 14,863 malware and 2,623 benign programs were collected from VXHeaven and Malheur. This work reported good results in detecting unknown malware.

Santos et al., on the other hand, developed a hybrid malware detector for detecting unknown malware by obtaining features both statically and dynamically. To test their proposed system they collected malware and benign programs from two different sources: VXHeaven for malware samples, and their own setup for benign programs. For the feature vector they used opcode sequences, system calls, exceptions, etc. This hybrid approach is efficient for extracting features both statically and dynamically.

Another study introduced a supervised system for detecting malware. The authors extracted 972 behavioral features from different observation areas. They used naïve Bayes, decision tree (J48), and random forest as machine learning algorithms to reach a decision. In this paper, unknown malware could be detected within one month if static rules are pre-defined by the Snort or Suricata systems.

Fukushima et al. implemented a prototype for malware detection. The authors evaluated suspicious process behavior on Windows OS in order to avoid false positives. This behavior-based method achieved about 60% accuracy in detecting malware without false positives. For the evaluation they used 83 malware and 41 goodware samples.

Nari et al. proposed an automated method for classifying malware based on its network activity. They created a behavioral graph that characterizes not only a sample's network behavior but also the dependencies between network flows. This method was efficient for malware sample classification.

Other authors presented a data mining technique to detect new malicious executables. Three different types of features were used for feature extraction: Portable Executable (PE) features, byte-sequence n-grams, and string features. Their dataset consists of 4,266 programs: 3,265 malware and 1,001 clean programs. For malware classification they also used a multi-Naïve Bayes method, which achieved the highest detection accuracy, 97.76%, on unfamiliar programs.

In another study, the authors developed an efficient malware classification technique based on string information extracted from executables. They extracted printable strings from 1,367 samples containing viruses, unpacked Trojans, and clean files. They achieved 97% classification accuracy using k-fold cross-validation on unpacked malicious files, and also used random forest as an effective classifier.

R. Islam et al. introduced a classification system that integrates static and dynamic features. For this work they assembled two datasets: the first was collected between 2003 and 2007, and the second between 2009 and 2010. Using a random forest classifier they achieved an accuracy of 97%.

Ahmed et al. combined two different dynamic features (spatial and temporal information) available in run-time API calls, captured in a sandbox, to detect malware. They achieved a classification accuracy of 96.3% using 516 executable files. Similarly, Wagener et al. executed a small number of malware files (104) to generate lists of API calls and then calculated the similarity between pairs of API call sequences using a similarity matrix. They reported 93% accuracy.
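The idea of comparing API call lists pairwise and storing the results in a similarity matrix can be sketched as follows. The essay does not say which similarity measure Wagener et al. used, so this example assumes plain Jaccard similarity over the sets of calls in each trace; the three traces and the function names are hypothetical.

```python
def jaccard(calls_a, calls_b):
    """Jaccard similarity of the sets of API calls in two traces (1.0 if both empty)."""
    set_a, set_b = set(calls_a), set(calls_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def similarity_matrix(sequences):
    """Symmetric matrix whose entry [i][j] is the similarity of traces i and j."""
    n = len(sequences)
    return [[jaccard(sequences[i], sequences[j]) for j in range(n)] for i in range(n)]


# Hypothetical API call lists recorded for three samples.
samples = [
    ["CreateFile", "WriteFile", "CloseHandle"],
    ["CreateFile", "WriteFile", "RegSetValue"],
    ["socket", "connect"],
]
matrix = similarity_matrix(samples)
for row in matrix:
    print(["%.2f" % value for value in row])
```

Samples whose rows show high mutual similarity can then be grouped together, for example as candidates for the same malware family. A set-based measure ignores call order; an order-aware measure would be needed if the sequence of calls matters.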
