Feature Selection Technique in The Network Traffic Dataset

Nowadays, security is a major concern in the digital world. The use of the internet, computers, mobile phones and tablets has become ubiquitous, and cyber-attacks have grown rapidly. There are various kinds of cyber-attacks, such as spoofing, sniffing, denial-of-service, phishing, evil twins, pharming, click fraud and malware. Malicious software is harmful to both computers and networks. Cyber-attacks have increased drastically; they compromise systems, steal valuable information and destroy important structures, producing vast losses that average $345 per incident.

The growth of internet use is not the only reason for the digital threat; the number of new malware samples is another. More than 317 million new pieces of malware were created in 2014. Conventional anti-virus and intrusion detection systems cannot detect zero-day attacks. According to the Symantec Internet Security Threat Report 2010, more than 5 million pieces of malware were circulating on the internet. As a result, security specialists are devoted to developing efficient malware detection methods. In this work we describe several feature selection techniques for detecting malware in a network traffic dataset using machine learning algorithms, because feature selection is a very important task for malware detection. Malware can be detected through static and dynamic features. Although anti-virus software is built on malware signatures, it fails when a zero-day malware attack occurs. A malware detection system captures network traffic to distinguish malware from goodware (that is, suspicious activity from normal activity).

A network traffic dataset contains many packets with a large number of features. Some features may be very important, but others may not be relevant to the decision. Irrelevant features increase processing time and decrease the efficiency of the malware detection system. The main purpose of a feature selection technique is therefore to reduce the dimensionality of the feature space by removing redundant and irrelevant features from the network traffic dataset.
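
As a rough illustration of removing irrelevant and redundant features, the sketch below applies two simple filters: it drops columns with no variance, then drops any column that is almost perfectly correlated with one already kept. The feature names, the sample values and the 0.95 threshold are hypothetical and are not taken from any of the works discussed here.

```python
from math import sqrt
from statistics import pvariance


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(columns, min_variance=0.0, max_corr=0.95):
    """Return the names of the columns that survive both filters.

    columns: dict mapping feature name -> list of numeric values.
    """
    # Filter 1: a column whose variance is not above the threshold is
    # treated as irrelevant (a constant column cannot separate classes).
    varying = [name for name, vals in columns.items()
               if pvariance(vals) > min_variance]

    # Filter 2: greedily keep a column only if it is not strongly
    # correlated with a column that was already kept (redundancy).
    kept = []
    for name in varying:
        if all(abs(pearson(columns[name], columns[k])) <= max_corr
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-flow features (assumed values, for illustration only).
flows = {
    "duration":   [1.0, 2.0, 3.0, 4.0],
    "bytes":      [100.0, 50.0, 400.0, 20.0],
    "bytes_x2":   [200.0, 100.0, 800.0, 40.0],  # redundant: 2 * bytes
    "proto_flag": [1.0, 1.0, 1.0, 1.0],         # irrelevant: constant
}
selected = select_features(flows)
print(selected)
```

Both filters ignore the class label, so they only address redundancy and constant columns; a label-aware score would be needed to rank the remaining features by usefulness.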

Many approaches have been developed to address the growing number of malware samples that emerge every day. Hansen et al. introduced an approach based on a Random Forests classifier for detecting and classifying the vast amount of malware that comes from known or unknown malware families. This approach reduces the feature space significantly. The Cuckoo sandbox was also used to collect behavioral traces of the analyzed samples in order to achieve a high malware detection rate and family classification.

Tian et al. used logs of API calls to distinguish malware from cleanware by scrutinizing behavioral features. This work was also proposed for both malware family classification and detection, applying pattern recognition algorithms in a virtual environment. They achieved approximately 97% accuracy using a dataset of 1,368 malware and 456 cleanware samples. Another study discussed the applicability of a sandbox environment for obtaining the run-time behavior of malware. The proposed work differentiates malware using a heuristic method termed N-gram analysis and adopts the Information Gain feature selection technique to choose the best features for classification. The Cuckoo sandbox examines the behavior of malware running on a virtual machine. They found that SPegasos achieved the highest accuracy and a better detection rate across different feature lengths such as 200, 400 and 600.
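
To make the combination of N-gram analysis and Information Gain concrete, here is a minimal sketch that extracts call n-grams from API traces and ranks them by how much knowing an n-gram's presence reduces the entropy of the class label. The traces, labels and function names are hypothetical, and the cited study's actual pipeline may differ.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())


def ngrams(calls, n):
    """Set of n-grams (as tuples) occurring in one call trace."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def information_gain(present, labels):
    """H(labels) minus the weighted entropy after splitting on presence."""
    with_f = [y for p, y in zip(present, labels) if p]
    without = [y for p, y in zip(present, labels) if not p]
    total = len(labels)
    conditional = sum(len(part) / total * entropy(part)
                      for part in (with_f, without) if part)
    return entropy(labels) - conditional


def rank_ngrams(traces, labels, n=2, top_k=3):
    """Return the top_k (n-gram, gain) pairs, highest gain first."""
    sets = [ngrams(t, n) for t in traces]
    vocab = set().union(*sets)
    scores = {g: information_gain([g in s for s in sets], labels)
              for g in vocab}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical API call traces (assumed, for illustration only).
traces = [
    ["open", "write", "connect"],
    ["open", "write", "send"],
    ["open", "read", "close"],
    ["open", "read", "send"],
]
labels = ["malware", "malware", "benign", "benign"]
top = rank_ngrams(traces, labels, n=2, top_k=2)
print(top)
```

In this toy data the bigrams `("open", "read")` and `("open", "write")` split the two classes perfectly, so each has a gain of 1 bit; keeping only the top-ranked n-grams is what yields fixed feature lengths such as the 200, 400 and 600 mentioned above.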

Other authors proposed a method of bilayer abstraction based on the dynamic analysis of API sequences for malware detection. Behavioral features are abstracted into low-layer and high-layer behaviors. They also propose an enriched support vector machine named OC-SVM Neg, which uses the available benign software samples to improve the false alarm rate. A total of 14,863 malware and 2,623 benign programs were collected from VXHeaven and Malheur. This work reported good results in detecting unknown malware.

On the other hand, Santos et al. developed a hybrid malware detector that detects unknown malware by obtaining features both statically and dynamically. To test their proposed system, they collected malware and benign programs from two different sources: VXHeaven for malware samples, and their own setup for benign programs. For the feature vector they used opcode sequences, system calls, exceptions, etc. This hybrid approach is efficient at extracting features both statically and dynamically.

In another study, a supervised system was introduced for detecting malware. The authors extracted 972 behavioral features from different observation areas. They used naïve Bayes, decision tree (J48) and random forest as machine learning algorithms to reach a decision. According to this paper, unknown malware could be detected within one month if static rules are pre-defined by the Snort or Suricata systems.

Fukishima et al. implemented a prototype for malware detection. The authors evaluated suspicious process behavior on Windows OS in order to avoid false positives. This behavior-based method achieved about 60% accuracy in detecting malware without false positives. They used 83 malware and 41 goodware samples for evaluation.

Nari et al. proposed an automated method for classifying malware based on its network activity. They created a behavioral graph that characterizes not only a sample's network behavior but also the dependencies among the network flows. This method was efficient for malware sample classification.

Another group of authors presented a data mining technique to detect new malicious executables. Three different types of features were used for feature extraction: Portable Executable (PE) features, byte-sequence n-grams and string features. Their dataset consists of 3,265 malware and 1,001 clean programs, for a total of 4,266 programs. For malware classification they also used a multi-Naïve Bayes method, which achieved the highest detection accuracy, 97.76%, on unfamiliar programs.
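
Two of the feature types named above, byte-sequence n-grams and printable strings, can be extracted from raw bytes with a few lines of standard-library code. The sketch below is only illustrative: the sample bytes are made up, the minimum string length of 4 is an assumed convention, and PE header parsing is left out.

```python
import re
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping n-byte sequence in the input."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return re.findall(pattern, data)


# Hypothetical file contents (assumed, not a real executable).
sample = b"MZ\x90\x00kernel32.dll\x00MZ"

counts = byte_ngrams(sample, n=2)
strings = printable_strings(sample)
print(counts.most_common(1))
print(strings)
```

Each distinct n-gram or string then becomes one candidate feature, which is why a selection step is usually needed before training a classifier on them.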

In another study, the authors developed an efficient malware classification technique based on string information from executables. They extracted printable strings from 1,367 samples containing viruses, unpacked Trojans and clean files. They achieved 97% classification accuracy using k-fold cross validation on unpacked malicious files, and also used random forest as an effective classifier.

R. Islam et al. introduced a classification system that integrates static and dynamic features. For this work they composed two datasets: the first was collected between 2003 and 2007 and the second between 2009 and 2010. Using a random forest classifier, they achieved an accuracy of 97%.

Ahmed et al. combined two different dynamic features (from spatial and temporal information) in a sandbox to detect malware from run-time API calls. They achieved a classification accuracy of 96.3% using 516 executable files. In a similar way, Wagener et al. executed a small number of malware files (104) to generate lists of API calls and then calculated the similarity between two API call sequences using a similarity matrix. They achieved 93% accuracy.
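
The idea of comparing API call sequences pairwise can be sketched as follows. This is not the similarity measure used by Wagener et al.; as a stand-in it uses the matching-block ratio from Python's standard `difflib` module, and the call sequences are hypothetical.

```python
from difflib import SequenceMatcher


def call_similarity(a, b):
    """Similarity in [0, 1] between two API call sequences.

    Stand-in measure: 2 * (matched calls) / (total calls in both).
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def similarity_matrix(sequences):
    """Pairwise similarity of every sequence against every other."""
    n = len(sequences)
    return [[call_similarity(sequences[i], sequences[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical API call lists recorded from three samples.
sequences = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["connect", "send", "recv"],
]
matrix = similarity_matrix(sequences)
for row in matrix:
    print([round(v, 2) for v in row])
```

Samples whose rows contain high off-diagonal values behave alike and can be grouped, for example by clustering on the matrix.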

This essay was reviewed by
Dr. Oliver Johnson

Cite this Essay

Feature Selection Technique In The Network Traffic Dataset. (2020, July 14). GradesFixer. Retrieved October 2, 2023, from