About this sample
Words: 1203 |
Pages: 3|
7 min read
Published: Mar 14, 2019
In today's information-driven world, data has become an indispensable part of daily life. As cloud computing, the Internet, and mobile devices take up a growing share of our lives and businesses, huge volumes of data are generated every day (Hima Bindu et al, 2016). Social networking applications such as YouTube, Twitter, Facebook, LinkedIn, and WhatsApp, to mention a few, are examples of such sources. The amount of data generated is growing exponentially, and estimates suggest that at least 2.5 quintillion bytes (2.5 × 10^18, that is, 25 followed by 17 zeros) of data are produced every day (Harish Kumar and Menakadevi, 2017). More data are now stored every second than existed on the entire Internet 20 years ago (McAfee and Brynjolfsson, 2012). The term Big Data arose to describe these collections of datasets, which are so large and complex that traditional relational database management systems have difficulty handling them (Shirudkar and Motwani, 2015).
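To put the quoted daily volume in perspective, the short calculation below converts 2.5 quintillion bytes per day into exabytes per day and terabytes per second. It is only an illustrative sketch: the decimal (SI) unit definitions and the variable names are assumptions made here, not part of the cited estimate.

```python
# Daily volume quoted in the text: 2.5 quintillion bytes = 2.5 x 10^18 bytes.
BYTES_PER_DAY = 25 * 10**17
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: decimal (SI) units, 1 EB = 10^18 bytes and 1 TB = 10^12 bytes.
BYTES_PER_EXABYTE = 10**18
BYTES_PER_TERABYTE = 10**12

exabytes_per_day = BYTES_PER_DAY / BYTES_PER_EXABYTE
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / BYTES_PER_TERABYTE

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

Under these assumptions the estimate corresponds to 2.5 exabytes per day, or roughly 29 terabytes every second.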
This term is now used everywhere. Big Data (BD) is becoming increasingly popular because the number of devices connected to the Internet of Things (IoT) continues to grow to unforeseen levels, producing large volumes of data that need to be transformed into valuable information (Moura and Serrão, 2015). The advent of BD has also brought new challenges in terms of data security (Toshniwal et al, 2015). According to Toshniwal et al (2015), there is an increasing need for research into technologies that can handle these large datasets and secure them efficiently. They further note that current technologies for securing data are slow when applied to huge amounts of data (Toshniwal et al, 2015, p. 17). Security is therefore a major concern in BD collection, processing, and analysis: the systems employed must be fast as well as secure. Ultimately, the purpose of BD security is no different from the fundamental CIA triad: the Confidentiality, Integrity, and Availability of the data generated need to be preserved. According to Tahboub and Saleh (2014), the need to protect information, a valuable asset of the organization, cannot be overemphasized. Data Leakage Prevention (DLP) has been found to be one of the effective ways of preventing data leaks. DLP solutions detect and prevent attempts, whether intentional or unintentional, to copy or send sensitive data without authorization by people who are authorized to access the sensitive information. DLP is designed to detect potential data breach incidents in a timely manner by monitoring data in use (endpoint actions), in motion (network traffic), and at rest (data storage) (Tahboub and Saleh, 2014).
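The monitoring idea described above can be illustrated with a minimal content-inspection sketch. It is not the method of any cited DLP product: the rule set, the `inspect` and `allow_transfer` functions, and the `Finding` record are all hypothetical names introduced here, and the two patterns (a 16-digit card-like number and a "confidential" marker) are simplified assumptions about what counts as sensitive.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List


class DataState(Enum):
    """The three states in which a DLP solution monitors data."""
    IN_USE = "in-use"        # endpoint actions
    IN_MOTION = "in-motion"  # network traffic
    AT_REST = "at-rest"      # data storage


# Hypothetical detection rules (assumption): a real deployment would
# derive these from the organization's own classification policy.
RULES = {
    "payment_card": re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b"),
    "confidential_marker": re.compile(r"\bconfidential\b", re.IGNORECASE),
}


@dataclass
class Finding:
    state: DataState  # where the content was observed
    rule: str         # name of the rule that matched
    excerpt: str      # the matched text


def inspect(text: str, state: DataState) -> List[Finding]:
    """Return every rule match found in the given content."""
    findings = []
    for name, pattern in RULES.items():
        for match in pattern.finditer(text):
            findings.append(Finding(state, name, match.group(0)))
    return findings


def allow_transfer(text: str, state: DataState) -> bool:
    """Preventive step: allow the action only if nothing sensitive matched."""
    return not inspect(text, state)


outgoing = "Card 4111 1111 1111 1111 attached, CONFIDENTIAL"
for finding in inspect(outgoing, DataState.IN_MOTION):
    print(finding.state.value, finding.rule, finding.excerpt)
print(allow_transfer("Lunch at noon?", DataState.IN_MOTION))
```

The same `inspect` function is applied regardless of state; only the point of observation (endpoint, network, or storage) differs, which mirrors how the three monitoring modes share one notion of sensitive content.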
Securing the BD process encompasses securing the sources, the pre-processing, and the knowledge outcomes. According to ISACA (2010), DLP aims at halting the loss of sensitive information that occurs in enterprises globally. By focusing on the location, classification, and monitoring of information at rest, in use, and in motion, DLP helps enterprises get a handle on what information they hold and stop the numerous leaks of information that occur every day (ISACA, 2010). This research sets out to design a method to help organizations prevent data leakage in big data. DLP is referred to as Data Loss Prevention in much of the literature; in this research, however, DLP means Data Leakage Prevention.
The scope of this thesis is limited to the use of encryption as the preventive approach to data leakage in BD, with emphasis on semi-structured (textual) data. Other preventive methods, such as access control, disabling functions, and awareness, are not addressed. The detective approach to handling data leakage in a DLP system (DLPS) is likewise not considered. The encryption of other types of BD is also not considered, although the method can handle certain document formats other than TXT, such as DOCX, PDF, and PPT. The encryption algorithms are limited to RSA and AES. The proposed method is not automated, since data are manually fed into a data mining tool for classification. The volume of test data used in the experiments is also small, even though the aim is to prevent leakage in BD. This limitation arises because the organization in question has not implemented BD technologies such as Hadoop to accommodate several data sources.
Data is one of the most important assets of many companies, and its protection must therefore take first priority (Tahboub and Saleh, 2014). Even though many organizations have put in place security mechanisms and technical systems such as firewalls, virtual private networks (VPNs), and intrusion detection and prevention systems (IDSs/IPSs), data leakage still occurs (Tahboub and Saleh, 2014). Tahboub and Saleh (2014) state that data leakage occurs when sensitive data is revealed to unauthorized users or parties, whether intentionally or not. Data leakage can pose serious threats to many organizations. For example, the loss of confidential or sensitive data can have a severe impact on a company's reputation and credibility, customers, employee confidence, and competitive advantage, and in some cases it can lead to the closure of the organization (Tahboub and Saleh, 2014). Data leakage is also an important concern for business organizations in an increasingly networked world, because any unauthorized disclosure of sensitive or confidential data may have serious consequences for an organization in both the short and the long term (Soumya and Smitha, 2014).
According to Alneyadi et al (2016), data leakage is a growing concern among organizations and individuals. They indicate that more leakages occurred in the business sector than in the government sector: according to a 2014 report, the figures stand at 50% for the business sector and 20% for the government sector. They further state that although some data leaks were not detrimental to organizations, others have caused damage worth several million dollars. Moreover, the credibility of businesses and organizations is compromised when sensitive data such as trade secrets, project documents, and customer profiles are leaked to their competitors (Alneyadi et al, 2016). Alneyadi et al (2016) add that sensitive government information concerning political decisions, law enforcement, and national security can also be leaked. A typical example is the leak of United States diplomatic cables published by WikiLeaks. The leak consisted of about 250,000 United States diplomatic cables and 400,000 military reports referred to as 'war logs'. It was carried out by an internal entity using an external hard drive; about 100,000 of the diplomatic cables were labelled confidential and 15,000 were classified as secret (Alneyadi et al, 2016, p. 137). According to Alneyadi et al (2016), this incident drew strong public criticism from civil rights organizations all over the world. In another incident, hackers stole 160 million credit and debit card numbers and targeted 800,000 bank accounts in the US, in what was considered one of the largest hacking incidents to have occurred (Vadsola et al, 2014).