About this sample
Words: 1203 |
Pages: 3|
7 min read
Updated: 16 November, 2024
In today's information-driven world, data has become an indispensable part of daily life. Cloud computing, the internet, and mobile devices are now integral to our lives and businesses, and together they generate vast amounts of data every day (Hima Bindu et al., 2016). Social networking applications such as YouTube, Twitter, Facebook, LinkedIn, and WhatsApp, to mention just a few, produce significant volumes of data daily. The amount of data generated is growing exponentially, with estimates suggesting that at least 2.5 quintillion bytes (2.5 × 10^18 bytes, or 2.5 exabytes) are produced every day (Harish Kumar & Menakadevi, 2017). McAfee and Brynjolfsson (2012) observed that more data crossed the Internet every second than had been stored on the entire Internet 20 years earlier. These large and complex datasets, which traditional relational database management systems struggle to handle, gave rise to the term Big Data (Shirudkar & Motwani, 2015).
The term Big Data (BD) is now commonplace. BD continues to gain prominence as the number of devices connected to the Internet of Things (IoT) rises to unprecedented levels, generating large volumes of data that must be transformed into valuable information (Moura & Serrão, 2015). The advent of BD has also introduced new challenges for data security (Toshniwal et al., 2015). As Toshniwal et al. (2015) note, there is a growing need for research into technologies that can handle these large datasets efficiently and secure them. They further emphasize that current data security technologies are slow when applied to enormous amounts of data (Toshniwal et al., 2015, p. 17). Security therefore matters at every stage of BD collection, processing, and analysis, and the systems involved must be both fast and secure.
Ultimately, the purpose of BD security is to preserve the fundamental CIA triad (Confidentiality, Integrity, and Availability) of the data generated. According to Tahboub and Saleh (2014), the need to protect information, a valuable asset to organizations, cannot be overstated. Data Leakage Prevention (DLP) has proven to be an effective method for preventing data leaks. DLP solutions detect and prevent unauthorized attempts, whether intentional or unintentional, to copy or send sensitive data by individuals who are authorized to access the information. DLP is designed to detect potential data breach incidents in a timely manner by monitoring data in use (endpoint actions), in motion (network traffic), and at rest (data storage) (Tahboub & Saleh, 2014).
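The idea of monitoring data in use, in motion, and at rest can be illustrated with a minimal Python sketch. It is not the method of Tahboub and Saleh (2014) or of any particular DLP product: the two regular-expression rules, the `Finding` record, and the `scan` and `should_block` helpers are all hypothetical names chosen for illustration, and a real policy would be far richer than simple pattern matching.

```python
import re
from dataclasses import dataclass

# Hypothetical detection rules, assumed for illustration only.
SENSITIVE_PATTERNS = {
    # Sixteen digits in groups of four, optionally separated by a space or hyphen.
    "payment_card": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
    # The word "confidential" in any letter case.
    "confidential_marker": re.compile(r"\bconfidential\b", re.IGNORECASE),
}


@dataclass
class Finding:
    state: str   # "in_use", "in_motion", or "at_rest"
    source: str  # where the text was observed, e.g. a file path or a channel
    rule: str    # name of the rule that matched


def scan(state, source, text):
    """Return one Finding for each rule whose pattern occurs in the text."""
    return [
        Finding(state, source, rule)
        for rule, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(text)
    ]


def should_block(findings):
    """Block the action as soon as any rule has matched."""
    return len(findings) > 0


if __name__ == "__main__":
    # One made-up event per data state.
    events = [
        ("in_use", "clipboard", "Card 4111 1111 1111 1111 copied"),
        ("in_motion", "smtp", "Lunch at noon?"),
        ("at_rest", "share/report.txt", "CONFIDENTIAL: Q3 plan"),
    ]
    for state, source, text in events:
        findings = scan(state, source, text)
        print(state, source, "BLOCK" if should_block(findings) else "allow")
```

The same `scan` function is applied to all three states; only the place where the text is intercepted (endpoint, network, or storage) differs, which is the distinction the paragraph above draws.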
Securing the BD process involves safeguarding the data sources, the pre-processing stage, and the resulting knowledge. According to ISACA (2010), DLP aims to halt the loss of sensitive information that occurs in enterprises across the globe. By focusing on locating, classifying, and monitoring information at rest, in use, and in motion, DLP helps enterprises manage their information and prevent the numerous leaks that occur daily (ISACA, 2010). This research sets out to design a method to help organizations prevent data leakage in big data. In much of the literature, DLP is expanded as Data Loss Prevention; in this research, however, DLP means Data Leakage Prevention.
The scope of this thesis is limited to encryption as the preventive approach to data leakage in BD, with an emphasis on semi-structured (textual) data. Other preventive methods, such as access control, disabling functions, and awareness, are not addressed, nor is the detective approach to handling data leakage in a DLP system (DLPS). The encryption of other types of BD is also not considered, although the method can handle certain documents in formats other than TXT, such as DOCX, PDF, and PPT. The encryption algorithms are limited to RSA and AES. The proposed method is not automated, as data is manually fed into a data mining tool for classification. The volume of test data used in the experiments is small because the organization in question has not implemented BD technologies such as Hadoop to accommodate several data sources.
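To make the encryption scope concrete, the following Python sketch shows one common way RSA and AES are combined. The text above names only the two algorithms, so everything else here is an assumption: the hybrid arrangement (AES encrypts the document, RSA wraps the AES key), the AES-GCM mode, the OAEP padding, the third-party `cryptography` package, and the function names `encrypt_document`, `decrypt_document`, and `protect`. The labels passed to `protect` stand in for the output of the manual classification step.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Assumed padding choice for RSA key wrapping: OAEP with SHA-256.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def encrypt_document(plaintext, public_key):
    """Encrypt bytes with a fresh AES-256 key, then wrap that key with RSA."""
    aes_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)  # 96-bit nonce, generated fresh for every document
    ciphertext = AESGCM(aes_key).encrypt(nonce, plaintext, None)
    wrapped_key = public_key.encrypt(aes_key, OAEP)
    return {"wrapped_key": wrapped_key, "nonce": nonce, "ciphertext": ciphertext}


def decrypt_document(envelope, private_key):
    """Unwrap the AES key with RSA, then decrypt and authenticate the document."""
    aes_key = private_key.decrypt(envelope["wrapped_key"], OAEP)
    return AESGCM(aes_key).decrypt(envelope["nonce"], envelope["ciphertext"], None)


def protect(labelled_docs, public_key):
    """Encrypt only documents labelled 'sensitive'; pass the others through."""
    return {
        name: encrypt_document(body, public_key) if label == "sensitive" else body
        for name, (label, body) in labelled_docs.items()
    }


if __name__ == "__main__":
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    docs = {
        "payroll.txt": ("sensitive", b"salary list"),
        "menu.txt": ("public", b"soup of the day"),
    }
    protected = protect(docs, private_key.public_key())
    restored = decrypt_document(protected["payroll.txt"], private_key)
    print(restored == b"salary list")
```

In this arrangement RSA only ever encrypts the short AES key, which keeps the slower asymmetric operation off the bulk data, a relevant consideration given the concern about speed on large datasets.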
Data is one of the most essential assets of many companies, so protecting it must be a first priority (Tahboub & Saleh, 2014). Even though many organizations have implemented security mechanisms and technical systems such as firewalls, virtual private networks (VPNs), and intrusion detection/prevention systems (IDSs/IPSs), data leakage still occurs (Tahboub & Saleh, 2014). Tahboub and Saleh (2014) define data leakage as the disclosure of sensitive data to unauthorized users or parties, whether intentional or unintentional. Data leakage can have serious implications for organizations. For example, the loss of confidential or sensitive data can severely damage a company's reputation and credibility, customer and employee confidence, and competitive advantage, and in some cases it can lead to the organization's closure (Tahboub & Saleh, 2014). Data leakage is a significant concern for business organizations in today's increasingly networked world, and any unauthorized disclosure of sensitive or confidential data may have serious consequences for an organization in both the short and the long term (Soumya & Smitha, 2014).
According to Alneyadi et al. (2016), data leakage is a growing concern among organizations and individuals, and more leaks occur in the business sector than in the government sector. According to a 2014 report, the business sector accounted for 50% of leaks and the government sector for 20%. They further state that although some data leaks were not detrimental to the organizations involved, others caused several million dollars' worth of damage. The credibility of a business or organization is also compromised when sensitive data such as trade secrets, project documents, and customer profiles are leaked to competitors (Alneyadi et al., 2016). Alneyadi et al. (2016) further note that sensitive government information, such as information on political decisions, law enforcement, and national security, can also be leaked. A typical example is the release of United States diplomatic cables by WikiLeaks. The leak consisted of about 250,000 United States diplomatic cables and 400,000 military reports referred to as 'war logs'. It was carried out by an insider using an external hard drive, with about 100,000 diplomatic cables labeled confidential and 15,000 cables classified as secret (Alneyadi et al., 2016, p. 137). According to Alneyadi et al. (2016), the incident drew strong public criticism from civil rights organizations worldwide. In a separate incident, hackers stole 160 million credit and debit card numbers and targeted 800,000 bank accounts in the US, in what is considered one of the largest hacking incidents on record (Vadsola et al., 2014).