Various Challenges and Issues Faced While Mining Big Data

Words: 2321 | Pages: 5 | 12 min read
Published: Jul 10, 2019

Data is unorganized information in raw form. We are currently in the age of Big Data: large data sets characterized by high volume, complexity, variety, velocity, resolution, flexibility, etc. This data cannot be handled with traditional software systems, but requires modern frameworks able to handle the huge volume and complexity and to determine which data is useful.

This paper discusses the various challenges and issues that we face while mining big data. We also present a number of technologies and tools that can be used to overcome these issues.

We live in an age where everything around us is in digital form. Data is everywhere and in huge amounts. Big Data is nothing but these large data sets, some of which is useful (and known as information) while the rest is waste. It is this information that helps in analysing current trends, which in turn helps industries make strategic decisions. Some even say that Big Data is the fuel of the future economic infrastructure.

Big Data, in simple terms, is a bond which connects the physical world, human society and cyberspace. Data can be found in various forms: structured, semi-structured and unstructured. We need new, advanced tools and technologies which can handle this complexity and are able to process high volumes of data at high speed.

In the future economy, labour productivity will no longer be the crucial factor deciding how the economy shapes itself. Instead, the efficiency of the technologies able to handle Big Data, together with the fact that it is an inexhaustible resource, will play the larger role in deciding the direction the economy takes.


Large sets of data which are useful for studying trends, patterns and associations are called Big Data. The quantity of the data does not hold as much importance as its quality, because having tons of waste data does not help with economic decisions. The main aim behind gathering and analysing data is to gain valuable information.

Analysts have provided the “Three V’s” to describe Big Data:

Volume – Corporate organizations collect data from every source, be it devices, social media or transactions, and the data collected is huge in amount due to the innumerable sources from which data can be mined or extracted.

Velocity – Data is being collected from every corner of the world in fractions of a second, and in huge quantities, because billions of people around the globe are using devices whose activity is constantly monitored, a practice also known as data mining.

Variety – The collected data has no fixed format or structure; it can be found in any digital format that we know, such as audio, documents, video, financial transactions, emails, etc.

With this paper, we focus on the challenges and issues faced when handling such a complex set of information, and on solutions such as advanced frameworks, tools and technologies that can process it at high speed and handle huge quantities of it. We will now focus on the various challenges we face while handling Big Data.

As we all know, whenever we are provided with opportunities we always face some kind of challenge or obstacle when we try to get the most out of them. Such is the case with Big Data: being such an immensely powerful resource, it comes with its own specific set of challenges. There are many issues, such as computational complexity, security of the acquired data, and the mathematical and statistical methods required for handling such large data sets. We will now discuss the various challenges and the possible solutions for them one by one.

    • Volume – First and foremost, the biggest and most basic hurdle we face when dealing with large data sets is always their quantity or volume. In this age of technological advancement, the volume of data is exploding and will grow exponentially every year. Many analysts have predicted that the volume of data will pass the zettabyte mark by 2020. Social media is one such source, gathering data from devices such as mobile phones.

Possible Solutions:

  • HADOOP – There are various tools currently out there, such as “HADOOP”, which is a great tool when it comes to handling large quantities of data. But being a new technology that not many professionals know well, it is not yet that popular, and the downside is that a lot of resources are required to learn it, which may eventually divert one’s attention from the main problem.
  • Robust Hardware – Another way is to improve the hardware which processes the data, for example by increasing parallel processing capacity or memory size to handle such large volumes. One example is grid computing, in which a large number of servers are interconnected with each other over a high-speed network.
  • Spark – This platform uses in-memory computing to create huge performance gains on diversified, high-volume data.
    • With these approaches, firms can tackle the volume problem of big data either by shrinking the data’s size or by investing in good infrastructure, depending on the cost and budget requirements of the firm.
    • Combining Multiple Data Sets – We do not always get data in a properly sorted form; we get it raw from web pages, social media, emails, streams, etc. The complexity of the data rises with each additional data type and format.

Possible Solutions:

  • OLAP Tools (On-Line Analytical Processing Tools) – OLAP is one of the best tools when dealing with varied data types: it assembles data in a logical way in order to access it easily and establishes connections between pieces of information. But it processes all data whether it is useful or not, and this is one of the drawbacks of OLAP tools.
  • Apache HADOOP – An open-source framework whose main job is to process huge amounts of data by dividing it into segments and distributing them across different system infrastructures for processing. HADOOP creates a map of the content so it can be easily accessed.
  • SAP HANA – HANA is another great tool which can be deployed as an on-premise application or used in cloud systems. It can be used for performing real-time analytics and for developing and implementing real-time applications.
  • Although these approaches are themselves revolutionary, none of them is great enough to single-handedly solve the variety issue. HANA is the only tool which lets users process data in real time, while HADOOP is great for scalability and cost-effectiveness. Combining them lets practitioners create a far more powerful big data solution.
    • Velocity Challenge – Processing data in real time is a real hurdle when it comes to big data. Moreover, data flows in at tremendous speed, which raises the challenge of how to respond to the data flow and how to manage it.

Possible Solutions:

  • Flash Memory – In dynamic solutions, where we need to distinguish between hot (highly accessed) and cold (rarely accessed) data, we need high-speed flash memory to provide a cache area.
  • Hybrid Cloud Model – This model proposes expanding a private cloud into a hybrid model, which provides the additional computing power needed to analyse data and to select the hardware, software and business-process changes required to handle high-pace data needs.
  • Sampling Data – Statistical analysis techniques are used to select, manipulate and examine the data to recognize patterns. There are many tools which use cloud computation to access data at high speed and also help in cutting IT support costs.
  • One of them is hybrid SaaS (Software as a Service), a web-browser client which allows immediate customization and promotes collaboration. It is used in hybrid mode because with plain SaaS users do not have much control over their data or applications; the hybrid mode provides much more control over where the user wants to store the data and in what type of environment, and provides encryption to increase the security of the data.
  • Other tools include PaaS, IaaS, ITaaS, DaaS, etc.
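The sampling idea above can be sketched in a few lines. The snippet below is an illustrative example (the function name and parameters are my own, not from any specific tool) of reservoir sampling, a standard technique for keeping a uniform random sample of a fast-moving stream in fixed memory:

```python
import random

def sample_stream(stream, k, seed=None):
    """Reservoir sampling: keep a uniform random sample of k items
    from a stream of unknown length, using only O(k) memory."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Replace an existing element with probability k / (i + 1)
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# A million-element "stream" is reduced to a fixed 100-item sample
sample = sample_stream(range(1_000_000), k=100, seed=42)
print(len(sample))  # 100
```

Because the reservoir never grows beyond k items, the analysis cost stays constant no matter how fast data flows in.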
    • Quality and Usefulness – It is important that the data we collect is in context or has some relevance to the problem; otherwise we will not be able to take the right decisions based on it. So, determining the data’s quality and usefulness is of the utmost importance: without data quality control, wrong information can be passed on.

Possible Solutions:

  • Data Visualization – Where data quality is concerned, visualization is an effective way to keep the data clean, because we can see visually where the unwanted data lies. We can plot data points on a graph, which can be difficult when dealing with large volumes of data. Another way is to group the data so that we can distinguish between categories visually.
  • Special Algorithms – Data quality is not a new concern; it has been with us since the time we started dealing with data. Keeping ‘dirty’ or irrelevant data is costly for firms, so special algorithms built especially for managing, maintaining and cleaning the data are required.
  • Even though huge volume, variety and security always take top priority among the challenges of Big Data, the quality of the data is equally important, because storing irrelevant data wastes time, money and space.
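As a small sketch of the “special algorithms” idea, the snippet below flags dirty data points automatically instead of by eye. It is an illustrative example of my own (not from any named product), using the median absolute deviation, which stays robust even when the dirty values themselves skew the average:

```python
import statistics

def flag_outliers(values, threshold=3.5):
    """Flag points far from the median, measured in units of the
    median absolute deviation (MAD). Robust to dirty extremes."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [v for v in values if abs(v - med) > threshold * mad]

readings = [10.1, 9.8, 10.3, 10.0, 9.9, 250.0]  # one obviously dirty point
print(flag_outliers(readings))  # [250.0]
```

A rule like this can run as a cheap cleaning pass before storage, so irrelevant data never wastes space in the first place.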
  • Privacy and Security – The rush to find trends by extracting data from all possible sources has left the privacy of the users from whom the data is collected ignored. Special care must be taken while extracting information so that people’s privacy is not compromised.

Possible Solutions:

  • Examine Cloud Providers – Cloud storage is really helpful when storing huge amounts of data; we just need to make sure the cloud provider offers good protection mechanisms and includes penalties for when adequate security is compromised.
  • Access Control Policy – This is a basic requirement when storing data anywhere. It is always a must to have proper control policies that grant access to authorized users only, so as to prevent misuse of personal data.
  • Data Protection – All data stages must be protected, from raw and cleaned data up to the final stage of analysis. Sensitive data should be encrypted to prevent leaks. There are many encryption schemes which firms currently use, such as Attribute-Based Encryption, a type of public-key encryption in which the secret key of a user and the ciphertext depend on attributes.
  • Real-time Monitoring – Surveillance should be used to monitor who tries to access the data. Threat inspection should be used to prevent unauthorized access.
  • Use Key Management – A single layer of encryption will not be much help if a hacker can access the encryption keys. Administrators often store the keys on a local drive, which is highly risky, as they can be retrieved by hackers. Proper key management resources are required, in which separate groups, applications and users have different encryption keys rather than sharing the same one.
  • Logging – Creating log files helps in keeping track of who accesses the data and when, and also helps in detecting attacks, failures, or any unusual behaviour. With log files, organizations can run inspections on the data on a daily basis to check for failures.
  • Secure Communication Protocols – Privacy is a huge issue everywhere, and private data can be misused without limit. Providing secure communications between modules, interfaces, applications and processes, such as through an SSL/TLS implementation, protects all of the network communications and not just any single part.
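The access-control point above reduces, at its simplest, to a policy table consulted on every request. The sketch below is purely illustrative (the role names and policy are invented for the example, not taken from any real system):

```python
# Minimal role-based access control check. The roles and the policy
# table are hypothetical examples, not drawn from any specific product.
POLICY = {
    "analyst":  {"read"},
    "engineer": {"read", "write"},
    "admin":    {"read", "write", "grant"},
}

def is_allowed(role, action):
    """Allow an action only if the role's policy explicitly grants it;
    unknown roles get no access at all (deny by default)."""
    return action in POLICY.get(role, set())

print(is_allowed("analyst", "write"))  # False: analysts may only read
print(is_allowed("admin", "grant"))    # True
```

The deny-by-default behaviour for unknown roles is the important design choice: misuse of personal data is prevented unless access was explicitly granted.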

There are two ways to protect privacy in big data. One is restricting access by unwanted users through a secure access-control mechanism. The other is injecting randomness into the sensitive data so that the information cannot be traced back to its original user.
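The second approach, injecting randomness, can be sketched with the Laplace mechanism used in differential privacy. This is an illustrative example of the general technique (the function and scale value are my own choices, not from the text):

```python
import math
import random

def laplace_noise(value, scale, rng=random):
    """Perturb a numeric value with Laplace(0, scale) noise, the
    mechanism behind differential privacy: aggregate statistics stay
    roughly accurate, but no single record can be recovered exactly."""
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    # Invert the Laplace CDF to turn the uniform draw into Laplace noise
    return value - scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

true_count = 1000  # e.g. number of users matching a sensitive query
noisy_count = laplace_noise(true_count, scale=10)
print(round(noisy_count))  # close to 1000, but not exactly 1000
```

Releasing only the noisy count lets analysts see the trend while giving each individual user plausible deniability about being in the data.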

Scalability – Unlimited data scalability is a difficult thing to achieve. When we are dealing with huge amounts of data, being able to scale up and down on demand is very crucial. In big data projects, we often spend resources on getting the desired output and do not spare enough for data analysis. It is important to know where and how many resources should be allocated.

Possible Solutions:

Cloud Computing – It is one of the most efficient ways of storing huge amounts of data; it can be called on as often as needed, and data scaling can be done much more easily in the cloud than with on-premise solutions.

There are many tools like Adobe’s Marketing Cloud, Salesforce Marketing Cloud which provide scaling natively.

While there are many algorithms out there which help with scalability issues, not all of them can be fully efficient. Big Data demands more foresight when developing scalable scripts, to avoid running into problems such as lack of parallelism and data duplication. As Big Data gets bigger, the meaning of data scalability is changing at an immense speed, so it is going to be important to create algorithms that evolve with it.
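The two pitfalls just named, lack of parallelism and data duplication, can be illustrated together in a small sketch. This is an assumed toy pipeline of my own (record values and chunk size are invented) showing the common pattern of processing independent chunks in parallel and collapsing duplicates at the merge step:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Normalise and deduplicate one chunk independently of the others,
    so chunks can run in parallel instead of in one serial pass."""
    return {record.strip().lower() for record in chunk}

records = ["Alice", "alice ", "Bob", "BOB", "Carol"] * 1000  # 5000 rows
chunks = [records[i:i + 500] for i in range(0, len(records), 500)]

with ThreadPoolExecutor() as pool:
    partials = pool.map(process_chunk, chunks)

# Merge step: duplicates that appeared across different chunks collapse here
unique = set().union(*partials)
print(sorted(unique))  # ['alice', 'bob', 'carol']
```

Because each chunk is self-contained, adding machines (or threads) scales the first phase almost linearly; only the final merge is serial.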

Big Data Tool and Technologies

Many tools and technologies have been built to deal with the various issues of big data. While many of the main ones were covered above, there are still other issues, such as the need for big data resource management, new storage technologies, machine learning, etc.

Many honourable mentions among big data technologies are:

  1. HDFS – Stores data in the form of clusters and is known as a highly fault-tolerant distributed file system.
  2. MapReduce – A parallel programming technique for distributing the processing of huge data clusters.
  3. HBase – A column-oriented NoSQL database for random read/write access.
  4. Hive – A data warehousing application that provides an SQL-like access and model.
  5. Sqoop – It is used in transferring/importing data between HADOOP and relational databases.
  6. Oozie – A workflow management and orchestration system for dependent HADOOP jobs.
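The MapReduce technique listed above can be shown in miniature. The sketch below is a single-process toy, not real HADOOP code: the map phase emits (word, 1) pairs per document, and the reduce phase sums counts per word, exactly the division of labour the framework distributes across a cluster:

```python
from collections import Counter
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]

def reduce_phase(pairs):
    """Reduce: sum the emitted counts for each word across documents."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

docs = ["big data needs new tools", "new tools process big data"]
result = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))
print(result["big"])  # 2
```

In a real cluster each document's map runs on a different node and the reducer receives pairs grouped by key, but the map and reduce functions themselves look just like these.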