Cognitive Computing and Big Data Analytics

Published: Mar 19, 2020

Table of contents

  1. Introduction
  2. Framework of Cognitive Computing and big data
  3. Definition
  4. Computing with heterogeneous data
  5. Applications in Cognitive Computing comprising of big data
  6. For Transportation Systems
  7. For the Environment
  8. Urban Computing for Urban Energy Consumption
  9. Urban Computing for Economy
  10. Urban Computing for Public Safety and Security
  11. Typical Technology
  12. Urban Data Management Techniques
  13. Techniques Dealing with Data Sparsity
  14. Visualizing Big Data
  15. Optimization Techniques
  16. Information Security
  17. Future Directions
  18. Conclusion

There’s a problem within big data: there’s too much information and not enough talent to manage it. The supply of analysts and data scientists can’t keep up with the ever-growing demand for this type of talent. This shortage is a problem because even the most advanced data platforms are useless without experienced professionals to operate and manage them. How do we solve this? More training and better academic programs? Possibly, but what if there were another solution? What if we instead trained computers to do the work for us, or at least to make data tools easier to manage? Improvements in cognitive computing are bringing that closer to reality.

Introduction

Sensing technologies and large-scale computing infrastructures have produced a variety of big data in urban spaces (e.g., human mobility, air quality, traffic patterns, and geographical data). This big data implies rich knowledge about the population it describes and can help tackle urban challenges when used correctly. Motivated by the opportunity to build more intelligent cities, we can come up with a vision of computing that aims to unlock the power of knowledge from big and heterogeneous data collected in urban spaces and to apply this information to the major issues our cities face today. In short, we aim to tackle the big challenges in big cities by using big data.

Cognitive computing will bring a high level of fluidity to analytics. The data-processing steps that are normally essential for proper analytical functions could be handled in a way that lets staff who aren't familiar with data languages interact with programs and platforms the way humans interact with each other.

Platforms built with AI technology could therefore translate ordinary speech and simple natural-language requests into data queries, and then return responses in the same form in which the requests were received. With functionality of this kind, it would be much easier for anyone to work in the data field.

Framework of Cognitive Computing and big data

Definition

Cognitive computing is a process of acquisition, integration, and analysis of big and heterogeneous data generated by diverse sources in urban spaces, such as sensors, devices, vehicles, buildings, and humans, to tackle the major issues that cities face (e.g., air pollution, increased energy consumption, and traffic congestion).

It connects unobtrusive and ubiquitous sensing technologies, advanced data management and analytic models, and novel visualization methods to create win-win solutions that improve the environment, human life quality, and city operation systems. Cognitive computing also helps us understand the nature of urban phenomena and even predict the future. It is an interdisciplinary field fusing the computing science field with traditional fields like transportation, civil engineering, economy, ecology, and sociology in the context of urban spaces.

Computing with heterogeneous data

Learn mutually reinforced knowledge from heterogeneous data: Solving urban challenges involves a broad range of factors (e.g., exploring air pollution involves the simultaneous study of traffic flow, meteorology, and land use). However, existing data-mining and machine-learning techniques usually handle one kind of data; for example, computer vision deals with images, and natural language processing is based on text. Treating features extracted from different data sources equally (e.g., simply putting these features into one feature vector and throwing them into a classification model) does not achieve the best performance.

In addition, using multiple data sources in an application leads to a high-dimension space, which usually aggravates the data sparsity problem. If not handled correctly, more data sources would even compromise the performance of a model. This calls for advanced data analytics models that can learn mutually reinforced knowledge among multiple heterogeneous data generated from different sources, including sensors, people, vehicles, and buildings.

Both effective and efficient learning ability: Many urban computing scenarios (e.g., detecting traffic anomalies and monitoring air quality) require instant answers. Besides just increasing the number of machines to speed up the computation, we need to combine data management, data mining, and machine-learning algorithms into one computing framework to provide both an effective and efficient knowledge discovery ability. In addition, traditional data management techniques are usually designed for a single-modality data source. An advanced management methodology that can organize multimodal data (such as streaming, geospatial, and textual data) well is still missing. So, computing with multiple heterogeneous data sources is a fusion of data and algorithms.

Visualization: Massive data brings a tremendous amount of information that needs a better presentation. A good visualization of original data could inspire new ideas to solve a problem, while the visualization of computing results can reveal knowledge intuitively so as to help in decision making. The visualization of data may also suggest the correlation or causality between different factors. The multimodal data in urban computing scenarios leads to high dimensions of views, such as spatial, temporal, and social, for a visualization.

How to interrelate different kinds of data in different views and detect patterns and trends is challenging. In addition, when facing multiple types and huge volumes of data, seeing how exploratory visualization can provide an interactive way for people to generate new hypotheses becomes even more difficult. This calls for an integration of instant data-mining techniques into a visualization framework, which is still missing in urban computing.

Applications in Cognitive Computing comprising of big data

For Transportation Systems

Finding fast driving routes saves both drivers’ time and energy, since traffic congestion wastes a lot of gas. Intensive studies have been done to learn historical traffic patterns, estimate real-time traffic flows, and forecast future traffic conditions on individual road segments using floating car data, such as GPS trajectories of vehicles, WiFi, and GSM signals. However, work modeling citywide traffic patterns is still rare.

Taxis are an important commuting mode between public and private transportation, providing almost door-to-door traveling services. In major cities like New York City and Beijing, people usually wait for a nontrivial time before taking a vacant taxi, while taxi drivers are eager to find passengers. Effectively connecting passengers with vacant taxis is of great importance to saving people’s waiting time, increasing taxi drivers’ profit, and reducing unnecessary traffic and energy consumption.

By 2050, it is expected that 70% of the world’s population will be living in cities. Municipal planners will face an increasingly urbanized and polluted world, with cities everywhere suffering an overly stressed road transportation network. Building more effective public transportation systems, as alternatives to private vehicles, has thus become an urgent priority, both to provide a good quality of life and a cleaner environment and to remain economically attractive to prospective investors and employees. Public mass transit systems, coupled with integrated fare management and advanced traveler information systems, are considered key enablers to better manage mobility.

For the Environment

Without effective and adaptive planning, urbanization’s rapid progress will become a potential threat to cities’ environment. Recently, we have witnessed an increasing trend of pollution in different aspects of the environment, such as air quality, noise, and rubbish, around the world. Protecting the environment while modernizing people’s lives is of paramount importance in urban computing.

Urban Computing for Urban Energy Consumption

The rapid progress of urbanization is consuming more and more energy, calling for technologies that can sense city-scale energy cost, improve energy infrastructures, and finally reduce energy consumption.

Urban Computing for Economy

The dynamics of a city (e.g., human mobility and the number of changes in a point of interest (POI) category) may indicate the trend of the city’s economy. For instance, the number of movie theaters in Beijing kept increasing from 2008 to 2012, reaching 260. This could mean that more and more people living in Beijing would like to watch movies in a theater. Conversely, when a category of POIs is vanishing from a city, it may denote a downturn in that business. Likewise, human mobility could indicate the unemployment rate of some major cities, therefore helping predict the trend of a stock market.

Urban Computing for Public Safety and Security

Large events, pandemics, severe accidents, environmental disasters, and terrorist attacks pose additional threats to public security and order. The wide availability of different kinds of urban data provides us with the ability, on one hand, to learn from history how to handle the aforementioned threats correctly and, on the other hand, to detect them in a timely manner or even predict them in advance.

Typical Technology

Urban Data Management Techniques

The data generated in urban spaces is usually associated with a spatial or spatiotemporal property. For example, road networks and POIs are frequently used spatial data in urban spaces; meteorological data, surveillance videos, and electricity consumption are temporal data (also called time series or streams). Other data sources, like traffic flows and human mobility, have spatial and temporal properties simultaneously. Sometimes temporal data can also be associated with a location, thereby becoming a kind of spatiotemporal data (e.g., the temperature of a region and the electricity consumption of a building). Consequently, good urban data management techniques should be able to deal with spatial and spatiotemporal data efficiently. In addition, an urban computing system usually needs to harness a variety of heterogeneous data.

In many cases, these systems are required to quickly answer users’ instant queries (e.g., predicting traffic conditions and forecasting air pollution). Without data management techniques that can organize multiple heterogeneous data sources, it becomes impossible for the subsequent data-mining process to quickly learn knowledge from these data sources. For instance, without an efficient spatiotemporal indexing structure that organizes POIs, road networks, traffic, and human mobility data well in advance, the feature extraction process alone in the U-Air project would take a few hours. Such a delay would prevent the application from reporting a city’s air quality every hour.
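To make the idea of a spatiotemporal index concrete, here is a minimal sketch of a grid-and-time-slot hash index. It is not the indexing structure used by U-Air; the class name `GridIndex`, the cell size, and the slot length are assumptions chosen only for illustration.

```python
from collections import defaultdict

CELL_DEG = 0.01   # grid cell size in degrees (assumed value)
SLOT_SEC = 3600   # time slot length in seconds (assumed value)


def cell_key(lat, lon, ts):
    """Map a point and a timestamp to a (row, col, time slot) bucket."""
    return (int(lat // CELL_DEG), int(lon // CELL_DEG), int(ts // SLOT_SEC))


class GridIndex:
    """Toy spatiotemporal index: a hash map from buckets to records."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def insert(self, lat, lon, ts, record):
        self.buckets[cell_key(lat, lon, ts)].append(record)

    def query(self, lat, lon, ts):
        """Return records that share the query point's cell and time slot."""
        return list(self.buckets.get(cell_key(lat, lon, ts), []))


# Hypothetical records from two different kinds of sources.
index = GridIndex()
index.insert(39.905, 116.405, 7200, "poi:restaurant")
index.insert(39.9051, 116.4052, 7500, "taxi:speed=32")
index.insert(39.955, 116.405, 7200, "poi:park")
nearby = index.query(39.9055, 116.4055, 7300)
```

Because records are bucketed when they arrive, a query touches one bucket instead of scanning every record, which is the property the paragraph above relies on for instant answers.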

Techniques Dealing with Data Sparsity

There are many reasons that lead to a data-missing problem. For example, a user would only check in at a few venues in a location-based social networking service, and some venues may not have people visiting them at all. If we put user–location into a matrix where each entry denotes the number of visits of users to a place, the matrix is very sparse; that is, many entries do not have a value. If we further consider the activities (such as shopping, dining, and sports) that a user can perform in a location as the third dimension, a tensor can be formulated. Of course, the tensor is even sparser. Data sparsity is a general challenge that has been studied for years in many computing tasks.
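The sparsity described above can be measured directly. The following sketch uses a made-up check-in log (all users, venues, and activities are hypothetical) and stores only the nonzero entries of the user-location matrix and of the user-location-activity tensor.

```python
from collections import Counter

# Hypothetical check-in log: (user, venue, activity).
checkins = [
    ("u1", "cafe", "dining"),
    ("u1", "cafe", "dining"),
    ("u1", "mall", "shopping"),
    ("u2", "gym", "sports"),
    ("u3", "mall", "dining"),
]

users = sorted({u for u, _, _ in checkins})
venues = sorted({v for _, v, _ in checkins})
activities = sorted({a for _, _, a in checkins})

# Sparse storage: only entries with at least one visit are kept.
matrix = Counter((u, v) for u, v, _ in checkins)
tensor = Counter(checkins)


def sparsity(nonzero, total):
    """Fraction of entries that have no value."""
    return 1 - nonzero / total


matrix_sparsity = sparsity(len(matrix), len(users) * len(venues))
tensor_sparsity = sparsity(
    len(tensor), len(users) * len(venues) * len(activities)
)
```

Adding the activity dimension multiplies the number of possible entries while the number of observations stays the same, which is why the tensor ends up sparser than the matrix.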

Visualizing Big Data

When talking about data visualization, many people would only think about (a) the visualization of raw data and (b) the presentation of results generated by data-mining processes. The former may reveal the correlation between different factors, therefore suggesting features for a machine-learning model. As mentioned before, spatiotemporal data is widely used in urban computing.

For a comprehensive analysis, the data needs to be considered from two complementary perspectives: (a) as spatial distributions changing over time (i.e., spaces in time) and (b) as profiles of local temporal variation distributed over space. However, data visualization is not solely about displaying raw data and presenting results. Exploratory visualization becomes even more important in urban computing.
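The two complementary perspectives correspond to two ways of slicing the same dataset. A minimal sketch, assuming a small table of made-up readings indexed by time step and region (the region names and values are hypothetical):

```python
# Hypothetical readings: readings[t][r] is the value at time step t in region r.
regions = ["north", "center", "south"]
readings = [
    [35, 80, 50],  # t = 0
    [40, 95, 55],  # t = 1
    [30, 70, 45],  # t = 2
]


def spatial_snapshot(t):
    """First perspective: the spatial distribution at one time step."""
    return dict(zip(regions, readings[t]))


def temporal_profile(region):
    """Second perspective: the temporal variation observed at one region."""
    r = regions.index(region)
    return [row[r] for row in readings]
```

A map animated over `spatial_snapshot(t)` and a set of small line charts built from `temporal_profile(region)` would show the same data from the two viewpoints.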

Semi-supervised Learning and Transfer Learning. Semi-supervised learning is a class of supervised learning tasks and techniques that also make use of unlabeled data for training, typically a small amount of labeled data with a large amount of unlabeled data. Many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy. There are multiple semi-supervised learning methods, such as generative models, graph-based methods, and co-training.

Specifically, co-training is a semi-supervised learning technique that requires two views of the data. It assumes that each example is described by two different feature sets that provide different and complementary information about an instance. Ideally, the two feature sets of each instance are conditionally independent given the class, and the class of an instance can be accurately predicted from each view alone. Co-training can generate a better inference result because one of the classifiers correctly labels data that the other classifier previously misclassified.
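The co-training loop can be sketched as follows. This is a simplified illustration under stated assumptions, not a reference implementation: each view is a single number, each per-view classifier is a nearest-centroid rule, and the function names and data are hypothetical. In every round, each view labels the unlabeled example it is most confident about and adds it to the shared labeled pool, so the other view trains on it next.

```python
def train_centroids(examples, view):
    """Fit a 1-D nearest-centroid classifier on one view of the labeled data."""
    sums, counts = {}, {}
    for features, label in examples:
        sums[label] = sums.get(label, 0.0) + features[view]
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Return (label, confidence); needs at least two classes.

    Confidence is the distance margin between the two nearest centroids.
    """
    ranked = sorted(centroids, key=lambda label: abs(value - centroids[label]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(value - centroids[runner_up]) - abs(value - centroids[best])
    return best, margin


def co_train(labeled, unlabeled, rounds=3):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not unlabeled:
                break
            model = train_centroids(labeled, view)
            # Pick the unlabeled example this view is most confident about.
            pick = max(unlabeled, key=lambda f: predict(model, f[view])[1])
            label, _ = predict(model, pick[view])
            unlabeled.remove(pick)
            labeled.append((pick, label))  # pseudo-label shared with the other view
    return labeled


# Hypothetical data: each example is (view-0 value, view-1 value).
seed_labeled = [((1.0, 1.0), "a"), ((9.0, 9.0), "b")]
pool = [(2.0, 4.0), (5.5, 8.0), (8.5, 7.0), (1.2, 3.0)]
result = dict(co_train(seed_labeled, pool))
```

In this toy run the example `(5.5, 8.0)` is ambiguous in view 0 but clear in view 1, so it is the second view that labels it, which is the kind of complementary information co-training depends on.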

Transfer learning: A major assumption in many machine-learning and data-mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution.

Unlike semi-supervised learning, which assumes that the distributions of the labeled and unlabeled data are the same, transfer learning allows the domains, tasks, and distributions used in training and testing to be different. In the real world, we observe many examples of transfer learning. For instance, learning to recognize tables may help recognize chairs.

Optimization Techniques

First, many data-mining tasks can be solved by optimization methods, such as matrix factorization and tensor decomposition. Examples include the location–activity recommendations and the refueling behavior inference research. Second, the learning process of many machine-learning models is actually based on optimization and approximation algorithms, for example, maximum likelihood, gradient descent, and EM (expectation-maximization).

Third, the research results from operations research can be applied to solving an urban computing task if combined with other techniques, such as database algorithms. For instance, the ridesharing problem has been studied for many years in operations research. It has been proved to be an NP-hard problem if we want to minimize the total travel distance of a group of people who expect to share rides. As a consequence, it is really hard to apply existing solutions to a large group of users, especially in an online application.

The dynamic taxi ridesharing system T-Share combined spatiotemporal database techniques with optimization algorithms to significantly reduce the number of taxis that need to be checked. As a result, the service can be provided online to answer the instant queries of millions of users.

Another example combined a PCA-based anomaly detection algorithm with L1 minimization techniques to diagnose the traffic flows that lead to a traffic anomaly. The spatiotemporal property and dynamics of urban computing applications also bring new challenges to current operations research.
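As an illustration of the first two points, the sketch below factorizes a small sparse matrix with stochastic gradient descent. It is a generic textbook-style formulation, not the method of the location–activity or refueling studies; the function names, hyperparameters, and visit counts are all assumptions.

```python
import random


def factorize(observed, n_rows, n_cols, rank=2, lr=0.01, reg=0.02,
              epochs=500, seed=0):
    """Approximate the observed entries as dot products of row and column factors."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), value in observed.items():
            pred = sum(P[i][k] * Q[j][k] for k in range(rank))
            err = value - pred
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Gradient step on squared error with L2 regularization.
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    return P, Q


def squared_error(observed, P, Q):
    """Sum of squared errors over the observed entries only."""
    return sum(
        (value - sum(p * q for p, q in zip(P[i], Q[j]))) ** 2
        for (i, j), value in observed.items()
    )


# Hypothetical visit counts: (user index, venue index) -> number of visits.
visits = {(0, 0): 5, (0, 1): 3, (1, 0): 4, (1, 2): 1, (2, 1): 2, (2, 2): 5}
P0, Q0 = factorize(visits, 3, 3, epochs=0)  # untrained starting point
P, Q = factorize(visits, 3, 3)              # same start, then trained
```

Once trained, the dot product of a row factor and a column factor gives an estimate for entries that were never observed, which is how factorization is used to fill in a sparse matrix.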

Information Security

Information security is also nontrivial for an urban computing system that may collect data from different sources and communicate with millions of devices and users. The common problems that would occur in urban computing systems include data security (e.g., guaranteeing that received data is intact, fresh, and non-repudiable), authentication between different sources and clients, and intrusion detection in a hybrid system (connecting digital and physical worlds).
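Two of the data security properties above, integrity and freshness, can be sketched with a keyed hash and a timestamp from the Python standard library. The shared key, the freshness window, and the message layout are assumptions for illustration. A shared-key HMAC does not make a message undeniable; that property would need a digital signature scheme, which this sketch does not cover.

```python
import hashlib
import hmac
import json
import time

SECRET_KEY = b"demo-shared-key"  # hypothetical shared secret
MAX_AGE_SEC = 30                 # assumed freshness window


def sign(payload, now=None):
    """Serialize a payload with a timestamp and compute its HMAC tag."""
    message = {"payload": payload, "ts": time.time() if now is None else now}
    body = json.dumps(message, sort_keys=True).encode()
    tag = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return body, tag


def verify(body, tag, now=None):
    """Accept a message only if it is unmodified and recent."""
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False  # integrity check failed
    age = (time.time() if now is None else now) - json.loads(body)["ts"]
    return 0 <= age <= MAX_AGE_SEC  # freshness check


# Hypothetical sensor reading, signed at t=1000 and checked later.
body, tag = sign({"sensor": "s1", "pm25": 42}, now=1000.0)
```

The `now` parameter exists only so the behavior can be exercised with fixed timestamps; a deployed system would also need key distribution and replay protection within the freshness window.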

Future Directions

Although many research projects about urban computing have been done in recent years, there are still quite a few technologies that are missing or not well studied.

  • Balanced crowd sensing: The data generated through a crowd sensing method is nonuniformly distributed in geographical and temporal spaces. In some locations, we may have much more data than what we really need. A down-sampling method (e.g., compressive sensing) could be useful to reduce a system’s communication loads. Conversely, in places where we do not have enough data, or have no data at all, some incentives that can motivate users to contribute data should be considered. Given a limited budget, how to configure the incentive for different locations and time periods so as to maximize the quality of the received data (e.g., the coverage or accuracy) for a specific application has yet to be explored.
  • Skewed data distribution: In many cases, what we can obtain is a sample of the urban data, whose distribution may be skewed from the complete dataset. Having the entire dataset may always be infeasible in an urban computing system. Some information is transferable from the partial data to the entire dataset. For instance, the travel speed of taxis on roads can be transferred to other vehicles that are also traveling on the same road segment. Likewise, the waiting time of a taxi at a gas station can be used to infer the queuing time of other vehicles. Other information, however, cannot be directly transferred. For example, the traffic volume of taxis on a road may be different from that of private vehicles. As a consequence, observing more taxis on a road segment does not always suggest more vehicles of other kinds.
  • Managing and indexing multimode data sources: Different kinds of index structures have been proposed to manage different types of data individually, whereas the hybrid index that can simultaneously manage multiple types of data (e.g., spatial, temporal, and social media) is yet to be studied. The hybrid index, such as the example shown in Figure 15, is a foundation enabling an efficient and effective learning of multiple heterogeneous data sources.
  • Knowledge fusion: Data-mining and machine-learning models dealing with a single data source have been well explored. However, the methodology that can learn mutually reinforced knowledge from multiple data sources is still missing. The fusion of knowledge is not simply a matter of putting together a collection of features extracted from different sources; it also requires a deep understanding of each data source and an effective usage of different data sources in different parts of a computing framework.
  • Exploratory and interactive visualization for multiple data sources: An urban computing system usually has a lot of data and knowledge to visualize. So far, it is not an easy task to investigate the implicit relationship among multiple data sources through an exploratory visualization in spatial and spatiotemporal spaces. For instance, there are multiple factors (e.g., traffic, factory emission, meteorology, and land use) that could influence the air quality of a location. Unfortunately, it is still not easy to answer questions such as: Which factor is more prominent in impacting the air quality of a given location or in a given time period?
  • Algorithm integration: To provide an end-to-end urban computing scenario, we need to integrate algorithms of different domains into a computing framework. For instance, we need to combine data management techniques with machine-learning algorithms to provide both an efficient and effective knowledge discovery ability. Similarly, through integrating spatiotemporal data management algorithms with optimization methods, we can solve the large-scale dynamic ridesharing problem. Visualization techniques should be involved in a knowledge discovery process, working with machine-learning and data-mining algorithms. So, urban computing calls for both the fusion of data and the integration of algorithms. In the long run, the unprecedented data that we are facing will blur the boundary between different domains in conventional computer sciences (e.g., databases and machine learning) or even bridge the gap between different disciplines’ theories, such as civil engineering and ecology.
  • Intervention-based analysis and prediction: In urban computing, it is vital to predict the impact of a change in a city’s setting. For instance, how will a region’s traffic change if a new road is built in the region? To what extent will air pollution be reduced if we remove a factory from a city? How will people’s travel patterns be affected if a new subway line is launched? Being able to answer these kinds of questions with automated and unobtrusive technologies will be tremendously helpful to inform governmental officials’ and city planners’ decision making. Unfortunately, the intervention-based analysis and prediction technology that can estimate the impact of a change in advance by plugging in and out some factors in a computing framework is not well studied yet.
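The balanced crowd sensing direction in the list above can be illustrated with a small sketch. It swaps in plain random down-sampling for the compressive sensing technique the text mentions, and it only reports where data is missing; how to price incentives is left open, as in the text. The function name, the cap, and the cell labels are hypothetical.

```python
import random


def balance(readings, all_cells, cap, seed=0):
    """Keep at most `cap` readings per cell and report cells that fall short."""
    rng = random.Random(seed)
    by_cell = {cell: [] for cell in all_cells}
    for cell, value in readings:
        by_cell.setdefault(cell, []).append(value)
    kept, shortfall = {}, {}
    for cell, values in by_cell.items():
        # Over-covered cells are down-sampled to reduce communication load.
        kept[cell] = rng.sample(values, cap) if len(values) > cap else list(values)
        # Under-covered cells are candidates for user incentives.
        if len(values) < cap:
            shortfall[cell] = cap - len(values)
    return kept, shortfall


# Hypothetical readings: cell A is over-covered, B is thin, C has no data.
crowd = [("A", 51), ("A", 49), ("A", 50), ("A", 52), ("A", 48), ("B", 70)]
kept, shortfall = balance(crowd, all_cells=["A", "B", "C"], cap=2)
```

The `shortfall` map is the input an incentive scheme would need: it says how many more readings each cell requires before the coverage target is met.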

Conclusion

The massive amount of data that has been generated in urban spaces and the advances in computing technology have provided us with unprecedented opportunities to tackle the big challenges that cities face. Urban computing is an interdisciplinary field where computer sciences meet conventional city-related disciplines, such as civil engineering, ecology, sociology, economy, and energy. In the context of cities, the vision of urban computing — acquisition, integration, and analysis of big data to improve urban systems and life quality — will lead to smarter and greener cities that are of great importance to billions of people.

The big data will also blur the boundary between different domains that were formulated in conventional computer sciences (e.g., databases, machine learning, and visualization) or even bridge the gap between different disciplines (e.g., computer sciences and civil engineering). While urban computing holds great promise to revolutionize urban sciences and progress, quite a few techniques, such as the hybrid indexing structure for multimode data, the knowledge fusion across heterogeneous data sources, exploratory visualization for urban data, the integration of algorithms of different domains, and intervention-based analysis, are yet to be explored.

This article discussed the concept, framework, and challenges of urban computing; introduced the representative applications and techniques for urban computing; and suggested a few research directions that call for efforts from the communities.

This essay was reviewed by
Alex Wood

Cite this Essay

Cognitive Computing and Big Data Analytics. (2020, March 16). GradesFixer. Retrieved November 19, 2024, from https://gradesfixer.com/free-essay-examples/cognitive-computing-and-big-data-analytics/