There's a problem within big data: there is too much information and not enough talent to manage it. The supply of analysts and data scientists can't keep up with the ever-growing demand for this kind of talent. The shortage matters because even the most advanced data platforms are useless without experienced professionals to operate and manage them. How do we solve this? More training and better academic programs? Possibly, but what if there were another solution? What if we instead trained computers to do the work for us, or at least to make data tools easier to manage? Improvements in cognitive computing are bringing that reality steadily closer.
Sensing technologies and large-scale computing infrastructures have produced a variety of big data in urban spaces (e.g., human mobility, air quality, traffic patterns, and geographical data). This big data embodies rich knowledge about a city and its population and, used correctly, can help tackle the challenges cities face. Motivated by the opportunity to build more intelligent cities, we can articulate a vision of computing that aims to unlock the power of knowledge from big and heterogeneous data collected in urban spaces and apply this powerful information to the major issues our cities face today. In short, we aim to tackle the big challenges in big cities by using big data.
Cognitive computing will bring a high level of fluidity to analytics. Natural-language capabilities, layered over the data processing that is normally essential to analytical work, enable staff who are less familiar with data query languages to interact with programs and platforms the way humans interact with each other.
Platforms built with AI technology could therefore accept simple commands in everyday language, translate them into data queries, and then return responses in the same form in which they were received. With functionality of this kind, it would be much easier for anyone to work in the data field.
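As a toy illustration of the idea, the sketch below maps a few keywords in a plain-English request to a parameterized SQL query. It is a minimal sketch, not a real natural-language-processing system: the table and column names (sales, revenue, units_sold, region) and the keyword mapping are invented for this example.

```python
# Hypothetical keyword-to-column mapping; a real system would use NLP, not lookup.
KEYWORD_TO_COLUMN = {
    "revenue": "revenue",
    "units": "units_sold",
}

def request_to_query(request: str) -> str:
    """Translate a simple natural-language request into a SQL string."""
    tokens = request.lower().split()
    # Pick the first metric word we recognize; default to revenue.
    metric = next((KEYWORD_TO_COLUMN[t] for t in tokens if t in KEYWORD_TO_COLUMN),
                  "revenue")
    agg = "AVG" if "average" in tokens else "SUM"
    return f"SELECT region, {agg}({metric}) FROM sales GROUP BY region;"

print(request_to_query("Show me the average revenue by region"))
# -> SELECT region, AVG(revenue) FROM sales GROUP BY region;
```

The response could then be rendered back as a sentence or a chart, completing the round trip the paragraph describes.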
Cognitive computing is a process of acquisition, integration, and analysis of big and heterogeneous data generated by diverse sources in urban spaces, such as sensors, devices, vehicles, buildings, and humans, to tackle the major issues that cities face (e.g., air pollution, increased energy consumption, and traffic congestion).
It connects unobtrusive and ubiquitous sensing technologies, advanced data management and analytic models, and novel visualization methods to create win-win solutions that improve the environment, human quality of life, and city operation systems. Cognitive computing also helps us understand the nature of urban phenomena and even predict the future. It is an interdisciplinary field fusing computer science with traditional fields like transportation, civil engineering, economics, ecology, and sociology in the context of urban spaces.
Learn mutually reinforced knowledge from heterogeneous data: Solving urban challenges involves a broad range of factors (e.g., studying air pollution requires the simultaneous study of traffic flow, meteorology, and land use). However, existing data-mining and machine-learning techniques usually handle one kind of data; for example, computer vision deals with images and natural language processing with text. Treating features extracted from different data sources equally (e.g., simply concatenating them into one feature vector and throwing it into a classification model) does not achieve the best performance.
In addition, using multiple data sources in an application leads to a high-dimensional feature space, which usually aggravates the data-sparsity problem. If not handled correctly, adding data sources can even degrade a model's performance. This calls for advanced analytics models that can learn mutually reinforced knowledge from multiple heterogeneous datasets generated by different sources, including sensors, people, vehicles, and buildings.
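To make the fusion choice concrete, the sketch below contrasts early fusion (concatenating features from two sources into one vector, the approach the text cautions against relying on blindly) with a simple late fusion (one model per source, predicted probabilities averaged). The data and the two sources ("traffic" and "weather"), along with their noise levels and dimensions, are all synthetic; this is only a minimal baseline comparison, not the mutually reinforced learning methods the literature describes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600
y = rng.integers(0, 2, size=n)
# Two hypothetical sources with very different noise levels and dimensionality.
traffic = y[:, None] + rng.normal(scale=3.0, size=(n, 40))
weather = y[:, None] + rng.normal(scale=0.7, size=(n, 4))

idx_train, idx_test = train_test_split(np.arange(n), random_state=0)

# Early fusion: concatenate everything into one feature vector, one model.
X = np.hstack([traffic, weather])
early = LogisticRegression(max_iter=1000).fit(X[idx_train], y[idx_train])
print("early fusion accuracy:", early.score(X[idx_test], y[idx_test]))

# Late fusion: one model per source; average their predicted probabilities.
m_t = LogisticRegression(max_iter=1000).fit(traffic[idx_train], y[idx_train])
m_w = LogisticRegression(max_iter=1000).fit(weather[idx_train], y[idx_train])
proba = (m_t.predict_proba(traffic[idx_test])
         + m_w.predict_proba(weather[idx_test])) / 2
print("late fusion accuracy:", (proba.argmax(axis=1) == y[idx_test]).mean())
```

Neither strategy is universally better; the point is that the combination method is itself a modeling decision, which is exactly the gap the advanced fusion models aim to fill.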
Both effective and efficient learning ability: Many urban computing scenarios (e.g., detecting traffic anomalies and monitoring air quality) require instant answers. Beyond simply adding machines to speed up computation, we need to integrate data management, data mining, and machine-learning algorithms into one computing framework that provides both effective and efficient knowledge discovery. In addition, traditional data-management techniques are usually designed for a single-modal data source; an advanced management methodology that can organize multimodal data (such as streaming, geospatial, and textual data) well is still missing. Computing with multiple heterogeneous data sources is thus a fusion of data and algorithms.
Visualization: Massive data carries a tremendous amount of information that needs better presentation. A good visualization of the original data can inspire new ideas for solving a problem, while visualization of computing results can reveal knowledge intuitively and thereby support decision making. Visualizing data may also suggest correlation or causality between different factors. The multimodal data in urban computing scenarios leads to high-dimensional views, such as spatial, temporal, and social, for a visualization.
How to interrelate different kinds of data in different views and detect patterns and trends is challenging. In addition, when facing multiple types and huge volumes of data, making exploratory visualization an interactive way for people to generate new hypotheses becomes even more difficult. This calls for the integration of instant data-mining techniques into a visualization framework, which is still missing in urban computing.
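As a small, concrete example of one such spatiotemporal view, the sketch below bins simulated pickup events into an hour-of-day by grid-cell matrix and renders it as a heatmap. Everything here (the ten grid cells, the events) is synthetic; a real exploratory tool would link several such views interactively rather than produce one static plot.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
hours = rng.integers(0, 24, size=5000)   # hour of day of each simulated event
cells = rng.integers(0, 10, size=5000)   # 10 hypothetical grid cells

counts = np.zeros((10, 24))
np.add.at(counts, (cells, hours), 1)     # 2-D histogram of events

plt.imshow(counts, aspect="auto", cmap="viridis")
plt.xlabel("hour of day")
plt.ylabel("grid cell")
plt.title("Pickup density by cell and hour (simulated)")
plt.colorbar(label="event count")
plt.show()
```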
Finding fast driving routes saves drivers' time and reduces energy consumption, since traffic congestion wastes a great deal of fuel. Intensive studies have learned historical traffic patterns, estimated real-time traffic flows, and forecast future traffic conditions on individual road segments from floating-car data, such as vehicle GPS trajectories and WiFi and GSM signals. However, work modeling citywide traffic patterns is still rare.
Taxis are an important commuting mode between public and private transportation, providing almost door-to-door service. In major cities like New York City and Beijing, people often wait a nontrivial time before finding a vacant taxi, while taxi drivers are eager to find passengers. Effectively connecting passengers with vacant taxis is of great importance for saving people's waiting time, increasing taxi drivers' profit, and reducing unnecessary traffic and energy consumption.
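A minimal sketch of the matching idea, under strong simplifying assumptions: each waiting passenger is greedily assigned the nearest vacant taxi by straight-line distance. Real dispatch systems optimize over road-network travel times and global objectives; the coordinates and IDs below are invented.

```python
import math

# Hypothetical positions (latitude, longitude) of vacant taxis and passengers.
taxis = {"t1": (39.91, 116.40), "t2": (39.95, 116.35), "t3": (39.88, 116.45)}
passengers = {"p1": (39.90, 116.41), "p2": (39.94, 116.36)}

def dist(a, b):
    # Straight-line distance in degrees; a stand-in for road-network travel time.
    return math.hypot(a[0] - b[0], a[1] - b[1])

assignments = {}
free = dict(taxis)
for pid, ploc in passengers.items():
    if not free:
        break
    tid = min(free, key=lambda t: dist(free[t], ploc))  # nearest vacant taxi
    assignments[pid] = tid
    del free[tid]                                       # taxi is no longer vacant

print(assignments)  # e.g. {'p1': 't1', 'p2': 't2'}
```

Greedy assignment is order-dependent and can be globally suboptimal, which is one reason production systems solve the matching as an optimization problem instead.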
By 2050, it is expected that 70% of the world’s population will be living in cities. Municipal planners will face an increasingly urbanized and polluted world, with cities everywhere suffering an overly stressed road transportation network. Building more effective public transportation systems, as alternatives to private vehicles, has thus become an urgent priority, both to provide a good quality of life and a cleaner environment and to remain economically attractive to prospective investors and employees. Public mass transit systems, coupled with integrated fare management and advanced traveler information systems, are considered key enablers to better manage mobility.
Without effective and adaptive planning, urbanization's rapid progress becomes a potential threat to cities' environments. Recently, we have witnessed a worsening trend in different aspects of the environment, such as air quality, noise, and waste, around the world. Protecting the environment while modernizing people's lives is of paramount importance in urban computing.
The rapid progress of urbanization is consuming more and more energy, calling for technologies that can sense city-scale energy cost, improve energy infrastructures, and finally reduce energy consumption.
The dynamics of a city (e.g., human mobility and the number of changes in a POI category) may indicate the trend of the city's economy. For instance, the number of movie theaters in Beijing kept increasing from 2008 to 2012, reaching 260; this could mean that more and more people living in Beijing like to watch movies in theaters. Conversely, some POI categories gradually vanish from a city, signaling a downturn in that business. Likewise, human mobility can indicate the unemployment rate of major cities, thereby helping predict stock-market trends.
Large events, pandemics, severe accidents, environmental disasters, and terrorist attacks pose additional threats to public security and order. The wide availability of different kinds of urban data gives us the ability, on one hand, to learn from history how to handle such threats correctly and, on the other hand, to detect them in a timely manner or even predict them in advance.
The data generated in urban spaces is usually associated with a spatial or spatiotemporal property. For example, road networks and POIs are the most frequently used spatial data in urban spaces; meteorological data, surveillance videos, and electricity consumption are temporal data (also called time series, or streams). Other data sources, like traffic flows and human mobility, have spatial and temporal properties simultaneously. Sometimes temporal data can also be associated with a location, becoming a kind of spatiotemporal data (e.g., the temperature of a region or the electricity consumption of a building). Consequently, good urban data-management techniques should be able to deal with spatial and spatiotemporal data efficiently. In addition, an urban computing system usually needs to harness a variety of heterogeneous data.
In many cases, these systems are required to answer users' instant queries quickly (e.g., predicting traffic conditions and forecasting air pollution). Without data-management techniques that can organize multiple heterogeneous data sources, it becomes impossible for the subsequent data-mining process to learn knowledge from these sources quickly. For instance, without an efficient spatiotemporal indexing structure that organizes POIs, road networks, traffic, and human mobility data in advance, the feature-extraction process alone in the U-Air project would last a few hours. That delay would prevent the application from telling people a city's air quality every hour.
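To make the indexing idea concrete, here is a minimal sketch of a uniform spatial grid index: points are bucketed by cell so that a proximity query inspects only the query cell and its eight neighbors instead of scanning every record. The cell size, coordinates, and IDs are illustrative; production systems would use R-trees or richer spatiotemporal structures.

```python
from collections import defaultdict

CELL = 0.01  # grid cell size in degrees (illustrative)

def cell_of(lat, lon):
    return (int(lat / CELL), int(lon / CELL))

index = defaultdict(list)

def insert(point_id, lat, lon):
    index[cell_of(lat, lon)].append((point_id, lat, lon))

def query(lat, lon):
    """Return candidate points in the query point's cell and its 8 neighbors."""
    ci, cj = cell_of(lat, lon)
    hits = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            hits.extend(index[(ci + di, cj + dj)])
    return hits

insert("poi-1", 39.9042, 116.4074)
insert("poi-2", 39.9050, 116.4080)
print(query(39.9045, 116.4076))
```

The payoff is exactly the one the text describes: feature extraction that touches only a handful of buckets per query instead of every record in the city.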
Many reasons lead to a data-missing problem. For example, a user checks in at only a few venues in a location-based social networking service, and some venues may not be visited by anyone at all. If we put users and locations into a matrix where each entry denotes the number of visits by a user to a place, the matrix is very sparse; that is, many entries have no value. If we further consider the activities (such as shopping, dining, and sports) that a user can perform at a location as a third dimension, a tensor can be formulated. Of course, the tensor is even sparser. Data sparsity is a general challenge that has been studied for years in many computing tasks.
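The sketch below shows just how sparse such a user-by-venue matrix is in practice, using SciPy's sparse format; the check-in triples and matrix dimensions are made up for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

# (user_id, venue_id, visit_count) triples; most user-venue pairs never occur.
checkins = [(0, 2, 5), (0, 7, 1), (3, 2, 2), (4, 9, 1)]
users, venues, counts = zip(*checkins)

m = coo_matrix((counts, (users, venues)), shape=(1000, 500))
density = m.nnz / (m.shape[0] * m.shape[1])
print(f"{m.nnz} stored entries, density {density:.6f}")  # density ~ 0.000008
```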
When talking about data visualization, many people think only of visualizing raw data and presenting the results generated by data-mining processes. The former may reveal correlations between different factors, thereby suggesting features for a machine-learning model. As mentioned before, spatiotemporal data is widely used in urban computing.
For a comprehensive analysis, the data needs to be considered from two complementary perspectives: as spatial distributions changing over time (i.e., spaces in time) and as profiles of local temporal variation distributed over space. However, data visualization is not solely about displaying raw data and presenting results. Exploratory visualization becomes even more important in urban computing.
Semi-supervised Learning and Transfer Learning. Semi-supervised learning is a class of supervised learning tasks and techniques that also make use of unlabeled data for training, typically a small amount of labeled data with a large amount of unlabeled data. Many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy. There are multiple semi-supervised learning methods, such as generative models, graph-based methods, and co-training.
Specifically, co-training is a semi-supervised learning technique that requires two views of the data. It assumes that each example is described by two different feature sets that provide different, complementary information about the instance. Ideally, the two feature sets of each instance are conditionally independent given the class, and the class of an instance can be accurately predicted from each view alone. Co-training can generate a better inference result because each classifier can correctly label data that the other previously misclassified.
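A compact sketch of the co-training loop under the idealized assumptions above: two synthetic feature views of the same instances, each informative on its own, and a tiny labeled seed set. In each round, a classifier trained on one view pseudo-labels the unlabeled points it is most confident about, growing the shared labeled pool. Real implementations add per-class quotas and stopping rules; everything here is simulated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
# Two (approximately conditionally independent) views of the same instances.
view_a = y[:, None] + rng.normal(scale=1.0, size=(n, 10))
view_b = y[:, None] + rng.normal(scale=1.0, size=(n, 10))

known = np.zeros(n, dtype=bool)
known[:20] = True                       # tiny labeled seed set
y_work = np.where(known, y, -1)         # -1 marks "no label yet"

for _ in range(5):                      # a few co-training rounds
    for view in (view_a, view_b):
        pool = np.where(~known)[0]
        if pool.size == 0:
            break
        clf = LogisticRegression(max_iter=1000).fit(view[known], y_work[known])
        proba = clf.predict_proba(view[pool])
        order = np.argsort(proba.max(axis=1))[-10:]   # 10 most confident points
        picked = pool[order]
        y_work[picked] = proba[order].argmax(axis=1)  # pseudo-label them
        known[picked] = True

print("pseudo-label agreement with truth:",
      (y_work[known] == y[known]).mean())
```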
Transfer learning: A major assumption in many machine-learning and data-mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution.
Unlike semi-supervised learning, which assumes that the distributions of the labeled and unlabeled data are the same, transfer learning allows the domains, tasks, and distributions used in training and testing to differ. We observe many examples of transfer learning in the real world; for instance, learning to recognize tables may help in recognizing chairs.
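As a hedged illustration of one simple transfer strategy among many: a model trained on a data-rich source domain contributes its predicted probability as an extra feature for a target domain that has few labels and a shifted distribution. The domains, the shift, and the sample sizes below are all synthetic, and real transfer learning spans far richer methods than this feature-based trick.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_domain(n, shift):
    # Related-but-shifted domains: same label structure, offset features.
    y = rng.integers(0, 2, size=n)
    X = y[:, None] + shift + rng.normal(scale=1.0, size=(n, 8))
    return X, y

X_src, y_src = make_domain(2000, shift=0.0)    # plenty of source labels
X_tgt, y_tgt = make_domain(60, shift=0.3)      # few target labels, shifted

src_model = LogisticRegression(max_iter=1000).fit(X_src, y_src)

# Augment the scarce target data with the source model's belief.
extra = src_model.predict_proba(X_tgt)[:, [1]]
tgt_model = LogisticRegression(max_iter=1000).fit(np.hstack([X_tgt, extra]), y_tgt)

X_test, y_test = make_domain(1000, shift=0.3)
extra_test = src_model.predict_proba(X_test)[:, [1]]
print("target accuracy:",
      tgt_model.score(np.hstack([X_test, extra_test]), y_test))
```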
First, many data-mining tasks can be solved by optimization methods, such as matrix factorization and tensor decomposition; examples include the location-activity recommendation and refueling-behavior inference research. Second, the learning process of many machine-learning models is itself based on optimization and approximation algorithms, for example, maximum likelihood estimation, gradient descent, and EM (expectation maximization).
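To ground the first point, here is a minimal matrix-factorization sketch: a sparse "user by place" visit matrix is completed by learning two low-rank factors with plain stochastic gradient descent on the observed entries only. The data, rank, and hyperparameters are invented for illustration; the cited research uses far more elaborate models.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_places, rank = 30, 20, 4
# Observed (user, place, value) triples; the rest of the matrix is missing.
observed = [(rng.integers(n_users), rng.integers(n_places), rng.random())
            for _ in range(120)]

U = rng.normal(scale=0.1, size=(n_users, rank))   # user factors
P = rng.normal(scale=0.1, size=(n_places, rank))  # place factors
lr, reg = 0.05, 0.01

for epoch in range(200):
    for i, j, v in observed:
        err = v - U[i] @ P[j]                 # residual on one observed cell
        grad_u = err * P[j] - reg * U[i]      # gradients with L2 penalty
        grad_p = err * U[i] - reg * P[j]
        U[i] += lr * grad_u
        P[j] += lr * grad_p

rmse = np.sqrt(np.mean([(v - U[i] @ P[j]) ** 2 for i, j, v in observed]))
print(f"training RMSE: {rmse:.3f}")
```

Once trained, `U[i] @ P[j]` predicts the missing cells, which is exactly how factorization-based recommenders fill a sparse matrix.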
Third, results from operations research can be applied to urban computing tasks when combined with other techniques, such as database algorithms. For instance, the ridesharing problem has been studied for many years in operations research. Minimizing the total travel distance of a group of people who want to share rides has been proved NP-hard. As a consequence, it is very hard to apply existing solutions to a large group of users, especially in an online application.
The dynamic taxi ridesharing system T-Share addressed this by combining spatiotemporal database techniques with optimization algorithms to significantly scale down the number of taxis that need to be checked. As a result, the service can be provided online to answer the instant queries of millions of users.
Another example combined a PCA-based anomaly-detection algorithm with L1-minimization techniques to diagnose the traffic flows that lead to a traffic anomaly. The spatiotemporal properties and dynamics of urban computing applications also bring new challenges to current operations research.
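A hedged sketch of the PCA half of that idea only (the L1-minimization diagnosis step from the cited work is omitted): daily traffic-like profiles that lie near a low-dimensional subspace are modeled with PCA, and observations with large reconstruction error are flagged as anomalous. All the data below is simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 "normal" 24-hour traffic profiles near a 3-D subspace, plus 5 anomalies.
basis = rng.normal(size=(3, 24))
normal = rng.normal(size=(200, 3)) @ basis + rng.normal(scale=0.1, size=(200, 24))
anomalies = rng.normal(scale=3.0, size=(5, 24))
X = np.vstack([normal, anomalies])

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
top = Vt[:3]                                   # principal subspace

residual = Xc - (Xc @ top.T) @ top             # part PCA cannot explain
score = np.linalg.norm(residual, axis=1)
threshold = np.percentile(score, 97)
print("flagged indices:", np.where(score > threshold)[0])
# The injected anomalies (indices 200-204) dominate the flagged set.
```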
Information security is also nontrivial for an urban computing system that may collect data from different sources and communicate with millions of devices and users. Common problems in such systems include data security (e.g., guaranteeing that received data has integrity, freshness, and non-repudiation), authentication between different sources and clients, and intrusion detection in a hybrid system connecting the digital and physical worlds.
Although many research projects about urban computing have been done in recent years, there are still quite a few technologies that are missing or not well studied.
The massive amount of data generated in urban spaces and the advances in computing technology provide us with unprecedented opportunities to tackle the big challenges that cities face. Urban computing is an interdisciplinary field where computer science meets conventional city-related disciplines, such as civil engineering, ecology, sociology, economics, and energy. In the context of cities, the vision of urban computing (the acquisition, integration, and analysis of big data to improve urban systems and quality of life) will lead to smarter and greener cities that matter to billions of people.
Big data will also blur the boundaries between the domains formulated in conventional computer science (e.g., databases, machine learning, and visualization) and even bridge the gap between disciplines (e.g., computer science and civil engineering). While urban computing holds great promise to revolutionize urban science, quite a few techniques are yet to be explored, such as hybrid indexing structures for multimodal data, knowledge fusion across heterogeneous data sources, exploratory visualization of urban data, the integration of algorithms from different domains, and intervention-based analysis.
This article discussed the concept, framework, and challenges of urban computing; introduced the representative applications and techniques for urban computing; and suggested a few research directions that call for efforts from the communities.