Big Data and Its Technical Challenges

1 minute read

Jagadish, H. V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J. M., Ramakrishnan, R., & Shahabi, C. (2014). Big Data and Its Technical Challenges. Commun. ACM, 57(7), 86—94. doi:10.1145/2611567

Summary

This paper discusses of how big data is revolutionizing our lives and its impact for consumers, science and government. The author also identify the following phases in the big data life cycle:

data acquisition
information extraction and cleaning
data integration, aggregation and representation, where the cost of full integration is often prohibitive and, therefore, techniques which provide an on-demand integration are very attractive (e.g. only analyze relevant tweets, do on-demand focused crawls to complement data, ...)
modelling and analysis, which is often challenging due to the data's noisy, dynamic, heterogeneous, inter-related and untrustworthy nature.
interpretation, which requires decision makers to make use of the data. The financial crisis underscored how assumptions influence the outcome of such analyzes. Therefore, big data tools must provide users with both the ability to (a) interpret the results, and (b) to perform analyzes under different assumptions and parameters to consider different scenarios and outcomes.

Jagadish et al. also discuss the following challenges in big data analysis:

Heterogeneity
Inconsistency and incompleteness
Scale (i.e., the amount of data)
Timeliness (i.e., the ability to obtain relevant information before the data becomes irrelevant) - credit card fraud should ideally be detected before suspicious transaction have been completed.
Privacy and data ownership
The human perspective (visualization and collaboration)

Case Study

The paper also includes a case study of the Los Angeles Metropolitan Transportation Authority (LA-Metro) which collects transportation data from the LA Country road network. The data arrives at 46 MB/min and over 15 TB have been collected so far. The data is analyzed for traffic patterns and to obtain temporal models for road segments.

Share on

Twitter Facebook LinkedIn

Albert Weichselbraun

Big Data and Its Technical Challenges

Summary

Case Study

Share on

You may also enjoy

Big, Linked Geospatial Data and Its Application in Earth Observation

Employment relations: a data driven analysis of job markets using online job boards and online professional networks

Suffix array

Dynamic feature scaling for online learning of binary classifiers