Big Data and Its Technical Challenges

Jagadish, H. V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J. M., Ramakrishnan, R., & Shahabi, C. (2014). Big Data and Its Technical Challenges. Commun. ACM, 57(7), 86–94. doi:10.1145/2611567

Summary

This paper discusses how big data is revolutionizing our lives and its impact on consumers, science and government. The authors also identify the following phases in the big data life cycle:

  1. data acquisition
  2. information extraction and cleaning
  3. data integration, aggregation and representation, where the cost of full integration is often prohibitive, so techniques that provide on-demand integration are very attractive (e.g., analyzing only relevant tweets, doing on-demand focused crawls to complement data, ...)
  4. modelling and analysis, which is often challenging due to the data's noisy, dynamic, heterogeneous, inter-related and untrustworthy nature.
  5. interpretation, which requires decision makers to make use of the data. The financial crisis underscored how assumptions influence the outcome of such analyses. Therefore, big data tools must give users the ability both to (a) interpret the results and (b) perform analyses under different assumptions and parameters to consider different scenarios and outcomes.

Jagadish et al. also discuss the following challenges in big data analysis:

  1. Heterogeneity
  2. Inconsistency and incompleteness
  3. Scale (i.e., the amount of data)
  4. Timeliness (i.e., the ability to obtain relevant information before the data becomes irrelevant); for example, credit card fraud should ideally be detected before a suspicious transaction has been completed.
  5. Privacy and data ownership
  6. The human perspective (visualization and collaboration)

Case Study

The paper also includes a case study of the Los Angeles Metropolitan Transportation Authority (LA-Metro), which collects transportation data from the LA County road network. The data arrives at 46 MB/min, and over 15 TB has been collected so far. The data is analyzed to identify traffic patterns and to obtain temporal models for road segments.
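
To get a feel for the scale these two figures imply, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 TB = 1,000,000 MB) and a constant arrival rate, neither of which is stated in the summary above, and the function names are made up for illustration.

```python
# Rough scale estimate from the two figures quoted in the case study.
# Assumptions (not from the paper): decimal units and a constant arrival rate.
MB_PER_GB = 1_000
MB_PER_TB = 1_000_000
MINUTES_PER_DAY = 60 * 24


def daily_volume_gb(mb_per_min: float) -> float:
    """Data volume arriving per day, in GB, at a constant rate."""
    return mb_per_min * MINUTES_PER_DAY / MB_PER_GB


def days_to_collect(total_tb: float, mb_per_min: float) -> float:
    """Days of continuous collection needed to accumulate total_tb."""
    return total_tb * MB_PER_TB / (mb_per_min * MINUTES_PER_DAY)


if __name__ == "__main__":
    rate = 46    # MB/min, as quoted above
    total = 15   # TB, as quoted above ("over 15 TB")
    print(f"Daily volume: {daily_volume_gb(rate):.2f} GB")
    print(f"Days to reach {total} TB: {days_to_collect(total, rate):.0f}")
```

Under these assumptions the stream amounts to roughly 66 GB per day, so 15 TB corresponds to somewhat over seven months of continuous collection. Since the post says "over 15 TB", this is a lower bound at best, and the actual collection period may differ if the rate varied over time.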