Data Analysis: Big Data (1)

Introduction

Learning Objectives

Gain an initial understanding of the new trends that Big Data is driving.

Key Concepts

Analytics & Machine Learning

Rapid Insights Providing Business Impact

  • “Just-in-time” analytics that can be embedded directly into business processes to compare business outcomes.
  • Analytical solutions available at the point of decision.
  • New solutions must dynamically mix & analyze data, from real-time to historical, to deliver continuous business results, leveraging machine learning.

Best Practice: Apache Spark

Lambda Data Management

Lambda Data - a new lens on data systems, designed to tame growing complexity.

  • Defines a set of principles for how batch & stream processing can work together.
    • Human fault tolerance.
    • Immutability - keep data immutable across the range of business contexts.
    • Pre-computation & re-computation.
  • Data Handling Layers
    • Batch Layer - stores the master data set (e.g. Hadoop, HDFS).
    • Serving Layer - indexes & offers precomputed views for ad hoc, low-latency queries.
    • Speed Layer - real-time views are incremental (“complexity isolation”); it handles only transient additions until the next batch recomputation.
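The way the three layers cooperate can be sketched in a few lines of plain Python. This is a minimal illustration, not how Hadoop or any production system implements it: the class `LambdaCounter`, its method names, and the page-view counting use case are all hypothetical and chosen only to make the roles of the layers visible.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda-style page-view counter (all names are illustrative)."""

    def __init__(self):
        # Batch layer: immutable, append-only master data set.
        self.master_log = []
        # Serving layer: view precomputed by the last batch run.
        self.batch_view = Counter()
        # Speed layer: incremental counts since the last batch run.
        self.speed_view = Counter()

    def ingest(self, page):
        """Record one event; existing records are never updated or deleted."""
        self.master_log.append(page)
        self.speed_view[page] += 1

    def run_batch(self):
        """Recompute the batch view from scratch over the whole master log."""
        self.batch_view = Counter(self.master_log)
        # The transient real-time additions are now covered by the batch view.
        self.speed_view.clear()

    def query(self, page):
        """Answer a query by merging the precomputed and real-time views."""
        return self.batch_view[page] + self.speed_view[page]


counter = LambdaCounter()
for page in ["home", "home", "pricing"]:
    counter.ingest(page)
views_before_batch = counter.query("home")   # served by the speed layer only

counter.run_batch()
counter.ingest("home")
views_after_batch = counter.query("home")    # batch view plus speed view
```

Because the master log is never modified, a bug in the view logic can be fixed and the view rebuilt by calling `run_batch()` again, which is the sense in which the design tolerates human error.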

Best Practice: Google BigQuery

Application Development & Business Integration

Notebook IDEs becoming all the rage

  • OSS innovation in web-based, interactive approaches to collaborating on new solutions is rising fast - one unified place for a team to share insights, business results, notes, etc.
  • Notebook-as-a-service - microservices are “good enough” for some analytics-based solutions until business leaders need or expect real-time speeds.

Implications For Future Applications

  • Answering open-ended business questions - the velocity, variety & volume of big data set a new stage.
  • Businesses can act on close approximations sooner, rather than wait for higher analytical accuracy in hindsight.
  • Innovations in data handling & analytics are starting to address a new class of business applications.
    Time-to-value - launch product -> continuously analyze business impact -> learn & refine, then repeat.
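The trade-off between a quick approximation and a slower exact answer can be illustrated with simple random sampling. The sketch below is only an assumption about what "close approximation" might look like in practice: the function `approximate_mean`, the synthetic data set, and the sample size are all made up for illustration.

```python
import random


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean from a random sample instead of scanning every value."""
    rng = random.Random(seed)              # fixed seed keeps the sketch repeatable
    sample = rng.sample(values, sample_size)
    return sum(sample) / len(sample)


# Synthetic stand-in for a large data set (hypothetical order amounts).
orders = list(range(1, 100_001))

exact = sum(orders) / len(orders)          # full scan: accurate but touches every row
estimate = approximate_mean(orders, 1_000) # reads only 1% of the rows
relative_error = abs(estimate - exact) / exact
```

On real data the acceptable error depends on the business question, so the sample size here should be read as a tunable knob, not a recommendation.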

Best Practice: The IPython Notebook

Related Resources