A first look at the new trends that Big Data is bringing about.
Analytics & Machine Learning
Rapid Insights Providing Business Impact
- “Just-in-time” analytics that can be embedded directly into business processes for business outcome comparisons.
- Analytical solutions available at point of decision.
- New solutions must dynamically mix & analyze data, from real-time to historical, to deliver continuous business results, leveraging machine learning.
Best Practice: Apache Spark
Lambda Data Management
Lambda Data - new lens on data systems, designed to tame growing complexity.
- Defines a set of principles for how batch & stream processing can work together.
- Human fault-tolerant.
- Immutability - keep data immutable for the range of business contexts.
- Pre-computation & re-computation.
- Data Handling Layers
- Batch Layer - stores the master data set (e.g. Hadoop, HDFS).
- Serving Layer - indexes & offers precomputed views for low-latency ad hoc queries.
- Speed Layer - real-time views are incremental - “complexity isolation”: handles only transient additions until the next batch recomputation.
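To make the three layers concrete, here is a minimal in-memory sketch in plain Python. It is only an illustration under stated assumptions, not how Hadoop, BigQuery, or any real Lambda system is implemented: the class `LambdaCounter`, its method names, and the page-view events are all hypothetical.

```python
from collections import Counter


class LambdaCounter:
    """Toy page-view counter organised along Lambda-style layers (hypothetical)."""

    def __init__(self):
        # Batch layer: append-only master data set; events are never edited.
        self.master = []
        # Serving layer: view precomputed from the master data set.
        self.batch_view = Counter()
        # Speed layer: incremental counts for events since the last batch run.
        self.realtime_view = Counter()

    def ingest(self, page):
        """Record a new event in the master data set and the real-time view."""
        self.master.append(page)
        self.realtime_view[page] += 1

    def run_batch(self):
        """Recompute the batch view from scratch, then drop the transient view."""
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, page):
        """Answer a query by merging the precomputed and real-time views."""
        return self.batch_view[page] + self.realtime_view[page]


store = LambdaCounter()
for page in ["home", "pricing", "home"]:
    store.ingest(page)
print(store.query("home"))  # 2, served from the speed layer only

store.run_batch()
store.ingest("home")
print(store.query("home"))  # 3, batch view (2) + speed layer (1)

# Human fault tolerance: a bad write to a view is repaired by recomputation,
# because the master data set itself was never modified.
store.batch_view["home"] = 999
store.run_batch()
print(store.query("home"))  # 3
```

Recomputing the batch view from the untouched master data set is what lets this sketch recover from the simulated mistake. The speed layer only ever holds the events that arrived since the last batch run.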
Best Practice: Google BigQuery
Application Development & Business Integration
Notebook IDEs becoming all the rage
- OSS innovation around a web-based, interactive approach to collaborating on new solutions is rising fast - one unified place for a team to share insights, business results, notes, etc.
- Notebook-as-a-service - microservices “good enough” for some analytics-based solutions until business leaders need / expect real-time speeds.
Implications For Future Applications
- Answering open-ended business questions - the velocity, variety & volume of big data set a new stage
- Businesses can act on close approximations sooner, rather than wait for higher analytics accuracy in hindsight
- Innovations in data handling & analytics starting to address new class of business applications
Time-to-value - launch product -> continuously analyze business impact -> learn & refine, then repeat.
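The point about close approximations can be illustrated with a small sampling sketch: estimate an aggregate from a random sample and report its uncertainty, rather than waiting for a full scan. This is one possible reading of the idea, not something the talk prescribes. The function `approximate_mean`, the synthetic `orders` data, and the sample size are hypothetical.

```python
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean from a random sample; return (estimate, standard_error)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return estimate, std_error


# Hypothetical data: 100,000 order values cycling through 0..99.
orders = [i % 100 for i in range(100_000)]

exact = statistics.fmean(orders)  # full scan over every record
estimate, std_error = approximate_mean(orders, sample_size=1_000)

# Report the estimate with a rough 95% interval (about 1.96 standard errors).
print(f"exact={exact:.2f} estimate={estimate:.2f} +/- {1.96 * std_error:.2f}")
```

The sample touches 1% of the records here. Whether that trade-off is acceptable depends on how much error the business decision can tolerate.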
Best Practice: The IPython Notebook
- Rod Smith: “Big Data 3.0” - Strata Europe 2014