PyFlink 1.15 Documentation
pyflink-docs Release release-1.15 PyFlink Nov 23, 2022 CONTENTS 1 How to build docs locally 3 1.1 Getting Started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . pyflink-docs, Release release-1.15 PyFlink is a Python API for Apache Flink that allows you to build scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory HOW TO BUILD DOCS LOCALLY 1. Install dependency requirements python3 -m pip install -r dev/requirements.txt 2. Conda install pandoc conda install pandoc 3. Build the docs python3 setup.py build_sphinx0 码力 | 36 页 | 266.77 KB | 1 年前3PyFlink 1.16 Documentation
pyflink-docs Release release-1.16 PyFlink Nov 23, 2022 CONTENTS 1 How to build docs locally 3 1.1 Getting Started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . pyflink-docs, Release release-1.16 PyFlink is a Python API for Apache Flink that allows you to build scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory HOW TO BUILD DOCS LOCALLY 1. Install dependency requirements python3 -m pip install -r dev/requirements.txt 2. Conda install pandoc conda install pandoc 3. Build the docs python3 setup.py build_sphinx0 码力 | 36 页 | 266.80 KB | 1 年前3Course introduction - CS 591 K1: Data Stream Processing and Analytics Spring 2020
and processing guarantees of streaming systems • be proficient in using Apache Flink and Kafka to build end-to-end, scalable, and reliable streaming applications • have a solid understanding of how stream and Kafka to build a real-time monitoring and anomaly detection framework for datacenters. Your framework will: • Detect “suspicious” event patterns • Raise alerts for abnormal system metrics • Detect • Identify outlier tasks Inspired by this paper : “SAQL: A Stream-based Query System for Real- Time Abnormal System Behavior Detection”, USENIX Security '18 12 Interested in a more research-oriented0 码力 | 34 页 | 2.53 MB | 1 年前3High-availability, recovery semantics, and guarantees - CS 591 K1: Data Stream Processing and Analytics Spring 2020
fully processed? Was mo delivered downstream? Vasiliki Kalavri | Boston University 2020 A simple system model stream sources N1 NK N2 … input queue output queue primary nodes secondary nodes other external system • deterministic: it produces the same output when starting from the same initial state and given the same sequence of input tuples • convergent-capable: it can re-build internal state acknowledge reception of input tuples notify upstream of oldest logged tuples necessary to re-build current state Vasiliki Kalavri | Boston University 2020 Upstream backup Recovery time • The0 码力 | 49 页 | 2.08 MB | 1 年前3Flow control and load shedding - CS 591 K1: Data Stream Processing and Analytics Spring 2020
latency constraints that can tolerate approximate results. Slow down the flow of data: • The system buffers excess data for later processing, once input rates stabilize. • Requires a persistent process of discarding data when input rates increase beyond system capacity. • Load shedding techniques operate in a dynamic fashion: the system detects an overload situation during runtime and selectively streams with known arrival rates C: system processing capacity H: headroom factor, i.e. a conservative estimate of the percentage of resources required by the system at steady state Load(N(I)): the load0 码力 | 43 页 | 2.42 MB | 1 年前3Elasticity and state migration: Part I - CS 591 K1: Data Stream Processing and Analytics Spring 2020
Mechanism: How to apply the re-configuration? 3 • Detect environment changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions processing a tuple and all its derived results • Policy • each operator as a single-server queuing system • generalized Jackson networks • Action • predictive, at-once for all operators ??? Vasiliki processing a tuple and all its derived results • Policy • each operator as a single-server queuing system • generalized Jackson networks • Action • predictive, at-once for all operators Too fine-grained0 码力 | 93 页 | 2.42 MB | 1 年前3监控Apache Flink应用程序(入门)
....................................................................................... 22 4.14 System Resources....................................................................................... is processed by Apache Flink, which then writes the results to a database or calls a downstream system. In such a pipeline, latency can be introduced at each stage and for various reasons including the TaskManager (in case of a containerized setup), or by providing more TaskManagers. In general, a system already running under very high load during normal operations, will need much more time to catch-up0 码力 | 23 页 | 148.62 KB | 1 年前3Fault-tolerance demo & reconfiguration - CS 591 K1: Data Stream Processing and Analytics Spring 2020
& reconfiguration ??? Vasiliki Kalavri | Boston University 2020 • To recover from failures, the system needs to • restart failed processes • restart the application and recover its state 2 Checkpointing and all required metadata, such as the application’s JAR file, into a remote persistent storage system • Zookeeper also holds state handles and checkpoint locations 5 JobManager failures ??? Vasiliki Vasiliki Kalavri | Boston University 2020 12 • Detect environment changes: external workload and system performance • Identify bottleneck operators, straggler workers, skew • Enumerate scaling actions0 码力 | 41 页 | 4.09 MB | 1 年前3Exactly-once fault-tolerance in Apache Flink - CS 591 K1: Data Stream Processing and Analytics Spring 2020
retrieve a distributed cut in a system execution that yields a system configuration Validity (safety): Termination (liveness): Obtain a valid system configuration A full system configuration is eventually eventually captured A snapshot algorithm attempts to capture a coherent global state of a distributed system ??? Vasiliki Kalavri | Boston University 2020 Snapshotting Protocols p1 p2 p3 C msystem execution that yields a system configuration Validity (safety): Termination (liveness): Obtain a valid system configuration A full system configuration is eventually 0 码力 | 81 页 | 13.18 MB | 1 年前3State management - CS 591 K1: Data Stream Processing and Analytics Spring 2020
types • The system is unaware of which parts of an operator constitute state Streaming state 3 • Explicit state primitives including state types and interfaces • The system is aware of state persistent storage, e.g. a distributed filesystem or a database system • Available state backends in Flink: • In-memory • File system • RocksDB State backends 7 Vasiliki Kalavri | Boston University purposes! FsStateBackend • Stores state on TaskManager’s heap but checkpoints it to a remote file system • In-memory speed for local accesses and fault tolerance • Limited to TaskManager’s memory and might0 码力 | 24 页 | 914.13 KB | 1 年前3
共 20 条
- 1
- 2