Python Stateful Function example

Dmytro Vedetskyi
DevOops World … and the Universe
4 min readFeb 10, 2021

--

Overview

I created small overview about Flink service/clustering DataProcessing and Stateful Function in the previous topic.

I’m going to show and explain in the current topic about hands-on experience related to the developing first Stateful Function, overview of the DataProcessing and deploy prototype that everybody can use for the future projects to make your DataProcessing faster, scalable and so on.

Architecture of the prototype

Diagram shows how services communicate together and process the messages throw the workflow.

Schema explains a couple of steps how message goes from user request to storing on database:

  1. User sends request to API (python application) in Json
  2. API produce event to the Kafka Service into Topic1
  3. Flink using Stateful Function code consumes message in Language Guide (proto3) and save it into Topic2
  4. Consumer(python application) listens Topic2, in case new message arrives, consumer proceeds it and save into database (Postgres)

Components

  1. API service
    Service has written using python and flask framework. The main purpose of the service to process POST request in Json and store push it into Kafka Topic
    https://github.com/helli0n/flink-stateful-example/tree/main/api
  2. Apache Kafka
    Kafka is a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol. It can be deployed on bare-metal hardware, virtual machines, and containers in on-premise as well as cloud environments.
  3. Apache Flink
    Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.
  4. Stateful Function
    Stateful Functions is an API that simplifies building distributed stateful applications. It’s based on functions with persistent state that can interact dynamically with strong consistency guarantees.
    https://github.com/helli0n/flink-stateful-example/tree/main/greeter
  5. Consumer
    Service has written using python that reads topics for processing messages and does action with them. In our case it saves messages into database from Kafka.
    https://github.com/helli0n/flink-stateful-example/tree/main/consumer
  6. Postgres
    PostgreSQL is a powerful, open source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.
    Application uses Postgres for insert or select data from is as data storage.

Deployment types

The Flink community is currently waiting for the official Docker images to be published to Docker Hub. In the meantime, Ververica has volunteered to make Stateful Functions’ images available via their public registry: FROM ververica/flink-statefun:2.2.0 You can follow the status of Docker Hub contribution here.

Local Deployment

The easiest way for deploy application is to use docker-compose that has everything in place.

https://github.com/helli0n/flink-stateful-example/blob/main/docker-compose.yml

Kubernetes deployment

Flink Statefun Function has ability to be deployed in existing Kubernetes Cluster.

Helm chart deploys Master and Worker where your Statefun Function will be running. The main advantages of this approach is to deploy your function together with Flink. In case you need to update/redeploy your function or upgrade Flink cluster, everything will be updated together.

Increasing cluster works according to the Kubernetes deployment configuration. (e.g.: if you need to add Flink workers, you just need to change property in the Helm chart)

Here is example of the official K8s Flink Helm charts that you can use for your deployment.

https://github.com/apache/flink-statefun/tree/master/tools/k8s

Performance test

I created simple script for testing Statefun Function.

Script does the POST call to the API and checks until all requests will be handled by workflow. It supports the parameter with number of requests that need to execute through the API.

Example of the scripts is already there:

Demo

Demo shows processing for the few requests

Summary

Following throw the topic you can see how easy to develop and deploy Statefun Function that provides ability to make your data processing faster and simpler.

In case you have high load Statefun is very easy to scale and process huge a mount of data.

Key Benefits

Dynamic Messaging
The API allows you to build and compose functions that communicate dynamic- and arbitrarily with each other. This gives you much more flexibility compared to the acyclic nature of classical stream processing topologies.

Consistent State
Functions can keep local state that is persistent and integrated with the messaging between functions. This gives you the effect of exactly-once state access/updates and guaranteed efficient messaging out-of-the-box.

Multi-language Support
Functions can be implemented in any programming language that can handle HTTP requests or bring up a gRPC server, with initial support for Python. More SDKs will be added for languages like Go, Javascript and Rust.

No Database Required
State durability and fault tolerance build on Apache Flink’s robust distributed snapshots model. This requires nothing but a simple blob storage tier (e.g. S3, GCS, HDFS) to store the state snapshots.

Cloud Native
Stateful Function’s approach to state and composition can be combined with the capabilities of modern serverless platforms like Kubernetes, Knative and AWS Lambda.

“Stateless” Operation
State access is part of the function invocation and so Stateful Functions applications behave like stateless processes that can be managed with the same simplicity and benefits, like rapid scalability, scale-to-zero and rolling/zero-downtime upgrades.

URLs:

https://flink.apache.org/
https://kafka.apache.org/
https://github.com/helli0n/flink-stateful-example/
https://flink.apache.org/stateful-functions.htm
l
https://developers.google.com/protocol-buffers/docs/proto3
https://github.com/apache/flink-statefun/tree/master/tools/k8s

--

--