We’re going to review a modern, simple solution for data scientists, one that makes life easier for developers as well as for DevOps engineers.
Let’s start the discussion with Metaflow. So, Metaflow is…
Metaflow is a human-friendly Python library that helps scientists and engineers to build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.
Metaflow provides a unified API to the infrastructure stack that is required to execute data science projects, from prototype to…
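Conceptually, a Metaflow project is a class whose methods are steps chained together into a workflow, and whose attributes are artifacts shared between steps. Here is a stdlib-only sketch of that model (the class and runner below are simplified stand-ins, not the real Metaflow API, which provides `FlowSpec` and the `@step` decorator):

```python
# Conceptual sketch of Metaflow's step-based model in plain Python.
# "HelloFlow" and "run" are hypothetical stand-ins, not Metaflow names.

class HelloFlow:
    """Each method is one step; artifacts are plain attributes."""

    def start(self):
        self.message = "Hello from the flow!"   # artifact shared between steps
        return self.middle                      # Metaflow chains with self.next(...)

    def middle(self):
        self.message = self.message.upper()     # transform the artifact
        return self.end

    def end(self):
        return None                             # terminal step


def run(flow):
    """Tiny runner: follow the chain of steps from start to end."""
    step = flow.start
    while step is not None:
        step = step()
    return flow


flow = run(HelloFlow())
print(flow.message)  # HELLO FROM THE FLOW!
```

The appeal of the real library is that the same step definitions run unchanged on a laptop and on cloud infrastructure.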
In the previous topic, I gave a short overview of Flink services/clustering, data processing, and Stateful Functions.
In this topic, I’m going to walk through the hands-on side: developing a first Stateful Function, an overview of the data processing, and deploying a prototype that anyone can reuse in future projects to make data processing faster, more scalable, and so on.
The diagram shows how the services communicate and how messages are processed through the workflow.
The schema walks through the steps a message takes from the user request to being stored in the database:
Stateful Functions is an API that simplifies the building of distributed stateful applications with a runtime built for serverless architectures. It brings together the benefits of stateful stream processing — the processing of large datasets with low latency and bounded resource constraints — along with a runtime for modeling stateful entities that supports location transparency, concurrency, scaling, and resiliency.
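To make the "stateful entity" idea concrete, here is a stdlib-only sketch: a function that keeps isolated per-key state across messages, the way a Stateful Functions runtime routes messages by function address. The runtime class and names below are toy stand-ins, not the Statefun SDK:

```python
from collections import defaultdict


class StatefulRuntime:
    """Toy runtime: routes each message to a per-key state slot,
    loosely mimicking how Stateful Functions addresses entities."""

    def __init__(self, function):
        self.function = function
        self.state = defaultdict(dict)   # one isolated state dict per key

    def send(self, key, message):
        # The function only ever sees the state for its own key.
        return self.function(self.state[key], message)


def greeter(state, name):
    """Counts how many times each user has been greeted."""
    state["seen"] = state.get("seen", 0) + 1
    return f"Hello {name}, greeting #{state['seen']}"


runtime = StatefulRuntime(greeter)
print(runtime.send("alice", "Alice"))  # Hello Alice, greeting #1
print(runtime.send("alice", "Alice"))  # Hello Alice, greeting #2
print(runtime.send("bob", "Bob"))      # Hello Bob, greeting #1
```

In the real runtime, this per-key state is durable and fault-tolerant, and the functions can be scaled out without the caller knowing where they run.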
Flink can execute applications in one of three ways:
The above modes differ in:
Today everybody wants to use modern technologies for application deployment, and one of the best approaches is to use Kubernetes operators for this purpose.
Kubernetes has a number of operators, but here we’re going to look at the kudo.dev operator.
Kubernetes Universal Declarative Operator (KUDO) provides a declarative approach to building production-grade Kubernetes operators. To quote the official documentation: “Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components”. While Kubernetes already comes with a lot of built-in automation to run simple workloads, complex scenarios often need a human operator. …
Every web application should be covered by UI tests, and Selenium is one of the most popular tools for the job. Selenium can run tests in parallel and execute test cases faster, which helps to make your application more stable and of higher quality.
This topic describes how to deploy Selenium cluster automatically with 50–100 browsers using a modern approach (CloudFormation and Ansible).
Selenium is a portable framework for testing web applications. Selenium provides a playback tool for authoring functional tests without the need to learn a test scripting language (Selenium IDE). It also provides a test domain-specific…
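The parallel-execution idea can be sketched with Python’s stdlib thread pool; in a real setup each worker would drive a browser session against the Selenium Grid hub, while here the test body is a hypothetical placeholder:

```python
from concurrent.futures import ThreadPoolExecutor


def run_ui_test(case):
    """Placeholder for one UI test. In reality this would open a
    remote browser session against the Selenium hub and click through
    the page; here it just reports success for the sketch."""
    return (case, "passed")


cases = [f"test_case_{i}" for i in range(8)]

# Run up to 4 "browsers" at once, the way a Grid runs parallel sessions.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(run_ui_test, cases))

print(results["test_case_0"])  # passed
```

With a 50–100 browser cluster, the pool size simply grows to match the number of available Grid nodes, which is what makes the suite finish so much faster.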
Everybody needs to know what is going on with their infrastructure, and we should be informed about unauthorized access to our services. The best way to do this is log collection and monitoring.
This topic describes how to build a centralized logging infrastructure and collect logs at every level, from the hardware all the way to the end user.
Basically, every infrastructure has a variety of services and applications that should be monitored. The easiest way to start is to collect logs from those services.
Here is the example…
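As a minimal stdlib-only illustration of the centralization idea (not the full pipeline the topic describes), several independent services can forward their records to a single collector; the handler class and service names below are hypothetical:

```python
import logging


class CentralCollector(logging.Handler):
    """Stand-in for a central log store (e.g. a syslog/ELK endpoint):
    every service's logger forwards its records here."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(self.format(record))


collector = CentralCollector()
collector.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

# Two independent "services" attach the same central handler.
for service in ("auth-service", "billing-service"):
    log = logging.getLogger(service)
    log.setLevel(logging.INFO)
    log.addHandler(collector)

logging.getLogger("auth-service").warning("unauthorized access attempt")
logging.getLogger("billing-service").info("invoice generated")

print(collector.records[0])  # auth-service WARNING unauthorized access attempt
```

In production the handler would be something like `logging.handlers.SysLogHandler` pointed at the central host, but the fan-in shape is the same: many sources, one searchable destination.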
If your business runs the Cloudera ecosystem in your own on-premises big data infrastructure and you plan to migrate to AWS, or you need to deploy more than one Cloudera cluster across different regions to onboard new customers, then this topic is for you. It describes how to deploy a Cloudera cluster using automation tools, with on-demand scaling of the worker nodes.
Cloudera, Inc. is a US-based software company that provides…
I have already described the Kafka monitoring theme; here is the second part of the topic, which covers advanced monitoring of consumers and partitions.
Grafana is the open source analytics & monitoring solution for every database.
Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture.
Apache Kafka is a modern, powerful service that stores and manages messages for real-time data processing.
Unfortunately, Apache Kafka ships with no monitoring tools by default, but when Kafka has issues we need to identify and fix them as soon as possible to prevent interruptions and data loss and to make sure our services keep working properly.
This topic explains the easiest way to monitor Kafka using the official plugin for the open-source Zabbix monitoring system, which includes collecting JMX metrics, alerting, and monitoring consumers as well.
Publish-subscribe durable messaging system
A messaging system sends messages between processes, applications, and servers. Apache Kafka is a software where topics…
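The publish–subscribe idea behind Kafka topics can be sketched in a few lines of stdlib Python; this is a toy in-memory broker to show the shape of the model (durable ordered log per topic, independent consumer offsets), not the Kafka protocol:

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker: producers append to a topic's log,
    and each consumer reads that log independently via its own offset,
    loosely like Kafka consumer offsets."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic name -> ordered message log

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic, offset=0):
        """Return everything at or after the consumer's offset."""
        return self.topics[topic][offset:]


broker = Broker()
broker.publish("payments", "order-1 paid")
broker.publish("payments", "order-2 paid")

# Two consumers read the same topic without affecting each other.
print(broker.consume("payments"))            # ['order-1 paid', 'order-2 paid']
print(broker.consume("payments", offset=1))  # ['order-2 paid']
```

Because messages are appended to a log rather than handed to a single receiver, any number of consumers can replay the same topic from any offset, which is the "durable" part of the publish–subscribe model.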
Lead DevOps Engineer