Kubernetes Batch + HPC Day Europe 2022
17 May 2022
Valencia, Spain

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2022 - Valencia, Spain and add this Co-Located event to your registration to participate in these sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Central European Summer Time (CEST), UTC+2. To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change.


IMPORTANT NOTE: Timing of sessions and room locations are subject to change.
Tuesday, May 17
 

07:30 CEST

Registration + Badge Pick-up
Tuesday May 17, 2022 07:30 - 18:30 CEST
Central Forum

13:00 CEST

Opening + Welcome - Abdullah Gharaibeh & Ricardo Rocha, Kubernetes Batch + HPC Day Program Committee Members

Speakers
Ricardo Rocha
Computing Engineer, CERN
Ricardo is a Computing Engineer at CERN IT, focusing on containerized deployments, networking and, more recently, machine learning platforms. For several years he has led the internal effort to transition services and workloads to cloud native technologies, as well as dissemination...

Abdullah Gharaibeh
Staff Software Engineer, Google
Abdullah is a Staff Software Engineer at Google and a co-chair of SIG Scheduling and the Batch Working Group. He works on Kubernetes and Google Kubernetes Engine, focusing on scheduling and batch workloads.


Tuesday May 17, 2022 13:00 - 13:10 CEST
Room 4F | Event Center

13:15 CEST

Keynote: High Performance Computing on Google Kubernetes Engine - Maciek Różacki, Google Cloud
Google Kubernetes Engine is already a platform of choice for highly demanding high-performance computing workloads. We will present how we are investing in pushing the capabilities of our product further to maximize users' scientific output with ease, cost efficiency and industry-leading performance.


Speakers
Maciek Różacki
Product Manager | Kubernetes & GKE, Google
Product Manager for Google Kubernetes Engine, deeply involved in making Kubernetes the best tool for batch and high-performance computing workloads.


Tuesday May 17, 2022 13:15 - 13:20 CEST
Room 4F | Event Center

13:25 CEST

Kueue: Kubernetes-native Job Queueing - Abdullah Gharaibeh, Google
Most Kubernetes core components are pod-centric, including the scheduler and the cluster autoscaler. This works well for service workloads, where the pods of a service are mostly independent and all services are expected to be running at all times. For batch workloads, however, it does not make sense to focus only on pods: the partial execution of pods from multiple parallel batch jobs can lead to deadlocks, where many jobs are simultaneously active but none can make sufficient progress to completion, or even start at all. Even for single-pod batch jobs, whether on-prem or in the cloud with autoscaling capabilities, the reality is that clusters have finite capacity: constraints on resource usage exist for quota and cost management (especially true for GPUs), so users want an easy way to share resources fairly and efficiently. Kueue addresses these limitations, offering the queueing capabilities that commonly exist in legacy batch schedulers in the most Kubernetes-native way. It is a Kubernetes subproject currently under development at https://github.com/kubernetes-sigs/kueue.
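For context, here is a minimal sketch (not part of the session materials) of pointing a batch/v1 Job at a Kueue queue with the official Kubernetes Python client. The kueue.x-k8s.io/queue-name label key and the pre-existing LocalQueue named "user-queue" are assumptions based on the Kueue project's documentation and may differ between releases.

```python
# Sketch: submit a suspended batch/v1 Job labeled for a Kueue LocalQueue.
# Assumptions: a LocalQueue named "user-queue" exists in the "default"
# namespace, and the label key below matches the Kueue release in use.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster

job = client.V1Job(
    metadata=client.V1ObjectMeta(
        name="sample-batch-job",
        labels={"kueue.x-k8s.io/queue-name": "user-queue"},  # assumed label key
    ),
    spec=client.V1JobSpec(
        parallelism=3,
        completions=3,
        suspend=True,  # Kueue unsuspends the Job once it is admitted
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="worker",
                        image="busybox",
                        command=["sh", "-c", "echo processing && sleep 30"],
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "1", "memory": "512Mi"}
                        ),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

The Job is created suspended; the queueing layer admits it when quota is available and only then do its pods start, which is the behaviour the abstract contrasts with purely pod-centric scheduling.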


Speakers
Abdullah Gharaibeh
Staff Software Engineer, Google
Abdullah is a Staff Software Engineer at Google and a co-chair of SIG Scheduling and the Batch Working Group. He works on Kubernetes and Google Kubernetes Engine, focusing on scheduling and batch workloads.



Tuesday May 17, 2022 13:25 - 13:50 CEST
Room 4F | Event Center

13:55 CEST

Resource Orchestration of HPC on Kubernetes: Where We Are Now and the Journey Ahead! - Swati Sehgal & Francesco Romani, Red Hat
Kubernetes has become the norm for orchestrating containerized microservice applications in the cloud and enterprise domains; it is, however, not yet widely adopted in HPC. HPC enablement on Kubernetes is still a challenge due to requirements such as NUMA-aware scheduling, advanced resource reservation/allocation capabilities, and managing job dependencies and synchronization. Resource managers in the kubelet facilitate the allocation and NUMA alignment of CPU, memory, and devices. The information disconnect between the kubelet and the scheduler, however, is still a gap that needs to be addressed: the scheduler is oblivious to resource availability at the more granular NUMA-zone level, which can lead to suboptimal scheduling decisions that place workloads on nodes where alignment of resources is impossible. Contributors from SIG Node formed a team to address this problem and implement a NUMA-aware scheduler and the related infrastructure. Representing the team, the presenters will walk attendees through the journey of this feature, the challenges encountered, the end-to-end solution, current adoption, and its roadmap, and cover the deployment steps for optimized workload performance.
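As background for the scheduler-side work described above, here is a minimal sketch of the node-side kubelet settings those resource managers rely on. The field names follow the upstream KubeletConfiguration (v1beta1) type; the specific policy choices are illustrative, not something prescribed by the speakers.

```python
# Sketch: node-side kubelet settings that enable NUMA alignment of CPU,
# memory and devices. Serialized here as JSON (a valid subset of the YAML
# the kubelet reads); the policy choices are illustrative.
import json

kubelet_config = {
    "apiVersion": "kubelet.config.k8s.io/v1beta1",
    "kind": "KubeletConfiguration",
    # Give Guaranteed pods with integer CPU requests exclusive CPUs.
    "cpuManagerPolicy": "static",
    # Require CPU, memory and device allocations to come from one NUMA node.
    "topologyManagerPolicy": "single-numa-node",
    "topologyManagerScope": "container",
}

print(json.dumps(kubelet_config, indent=2))
```

The gap the talk addresses is that the scheduler does not see this per-NUMA-zone availability, so a pod can land on a node where the topology manager must then reject it.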


Speakers
Swati Sehgal
Principal Software Engineer, Red Hat
Swati Sehgal works to enhance Kubernetes and its platform to deliver best-in-class networking applications, leading-edge solutions and innovative enhancements across the stack. Her work includes prototypes to deliver future high-speed container technologies and enable customers...

Francesco Romani
Principal Software Engineer, Red Hat
Principal software engineer who joined Red Hat in late 2013 and has been involved in open source projects since 2006. At Red Hat he worked on all things virtualization, then moved to cloud native virtualization and now to cloud-native network functions. He currently works in the resource management...



Tuesday May 17, 2022 13:55 - 14:20 CEST
Room 4F | Event Center

14:25 CEST

Volcano – Cloud Native Batch System for AI, BigData and HPC - William (LeiBo) Wang, Huawei Cloud Computing Co., Ltd
Volcano is a cloud native batch system and the first batch computing project in the CNCF. Its major use cases are in high-performance computing (HPC) fields such as big data, AI and gene computing. Volcano offers job-based fair-share, priority, preemption, reclaim and queue management capabilities, which are important for HPC users. It has integrated with the computing ecosystem, including spark-operator, flink-operator, Kubeflow and Cromwell, across the big data, AI and HPC domains. This year Volcano is also being natively integrated into Spark as its custom batch scheduler, and many new features are being developed by contributors, e.g. co-location, elastic training, vGPU, throughput optimization and multi-cluster scheduling for HPC users.

The community has helped more than 50 users around the world deploy Volcano in their production environments since it was open-sourced in 2019. William (LeiBo) Wang, the tech lead of the Volcano community, will present the latest features, use cases, progress, roadmap and best practices. He will also show how to accelerate AI training, serving and big data analysis, and how to improve cluster utilization, based on Volcano and other cloud native projects.
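A minimal sketch (not taken from the session) of submitting a Volcano Job through the Kubernetes Python client's generic CustomObjectsApi. The group/version/plural, the spec fields and the existence of a "default" queue are assumptions based on Volcano's published examples.

```python
# Sketch: a gang-scheduled Volcano Job submitted as a custom resource.
# Assumptions: the Volcano CRDs are installed and a "default" queue exists.
from kubernetes import client, config

config.load_kube_config()

volcano_job = {
    "apiVersion": "batch.volcano.sh/v1alpha1",
    "kind": "Job",
    "metadata": {"name": "mpi-style-job"},
    "spec": {
        "schedulerName": "volcano",
        "queue": "default",   # queue used for fair-share and quota
        "minAvailable": 4,    # gang scheduling: start only when 4 pods can run
        "tasks": [
            {
                "name": "worker",
                "replicas": 4,
                "template": {
                    "spec": {
                        "restartPolicy": "Never",
                        "containers": [
                            {
                                "name": "worker",
                                "image": "busybox",
                                "command": ["sh", "-c", "echo step && sleep 60"],
                                "resources": {"requests": {"cpu": "1"}},
                            }
                        ],
                    }
                },
            }
        ],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="batch.volcano.sh",
    version="v1alpha1",
    namespace="default",
    plural="jobs",
    body=volcano_job,
)
```

The queue and minAvailable fields correspond to the fair-share and gang-scheduling capabilities the abstract highlights.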


Speakers
William Wang
Architect, Huawei Cloud Technologies Co., Ltd.
William (LeiBo) Wang is an architect at Huawei Cloud, responsible for planning and implementing the cloud native scheduling system on HUAWEI CLOUD. He is also the tech lead of the CNCF Volcano project, focusing on large-scale cluster resource management, batch scheduling, big data...



Tuesday May 17, 2022 14:25 - 14:50 CEST
Room 4F | Event Center

14:55 CEST

Get More Computing Power by Helping the OS Scheduler - Antti Kervinen, Intel & Alexander Kanevskiy, Intel
When Linux schedules a thread on a CPU core, there is no guarantee which memory the thread will access. If the workload is lucky, the thread will use data that is already in the CPU caches or in memory close to the CPU core. If not, millions of memory operations need to travel a longer way to reach physical memory. While this may sound too low-level to be controllable or to make a difference, you can easily help the scheduler when running Kubernetes workloads, and make a big difference! Antti and Sasha will show how to get a lot more computing power out of your CPUs by adding CRI Resource Manager (CRI-RM) to your Kubernetes nodes. CRI-RM affects process scheduling and memory locality by dynamically managing the CPU and memory pinning of all Kubernetes containers on the node. In case studies, CRI-RM has delivered major improvements in database and AI training performance without any workload-specific configuration or changes to upstream Kubernetes components.
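A minimal sketch (not from the talk) of the kind of pod that node-level pinning policies tend to help most: a Guaranteed-QoS container requesting whole CPUs with requests equal to limits. The image, sizes and namespace are placeholders; CRI-RM's own policies and configuration live on the node and are not shown here.

```python
# Sketch: a Guaranteed-QoS pod (requests == limits, integer CPUs), the kind of
# workload that node-level resource policies such as CRI Resource Manager or
# the kubelet's static CPU manager can pin to exclusive CPUs near its memory.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="pinned-worker"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="compute",
                image="busybox",
                command=["sh", "-c", "yes > /dev/null"],
                resources=client.V1ResourceRequirements(
                    requests={"cpu": "4", "memory": "8Gi"},
                    limits={"cpu": "4", "memory": "8Gi"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)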


Speakers
Antti Kervinen
Cloud Orchestration Software Engineer, Intel
Antti Kervinen is a Cloud Orchestration Software Engineer at Intel, whose interest in Linux and distributed systems has led him from academic research on concurrency to the world of Kubernetes. When unplugged, Antti spends his time outdoors discovering the wonders of nature.

Alexander Kanevskiy
Principal Engineer, Cloud Software, Intel
Alexander is currently employed by Intel as Principal Engineer, Cloud Software, focusing on various aspects of Kubernetes: resource management, device plugins for hardware accelerators, Cluster Lifecycle and Cluster API. Alexander has more than 25 years of experience in areas of Linux...



Tuesday May 17, 2022 14:55 - 15:05 CEST
Room 4F | Event Center

15:05 CEST

15:25 CEST

How to Handle Fair Scheduling in a Private Academic K8s Infrastructure - Lukas Hejtmanek, Masaryk University & Dalibor Klusacek, CESNET
While the usefulness of container-oriented computing is widely recognized, its adoption in academic environments is not so straightforward. Existing orchestrators like Kubernetes are not primarily designed to support the fair execution of (bursty) workloads belonging to various researchers and/or competing projects. While public providers use an efficient pay-per-use model, academic use cases often expect the traditional fair-sharing mechanism that is widely available in current HPC installations. This talk will discuss the challenges related to the application of containerized computing within the K8s-operated infrastructure used by various users and research groups at CERIT-SC. Specifically, we will discuss how CERIT-SC guarantees that eligible pods will be executed in a reasonable time frame, making sure that running pods of other users will eventually free their allocations to guarantee fair use of available resources.


Speakers
Dalibor Klusacek
Researcher, CESNET
Dalibor Klusáček received his Ph.D. in Computer Science from Masaryk University, Brno, Czech Republic. He works as a computer science researcher at CESNET, Czech Republic. His main research interest is improving scheduling and system performance in parallel and distributed...


Tuesday May 17, 2022 15:25 - 15:35 CEST
Room 4F | Event Center

15:35 CEST

Fast Data On-Ramp with Apache Pulsar on K8s - Timothy Spann, StreamNative
As the Apache Pulsar community grows, more and more connectors will be added. To enhance the availability of sources and sinks and to make use of the greater Apache streaming community, joining forces between Apache NiFi and Apache Pulsar is a perfect fit.

Apache NiFi also adds the benefits of ELT, ETL, data crunching, transformation, validation and batch data processing. Once data is ready to be an event, NiFi can launch it into Pulsar at light speed. I will walk through how to get started, cover some use cases and demos, answer questions and discuss the benefits to the ecosystem.

https://www.datainmotion.dev/
https://github.com/tspannhw
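A minimal sketch (not from the session) of publishing events to Apache Pulsar with the pulsar-client Python library; the broker URL and topic are placeholders, and in the pipeline described above Apache NiFi would sit upstream of this step handling ETL and validation.

```python
# Sketch: produce a handful of events to a Pulsar topic.
# The service URL and topic name are placeholders for a real deployment.
import pulsar

client = pulsar.Client("pulsar://localhost:6650")
producer = client.create_producer("persistent://public/default/sensor-events")

for i in range(10):
    # Each send() publishes one event; an upstream flow (e.g. NiFi) could feed
    # cleaned, validated records into the same topic.
    producer.send(f"reading-{i}".encode("utf-8"))

client.close()
```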


Speakers
Timothy Spann
Developer Advocate, StreamNative
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar, Apache Flink, Flink SQL, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT...



Tuesday May 17, 2022 15:35 - 15:45 CEST
Room 4F | Event Center

15:50 CEST

Apache YuniKorn: A Kubernetes Scheduler Plugin for Batch Workloads - Wilfred Spiegelenburg, Cloudera & Craig Condit, Cloudera
Kubernetes has historically focused on service-type workloads. Stateful workloads have also become better supported in recent releases, but batch scheduling continues to lag in Kubernetes core. To better support batch scheduling, several alternative schedulers have been created, including Apache YuniKorn, which has a growing community and is utilised by several large organisations such as Alibaba, Apple, and Cloudera. Over the past few years, Apache YuniKorn has matured into a highly performant, flexible workload scheduler. Recently, we have enhanced Apache YuniKorn with a new execution mode that allows its full power and flexibility to be deployed as a set of plugins to the default Kubernetes scheduler. This allows service and batch workloads to coexist seamlessly. This session will dive into using Apache YuniKorn to schedule batch workloads, leveraging advanced options such as workload queueing and quota sharing without affecting traditional non-batch Kubernetes workloads.
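A minimal sketch (not from the session) of how a batch pod is commonly tagged for YuniKorn queueing. The applicationId and queue label keys, the queue path and the scheduler name are assumptions based on YuniKorn's documented conventions; in the plugin mode discussed in the session the deployment details differ from the standalone setup assumed here.

```python
# Sketch: a batch pod labeled for a YuniKorn queue so it is subject to that
# queue's quota and ordering. Label keys and queue path are assumed.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="batch-task-1",
        labels={
            "applicationId": "spark-etl-0001",  # groups pods of one batch application
            "queue": "root.analytics",          # YuniKorn queue with its own quota
        },
    ),
    spec=client.V1PodSpec(
        scheduler_name="yunikorn",  # standalone mode; plugin mode is deployed differently
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="task",
                image="busybox",
                command=["sh", "-c", "echo working && sleep 30"],
                resources=client.V1ResourceRequirements(requests={"cpu": "500m"}),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```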


Speakers
Wilfred Spiegelenburg
Principal Software Engineer, Cloudera
Wilfred is a Principal Software Engineer with Cloudera, based in Melbourne, Australia. He has worked as a software engineer for more than 25 years and has been involved in multiple open source projects for over 10 years. He is the tech lead of the Apache YuniKorn project and an Apache YuniKorn PMC member...

Craig Condit
Senior Staff Software Engineer, Cloudera
Craig Condit is an open source, cloud native, and high performance computing enthusiast. He has been a software engineer for over 20 years, and has worked in Big Data for over a decade, with extensive experience in Apache Hadoop, YARN, and YuniKorn. He is an Apache YuniKorn PMC and...



Tuesday May 17, 2022 15:50 - 16:15 CEST
Room 4F | Event Center

16:20 CEST

Efficient Deep Learning Training with Ludwig AutoML, Ray, and Nodeless Kubernetes - Anne Marie Holler, Elotl & Travis Addair, Predibase
Deep Learning (DL) has been successfully applied to many fields, including computer vision, natural language, business, and science. The open-source platforms Ray and Ludwig make DL accessible to diverse users by reducing the complexity barriers to training, scaling, deploying, and serving DL models. However, DL's cost and operational overhead present significant challenges. DL model dev/test/tuning requires intermittent use of substantial GPU resources, which cloud vendors are well positioned to provide, though at non-trivial prices. Given the expense, managing GPU resources is critical to the practical use of DL. This talk describes running Ray and Ludwig on cloud Kubernetes clusters, using Nodeless K8s to add right-sized GPU resources when they are needed and to remove them when not. Experiments comparing the cost and operational overhead of using Nodeless K8s versus running directly on EC2 show sizable improvements in efficiency and usability.
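A minimal sketch (not from the session) of a declarative Ludwig training run. The feature names, dataset path and the backend section requesting Ray are illustrative, and the exact config keys vary between Ludwig releases. On an autoscaling (Nodeless) cluster, the Ray workers behind this call are what trigger GPU nodes to be added and, once training finishes, removed.

```python
# Sketch: train a Ludwig model from a declarative config on a Ray backend.
# Column names, dataset path and config section names are illustrative.
from ludwig.api import LudwigModel

config = {
    "input_features": [{"name": "review_text", "type": "text"}],
    "output_features": [{"name": "sentiment", "type": "category"}],
    "trainer": {"epochs": 3},          # section name depends on Ludwig version
    "backend": {"type": "ray"},        # run preprocessing/training on a Ray cluster
}

model = LudwigModel(config)
results = model.train(dataset="reviews.csv")  # placeholder dataset path
```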


Speakers
Anne Holler
Chief Scientist, Elotl
Anne Holler has an ongoing interest in the intersection of resource efficiency and artificial intelligence. She worked on Uber's Michelangelo Machine Learning platform, on the management stack for Velocloud's SD-WAN, on VMware's Distributed Resource Schedulers for servers and storage...

Travis Addair
CTO, Predibase
Travis Addair is co-founder and CTO of Predibase, a data-oriented low-code machine learning platform. Within the Linux Foundation, he serves as lead maintainer for the Horovod distributed deep learning framework and is a co-maintainer of the Ludwig automated deep learning framework...



Tuesday May 17, 2022 16:20 - 16:45 CEST
Room 4F | Event Center

16:45 CEST

Closing - Aldo Culquicondor, Kubernetes Batch + HPC Day Program Committee Member

Speakers
Aldo Culquicondor
Sr. Software Engineer, Google
Aldo is a Senior Software Engineer at Google. He works on Kubernetes and Google Kubernetes Engine, where he contributes to kube-scheduler, the Job API and other features to support batch, AI/ML and HPC workloads. He is currently a TL at SIG Scheduling and an Organizer of the WG Batch...



Tuesday May 17, 2022 16:45 - 17:00 CEST
Room 4F | Event Center

17:00 CEST

CNCF-hosted Co-located Events Happy Hour
Join us onsite for drinks and appetizers with fellow co-located attendees from Tuesday's CNCF-hosted Co-located Events.

Network with attendees from:
  • Cloud Native Security Conference Europe hosted by CNCF
  • GitOpsCon Europe hosted by CNCF
  • KnativeCon Europe hosted by CNCF
  • Kubernetes Batch + HPC Day Europe hosted by CNCF
  • Kubernetes on Edge Day Europe hosted by CNCF
  • Prometheus Day Europe hosted by CNCF
  • ServiceMeshCon hosted by CNCF

Tuesday May 17, 2022 17:00 - 18:30 CEST
Plaza | Feria Valencia
 