<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Apache SeaTunnel</title>
    <description>The latest articles on Forem by Apache SeaTunnel (@seatunnel).</description>
    <link>https://forem.com/seatunnel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F844122%2Fc6155eb3-df58-448b-8d88-36865c4f1d84.jpg</url>
      <title>Forem: Apache SeaTunnel</title>
      <link>https://forem.com/seatunnel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/seatunnel"/>
    <language>en</language>
    <item>
      <title>A Practical DataOps Development Framework Based on WhaleStudio’s Three-Layer Model</title>
      <dc:creator>Apache SeaTunnel</dc:creator>
      <pubDate>Fri, 10 Apr 2026 09:37:01 +0000</pubDate>
      <link>https://forem.com/seatunnel/a-practical-dataops-development-framework-based-on-whalestudios-three-layer-model-1j9l</link>
      <guid>https://forem.com/seatunnel/a-practical-dataops-development-framework-based-on-whalestudios-three-layer-model-1j9l</guid>
      <description>&lt;p&gt;As data platforms evolve from simply “getting jobs to run” to achieving stable and reliable operations, the challenges teams face also begin to shift. Early on, the focus is mainly on whether tasks execute successfully. As scale increases, the concerns move toward access control, clarity of data pipelines, manageability of changes, and the ability to recover from failures.&lt;/p&gt;

&lt;p&gt;This is where DataOps starts to show its real value. It is not just a set of tool usage guidelines, but an engineering methodology that spans development, scheduling, and governance. Using WhaleStudio’s development management framework as an example, this article distills a set of practical standards drawn directly from real production experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three-Layer Development Framework
&lt;/h2&gt;

&lt;p&gt;In complex data platforms, managing everything through a single dimension quickly becomes insufficient as the system grows. WhaleStudio introduces a three-layer structure of Project, Workflow, and Task, which decouples governance, orchestration, and execution, creating clear boundaries for system management.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F150g5rxu5mh8gr6ws2gd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F150g5rxu5mh8gr6ws2gd.jpg" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Project as the Governance Boundary
&lt;/h3&gt;

&lt;p&gt;The project layer is the most fundamental part of the system, yet it is also the most commonly misused. In many teams, projects are treated merely as a way to organize directories. This approach often leads to problems later, such as unclear permissions, resource misuse, and ambiguous ownership.&lt;/p&gt;

&lt;p&gt;In a well-designed system, projects should serve as governance boundaries. Everything related to access control should be scoped within a project, including user permissions, data source access, script resources, alerting strategies, and Worker group configurations.&lt;/p&gt;

&lt;p&gt;A practical rule is simple: whenever certain users should not be able to view or modify specific resources, enforce isolation at the project level rather than relying on conventions or manual processes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Workflow as the Business Pipeline
&lt;/h3&gt;

&lt;p&gt;If projects define who can do what, workflows define how work is organized.&lt;/p&gt;

&lt;p&gt;A workflow is essentially a DAG that represents dependencies between tasks. In a typical data pipeline, workflows connect data ingestion, SQL processing, script execution, and sub-process calls into a complete business flow.&lt;/p&gt;

&lt;p&gt;Beyond orchestration, workflows also handle scheduling concerns such as dependency management, parallel and sequential execution strategies, retry mechanisms, and backfill logic. This means a workflow is not just a representation of execution logic, but also a key part of system stability design.&lt;/p&gt;

&lt;p&gt;In practice, workflows should be treated as traceable and replayable pipelines rather than just collections of tasks.&lt;/p&gt;
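&lt;p&gt;The idea above can be sketched in a few lines: a workflow as a task DAG executed in dependency order, with a naive per-task retry. This is an illustrative Python sketch, not WhaleStudio’s actual API; the task names and retry policy are hypothetical.&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: task -> set of upstream dependencies.
workflow = {
    "ingest_orders": set(),
    "clean_orders": {"ingest_orders"},
    "aggregate_daily": {"clean_orders"},
    "export_report": {"aggregate_daily"},
}

def run_task(name: str) -> None:
    # Stand-in for a real SQL, Shell, or Python task.
    print(f"running {name}")

def run_workflow(dag: dict, max_retries: int = 2) -> list:
    """Execute tasks in topological (dependency) order with naive retries."""
    order = list(TopologicalSorter(dag).static_order())
    for task in order:
        for attempt in range(max_retries + 1):
            try:
                run_task(task)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up; a real scheduler would alert here
    return order

run_workflow(workflow)
```

A real scheduler adds much more on top of this skeleton (backfill windows, cross-workflow dependencies, alerting), but the execution core is the same dependency-ordered traversal.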

&lt;h3&gt;
  
  
  Task as the Smallest Execution Unit
&lt;/h3&gt;

&lt;p&gt;Under workflows, tasks represent the smallest unit of execution and have the most direct impact on system stability.&lt;/p&gt;

&lt;p&gt;Common task types include SQL, Shell, Python, and data integration jobs. Despite their differences, they should follow consistent design principles such as traceability, retry capability, and recoverability.&lt;/p&gt;

&lt;p&gt;In many production scenarios, issues do not originate from the scheduler itself, but from the tasks. For example, non-idempotent SQL logic, scripts without proper error handling, or strong dependencies on external systems can amplify risks during retries or backfills. Establishing standards at the task level is therefore critical to overall system reliability.&lt;/p&gt;
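&lt;p&gt;As a concrete illustration of the idempotency point, a common pattern is to make each load overwrite its target partition rather than blindly append, so a retry or backfill yields the same final state. A minimal Python sketch, with a hypothetical in-memory table and partition key:&lt;/p&gt;

```python
# Idempotent "delete-then-insert" load: rerunning the same partition
# produces the same final state, so retries and backfills are safe.
def load_partition(table: dict, partition: str, rows: list) -> dict:
    # Drop any rows previously written for this partition...
    table = {k: v for k, v in table.items() if v["dt"] != partition}
    # ...then write the new batch keyed by primary key.
    for row in rows:
        table[row["id"]] = row
    return table

table = {}
batch = [{"id": 1, "dt": "2026-04-01", "amount": 10}]
table = load_partition(table, "2026-04-01", batch)
table = load_partition(table, "2026-04-01", batch)  # retry: same result
print(len(table))  # still one row, not two
```

The same pattern maps directly to SQL tasks as a DELETE-then-INSERT (or MERGE) scoped to the partition being loaded.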

&lt;p&gt;Once the responsibilities of the three layers are clearly defined, the next step is to manage permissions and design workflows effectively to prevent the system from becoming unmanageable as it scales.&lt;/p&gt;

&lt;h2&gt;
  
  
  Principles for Data Access and Workflow Design
&lt;/h2&gt;

&lt;p&gt;As teams grow and business logic becomes more complex, access control and workflow design become key factors affecting both efficiency and stability. Without consistent standards, systems can quickly become chaotic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Organize Projects by Business Domain
&lt;/h3&gt;

&lt;p&gt;Projects should primarily be structured around business domains such as sales, risk control, or finance. This aligns naturally with organizational structure and helps clarify ownership.&lt;/p&gt;

&lt;p&gt;When cross-team collaboration is required, resource sharing should be implemented through authorization mechanisms rather than placing everything into a single project. While the latter may seem convenient initially, it often leads to uncontrolled permissions over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Separate Responsibilities in Permission Design
&lt;/h3&gt;

&lt;p&gt;Permissions should never default to giving everyone full access. Roles such as development, testing, operations, and auditing should be clearly separated, each with its own scope of authority.&lt;/p&gt;

&lt;p&gt;This approach reduces the risk of accidental changes and helps standardize release processes, making system changes more controlled.&lt;/p&gt;

&lt;h3&gt;
  
  
  Balance Isolation and Reuse
&lt;/h3&gt;

&lt;p&gt;Resource management must balance isolation with reuse. Data sources, scripts, resource pools, and Worker groups should be isolated by default to avoid unintended interference.&lt;/p&gt;

&lt;p&gt;When reuse is necessary, it should be achieved through controlled authorization rather than duplicating configurations. This reduces maintenance overhead and avoids inconsistencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resolve Permission Differences Through Projects
&lt;/h3&gt;

&lt;p&gt;Whenever permission differences exist, they must be handled through project-level isolation. For example, if certain datasets should only be accessible to specific users, this must be enforced through system mechanisms rather than informal agreements.&lt;/p&gt;

&lt;p&gt;Although this principle seems straightforward, it is often overlooked, leading to loss of control over the permission system.&lt;/p&gt;

&lt;p&gt;Once the permission model is stable, workflow design becomes the key factor in maintainability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Control Workflow Size
&lt;/h3&gt;

&lt;p&gt;As the number of tasks grows, placing everything into a single workflow leads to rapidly increasing maintenance costs and higher risk during changes.&lt;/p&gt;

&lt;p&gt;In practice, workflows should be split based on data layers or business domains, such as ODS, DWD, DWS, and ADS. The number of nodes within a workflow should remain within a manageable range to avoid excessive complexity.&lt;/p&gt;
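&lt;p&gt;One lightweight way to enforce a “manageable range” is a simple lint over workflow definitions that flags any workflow whose node count exceeds an agreed threshold. A sketch follows; the layer prefixes come from the article, while the inventory and the 50-node limit are assumed examples, not WhaleStudio defaults.&lt;/p&gt;

```python
# Hypothetical workflow inventory: workflow name -> node count.
workflows = {
    "ods_ingest_daily": 18,
    "dwd_clean_orders": 34,
    "dws_aggregate_sales": 61,  # too large: candidate for splitting
    "ads_report_export": 9,
}

MAX_NODES = 50  # assumed team convention, not a product default

def oversized(inventory: dict, limit: int = MAX_NODES) -> list:
    """Return workflows whose node count exceeds the agreed limit."""
    return sorted(name for name, n in inventory.items() if n > limit)

print(oversized(workflows))  # ['dws_aggregate_sales']
```

Running such a check in CI keeps workflow growth visible before it becomes a maintenance problem.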

&lt;h3&gt;
  
  
  Upgrade Governance When Complexity Increases
&lt;/h3&gt;

&lt;p&gt;When the number of workflows grows too large or directory structures become unmanageable, relying on labels or folders is no longer sufficient. At this point, governance should be elevated to a higher level, such as introducing additional project segmentation.&lt;/p&gt;

&lt;p&gt;This is not merely structural optimization, but an evolution of governance strategy.&lt;/p&gt;

&lt;p&gt;Once design principles are clear, implementation should align with team size. There is no single solution that fits all teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation Strategies for Different Team Sizes
&lt;/h2&gt;

&lt;p&gt;DataOps does not have a universal solution. The right approach depends on team size and system complexity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large Teams with Layered Isolation
&lt;/h3&gt;

&lt;p&gt;In large or complex data warehouse environments, multiple business domains, permission boundaries, and data pipelines coexist. In such cases, data warehouse layers such as ODS, DWD, DWS, and ADS should be mapped to different projects and workflows.&lt;/p&gt;

&lt;p&gt;Dependencies across projects and workflows must be clearly defined. Impact analysis tools should be used for global governance to ensure changes do not introduce cascading failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Medium Sized Teams with Balanced Design
&lt;/h3&gt;

&lt;p&gt;For medium-sized teams, the goal is to maintain stability while avoiding unnecessary complexity.&lt;/p&gt;

&lt;p&gt;Projects should not be overly fragmented, and workflows should not be split excessively. Instead, different scheduling cycles such as daily and monthly jobs can be connected through well-defined dependencies.&lt;/p&gt;

&lt;p&gt;The focus at this stage should be on unified scheduling strategies and resource pool management rather than introducing overly complex governance frameworks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Small Teams with Fast Execution
&lt;/h3&gt;

&lt;p&gt;For small teams or early-stage projects, the priority is to establish a working delivery pipeline.&lt;/p&gt;

&lt;p&gt;A single workflow can be used to handle core business processes, supported by naming conventions, alerting mechanisms, and backfill strategies to ensure baseline quality. As complexity increases, the system can gradually evolve toward more fine-grained structures.&lt;/p&gt;

&lt;p&gt;This approach keeps costs under control while avoiding overly heavy design in the early stages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;From Project to Workflow to Task, WhaleStudio’s three-layer model provides a clear division of responsibilities. Projects define governance boundaries, workflows manage business orchestration, and tasks handle execution.&lt;/p&gt;

&lt;p&gt;With well-designed permission models and properly structured workflows, systems can remain stable and controllable even as complexity grows.&lt;/p&gt;

&lt;p&gt;The essence of DataOps lies not in the tools themselves, but in building an engineering system that can evolve sustainably. Only when permissions, resources, and execution logic are governed under a unified framework can a data platform truly support long-term business growth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Previous Articles
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/@apacheseatunnel/5-when-your-data-warehouse-breaks-down-its-probably-a-naming-problem-32ba42558db1" rel="noopener noreferrer"&gt;(5) When Your Data Warehouse Breaks Down, It’s Probably a Naming Problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/codex/4-why-your-ads-layer-always-goes-wild-and-how-a-strong-dws-layer-fixes-it-4fddecde4288?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;(4) Why Your ADS Layer Always Goes Wild and How a Strong DWS Layer Fixes It&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;(3) Key Design Principles for ODS/Detail Layer Implementation: Building the Data Ingestion Layer as a “Stable and Operable” Infrastructure&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@apacheseatunnel/i-a-complete-guide-to-building-and-standardizing-a-modern-lakehouse-architecture-an-overview-of-9a2a263f2f1b?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;(I) A Complete Guide to Building and Standardizing a Modern Lakehouse Architecture: An Overview of Data Warehouses and Data Lakes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Coming Next
&lt;/h2&gt;

&lt;p&gt;Part 7: Scheduling design best practices&lt;/p&gt;




</description>
      <category>dataops</category>
      <category>ai</category>
      <category>database</category>
      <category>terraform</category>
    </item>
    <item>
      <title>You Don’t Apply to Become an ASF Member, You Grow Into It</title>
      <dc:creator>Apache SeaTunnel</dc:creator>
      <pubDate>Fri, 10 Apr 2026 09:11:30 +0000</pubDate>
      <link>https://forem.com/seatunnel/you-dont-apply-to-become-an-asf-member-you-grow-into-it-4oa8</link>
      <guid>https://forem.com/seatunnel/you-dont-apply-to-become-an-asf-member-you-grow-into-it-4oa8</guid>
      <description>&lt;p&gt;Very few people set “becoming an ASF Member” as a clear goal.&lt;/p&gt;

&lt;p&gt;Not because it lacks appeal, but because there is no application process and no defined path. It is more of an outcome, something that happens after sustained contributions are naturally recognized within a community.&lt;/p&gt;

&lt;p&gt;Fan Jia followed exactly that kind of path.&lt;/p&gt;

&lt;p&gt;Recently, he was invited to join the Apache Software Foundation as a Member. Taking this opportunity, we had an in-depth conversation with him. More than a recognition of achievement, the discussion felt like a reflection on his journey—from data integration, to open source participation, to system design and community understanding—tracing how an engineer gradually arrives at this point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqij6yoerzb0vvm4ozss.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqij6yoerzb0vvm4ozss.jpg" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting from Data Integration
&lt;/h2&gt;

&lt;p&gt;Fan Jia’s current work focuses on data integration, particularly in areas such as data synchronization, Change Data Capture, and data infrastructure. As he describes it, his day-to-day work can be distilled into one core objective: enabling data to flow reliably across different systems.&lt;/p&gt;

&lt;p&gt;In practice, this is far more complex than it sounds. It involves synchronizing data between heterogeneous systems, handling schema evolution, and ensuring stability in complex production environments. Alongside this, he has been actively contributing to the Apache SeaTunnel community over the long term.&lt;/p&gt;

&lt;p&gt;What stands out is that his starting point was not open source itself, but a set of concrete and persistent engineering problems. Those problems became the foundation for his later involvement in open source.&lt;/p&gt;

&lt;h2&gt;
  
  
  How He Got Into Open Source
&lt;/h2&gt;

&lt;p&gt;When asked how he first got involved in open source, his answer was straightforward—it started with his job. After joining WhaleOps, he became involved in the development, maintenance, and partial architectural design of Apache SeaTunnel.&lt;/p&gt;

&lt;p&gt;In the early stage, his contributions were similar to those of most engineers, focusing on solving specific issues such as fixing bugs and improving features. Over time, however, his attention shifted toward system design and how the project could run reliably across broader and more diverse scenarios.&lt;/p&gt;

&lt;p&gt;This transition did not happen overnight. It emerged gradually through continuous involvement. As his focus moved from isolated problems to the system as a whole, his role evolved along with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  From User to Maintainer
&lt;/h2&gt;

&lt;p&gt;He describes this phase as a shift in perspective and responsibility.&lt;/p&gt;

&lt;p&gt;As a user, the focus is on whether a feature exists and whether it meets immediate needs. As a maintainer, the concerns expand to system stability, backward compatibility, adaptability across different use cases, and the real experience of community users.&lt;/p&gt;

&lt;p&gt;At the same time, the sense of responsibility becomes more concrete. Writing code is no longer just about completing a task. It becomes part of maintaining a system that runs in real production environments, making every technical decision more deliberate.&lt;/p&gt;

&lt;p&gt;Once this shift in perspective happens, the truly complex problems begin to surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Memorable Technical Challenge
&lt;/h2&gt;

&lt;p&gt;During his time contributing to SeaTunnel, one of the most memorable challenges was building the Zeta engine from scratch.&lt;/p&gt;

&lt;p&gt;This was not about solving a single isolated issue, but about tackling a combination of complex system-level problems. At the execution model level, the engine needed to support both batch and stream processing, balancing throughput and latency while avoiding bottlenecks under high concurrency.&lt;/p&gt;

&lt;p&gt;From a concurrency perspective, multi-threaded execution introduced challenges such as race conditions, deadlocks, and unpredictable execution order. These issues are often difficult to reproduce and tend to surface only after prolonged runtime.&lt;/p&gt;

&lt;p&gt;In terms of resource management, real production workloads involve long-running tasks and large data volumes. Memory control, thread pool isolation, and backpressure handling become critical. Out-of-memory errors are especially dangerous, as they can impact not only individual tasks but the stability of the entire service process.&lt;/p&gt;

&lt;p&gt;For stability and recoverability, the system must guarantee no data loss, avoid uncontrolled duplication, and correctly restore state after failures or restarts. This typically requires integrating checkpointing and state management mechanisms.&lt;/p&gt;

&lt;p&gt;Overall, this was not a single technical problem, but a full-scale systems engineering challenge.&lt;/p&gt;

&lt;p&gt;These experiences also shaped how he understands collaboration in open source.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Most Important Skill in Open Source
&lt;/h2&gt;

&lt;p&gt;When asked what matters most in an open source community, his answer was patience.&lt;/p&gt;

&lt;p&gt;A pull request in open source rarely gets merged immediately. It usually goes through multiple stages, including initial implementation, community review, several rounds of revision, CI validation, and documentation updates. Along the way, various issues can arise. Without patience, it is easy to give up midway.&lt;/p&gt;

&lt;p&gt;However, consistently pushing through these details is exactly what defines high-quality contributions.&lt;/p&gt;

&lt;p&gt;This understanding of the process is also reflected in his advice to newcomers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advice for New Contributors
&lt;/h2&gt;

&lt;p&gt;For developers just getting started in open source, he believes the most important things are curiosity and the willingness to act.&lt;/p&gt;

&lt;p&gt;Often, the biggest barrier is not technical difficulty, but simply not getting started. Once you take the first step—submitting a small PR or joining a discussion—everything else tends to follow naturally.&lt;/p&gt;

&lt;p&gt;He also emphasizes the importance of expressing your own ideas and even questioning existing designs. Open source communities are inherently open environments, and everyone starts as a beginner.&lt;/p&gt;

&lt;p&gt;As participation deepens, feedback from the community becomes more visible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Moment He Became an ASF Member
&lt;/h2&gt;

&lt;p&gt;When he learned that he had become an ASF Member, his first reaction was excitement and happiness.&lt;/p&gt;

&lt;p&gt;Unlike many achievements, this is not something you apply for. It is a recognition from the community based on long-term contributions, which makes it especially meaningful.&lt;/p&gt;

&lt;p&gt;At the same time, he sees it not just as an honor, but as an increase in responsibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Role Means
&lt;/h2&gt;

&lt;p&gt;In his view, being an ASF Member is fundamentally about responsibility.&lt;/p&gt;

&lt;p&gt;It is not only about continuing technical contributions, but also about fostering a healthy community, helping new contributors grow, and participating in higher-level governance. It also means being accountable to users, ensuring that projects run reliably in real-world environments.&lt;/p&gt;

&lt;p&gt;As his role evolves, so does his understanding of the community.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding The Apache Way
&lt;/h2&gt;

&lt;p&gt;He summarizes his understanding of The Apache Way in one phrase: Community Over Code.&lt;/p&gt;

&lt;p&gt;The long-term success of an open source project depends not only on its technology but also on whether it maintains open and transparent decision-making, encourages contributors from diverse backgrounds, and builds governance based on consensus.&lt;/p&gt;

&lt;p&gt;These factors ultimately determine the vitality of a project.&lt;/p&gt;

&lt;p&gt;With this perspective, he approaches projects from a broader viewpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  How He Sees SeaTunnel
&lt;/h2&gt;

&lt;p&gt;In his view, SeaTunnel’s strengths lie in several areas.&lt;/p&gt;

&lt;p&gt;From an architectural standpoint, it supports a multi-engine model, allowing users to choose the most suitable execution engine for different scenarios. From an ecosystem perspective, it provides a rich set of connectors, enabling integration with various databases, data lakes, and messaging systems.&lt;/p&gt;

&lt;p&gt;In terms of capabilities, CDC is a key strength, supporting both data change capture and schema evolution, making the system more adaptable to complex production environments.&lt;/p&gt;

&lt;p&gt;At the same time, despite these capabilities, SeaTunnel maintains a relatively lightweight design, allowing users to adopt and use it at a lower cost.&lt;/p&gt;

&lt;p&gt;These insights come from long-term hands-on experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Open Source Changed Him
&lt;/h2&gt;

&lt;p&gt;Open source has had a significant impact on his career, especially in how he approaches problems.&lt;/p&gt;

&lt;p&gt;Within a company, systems are usually designed around specific business needs. In open source, however, solutions must consider much broader and more general use cases, which pushes engineers to make longer-term architectural decisions.&lt;/p&gt;

&lt;p&gt;Collaborating with developers from different companies and backgrounds also expands one’s technical perspective.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Sentence About Open Source
&lt;/h2&gt;

&lt;p&gt;When asked to summarize open source in one sentence, he said:&lt;/p&gt;

&lt;p&gt;“Open source is not just about sharing code; it is a process where developers and communities grow together.”&lt;/p&gt;

&lt;p&gt;It may sound simple, but when viewed in the context of his journey, it is less a conclusion and more a natural outcome.&lt;/p&gt;

&lt;p&gt;From solving concrete data problems, to participating in system design, to thinking about how projects run reliably across different scenarios, and eventually to engaging in community collaboration and consensus building, there is no clear boundary between these stages.&lt;/p&gt;

&lt;p&gt;It is a continuous process where perspective gradually expands through doing the work.&lt;/p&gt;

&lt;p&gt;Becoming an ASF Member is not the end of this journey, but a milestone along the way. It reflects recognition of past contributions and signals greater responsibility ahead.&lt;/p&gt;

&lt;p&gt;If there is one deeper takeaway from this experience, it may not be a specific technology or a single project, but a more enduring capability:&lt;/p&gt;

&lt;p&gt;The ability to keep investing in uncertainty and to continue doing the right thing even when there is no immediate reward.&lt;/p&gt;




&lt;p&gt;About Apache SeaTunnel&lt;br&gt;
Apache SeaTunnel is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data and can synchronize hundreds of billions of records per day stably and efficiently.&lt;/p&gt;

&lt;p&gt;Welcome to fill out this form to be a speaker of Apache SeaTunnel: &lt;a href="https://forms.gle/vtpQS6ZuxqXMt6DT6" rel="noopener noreferrer"&gt;https://forms.gle/vtpQS6ZuxqXMt6DT6&lt;/a&gt; :)&lt;/p&gt;

&lt;p&gt;Why do we need Apache SeaTunnel?&lt;br&gt;
Apache SeaTunnel does everything it can to solve the problems you may encounter in synchronizing massive amounts of data.&lt;br&gt;
Data loss and duplication&lt;br&gt;
Task buildup and latency&lt;br&gt;
Low throughput&lt;br&gt;
Long application-to-production cycle time&lt;br&gt;
Lack of application status monitoring&lt;/p&gt;

&lt;p&gt;Apache SeaTunnel Usage Scenarios&lt;br&gt;
Massive data synchronization&lt;br&gt;
Massive data integration&lt;br&gt;
ETL of large volumes of data&lt;br&gt;
Massive data aggregation&lt;br&gt;
Multi-source data processing&lt;/p&gt;

&lt;p&gt;Features of Apache SeaTunnel&lt;br&gt;
Rich components&lt;br&gt;
High scalability&lt;br&gt;
Easy to use&lt;br&gt;
Mature and stable&lt;/p&gt;

&lt;p&gt;How to get started with Apache SeaTunnel quickly?&lt;br&gt;
Want to experience Apache SeaTunnel quickly? SeaTunnel 2.1.0 takes 10 seconds to get you up and running.&lt;br&gt;
&lt;a href="https://seatunnel.apache.org/docs/2.1.0/developement/setup" rel="noopener noreferrer"&gt;https://seatunnel.apache.org/docs/2.1.0/developement/setup&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How can I contribute?&lt;br&gt;
We invite all partners who are interested in making local open-source global to join the Apache SeaTunnel contributors family and foster open-source together!&lt;/p&gt;

&lt;p&gt;Submit an issue:&lt;br&gt;
&lt;a href="https://github.com/apache/seatunnel/issues" rel="noopener noreferrer"&gt;https://github.com/apache/seatunnel/issues&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Contribute code to:&lt;br&gt;
&lt;a href="https://github.com/apache/seatunnel/pulls" rel="noopener noreferrer"&gt;https://github.com/apache/seatunnel/pulls&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Subscribe to the community development mailing list :&lt;br&gt;
&lt;a href="mailto:dev-subscribe@seatunnel.apache.org"&gt;dev-subscribe@seatunnel.apache.org&lt;/a&gt;&lt;br&gt;
Development Mailing List :&lt;br&gt;
&lt;a href="mailto:dev@seatunnel.apache.org"&gt;dev@seatunnel.apache.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Join Slack:&lt;br&gt;
&lt;a href="https://join.slack.com/t/apacheseatunnel/shared_invite/zt-3uouszk3m-PtLLNyZsJVqE5Gb6gn24mA" rel="noopener noreferrer"&gt;https://join.slack.com/t/apacheseatunnel/shared_invite/zt-3uouszk3m-PtLLNyZsJVqE5Gb6gn24mA&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Follow us on Twitter:&lt;br&gt;
&lt;a href="https://twitter.com/ASFSeaTunnel" rel="noopener noreferrer"&gt;https://twitter.com/ASFSeaTunnel&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Join us now!❤️❤️&lt;/p&gt;

</description>
      <category>asf</category>
      <category>ai</category>
      <category>opensource</category>
      <category>apacheseatunnel</category>
    </item>
    <item>
      <title>What Happened in Apache SeaTunnel? This March You Shouldn’t Miss</title>
      <dc:creator>Apache SeaTunnel</dc:creator>
      <pubDate>Fri, 10 Apr 2026 07:06:02 +0000</pubDate>
      <link>https://forem.com/seatunnel/what-happened-in-apache-seatunnel-this-march-you-shouldnt-miss-2l12</link>
      <guid>https://forem.com/seatunnel/what-happened-in-apache-seatunnel-this-march-you-shouldnt-miss-2l12</guid>
      <description>&lt;p&gt;Hey there! The March 2026 report is here. The Apache SeaTunnel community has been incredibly active. A total of 26 contributors participated, version 2.3.13 was released, five new connectors were added, and major improvements were made across the core engine, file connectors, CDC, and Transform modules. More than 20 bugs were also fixed.&lt;/p&gt;

&lt;p&gt;On top of that, infrastructure upgrades were rolled out. Whether you’re an enterprise or individual user, it’s a great time to upgrade, explore new features, and stay in sync with the community momentum.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fodj024zrqk6ky1zx1isr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fodj024zrqk6ky1zx1isr.jpg" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reporting period: March 1, 2026 to March 30, 2026&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Release Overview
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Release Date&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2.3.13&lt;/td&gt;
&lt;td&gt;March 14, 2026&lt;/td&gt;
&lt;td&gt;Released this month with 50+ new features and 20+ bug fixes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Download&lt;br&gt;
&lt;a href="https://seatunnel.apache.org/download" rel="noopener noreferrer"&gt;https://seatunnel.apache.org/download&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Key Updates in Version 2.3.13
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 New Connectors
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connector&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;HugeGraph Sink&lt;/td&gt;
&lt;td&gt;Adds support for Apache HugeGraph&lt;/td&gt;
&lt;td&gt;#10002&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DuckDB&lt;/td&gt;
&lt;td&gt;Introduces DuckDB as both Source and Sink&lt;/td&gt;
&lt;td&gt;#10285&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lance&lt;/td&gt;
&lt;td&gt;Adds support for writing to Lance datasets&lt;/td&gt;
&lt;td&gt;#9894&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS DSQL&lt;/td&gt;
&lt;td&gt;Adds AWS DSQL Sink connector&lt;/td&gt;
&lt;td&gt;#9739&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IoTDB&lt;/td&gt;
&lt;td&gt;Adds Source and Sink support for IoTDB 2.x&lt;/td&gt;
&lt;td&gt;#9872&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2.2 Core Engine Enhancements
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Zeta Engine&lt;/td&gt;
&lt;td&gt;Supports arbitrarily nested arrays and map types&lt;/td&gt;
&lt;td&gt;#9881&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zeta Engine&lt;/td&gt;
&lt;td&gt;Adds min-pause checkpoint configuration&lt;/td&gt;
&lt;td&gt;#9804&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zeta Engine&lt;/td&gt;
&lt;td&gt;Introduces REST API to inspect pending queue details&lt;/td&gt;
&lt;td&gt;#10078&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flink&lt;/td&gt;
&lt;td&gt;Adds support for Flink 1.20.1&lt;/td&gt;
&lt;td&gt;#9576&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flink&lt;/td&gt;
&lt;td&gt;Enables schema evolution for CDC sources&lt;/td&gt;
&lt;td&gt;#9867&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metrics&lt;/td&gt;
&lt;td&gt;Adds sink committed metrics and commit rate calculation&lt;/td&gt;
&lt;td&gt;#10233&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2.3 File Connector Improvements
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connector&lt;/th&gt;
&lt;th&gt;Enhancement&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;HdfsFile&lt;/td&gt;
&lt;td&gt;Enables parallel reading for large files&lt;/td&gt;
&lt;td&gt;#10332&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LocalFile&lt;/td&gt;
&lt;td&gt;Supports chunked parallel reading for CSV, TEXT, JSON files&lt;/td&gt;
&lt;td&gt;#10142&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parquet&lt;/td&gt;
&lt;td&gt;Adds logical partitioning support&lt;/td&gt;
&lt;td&gt;#10239&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HdfsFile and LocalFile&lt;/td&gt;
&lt;td&gt;Adds sync_mode=update support&lt;/td&gt;
&lt;td&gt;#10437, #10268&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HBase&lt;/td&gt;
&lt;td&gt;Supports time-range scanning&lt;/td&gt;
&lt;td&gt;#10318&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hive&lt;/td&gt;
&lt;td&gt;Supports automatic failover across multiple Metastore URIs&lt;/td&gt;
&lt;td&gt;#10253&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2.4 CDC Improvements
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Maxwell Canal Debezium&lt;/td&gt;
&lt;td&gt;Optimizes JSON format and supports merging update_before and update_after&lt;/td&gt;
&lt;td&gt;#9805&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka&lt;/td&gt;
&lt;td&gt;Adds Protobuf deserialization support via Schema Registry wire format&lt;/td&gt;
&lt;td&gt;#10183&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka&lt;/td&gt;
&lt;td&gt;Injects record timestamp as EventTime metadata&lt;/td&gt;
&lt;td&gt;#9994&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MySQL CDC&lt;/td&gt;
&lt;td&gt;Optimizes wait time for schema evolution&lt;/td&gt;
&lt;td&gt;#10040&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2.5 Transform Enhancements
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Transformation&lt;/th&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Multimodal Embeddings&lt;/td&gt;
&lt;td&gt;Adds support for multimodal embeddings&lt;/td&gt;
&lt;td&gt;#9673&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RegexExtract&lt;/td&gt;
&lt;td&gt;Introduces regex-based extraction transform&lt;/td&gt;
&lt;td&gt;#9829&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL to Paimon&lt;/td&gt;
&lt;td&gt;Adds support for MERGE INTO syntax&lt;/td&gt;
&lt;td&gt;#10206&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  3. Bug Fixes in Version 2.3.13
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CSV Reader&lt;/td&gt;
&lt;td&gt;Fixes parsing failure caused by empty first column&lt;/td&gt;
&lt;td&gt;#10383&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClickHouse&lt;/td&gt;
&lt;td&gt;Improves batch parallel reads by replacing limit offset with last batch sort value&lt;/td&gt;
&lt;td&gt;#9801&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;Adds support for TIMESTAMP_TZ type&lt;/td&gt;
&lt;td&gt;#10048&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redis&lt;/td&gt;
&lt;td&gt;Fixes cluster mode bug and adds end-to-end tests&lt;/td&gt;
&lt;td&gt;#9869&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MongoDB&lt;/td&gt;
&lt;td&gt;Improves writer close logic&lt;/td&gt;
&lt;td&gt;#10051&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elasticsearch&lt;/td&gt;
&lt;td&gt;Optimizes resource cleanup for Scroll API&lt;/td&gt;
&lt;td&gt;#10124&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MySQL CDC&lt;/td&gt;
&lt;td&gt;Optimizes schema evolution wait time&lt;/td&gt;
&lt;td&gt;#10040&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  4. Community Highlights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Contributors in March 2026
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Contributor&lt;/th&gt;
&lt;th&gt;PR Count&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🏅&lt;/td&gt;
&lt;td&gt;@zhangshenghang&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;@yzeng1618&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;@davidzollo&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🥈&lt;/td&gt;
&lt;td&gt;@chl-wxp&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;@liunaijie&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;@dybyte&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;@ricky2129&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🥉&lt;/td&gt;
&lt;td&gt;@corgy-w&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@zooo-code&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@kuleat&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@LeonYoah&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@OmkarK-7&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@icekimchi&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@assokhi&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@Sephiroth1024&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@Best2Two&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@ic4y&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@misi1987107&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@CosmosNi&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@chocoboxxf&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@xiaochen-zhou&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@qingzheguo-flash&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;a class="mentioned-user" href="https://dev.to/rameshreddy-adutla"&gt;@rameshreddy-adutla&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@CNF96&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@MuraliMon&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;@ocean-zhc&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Contributor&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A total of 51 PRs were merged in March. Huge thanks to all 26 contributors.&lt;/p&gt;

&lt;p&gt;Full contributor list:&lt;br&gt;
&lt;a href="https://github.com/apache/seatunnel/graphs/contributors" rel="noopener noreferrer"&gt;https://github.com/apache/seatunnel/graphs/contributors&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Infrastructure Updates
&lt;/h3&gt;

&lt;p&gt;End-to-end test Docker images migrated to the seatunnelhub repository&lt;br&gt;
JDK Docker images upgraded&lt;br&gt;
CI timeouts optimized, with Kafka set to 140 minutes and Kudu to 60 minutes&lt;br&gt;
Added Metalake support for managing data source metadata&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Recommendations for Enterprises
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Upgrade Guidance
&lt;/h3&gt;

&lt;p&gt;We strongly recommend upgrading production environments to version 2.3.13&lt;br&gt;
This release includes more than 50 new features and over 20 bug fixes&lt;/p&gt;

&lt;h3&gt;
  
  
  Features to Watch
&lt;/h3&gt;

&lt;p&gt;New connectors including HugeGraph, DuckDB, IoTDB, AWS DSQL, and Lance&lt;br&gt;
Improved large file processing with parallel chunked reads in HdfsFile and LocalFile&lt;br&gt;
Enhanced CDC capabilities including schema evolution and multi-format Kafka support&lt;br&gt;
Improved observability with new sink committed metrics&lt;br&gt;
Support for Flink 1.20.1&lt;/p&gt;

&lt;h3&gt;
  
  
  Notes
&lt;/h3&gt;

&lt;p&gt;Some connector APIs have changed, so reviewing the upgrade documentation is recommended&lt;br&gt;
Using the seatunnelhub image repository is strongly encouraged&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Key Metrics
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;March Data&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Releases&lt;/td&gt;
&lt;td&gt;1 release (2.3.13)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New Connectors&lt;/td&gt;
&lt;td&gt;5+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feature Enhancements&lt;/td&gt;
&lt;td&gt;50+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bug Fixes&lt;/td&gt;
&lt;td&gt;20+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Contributors&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  7. What’s Coming Next
&lt;/h2&gt;

&lt;p&gt;Further optimization of CDC performance&lt;br&gt;
More cloud-native data source integrations&lt;br&gt;
Improved metrics and monitoring capabilities&lt;/p&gt;

&lt;p&gt;Compiled and edited by the SeaTunnel Community&lt;/p&gt;

</description>
      <category>seatunnel</category>
      <category>opensource</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>(5) When Your Data Warehouse Breaks Down, It’s Probably a Naming Problem</title>
      <dc:creator>Apache SeaTunnel</dc:creator>
      <pubDate>Fri, 03 Apr 2026 06:59:33 +0000</pubDate>
      <link>https://forem.com/seatunnel/5when-your-data-warehouse-breaks-down-its-probably-a-naming-problem-3p1c</link>
      <guid>https://forem.com/seatunnel/5when-your-data-warehouse-breaks-down-its-probably-a-naming-problem-3p1c</guid>
      <description>&lt;p&gt;As a data warehouse grows, the first thing that tends to get out of control is not the data itself—but naming. Naming conventions may seem like a minor detail, but they directly determine whether data is easy to find, understand, and maintain. As the fifth article in the Data Lakehouse Design and Practice series, this article starts from real-world usage and summarizes core methods for table and field naming. By combining layered prefixes, unified terminology (word roots), and cycle encoding, table names become self-explanatory. Together with metric naming and governance processes, this helps build a clear and collaborative data system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Goals and Methods of Naming Conventions: Make Table Names Self-Explanatory So Teams Can Work Without Friction
&lt;/h2&gt;

&lt;p&gt;In a data warehouse system, naming conventions are not just about form—they are foundational infrastructure that directly impacts collaboration efficiency and data quality. A good naming system has one core goal: make the table name itself carry enough information so that people can understand what the table is, where it comes from, and how to use it—without needing extra documentation. Ideally, a table name should be “readable at a glance” and include key information such as data layer, owning team, business domain, subject domain, core object meaning, and update cycle or data scope. When these elements are systematically encoded into table names, data discovery, metric interpretation, troubleshooting, and team handovers all become significantly more efficient, reducing communication costs.&lt;/p&gt;

&lt;p&gt;A naming system is essentially a “word root system” that standardizes business language. For example, the same business object must use the same term consistently across tables (e.g., avoid mixing “rack” and “shelf”). Similarly, metric naming should follow unified rules—for instance, all ratio-type metrics should use the &lt;code&gt;_rate&lt;/code&gt; suffix, avoiding ambiguity from mixing terms like ratio, percent, or rt.&lt;/p&gt;

&lt;p&gt;Layer prefixes must be strictly standardized. They allow users to immediately identify the data layer and purpose of a table: &lt;code&gt;ods_&lt;/code&gt; for source-aligned data, &lt;code&gt;dwd_&lt;/code&gt; for detailed standardized data, &lt;code&gt;dws_&lt;/code&gt; for aggregated data, &lt;code&gt;ads_&lt;/code&gt; for application-facing outputs, and &lt;code&gt;dim_&lt;/code&gt; for shared dimensions. These prefixes are not just naming conventions—they directly reflect the data architecture.&lt;/p&gt;

&lt;p&gt;Another often overlooked but critical aspect is encoding update cycles or data scope into table names. For example, &lt;code&gt;_1d&lt;/code&gt; represents the last day, &lt;code&gt;_td&lt;/code&gt; means up to today, and &lt;code&gt;_7d&lt;/code&gt; means the last seven days. This prevents confusion between tables with the same name but different time semantics, reducing the risk of metric misuse.&lt;/p&gt;

&lt;p&gt;At the asset management level, table types must be clearly distinguished. Production tables are long-term assets, intermediate tables serve only processing workflows and should have retention policies, and temporary tables are for one-time validation and must not enter production pipelines. Prefixes like &lt;code&gt;mid_&lt;/code&gt; and &lt;code&gt;tmp_&lt;/code&gt; help prevent data asset pollution at the source.&lt;/p&gt;

&lt;p&gt;Finally, naming conventions must be integrated with governance processes. Any new table or field must include complete metadata such as owner, field definitions, metric definitions, update frequency, dependencies, and lifecycle. Tables without such metadata may be usable in the short term but will almost certainly become technical debt in the long run. In practice, it is best to standardize templates first—ensuring key fields like layer, domain, and cycle are strictly consistent—while allowing limited flexibility in non-critical parts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table Naming Conventions: Templates, Cycle Encoding, and Examples
&lt;/h2&gt;

&lt;p&gt;In practice, table naming should follow a structured template to ensure completeness and consistency. A general template can be defined as &lt;code&gt;{layer}_{dept}_{biz_domain}_{subject}_{object}_{cycle_or_range}&lt;/code&gt;, where each component has a clear role: layer indicates data level, dept indicates ownership, biz_domain defines the business domain, subject represents analytical abstraction, object defines the entity or behavior, and cycle_or_range specifies the time scope.&lt;/p&gt;
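&lt;p&gt;As a minimal sketch, the template above can be turned into a small name builder. The component values and the layer-prefix whitelist below are illustrative examples drawn from this article, not an official specification, and the mapping of the sample name onto the template components is one plausible reading.&lt;/p&gt;

```python
from dataclasses import dataclass

# Whitelist of layer prefixes mentioned in this article (illustrative, not exhaustive).
VALID_LAYERS = {"ods", "dwd", "dws", "ads", "dim", "mid", "tmp"}

@dataclass
class TableName:
    layer: str           # data layer prefix, e.g. dws
    dept: str            # owning team
    biz_domain: str      # business domain
    subject: str         # analytical subject
    obj: str             # entity or behavior ({object} in the template)
    cycle_or_range: str  # time scope, e.g. 1d, td, 7d

    def render(self):
        # Reject names that do not start with a known layer prefix.
        if self.layer not in VALID_LAYERS:
            raise ValueError(f"unknown layer prefix: {self.layer}")
        return "_".join([self.layer, self.dept, self.biz_domain,
                         self.subject, self.obj, self.cycle_or_range])

print(TableName("dws", "asale", "trd", "byr", "subpay", "1d").render())
# dws_asale_trd_byr_subpay_1d
```

&lt;p&gt;Encoding the template as a structure, rather than assembling strings by hand, lets a review tool reject non-conforming names before a table ever reaches production.&lt;/p&gt;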

&lt;p&gt;Cycle and range encoding is especially important. Common patterns include &lt;code&gt;_1d&lt;/code&gt; (last day), &lt;code&gt;_td&lt;/code&gt; (to date), &lt;code&gt;_7d&lt;/code&gt; or &lt;code&gt;_30d&lt;/code&gt; (last N days). Additional markers can distinguish data types or update modes, such as &lt;code&gt;d&lt;/code&gt; for daily snapshots, &lt;code&gt;w&lt;/code&gt; for weekly data, &lt;code&gt;i&lt;/code&gt; for incremental tables, &lt;code&gt;f&lt;/code&gt; for full tables, and &lt;code&gt;l&lt;/code&gt; for slowly changing tables. These conventions allow users to quickly understand temporal semantics.&lt;/p&gt;
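&lt;p&gt;These suffix markers can be decoded mechanically. The lookup table below reproduces only the conventions listed in this article; any warehouse would extend it with its own vocabulary.&lt;/p&gt;

```python
# Suffix -> meaning, as described above (illustrative mapping only).
CYCLE = {
    "1d": "last day", "td": "to date", "7d": "last 7 days", "30d": "last 30 days",
    "d": "daily snapshot", "w": "weekly", "i": "incremental",
    "f": "full", "l": "slowly changing",
}

def describe_suffix(table_name):
    # The cycle/range marker is the last underscore-separated token.
    suffix = table_name.rsplit("_", 1)[-1]
    return CYCLE.get(suffix, "unknown")

print(describe_suffix("dws_trade_user_pay_1d"))  # last day
print(describe_suffix("dwd_trade_order_i"))      # incremental
```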

&lt;p&gt;For example, in the aggregation layer, &lt;code&gt;dws_asale_trd_byr_subpay_1d&lt;/code&gt; represents buyer-level, staged payment transactions aggregated over the last day, while &lt;code&gt;dws_asale_trd_itm_slr_hh&lt;/code&gt; represents hourly aggregation at the seller-item level. Although long, such names are highly informative and readable.&lt;/p&gt;

&lt;p&gt;Dimension tables follow a separate convention, using the &lt;code&gt;dim_&lt;/code&gt; prefix and a &lt;code&gt;{scope}_{object}&lt;/code&gt; structure, such as &lt;code&gt;dim_pub_area&lt;/code&gt; (public area dimension) or &lt;code&gt;dim_asale_item&lt;/code&gt; (item dimension), emphasizing cross-domain reuse.&lt;/p&gt;

&lt;p&gt;Intermediate tables should be tightly bound to their target tables, typically named as &lt;code&gt;mid_{target_table}_{suffix}&lt;/code&gt;, such as &lt;code&gt;mid_dws_xxx_01&lt;/code&gt;. Temporary tables must use the &lt;code&gt;tmp_&lt;/code&gt; prefix and are strictly limited to development or validation, never entering production dependencies. For manually maintained data, tables in the DWD layer can explicitly include &lt;code&gt;manual&lt;/code&gt;, such as &lt;code&gt;dwd_trade_manual_client_info_l&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Field and Metric Naming Conventions: Rules, Structure, and Examples
&lt;/h2&gt;

&lt;p&gt;At the field level, naming must be strictly standardized. All field names should use lowercase with underscores—camelCase is not allowed. Readability should take priority over brevity, and consistent naming must be maintained for the same semantic meaning.&lt;/p&gt;

&lt;p&gt;Partition fields should be unified globally—for example, &lt;code&gt;dt&lt;/code&gt; for date, &lt;code&gt;hh&lt;/code&gt; for hour, and &lt;code&gt;mi&lt;/code&gt; for minute—with fixed formats. This improves development efficiency and avoids confusion across tables.&lt;/p&gt;

&lt;p&gt;Field suffixes should clearly indicate meaning: &lt;code&gt;_cnt&lt;/code&gt; for counts, &lt;code&gt;_amt&lt;/code&gt; or &lt;code&gt;_price&lt;/code&gt; for monetary values (choose one consistently), and boolean fields should use the &lt;code&gt;is_&lt;/code&gt; prefix and never be nullable. These conventions allow users to infer data types and meanings at a glance.&lt;/p&gt;
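&lt;p&gt;A hedged sketch of how these field rules could be enforced as a lint pass. The checks below cover only the conventions named in this article (snake_case, the &lt;code&gt;is_&lt;/code&gt; prefix for booleans); a real linter would carry a larger rule set.&lt;/p&gt;

```python
import re

# Lowercase snake_case: starts with a letter, then letters, digits, underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")

def lint_field(name, is_boolean=False):
    """Return a list of convention violations for one column name."""
    problems = []
    if not SNAKE_CASE.match(name):
        problems.append("use lowercase_with_underscores, not camelCase")
    if is_boolean and not name.startswith("is_"):
        problems.append("boolean fields must use the is_ prefix")
    return problems

print(lint_field("payAmount"))                    # flags camelCase
print(lint_field("deleted", is_boolean=True))     # flags missing is_ prefix
print(lint_field("pay_amt"))                      # [] -- conforms
```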

&lt;p&gt;NULL handling must also follow consistent rules. Typically, dimension fields use &lt;code&gt;-1&lt;/code&gt; for unknown values, while metric fields use &lt;code&gt;0&lt;/code&gt; to indicate no occurrence. This prevents NULL propagation in aggregations and improves data stability.&lt;/p&gt;
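&lt;p&gt;The NULL-handling rule is simple enough to state as code: dimension fields default to &lt;code&gt;-1&lt;/code&gt; (unknown), metric fields to &lt;code&gt;0&lt;/code&gt; (no occurrence). The column role assignments in the example are hypothetical.&lt;/p&gt;

```python
DIM_DEFAULT = -1   # unknown dimension value
METRIC_DEFAULT = 0  # metric did not occur

def fill_defaults(row, dim_cols, metric_cols):
    """Replace NULLs per the convention, without mutating the input row."""
    out = dict(row)
    for c in dim_cols:
        if out.get(c) is None:
            out[c] = DIM_DEFAULT
    for c in metric_cols:
        if out.get(c) is None:
            out[c] = METRIC_DEFAULT
    return out

row = {"area_id": None, "pay_cnt": None, "pay_amt": 35}
print(fill_defaults(row, ["area_id"], ["pay_cnt", "pay_amt"]))
# {'area_id': -1, 'pay_cnt': 0, 'pay_amt': 35}
```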

&lt;p&gt;Metric naming should be structured as a combination of business qualifier, time qualifier, aggregation method, and base metric. For example, &lt;code&gt;trade_amt&lt;/code&gt; represents transaction amount, &lt;code&gt;install_poi_cnt&lt;/code&gt; represents installation point count, and &lt;code&gt;pay_succ_rate&lt;/code&gt; represents payment success rate. Aggregation methods should use fixed terms like &lt;code&gt;sum&lt;/code&gt;, &lt;code&gt;avg&lt;/code&gt;, &lt;code&gt;max&lt;/code&gt;, and &lt;code&gt;min&lt;/code&gt;, avoiding inconsistent alternatives like “total.”&lt;/p&gt;
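&lt;p&gt;The metric grammar above can also be checked programmatically. The suffix and banned-term vocabularies below are taken from this article's examples and are illustrative, not exhaustive.&lt;/p&gt;

```python
# Terms the article says to avoid in favor of the unified vocabulary.
BANNED = {"total", "ratio", "percent", "rt"}
# Standard unit suffixes from the field conventions: _cnt, _amt, plus _rate for ratios.
SUFFIXES = {"cnt", "amt", "rate"}

def metric_name(parts):
    """Join qualifier/aggregation/base tokens into a metric name, enforcing the rules."""
    for p in parts:
        if p in BANNED:
            raise ValueError(f"'{p}' is not allowed; use sum/avg/max/min and _rate/_cnt/_amt")
    if parts[-1] not in SUFFIXES:
        raise ValueError(f"metric must end with one of {sorted(SUFFIXES)}")
    return "_".join(parts)

print(metric_name(["trade", "amt"]))         # trade_amt
print(metric_name(["pay", "succ", "rate"]))  # pay_succ_rate
```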

&lt;p&gt;A full example from fields to metrics: in the detail layer, an incremental order table might be named &lt;code&gt;dwd_trade_order_i&lt;/code&gt;, containing fields such as order ID, user ID, payment amount, order status, and partition keys. In the aggregation layer, &lt;code&gt;dws_trade_user_pay_1d&lt;/code&gt; summarizes user-level payments over the last day, including metrics like payment success count, total payment amount, and success rate. Finally, in the application layer, a table like &lt;code&gt;ads_fin_kpi_board_d&lt;/code&gt; provides business-facing dashboards with KPIs such as GMV, refund amount, net revenue, and number of paying users.&lt;/p&gt;

&lt;p&gt;By standardizing naming across tables, fields, and metrics, a data warehouse can achieve clear semantics, consistent structure, and efficient collaboration. While such conventions may introduce some overhead initially, they are essential for scalability and team coordination in the long term.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Earlier Posts in This Series:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/codex/4-why-your-ads-layer-always-goes-wild-and-how-a-strong-dws-layer-fixes-it-4fddecde4288?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;(4)Why Your ADS Layer Always Goes Wild and How a Strong DWS Layer Fixes It&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;(3) Key Design Principles for ODS/Detail Layer Implementation: Building the Data Ingestion Layer as a “Stable and Operable” Infrastructure&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@apacheseatunnel/i-a-complete-guide-to-building-and-standardizing-a-modern-lakehouse-architecture-an-overview-of-9a2a263f2f1b?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;(I) A Complete Guide to Building and Standardizing a Modern Lakehouse Architecture: An Overview of Data Warehouses and Data Lakes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;strong&gt;Next Post:&lt;/strong&gt; (6) DataOps Development Standards and Best Practices&lt;/li&gt;

&lt;/ul&gt;

</description>
      <category>database</category>
      <category>datascience</category>
      <category>bigdata</category>
      <category>datawarehouse</category>
    </item>
    <item>
      <title>Growing with the Community: Zhang Shenghang’s Path to Apache SeaTunnel PMC Member</title>
      <dc:creator>Apache SeaTunnel</dc:creator>
      <pubDate>Fri, 03 Apr 2026 02:55:16 +0000</pubDate>
      <link>https://forem.com/seatunnel/growing-with-the-community-zhang-shenghangs-path-to-apache-seatunnel-pmc-member-3co1</link>
      <guid>https://forem.com/seatunnel/growing-with-the-community-zhang-shenghangs-path-to-apache-seatunnel-pmc-member-3co1</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhipmcy6jrz7ao2ul4w5h.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhipmcy6jrz7ao2ul4w5h.jpg" width="800" height="377"&gt;&lt;/a&gt;&lt;br&gt;
🎉 Hi Community—more exciting news! Zhang Shenghang has been invited to join the Apache SeaTunnel PMC in recognition of his outstanding contributions—well deserved!&lt;/p&gt;

&lt;p&gt;Over the years, Zhang has been highly active in the Apache SeaTunnel community. From improving code quality, refining documentation, to engaging with the community and mentoring newcomers, his presence has been everywhere. He consistently embraces the Apache Way, contributing with dedication and passion to the growth of the project.&lt;/p&gt;

&lt;p&gt;We took this opportunity to conduct an in-depth interview with him. Covering his background, open source journey, PMC role, and thoughts on community development and culture, this conversation offers a closer look at his story and his enthusiasm for open source.&lt;/p&gt;

&lt;h2&gt;
  
  
  Personal Background &amp;amp; Open Source Journey
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Could you briefly introduce yourself and how you entered the big data and open source space?&lt;br&gt;
Name: Zhang Shenghang&lt;br&gt;
GitHub: zhangshenghang&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxvnu7d1ec2vu0l315yhw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxvnu7d1ec2vu0l315yhw.jpg" width="415" height="312"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;p&gt;When did you start contributing to Apache SeaTunnel, and what was the motivation?&lt;br&gt;
I started contributing to Apache SeaTunnel in June 2024. Initially, I was using DataX, a classic standalone data integration tool. However, it lacks service-oriented and distributed capabilities, which creates limitations in large-scale data synchronization scenarios. That’s when I came across Apache SeaTunnel as a more comprehensive solution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What key contributions or features have you worked on in SeaTunnel?&lt;br&gt;
He has contributed to multiple core features and improvements, including adding a pending queue feature for SeaTunnel Engine task scheduling, enabling Kafka Protobuf format support, introducing Kerberos testing in e2e workflows, implementing a new resource scheduling algorithm in SeaTunnel Engine, adding TTL support for HBase Sink, introducing API-based log retrieval, fixing Flink source 100% busy issues, supporting the Typesense connector, enabling default value substitution for configuration variables, fixing Doris custom SQL execution issues, correcting Kafka consumer offset auto-commit logic, and resolving RabbitMQ checkpoint issues in Flink mode.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Open Source Contributions &amp;amp; Growth
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Which contribution or experience impressed you the most?&lt;br&gt;
What impressed me most was not just submitting a PR, but the full process—from discovering a problem, analyzing it, discussing solutions with the community, to finally implementing and validating the fix. Issues involving engine scheduling, resource allocation, and Flink stability often look simple on the surface but are deeply tied to framework mechanisms and runtime behavior. Solving them requires both deep code understanding and close collaboration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What is the most important skill in open source collaboration?&lt;br&gt;
All are important, but if I had to choose one, it would be the ability to collaborate continuously. Technical skills are foundational, but communication is equally critical—open source is not just about writing code, but explaining context, design decisions, and trade-offs clearly so others can understand.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What advice would you give to beginners in open source?&lt;br&gt;
Don’t overestimate the difficulty. You don’t need to start with massive features or deep architectural changes. Fixing a bug, improving documentation, adding tests, or optimizing small features are all valuable contributions.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Becoming a PMC Member
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Congratulations on becoming a PMC Member! What was your first reaction?&lt;br&gt;
Thank you. My first reaction was both excitement and a strong sense of responsibility. It’s recognition of past contributions, but also a reminder that a PMC Member is not just a contributor, but a community builder.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What does becoming a PMC Member mean to you and the community?&lt;br&gt;
To me, it represents recognition of long-term contributions, collaboration ability, and responsibility. Personally, it means thinking beyond individual modules and considering the project’s overall development, governance, and ecosystem. For the community, more PMC Members mean more people willing to take responsibility and drive sustainable growth.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How important is the Apache Way to open source success?&lt;br&gt;
It emphasizes “Community Over Code.” A project succeeds not just because of good code, but because of an open, transparent, and sustainable collaboration culture.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  SeaTunnel Community Development
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;What key milestones has SeaTunnel gone through?&lt;br&gt;
SeaTunnel has evolved from a data synchronization tool into a more comprehensive data integration platform, expanding across connectors, orchestration, engines, and observability. The maturation of SeaTunnel Engine is a major turning point, enabling stronger unified execution capabilities. Additionally, increased community activity and internationalization have significantly boosted its impact.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How do you see SeaTunnel’s position and future?&lt;br&gt;
SeaTunnel is building a unique position by balancing rich connectors, strong engine capabilities, scalability, and enterprise readiness. Compared to traditional tools, it fits modern data infrastructure better; compared to heavyweight platforms, it remains flexible and extensible. It has strong potential to become a leading global open source data integration project.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What are your future plans as a PMC Member?&lt;br&gt;
I plan to focus on improving SeaTunnel Engine, scheduling, resource management, and system stability; strengthening connectors and production readiness; and helping new contributors onboard faster through issue guidance, PR reviews, and knowledge sharing.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Personal Growth &amp;amp; Open Source Culture
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;How has open source impacted your career and growth?&lt;br&gt;
Professionally, it has exposed me to real-world complex problems and high-standard collaboration environments. Personally, it has deepened my understanding of collaboration, responsibility, and long-term thinking. Open source has shaped not only my technical skills but also my mindset and working style.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How would you summarize the spirit of open source in one sentence?&lt;br&gt;
Open source is about collaboratively creating, improving, and sharing technology in an open and inclusive way for the benefit of everyone.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>asf</category>
      <category>community</category>
      <category>bigdata</category>
      <category>apacheseatunnel</category>
    </item>
    <item>
      <title>Rethinking ClassLoader Governance in Apache SeaTunnel</title>
      <dc:creator>Apache SeaTunnel</dc:creator>
      <pubDate>Fri, 03 Apr 2026 02:45:04 +0000</pubDate>
      <link>https://forem.com/seatunnel/rethinking-classloader-governance-in-apache-seatunnel-2leh</link>
      <guid>https://forem.com/seatunnel/rethinking-classloader-governance-in-apache-seatunnel-2leh</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwjud5he2ysxi7mt0jg01.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwjud5he2ysxi7mt0jg01.jpg" width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recently, while diving into the Apache SeaTunnel Zeta Engine codebase, I followed the ClassLoader thread and conducted a relatively systematic review.&lt;/p&gt;

&lt;p&gt;Overall, the current design already has a clear foundational structure, especially the centralized management approach of &lt;code&gt;ClassLoaderService&lt;/code&gt;, which is actually quite rare among similar systems 👍.&lt;/p&gt;

&lt;p&gt;Here, I try to take a different perspective—starting from &lt;strong&gt;“ClassLoader governance in long-running runtimes”&lt;/strong&gt;—to summarize some observations and outline a possible evolution path. These may not be entirely accurate, but are intended to spark discussion.&lt;/p&gt;

&lt;h2&gt;
  
  
  From “Usable” to “Governable”
&lt;/h2&gt;

&lt;p&gt;Apache SeaTunnel already handles multi-connector coexistence and dynamic loading and execution well. From a “functional availability” perspective, the mechanism works. But if we move one step further and ask: &lt;strong&gt;can ClassLoaders have a controllable lifecycle and verifiable reclamation?&lt;/strong&gt; the evaluation criteria begin to change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observations (Runtime-Oriented)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The Semantic Gap Between “Release” and “Close”
&lt;/h3&gt;

&lt;p&gt;Currently, &lt;code&gt;releaseClassLoader()&lt;/code&gt; removes cache entries and performs some thread-level cleanup when the reference count drops to zero, but it does not explicitly call &lt;code&gt;URLClassLoader.close()&lt;/code&gt;. For example, no close call is observed in &lt;code&gt;DefaultClassLoaderService.releaseClassLoader()&lt;/code&gt;, and &lt;code&gt;DefaultClassLoaderService.close()&lt;/code&gt; mainly clears internal cache structures. This raises a noteworthy point: JAR handle release depends on GC timing, and in long-running scenarios or on certain platforms (such as Windows), files may not be released promptly. 👉 This is closer to a “logical release” than to the end of the resource’s lifecycle.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Class Loading Boundaries Can Still Change at Runtime
&lt;/h3&gt;

&lt;p&gt;In some paths, dependencies are still injected into the current ClassLoader via &lt;code&gt;addURL&lt;/code&gt;, such as: reflective calls to &lt;code&gt;addURL&lt;/code&gt; in &lt;code&gt;AbstractPluginDiscovery&lt;/code&gt;, and plugin dependency injection into the current loader in Flink execution paths. This leads to an interesting phenomenon: class loading boundaries are not only defined by loader structure, but also influenced by runtime behavior. While not problematic for a single job, under scenarios like repeated jobs in the same process or switching plugin combinations, boundaries may accumulate “historical residue”.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Some Residual Surfaces Are Not Fully Closed
&lt;/h3&gt;

&lt;p&gt;There are multiple TCCL usage patterns in the codebase (synchronous / asynchronous / cross-thread), and some paths show: TCCL not restored in &lt;code&gt;finally&lt;/code&gt;, or inconsistent baselines during cross-thread restoration. For example: TCCL usage in cooperative workers within &lt;code&gt;TaskExecutionService&lt;/code&gt;, and asymmetric restoration in some operations (such as source / restore). Additionally, some typical ClassLoader retention points are not yet uniformly governed, such as JDBC Driver registration (e.g., TDengine-related implementations) and connectors directly setting TCCL without restoring it.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Possible Evolution Path (For Reference)
&lt;/h2&gt;

&lt;p&gt;Based on these observations, I’ve outlined a &lt;strong&gt;progressive governance path&lt;/strong&gt; that avoids large-scale refactoring and can be implemented in phases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Close the ClassLoader Lifecycle
&lt;/h3&gt;

&lt;p&gt;Key ideas: explicitly call &lt;code&gt;close()&lt;/code&gt; on URLClassLoaders created by SeaTunnel at the appropriate time, and define clear ownership—“who creates, who closes”. This shifts from “GC-dependent release” to “controlled release”.&lt;/p&gt;
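&lt;p&gt;A minimal sketch of what “who creates, who closes” could look like. The class and method names here (&lt;code&gt;ConnectorClassLoaderRegistry&lt;/code&gt;, &lt;code&gt;acquire&lt;/code&gt;, &lt;code&gt;release&lt;/code&gt;) are my own illustrations, not existing SeaTunnel APIs; the point is only that the component that creates a loader is also the single owner allowed to close it:&lt;/p&gt;

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not SeaTunnel code): pair "who creates" with "who closes".
// The registry that creates a loader is the only place allowed to close() it,
// which turns GC-dependent release into controlled release.
public class ConnectorClassLoaderRegistry {
    private final Map loaders = new HashMap();   // key: jobId, value: URLClassLoader
    private final Map refCounts = new HashMap(); // key: jobId, value: Integer

    public synchronized URLClassLoader acquire(String jobId, URL[] jars) {
        URLClassLoader loader = (URLClassLoader) loaders.get(jobId);
        if (loader == null) {
            loader = new URLClassLoader(jars, getClass().getClassLoader());
            loaders.put(jobId, loader);
            refCounts.put(jobId, Integer.valueOf(0));
        }
        Integer count = (Integer) refCounts.get(jobId);
        refCounts.put(jobId, Integer.valueOf(count.intValue() + 1));
        return loader;
    }

    public synchronized void release(String jobId) throws IOException {
        Integer count = (Integer) refCounts.get(jobId);
        if (count == null) {
            return; // unknown or already fully released
        }
        int remaining = count.intValue() - 1;
        if (remaining > 0) {
            refCounts.put(jobId, Integer.valueOf(remaining));
        } else {
            refCounts.remove(jobId);
            URLClassLoader loader = (URLClassLoader) loaders.remove(jobId);
            // End of resource lifecycle: JAR handles are freed here, not at GC time.
            loader.close();
        }
    }

    public synchronized int liveLoaderCount() {
        return loaders.size();
    }
}
```

&lt;p&gt;Raw collection types are used only to keep the sketch short; real code would of course use generics and handle unbalanced release calls more defensively.&lt;/p&gt;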

&lt;h3&gt;
  
  
  Phase 2: Stabilize Loading Boundaries
&lt;/h3&gt;

&lt;p&gt;Goals: avoid runtime &lt;code&gt;addURL&lt;/code&gt; where possible, and determine the full classpath before loader creation. This ensures consistent behavior of the same loader over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: Consolidate Common Residual Points
&lt;/h3&gt;

&lt;p&gt;Standardize patterns such as: wrapping TCCL with try-with-resources, pairing JDBC Driver registration and deregistration, and clearly assigning ClassLoader ownership to threads and ThreadLocal. This turns implicit references into manageable resources.&lt;/p&gt;
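&lt;p&gt;The TCCL pattern above can be sketched as a tiny &lt;code&gt;AutoCloseable&lt;/code&gt; wrapper. The class name &lt;code&gt;ThreadContextClassLoaderScope&lt;/code&gt; is mine, not an existing SeaTunnel type; the idea is that try-with-resources guarantees the symmetric restore even when the body throws:&lt;/p&gt;

```java
// Hypothetical helper showing the try-with-resources TCCL pattern:
// set on entry, restore in close(), so the restore always runs.
public class ThreadContextClassLoaderScope implements AutoCloseable {
    private final ClassLoader previous;

    public ThreadContextClassLoaderScope(ClassLoader temporary) {
        this.previous = Thread.currentThread().getContextClassLoader();
        Thread.currentThread().setContextClassLoader(temporary);
    }

    @Override
    public void close() {
        // Symmetric restore: the thread leaves the scope exactly as it entered.
        Thread.currentThread().setContextClassLoader(previous);
    }
}
```

&lt;p&gt;Usage is then a one-liner around plugin work: &lt;code&gt;try (ThreadContextClassLoaderScope scope = new ThreadContextClassLoaderScope(pluginLoader)) { ... }&lt;/code&gt;, which removes the “forgot to restore in &lt;code&gt;finally&lt;/code&gt;” class of residue entirely.&lt;/p&gt;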

&lt;h3&gt;
  
  
  Phase 4: Introduce Verifiable Reclamation
&lt;/h3&gt;

&lt;p&gt;As an enhancement: use &lt;code&gt;WeakReference + ReferenceQueue&lt;/code&gt; to track loaders, or expose simple runtime metrics (e.g., number of live loaders). The goal is not absolute precision, but the ability to reasonably judge whether resources have been released.&lt;/p&gt;
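&lt;p&gt;A possible shape for such a tracker, again as an illustration rather than a concrete proposal (the class name is mine): each loader is tracked with a &lt;code&gt;WeakReference&lt;/code&gt; registered against a &lt;code&gt;ReferenceQueue&lt;/code&gt;, and once the collector reclaims a loader its reference appears on the queue, so the live count drops without keeping the loader alive:&lt;/p&gt;

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of verifiable reclamation (not SeaTunnel code).
// WeakReferences do not prevent collection; reclaimed loaders surface on the queue.
public class ClassLoaderTracker {
    private final ReferenceQueue queue = new ReferenceQueue();
    private final Set tracked = new HashSet();

    public synchronized void track(ClassLoader loader) {
        tracked.add(new WeakReference(loader, queue));
    }

    // Drain references whose loaders have been reclaimed, then report the rest.
    // This is the "number of live loaders" metric mentioned above.
    public synchronized int liveLoaderCount() {
        Object ref = queue.poll();
        while (ref != null) {
            tracked.remove(ref);
            ref = queue.poll();
        }
        return tracked.size();
    }
}
```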

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;These issues rarely surface in short-lived tasks. But in scenarios such as long-running engine nodes, repeated task scheduling, or frequent plugin switching, these boundary issues accumulate over time. The results may include Metaspace growth, inability to replace JARs, and occasional class conflicts.&lt;/p&gt;

&lt;h2&gt;
  
  
  One-Sentence Summary
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;From “class isolation” to “governable ClassLoaders with verifiable reclamation.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The above reflects my current understanding and organization of the topic. Some points may not be entirely accurate—feedback and real-world scenarios are very welcome 🙌. If the community is interested, this could evolve into a more general and reusable infrastructure capability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Appendix: Code References
&lt;/h2&gt;

&lt;p&gt;Some code locations noted during analysis (not exhaustive): &lt;code&gt;DefaultClassLoaderService&lt;/code&gt; (release/close), &lt;code&gt;AbstractPluginDiscovery&lt;/code&gt; (addURL), Flink starter execution paths (plugin injection), &lt;code&gt;TaskExecutionService&lt;/code&gt; (TCCL usage), various operations (source/restore), and connectors (Iceberg / Paimon / TDengine, etc.).&lt;/p&gt;

</description>
      <category>classloader</category>
      <category>apacheseatunnel</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>From Apache SeaTunnel to ASF Member: A Story of Long-Term Commitment</title>
      <dc:creator>Apache SeaTunnel</dc:creator>
      <pubDate>Fri, 27 Mar 2026 03:15:17 +0000</pubDate>
      <link>https://forem.com/seatunnel/from-apache-seatunnel-to-asf-member-a-story-of-long-term-commitment-4pp9</link>
      <guid>https://forem.com/seatunnel/from-apache-seatunnel-to-asf-member-a-story-of-long-term-commitment-4pp9</guid>
      <description>&lt;p&gt;Recently, after internal discussions, the Apache Software Foundation invited several PMC Members from the Apache SeaTunnel project to become ASF Members—one of the highest honors within the foundation. Among them is &lt;strong&gt;Wang Hailin&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxp33vya9ozsbnl9drwnn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxp33vya9ozsbnl9drwnn.png" alt="3d5c8aaf1091f7a7ef66425e97d147bc" width="800" height="721"&gt;&lt;/a&gt;&lt;br&gt;
Congratulations to Wang Hailin on becoming an ASF Member! For a key contributor to the SeaTunnel community, this recognition is not only a personal milestone, but also a moment of pride for the entire community.&lt;/p&gt;

&lt;p&gt;Over the years, he has remained deeply involved in the community: from refining documentation to improving code, from participating in technical discussions to helping newcomers. His contributions can be seen across almost every corner of the project. Beyond SeaTunnel, he has also been actively contributing to multiple ASF projects, consistently practicing the Apache Way advocated by the foundation. It is this steady, long-term dedication that has led to this important recognition.&lt;/p&gt;

&lt;p&gt;To mark the occasion, the community conducted an in-depth interview with him. This article is structured into five sections—personal background, open-source journey, the path to ASF Member, SeaTunnel community development, and open-source culture—to give a closer look at his growth, his experiences in open source, and the passion and persistence behind his contributions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Personal Background &amp;amp; Open Source Journey
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falcyr6qckib47t2xmgng.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falcyr6qckib47t2xmgng.png" alt="Wang Hailin" width="800" height="1069"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q1: Could you briefly introduce yourself and how you got into big data and open source?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Hey guys, I’m Wang Hailin, and my GitHub ID is hailin0. I mainly work on data infrastructure, with a focus on data integration, data synchronization, and data platforms.&lt;/p&gt;

&lt;p&gt;Outside of work, I enjoy engaging with open-source communities—sharing practical experience and exchanging ideas around data platforms and integration technologies.&lt;/p&gt;

&lt;p&gt;My entry into big data and open source is closely tied to my earlier work experience. While working on systems like data development platforms and performance monitoring, I frequently dealt with data ingestion and synchronization challenges, which required exploring various data integration tools.&lt;/p&gt;

&lt;p&gt;That’s when I came across SeaTunnel. What stood out to me was its extensible architecture—it supports a wide range of data sources and complex synchronization scenarios, making it well-suited for enterprise use. This sparked my interest, and I gradually started contributing to the community. Over time, through continuous contributions and discussions, I became one of the core contributors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q2: When did you start contributing to SeaTunnel, and what was the trigger?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: It started from a practical need at work. At the time, I was building a data platform and needed a reliable data integration tool. During that evaluation process, I discovered SeaTunnel.&lt;/p&gt;

&lt;p&gt;Back then, the project wasn’t as mature as it is today, but its architecture left a strong impression on me—especially the plugin-based Connector system and the flexible data synchronization model.&lt;/p&gt;

&lt;p&gt;I began using SeaTunnel in real-world scenarios, and gradually got involved in contributing. Starting with small fixes and bug patches, I later participated in more feature development and community discussions, eventually becoming a long-term contributor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q3: What key areas or features have you contributed to in SeaTunnel?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: My contributions mainly fall into a few areas.&lt;/p&gt;

&lt;p&gt;Early on, I worked on Connector development and improvements. For a data integration platform, the Connector ecosystem is fundamental—it determines which data sources and systems the platform can connect to.&lt;/p&gt;

&lt;p&gt;As I became more involved, I also contributed to framework-level and infrastructure work, such as improving the E2E testing system and refining the logging framework to make the project more robust and standardized.&lt;/p&gt;

&lt;p&gt;Later, as I gained a deeper understanding of the synchronization engine, I started working on CDC (Change Data Capture) capabilities, including CDC read/write and DDL synchronization. In real production environments, schema changes (DDL) are unavoidable. If a system cannot handle schema evolution properly, data pipelines can easily break.&lt;/p&gt;

&lt;p&gt;Overall, these efforts are driven by a single goal: to make SeaTunnel not just a data synchronization tool, but a reliable data integration infrastructure for enterprise environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open Source Contributions &amp;amp; Growth
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q4: Which contribution or experience left the deepest impression on you?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: One experience that stands out is working on DDL support in CDC scenarios.&lt;/p&gt;

&lt;p&gt;At first glance, DDL may seem like a simple SQL parsing problem. But in a data synchronization system, it must flow correctly through the entire pipeline: from Source capturing the event, to passing it through the data stream, to executing schema changes on the Sink.&lt;/p&gt;

&lt;p&gt;The real challenge lies in maintaining consistency between DDL and data changes. In practice, synchronization jobs run concurrently across multiple nodes, so DDL events must maintain a consistent order throughout the distributed pipeline.&lt;/p&gt;

&lt;p&gt;This requires tight integration with state management mechanisms like Checkpoint and Savepoint, ensuring that after recovery or restart, DDL and data events remain in the correct order.&lt;/p&gt;

&lt;p&gt;When you combine all these factors, DDL handling becomes a system-level challenge involving distributed data flow, state consistency, and multi-system compatibility.&lt;/p&gt;

&lt;p&gt;This work took quite a long time and involved extensive discussions with other contributors. It’s one of the more complex aspects of many data synchronization systems, and we aimed to make SeaTunnel more reliable for enterprise real-time scenarios.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q5: What do you think is the most important skill in open source collaboration?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: I would say communication and collaboration are critical.&lt;/p&gt;

&lt;p&gt;Technical skills are the foundation, but many decisions in open source are made through discussion and consensus. Being able to clearly express your ideas, understand others’ perspectives, and move toward agreement is essential.&lt;/p&gt;

&lt;p&gt;Another important factor is patience and long-term commitment. Open source is not a short-term effort—it requires sustained involvement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q6: What advice would you give to newcomers in open source?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Start small. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fix a bug&lt;/li&gt;
&lt;li&gt;Improve documentation&lt;/li&gt;
&lt;li&gt;Submit a small feature enhancement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This helps you get familiar with the codebase and development workflow.&lt;/p&gt;

&lt;p&gt;Also, participate in discussions. Even asking questions or joining simple conversations helps you understand the project’s design.&lt;/p&gt;

&lt;p&gt;Open source is a long journey—you don’t need to aim for big features at the beginning. What matters more is understanding the architecture, not just the code.&lt;/p&gt;

&lt;p&gt;Many core contributors grow over years—from users to contributors, and eventually to maintainers.&lt;/p&gt;

&lt;p&gt;For me, the biggest gain from open source is not a specific piece of code, but the opportunity to collaborate with developers from different companies and backgrounds. That experience is incredibly valuable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Becoming an ASF Member
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q7: What was your first reaction when you were invited to become an ASF Member?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: I was surprised and very grateful.&lt;/p&gt;

&lt;p&gt;ASF Membership is not something you apply for—it comes through nomination and voting by existing members. So it represents recognition from the community for long-term contributions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q8: How closely is this achievement tied to your work in SeaTunnel?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Very closely.&lt;/p&gt;

&lt;p&gt;The SeaTunnel community gave me many opportunities to grow—from contributing code to participating in community governance. Through this process, I gradually learned how Apache communities operate.&lt;/p&gt;

&lt;p&gt;It’s not just about technical contributions, but also collaboration and governance, which are all important factors in becoming an ASF Member.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q9: What does becoming an ASF Member mean to you?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: To me, it represents responsibility.&lt;/p&gt;

&lt;p&gt;It’s not only recognition of past contributions, but also a commitment to continue contributing to the Apache community—helping projects grow, supporting new projects entering the ecosystem, and promoting open-source culture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q10: How do you see the importance of the Apache Way?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: The Apache community emphasizes &lt;strong&gt;“Community Over Code.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A successful project needs not only strong technology, but also a healthy community, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open and transparent decision-making&lt;/li&gt;
&lt;li&gt;Consensus-driven governance&lt;/li&gt;
&lt;li&gt;Encouraging participation from diverse contributors&lt;/li&gt;
&lt;li&gt;Continuously welcoming new contributors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are key reasons why Apache projects can succeed in the long run.&lt;/p&gt;

&lt;h2&gt;
  
  
  SeaTunnel Community Development
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q11: What are the key milestones in SeaTunnel’s growth?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Several milestones stand out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Entering the Apache Incubator&lt;/li&gt;
&lt;li&gt;Unifying APIs and introducing the Zeta engine&lt;/li&gt;
&lt;li&gt;Graduating as a Top-Level Project (TLP)&lt;/li&gt;
&lt;li&gt;Rapid iteration in the 2.3.x series with increasing stability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SeaTunnel was open-sourced in 2017, entered the Apache Incubator in 2021, and became a TLP in 2023. This journey reflects not only technical evolution but also the maturation of community governance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q12: How do you see SeaTunnel’s positioning in data integration?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: In recent years, the demand for efficient data movement has grown significantly, and synchronization scenarios have become more complex.&lt;/p&gt;

&lt;p&gt;SeaTunnel aims to be a high-performance, extensible platform that supports diverse data integration needs across different use cases.&lt;/p&gt;

&lt;p&gt;It already supports multiple data sources, batch processing, real-time synchronization, and CDC.&lt;/p&gt;

&lt;p&gt;Looking ahead, I believe it will continue to evolve in areas such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expanding the connector ecosystem&lt;/li&gt;
&lt;li&gt;Strengthening data transformation capabilities&lt;/li&gt;
&lt;li&gt;Improving fault handling&lt;/li&gt;
&lt;li&gt;Enhancing ecosystem integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Open Source Culture &amp;amp; Personal Growth
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q13: How has open source influenced your career?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: It has influenced me in two major ways.&lt;/p&gt;

&lt;p&gt;First, it broadened my technical perspective. In company projects, decisions are often driven by specific business needs. In open source, designs must work across different use cases, systems, and organizations. This leads to a more comprehensive understanding of system design.&lt;/p&gt;

&lt;p&gt;Second, it deepened my understanding of software engineering and collaboration. In open source, a feature goes through idea proposal, design discussion, review, and iteration before merging. This process emphasizes design and communication, not just coding.&lt;/p&gt;

&lt;p&gt;Working with developers from different countries and backgrounds also brings fresh perspectives.&lt;/p&gt;

&lt;p&gt;For me, the biggest gain is the opportunity to collaborate in an open environment and solve problems with talented engineers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q14: How would you summarize the spirit of open source in one sentence?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Based on my experience, the most valuable aspect of open source is that it provides a space for long-term participation and growth.&lt;/p&gt;

&lt;p&gt;I started as a user, using tools to solve problems. Then I began contributing small fixes, and gradually got involved in feature development and core system design.&lt;/p&gt;

&lt;p&gt;Looking back, it’s a journey from user → contributor → maintainer.&lt;/p&gt;

&lt;p&gt;In a company, knowledge often stays within a team. In open source, your work can be seen, used, and improved by many others. As the project grows, so do the people involved.&lt;/p&gt;

&lt;p&gt;So if I had to summarize it in one sentence:&lt;/p&gt;

&lt;p&gt;Open source is not just about sharing code—it’s about growing together with the community.&lt;/p&gt;

</description>
      <category>apacheseatunnel</category>
      <category>asf</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>Apache SeaTunnel Performance Tuning: How to Set JVM Parameters the Right Way</title>
      <dc:creator>Apache SeaTunnel</dc:creator>
      <pubDate>Fri, 27 Mar 2026 03:13:13 +0000</pubDate>
      <link>https://forem.com/seatunnel/apache-seatunnel-performance-tuning-how-to-set-jvm-parameters-the-right-way-28e0</link>
      <guid>https://forem.com/seatunnel/apache-seatunnel-performance-tuning-how-to-set-jvm-parameters-the-right-way-28e0</guid>
      <description>&lt;p&gt;As a high-performance distributed data integration platform, properly tuning JVM parameters for Apache SeaTunnel is essential if you want better throughput, lower latency, and stable execution.&lt;/p&gt;

&lt;p&gt;So how should you tune JVM parameters?&lt;br&gt;
In this article, we’ll walk through where to configure them, how precedence works, the key parameters to focus on, and some practical tuning strategies.&lt;/p&gt;
&lt;h2&gt;
  
  
  1. Configuration File Locations
&lt;/h2&gt;

&lt;p&gt;SeaTunnel manages JVM parameters through configuration files under &lt;code&gt;$SEATUNNEL_HOME/config/&lt;/code&gt;. Depending on the deployment role, there are four main files:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File Name&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Default Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;jvm_options&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hybrid mode (&lt;code&gt;master_and_worker&lt;/code&gt;), where Master and Worker run in the same process&lt;/td&gt;
&lt;td&gt;&lt;code&gt;-Xms2g -Xmx2g -XX:+UseG1GC&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;jvm_master_options&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Dedicated Master node, responsible for scheduling and state management (no computation)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;-Xms2g -Xmx2g&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;jvm_worker_options&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Dedicated Worker node, responsible for data reading, transformation, and writing (main memory consumer)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;-Xms2g -Xmx2g&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;jvm_client_options&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Client side (&lt;code&gt;seatunnel.sh&lt;/code&gt;), used to parse configs and submit jobs&lt;/td&gt;
&lt;td&gt;&lt;code&gt;-Xms256m -Xmx512m&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  2. Parameter Precedence
&lt;/h2&gt;

&lt;p&gt;Understanding parameter precedence is critical when troubleshooting.&lt;/p&gt;

&lt;p&gt;SeaTunnel loads JVM parameters in the following order, and &lt;strong&gt;later ones override earlier ones&lt;/strong&gt; (for example, the last &lt;code&gt;-Xmx&lt;/code&gt; wins):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Environment variable &lt;code&gt;JAVA_OPTS&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Loaded first. You can define it in system env variables or in &lt;code&gt;config/seatunnel-env.sh&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Configuration files (&lt;code&gt;config/jvm_*_options&lt;/code&gt;)&lt;/strong&gt;&lt;br&gt;
Loaded next, and &lt;strong&gt;override anything set in &lt;code&gt;JAVA_OPTS&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Command-line parameters (&lt;code&gt;-DJvmOption&lt;/code&gt;)&lt;/strong&gt;&lt;br&gt;
Loaded last, with &lt;strong&gt;the highest priority&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
If &lt;code&gt;JAVA_OPTS="-Xmx4g"&lt;/code&gt;, the config file sets &lt;code&gt;-Xmx2g&lt;/code&gt;, and the startup command includes &lt;code&gt;-DJvmOption="-Xmx8g"&lt;/code&gt;, then the effective value will be &lt;strong&gt;8g&lt;/strong&gt;.&lt;/p&gt;
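&lt;p&gt;The merge can be sketched in a few lines of shell. The variable names are illustrative; only the “last flag wins” behavior is the point, since the JVM honors the final occurrence of a repeated flag:&lt;/p&gt;

```shell
# Simulate the three layers in load order: JAVA_OPTS, then the jvm_* config file,
# then the command-line -DJvmOption value.
JAVA_OPTS="-Xmx4g"       # layer 1: environment variable
FILE_OPTS="-Xmx2g"       # layer 2: config/jvm_options (illustrative)
CLI_OPTS="-Xmx8g"        # layer 3: -DJvmOption on the command line

MERGED="$JAVA_OPTS $FILE_OPTS $CLI_OPTS"
# The JVM honors the last -Xmx it sees:
EFFECTIVE=$(echo "$MERGED" | grep -o '\-Xmx[0-9a-z]*' | tail -n 1)
echo "$EFFECTIVE"
```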
&lt;h2&gt;
  
  
  3. Key JVM Tuning Parameters
&lt;/h2&gt;
&lt;h3&gt;
  
  
  3.1 Heap Memory
&lt;/h3&gt;

&lt;p&gt;Heap memory is the most important part of JVM tuning. It directly determines how much data SeaTunnel can process in parallel without running into OOM (Out Of Memory).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;-Xms&lt;/code&gt;&lt;/strong&gt;: Initial heap size&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;-Xmx&lt;/code&gt;&lt;/strong&gt;: Maximum heap size&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best practices:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Worker nodes&lt;/strong&gt;:&lt;br&gt;
It’s strongly recommended to set &lt;code&gt;-Xms&lt;/code&gt; and &lt;code&gt;-Xmx&lt;/code&gt; to the &lt;strong&gt;same value&lt;/strong&gt; (for example, &lt;code&gt;-Xms8g -Xmx8g&lt;/code&gt;).&lt;br&gt;
This avoids runtime heap resizing, reduces performance fluctuations, and helps prevent memory fragmentation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Master nodes&lt;/strong&gt;:&lt;br&gt;
Memory requirements are relatively low. In most cases, &lt;code&gt;2g–4g&lt;/code&gt; is sufficient. Increase it only if the cluster handles many jobs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Client&lt;/strong&gt;:&lt;br&gt;
The default &lt;code&gt;512m&lt;/code&gt; is usually enough. If your job configuration (SQL/JSON) is very large (tens of thousands of lines), consider increasing it to &lt;code&gt;1g&lt;/code&gt; or more.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3.2 Off-Heap Memory
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Important note:&lt;/strong&gt;&lt;br&gt;
You may notice that the actual physical memory (RSS) used by SeaTunnel is significantly larger than the &lt;code&gt;-Xmx&lt;/code&gt; value.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt;&lt;br&gt;
SeaTunnel uses Netty for network communication, which relies heavily on &lt;strong&gt;off-heap (direct) memory&lt;/strong&gt; for zero-copy data transfer.&lt;br&gt;
In addition, thread stacks (&lt;code&gt;-Xss * number of threads&lt;/code&gt;), Metaspace, and JVM overhead also consume non-heap memory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Risk:&lt;/strong&gt;&lt;br&gt;
If the machine runs out of physical memory, the Linux OOM Killer may terminate the process (usually a Worker).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Recommendations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reserve memory for the OS:&lt;/strong&gt;&lt;br&gt;
On an 8GB machine, keep &lt;code&gt;-Xmx&lt;/code&gt; below &lt;code&gt;5g&lt;/code&gt;, leaving around 3GB for off-heap memory and the operating system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Docker/Kubernetes:&lt;/strong&gt;&lt;br&gt;
The container memory limit must be larger than &lt;code&gt;-Xmx&lt;/code&gt; plus estimated off-heap usage.&lt;br&gt;
A common rule is to set it to about &lt;strong&gt;1.5× &lt;code&gt;-Xmx&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
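&lt;p&gt;For example, with &lt;code&gt;-Xmx8g&lt;/code&gt; the 1.5× rule suggests roughly a 12Gi container limit. A hypothetical Kubernetes snippet (the field values are illustrative, not a universal recommendation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;resources:
  requests:
    memory: "12Gi"
  limits:
    memory: "12Gi"  # about 1.5x -Xmx8g: heap plus Netty direct buffers, thread stacks, Metaspace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;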
&lt;h3&gt;
  
  
  3.3 Garbage Collector
&lt;/h3&gt;

&lt;p&gt;SeaTunnel’s Zeta engine recommends using &lt;strong&gt;G1GC&lt;/strong&gt;, which provides more predictable pause times for large heaps.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;-XX:+UseG1GC&lt;/code&gt;&lt;/strong&gt;: Enable G1 GC (enabled by default)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;-XX:MaxGCPauseMillis=200&lt;/code&gt;&lt;/strong&gt;: Target maximum GC pause time (in milliseconds)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time workloads&lt;/strong&gt;:
If latency is critical, you can lower this value (e.g., &lt;code&gt;100&lt;/code&gt;).
Keep in mind this may increase GC frequency and slightly reduce overall throughput.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch workloads&lt;/strong&gt;:
The default &lt;code&gt;200ms&lt;/code&gt; is usually a good balance.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;-XX:InitiatingHeapOccupancyPercent=45&lt;/code&gt;&lt;/strong&gt;:&lt;br&gt;
Heap occupancy threshold that triggers concurrent GC.&lt;br&gt;
If you observe frequent Full GC, try lowering it (e.g., &lt;code&gt;40&lt;/code&gt;) so GC starts earlier.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
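&lt;p&gt;Put together, a latency-sensitive Worker might carry a G1 block like this in &lt;code&gt;jvm_worker_options&lt;/code&gt; (the values are illustrative starting points to adapt, not universal recommendations):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;-XX:+UseG1GC
-XX:MaxGCPauseMillis=100
-XX:InitiatingHeapOccupancyPercent=40
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;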
&lt;h3&gt;
  
  
  3.4 Metaspace
&lt;/h3&gt;

&lt;p&gt;Metaspace stores class metadata. SeaTunnel consumes metaspace when loading connectors.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;-XX:MaxMetaspaceSize&lt;/code&gt;&lt;/strong&gt;: Maximum metaspace size&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The default (&lt;code&gt;2g&lt;/code&gt;) is usually sufficient.&lt;br&gt;
If you encounter &lt;code&gt;java.lang.OutOfMemoryError: Metaspace&lt;/code&gt;, increase it accordingly.&lt;/p&gt;
&lt;h3&gt;
  
  
  3.5 Troubleshooting
&lt;/h3&gt;

&lt;p&gt;When OOM happens, heap dumps are extremely helpful for diagnosis.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;-XX:+HeapDumpOnOutOfMemoryError&lt;/code&gt;&lt;/strong&gt;: Generate a heap dump automatically on OOM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;-XX:HeapDumpPath=/tmp/seatunnel/dump/&lt;/code&gt;&lt;/strong&gt;: Path to store dump files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make sure the disk has enough space (at least larger than &lt;code&gt;-Xmx&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;In container environments, ensure the path is mounted to the host; otherwise, dumps will be lost after restart&lt;/li&gt;
&lt;/ul&gt;
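&lt;p&gt;Besides automatic dumps on OOM, you can also capture a dump on demand from a running process with standard JDK tools (the process name below matches the &lt;code&gt;jps&lt;/code&gt; output shown later; adjust it to your deployment):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Find the SeaTunnel server PID
PID=$(jps | grep SeaTunnelServer | awk '{print $1}')
# Dump only live objects to keep the file smaller
jmap -dump:live,format=b,file=/tmp/seatunnel/dump/manual.hprof "$PID"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;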
&lt;h2&gt;
  
  
  4. JDK Compatibility
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Recommended versions&lt;/strong&gt;: &lt;strong&gt;Java 8 (JDK 1.8)&lt;/strong&gt; or &lt;strong&gt;Java 11&lt;/strong&gt;&lt;br&gt;
These are the most thoroughly tested versions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Java 17+&lt;/strong&gt;:&lt;br&gt;
Generally supported, but due to the module system introduced in Java 9+, you may encounter &lt;code&gt;InaccessibleObjectException&lt;/code&gt; caused by restricted reflection access.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;br&gt;
If this happens, add &lt;code&gt;--add-opens&lt;/code&gt; options in &lt;code&gt;jvm_options&lt;/code&gt;, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;--add-opens&lt;/span&gt; java.base/java.lang&lt;span class="o"&gt;=&lt;/span&gt;ALL-UNNAMED
&lt;span class="nt"&gt;--add-opens&lt;/span&gt; java.base/java.util&lt;span class="o"&gt;=&lt;/span&gt;ALL-UNNAMED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Production Tuning Scenarios
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario 1: Large-Scale Batch Processing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Characteristics&lt;/strong&gt;: Large data volume (TB scale), throughput is the priority&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Worker recommendation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;-Xms8g&lt;/span&gt; &lt;span class="nt"&gt;-Xmx8g&lt;/span&gt;
&lt;span class="nt"&gt;-XX&lt;/span&gt;:+UseG1GC
&lt;span class="nt"&gt;-XX&lt;/span&gt;:ParallelGCThreads&lt;span class="o"&gt;=&lt;/span&gt;8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the source reads data too quickly, memory may build up&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Besides increasing heap size, consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Limiting &lt;code&gt;read_limit.rows_per_second&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Adjusting &lt;code&gt;parallelism&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
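&lt;p&gt;Both knobs are set in the job configuration; a minimal sketch, assuming they live in the &lt;code&gt;env&lt;/code&gt; block and with purely illustrative values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;env {
  # Cap how fast the source emits rows to avoid in-memory buildup
  read_limit.rows_per_second = 10000
  parallelism = 4
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;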

&lt;h3&gt;
  
  
  Scenario 2: Real-Time CDC Synchronization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Characteristics&lt;/strong&gt;: Long-running jobs, latency-sensitive, relatively stable memory usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Worker recommendation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;-Xms4g&lt;/span&gt; &lt;span class="nt"&gt;-Xmx4g&lt;/span&gt;
&lt;span class="nt"&gt;-XX&lt;/span&gt;:+UseG1GC
&lt;span class="nt"&gt;-XX&lt;/span&gt;:MaxGCPauseMillis&lt;span class="o"&gt;=&lt;/span&gt;100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checkpoint frequency also affects memory usage (state backend caching)&lt;/li&gt;
&lt;li&gt;If memory pressure is high, consider increasing &lt;code&gt;checkpoint.interval&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario 3: Low-Memory Deployment (e.g., 4GB)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Risk&lt;/strong&gt;: High chance of the process being killed by the OS OOM killer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Worker recommendation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;-Xmx2560m&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Allocate about 2.5GB to heap&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Leave the remaining 1.5GB for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Off-heap memory (Netty)&lt;/li&gt;
&lt;li&gt;OS&lt;/li&gt;
&lt;li&gt;Other processes&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  6. How to Verify Your Configuration
&lt;/h2&gt;

&lt;p&gt;After starting SeaTunnel, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;jps &lt;span class="nt"&gt;-v&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;SeaTunnel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;12345 SeaTunnelServer ... &lt;span class="nt"&gt;-Xms8g&lt;/span&gt; &lt;span class="nt"&gt;-Xmx8g&lt;/span&gt; &lt;span class="nt"&gt;-XX&lt;/span&gt;:+UseG1GC ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure your parameters (e.g., &lt;code&gt;-Xmx8g&lt;/code&gt;) appear &lt;strong&gt;at the end of the list&lt;/strong&gt;: when a flag is repeated, the JVM honors the last occurrence, so anything listed after your settings overrides them.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Docker / Kubernetes-Specific Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  7.1 Recommended Approach: Container-Aware Memory
&lt;/h3&gt;

&lt;p&gt;In Kubernetes, memory is typically controlled via &lt;code&gt;resources.limits.memory&lt;/code&gt;.&lt;br&gt;
Instead of hardcoding &lt;code&gt;-Xmx&lt;/code&gt;, it’s better to use percentage-based settings so the JVM can adapt automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;JAVA_OPTS&lt;/span&gt;
    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-XX:+UseContainerSupport&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-XX:MaxRAMPercentage=70.0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-XshowSettings:vm"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;-XX:+UseContainerSupport&lt;/code&gt;: Allows the JVM to detect container memory limits (on by default since JDK 8u191 and JDK 10)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-XX:MaxRAMPercentage=70.0&lt;/code&gt;: Sets heap to 70% of container memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why 70%?&lt;/strong&gt;&lt;br&gt;
The remaining 30% is needed for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct memory (Netty)&lt;/li&gt;
&lt;li&gt;Metaspace&lt;/li&gt;
&lt;li&gt;Thread stacks&lt;/li&gt;
&lt;li&gt;JVM overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7.2 Resource Limits
&lt;/h3&gt;

&lt;p&gt;Make sure Kubernetes resource settings align with JVM needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; you want an 8GB heap&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JVM: 70%&lt;/li&gt;
&lt;li&gt;K8s limit: &lt;code&gt;8 / 0.7 ≈ 11.4GB&lt;/code&gt; → round up to &lt;code&gt;12Gi&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;12Gi"&lt;/span&gt;
    &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4"&lt;/span&gt;
  &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;12Gi"&lt;/span&gt;
    &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  7.3 Overriding Default Config
&lt;/h3&gt;

&lt;p&gt;If default config files already define memory settings, they may override &lt;code&gt;JAVA_OPTS&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To ensure your settings take effect:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use command-line parameters (highest priority):&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-DJvmOption=-XX:MaxRAMPercentage=70.0"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Mount custom config files via ConfigMap&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  7.4 Common Pitfalls
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;❌ Setting &lt;code&gt;limits.memory = 4Gi&lt;/code&gt; and &lt;code&gt;-Xmx4g&lt;/code&gt;&lt;br&gt;
→ No space left for non-heap memory → process will be killed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;❌ Not setting &lt;code&gt;requests&lt;/code&gt;&lt;br&gt;
→ Pod may be scheduled on a node without enough memory&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Code References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;jvm_options&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;seatunnel-cluster.sh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;values.yaml&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>apacheseatunnel</category>
      <category>opensource</category>
      <category>jvm</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why Your ADS Layer Always Goes Wild and How a Strong DWS Layer Fixes It</title>
      <dc:creator>Apache SeaTunnel</dc:creator>
      <pubDate>Fri, 20 Mar 2026 10:13:54 +0000</pubDate>
      <link>https://forem.com/seatunnel/why-your-ads-layer-always-goes-wild-and-how-a-strong-dws-layer-fixes-it-4cfa</link>
      <guid>https://forem.com/seatunnel/why-your-ads-layer-always-goes-wild-and-how-a-strong-dws-layer-fixes-it-4cfa</guid>
      <description>&lt;p&gt;In a data warehouse system, the DWS and ADS layers mark the critical boundary between “data modeling” and “data delivery.” The former carries shared aggregation and metric reuse capabilities, determining the stability and efficiency of the data system; the latter is oriented toward specific consumption scenarios, directly impacting business delivery efficiency and user experience.&lt;/p&gt;

&lt;p&gt;If the DWS layer is poorly designed, metrics will be repeatedly produced in the ADS layer, ultimately leading to inconsistent definitions and siloed data; if the ADS layer runs out of control, it can even backfire on the shared layer, forming unmanageable data assets. Therefore, a healthy data system must establish a clear boundary and evolution mechanism between “shared foundation” and “flexible delivery.”&lt;/p&gt;

&lt;p&gt;As the fourth article in the Data Lakehouse design and practice series, this piece systematically summarizes &lt;strong&gt;the core design principles of the DWS/ADS delivery layer&lt;/strong&gt;, including methods for shared aggregation and subject-wide table modeling, metric definition frameworks, delivery layer strategies, and lifecycle governance practices. It also addresses common issues, helping teams build a highly reusable, governable, and sustainable data delivery system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why DWS Must Be “Thick Enough”
&lt;/h2&gt;

&lt;p&gt;In many teams’ data systems, the DWS layer is often underestimated or even weakened, resulting in all requirements being pushed to the ADS layer. In the short term, this seems flexible, but over time it quickly spirals out of control.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgtm5tm3b03fhpgvqzkf7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgtm5tm3b03fhpgvqzkf7.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The core positioning of DWS is as a shared aggregation and reuse layer. It is not designed to serve a single report, but to provide a unified data foundation for &lt;strong&gt;multiple applications to share&lt;/strong&gt;. If this layer is underdeveloped, every new requirement will trigger recalculation and redefinition of metrics, resulting in a bunch of incompatible results.&lt;/p&gt;

&lt;p&gt;In practice, a healthy state is: &lt;strong&gt;about 70% of analytical needs can be directly fulfilled by combining DWS tables.&lt;/strong&gt; This means most scenarios do not require creating new tables, but rather combining existing shared capabilities. This “ready-to-use” capability is the core of reuse value.&lt;/p&gt;

&lt;p&gt;Conversely, if each department has its own ADS tables and each report has its own metric definitions, typical silo problems emerge: metrics with the same name do not match, computations are duplicated, and data cannot be aligned. Teams spend most of their time reconciling definitions instead of analyzing business.&lt;/p&gt;

&lt;p&gt;The value of DWS lies precisely in solving these common issues. By precomputing aggregated results of high-frequency dimension combinations, building subject-wide tables, and unifying metric outputs, DWS moves dispersed computations to the offline layer. As a result, online queries no longer rely on temporary large-scale joins or full table scans, making performance and cost more controllable.&lt;/p&gt;

&lt;p&gt;More importantly, it changes team collaboration. Metrics no longer depend on verbal agreements—they exist as data assets: with owners, definitions, lineage, and quality rules. So-called “metric disputes” essentially become “asset governance issues.”&lt;/p&gt;

&lt;p&gt;But there is a prerequisite: DWS must be governable. If fields lack explanations, metrics lack definitions, update frequency is unclear, or quality rules are missing, this layer will become a “wide-table collection nobody dares to use,” reducing reuse rates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shared Aggregation and Subject-Wide Tables: Balancing Reuse and Performance
&lt;/h2&gt;

&lt;p&gt;DWS design revolves around two types of tables: shared aggregation tables and subject-wide tables.&lt;/p&gt;

&lt;p&gt;Shared aggregation tables hinge on &lt;strong&gt;clarity&lt;/strong&gt;. They must clearly define aggregation granularity (e.g., daily, weekly, monthly, or cumulative), dimension combinations (e.g., time, organization, channel, category), and metric calculation scope (e.g., amount, count, or frequency). Without clear boundaries, downstream reuse becomes unreliable.&lt;/p&gt;
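&lt;p&gt;As a concrete sketch (every table and column name here is hypothetical), a daily shared aggregation table makes its granularity, dimensions, and metric scope explicit in the schema itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical DWS table: daily grain, explicit dimensions and metrics
CREATE TABLE dws_trade_order_1d (
  stat_date     DATE          NOT NULL,  -- grain: one row per day per dimension combination
  org_code      VARCHAR(32)   NOT NULL,  -- dimension: organization
  channel_code  VARCHAR(32)   NOT NULL,  -- dimension: channel
  pay_amount    DECIMAL(18,2),           -- metric: successful payment amount
  pay_order_cnt BIGINT                   -- metric: count of successful orders
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;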

&lt;p&gt;Subject-wide tables emphasize &lt;strong&gt;usability&lt;/strong&gt;. They usually focus on a business domain, e.g., users, transactions, or products, flattening frequently joined dimensions in advance to reduce query complexity. Importantly, wide tables are a result-oriented form for analytics—they are &lt;strong&gt;not a replacement for fact tables&lt;/strong&gt; and must be traceable back to underlying models.&lt;/p&gt;

&lt;p&gt;A common practical problem is wide tables continually growing. To mitigate this, fields can be governed based on usage frequency: retain high-frequency fields in the main wide table, split or join low-frequency fields on demand, and regularly slim tables according to usage.&lt;/p&gt;

&lt;p&gt;Another common pitfall is mixing different aggregation levels in the same table, e.g., daily and monthly data together. This greatly increases misuse risk and complicates maintenance. A better approach is to split tables by level or at least enforce strict naming conventions.&lt;/p&gt;

&lt;p&gt;All these designs assume &lt;strong&gt;consistent dimensions&lt;/strong&gt; exist. Core dimensions such as user, organization, channel, and time must have unified codes and definitions, otherwise cross-table reuse fails.&lt;/p&gt;

&lt;p&gt;From a performance perspective, DWS’s core strategy is always &lt;strong&gt;pre-aggregation first&lt;/strong&gt;. Reduce data scan scale via offline computation before applying indexing, partitioning, or materialized views. Otherwise, all optimizations become remedial measures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Metric Framework: Layered Design from Atomic to Composite
&lt;/h2&gt;

&lt;p&gt;If DWS solves &lt;strong&gt;data reuse&lt;/strong&gt;, then the metric framework ensures &lt;strong&gt;definition consistency&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A governable metric system typically has three levels: atomic metrics, derived metrics, and composite metrics.&lt;/p&gt;

&lt;p&gt;Atomic metrics are the fundamental units. They must clearly define the target, scope, filters, and time granularity. For example, “successful payment amount” must clearly count only successful payments and use the payment completion time.&lt;/p&gt;

&lt;p&gt;Derived metrics are calculated from atomic metrics. For example, average order value = “successful payment amount / number of successful orders.” The key point is that derived metrics must inherit the definitions of their atomic metrics; otherwise, their results will silently diverge.&lt;/p&gt;
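&lt;p&gt;Sketched in SQL against a hypothetical DWS table &lt;code&gt;dws_trade_order_1d&lt;/code&gt; that already carries both atomic metrics, the derivation is pure arithmetic on inherited definitions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Average order value derived from two atomic metrics;
-- scope and filters are inherited from the DWS table, not redefined here
SELECT stat_date,
       SUM(pay_amount) / NULLIF(SUM(pay_order_cnt), 0) AS avg_order_value
FROM dws_trade_order_1d
GROUP BY stat_date;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;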

&lt;p&gt;Composite metrics span multiple processes or business domains, e.g., conversion rate, retention, or repeat purchase. These rely heavily on a consistent dimension system and event definitions, making them the most prone to ambiguity.&lt;/p&gt;

&lt;p&gt;To avoid confusion, every metric must have four elements: business definition, calculation formula, scope, and time granularity. This is not just documentation—it is the basis for traceability and auditability.&lt;/p&gt;

&lt;p&gt;Metrics must also support version control. Changes to definitions cannot overwrite historical results directly; versions or effective dates should be used to prevent “historical data being rewritten.”&lt;/p&gt;

&lt;p&gt;In terms of layering, atomic metrics should reside in DWS (or be traceable back to DWD), while ADS handles only lightweight combination and presentation. If ADS takes on definition duties, it quickly becomes a new “metric generation layer.”&lt;/p&gt;

&lt;h2&gt;
  
  
  ADS and Data Marts: Delivery for Consumption
&lt;/h2&gt;

&lt;p&gt;If DWS is about &lt;strong&gt;accumulation&lt;/strong&gt;, ADS is about &lt;strong&gt;delivery&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;ADS (or DM, data marts) aims to provide data products for specific consumption scenarios, e.g., BI reports, API services, or analytical datasets. Structures here emphasize &lt;strong&gt;usability&lt;/strong&gt;, not generality.&lt;/p&gt;

&lt;p&gt;Delivery tables should follow a &lt;strong&gt;“one table, one scenario”&lt;/strong&gt; principle. Field names can be closer to business semantics, and additional display, sort, or status fields can be added to improve user experience.&lt;/p&gt;

&lt;p&gt;But one bottom line must be enforced: &lt;strong&gt;delivery should not invent metrics&lt;/strong&gt;. All core metrics must come from DWS or the metric system; ADS only handles combination, formatting, and lightweight calculation. Violating this quickly returns to “one metric per report.”&lt;/p&gt;

&lt;p&gt;Update frequency must respect business SLA. Daily, hourly, or minute-level updates directly affect compute chains and resource costs. The higher the frequency, the more careful you must be with field scale and calculation complexity.&lt;/p&gt;

&lt;p&gt;Governance of data marts is also crucial. They can be department- or scenario-specific, but must be built on a unified dimension and metric framework. Views or semantic layers may meet variation needs, but duplicating underlying logic is not allowed.&lt;/p&gt;

&lt;h2&gt;
  
  
  From “Fast Delivery” to “Sustainable Evolution”
&lt;/h2&gt;

&lt;p&gt;Early on, many teams go through a phase of stacking tables in ADS for fast delivery. This feels responsive at first, but over time problems emerge—delivery layers balloon, shared layers hollow out, and maintenance costs soar.&lt;/p&gt;

&lt;p&gt;A healthier model: &lt;strong&gt;gradually thicken the shared layer (DWS), keep the delivery layer light, and continuously recover general capabilities back to DWS.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This also implies delivery tables must support lifecycle management. Track usage frequency, retire low-value tables, or recycle general fields and metrics back to the shared layer to avoid duplication.&lt;/p&gt;

&lt;p&gt;Ultimately, a mature data system is not “built fast,” but “used long.” Layered DWS and ADS design underpins this long-term evolution.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>ads</category>
      <category>database</category>
      <category>mongodb</category>
    </item>
    <item>
      <title>SeaTunnel Gravitino: Schema URL–Driven Automatic Table Structure Detection</title>
      <dc:creator>Apache SeaTunnel</dc:creator>
      <pubDate>Fri, 20 Mar 2026 09:53:29 +0000</pubDate>
      <link>https://forem.com/seatunnel/seatunnel-x-gravitino-schema-url-driven-automatic-table-structure-detection-3e59</link>
      <guid>https://forem.com/seatunnel/seatunnel-x-gravitino-schema-url-driven-automatic-table-structure-detection-3e59</guid>
      <description>&lt;p&gt;Recently, the community published an article titled &lt;a href="https://medium.com/@apacheseatunnel/say-goodbye-to-hand-written-schemas-bedbf1a49cf3" rel="noopener noreferrer"&gt;“Say Goodbye to Hand-Written Schemas! SeaTunnel’s Integration with Gravitino Metadata REST API Is a Really Cool Move”&lt;/a&gt;, which drew strong reactions from readers, with many saying, “This is really awesome!”&lt;/p&gt;

&lt;p&gt;The contributor behind this feature is extremely proactive, and it’s expected to be available soon (according to reliable sources, likely in version 3.0.0). To help the community better understand it, the contributor wrote a detailed article explaining the initial capabilities of the Gravitino REST API and how to use it—let’s take a closer look!&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Background and Problems to Solve
&lt;/h2&gt;

&lt;p&gt;When using Apache SeaTunnel for batch or sync tasks, if the source is unstructured or semi-structured, &lt;strong&gt;the source usually requires an explicit schema definition&lt;/strong&gt; (field names, types, order).&lt;/p&gt;

&lt;p&gt;In real production environments, this leads to several typical issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tables have many fields and complex types, making manual schema maintenance costly and error-prone&lt;/li&gt;
&lt;li&gt;Upstream table structure changes (adding fields, changing types) require corresponding updates to SeaTunnel jobs&lt;/li&gt;
&lt;li&gt;For existing tables, simply syncing data still requires repeated metadata description, leading to redundancy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thus, the core question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Can SeaTunnel directly reuse table structure definitions from an existing metadata system, instead of declaring schema repeatedly in jobs?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This feature was introduced to solve this problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Introduction to Gravitino (Relevant Capabilities)
&lt;/h2&gt;

&lt;p&gt;Gravitino is a unified metadata management and access service, providing standardized REST APIs to manage and expose the following objects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metalake (logical isolation unit)&lt;/li&gt;
&lt;li&gt;Catalogs (e.g., MySQL, Hive, Iceberg)&lt;/li&gt;
&lt;li&gt;Schema / Database&lt;/li&gt;
&lt;li&gt;Table and its field definitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With Gravitino:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Table structures can be &lt;strong&gt;centrally managed&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Downstream systems can dynamically fetch schema definitions via &lt;strong&gt;HTTP APIs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;No need to maintain field information in every compute or sync job&lt;/li&gt;
&lt;/ul&gt;
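&lt;p&gt;For instance, fetching a table definition is a plain HTTP call along these lines (host, port, and all object names below are placeholders; check the Gravitino REST API documentation for the exact path in your version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -s http://localhost:8090/api/metalakes/demo_metalake/catalogs/mysql_catalog/schemas/test/tables/demo_user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;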

&lt;p&gt;The new capability introduced in SeaTunnel is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Support for automatically pulling table structures via &lt;code&gt;schema_url&lt;/code&gt; provided by Gravitino in the source schema definition.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  3. Local Test Environment Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 Prepare MySQL Environment
&lt;/h3&gt;

&lt;h4&gt;
  
  
  3.1.1 Create Target Table
&lt;/h4&gt;

&lt;p&gt;Pre-create the target table &lt;code&gt;test.demo_user&lt;/code&gt; in MySQL with the following SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;`demo_user`&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nv"&gt;`id`&lt;/span&gt; &lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="nb"&gt;unsigned&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="n"&gt;AUTO_INCREMENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`user_code`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`user_name`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`password`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`email`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`phone`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`gender`&lt;/span&gt; &lt;span class="nb"&gt;tinyint&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`age`&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`status`&lt;/span&gt; &lt;span class="nb"&gt;tinyint&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`level`&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`score`&lt;/span&gt; &lt;span class="nb"&gt;decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`balance`&lt;/span&gt; &lt;span class="nb"&gt;decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`is_deleted`&lt;/span&gt; &lt;span class="nb"&gt;tinyint&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`register_ip`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`last_login_ip`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`login_count`&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`remark`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`ext1`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`ext2`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`ext3`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`ext4`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`ext5`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`created_by`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`updated_by`&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`created_time`&lt;/span&gt; &lt;span class="nb"&gt;datetime&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`updated_time`&lt;/span&gt; &lt;span class="nb"&gt;datetime&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`birthday`&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`last_login_time`&lt;/span&gt; &lt;span class="nb"&gt;datetime&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nv"&gt;`version`&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;`id`&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="nv"&gt;`uk_user_code`&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;`user_code`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ENGINE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;InnoDB&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;CHARSET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;utf8mb4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3.1.2 Create the Table Schema to Sync
&lt;/h4&gt;

&lt;p&gt;In production, table structures are often managed centrally in systems such as &lt;code&gt;paimon&lt;/code&gt;, &lt;code&gt;hive&lt;/code&gt;, or &lt;code&gt;hudi&lt;/code&gt;. For this test, the table schema points to the target table &lt;code&gt;test.demo_user&lt;/code&gt; created in the previous step.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 Register the Table Schema in Gravitino
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Gravitino can connect directly to a database and scan all of its tables.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fablntfy94wjck5i4cpj9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fablntfy94wjck5i4cpj9.png" alt="img" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This table is managed in Gravitino as a table under the &lt;code&gt;local-mysql&lt;/code&gt; catalog.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fze7w3w0izx2jq4teno24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fze7w3w0izx2jq4teno24.png" alt="img\_1" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Metalake: &lt;code&gt;test_Metalake&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3.3 Table Structure Access Explanation
&lt;/h3&gt;

&lt;p&gt;Table structures in Gravitino can be accessed via the REST API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8090/api/metalakes/test_Metalake/catalogs/${catalog}/schemas/${schema}/tables/${table}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this test, the actual &lt;code&gt;schema_url&lt;/code&gt; used is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8090/api/metalakes/test_Metalake/catalogs/local-mysql/schemas/test/tables/demo_user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The returned JSON contains the complete field definitions of the &lt;code&gt;demo_user&lt;/code&gt; table.&lt;/p&gt;
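&lt;p&gt;For reference, a trimmed sketch of the response shape (the field names follow the &lt;code&gt;demo_user&lt;/code&gt; DDL above; the exact JSON envelope and type spellings may vary by Gravitino version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "code": 0,
  "table": {
    "name": "demo_user",
    "columns": [
      { "name": "id",          "type": "long",        "nullable": false },
      { "name": "login_count", "type": "integer",     "nullable": true },
      { "name": "created_by",  "type": "varchar(64)", "nullable": true }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;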

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flpvr81wp05j9l2hupvha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flpvr81wp05j9l2hupvha.png" alt="img\_2" width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3.4 Local Deployment of SeaTunnel
&lt;/h3&gt;

&lt;p&gt;Since this feature hasn’t been officially released, you need to manually compile the latest &lt;code&gt;dev&lt;/code&gt; branch and deploy it locally.&lt;/p&gt;
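&lt;p&gt;A typical build sketch (the repository URL is the official one; the exact module list and flags are assumptions, so consult the project build docs):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/apache/seatunnel.git
cd seatunnel
git checkout dev
./mvnw clean package -pl seatunnel-dist -am -DskipTests
# unpack the tarball produced under seatunnel-dist/target/ and deploy it locally
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;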

&lt;h3&gt;
  
  
  3.5 Prepare Data Files
&lt;/h3&gt;

&lt;p&gt;This test case uses a CSV file containing 2,000 records.&lt;/p&gt;
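&lt;p&gt;Illustrative first lines of such a file (the values are made up; the column order must match the schema that Gravitino returns for &lt;code&gt;demo_user&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1,U0001,Alice,5,admin
2,U0002,Bob,12,admin
3,U0003,Carol,0,admin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;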

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdb03go0l8uj41xwhct7m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdb03go0l8uj41xwhct7m.png" alt="img\_3" width="800" height="132"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. SeaTunnel Job Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 Core Configuration Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hocon"&gt;&lt;code&gt;&lt;span class="nl"&gt;env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;parallelism&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;job.mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BATCH"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;LocalFile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/Users/wangxuepeng/Desktop/seatunnel/apache-seatunnel-2.3.13-SNAPSHOT/test_data"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;file_format_type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"csv"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;schema&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;schema_url&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8090/api/metalakes/test_Metalake/catalogs/local-mysql/schemas/test/tables/demo_user"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;sink&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;jdbc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="k"&gt;url&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"jdbc:mysql://localhost:3306/test"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;driver&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"com.mysql.cj.jdbc.Driver"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;username&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"root"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;password&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"123456"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;database&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;table&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"demo_user"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;generate_sink_sql&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.2 Key Configuration Notes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;schema.schema_url&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Points to the table metadata REST API in Gravitino&lt;/li&gt;
&lt;li&gt;SeaTunnel automatically fetches the table schema at job start&lt;/li&gt;
&lt;li&gt;No need to manually declare field lists in jobs&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;generate_sink_sql = true&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sink automatically generates INSERT SQL based on the parsed schema&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
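&lt;p&gt;For comparison, without &lt;code&gt;schema_url&lt;/code&gt; the same source would have to declare every field by hand (an abbreviated sketch; the type names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;schema {
  fields {
    id = bigint
    login_count = int
    created_by = string
    created_time = timestamp
    # ...every remaining demo_user column, repeated in every job that reads this table
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;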

&lt;h2&gt;
  
  
  5. Data and Job Execution Results
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Log screenshot:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffud8tvig4gx1xpzxdrtm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffud8tvig4gx1xpzxdrtm.png" alt="img\_4" width="800" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During job execution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Source automatically parses field structure via &lt;code&gt;schema_url&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;CSV fields automatically align with the table schema&lt;/li&gt;
&lt;li&gt;Data successfully written to MySQL &lt;code&gt;demo_user&lt;/code&gt; table&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  6. FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  6.1 Supported Connectors
&lt;/h3&gt;

&lt;p&gt;Currently, the &lt;code&gt;dev&lt;/code&gt; branch supports file-type connectors including &lt;code&gt;local&lt;/code&gt;, &lt;code&gt;hdfs&lt;/code&gt;, &lt;code&gt;s3&lt;/code&gt;, etc.&lt;/p&gt;
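&lt;p&gt;The usage pattern is the same across these connectors. For example, an &lt;code&gt;HdfsFile&lt;/code&gt; source could point at the same Gravitino endpoint (a sketch; the &lt;code&gt;fs.defaultFS&lt;/code&gt; value and path are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source {
  HdfsFile {
    fs.defaultFS = "hdfs://namenode:8020"
    path = "/data/demo_user"
    file_format_type = "csv"
    schema {
      schema_url = "http://localhost:8090/api/metalakes/test_Metalake/catalogs/local-mysql/schemas/test/tables/demo_user"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;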

&lt;h3&gt;
  
  
  6.2 Does &lt;code&gt;schema_url&lt;/code&gt; support multiple tables?
&lt;/h3&gt;

&lt;p&gt;This feature does not interfere with multi-table configuration; the two can be combined, e.g.:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hocon"&gt;&lt;code&gt;&lt;span class="nl"&gt;source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;LocalFile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;tables_configs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/seatunnel/read/metalake/table1"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;file_format_type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"csv"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;field_delimiter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;","&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;row_delimiter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;skip_header_row_number&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;schema&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;table&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"db.table1"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;fields&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;c_string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;string&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;c_int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;int&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;c_boolean&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;boolean&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;c_double&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;double&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/seatunnel/read/metalake/table2"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;file_format_type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"csv"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;field_delimiter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;","&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;row_delimiter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;skip_header_row_number&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;schema&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;table&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"db.table2"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;schema_url&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://gravitino:8090/api/metalakes/test_metalake/catalogs/test_catalog/schemas/test_schema/tables/table2"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  7. Feature Summary
&lt;/h2&gt;

&lt;p&gt;By introducing &lt;strong&gt;Gravitino &lt;code&gt;schema_url&lt;/code&gt;–based automatic schema parsing&lt;/strong&gt;, SeaTunnel gains the following advantages in data sync scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Eliminates repeated schema definitions, reducing job configuration complexity&lt;/li&gt;
&lt;li&gt;Reuses a unified metadata management system, improving consistency&lt;/li&gt;
&lt;li&gt;Keeps jobs resilient to table structure changes, significantly lowering maintenance costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This feature is ideal for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enterprises with mature metadata platforms&lt;/li&gt;
&lt;li&gt;Large tables with many fields or frequent schema changes&lt;/li&gt;
&lt;li&gt;Users seeking improved maintainability of SeaTunnel jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  8. References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Code PR&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://github.com/apache/seatunnel/pull/10402" rel="noopener noreferrer"&gt;https://github.com/apache/seatunnel/pull/10402&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;schema_url&lt;/code&gt; Configuration Docs&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://seatunnel.apache.org/zh-CN/docs/introduction/concepts/schema-feature#schema_url" rel="noopener noreferrer"&gt;https://seatunnel.apache.org/zh-CN/docs/introduction/concepts/schema-feature#schema_url&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>gravitino</category>
      <category>apacheseatunnel</category>
      <category>opensource</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Apache SeaTunnel 2.3.13 Major Release! Top 10 Features You Should Know</title>
      <dc:creator>Apache SeaTunnel</dc:creator>
      <pubDate>Fri, 20 Mar 2026 09:40:59 +0000</pubDate>
      <link>https://forem.com/seatunnel/apache-seatunnel-2313-major-release-top-10-features-you-should-know-94i</link>
      <guid>https://forem.com/seatunnel/apache-seatunnel-2313-major-release-top-10-features-you-should-know-94i</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqif2qqdenxyzo3u7zwsg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqif2qqdenxyzo3u7zwsg.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
The Apache SeaTunnel community has officially released &lt;strong&gt;version 2.3.13&lt;/strong&gt;! This release is a milestone for Apache SeaTunnel, bringing important features such as &lt;strong&gt;the Checkpoint API, a Flink engine upgrade, parallel processing of large files, multi-table sync, an AI Embedding Transform, and richer connector extensions&lt;/strong&gt;. Whether for batch processing or real-time CDC syncing to a Lakehouse, SeaTunnel can now support your data integration tasks more efficiently, stably, and intelligently.&lt;/p&gt;

&lt;p&gt;Thanks to &lt;strong&gt;50+ community contributors&lt;/strong&gt;, this release includes &lt;strong&gt;100+ PRs&lt;/strong&gt; of new features, optimizations, and bug fixes. If you are building &lt;strong&gt;data warehouses, real-time sync platforms, or AI data pipelines&lt;/strong&gt;, this release is worth your attention.&lt;/p&gt;

&lt;p&gt;No time to read the full Release Notes? No worries: here are the &lt;strong&gt;Top 10 features of this release&lt;/strong&gt;, with PR links if you want to dig deeper.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full Release Note: &lt;a href="https://github.com/apache/seatunnel/releases/tag/2.3.13" rel="noopener noreferrer"&gt;https://github.com/apache/seatunnel/releases/tag/2.3.13&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  01 New Checkpoint API Enhances Task Fault Tolerance
&lt;/h2&gt;

&lt;p&gt;In data sync tasks, checkpoints are one of the core mechanisms to ensure task reliability. SeaTunnel 2.3.13 introduces &lt;strong&gt;Checkpoint API&lt;/strong&gt; (#10065), making task state management more flexible and providing a solid foundation for future scheduling and operation capabilities. The Zeta engine supports &lt;strong&gt;min-pause configuration&lt;/strong&gt; (#9804) to avoid system pressure caused by frequent checkpoints.&lt;/p&gt;
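&lt;p&gt;Checkpoint behaviour is tuned in the job's &lt;code&gt;env&lt;/code&gt; block. A minimal sketch (&lt;code&gt;checkpoint.interval&lt;/code&gt; is a documented option; the exact key for the new min-pause setting from #9804 should be taken from the release docs):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 30000   # milliseconds between checkpoints
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;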

&lt;p&gt;Monitoring has also been enhanced: Sink commit metrics with commit-rate calculation (#10233), PendingJobs information in the task overview interface (#9902), and a REST API for viewing the Pending queue (#10078).&lt;/p&gt;

&lt;p&gt;These capabilities help users better understand task execution status and optimize checkpoint strategies.&lt;/p&gt;

&lt;h2&gt;
  
  
  02 Flink 1.20.1 Support and Enhanced CDC
&lt;/h2&gt;

&lt;p&gt;On the engine side, this version improves Apache Flink support. SeaTunnel now supports &lt;strong&gt;Flink 1.20.1&lt;/strong&gt; (#9576), and CDC sync capabilities have been enhanced. CDC Source now supports &lt;strong&gt;Schema Evolution&lt;/strong&gt; (#9867), automatically adapting sync tasks to source table structure changes.&lt;/p&gt;

&lt;p&gt;Additionally, NO_CDC Source also supports checkpoints (#10094), improving task recovery. These changes make SeaTunnel more stable in scenarios with frequent database schema changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  03 Large File Parallel Reading Significantly Improved
&lt;/h2&gt;

&lt;p&gt;In real data platforms, large amounts of data often exist as files, such as HDFS, object storage, or local file systems.&lt;/p&gt;

&lt;p&gt;This release significantly optimizes file processing performance. HDFS File Connector supports true large file parallel splitting (#10332), LocalFile Connector supports CSV, Text, JSON large file parallel reading (#10142), and Parquet files now support Logical Split (#10239).&lt;/p&gt;

&lt;p&gt;HDFS File also supports multi-table reading (#9816). These improvements significantly increase throughput for TB-scale file processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  04 File Connector Adds Update Sync Mode
&lt;/h2&gt;

&lt;p&gt;Previously, file sync tasks only supported append or overwrite. In this version, multiple file connectors add &lt;strong&gt;sync_mode=update&lt;/strong&gt;, including FTP, SFTP, and LocalFile Source (#10437), and HdfsFile Source (#10268). This allows file sync tasks to support update semantics, better fitting incremental data processing scenarios.&lt;/p&gt;
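&lt;p&gt;A hypothetical usage sketch, based only on the option name cited above (check the linked PRs for each connector's exact semantics):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source {
  LocalFile {
    path = "/data/incremental"
    file_format_type = "csv"
    sync_mode = "update"   # new in 2.3.13; previously only append/overwrite behaviour was possible
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;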

&lt;h2&gt;
  
  
  05 Connector Ecosystem Expansion
&lt;/h2&gt;

&lt;p&gt;SeaTunnel 2.3.13 continues to expand and enhance the connector ecosystem. For analytical databases, it adds DuckDB Source and Sink support (#10285), suitable for local analysis and data exploration.&lt;/p&gt;

&lt;p&gt;New or enhanced connectors include Apache HugeGraph Sink (#10002), AWS DSQL Sink (#9739), Lance Dataset Sink (#9894), IoTDB 2.x Source and Sink (#9872).&lt;/p&gt;

&lt;p&gt;Existing connectors have also been improved: PostgreSQL supports TIMESTAMP_TZ (#10048), Hive Sink supports SchemaSaveMode and DataSaveMode (#9743), MongoDB Sink supports multi-table writing and adds SaveMode (#9958 / #9883).&lt;/p&gt;

&lt;p&gt;These updates significantly improve SeaTunnel’s adaptability in database and Lakehouse scenarios and the efficiency of building data pipelines.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Connector&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Feature Highlights&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Analytical DB&lt;/td&gt;
&lt;td&gt;DuckDB&lt;/td&gt;
&lt;td&gt;Source/Sink&lt;/td&gt;
&lt;td&gt;Read and write data from DuckDB, suitable for local analysis and exploration&lt;/td&gt;
&lt;td&gt;#10285&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graph DB&lt;/td&gt;
&lt;td&gt;Apache HugeGraph&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Write data into HugeGraph&lt;/td&gt;
&lt;td&gt;#10002&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL Lakehouse&lt;/td&gt;
&lt;td&gt;AWS DSQL&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Write data into AWS DSQL&lt;/td&gt;
&lt;td&gt;#9739&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File/Dataset&lt;/td&gt;
&lt;td&gt;Lance Dataset&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Write data into Lance Dataset&lt;/td&gt;
&lt;td&gt;#9894&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time Series DB&lt;/td&gt;
&lt;td&gt;IoTDB 2.x&lt;/td&gt;
&lt;td&gt;Source/Sink&lt;/td&gt;
&lt;td&gt;Add IoTDB 2.x source and sink support&lt;/td&gt;
&lt;td&gt;#9872&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Relational DB&lt;/td&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;Source&lt;/td&gt;
&lt;td&gt;Support TIMESTAMP_TZ type&lt;/td&gt;
&lt;td&gt;#10048&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Warehouse&lt;/td&gt;
&lt;td&gt;Hive&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Support SchemaSaveMode and DataSaveMode&lt;/td&gt;
&lt;td&gt;#9743&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document DB&lt;/td&gt;
&lt;td&gt;MongoDB&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Support multi-table write and new SaveMode&lt;/td&gt;
&lt;td&gt;#9958 / #9883&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  06 Kafka Supports Protobuf Schema Registry
&lt;/h2&gt;

&lt;p&gt;In real-time scenarios, Kafka often uses Schema Registry. This release adds &lt;strong&gt;Protobuf Schema Registry Wire Format support&lt;/strong&gt; (#10183) to Kafka Connector, allowing SeaTunnel to directly parse Protobuf data managed via Schema Registry, making real-time pipeline construction easier.&lt;/p&gt;
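&lt;p&gt;A hedged sketch of such a source (&lt;code&gt;format = "protobuf"&lt;/code&gt; and the message-name option appear in the Kafka connector docs; the Schema Registry wire-format wiring added in #10183 should be confirmed there):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source {
  Kafka {
    bootstrap.servers = "kafka:9092"
    topic = "events"
    format = "protobuf"
    protobuf_message_name = "Event"   # assumption: message type registered for this topic
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;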

&lt;h2&gt;
  
  
  07 New AI Embedding Transform
&lt;/h2&gt;

&lt;p&gt;As AI and data engineering continue to converge, more companies need vector data pipelines.&lt;/p&gt;

&lt;p&gt;SeaTunnel adds &lt;strong&gt;Multimodal Embedding Transform&lt;/strong&gt; (#9673) in the Transform component, generating vector data directly in pipelines for vector databases, RAG systems, and AI retrieval applications. &lt;strong&gt;RegexExtract Transform&lt;/strong&gt; (#9829) further enhances data cleaning.&lt;/p&gt;

&lt;h2&gt;
  
  
  08 Markdown Parser Supports RAG Scenarios
&lt;/h2&gt;

&lt;p&gt;Markdown documents are common in AI data preparation. This release adds &lt;strong&gt;Markdown Parser&lt;/strong&gt; (#9760) and related documentation (#9834) for parsing and structuring Markdown, facilitating RAG pipeline construction.&lt;/p&gt;

&lt;h2&gt;
  
  
  09 Stability and Performance Improvements
&lt;/h2&gt;

&lt;p&gt;This release includes numerous stability and performance optimizations, such as ClickHouse Connector parallel read strategy (#9801), MySQL Connector shard calculation (#9975), JSON parsing for nested structures (#10000), Zeta engine task metrics (#9833), and more.&lt;/p&gt;

&lt;p&gt;It also fixes production issues like Zeta engine memory leak on task cancellation (#10315), ClickHouse ThreadLocal memory leak (#10264), MongoDB multi-task submit (#10116), HBase Source scan exception (#10287), Hive Sink init failure (#10331), etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  10 Bug Fixes and Documentation Updates
&lt;/h2&gt;

&lt;p&gt;Fixes include CDC Snapshot Split null pointer (#10404), ClickHouse memory leak (#10264), MongoDB multi-task submit (#10064, #10116), HBase scan exceptions (#10336, #10287), JDBC schema merge overflow (#10387, #9942, #10093), Hive Sink overwrite semantics (#10279, #9823, #9743), Elasticsearch Sink task exit issue (#10038), and other Connector, Transform, Engine, UI, CI fixes (#10422, #10013, etc.).&lt;/p&gt;

&lt;p&gt;Documentation improvements include SeaTunnel MCP &amp;amp; x2SeaTunnel docs (#10108), connector config examples (#10283, #10250, #10241, #10202), multi-table sync examples (#10241), upgrade incompatibility notes (#10068), and doc structure optimizations (#10262, #10395, #10351, #10420, #10438, #10424, #10109, #10382, #10385), helping new users get started and developers better understand architecture and features.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thanks to Contributors ❤️
&lt;/h2&gt;

&lt;p&gt;Special thanks to release manager @xiaochen-zhou for strong support in planning and execution. Thanks to all volunteers; your efforts keep the SeaTunnel community growing!&lt;/p&gt;

&lt;p&gt;Adam Wang, AzkabanWarden.Gf, Bo Schuster, cloud456, CloverDew, corgy-w, CosmosNi, Cyanty, David Zollo, dotfive-star, dy102, dyp12, Frui Guo, Jarvis, Jast, Jeremy, JeremyXin, Jia Fan, Joonseo Lee, krutoileshii, 老王, Leon Yoah, Li Dongxu, LiJie20190102, limin, LimJiaWenBrenda, liucongjy, loupipalien, mengxpgogogo-eng, misi, 巧克力黑, shfshihuafeng, silenceland, Sim Chou, Steven Zhao, wanmingshi, wtybxqm, yzeng1618, zhan7236, zhangdonghao, zhuxt2015, zy&lt;/p&gt;

&lt;h2&gt;
  
  
  Download &amp;amp; Try
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Download: &lt;a href="https://seatunnel.apache.org/download" rel="noopener noreferrer"&gt;https://seatunnel.apache.org/download&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Upgrade Guide: &lt;a href="https://seatunnel.apache.org/docs/upgrade-guide" rel="noopener noreferrer"&gt;https://seatunnel.apache.org/docs/upgrade-guide&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Upgrade Note&lt;/strong&gt;: If you are on &lt;strong&gt;SeaTunnel 2.3.x&lt;/strong&gt;, upgrading to 2.3.13 is generally safe, as this release focuses on feature enhancements and stability. Back up your config files and test in a staging environment first. For jobs that use checkpoints, stop them and confirm state consistency before upgrading to avoid checkpoint conflicts. Check for connector config changes (Hive, MongoDB, Kafka). If you run on the Flink engine, consider upgrading to Flink 1.20.x for better compatibility and CDC support.&lt;/p&gt;

</description>
      <category>apacheseatunnel</category>
      <category>opensource</category>
      <category>datascience</category>
      <category>database</category>
    </item>
    <item>
      <title>Apache DolphinScheduler 3.4.1 Released with Task Dispatch Timeout Detection</title>
      <dc:creator>Apache SeaTunnel</dc:creator>
      <pubDate>Fri, 13 Mar 2026 08:02:50 +0000</pubDate>
      <link>https://forem.com/seatunnel/apache-dolphinscheduler-341-released-with-task-dispatch-timeout-detection-3i5c</link>
      <guid>https://forem.com/seatunnel/apache-dolphinscheduler-341-released-with-task-dispatch-timeout-detection-3i5c</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ia1426s8ss2jsv1x1wy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ia1426s8ss2jsv1x1wy.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The community has officially released Apache DolphinScheduler &lt;strong&gt;3.4.1&lt;/strong&gt;. As a maintenance release in the &lt;strong&gt;3.4.x series&lt;/strong&gt;, this update focuses on &lt;strong&gt;improving scheduling stability, enhancing task execution control, and fixing system issues&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The new version introduces a &lt;strong&gt;task dispatch timeout detection mechanism&lt;/strong&gt; and &lt;strong&gt;maximum runtime control for tasks&lt;/strong&gt;, while also resolving multiple issues in scheduling logic, plugin functionality, and API behavior. In addition, system documentation, development processes, and project structure have been further optimized.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For more details, see the Release Note:
&lt;a href="https://github.com/apache/dolphinscheduler/releases/tag/3.4.1" rel="noopener noreferrer"&gt;https://github.com/apache/dolphinscheduler/releases/tag/3.4.1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Source code download:
&lt;a href="https://dolphinscheduler.apache.org/zh-cn/download/3.4.1" rel="noopener noreferrer"&gt;https://dolphinscheduler.apache.org/zh-cn/download/3.4.1&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Key Highlights
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Task Dispatch Timeout Detection Mechanism
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;task dispatch timeout checking logic&lt;/strong&gt; has been added to the Master scheduling module. When a task is dispatched to a Worker for execution, if the &lt;strong&gt;Worker Group does not exist or no Worker nodes are available&lt;/strong&gt;, the scheduler can detect the dispatch exception within a certain period and handle it accordingly.&lt;/p&gt;

&lt;p&gt;This mechanism prevents tasks from remaining in a waiting state for an extended time and improves the system’s fault tolerance in scenarios involving resource anomalies (#17795, #17796).&lt;/p&gt;

&lt;h2&gt;
  
  
  Support for Configuring Maximum Runtime for Workflow and Task Instances
&lt;/h2&gt;

&lt;p&gt;The new version allows users to configure a &lt;strong&gt;maximum runtime&lt;/strong&gt; for both &lt;strong&gt;Workflow Instances&lt;/strong&gt; and &lt;strong&gt;Task Instances&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Users can define the maximum execution duration for tasks or workflows. If the runtime exceeds the configured threshold, the system can trigger timeout handling mechanisms, preventing tasks from hanging or occupying resources indefinitely and improving overall operational controllability (#17931, #17932).&lt;/p&gt;

&lt;h1&gt;
  
  
  Key Fixes and Improvements
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Scheduling System Stability Fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Fixed an issue where &lt;strong&gt;task timeout alerts were not triggered&lt;/strong&gt; (#17820, #17818)&lt;/li&gt;
&lt;li&gt;Fixed the issue where the &lt;strong&gt;workflow failure strategy did not take effect&lt;/strong&gt; (#17834, #17851)&lt;/li&gt;
&lt;li&gt;Automatically mark a task as failed when &lt;strong&gt;task execution context initialization fails&lt;/strong&gt; (#17758, #17821)&lt;/li&gt;
&lt;li&gt;Fixed incorrect &lt;strong&gt;parallelism calculation in backfill tasks under parallel execution mode&lt;/strong&gt; (#17831, #17853)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Database and Compatibility Fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Fixed SQL execution errors for &lt;strong&gt;dependent tasks in PostgreSQL environments&lt;/strong&gt; (#17690, #17837)&lt;/li&gt;
&lt;li&gt;Fixed mismatched &lt;strong&gt;INT/BIGINT column types in database tables&lt;/strong&gt; (#17979, #17988)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  API and Permission Fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Removed the &lt;code&gt;WAIT_TO_RUN&lt;/code&gt; state and added a &lt;strong&gt;FAILOVER state&lt;/strong&gt; when querying workflow instances (#17838, #17839)&lt;/li&gt;
&lt;li&gt;Added &lt;strong&gt;tenant validation&lt;/strong&gt; for the Workflow API (#17969, #17970)&lt;/li&gt;
&lt;li&gt;Fixed an issue where &lt;strong&gt;non-admin users could not delete their own Access Tokens&lt;/strong&gt; (#17995, #17997)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Plugin and Task Execution Fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Fixed incorrect &lt;strong&gt;JVM parameter position in Java Task&lt;/strong&gt; (#17848, #17850)&lt;/li&gt;
&lt;li&gt;Fixed an issue where &lt;strong&gt;Procedure Task parameters could not be passed correctly&lt;/strong&gt; (#17967, #17968)&lt;/li&gt;
&lt;li&gt;Fixed the issue where &lt;strong&gt;ProcedureTask could not return parameters or execute query stored procedures&lt;/strong&gt; (#17971, #17973)&lt;/li&gt;
&lt;li&gt;Fixed an issue where the &lt;strong&gt;HTTP plugin could not send nested JSON structures&lt;/strong&gt; (#17912, #17911)&lt;/li&gt;
&lt;li&gt;Fixed inconsistent &lt;strong&gt;timeout units in the HTTP alert plugin&lt;/strong&gt; (#17915, #17920)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  UI and Documentation Fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Removed the &lt;strong&gt;STOP state&lt;/strong&gt; from task instances in the UI (#17864, #17865)&lt;/li&gt;
&lt;li&gt;Fixed an issue where &lt;strong&gt;locks were not released when workflow definition list loading failed&lt;/strong&gt; (#17984, #17989)&lt;/li&gt;
&lt;li&gt;Fixed the &lt;strong&gt;Keycloak login icon 404 issue&lt;/strong&gt; (#18006, #18007)&lt;/li&gt;
&lt;li&gt;Corrected errors in the &lt;strong&gt;installation documentation&lt;/strong&gt; (#17901, #17903)&lt;/li&gt;
&lt;li&gt;Fixed a &lt;strong&gt;SeaTunnel documentation link 404 issue&lt;/strong&gt; (#17904, #17905)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  In-Depth Feature Analysis
&lt;/h1&gt;

&lt;p&gt;In modern data platform architectures, scheduling systems often serve as key infrastructure connecting various computing engines. Tasks from systems such as Apache Spark, Apache Flink, and Apache Hive are commonly orchestrated through a unified scheduler.&lt;/p&gt;

&lt;p&gt;However, in production environments, scheduling systems often face challenges such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Worker resource anomalies preventing tasks from being scheduled&lt;/li&gt;
&lt;li&gt;Uncontrollable task execution time&lt;/li&gt;
&lt;li&gt;Unstable plugin execution behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The newly introduced &lt;strong&gt;task dispatch timeout detection mechanism&lt;/strong&gt; enables the scheduler to quickly identify anomalies when Workers do not exist or resources are unavailable, preventing tasks from waiting indefinitely (#17795, #17796).&lt;/p&gt;

&lt;p&gt;At the same time, the &lt;strong&gt;maximum runtime control capability&lt;/strong&gt; provides a more flexible management approach for task execution. By setting a maximum runtime for workflows or tasks, the system can take action when tasks hang or run abnormally long, preventing resources from being occupied for extended periods (#17931, #17932).&lt;/p&gt;

&lt;p&gt;These improvements further enhance DolphinScheduler’s &lt;strong&gt;stability and controllability in production-grade data platform environments&lt;/strong&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Acknowledgements
&lt;/h1&gt;

&lt;p&gt;The release of &lt;strong&gt;Apache DolphinScheduler 3.4.1&lt;/strong&gt; would not have been possible without the contributions of community developers. Special thanks to the release manager &lt;strong&gt;@ruanwenjun&lt;/strong&gt; and the following contributors for their work on this version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SbloodyS&lt;/li&gt;
&lt;li&gt;njnu-seafish&lt;/li&gt;
&lt;li&gt;Mrhs121&lt;/li&gt;
&lt;li&gt;ylq5126&lt;/li&gt;
&lt;li&gt;qiong-zhou&lt;/li&gt;
&lt;li&gt;XpengCen&lt;/li&gt;
&lt;li&gt;iampratap7997-dot&lt;/li&gt;
&lt;li&gt;yzeng1618&lt;/li&gt;
&lt;li&gt;Alexander1902&lt;/li&gt;
&lt;li&gt;maomao199691&lt;/li&gt;
&lt;li&gt;asadjan4611&lt;/li&gt;
&lt;li&gt;dill21yu&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Apache DolphinScheduler 3.4.1&lt;/strong&gt; is a maintenance release focused on &lt;strong&gt;improving scheduling stability and enhancing task runtime control&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;With the introduction of scheduling fault-tolerance mechanisms, maximum task runtime control, and numerous bug fixes, this version further strengthens the system’s reliability in production environments.&lt;/p&gt;

&lt;p&gt;As the community continues to grow, Apache DolphinScheduler is steadily improving its capabilities in the data workflow orchestration space, providing enterprises with a more stable and efficient infrastructure for building modern data platforms. We welcome more contributors to join the community and help drive the development of the project forward.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>dataengineering</category>
      <category>news</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
