Pipeline API - changes, deprecations and known issues

Summary of changes

June 2021

Added: Support for Scala 2.12 in Pipelines, with Flink and Spark version upgrades

Platform Pipelines now support Scala 2.12 along with minor version upgrades for both Flink and Spark. Below are the newly supported versions:

- stream-4.0 for stream environment with Flink 1.10.3
- batch-3.0 for batch environment with Spark 2.4.7

April 2021

Added: Read-only permissions in the pipeline template's permission model, which enable sharing of pipeline templates between projects.

March 2021

Added: Manage your organization's pipelines and pipeline templates

Organization admins can now manage all the pipelines and pipeline templates within their Org and perform all the actions that the original pipeline author can perform, including but not limited to:

- Deactivate, cancel or delete a pipeline created by anybody within the Org
- Delete a pipeline template created by anybody within the Org
- Activate a pipeline created by anybody within the Org

Added: All Pipeline APIs now accept the UUID or the HRN of a pipeline or pipeline template as a path parameter.

Added: HRNs for Pipelines and Pipeline Templates now include the Realm name, which makes it quick to identify the realm that a Pipeline or Pipeline Template belongs to.
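
As a rough illustration of passing either identifier form as a path parameter, the sketch below validates that a value is a UUID or an HRN and builds a request URL from it. The base URL, the "/pipelines/" path, and the helper names are assumptions made for this example only, and the HRN check only looks for the "hrn:" prefix because the exact HRN layout is not described here.

```python
import uuid
from urllib.parse import quote

# Hypothetical base URL; replace it with the real Pipeline API endpoint.
BASE_URL = "https://pipelines.example.com/v2"


def is_uuid(value: str) -> bool:
    """Return True if value is a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def pipeline_path(identifier: str) -> str:
    """Build a URL that carries a UUID or an HRN as the path parameter."""
    if not (is_uuid(identifier) or identifier.startswith("hrn:")):
        raise ValueError(f"expected a UUID or an HRN, got {identifier!r}")
    # Percent-encode the identifier so it always stays a single path segment.
    return f"{BASE_URL}/pipelines/{quote(identifier, safe='')}"
```

Percent-encoding the colons in an HRN is a conservative choice for this sketch; a client may not strictly need it.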

August 2020

Added: Ensure business continuity with Multi-region setup for Pipelines

For a pipeline that requires minimal downtime, use the Multi-region option when creating the pipeline version so that, if the primary region fails, the pipeline version is automatically transferred to the secondary region. Switch on the Multi-region option in the Web Portal or use the "--multi-region" flag in the CLI.

Note: For a pipeline to work successfully in the secondary region, the input and output catalogs used by the pipeline should be available in the secondary region.

Note: For a Stream pipeline to successfully utilize the multi-region option, it's important to enable Checkpointing within the code to allow Flink to take a periodic Savepoint while running the pipeline in the primary region. When the primary region fails, the last available Savepoint is used to restart the pipeline in the secondary region.

Note: For a Batch pipeline, the Spark History details are also transferred during a region failure.

Note: The state of an on-demand Batch pipeline is not transferred, so the pipeline needs to be manually re-activated.

July 2020

Added: Develop Stream Pipelines with Apache Flink 1.10.1 for better control over the memory configuration of Workers.

A new Stream-3.0.0 run-time environment with Apache Flink 1.10.1 is now available for creating Stream pipelines. With Stream-3.0.0, the Memory Model of Task Managers has changed in order to provide more control over the memory configuration of the Workers of a Stream pipeline. Include version 2.17 of the HERE Data SDK for Java & Scala in your pipeline project to start developing with this new environment and choose Stream-3.0.0 as the run-time environment while creating a pipeline version.

Deprecated: The Stream-2.0.0 (with Apache Flink 1.7.1) run-time environment is now deprecated. For more details about migrating an existing stream pipeline to the new Stream-3.0.0 run-time environment, see Migrate Pipeline to new Run-time Environment. For general support for Apache Flink, please see Stream Pipelines - Apache Flink Support FAQ.

June 2020

Performance: Reduced Stream Pipeline downtime during Upgrade and Restart. The downtime experienced during a Stream pipeline's Upgrade or Restart has been reduced to under 30 seconds, delivering a 50% improvement that is important for critical stream processing.

Summary of currently active deprecation notices

Deprecated: Batch-2.0.0 run-time environment for pipelines (Deprecation period announced: February 2020 / Deprecation period end: August 19, 2020 (past due))

The deprecation period is over and Batch-2.0.0 will be removed soon. Pipelines still using it will be canceled. Migrate your batch pipelines to the Batch-2.1.0 run-time environment to benefit from the latest functionality and improvements. For more details about migrating a batch pipeline to the new Batch-2.1.0 run-time environment, see Migrate Pipeline to new Run-time Environment.

Deprecated: Stream-2.0.0 run-time environment for pipelines (Deprecation period announced: July 2020 / Deprecation period end: February 1, 2021 (past due))

The Stream-2.0.0 (with Apache Flink 1.7.1) run-time environment is now deprecated. Existing stream pipelines that use the Stream-2.0.0 run-time environment will continue to operate normally until February 1, 2021. During this time, the Stream-2.0.0 run-time environment will receive security patches only.

During this period, to continue developing pipelines with the Stream-2.0.0 environment, use platform SDK 2.16 or older. After February 1, 2021, the Stream-2.0.0 run-time environment will be removed and pipelines using it will be canceled. Migrate your stream pipelines to the new Stream-3.0.0 run-time environment to benefit from the latest functionality and improvements. For more details about migrating an existing stream pipeline to the new Stream-3.0.0 run-time environment, see Migrate Pipeline to new Run-time Environment. For general support for Apache Flink, please see Stream Pipelines - Apache Flink Support FAQ.

Deprecated: pipeline_jobs_canceled metric in pipeline status dashboard (Deprecation period announced: July 2020 / Deprecation period end: February 1, 2021 (past due))

The pipeline_jobs_canceled metric used in the pipeline status dashboard is now deprecated because it was tied to the pause functionality and caused confusion. The metric and its explanation will be available to use until February 1, 2021. Thereafter, the metric will be removed.
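
The run-time environment notices above can be summarized as a small lookup, sketched here as one way to check a pipeline's environment against the published end dates. The end dates and replacement environments come from the notices; the function name, the message wording, and the choice to treat the end date itself as still inside the deprecation period are assumptions for this example.

```python
from datetime import date

# End dates and replacements taken from the deprecation notices above.
DEPRECATIONS = {
    "Batch-2.0.0": (date(2020, 8, 19), "Batch-2.1.0"),
    "Stream-2.0.0": (date(2021, 2, 1), "Stream-3.0.0"),
}


def migration_advice(environment, today):
    """Return a short message about an environment's deprecation status."""
    if environment not in DEPRECATIONS:
        return f"{environment}: no active deprecation notice"
    end, replacement = DEPRECATIONS[environment]
    # Assumption: the end date itself still counts as inside the period.
    if today > end:
        return f"{environment}: past due since {end.isoformat()}, migrate to {replacement}"
    days_left = (end - today).days
    return f"{environment}: {days_left} days left, migrate to {replacement}"
```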

Summary of current known issues

Known issue: Pipeline templates can't be deleted from the platform portal UI.
Workaround: Use the CLI or API to delete pipeline templates.

Known issue: In the platform portal, new jobs and operations are not automatically added to the list of jobs and operations for a pipeline version when the list is open for viewing.
Workaround: Refresh the "Jobs" and "Operations" pages to see the latest job or operation in the list.

Known issue: It can sometimes take several minutes for a pipeline failure or exception to be reported.

Known issue: Pipelines can still be activated after a catalog is deleted.
Workaround: The pipeline will fail when it starts running and show an error message about the missing catalog. Find the missing catalog or use a different one.

Known issue: If several pipelines are consuming data from the same stream layer and belong to the same group (pipeline permissions are managed through a group), then each pipeline will only receive a subset of the messages from the stream. This is because, by default, the pipelines share the same application ID.
Workaround: Use the Data Client Library to configure your pipelines so they consume from a single stream. If your pipelines/apps use the Direct Kafka connector, you can specify a Kafka Consumer group ID per pipeline/application. If the Kafka consumer group IDs are unique, the pipelines/apps can consume all the messages from the stream.
If your pipelines use the HTTP connector, create a new group for each pipeline/app, each with its own app ID.
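
To illustrate why a shared ID splits the stream, here is a simplified simulation of consumer-group delivery. It is not the Data Client Library or a Kafka client; the function, pipeline names, and group IDs are hypothetical. Real Kafka distributes work per partition rather than per message, so the round-robin below is only a stand-in for that behavior.

```python
from collections import defaultdict
from itertools import cycle


def deliver(messages, group_of):
    """Simulate delivery: every consumer group gets each message once,
    and the members of a group take turns receiving them.

    group_of maps a pipeline name to its consumer group ID.
    Simplification: real Kafka splits work by partition, not per message.
    """
    members = defaultdict(list)
    for pipeline, group_id in group_of.items():
        members[group_id].append(pipeline)

    received = {pipeline: [] for pipeline in group_of}
    for group_members in members.values():
        turn = cycle(group_members)
        for message in messages:
            received[next(turn)].append(message)
    return received


messages = ["m1", "m2", "m3", "m4"]

# Shared ID (the default behavior described above): messages are split.
shared = deliver(messages, {"pipeline-a": "app-1", "pipeline-b": "app-1"})

# Unique group ID per pipeline (the workaround): each pipeline sees everything.
unique = deliver(messages, {"pipeline-a": "group-a", "pipeline-b": "group-b"})
```

With the shared ID, each pipeline ends up with only part of the messages; with unique IDs, both pipelines receive all four.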

Known issue: All users and apps in a group are granted permissions to perform all actions on any pipeline associated with that group. There's no support for users or apps with limited permissions. For example, you can't have a role that can view pipeline statuses but can't start or stop a pipeline.
Workaround: Limit the users in a pipeline group to only those who should have full control over the pipeline.

Jorge Zapata