Configurations available for pipeline developers
As mentioned in the Develop pipelines section, the end product of the application development process is a JAR file that can be deployed to the pipeline service and used to process data. There are several configuration options available during application development, including setting system properties and creating configuration files such as:
- SDK BOM files are used to help manage dependencies during pipeline development.
- credentials.properties is used to manage access to the services and resources provided by the HERE platform.
- Logging configuration files are used to control the amount of information reported in the pipeline logs.
- Runtime parameters are used to configure the pipeline runtime environment.
- pipeline-config.conf contains the parameters describing the input catalogs, output catalog, and billing tags.
- pipeline-job.conf is used to customize the execution mode of batch pipelines.
- Configuration files for third-party services are used to connect a pipeline to third-party services.
The following sections examine configuration options.
Use of the runtime environment
An essential part of the pipeline development process is the selection of the runtime environment. The HERE platform provides two types of runtime environments - batch and stream. Different versions of the stream and batch runtime environments are based on different versions of the Apache Flink and Apache Spark frameworks, respectively, with a number of additional libraries included.
For a list of libraries included in the latest versions of the runtime environments, see the corresponding runtime environment articles in the documentation.
Note
It is recommended to use the latest versions of runtime environments available and avoid using deprecated versions.
To ensure that library versions are aligned during pipeline development, we recommend using the sdk-batch-bom_2.12.pom or sdk-stream-bom_2.12.pom BOM file, depending on the chosen runtime environment. For more information on these BOM files, please see this article.
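As a sketch, a batch pipeline's pom.xml could import the BOM in its dependencyManagement section; the groupId and version shown here are illustrative and should be taken from the SDK release you use:

```xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <!-- groupId and version are illustrative; check your SDK release notes -->
      <groupId>com.here.platform</groupId>
      <artifactId>sdk-batch-bom_2.12</artifactId>
      <version>2.x.y</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```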
credentials.properties
The credentials.properties file is used to manage access to services and resources provided by the HERE platform. You can download this file from the platform portal when you create an access key for an application. For more information, please see the Credentials setup.
Local development
For local development, you need to copy the credentials.properties file to the .here folder in your home directory. For more information, see the Set up your credentials user guide.
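For example, assuming the file was downloaded to your Downloads folder:

```bash
mkdir -p ~/.here
cp ~/Downloads/credentials.properties ~/.here/credentials.properties
```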
Platform development
The credentials.properties file is not used during platform pipeline development. Instead, a HERE account token is provided, generated from the application or user credentials selected during pipeline version activation. This token is available to the Data Client library, which resolves it and refreshes it before it expires. The token has the same access level as the application or user selected when the pipeline version was activated.
For more information, see the Identity and Access Management - Developer Guide.
Logging configuration
For troubleshooting and other maintenance purposes, your data processing pipelines may need to track various custom events. To control how events are logged and how logs are processed, you need to provide a logging configuration for your pipeline. The configuration details depend on whether you develop your pipelines locally or via the HERE platform.
Note
You are charged for the amount of logs written during the execution of the pipeline. This usage is billed under the Log Search IO chargeable billing record.
Local development
During local development, if you want to add logging to your application code, use the slf4j-abstracted log API. You are free to provide any slf4j binding, although we recommend logback.
To specify a logging configuration, you can use external configuration files in .xml, Java Properties, or any other format.
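For instance, a minimal logback.xml sketch (the appender, pattern, and level are illustrative):

```xml
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```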
Whichever option you choose, make sure that the configuration files you add are not included in the application's Fat JAR file. Having multiple logging configuration files in the process classpath at the same time can lead to unexpected application behavior and the loss of logs.
Another requirement is that no separate logging implementation JAR files, such as slf4j-api or slf4j-log4j12, should be included in the application JAR file artifact. For example, slf4j-api should be a provided JAR file defined in the BOM for the application's Fat JAR file.
Platform development
Files related to the logging configuration are not used during platform pipeline development; the platform itself is responsible for this. The amount of information reported in the logs depends on the logging level you select for each pipeline version when it is executed. The Debug, Info, Warn, and Error logging levels are supported, with Warn being used by default. Use the Logging configuration menu on the pipeline version page to update it.
For more information about the basics of pipeline logging, changing and retrieving the pipeline version logging level, etc., see the Pipeline logging section.
Runtime parameters
During pipeline development, certain parameters can be specified at runtime to configure the pipeline runtime environment. There are several ways to use them for your pipeline. All of these options are described below.
Local development
For local development, you can use the application.properties file to describe the runtime parameters in the Java Properties format. You need to include this file in the process classpath, or specify its location on the development machine using the config.file system property.
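For example, however you launch the JVM locally (the JAR name here is illustrative):

```bash
java -Dconfig.file=/path/to/application.properties -jar my-pipeline-application.jar
```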
Platform development
For platform development, this file is constructed from the value of the pipeline template's defaultRuntimeConfig property, overridden on a key-by-key basis with the value of the pipeline version's customRuntimeConfig property.
Note that the pipeline template's defaultRuntimeConfig property can only be specified if the template was created using the OLP CLI. If only the platform portal is used for pipeline deployment, the values specified in the runtime parameters form are used as the contents of the application.properties file.
The example below demonstrates how the defaultRuntimeConfig and customRuntimeConfig properties interact during the construction of application.properties.
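The property names in this sketch are hypothetical; only the key-by-key merge behavior matters:

```properties
# defaultRuntimeConfig (pipeline template):
example.threshold=10
example.mode=incremental

# customRuntimeConfig (pipeline version):
example.threshold=25

# Resulting application.properties (custom values override default ones key by key):
example.threshold=25
example.mode=incremental
```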
Note
For stream applications, if the JAR contains application.properties, then it will take precedence in the classpath over the application.properties provided by the runtime.
pipeline-config.conf
The pipeline-config.conf configuration file specifies the input catalogs, output catalog, and billing tags of a pipeline. An example of pipeline-config.conf is shown below.
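The following sketch uses hypothetical catalog HRNs and identifiers; the exact layout may differ in your project:

```
pipeline.config {

  billing-tag = "myBillingTag"

  output-catalog { hrn = "hrn:here:data::olp-here:my-output-catalog" }

  input-catalogs {
    test-input-1 { hrn = "hrn:here:data::olp-here:my-input-catalog-1" }
    test-input-2 { hrn = "hrn:here:data::olp-here:my-input-catalog-2" }
  }
}
```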
Where:
- billing-tag specifies cost allocation tags used to group billing records. If multiple tags are used, they should be separated by a comma (,).
- output-catalog specifies the HRN that identifies the output catalog of the pipeline.
- input-catalogs specifies one or more input catalogs for the pipeline. For each input catalog, its fixed identifier is provided along with the HRN of the actual catalog.
Note
The format of the file is HOCON, a superset of JSON and Java properties. It can be parsed by Lightbend's open-source Typesafe Config library.
Local development
For local development, you can include the pipeline-config.conf file in the process classpath or specify its location on the development machine using the pipeline-config.file system property.
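For example (the JAR name is illustrative):

```bash
java -Dpipeline-config.file=/path/to/pipeline-config.conf -jar my-pipeline-application.jar
```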
Whichever option you choose, make sure that the pipeline-config.conf file you add is not included in the application's Fat JAR file, as explained in the next chapter.
If the data processing application is implemented using the Data Processing Library, the parsing is handled automatically by the pipeline-runner package.
Platform development
The pipeline-config.conf file is not used during platform pipeline development. Instead, it is generated by the pipeline service based on the billing tags, input catalogs, and output catalog that are specified during pipeline template and pipeline version creation. For more information about these properties, please see the relevant chapters in the Deploy a pipeline via the web portal section.
During platform development, we strongly recommend against using Fat JAR files that contain pipeline-config.conf files. This is considered bad practice because:
- Pipeline implementations may bind to and distinguish between multiple input catalogs using fixed identifiers. The fixed identifiers are defined in a pipeline template, while an HRN is defined for each pipeline version, so that the same pipeline template may be reused in multiple setups. If the pipeline-config.conf file is included in the template's Fat JAR, such a template may not be reusable for different pipeline versions, because the HRNs of the catalogs are hard-coded in the config file at the pipeline template level.
- It can lead to unexpected application behavior, because two pipeline-config.conf files (one generated by the pipeline service and another included in the template's Fat JAR) are available in the process classpath at the same time.
pipeline-job.conf
Batch pipelines perform a specific job and then terminate. Stream pipelines don't perform a specific, time-constrained job, but run continuously. For batch pipelines, you may be interested in customizing the execution mode of the application, so that it only runs when certain conditions are met.
Use the pipeline-job.conf file to do this.
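A sketch of such a file, with hypothetical catalog identifiers and version numbers (the structure mirrors the pipeline configuration file):

```
pipeline.job.catalog-versions {

  output-catalog { base-version = 42 }

  input-catalogs {
    test-input-1 {
      processing-type = "no_changes"
      version = 19
    }
    test-input-2 {
      processing-type = "changes"
      since-version = 70
      version = 75
    }
    test-input-3 {
      processing-type = "reprocess"
      version = 12
    }
  }
}
```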
Where:
- base-version of output-catalog indicates the already-existing version of the catalog on top of which new data should be published.
- input-catalogs contains, for each input, the version of that input that is the most up-to-date. This is the version that should be processed. In addition, information that specifies what has changed since the last time the job ran is included. Catalogs are distinguished via the same identifiers present in the pipeline configuration file.
- processing-type describes what has changed in each input since the last successful run. The value can be no_changes, changes, or reprocess:
  - no_changes indicates that the input catalog has not changed since the last run.
  - changes indicates that the input catalog has changed. A second parameter, since-version, is included to indicate which version of that catalog was processed during the last run.
  - reprocess does not specify whether the input catalog has changed or not. The pipeline is requested to reprocess the whole catalog instead of attempting any kind of incremental processing. This may be due to an explicit user request or to a system condition, such as the first time a pipeline runs.
Local development
For local development, you can include the pipeline-job.conf file in the process classpath or specify its location on the development machine using the pipeline-job.file system property.
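For example (the JAR name is illustrative):

```bash
java -Dpipeline-job.file=/path/to/pipeline-job.conf -jar my-pipeline-application.jar
```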
Whichever option you choose, make sure that the pipeline-job.conf file you add is not included in the application's Fat JAR file, as explained in the next chapter.
Platform development
The pipeline-job.conf file is not used during platform pipeline development. Instead, it is generated based on the properties selected during the pipeline version activation, and then added to the process classpath.
Two activation modes are available. The first is the Run Now mode, which forces the pipeline version to run immediately without waiting for the input data to change.
When this mode is selected, the contents of the generated pipeline-job.conf file will look similar to the following example.
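An illustrative sketch; the catalog identifier, version numbers, and processing type are hypothetical:

```
pipeline.job.catalog-versions {
  output-catalog { base-version = 3 }
  input-catalogs {
    test-input {
      processing-type = "reprocess"
      version = 7
    }
  }
}
```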
We can see that the content of the generated file is fully aligned with the values specified during the pipeline version activation, including the input catalog key, its version, etc.
The other activation mode is Schedule. In this mode, the pipeline version only runs when the input data changes. The web portal does not allow you to specify which catalog version you want to depend on; it is determined automatically by the Pipelines API. When the input data changes, a new version of the catalog is created, then the input catalogs are validated and an appropriate version is selected. Based on this information, the pipeline-job.conf file is generated.
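An illustrative sketch of such a generated file; the catalog identifier and version numbers are hypothetical:

```
pipeline.job.catalog-versions {
  output-catalog { base-version = 3 }
  input-catalogs {
    test-input {
      processing-type = "changes"
      since-version = 7
      version = 8
    }
  }
}
```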
For more information about the batch pipeline activation options, see this article.
During platform development, we strongly recommend against using Fat JAR files that contain pipeline-job.conf files. This is considered bad practice because:
- If the pipeline-job.conf file is included in the template's Fat JAR, this may prevent the activation mode from being customized for different pipeline versions, because the values of the processing type and catalog versions are hard-coded in the config file at the pipeline template level.
- It can lead to unexpected application behavior, because two pipeline-job.conf files (one generated by the pipeline service and another included in the template's Fat JAR) are available in the process classpath at the same time.
System properties
The following JVM system properties are set by the Pipeline API when a pipeline is submitted as a new job to provide integration with other HERE services.
They can be obtained using the System.getProperties() method or the equivalent, as shown in the sketch after the following list:
- olp.pipeline.id: identifier of the pipeline, as defined in the Pipeline API.
- olp.pipeline.version.id: identifier of the pipeline version, as defined in the Pipeline API.
- olp.deployment.id: identifier of the job, as defined in the Pipeline API.
- olp.realm: the customer realm.
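A minimal sketch of reading these properties in the pipeline's main class (LOGGER is assumed to be an slf4j Logger defined elsewhere):

```java
// Read pipeline metadata provided by the Pipeline API as JVM system properties.
// The values may be null when the application runs outside the platform,
// for example during local development.
String pipelineId = System.getProperty("olp.pipeline.id");
String pipelineVersionId = System.getProperty("olp.pipeline.version.id");
String deploymentId = System.getProperty("olp.deployment.id");
String realm = System.getProperty("olp.realm");

LOGGER.info("Running pipeline {} (version {}) as job {} in realm {}",
    pipelineId, pipelineVersionId, deploymentId, realm);
```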
Below are additional property paths used by the platform:
- env.api.lookup.host
- akka.*
- here.platform.*
- com.here.*
In addition to these, other properties are set by the system to configure the runtime environment. These include Spark or Flink configuration parameters associated with the pipeline version configuration that you have selected. These configuration parameters are specific to the chosen framework and its version. Because these configuration parameters may change, they are considered implementation-specific and are left to your determination.
System properties specified in this section are visible from the main user process only. These system properties are not necessarily replicated to the JVMs that run in worker nodes of the cluster.
Configuration for third-party services
Connecting your application to third-party services can offer several advantages and functionalities that might be challenging or impractical to implement independently. This section presents the method of connecting a pipeline application to a third-party service using the credentials for that service and the platform's secrets mechanism.
Local development
For example, suppose you have developed an application that lists all available S3 buckets using an AWS credentials file:
import java.util.List;
import software.amazon.awssdk.http.urlconnection.UrlConnectionHttpClient;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.Bucket;

// LOGGER is an slf4j Logger defined elsewhere in the application class.
S3Client s3client = S3Client.builder()
        .region(Region.US_EAST_1)
        .httpClient(UrlConnectionHttpClient.builder().build())
        .build();
List<Bucket> buckets = s3client.listBuckets().buckets();
for (Bucket bucket : buckets) {
    LOGGER.info(bucket.name());
}
The following dependencies are used for this application:
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>s3</artifactId>
<version>2.20.37</version>
</dependency>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>url-connection-client</artifactId>
<version>2.20.37</version>
</dependency>
To run this application successfully and to allow interaction with S3 buckets, the location of the AWS credentials file must be provided to the pipeline application via the AWS_SHARED_CREDENTIALS_FILE environment variable.
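For example, on a local development machine (the paths and JAR name are illustrative):

```bash
export AWS_SHARED_CREDENTIALS_FILE=/path/to/aws/credentials
java -jar my-pipeline-application.jar
```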
Platform development
As mentioned above, during platform pipeline development you can use the platform's secrets mechanism to securely upload and manage third-party credentials that are used to connect your pipeline to third-party services. The platform supports two types of third-party credentials: custom and AWS.
Credentials of the custom type are used to connect pipeline applications to a variety of web services provided by different vendors. The format of such a credentials file is defined by the vendor and may vary from one third-party service to another.
Credentials of the AWS type are used to connect to and use various Amazon web services, for example to interact with S3 buckets. For more information about AWS credentials, their format, etc., please see the AWS SDKs and Tools User Documentation.
Note
The AWS credentials must be in the form of an AWS key-secret pair (AWS IAM roles are not supported at this time). Contact your AWS administrator or manager to create the credentials and set up the access. To reduce the security risk, it is recommended to grant minimal privileges to this new identity.
To run an application from the above chapter as a platform pipeline, follow these steps:
1. Create all the necessary resources, such as the pipeline, pipeline template, pipeline version, etc.
2. Use the olp secret create command with the --grant-read-to parameter to create a new platform secret for the same AWS credentials file that was used previously. This grants read permission on the secret to the HERE application or user whose HRN is specified by the --grant-read-to parameter.
3. During pipeline version activation, select the appropriate HERE application or user from the SELECT RUNTIME CREDENTIALS drop-down menu.
Once the pipeline is activated, the AWS SDK reads the credentials from the file whose location is specified by the AWS_SHARED_CREDENTIALS_FILE variable, which is set by the platform.
If custom secrets have been used, the credentials are stored as a credentials file in the /dev/shm/identity/.here/ directory. Note that this file may not be read automatically by your pipeline application; in this case, you will need to read it programmatically.
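A minimal sketch of reading such a file; the parsing itself depends entirely on the vendor-specific format:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Location where the platform places credentials uploaded as a custom secret.
Path credentialsPath = Paths.get("/dev/shm/identity/.here/credentials");
try {
    // Read the raw contents; parse them according to the vendor-defined format.
    String rawCredentials = Files.readString(credentialsPath);
    // ... configure the third-party client with rawCredentials ...
} catch (IOException e) {
    throw new IllegalStateException("Cannot read third-party credentials", e);
}
```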
Third-party credentials are automatically refreshed every 12 hours to maintain pipeline functionality. If the credentials are changed and need to be consumed immediately, the pipeline version must be manually reactivated.