here.platform.layer module — HERE Data SDK for Python documentation
- Last Updated: Mar 27, 2025
This module contains the Layer class to access data in HERE platform catalogs.
- class here.platform.layer.HexbinClustering(clustering_type: str = 'hexbin', absolute_resolution: int | None = None, resolution: int | None = None, relative_resolution: int | None = None, property: str | None = None, pointmode: bool | None = None)[source]#
Bases:
object
This class defines attributes for the hexbin clustering algorithm.
- absolute_resolution: int | None = None#
- clustering_type: str = 'hexbin'#
- pointmode: bool | None = None#
- property: str | None = None#
- relative_resolution: int | None = None#
- resolution: int | None = None#
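The following is a minimal sketch of how a HexbinClustering instance might be passed to InteractiveMapLayer.get_features_in_bounding_box (documented below). The catalog HRN, layer ID and the aggregated property name are placeholders, and obtaining the layer via Platform().get_catalog(...).get_layer(...) is an assumption about the wider SDK, not something defined in this module.
```python
from here.platform import Platform
from here.platform.layer import HexbinClustering

# Assumption: placeholder HRN/layer ID; Platform() picks up locally configured credentials.
platform = Platform()
catalog = platform.get_catalog("hrn:here:data::olp-here:example-catalog")
iml = catalog.get_layer("example-interactive-map-layer")

# Cluster features into hexagonal bins, aggregating the (assumed) "speed" property.
clustering = HexbinClustering(absolute_resolution=3, property="speed")

feature_collection = iml.get_features_in_bounding_box(
    bounds=(13.08, 52.33, 13.76, 52.68),  # west, south, east, north
    clustering=clustering,
)
```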
- class here.platform.layer.IndexLayer(layer_id: str, catalog: Catalog)[source]#
Bases:
Layer
This class provides access to data stored in index layers.
- blob_exists(data_handle: str, billing_tag: str | None = None) bool [source]#
Check if a blob exists for the requested data handle.
- Parameters:
data_handle – The data handle identifies a specific blob so that you can get that blob’s contents.
billing_tag – A string which is used for grouping billing records.
- Returns:
a boolean indicating if the handle exists.
- delete_blob(data_handle: str, billing_tag: str | None = None)[source]#
Delete blob (raw bytes) for given layer ID and data-handle from storage.
- Parameters:
data_handle – The data handle identifies a specific blob so that you can get that blob’s contents.
billing_tag – A string which is used for grouping billing records.
- Returns:
a boolean flag, True on successful delete
- delete_partitions(query: str)[source]#
Delete the partitions that match the query in an index layer.
The query must be in RSQL format, see also: jirutka/rsql-parser.
- Parameters:
query – A string representing an RSQL query.
- Returns:
True when deleting the partitions succeeds.
- Raises:
ValueError – if deleting the partitions fails.
- get_blob(data_handle: str, range_header: str | None = None, billing_tag: str | None = None, stream: bool = False, chunk_size: int = 102400) bytes | Iterator[bytes] [source]#
Get blob (raw bytes) for given layer ID and data-handle from storage.
- Parameters:
data_handle – The data handle identifies a specific blob so that you can get that blob’s contents.
range_header – an optional Range parameter to resume download of a large response
billing_tag – A string which is used for grouping billing records.
stream – whether to stream data.
chunk_size – the size to request each iteration when streaming data.
- Returns:
a blob response as bytes or iterator of bytes if stream is True
- get_partitions_metadata(query: str, adapter: Adapter | None = None, part: str | None = None, billing_tag: str | None = None, **kwargs) Iterator[IndexPartition] | pd.DataFrame [source]#
Get list of all partitions matching the query.
The query must be in RSQL format, see also: jirutka/rsql-parser.
- Parameters:
query – the RSQL query
adapter – the Adapter to transform and return the result; None to use the default adapter of the catalog.
part – Indicates which part of the layer shall be queried.
billing_tag – A string which is used for grouping billing records.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Returns:
Iterator of IndexPartition objects, or adapter-specific
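A short, hedged sketch of an RSQL metadata query. The index field names `ingestionTime` and `tileId` are assumptions standing in for whatever index fields the layer actually defines, and `index_layer` is assumed to be an IndexLayer obtained from a catalog.
```python
# Assumption: `index_layer` is an IndexLayer and `ingestionTime`/`tileId`
# are index fields defined in its configuration.
query = "ingestionTime=ge=1601251200000;tileId==377894440"
for partition in index_layer.get_partitions_metadata(query=query):
    print(partition)
```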
- get_parts(num_requested_parts: int = 1, billing_tag: str | None = None) dict [source]#
Return a list of part IDs which represent the layer parts that can be used to limit the scope of a query operation. This makes it possible to run parallel queries over multiple parts. The user provides the desired number of parts and the service returns a list of part IDs. Note that in some cases the requested number of parts would make them too small, and the service may then return fewer parts than requested.
- Parameters:
num_requested_parts – Indicates requested number of layer parts.
billing_tag – A string which is used for grouping billing records.
- Returns:
dict of parts as per num_requested_parts.
- put_blob(path_or_data: str | bytes | Path, publication: Publication | None = None, partition_id: str | None = None, data_handle: str | None = None, part_size: int = 50, fields: Dict[str, str | int | bool] = {}, additional_metadata: Dict[str, str] = {}, timestamp: int | None = None) Partition [source]#
Upload a blob to the durable blob service.
- Parameters:
path_or_data – content to be uploaded, it must match the layer content type, if set.
publication – the publication this operation is part of
partition_id – partition identifier the blob relates to.
data_handle – data handle to use for the blob, in case already available, if not available an appropriate one is generated and returned.
part_size – An int representing the part size in MB when uploading in multiple parts; the minimum value is 5 MB and the maximum is 50 MB.
fields – A dict representing the fields of the index record for the data being uploaded (index layers only).
additional_metadata – A dict of additional metadata about the data being uploaded (index layers only).
timestamp – timestamp, in milliseconds since Unix epoch (1970-01-01T00:00:00 UTC)
- Returns:
partition object referencing the uploaded data
- read_partitions(query: str, decode: bool = True, adapter: Adapter | None = None, part: str | None = None, stream: bool = False, chunk_size: int = 102400, **kwargs) Iterator[Tuple[IndexPartition, bytes]] | Iterator[Tuple[IndexPartition, Iterator[bytes]]] | Iterator[Tuple[IndexPartition, Any]] | pd.DataFrame [source]#
Read all partition data matching the query.
The query must be in RSQL format, see also: jirutka/rsql-parser.
- Parameters:
query – the RSQL query
decode – whether to decode the data through an adapter or return raw bytes
adapter – the Adapter to transform and return the result; None to use the default adapter of the catalog.
part – Indicates which part of the layer shall be queried.
stream – whether to stream data. This implies decode=False.
chunk_size – the size to request each iteration when streaming data.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Returns:
Iterator of IndexPartition objects, each with its raw data in case decode=False, adapter-specific otherwise
- Raises:
ValueError – in case decoding is requested but the adapter does not support the content type of the layer, or in case of invalid parameters
LayerConfigurationException – in case decoding is requested but the layer doesn’t have any content type configured
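A hedged sketch of reading raw partition data with an RSQL query; `index_layer` and the `eventTime` index field are assumptions.
```python
# Assumption: `index_layer` is an IndexLayer with an `eventTime` index field.
for partition, blob in index_layer.read_partitions(
    query="eventTime=ge=1601251200000",
    decode=False,               # return raw bytes instead of adapter-decoded data
):
    print(partition, len(blob), "bytes")
```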
- set_partitions_metadata(update: Iterable[IndexPartition] | None = None, delete: Iterable[str] | None = None)[source]#
Update the metadata of the layer as part of a publication by publishing updated partitions and/or deleting partitions.
- Parameters:
update – the complete partitions to update.
delete – the data handles to delete.
- write_single_partition(data: str | Path | bytes | pd.DataFrame, timestamp: int | None = None, fields: Dict[str, str | int | bool] = {}, additional_metadata: Dict[str, str] = {}, part_size: int = 50, encode: bool = True, adapter: Adapter | None = None, **kwargs)[source]#
Upload content to the layer and publish the related partition metadata.
- Parameters:
data – data to upload to the layer and derive metadata from.
timestamp – timestamp, in milliseconds since Unix epoch (1970-01-01T00:00:00 UTC)
fields – a dict representing the fields of the index record for the data being uploaded
additional_metadata – a dict of additional metadata about the data being uploaded
part_size – An int representing the part size in MB when uploading in multiple parts; the minimum value is 5 MB and the maximum is 50 MB.
encode – whether to encode the data through an adapter or store raw bytes
adapter – the Adapter to transform the input data; None to use the default adapter of the catalog.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
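A hedged sketch of writing one record to an index layer. The index field names and the JSON payload are illustrative assumptions; the layer's index definition dictates the required fields.
```python
# Assumption: `index_layer` has index fields `eventTime` (timewindow) and `tileId` (heretile).
index_layer.write_single_partition(
    data=b'{"speed": 42}',
    timestamp=1601251200000,  # milliseconds since Unix epoch
    fields={"eventTime": 1601251200000, "tileId": 377894440},
    additional_metadata={"source": "probe-vehicle"},
    encode=False,             # store the raw bytes as-is
)
```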
- class here.platform.layer.InteractiveMapLayer(layer_id: str, catalog: Catalog)[source]#
Bases:
Layer
This class provides access to data stored in Interactive Map layers.
- delete_feature(feature_id: str) None [source]#
Delete feature from the layer.
- Parameters:
feature_id – A feature_id to be deleted.
- delete_features(feature_ids: List[str] | pd.Series, **kwargs) None [source]#
Delete features from layer.
- Parameters:
feature_ids – A list of feature_ids to be deleted, or adapter-specific
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- get_feature(feature_id: str, selection: List[str] | None = None, force_2d: bool = False) Feature [source]#
Return the GeoJSON feature for the provided feature_id.
- Parameters:
feature_id – Feature id which is to be fetched.
selection – A list, only these properties will be present in returned feature.
force_2d – If set to True, features in the response will have only X and Y components; otherwise all x, y, z coordinates are returned.
- Returns:
Feature object.
- get_features(feature_ids: List[str], selection: List[str] | None = None, force_2d: bool = False, **kwargs) FeatureCollection | gpd.GeoDataFrame [source]#
Return GeoJSON FeatureCollection for the provided feature_ids.
- Parameters:
feature_ids – A list of feature identifiers to fetch.
selection – A list, only these properties will be present in returned features.
force_2d – If set to True, features in the response will have only X and Y components; otherwise all x, y, z coordinates are returned.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Raises:
ValueError – If feature_ids is an empty list.
- Returns:
FeatureCollection object, or adapter-specific
- get_features_in_bounding_box(bounds: Tuple[float, float, float, float], clip: bool = False, limit: int = 30000, params: Dict[str, str | list | tuple] | None = None, selection: List[str] | None = None, skip_cache: bool = False, clustering: HexbinClustering | QuadbinClustering | None = None, force_2d: bool = False, **kwargs) FeatureCollection | gpd.GeoDataFrame [source]#
Return the features which are inside a bounding box specified by the bounds parameter.
- Parameters:
bounds – A tuple of four numbers representing the West, South, East and North margins, respectively, of the bounding box.
clip – A Boolean indicating if the result should be clipped (default: False)
limit – A maximum number of features to return in the result. Default is 30000. Hard limit is 100000.
params –
A dict representing additional filters on the features to be searched.
Properties prefixed with ‘p.’ are used to access values in the stored feature which are under the ‘properties’ property.
- params={"p.name": "foo"} returns all features with a value of property p.name equal to foo.
Properties prefixed with ‘f.’ are used to access values which are added by default in the stored feature. The possible values are: ‘f.id’, ‘f.createdAt’ and ‘f.updatedAt’.
- params={"f.createdAt": 1634} returns all features with a value of property f.createdAt equal to 1634.
The query can also be written using the long operators: “=gte”, “=lte”, “=gt”, “=lt” and “=cs”.
- params={"p.count=gte": 10} returns all features with a value of property p.count greater than or equal to 10.
- params={"p.count=lte": 10} returns all features with a value of property p.count less than or equal to 10.
- params={"p.count=gt": 10} returns all features with a value of property p.count greater than 10.
- params={"p.count=lt": 10} returns all features with a value of property p.count less than 10.
- params={"p.name=cs": "bar"} returns all features with a value of property p.name which contains bar.
selection – A list; only these properties will be present in returned features.
skip_cache – If set to True, the response is not returned from cache. Default is False.
clustering – An object of either HexbinClustering or QuadbinClustering.
force_2d – If set to True, features in the response will have only X and Y components; otherwise all x, y, z coordinates are returned.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Returns:
FeatureCollection object, or adapter-specific
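A hedged sketch of a bounding-box query combining clipping, a property filter and a property selection. `iml` is assumed to be an InteractiveMapLayer whose features carry `capacity` and `name` properties.
```python
# Assumption: `iml` is an InteractiveMapLayer; `capacity` and `name` are feature properties.
fc = iml.get_features_in_bounding_box(
    bounds=(13.08, 52.33, 13.76, 52.68),  # west, south, east, north
    clip=True,
    params={"p.capacity=gte": 100},       # only features with capacity >= 100
    selection=["capacity", "name"],       # trim the returned properties
    limit=1000,
)
for feature in fc["features"]:            # a GeoJSON FeatureCollection is dict-like
    print(feature.get("id"), feature["properties"].get("name"))
```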
- iter_features(chunk_size: int = 30000, selection: List[str] | None = None, skip_cache: bool = False, force_2d: bool = False) Iterator[Feature] [source]#
Return all the features in a Layer as Generator.
- Parameters:
chunk_size – A number of features to return in single iteration.
selection – A list, only these properties will be present in returned features.
skip_cache – If set to True, the response is not returned from cache. Default is False.
force_2d – If set to True, features in the response will have only X and Y components; otherwise all x, y, z coordinates are returned.
- Yields:
A Feature object
- search_features(limit: int = 30000, params: Dict[str, str | list | tuple] | None = None, selection: List[str] | None = None, skip_cache: bool = False, force_2d: bool = False, **kwargs) FeatureCollection | gpd.GeoDataFrame [source]#
Search for features in the layer based on the properties.
- Parameters:
limit – A maximum number of features to return in the result. Default is 30000. Hard limit is 100000.
params –
A dict representing additional filters on the features to be searched.
Properties prefixed with ‘p.’ are used to access values in the stored feature which are under the ‘properties’ property.
- params={"p.name": "foo"} returns all features with a value of property p.name equal to foo.
Properties prefixed with ‘f.’ are used to access values which are added by default in the stored feature. The possible values are: ‘f.id’, ‘f.createdAt’ and ‘f.updatedAt’.
- params={"f.createdAt": 1634} returns all features with a value of property f.createdAt equal to 1634.
The query can also be written using the long operators: “=gte”, “=lte”, “=gt”, “=lt” and “=cs”.
- params={"p.count=gte": 10} returns all features with a value of property p.count greater than or equal to 10.
- params={"p.count=lte": 10} returns all features with a value of property p.count less than or equal to 10.
- params={"p.count=gt": 10} returns all features with a value of property p.count greater than 10.
- params={"p.count=lt": 10} returns all features with a value of property p.count less than 10.
- params={"p.name=cs": "bar"} returns all features with a value of property p.name which contains bar.
selection – A list; only these properties will be present in returned features.
skip_cache – If set to True, the response is not returned from cache. Default is False.
force_2d – If set to True, features in the response will have only X and Y components; otherwise all x, y, z coordinates are returned.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Returns:
FeatureCollection object, or adapter-specific
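A hedged sketch of a property search mixing the plain equality form and a long operator; `iml` and the `status`/`count` properties are assumptions.
```python
# Assumption: `iml` is an InteractiveMapLayer with `status` and `count` properties.
fc = iml.search_features(
    params={
        "p.status": "open",   # equality on a stored property
        "p.count=gt": 10,     # long operator: greater than
    },
    selection=["status", "count"],
    limit=500,
)
```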
- spatial_search(lng: float, lat: float, radius: int, limit: int = 30000, params: Dict[str, str | list | tuple] | None = None, selection: List[str] | None = None, skip_cache: bool = False, force_2d: bool = False, **kwargs) FeatureCollection | gpd.GeoDataFrame [source]#
Return the features which are inside the specified radius.
- Parameters:
lng – The longitude in WGS 84 decimal degrees (-180 to +180) of the center point.
lat – The latitude in WGS 84 decimal degrees (-90 to +90) of the center point.
radius – Radius in meters which defines the search area around the center point.
limit – The maximum number of features in the response. Default is 30000. Hard limit is 100000.
params –
A dict representing additional filters on the features to be searched.
Properties prefixed with ‘p.’ are used to access values in the stored feature which are under the ‘properties’ property.
- params={"p.name": "foo"} returns all features with a value of property p.name equal to foo.
Properties prefixed with ‘f.’ are used to access values which are added by default in the stored feature. The possible values are: ‘f.id’, ‘f.createdAt’ and ‘f.updatedAt’.
- params={"f.createdAt": 1634} returns all features with a value of property f.createdAt equal to 1634.
The query can also be written using the long operators: “=gte”, “=lte”, “=gt”, “=lt” and “=cs”.
- params={"p.count=gte": 10} returns all features with a value of property p.count greater than or equal to 10.
- params={"p.count=lte": 10} returns all features with a value of property p.count less than or equal to 10.
- params={"p.count=gt": 10} returns all features with a value of property p.count greater than 10.
- params={"p.count=lt": 10} returns all features with a value of property p.count less than 10.
- params={"p.name=cs": "bar"} returns all features with a value of property p.name which contains bar.
selection – A list; only these properties will be present in returned features.
skip_cache – If set to True, the response is not returned from cache. Default is False.
force_2d – If set to True, features in the response will have only X and Y components; otherwise all x, y, z coordinates are returned.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Returns:
FeatureCollection object, or adapter-specific (e.g. a GeoDataFrame)
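A hedged sketch of a radius search around a point; `iml` and the `type` property are assumptions.
```python
# Assumption: `iml` is an InteractiveMapLayer with a `type` property on its features.
fc = iml.spatial_search(
    lng=13.4050,
    lat=52.5200,
    radius=500,                              # meters around the center point
    params={"p.type": "charging_station"},
    limit=100,
)
```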
- spatial_search_geometry(geometry: Feature | Geometry | dict | Any, radius: int | None = None, limit: int = 30000, params: Dict[str, str | list | tuple] | None = None, selection: List[str] | None = None, skip_cache: bool = False, force_2d: bool = False, **kwargs) FeatureCollection | gpd.GeoDataFrame [source]#
Return the features which are inside the specified radius and geometry.
The origin point is calculated based on the provided geometry.
- Parameters:
geometry – Geometry which will be used in the intersection. It supports GeoJSON Feature, GeoJSON Geometry, or objects implementing __geo_interface__.
radius – Radius in meters which extends the search area around the geometry.
limit – The maximum number of features in the response. Default is 30000. Hard limit is 100000.
params –
A dict representing additional filters on the features to be searched.
Properties prefixed with ‘p.’ are used to access values in the stored feature which are under the ‘properties’ property.
- params={"p.name": "foo"} returns all features with a value of property p.name equal to foo.
Properties prefixed with ‘f.’ are used to access values which are added by default in the stored feature. The possible values are: ‘f.id’, ‘f.createdAt’ and ‘f.updatedAt’.
- params={"f.createdAt": 1634} returns all features with a value of property f.createdAt equal to 1634.
The query can also be written using the long operators: “=gte”, “=lte”, “=gt”, “=lt” and “=cs”.
- params={"p.count=gte": 10} returns all features with a value of property p.count greater than or equal to 10.
- params={"p.count=lte": 10} returns all features with a value of property p.count less than or equal to 10.
- params={"p.count=gt": 10} returns all features with a value of property p.count greater than 10.
- params={"p.count=lt": 10} returns all features with a value of property p.count less than 10.
- params={"p.name=cs": "bar"} returns all features with a value of property p.name which contains bar.
selection – A list; only these properties will be present in returned features.
skip_cache – If set to True, the response is not returned from cache. Default is False.
force_2d – If set to True, features in the response will have only X and Y components; otherwise all x, y, z coordinates are returned.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Returns:
FeatureCollection object, or adapter-specific
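A hedged sketch of a geometry-based search. The geometry is passed as a plain GeoJSON dict, but a Feature or any object exposing __geo_interface__ (for example a shapely geometry) should work as well; `iml` is an assumption.
```python
# Assumption: `iml` is an InteractiveMapLayer.
line = {
    "type": "LineString",
    "coordinates": [[13.38, 52.51], [13.41, 52.52], [13.43, 52.53]],
}
fc = iml.spatial_search_geometry(geometry=line, radius=250)  # widen the search by 250 m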
- property statistics: dict#
The statistical information of the layer.
- subscribe(subscription_name: str, description: str, destination_catalog_hrn: str, destination_layer_id: str, interactive_map_subscription_type: InteractiveMapSubscriptionType) InteractiveMapSubscription [source]#
Subscribe a destination stream layer to this layer. The source layer is the current layer and the source catalog is the catalog it belongs to.
- Parameters:
subscription_name – Name of the subscription.
description – Description of the subscription.
destination_catalog_hrn – Catalog HRN of the destination Catalog.
destination_layer_id – Layer Id of the destination Stream Layer.
interactive_map_subscription_type – InteractiveMapSubscriptionType containing the type of subscription.
- Raises:
KeyError – in case statusToken is missing from the createSubscription response.
ValueError – in case the created subscription status is not Active after multiple retries up to the maximum retry time.
- Returns:
InteractiveMapSubscription object containing details of the created subscription.
- update_feature(feature_id: str, data: Feature | dict) None [source]#
Update the GeoJSON feature in the Layer.
- Parameters:
feature_id – A feature_id to be updated.
data – A GeoJSON Feature object to update.
- update_features(data: FeatureCollection | dict | gpd.GeoDataFrame, **kwargs) None [source]#
Update multiple features provided as a FeatureCollection object.
- Parameters:
data – A FeatureCollection, dict, or adapter-specific
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- write_feature(feature_id: str, data: Feature | dict) None [source]#
Write GeoJSON feature to Layer.
- Parameters:
feature_id – Identifier for the feature.
data – GeoJSON feature which is written to layer.
- write_features(features: FeatureCollection | dict | Iterator[Feature] | List[Feature] | gpd.GeoDataFrame | None = None, from_file: str | Path | None = None, feature_count: int = 2000, **kwargs) None [source]#
Write GeoJSON FeatureCollection to layer.
As the API has a limitation on the size of features, the features are divided into groups, and each group contains a number of features based on feature_count.
- Parameters:
features – Features represented by FeatureCollection, dict, Iterator, list of features, or adapter-specific
from_file – Path of a GeoJSON file.
feature_count – An int representing a number of features to upload at a time.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
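A hedged sketch of writing a small FeatureCollection; `iml`, the feature id and the properties are illustrative placeholders.
```python
# Assumption: `iml` is an InteractiveMapLayer; ids and properties are placeholders.
collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "station-001",
            "geometry": {"type": "Point", "coordinates": [13.4050, 52.5200]},
            "properties": {"name": "Alexanderplatz", "capacity": 120},
        },
    ],
}
iml.write_features(features=collection, feature_count=1000)
```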
- class here.platform.layer.KafkaTokenProvider(stream_layer: StreamLayer)[source]#
Bases:
AbstractTokenProvider
This class provides tokens to Kafka consumers and producers.
- class here.platform.layer.Layer(layer_id: str, catalog: Catalog)[source]#
Bases:
object
This base class provides access to data stored in catalog layers.
Instances can read their schemas for data stored in protobuf format, all available partition IDs, as well as the raw data blobs inside such partitions. You have to use the Schema class to access the decoded protobuf data.
- property configuration: LayerConfiguration#
The configuration of the layer
- get_details() Dict[str, Any] [source]#
Get layer details from the platform.
- Returns:
a dictionary with the layer details
- get_schema() Schema | None [source]#
Return the schema of the layer, if available.
This allows for parsing the partition data. It only works for layers which define a protobuf schema.
- Returns:
a Schema instance
- has_schema() bool [source]#
Check whether the layer has a schema defined. This does not obtain and register the schema.
- Returns:
whether the layer has a schema
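A hedged sketch of inspecting a layer and its schema. Obtaining the layer through Platform().get_catalog(...).get_layer(...) is an assumption about the wider SDK; the HRN and layer ID are placeholders.
```python
from here.platform import Platform

# Assumption: placeholder HRN/layer ID; credentials are configured locally.
catalog = Platform().get_catalog("hrn:here:data::olp-here:example-catalog")
layer = catalog.get_layer("example-versioned-layer")

print(layer.configuration.content_type)
if layer.has_schema():
    schema = layer.get_schema()  # only layers with a protobuf schema return one
    print(schema)
```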
- is_index() bool [source]#
Check if this is an index layer.
- Returns:
True if this is an index layer otherwise False
- is_interactivemap() bool [source]#
Check if this is an interactive map layer.
- Returns:
True if this is an interactive map layer otherwise False
- is_objectstore() bool [source]#
Check if this is an objectstore layer.
- Returns:
True if this is an objectstore layer otherwise False
- is_stream() bool [source]#
Check if this is a stream layer.
- Returns:
True if this is a stream layer otherwise False
- is_versioned() bool [source]#
Check if this is a versioned layer.
- Returns:
True if this is a versioned layer otherwise False
- class here.platform.layer.LayerConfiguration(json_dict: Dict[str, Any])[source]#
Bases:
JsonDictDocument
The configuration of a layer, including its most significant properties
- property billing_tag: List[str] | None#
List of billing tags used for grouping billing records together for the layer
- property content_encoding: str | None#
The content transfer encoding used to transfer blobs, typically gzip or empty
- property content_type: str | None#
The MIME type of the blobs stored in the layer, e.g. application/x-protobuf.
- property created: datetime#
Timestamp, in ISO 8601 format, when the layer was initially created
- property description: str | None#
A longer description of the layer
- property hrn: str | None#
The HERE Resource Name (HRN) of the layer
- property id: str | None#
The ID of the layer
- property name: str#
The name of the layer
- property partitioning: Partitioning | None#
Describes the way in which data is partitioned within the layer
- property properties: Dict[str, Any] | None#
Additional properties depending on the layer type: a dict of layer-specific properties, or None if no extra properties are set.
- property schema: Dict[str, str] | None#
Describes the HRN of the layer schema. Can be updated by the user for any kind of layer. Returns a dict describing the schema, or None.
- property summary: str | None#
The summary of the layer
- property tags: List[str]#
List of user-defined tags applied to the layer
- class here.platform.layer.LayerType(value)[source]#
Bases:
Enum
LayerType enum defines the different layer types supported.
Supported types: versioned, index, volatile, stream, interactivemap, objectstore.
The string representation is lowercase, to match with strings used in the platform APIs.
- INDEX = 2#
- INTERACTIVEMAP = 5#
- OBJECTSTORE = 6#
- STREAM = 4#
- UNKNOWN = 0#
- VERSIONED = 1#
- VOLATILE = 3#
- class here.platform.layer.ObjectMetadata(key: str, last_modified: str | None, size: int | None, object_type: ObjectType, content_type: str | None, content_encoding: str | None)[source]#
Bases:
object
Metadata and details of an object stored in an ObjectStoreLayer. This includes, among others, the object type and size, HTTP content type, and last modified date.
- content_encoding: str | None#
- content_type: str | None#
- key: str#
- last_modified: str | None#
- object_type: ObjectType#
- size: int | None#
- class here.platform.layer.ObjectStoreLayer(layer_id: str, catalog: Catalog)[source]#
Bases:
Layer
This class provides access to data stored in object store layers.
- MAX_UPLOAD_PART_SIZE = 96#
- MIN_UPLOAD_PART_SIZE = 5#
- copy_object(key: str, copy_from: str, replace: bool = False)[source]#
Copy an object within the object store layer, using the given source to copy from.
- Parameters:
key – key for the object to be created.
copy_from – key for the object to copy from.
replace – if True, replaces the object while copying if the destination already exists. This replace is not atomic: if the delete succeeds and the subsequent put fails, the destination object is gone.
- Raises:
ValueError – in case given key and copy_from are same or destination already exists with replace=False.
- delete_all_objects(parent_key: str = '/', strict: bool = False)[source]#
Delete all objects which are associated with given key from the object store layer.
- Parameters:
parent_key – parent key for the object to delete
strict – when True, raise a PlatformException if the object doesn’t exist; when False, no exception is raised
- Raises:
PlatformException – if the platform responds with an HTTP error
- delete_object(key: str, strict: bool = False)[source]#
Delete an object from the object store layer.
- Parameters:
key – key for the object to delete
strict – when True, raise a PlatformException if the object doesn’t exist; when False, no exception is raised
- Raises:
PlatformException – if the platform responds with an HTTP error
- get_object_metadata(key: str) ObjectMetadata [source]#
Get the metadata of the object with the given key.
- Parameters:
key – key of the object to fetch metadata of
- Returns:
object metadata of the given object
- get_objects_metadata(parent: str | None = None, limit: int = 1000, deep: bool = False) Iterator[ObjectMetadata] [source]#
Iterate over the metadata of the objects stored in the layer.
- Parameters:
parent – a string that tells what “directory” should be the root for the returned content. When not set, the root is assumed
deep – if True, also returns metadata from the subdirectories
limit – number of results to return per request call: a larger value performs larger but less frequent requests to the service; a smaller value performs shorter but more frequent requests. The overall content retrieved is independent of this value. To limit the amount of metadata returned, simply filter the iterator or consume it up to the number of elements wanted.
- Returns:
an iterator of ObjectMetadata
- is_directory(key: str) bool [source]#
Check if given key is a directory.
- Parameters:
key – key of the object.
- Returns:
True if the given key is a directory.
- iter_keys(parent: str | None = None, deep: bool = False, limit: int = 1000) Iterator[str] [source]#
Iterate over the keys of the objects stored in the layer.
- Parameters:
parent – a string that tells what “directory” should be the root for the returned content. When not set, the root is assumed
deep – if True, also returns keys from the subdirectories
limit – number of results to return per request call: a larger value performs larger but less frequent requests to the service; a smaller value performs shorter but more frequent requests. The overall content retrieved is independent of this value. To limit the amount of keys returned, simply filter the iterator or consume it up to the number of elements wanted.
- Returns:
an iterator of object keys
- key_exists(key: str) bool [source]#
Check if the layer contains an object with the given key.
- Parameters:
key – the object key to check
- Returns:
True if the layer contains an object with the given key
- list_keys(parent: str | None = None, deep: bool = False) List[str] [source]#
List the keys of the objects stored in the layer.
- Parameters:
parent – a string that tells what “directory” should be the root for the returned content. When not set, the root is assumed
deep – if True, also returns keys from the subdirectories
- Returns:
a list of object keys
- read_object(key: str, include_metadata: bool = False, stream: bool = False, chunk_size: int = 102400) bytes | Iterator[bytes] | Tuple[bytes | Iterator[bytes], ObjectMetadata] [source]#
Read and return the content of an object.
Optionally, also return the corresponding object metadata.
- Parameters:
key – key for the object to read
include_metadata – whether to also return the object metadata
stream – whether to stream data
chunk_size – the size to request each iteration when streaming data
- Returns:
the content of the object and, if requested, also its metadata
- set_max_upload_part_size(size: int)[source]#
Sets the maximum size of uploaded parts in MB (megabytes).
- Parameters:
size – max. size of uploaded parts.
- write_object(key: str, path_or_data: str | Path | bytes, content_type: str = 'application/octet-stream', overwrite: bool = True, upload_part_size: int | None = None, content_encoding: str | None = None)[source]#
Write an object to the object store layer. If file/bytes size is larger than max. upload part size then the blob will be written in multiple parts.
This functions adds a new object or overwrites an existing object.
- Parameters:
key – key for the object to write.
path_or_data – data to be written.
content_type – the standard MIME type describing the format of the data.
overwrite – if True, this method overwrites the object if the key exists; if False, it raises an error if the key exists.
upload_part_size – optional size of upload parts; if not specified the class’ default is used
content_encoding – Content-encoding of the object. This header is optional. For more information, see https://tools.ietf.org/html/rfc2616#section-14.11
- Returns:
None
- Raises:
ValueError – in case the file does not exist or upload_part_size is out of range.
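A hedged sketch of a write/read round trip on an object store layer; `obj_layer` and the keys are assumptions.
```python
# Assumption: `obj_layer` is an ObjectStoreLayer obtained from a catalog.
obj_layer.write_object(
    key="reports/2025/03/summary.json",
    path_or_data=b'{"total": 3}',
    content_type="application/json",
)

if obj_layer.key_exists("reports/2025/03/summary.json"):
    print(obj_layer.read_object("reports/2025/03/summary.json"))

for key in obj_layer.iter_keys(parent="reports", deep=True):
    print(key)
```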
- class here.platform.layer.ObjectType(value)[source]#
Bases:
Enum
ObjectType defines the different types of objects stored in an ObjectStoreLayer.
- DIRECTORY = 'commonPrefix'#
- OBJECT = 'object'#
- class here.platform.layer.QuadbinClustering(clustering_type: str = 'quadbin', no_buffer: bool = False, relative_resolution: int | None = None, resolution: int | None = None, countmode: str | None = None)[source]#
Bases:
object
This class defines attributes for the quadbin clustering algorithm.
- clustering_type: str = 'quadbin'#
- countmode: str | None = None#
- no_buffer: bool = False#
- relative_resolution: int | None = None#
- resolution: int | None = None#
- class here.platform.layer.StreamIngestion(json_dict: Dict[str, Any])[source]#
Bases:
JsonDictDocument
Response of the stream layer to confirm successful data ingestion.
- property message_ids: List[str]#
The identifiers assigned to each SDII message ingested.
- property message_list_id: str#
The identifier assigned to the ingested SDII message list.
- class here.platform.layer.StreamLayer(layer_id: str, catalog: Catalog)[source]#
Bases:
Layer
This class provides access to data stored in stream layers.
- append_stream_metadata(partitions: Iterable[StreamPartition] | pd.DataFrame, publication: Publication | None = None, adapter: Adapter | None = None, **kwargs) None [source]#
Append new partition metadata to the stream layer directly as messages to the stream.
- Parameters:
publication – the publication this operation is part of
partitions – the partitions to append as messages, or adapter-specific
adapter – the Adapter to transform the input; None to use the default adapter of the catalog.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- blob_exists(data_handle: str, billing_tag: str | None = None) bool [source]#
Check if a blob exists for the requested data handle.
- Parameters:
data_handle – The data handle identifies a specific blob so that you can get that blob’s contents.
billing_tag – A string which is used for grouping billing records.
- Returns:
a boolean indicating if the handle exists.
- get_stream_metadata(subscription: StreamSubscription, commit_offsets: bool = True, adapter: Adapter | None = None, **kwargs) Iterator[StreamPartition] | pd.DataFrame [source]#
Consume metadata for a subscription.
It does not download blobs; use read_stream for that.
The function consumes and returns messages for the stream subscription. The number of messages retrieved depends on a variety of factors, and it cannot be assumed that all the available messages are returned in a single invocation: users should invoke this function multiple times to read all the content present in the stream, for as long as data is returned, if that is what is wanted.
When no more data is returned, users can reasonably assume the end of stream is reached. However, when operating with a distributed, asynchronous messaging system like the one employed in this case, producers can append new messages at any point in time and there may be a delay between the moment when data is produced and the moment when data is available for consumption.
While no message is lost, the end of stream can’t always be detected reliably.
- Parameters:
subscription – the subscription from where to consume the data
commit_offsets – automatically commit offsets so next read starts at the end of the last consumed message.
adapter – the Adapter to transform and return the result; None to use the default adapter of the catalog.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Returns:
Iterator of StreamPartition objects, or adapter-specific
- Raises:
ValueError – in case the subscription is invalid
- kafka_consumer(group_id: str | None = None, **kwargs) KafkaConsumer [source]#
Instantiate and return a new KafkaConsumer pre-configured to operate with the layer.
- Parameters:
group_id – identifies the consumer group this consumer belongs to.
kwargs – Kafka consumer properties. See also: https://kafka.apache.org/11/documentation.html#newconsumerconfigs
- Returns:
kafka consumer
- kafka_producer(**kwargs) KafkaProducer [source]#
Instantiate and return a new KafkaProducer pre-configured to operate with the layer.
- Parameters:
kwargs – Kafka producer properties.
- Returns:
Kafka producer
- put_blob(path_or_data: str | bytes | Path, partition_id: str | None = None, data_handle: str | None = None, inline_stream_data_limit: int | None = 1048576) Partition [source]#
Upload a blob to the durable blob service.
- Parameters:
path_or_data – content to be uploaded, it must match the layer content type, if set.
partition_id – partition identifier the blob relates to.
data_handle – data handle to use for the blob, in case already available, if not available an appropriate one is generated and returned.
inline_stream_data_limit – threshold data size in bytes to decide whether the inline stream data field should be populated; if the data size is less than this limit, the data is added to the StreamPartition.data field, otherwise the blob is uploaded and its data handle is added to the StreamPartition.data_handle field.
- Returns:
partition object referencing the uploaded data
- read_stream(subscription: StreamSubscription, commit_offsets: bool = True, decode: bool = True, adapter: Adapter | None = None, stream: bool = False, chunk_size: int = 102400, **kwargs) Iterator[Tuple[StreamPartition, bytes]] | Iterator[Tuple[StreamPartition, Iterator[bytes]]] | Iterator[Tuple[StreamPartition, Any]] | pd.DataFrame [source]#
Consume data for this subscription. Download and decode the blobs.
The function consumes and returns messages for the stream subscription. The number of messages retrieved depends on a variety of factors, and it cannot be assumed that all the available messages are returned in a single invocation: users should invoke this function multiple times to read all the content present in the stream, for as long as data is returned, if that is what is wanted.
When no more data is returned, users can reasonably assume the end of stream is reached. However, when operating with a distributed, asynchronous messaging system like the one employed in this case, producers can append new messages at any point in time and there may be a delay between the moment when data is produced and the moment when data is available for consumption.
While no message is lost, the end of stream can’t always be detected reliably.
- Parameters:
subscription – the subscription from where to consume the data
commit_offsets – automatically commit offset so next read starts at the end of the last message.
decode – whether to decode the data through an adapter or return raw bytes
adapter – the Adapter to transform and return the result; None to use the default adapter of the catalog.
stream – whether to stream data. This implies decode=False.
chunk_size – the size to request each iteration when streaming data.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Returns:
Iterator of StreamPartition objects, each with its raw data in case decode=False, adapter-specific otherwise
- Raises:
ValueError – in case the subscription is invalid
ValueError – in case decoding is requested but the adapter does not support the content type of the layer, or in case of invalid parameters
LayerConfigurationException – in case decoding is requested but the layer doesn’t have any content type configured
- subscribe(mode: Mode = Mode.SERIAL, consumer_id: str | None = None, kafka_consumer_properties: dict | None = None, group_id: str | None = None, auto_offset_reset: str | None = None, subscription_id: str | None = None) StreamSubscription [source]#
Enable message consumption for this layer.
- Parameters:
mode – The subscription mode for this subscription. The default value is serial.
consumer_id – The Id to use to identify this consumer. It must be unique within the consumer group. If you do not provide one, the system will generate one.
kafka_consumer_properties – Properties to configure the kafka consumer on the service
group_id – set the consumer group id
auto_offset_reset – seek to a predefined location in the stream. earliest: automatically reset the offset to the earliest offset; latest: automatically reset the offset to the latest offset; none: the service returns an error if no previous offset is found for the consumer’s group
subscription_id – subscription id returned from a previous call to subscribe(). This allows a previously created subscription (e.g. saved to persistent storage between application runs) to be restored.
For other available Kafka consumer settings, see https://kafka.apache.org/documentation/#consumerconfigs.
- Returns:
a new subscription to the stream layer
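A hedged sketch of subscribing and then polling the stream; `stream_layer` and the consumer group name are assumptions, and read_stream is called repeatedly because a single call is not guaranteed to return all available messages.
```python
from here.platform.layer import StreamSubscription

# Assumption: `stream_layer` is a StreamLayer; the group id is a placeholder.
subscription = stream_layer.subscribe(
    mode=StreamSubscription.Mode.SERIAL,
    group_id="my-consumer-group",
    auto_offset_reset="earliest",
)
try:
    for _ in range(10):  # poll a few times; one call may not drain the stream
        for partition, data in stream_layer.read_stream(subscription, decode=False):
            print(partition, len(data), "bytes")
finally:
    subscription.unsubscribe()
```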
- write_stream(data: Iterable[Tuple[str | int, str | Path | bytes] | Tuple[str | int, str | Path | bytes, int | None]] | Mapping[str | int, str | Path | bytes] | Iterable[Tuple[str | int, Any] | Tuple[str | int, Any, int | None]] | Mapping[str | int, Any] | pd.DataFrame, timestamp: int | None = None, encode: bool = True, inline_data_limit: int = 819200, adapter: Adapter | None = None, **kwargs)[source]#
Write new content to the layer and push the related partition metadata to the stream as part of a publication.
- Parameters:
data – data to upload to the layer and derive metadata from: a sequence of elements, each either (id, data) or (id, data, timestamp). The timestamp is optional and in milliseconds since Unix epoch (1970-01-01T00:00:00 UTC)
encode – whether to encode the data or upload raw bytes
timestamp – optional timestamp for all the messages, if none is specified in data: in milliseconds since Unix epoch (1970-01-01T00:00:00 UTC)
inline_data_limit – threshold data size in bytes to decide whether the inline stream data field should be populated; if the data size is less than this limit, the data is added to the StreamPartition.data field, otherwise the blob is uploaded and its data handle is added to the StreamPartition.data_handle field. Default is 819200 bytes.
adapter – the Adapter to transform the input; None to use the default adapter of the catalog.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
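A hedged sketch of writing two raw messages; the partition ids and payloads are placeholders, and encode=False assumes no adapter-based encoding is wanted.
```python
# Assumption: `stream_layer` is a StreamLayer; ids and payloads are placeholders.
stream_layer.write_stream(
    data=[
        ("probe-0001", b"payload-1"),
        ("probe-0002", b"payload-2", 1601251200000),  # optional per-message timestamp (ms)
    ],
    encode=False,
)
```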
- class here.platform.layer.StreamSubscription(layer: StreamLayer, sub_id: str, sub_mode: Mode, node_base_url: str)[source]#
Bases:
object
Represent a subscription to consume data from a stream layer.
The subscription must be closed by unsubscribing to free resources on the service.
- class Mode(value)[source]#
Bases:
Enum
Mode of a stream subscription.
- PARALLEL = 'parallel'#
- SERIAL = 'serial'#
- commit_offsets(offsets: Dict[int, int])[source]#
Commit specified offsets once read is done.
- Parameters:
offsets – Dict of offset {<Partition ID>:<Offset Number>, <Partition ID>:<Offset Number>}. Partition id is kafka partition id.
- seek_to_offsets(offsets: Dict[int, int])[source]#
Seek to stream offsets for a stream layer subscription. Reading will start from the specified offsets.
- Parameters:
offsets – Dict of offset {<Partition ID>:<Offset Number>, <Partition ID>:<Offset Number>}. Partition id is kafka partition id.
- unsubscribe(strict: bool = False)[source]#
Disable message consumption for this layer.
After unsubscribing, you need to subscribe to the stream layer again to be able to resume the data consumption.
- Parameters:
strict – True to require that the subscription exists, False to allow it to have already been cancelled.
- class here.platform.layer.VersionedLayer(layer_id: str, catalog: Catalog)[source]#
Bases:
Layer
This class provides access to data stored in versioned layers.
- blob_exists(data_handle: str, billing_tag: str | None = None) bool [source]#
Check if a blob exists for the requested data handle.
- Parameters:
data_handle – The data handle identifies a specific blob so that you can get that blob’s contents.
billing_tag – A string which is used for grouping billing records.
- Returns:
a boolean indicating if the handle exists.
- get_age_map(data_level: str) VersionedLayerStatisticsMap [source]#
Retrieve layer age map.
- Parameters:
data_level – One of the Data Levels configured for this layer.
By default, assets generated at the deepest data level are returned. Note that assets returned for data levels greater than 11 represent data at data level 11.
- Returns:
VersionedLayerStatisticsMap object containing properties data, image.
- get_blob(data_handle: str, range_header: str | None = None, billing_tag: str | None = None, stream: bool = False, chunk_size: int = 102400) bytes | Iterator[bytes] [source]#
Get blob (raw bytes) for given layer ID and data-handle from storage.
- Parameters:
data_handle – The data handle identifies a specific blob so that you can get that blob’s contents.
stream – whether to stream data.
chunk_size – the size to request each iteration when streaming data.
range_header – an optional Range parameter to resume download of a large response
billing_tag – A string which is used for grouping billing records.
- Returns:
a blob response as bytes or iterator of bytes if stream is True
- get_partition_changes(since_version: int | None = None, version: int | None = None, part: str | None = None, additional_fields: List[str] | None = ['dataSize', 'checksum', 'compressedDataSize', 'crc'], adapter: Adapter | None = None, **kwargs) Iterator[VersionedPartition] | pd.DataFrame [source]#
Get the list of partition objects that changed between since_version and the given version.
- Parameters:
since_version – version from which partitions need to be tracked.
version – the catalog version. If not specified, the latest catalog version will be used
part – indicates which part of the layer shall be queried. If not specified, all the partitions are returned. It cannot be specified together with partition_ids
additional_fields – Additional metadata fields: dataSize, checksum, compressedDataSize, crc. By default, all are considered.
adapter – the Adapter to transform and return the result; None to use the default adapter of the catalog.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Returns:
Iterator of VersionedPartition objects, or adapter-specific
- get_partitions_metadata(partition_ids: List[int | str] | None = None, version: int | None = None, part: str | None = None, additional_fields: List[str] | None = ['dataSize', 'checksum', 'compressedDataSize', 'crc'], adapter: Adapter | None = None, **kwargs) Iterator[VersionedPartition] | pd.DataFrame [source]#
Get list of all partition objects for the catalog with the given version.
- Parameters:
partition_ids – The list of partition IDs. If not specified, all partitions are returned
version – the catalog version. If not specified, the latest catalog version will be used
part – indicates which part of the layer shall be queried. If not specified, all the partitions are returned. It cannot be specified together with partition_ids
additional_fields – Additional metadata fields: dataSize, checksum, compressedDataSize, crc. By default, all are considered.
adapter – the Adapter to transform and return the result; None to use the default adapter of the catalog.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Returns:
Iterator of VersionedPartition objects, or adapter-specific
- Raises:
ValueError – in case of an invalid parameter combination
- get_size_map(data_level: str) VersionedLayerStatisticsMap [source]#
Retrieve layer size map.
- Parameters:
data_level – One of the Data Levels configured for this layer.
By default, assets generated at the deepest data level are returned. Note that assets returned for data levels greater than 11 represent data at data level 11.
- Returns:
VersionedLayerStatisticsMap object containing properties data, image.
- get_statistics() VersionedLayerStatistics [source]#
Retrieve layer statistics.
- Returns:
VersionedLayerStatistics object containing layer statistics.
- get_tile_map(data_level: str) VersionedLayerStatisticsMap [source]#
Retrieve layer tile map.
- Parameters:
data_level – One of the Data Levels configured for this layer.
By default, assets generated at the deepest data level are returned. Note that assets returned for data levels greater than 11 represent data at data level 11.
- Returns:
VersionedLayerStatisticsMap object containing properties data, image.
- put_blob(path_or_data: str | bytes | Path, publication: Publication | None = None, partition_id: str | None = None, data_handle: str | None = None) Partition [source]#
Upload a blob to the durable blob service.
- Parameters:
path_or_data – content to be uploaded, it must match the layer content type, if set.
publication – the publication this operation is part of
partition_id – partition identifier the blob relates to.
data_handle – data handle to use for the blob, in case already available, if not available an appropriate one is generated and returned.
- Returns:
partition object referencing the uploaded data
- read_partitions(partition_ids: List[int | str] | None = None, version: int | None = None, part: str | None = None, decode: bool = True, adapter: Adapter | None = None, stream: bool = False, chunk_size: int = 102400, **kwargs) Iterator[Tuple[VersionedPartition, bytes]] | Iterator[Tuple[VersionedPartition, Iterator[bytes]]] | Iterator[Tuple[VersionedPartition, Any]] | pd.DataFrame [source]#
Read partition data from a layer.
- Parameters:
partition_ids – The list of partition IDs. If not specified, all partitions are read.
version – the catalog version. If not specified, the latest catalog version will be used.
part – indicates which part of the layer shall be queried. If not specified, all the partitions are returned. It cannot be specified together with partition_ids
decode – whether to decode the data through an adapter or return raw bytes
stream – whether to stream data. This implies decode=False.
chunk_size – the size to request each iteration when streaming data.
adapter – the Adapter to transform and return the result; None to use the default adapter of the catalog.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Returns:
Iterator of VersionedPartition objects, each with its raw data in case decode=False, adapter-specific otherwise
- Raises:
ValueError – in case decoding is requested but the adapter does not support the content type of the layer, or in case of invalid parameters
LayerConfigurationException – in case decoding is requested but the layer doesn’t have any content type configured
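A hedged sketch of reading raw blobs for selected partitions of a versioned layer; `versioned_layer` and the partition ids are placeholders, and omitting `version` reads the latest catalog version.
```python
# Assumption: `versioned_layer` is a VersionedLayer; partition ids are placeholders.
for partition, blob in versioned_layer.read_partitions(
    partition_ids=[23618402, 23618403],
    decode=False,
):
    print(partition, len(blob), "bytes")
```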
- set_partitions_metadata(publication: Publication, update: None | Iterable[VersionedPartition] | pd.DataFrame = None, delete: None | Iterable[str | int] | pd.Series = None, adapter: Adapter | None = None, **kwargs)[source]#
Update the metadata of the layer as part of a publication by publishing updated partitions and/or deleting partitions.
- Parameters:
publication – the publication this operation is part of
update – the complete partitions to update, if any, or adapter-specific
delete – the partition ids to delete, if any, or adapter-specific
adapter – the Adapter to transform the input; None to use the default adapter of the catalog.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- write_partitions(publication: Publication, data: Iterable[Tuple[str | int, str | Path | bytes]] | Mapping[str | int, str | Path | bytes] | Iterable[Tuple[str | int, Any]] | Mapping[str | int, Any] | pd.DataFrame, encode: bool = True, adapter: Adapter | None = None, **kwargs)[source]#
Upload content to the layer and publish the related partition metadata as part of a publication.
- Parameters:
publication – the publication this operation is part of
data – data to upload to the versioned layer, or adapter-specific
encode – whether to encode the data or upload raw bytes.
adapter – the Adapter to transform the input; None to use the default adapter of the catalog.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
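A hedged sketch of publishing data to a versioned layer. The publication-handling calls (start_publication, complete) are assumptions about the Catalog/Publication API and are not documented on this page; only write_partitions itself is.
```python
# Assumption: `catalog` owns the layer; start_publication()/complete() are
# illustrative names for the publication workflow, not verbatim from this page.
publication = catalog.start_publication(layers=["example-versioned-layer"])
versioned_layer.write_partitions(
    publication=publication,
    data={23618402: b"tile-blob-a", 23618403: b"tile-blob-b"},
    encode=False,
)
publication.complete()
```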
- class here.platform.layer.VersionedLayerStatistics(json_dict: Dict[str, Any])[source]#
Bases:
JsonDictDocument
Response of the versioned layer containing layer statistics.
- property level_summary: Dict[int, VersionedLevelSummary]#
The summary of each level.
- class here.platform.layer.VersionedLayerStatisticsMap(data: bytes)[source]#
Bases:
object
Response of the versioned layer containing layer bitmap (bytes) data type response handling.
- property data: bytes#
The raw data bytes.
- property image#
The representation of the raw bytes as an IPython.display.Image. Note: IPython toolkit needs to be installed.
- Returns:
IPython.display.Image
- Raises:
RuntimeError – in case IPython toolkit is not installed
- class here.platform.layer.VersionedLevelSummary(json_dict: Dict[str, Any])[source]#
Bases:
JsonDictDocument
Response of the versioned layer containing level summary.
- property bounding_box: dict#
The bounding box of the level.
- property max_partition_size: int#
The maximum partition size of the level.
- property min_partition_size: int#
The minimum partition size of the level.
- property processed_timestamp: int#
The processed timestamp of the level.
- property size: int#
The size in bytes of the level.
- property total_partitions: int#
The total number of partitions in the level.
- property version: int#
The version of the level.
- class here.platform.layer.VolatileLayer(layer_id: str, catalog: Catalog)[source]#
Bases:
Layer
This class provides access to data stored in volatile layers.
- blob_exists(data_handle: str, billing_tag: str | None = None) bool [source]#
Check if a blob exists for the requested data handle.
- Parameters:
data_handle – The data handle identifies a specific blob so that you can get that blob’s contents.
billing_tag – A string which is used for grouping billing records.
- Returns:
a boolean indicating if the handle exists.
- delete_partitions(publication: Publication, partitions: Iterable[VolatilePartition])[source]#
Delete content from selected partitions of the layer.
- Parameters:
publication – the publication this operation is part of
partitions – identifiers of the volatile partitions to delete
- get_blob(data_handle: str, billing_tag: str | None = None, stream: bool = False, chunk_size: int = 102400) bytes | Iterator[bytes] [source]#
Get blob (raw bytes) for given layer ID and data-handle from storage.
- Parameters:
data_handle – The data handle identifies a specific blob so that you can get that blob’s contents.
billing_tag – A string which is used for grouping billing records.
stream – whether to stream data
chunk_size – the size to request each iteration when streaming data.
- Returns:
a blob response as bytes or iterator of bytes if stream is True
- get_partitions_metadata(partition_ids: List[int | str] | None = None, additional_fields: List[str] | None = ['dataSize', 'checksum', 'compressedDataSize', 'crc'], adapter: Adapter | None = None, **kwargs) Iterator[VolatilePartition] | pd.DataFrame [source]#
Get list of all partition objects for the catalog.
- Parameters:
partition_ids – The list of partition IDs. If not specified, all partitions are read.
additional_fields – Additional metadata fields: dataSize, checksum, compressedDataSize, crc. By default, all are considered.
adapter – the Adapter to transform and return the result; None to use the default adapter of the catalog.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Returns:
Iterator of VolatilePartition objects, or adapter-specific
- put_blob(path_or_data: str | bytes | Path, publication: Publication | None = None, partition_id: str | None = None, data_handle: str | None = None) Partition [source]#
Upload a blob to the volatile blob service.
- Parameters:
path_or_data – content to be uploaded, it must match the layer content type, if set.
publication – the publication this operation is part of
partition_id – partition identifier the blob relates to
data_handle – data handle to use for the blob, in case already available, if not available an appropriate one is generated and returned.
- Returns:
partition object referencing the uploaded data
- read_partitions(partition_ids: List[int | str] | None = None, decode: bool = True, adapter: Adapter | None = None, stream: bool = False, chunk_size: int = 102400, **kwargs) Iterator[Tuple[VolatilePartition, bytes]] | Iterator[Tuple[VolatilePartition, Iterator[bytes]]] | Iterator[Tuple[VolatilePartition, Any]] | pd.DataFrame [source]#
Read partition data from a layer.
- Parameters:
partition_ids – The list of partition IDs. If not specified, all partitions are read.
decode – whether to decode the data through an adapter or return raw bytes
stream – whether to stream data. This implies decode=False.
adapter – the Adapter to transform and return the result; None to use the default adapter of the catalog.
chunk_size – the size to request each iteration when streaming data.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- Returns:
Iterator of VolatilePartition objects, each with its raw data in case decode=False, adapter-specific otherwise
- Raises:
ValueError – in case decoding is requested but the adapter does not support the content type of the layer, or in case of invalid parameters
LayerConfigurationException – in case decoding is requested but the layer doesn’t have any content type configured
- set_partitions_metadata(publication: Publication, update: None | Iterable[VolatilePartition] | pd.DataFrame = None, delete: None | Iterable[str | int] | pd.Series = None, adapter: Adapter | None = None, **kwargs) None [source]#
Update the metadata of the layer as part of a publication by publishing updated partitions and/or deleting partitions.
- Parameters:
publication – the publication this operation is part of
update – the complete partitions to update, if any, or adapter-specific
delete – the partition ids to delete, if any, or adapter-specific
adapter – the Adapter to transform the input; None to use the default adapter of the catalog.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
- write_partitions(publication: Publication, data: Iterable[Tuple[str | int, str | Path | bytes]] | Mapping[str | int, str | Path | bytes] | Iterable[Tuple[str | int, Any]] | Mapping[str | int, Any] | pd.DataFrame, encode: bool = True, adapter: Adapter | None = None, **kwargs) None [source]#
Upload content to the layer and publish the related partition metadata as part of a publication.
- Parameters:
publication – the publication this operation is part of.
data – data to upload to the volatile layer, or adapter-specific
encode – whether to encode the data or upload raw bytes.
adapter – the Adapter to transform the input; None to use the default adapter of the catalog.
kwargs – adapter-specific; please consult the documentation of the specific adapter for the parameters and types it supports
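A hedged sketch of writing to a volatile layer within an open publication; `volatile_layer`, `publication` and the partition id are assumptions (see the versioned layer sketch above for a hypothetical publication flow).
```python
# Assumption: `volatile_layer` is a VolatileLayer and `publication` is an open
# Publication that covers it.
volatile_layer.write_partitions(
    publication=publication,
    data={"377894440": b"fresh-sensor-snapshot"},
    encode=False,
)
```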