HERE platform catalog layer partition abstraction.

class here.platform.partition.IndexPartition(layer: IndexLayer, data_handle: str | None = None, checksum: str | None = None, data_size: int | None = None, crc: str | None = None, timestamp: int | None = None, fields: Dict[str, str | int | bool] = {}, additional_metadata: Dict[str, str] = {})[source]#

Bases: Partition

Partition subclass used for IndexLayer layers.

In addition to the fields present in the base class, this class adds:
  • timestamp: timestamp of the partition

  • fields: fields according to the index layer configuration

  • additional_metadata: free-form additional metadata
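
As a hedged sketch, an IndexPartition could be constructed as shown below; index_layer, the timestamp value, and the field names are placeholders and must match your index layer configuration:

    from here.platform.partition import IndexPartition

    # index_layer is assumed to be an IndexLayer obtained elsewhere.
    partition = IndexPartition(
        layer=index_layer,
        data_handle=IndexPartition.generate_data_handle(),  # unique blob identifier
        timestamp=1700000000,                                # partition timestamp
        fields={"tile_id": 377894440, "valid": True},        # placeholder field names
        additional_metadata={"source": "ingest-job"},        # free-form metadata
    )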

class here.platform.partition.Partition(data_handle: str | None = None, layer=None, id: str | None = None, checksum: str | None = None, data_size: int | None = None, crc: str | None = None)[source]#

Bases: object

HERE platform partition abstraction.

static generate_data_handle() str[source]#

Generate a unique data handle.

Returns:

A string representing a unique blob identifier
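
For example (a minimal sketch; the format of the returned string is an implementation detail):

    from here.platform.partition import Partition

    handle = Partition.generate_data_handle()
    print(handle)  # a unique blob identifier string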

get_blob(stream: bool = False, chunk_size: int = 102400) bytes | Iterator[bytes][source]#

Get the blob (raw bytes) stored in this layer for the partition, referenced by the given data_handle.

Parameters:
  • stream – whether to stream data.

  • chunk_size – the size to request each iteration when streaming data.

Returns:

Content of the blob referenced by the data handle.

Raises:

ValueError – Unsupported content encoding or invalid data handle
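
A sketch of both retrieval modes; partition is assumed to be a Partition with a valid data handle, obtained from a layer elsewhere, and the output file name is a placeholder:

    # Read the whole blob into memory at once.
    content = partition.get_blob()

    # Or stream it in 100 KiB chunks to keep memory usage bounded.
    with open("partition.bin", "wb") as out:
        for chunk in partition.get_blob(stream=True, chunk_size=102400):
            out.write(chunk)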

static get_data_handler(path_or_data: str | bytes | Path)[source]#

Get a data handler to read from a file path or from in-memory data.

Parameters:

path_or_data – Path of the file or data to be read and uploaded as a partition.

Yields:

Union[BytesIO, BinaryIO] – a data handler for the given input

multipart_upload(path_or_data: str | bytes | Path, part_size: int)[source]#

Multipart upload data to a blob store.

Parameters:
  • path_or_data – Path of the file or data in bytes to be uploaded as a partition.

  • part_size – An int representing the size in MB of each part to upload; the minimum value is 5 MB and the maximum is 50 MB.

Raises:

ValueError – Data handle not generated or max retry timeout reached
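
A minimal sketch, assuming partition already references a writable layer; the file path and part size are illustrative:

    # Upload a large file in 10 MB parts (allowed range: 5-50 MB per part).
    partition.multipart_upload("/data/large_tile.pbf", part_size=10)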

put_blob(path_or_data: str | bytes | Path)[source]#

Upload data in a single part.

Parameters:

path_or_data – Path of the file or data in bytes to be uploaded as a partition.
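
A short sketch; both a file path and in-memory bytes are accepted (the path and payload below are placeholders):

    # Upload from a file on disk ...
    partition.put_blob("/data/small_tile.pbf")

    # ... or directly from bytes in memory.
    partition.put_blob(b'{"type": "FeatureCollection", "features": []}')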

async read_and_upload_parts(path_or_data: str | bytes | Path, upload_url: str, content_type: str, headers: Dict[str, str], part_size: int)[source]#

Read the input file in chunks of 50 MB each and upload the parts to the given URL.

Parameters:
  • path_or_data – Path of the file or data to be read and uploaded as a partition.

  • upload_url – A URL to upload blob data to.

  • content_type – A standard MIME type describing the format of the blob data.

  • headers – A dict containing HTTP headers.

  • part_size – An int representing the size in MB of each part to upload; the minimum value is 5 MB and the maximum is 50 MB.

Returns:

A list of dicts, each mapping a part number to the corresponding URL response body.
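
This coroutine is normally driven by multipart_upload; if invoked directly, it has to be awaited. A hedged sketch with placeholder path, URL, and headers:

    import asyncio

    results = asyncio.run(
        partition.read_and_upload_parts(
            "/data/large_tile.pbf",                        # placeholder path
            upload_url="https://example.com/blob/upload",  # placeholder URL
            content_type="application/octet-stream",
            headers={"Authorization": "Bearer <token>"},   # placeholder headers
            part_size=10,
        )
    )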

async upload_part(session: ClientSession, upload_url: str, part_num: int, headers: Dict[str, str], data: bytes, max_retry_count: int = 3)[source]#

Upload part data to a URL, using the specified ClientSession.

Parameters:
  • session – a ClientSession object for making requests.

  • upload_url – a URL to upload blob data to.

  • part_num – a unique number for the part to be uploaded.

  • headers – HTTP headers to be sent with the request.

  • data – blob data to be sent with the request.

  • max_retry_count – maximum retry count in case of failure, with exponential backoff (60 seconds).

Returns:

A dict with the part number as key and the response body as value.
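
This coroutine is an internal helper of read_and_upload_parts; if called directly it needs an open aiohttp ClientSession. A hedged sketch with placeholder arguments:

    import asyncio
    from aiohttp import ClientSession

    async def upload_first_part(partition, upload_url, headers, data):
        async with ClientSession() as session:
            # part_num identifies this chunk within the multipart upload.
            return await partition.upload_part(
                session, upload_url, part_num=1, headers=headers, data=data
            )

    # asyncio.run(upload_first_part(partition, url, headers, chunk))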

upload_volatile_blob(path_or_data: str | Path | bytes)[source]#

Upload data to volatile blob.

Parameters:

path_or_data – Path of the file or data to be read and uploaded as a partition.

Raises:

ValueError – if the data handle is not set
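
A minimal sketch, assuming volatile_partition is a VolatilePartition whose data handle is already set and whose layer was obtained elsewhere; the payload is a placeholder:

    volatile_partition.upload_volatile_blob(b"latest sensor snapshot")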

class here.platform.partition.StreamPartition(layer: StreamLayer, data_handle: str | None = None, id: str | None = None, checksum: str | None = None, data_size: int | None = None, crc: str | None = None, data: bytes | None = None, timestamp: int | None = None, kafka_partition: int | None = None, kafka_offset: int | None = None)[source]#

Bases: Partition

Partition subclass used for StreamLayer layers.

In addition to the fields present in the base class, this class adds:
  • data: inline data, alternative to data_handle

  • timestamp: timestamp in milliseconds since the Unix epoch (1970-01-01T00:00:00 UTC)

  • kafka_offset: the offset of the message in the Kafka stream partition

  • kafka_partition: the Kafka stream partition number the offset is related to

get_data(stream: bool = False, chunk_size: int = 102400) bytes | Iterator[bytes][source]#

Get the data associated with the StreamPartition.

Data can be present directly in the data field or retrieved via data_handle. This function returns the data, regardless of where it is stored.

Parameters:
  • stream – whether to stream data

  • chunk_size – the size to request each iteration when streaming data

Returns:

Data associated with the stream partition

Raises:

ValueError – if neither data nor data_handle is available
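
A sketch of consuming a StreamPartition; how the partition object is obtained (for example from a StreamLayer subscription) is outside this module, and process() below is a placeholder for your own handling:

    # get_data() hides whether the payload is inline or behind a data_handle.
    payload = stream_partition.get_data()

    # Streaming retrieval works the same way as for regular blobs.
    for chunk in stream_partition.get_data(stream=True, chunk_size=102400):
        process(chunk)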

class here.platform.partition.VersionedPartition(data_handle: str | None, layer: VersionedLayer, id: str, checksum: str | None = None, data_size: int | None = None, compressed_data_size: int | None = None, crc: str | None = None, version: int | None = None)[source]#

Bases: Partition

Partition subclass used for VersionedLayer layers.

In addition to the fields present in the base class, this class adds:
  • version: version of the partition
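
A minimal sketch using the documented constructor; in practice VersionedPartition objects are usually returned by VersionedLayer query methods, and the values below are placeholders:

    from here.platform.partition import VersionedPartition

    partition = VersionedPartition(
        data_handle="0f1e2d3c-placeholder-handle",
        layer=versioned_layer,   # a VersionedLayer obtained elsewhere
        id="23618402",           # partition id
        version=7,               # catalog version this partition belongs to
    )
    content = partition.get_blob()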

class here.platform.partition.VolatilePartition(data_handle: str | None, layer: VolatileLayer, id: str | None = None, checksum: str | None = None, data_size: int | None = None, compressed_data_size: int | None = None, crc: str | None = None)[source]#

Bases: Partition

Partition subclass used for VolatileLayer layers.