TileDB Python API Reference
Modules
Typical usage of the Python interface to TileDB will use the top-level module tiledb, e.g.
import tiledb
There is also a submodule libtiledb which contains the necessary bindings to the underlying TileDB native library. Most of the time you will not need to interact with tiledb.libtiledb unless you need native-library specific information, e.g. the version number:
import tiledb
tiledb.libtiledb.version() # Native TileDB library version number
Getting Started
Arrays may be opened with the tiledb.open function:
- tiledb.open(uri, mode='r', key=None, attr=None, config=None, timestamp=None, ctx=None)
Open a TileDB array at the given URI
- Parameters:
uri – any TileDB supported URI
timestamp – array timestamp to open, int or None. See the TileDB time traveling documentation for detailed functionality description.
key – encryption key, str or None
mode (str) – (default ‘r’) Open the array object in read (‘r’), write (‘w’), modify-exclusive (‘m’), or delete (‘d’) mode
attr – attribute name to select from a multi-attribute array, str or None
config – TileDB config dictionary, dict or None
ctx (tiledb.Ctx) – A TileDB Context
- Returns:
open TileDB {Sparse,Dense}Array object
Data import helpers
- tiledb.from_numpy(uri, array, config=None, ctx=None, **kwargs)
Write a NumPy array into a TileDB DenseArray, returning a read-only DenseArray instance.
- Parameters:
uri (str) – URI for the TileDB array (any supported TileDB URI)
array (numpy.ndarray) – dense numpy array to persist
config – TileDB config dictionary, dict or None
ctx (tiledb.Ctx) – A TileDB Context
kwargs – additional arguments to pass to the DenseArray constructor
- Return type:
tiledb.DenseArray
- Returns:
An open DenseArray (read mode) with a single anonymous attribute
- Raises:
TypeError – cannot convert uri to unicode string
tiledb.TileDBError
- Keyword Arguments:
full_domain - Dimensions should be created with full range of the dtype (default: False)
mode - Creation mode, one of ‘ingest’ (default), ‘schema_only’, ‘append’
append_dim - The dimension along which the NumPy array is appended (default: 0).
start_idx - The starting index to append to. By default, append to the end of the existing data.
timestamp - Write TileDB array at specific timestamp.
dim_dtype - Dimension data type, default np.uint64
attr_name - Attribute name, default empty string
tile - Tile extent for each dimension, default None
Additionally, arguments accepted by ArraySchema constructor can also be passed to customize the underlying array schema.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     # Creates array 'array' on disk.
...     with tiledb.from_numpy(tmp + "/array", np.array([1.0, 2.0, 3.0])) as A:
...         pass
- tiledb.from_csv(uri: str, csv_file: str | List[str], **kwargs)
Create TileDB array at given URI from a CSV file or list of files
- Parameters:
uri – URI for new TileDB array
csv_file – input CSV file or list of CSV files. Note: multi-file ingestion requires a chunksize argument. Files will be read in batches of at least chunksize rows before writing to the TileDB array.
- Keyword Arguments:
Any pandas.read_csv supported keyword argument
ctx - A TileDB context
sparse - (default True) Create sparse schema
index_dims (List[str]) – List of column name(s) to use as dimension(s) in TileDB array schema. This is the recommended way to create dimensions. (Note: the pandas read_csv argument index_col will be passed through if provided, which results in indexes that will be converted to dimensions by default; however, index_dims is preferred.)
allows_duplicates - Generated schema should allow duplicates
mode - Creation mode, one of ‘ingest’ (default), ‘schema_only’, ‘append’
attr_filters - FilterList to apply to Attributes: FilterList or Dict[str -> FilterList] for any attribute(s). Unspecified attributes will use default.
dim_filters - FilterList to apply to Dimensions: FilterList or Dict[str -> FilterList] for any dimensions(s). Unspecified dimensions will use default.
offsets_filters - FilterList to apply to all offsets
full_domain - Dimensions should be created with full range of the dtype
tile - Dimension tiling: accepts either an int that applies the tiling to all dimensions or a dict(“dim_name”: int) to specifically assign tiling to a given dimension
row_start_idx - Start index to start new write (for row-indexed ingestions).
fillna - Value to use to fill holes
column_types - Dictionary of {column_name: dtype} to apply dtypes to columns
varlen_types - A set of {dtypes}; any column within the set is converted to a variable length attribute
capacity - Schema capacity.
date_spec - Dictionary of {column_name: format_spec} to apply to date/time columns which are not correctly inferred by pandas ‘parse_dates’. Format must be specified using the Python format codes: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
cell_order - (default ‘row-major’) Schema cell order: ‘row-major’, ‘col-major’, or ‘hilbert’
tile_order - (default ‘row-major’) Schema tile order: ‘row-major’ or ‘col-major’
timestamp - Write TileDB array at specific timestamp.
- Returns:
None
Example:
>>> import tiledb
>>> tiledb.from_csv("iris.tldb", "iris.csv")
>>> tiledb.object_type("iris.tldb")
'array'
- tiledb.from_pandas(uri, dataframe, **kwargs)
Create TileDB array at given URI from a Pandas dataframe
Supports most Pandas series types, including nullable integers and bools.
- Parameters:
uri – URI for new TileDB array
dataframe – pandas DataFrame
- Keyword Arguments:
Any pandas.read_csv supported keyword argument
ctx - A TileDB context
sparse - (default True) Create sparse schema
chunksize - (default None) Maximum number of rows to read at a time. Note that this is also a pandas.read_csv argument which tiledb.read_csv checks for in order to correctly read a file batchwise.
index_dims (List[str]) – List of column name(s) to use as dimension(s) in TileDB array schema. This is the recommended way to create dimensions.
allows_duplicates - Generated schema should allow duplicates
mode - Creation mode, one of ‘ingest’ (default), ‘schema_only’, ‘append’
attr_filters - FilterList to apply to Attributes: FilterList or Dict[str -> FilterList] for any attribute(s). Unspecified attributes will use default.
dim_filters - FilterList to apply to Dimensions: FilterList or Dict[str -> FilterList] for any dimensions(s). Unspecified dimensions will use default.
offsets_filters - FilterList to apply to all offsets
full_domain - Dimensions should be created with full range of the dtype
tile - Dimension tiling: accepts either an int that applies the tiling to all dimensions or a dict(“dim_name”: int) to specifically assign tiling to a given dimension
row_start_idx - Start index to start new write (for row-indexed ingestions).
fillna - Value to use to fill holes
column_types - Dictionary of {column_name: dtype} to apply dtypes to columns
varlen_types - A set of {dtypes}; any column within the set is converted to a variable length attribute
capacity - Schema capacity.
date_spec - Dictionary of {column_name: format_spec} to apply to date/time columns which are not correctly inferred by pandas ‘parse_dates’. Format must be specified using the Python format codes: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
cell_order - (default ‘row-major’) Schema cell order: ‘row-major’, ‘col-major’, or ‘hilbert’
tile_order - (default ‘row-major’) Schema tile order: ‘row-major’ or ‘col-major’
timestamp - Write TileDB array at specific timestamp.
- Raises:
tiledb.TileDBError
- Returns:
None
Context
- class tiledb.Ctx(config: Config = None)
Class representing a TileDB context.
A TileDB context wraps a TileDB storage manager.
- Parameters:
config (tiledb.Config or dict) – Initialize Ctx with given config parameters
- config()
Returns the Config instance associated with the Ctx.
- tiledb.default_ctx(config: Config | dict = None) → Ctx
Returns, and optionally initializes, the default tiledb.Ctx context variable.
This Ctx object is used by Python API functions when no ctx keyword argument is provided. Most API functions accept an optional ctx kwarg, but that is typically only necessary in advanced usage with multiple contexts per program.
For initialization, this function must be called before any other tiledb functions. The initialization call accepts a tiledb.Config object to override the defaults for process-global parameters.
- Parameters:
config – tiledb.Config object or dictionary with config parameters.
- Returns:
Ctx
- tiledb.scope_ctx(ctx_or_config: Ctx | Config | dict = None) → Ctx
Context manager for setting the default tiledb.Ctx context variable when entering a block of code and restoring it to its previous value when exiting the block.
- Parameters:
ctx_or_config – tiledb.Ctx or tiledb.Config object or dictionary with config parameters.
- Returns:
Ctx
Config
- class tiledb.Config(params: dict = None, path: str = None)
TileDB Config class
The Config object stores configuration parameters for both TileDB Embedded and TileDB-Py.
For TileDB Embedded parameters, see:
The following configuration options are supported by TileDB-Py:
py.init_buffer_bytes:
Initial allocation size in bytes for attribute and dimension buffers. If the result size exceeds the pre-allocated buffer(s), the query will return incomplete and TileDB-Py will allocate larger buffers and resubmit. Specifying a sufficiently large buffer size will often improve performance. Default 10 MB (1024**2 * 10).
py.use_arrow:
Use pyarrow from the Apache Arrow project to convert query results into Pandas dataframe format when requested. Default True.
py.deduplicate:
Attempt to deduplicate Python objects during buffer conversion to Python. Deduplication may reduce memory usage for datasets with many identical strings, at the cost of some performance reduction due to hash calculation/lookup for each object.
Unknown parameters will be ignored!
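- Parameters:
params (dict) – Set parameter values from a dict-like object
path (str) – Set parameter values from a persisted Config parameter file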
- clear()
Unsets all Config parameters (returns them to their default values)
- from_file(path: str)
Update a Config object from a persisted config file
- Parameters:
path – A local Config file path
- items(prefix: str = '')
Returns an iterator object over Config parameters, values
- Parameters:
prefix (str) – return only parameters with a given prefix
- Return type:
ConfigItems
- Returns:
iterator over Config parameter, value tuples
- keys(prefix: str = '')
Returns an iterator object over Config parameters (keys)
- Parameters:
prefix (str) – return only parameters with a given prefix
- Return type:
ConfigKeys
- Returns:
iterator over Config parameter string keys
- static load(uri: str)
Constructs a Config class instance from config parameters loaded from a local Config file
- save(uri: str)
Persist Config parameter values to a config file
- update(odict: dict)
Update a Config object with parameters and values from a dict-like object
- Parameters:
odict – dict-like object containing parameter, values to update Config.
Current Domain
- class tiledb.CurrentDomain(ctx: Ctx)
Represents a TileDB current domain.
- property ndrectangle
Gets the N-dimensional rectangle associated with the current domain object.
- Return type:
NDRectangle
- Raises:
tiledb.TileDBError
- set_ndrectangle(ndrect: NDRectangle)
Sets an N-dimensional rectangle representation on a current domain.
- Parameters:
ndrect – The N-dimensional rectangle to be used.
- Raises:
tiledb.TileDBError
- property type
The type of the current domain.
- Return type:
tiledb.CurrentDomainType
NDRectangle
- class tiledb.NDRectangle(ctx: Ctx, domain: Domain)
Represents a TileDB N-Dimensional Rectangle.
Array Schema
- class tiledb.ArraySchema(domain: Domain = None, attrs: Sequence[Attr] = (), cell_order: str = 'row-major', tile_order: str = 'row-major', capacity: int = 0, coords_filters: FilterList | Sequence[Filter] = None, offsets_filters: FilterList | Sequence[Filter] = None, validity_filters: FilterList | Sequence[Filter] = None, allows_duplicates: bool = False, sparse: bool = False, dim_labels={}, enums=None, ctx: Ctx = None)
Schema class for TileDB dense / sparse array representations
- Parameters:
domain – Domain of schema
attrs – tuple of attributes
cell_order – TileDB label for cell layout
tile_order – TileDB label for tile layout
capacity (int) – tile cell capacity
offsets_filters – (default None) offsets filter list
validity_filters – (default None) validity filter list
allows_duplicates (bool) – True if duplicates are allowed
sparse (bool) – True if schema is sparse, else False (set by SparseArray and DenseArray derived classes)
dim_labels – dict(dim_index, dict(dim_name, tiledb.DimSchema))
enums – list of enumeration names
ctx (tiledb.Ctx) – A TileDB Context
- Raises:
tiledb.TileDBError
- attr(key: str | int) → Attr
Returns an Attr instance given an int index or string label
- check() → bool
Checks the correctness of the array schema
- Return type:
None
- Raises:
tiledb.TileDBError if invalid
- property coords_filters: FilterList
The FilterList for the array’s coordinates
- Return type:
FilterList
- Raises:
tiledb.TileDBError
- property current_domain: CurrentDomain
Get the current domain
- Return type:
CurrentDomain
- dim_label(name: str) → DimLabel
Returns a TileDB DimensionLabel given the label name
- Parameters:
name – name of the dimension label
- Returns:
The dimension label associated with the given name
- dump()
Dumps a string representation of the array object to standard output (stdout)
- classmethod from_file(uri: str = None, ctx: Ctx = None)
Create an ArraySchema for a Filestore Array from a given file. If a uri is not given, then create a default schema.
- has_attr(name: str) → bool
Returns true if the given name is an Attribute of the ArraySchema
- Parameters:
name – attribute name
- Return type:
boolean
- has_dim_label(name: str) → bool
Returns true if the given name is a DimensionLabel of the ArraySchema
Note: If using a version of libtiledb that does not support dimension labels this will return false.
- Parameters:
name – dimension label name
- Return type:
boolean
- property offsets_filters: FilterList
The FilterList for the array’s variable-length attribute offsets
- Return type:
FilterList
- Raises:
tiledb.TileDBError
- set_current_domain(current_domain)
Set the current domain
- Parameters:
current_domain (tiledb.CurrentDomain) – The current domain to set
- property validity_filters: FilterList
The FilterList for the array’s validity
- Return type:
FilterList
- Raises:
tiledb.TileDBError
- tiledb.empty_like(uri, arr, config=None, key=None, tile=None, ctx=None, dtype=None)
Create and return an empty, writeable DenseArray with schema based on a NumPy-array like object.
- Parameters:
uri – array URI
arr – NumPy ndarray, or shape tuple
config – (optional, deprecated) configuration to apply to new Ctx
key – (optional) encryption key, if applicable
tile – (optional) tiling of generated array
ctx – (optional) TileDB Ctx
dtype – (optional) required if arr is a shape tuple
- Returns:
An empty, writeable DenseArray
Attribute
- class tiledb.Attr(name: str = '', dtype: numpy.dtype = numpy.float64, fill: Any = None, var: bool = None, nullable: bool = False, filters: FilterList | Sequence[Filter] = None, enum_label: str = None, ctx: Ctx | None = None)
Represents a TileDB attribute.
- property dtype: dtype
Return numpy dtype object representing the Attr type
- Return type:
numpy.dtype
- dump()
Dumps a string representation of the Attr object to standard output (stdout)
- property fill: Any
Fill value for unset cells of this attribute
- Return type:
depends on dtype
- Raises:
tiledb.TileDBError
- property filters: FilterList
FilterList of the TileDB attribute
- Return type:
FilterList
- Raises:
tiledb.TileDBError
- property isascii: bool
True if the attribute is TileDB dtype TILEDB_STRING_ASCII
- Return type:
bool
- Raises:
tiledb.TileDBError
- property name: str
Attribute string name, empty string if the attribute is anonymous
- Return type:
str
- Raises:
tiledb.TileDBError
Filters
- class tiledb.FilterList(filters: Sequence[Filter] = None, chunksize: int = None, ctx: Ctx | None = None)
An ordered list of Filter objects for filtering TileDB data.
FilterLists contain zero or more Filters, used for filtering attribute data, the array coordinate data, etc.
- Parameters:
ctx (tiledb.Ctx) – A TileDB context
filters – An iterable of Filter objects to add.
chunksize (int) – (default None) chunk size used by the filter list in bytes
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     # Create several filters
...     gzip_filter = tiledb.GzipFilter()
...     bw_filter = tiledb.BitWidthReductionFilter()
...     # Create a filter list that will first perform bit width reduction, then gzip compression.
...     filters = tiledb.FilterList([bw_filter, gzip_filter])
...     a1 = tiledb.Attr(name="a1", dtype=np.int64, filters=filters)
...     # Create a second attribute filtered only by gzip compression.
...     a2 = tiledb.Attr(name="a2", dtype=np.int64,
...                      filters=tiledb.FilterList([gzip_filter]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1, a2))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- __getitem__(idx: int) → Filter
- __getitem__(idx: slice) → List[Filter]
Gets a copy of the filter in the list at the given index
- Parameters:
idx (int or slice) – index or slice of the filter(s) to get
- Returns:
A filter at given index / slice
- Raises:
IndexError – invalid index
tiledb.TileDBError
- append(filter: Filter)
- Parameters:
filter (Filter) – the filter to append into the FilterList
- Raises:
ValueError – filter argument incorrect type
- class tiledb.CompressionFilter(type: FilterType, level: int = -1, ctx: Ctx | None = None)
Base class for filters performing compression.
All compression filters support a compression level option, although some (such as RLE) ignore it.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     # CompressionFilter is a base class that requires a filter type,
...     # so a concrete subclass (here ZstdFilter) is used to set the level.
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.ZstdFilter(level=10)]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.GzipFilter(level: int = -1, ctx: Ctx | None = None)
Filter that compresses using gzip.
- Parameters:
ctx (tiledb.Ctx) – TileDB Ctx
level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.GzipFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.ZstdFilter(level: int = -1, ctx: Ctx | None = None)
Filter that compresses using zstd.
- Parameters:
ctx (tiledb.Ctx) – TileDB Ctx
level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.ZstdFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.LZ4Filter(level: int = -1, ctx: Ctx | None = None)
Filter that compresses using lz4.
- Parameters:
ctx (tiledb.Ctx) – TileDB Ctx
level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.LZ4Filter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.Bzip2Filter(level: int = -1, ctx: Ctx | None = None)
Filter that compresses using bzip2.
- Parameters:
level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.Bzip2Filter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.RleFilter(level: int = -1, ctx: Ctx | None = None)
Filter that compresses using run-length encoding (RLE).
- Parameters:
level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.RleFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.DeltaFilter(level: int = -1, reinterp_dtype: dtype | DataType | None = None, ctx: Ctx | None = None)
Filter that performs delta encoding.
- Parameters:
level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.
reinterp_dtype (numpy dtype or lt.DataType) – (optional) sets the compressor to compress the data, treating it as the new datatype.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.DeltaFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.DoubleDeltaFilter(level: int = -1, reinterp_dtype: dtype | DataType | None = None, ctx: Ctx | None = None)
Filter that performs double-delta encoding.
- Parameters:
level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.
reinterp_dtype – (optional) sets the compressor to compress the data, treating it as the new datatype.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.DoubleDeltaFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
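To give an intuition for what delta and double-delta encoding do to data, here is a conceptual pure-Python sketch. It is not TileDB's implementation (which operates on binary buffers inside the core library); the helper names and the sample timestamps are invented for illustration, and the input is assumed to be non-empty.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each difference from the previous value."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the original values with a running sum."""
    return list(accumulate(encoded))


timestamps = [1000, 1010, 1020, 1030, 1045]

# First-order differences: small numbers when the data changes slowly.
deltas = delta_encode(timestamps)

# Differences of the differences: near zero when the step size is steady.
double_deltas = delta_encode(deltas[1:])

roundtrip = delta_decode(deltas)
```

Regularly spaced data such as timestamps produces many small or zero values after these transforms, which later compression stages can store compactly.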
- class tiledb.DictionaryFilter(level: int = -1, ctx: Ctx | None = None)
Filter that performs dictionary encoding.
- Parameters:
level (int) – -1 (default) sets the compressor level to the default level as specified in TileDB core. Otherwise, sets the compressor level to the given value.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.DictionaryFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.BitShuffleFilter(ctx: Ctx | None = None)
Filter that performs a bit shuffle transformation.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.BitShuffleFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.ByteShuffleFilter(ctx: Ctx | None = None)
Filter that performs a byte shuffle transformation.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.ByteShuffleFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.BitWidthReductionFilter(window: int = -1, ctx: Ctx | None = None)
Filter that performs bit-width reduction.
- Parameters:
ctx (tiledb.Ctx) – A TileDB Context
window (int) – -1 (default) sets the max window size for the filter to the default window size as specified in TileDB core. Otherwise, sets the max window size to the given value.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.BitWidthReductionFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.PositiveDeltaFilter(window: int = -1, ctx: Ctx | None = None)
Filter that performs positive-delta encoding.
- Parameters:
ctx (tiledb.Ctx) – A TileDB Context
window (int) – -1 (default) sets the max window size for the filter to the default window size as specified in TileDB core. Otherwise, sets the max window size to the given value.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.PositiveDeltaFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.ChecksumMD5Filter(ctx: Ctx | None = None)
MD5 checksum filter.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.ChecksumMD5Filter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.ChecksumSHA256Filter(ctx: Ctx | None = None)
SHA256 checksum filter.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.ChecksumSHA256Filter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.FloatScaleFilter(factor: float = None, offset: float = None, bytewidth: int = None, ctx: Ctx | None = None)
Filter that stores floats as integers in a reduced representation via scaling. The reduced storage space comes at the cost of some precision loss. The float scaling filter takes three parameters: the factor, the offset, and the bytewidth. On write, the float scaling filter applies the factor (scaling factor) and offset, and stores the value of round((raw_float - offset) / factor) as an integer of the specified bytewidth. On read, the float scaling filter reverses the factor and offset and returns the floating point data, with a potential loss of precision.
- Parameters:
factor (float) – the scaling factor used to translate the data
offset (float) – the offset value used to translate the data
bytewidth (int) – values may be stored as integers of bytewidth 1, 2, 4, or 8
ctx (tiledb.Ctx) – A TileDB Context
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     # The float scaling filter operates on floating point attributes.
...     a1 = tiledb.Attr(name="a1", dtype=np.float64,
...                      filters=tiledb.FilterList([tiledb.FloatScaleFilter(1, 0)]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
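The write and read formulas for the float scaling filter can be worked through by hand. The sketch below is plain Python arithmetic, not the filter's actual implementation; the helper names and sample values are assumptions, and Python's round may break exact ties differently from the core library.

```python
def float_scale_encode(raw, factor, offset):
    """Write path: store round((raw - offset) / factor) as an integer."""
    return round((raw - offset) / factor)


def float_scale_decode(stored, factor, offset):
    """Read path: reverse the scaling and the offset."""
    return stored * factor + offset


factor, offset = 0.01, 20.0
raw = 23.4567

stored = float_scale_encode(raw, factor, offset)
restored = float_scale_decode(stored, factor, offset)

# The round trip loses at most half of one scaling step.
error = abs(restored - raw)
```

Here the stored integer is 346 and the restored value is approximately 23.46, so a smaller factor preserves more precision at the cost of needing a wider integer range.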
- class tiledb.XORFilter(ctx: Ctx | None = None)
XOR filter.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.XORFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
- class tiledb.WebpFilter(input_format: WebpInputFormat = None, quality: float = None, lossless: bool = None, ctx: Ctx | None = None)
The WebP filter provides three options: quality, format, and lossless.
The quality option is used as the quality_factor setting for WebP lossy compression and expects a float value in the range of 0.0f to 100.0f. A quality of 0 corresponds to low quality and small output sizes, whereas 100 is the highest quality and largest output size.
The format option is used to define the colorspace format of image data and expects an enum of TILEDB_WEBP_RGB, TILEDB_WEBP_BGR, TILEDB_WEBP_RGBA, or TILEDB_WEBP_BGRA.
The lossless option is used to enable (1) or disable (0) lossless compression. With this option enabled, the quality setting will be ignored.
On write, this filter takes raw colorspace values (RGB, RGBA, etc.) and encodes them into WebP format before writing data to the array.
On read, this filter decodes WebP data and returns raw colorspace values to the caller.
This filter expects the array to provide two dimensions for Y, X pixel position. Dimensions may be defined with any name, but Y, X should be at dimension index 0, 1 respectively. Dimensions can be any two matching integral types, such as {uint64_t, uint64_t} or {int64_t, int64_t}.
The WebP filter supports only the uint8_t type for attributes.
- Parameters:
quality (float in range [0.0, 100.0]) – quality_factor setting for lossy WebP compression
input_format (np.uint8 corresponding to one of TILEDB_WEBP_{RGB, BGR, RGBA, BGRA}) – The input colorspace format of the image
lossless (np.uint8) – Enable (1) or disable (0) lossless image compression
ctx (tiledb.Ctx) – A TileDB Context
Example:
>>> import tiledb, numpy as np, tempfile
>>> # img_width and img_height are assumed to be defined by the caller.
>>> with tempfile.TemporaryDirectory() as tmp:
...     # Using RGB colorspace format
...     pixel_depth = 3  # For RGBA / BGRA pixel_depth is 4
...     dims = (tiledb.Dim(name='Y',
...                        domain=(1, img_height),
...                        dtype=np.uint8,
...                        tile=img_height / 2,),
...             tiledb.Dim(name='X',
...                        domain=(1, img_width * pixel_depth),
...                        dtype=np.uint8,
...                        tile=(img_width / 2) * pixel_depth,))
...     dom = tiledb.Domain(*dims)
...     rgb = tiledb.Attr(name="rgb", dtype=np.uint8,
...                       filters=tiledb.FilterList([tiledb.WebpFilter(input_format=1, quality=100.0, lossless=1)]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(rgb,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
Dimension¶
- class tiledb.Dim(name: str = '__dim_0', domain: ~typing.Tuple[~typing.Any, ~typing.Any] = None, tile: ~typing.Any = None, filters: ~tiledb.filter.FilterList | ~typing.Sequence[~tiledb.filter.Filter] = None, dtype: ~numpy.dtype = <class 'numpy.uint64'>, var: bool = None, ctx: ~tiledb.ctx.Ctx | None = None)¶
Represents a TileDB dimension.
- create_label_schema(order: str = 'increasing', dtype: ~numpy.dtype = <class 'numpy.uint64'>, tile: ~typing.Any = None, filters: ~tiledb.filter.FilterList | ~typing.Sequence[~tiledb.filter.Filter] = None)¶
Creates a dimension label schema for a dimension label on this dimension
- Parameters:
order – Order or sort of the label data (‘increasing’ or ‘decreasing’).
dtype – Datatype of the label data.
tile – Tile extent for the dimension of the dimension label. If None, it will use the tile extent of this dimension.
filters – Filter list for the attribute storing the label data.
- Return type:
DimLabelSchema
- property domain: Tuple[generic, generic]¶
The dimension (inclusive) domain.
The dimension’s domain is defined by a (lower bound, upper bound) tuple.
- Return type:
tuple(numpy scalar, numpy scalar)
- property dtype: dtype¶
Numpy dtype representation of the dimension type.
- Return type:
numpy.dtype
- property filters: FilterList¶
FilterList of the TileDB dimension
- Return type:
- Raises:
- property name: str¶
The dimension label string.
Anonymous dimensions return a default string representation based on the dimension index.
- Return type:
- property shape: Tuple[generic, generic]¶
The shape of the dimension given the dimension’s domain.
Note: The shape is only valid for integer and datetime dimension domains.
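Because the domain is inclusive on both ends, an integer dimension with domain (lower, upper) spans upper - lower + 1 cells. The following is a minimal pure-Python sketch of that arithmetic. It does not call tiledb; the helper name inclusive_extent is hypothetical, and the sketch assumes the shape is simply this inclusive cell count.

```python
def inclusive_extent(lower: int, upper: int) -> int:
    """Number of integer cells in the inclusive range [lower, upper]."""
    if upper < lower:
        raise ValueError("upper bound must not be smaller than lower bound")
    return upper - lower + 1


# A domain of (1, 4) covers the cells 1, 2, 3 and 4.
print(inclusive_extent(1, 4))   # 4
# A negative lower bound is counted the same way: -2, -1, 0, 1, 2.
print(inclusive_extent(-2, 2))  # 5
```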
- property tile: generic¶
The tile extent of the dimension.
- Return type:
numpy scalar or np.timedelta64
Domain¶
- class tiledb.Domain(*dims: Dim, ctx: Ctx | None = None)¶
Represents a TileDB domain.
- dim(dim_id)¶
Returns a Dim object from the domain given the dimension’s index or name.
- Parameters:
dim_id – dimension index (int) or name (str)
- Raises:
- property dtype¶
The numpy dtype of the domain’s dimension type.
- Return type:
numpy.dtype
- dump()¶
Dumps a string representation of the domain object to standard output (STDOUT)
- has_dim(name)¶
Returns true if the Domain has a Dimension with the given name
- Parameters:
name – name of Dimension
- Return type:
- Returns:
- property homogeneous¶
Returns True if the domain’s dimension types are homogeneous.
- property shape¶
The domain’s shape, valid only for integer domains.
Array¶
- class tiledb.Array(uri, mode='r', key=None, timestamp=None, attr=None, ctx=None, **kwargs)¶
Base class for TileDB array objects.
Defines common properties/functionality for the different array types. When an Array instance is initialized, the array is opened with the specified mode.
- Parameters:
uri (str) – URI of array to open
mode (str) – (default ‘r’) Open the array object in read ‘r’, write ‘w’, modify exclusive ‘m’, or delete ‘d’ mode
key (str) – (default None) If not None, encryption key to decrypt the array
timestamp (tuple) – (default None) If int, open the array at a given TileDB timestamp. If tuple, open at the given start and end TileDB timestamps.
attr (str) – (default None) open one attribute of the array; indexing a dense array will return a Numpy ndarray directly rather than a dictionary.
ctx (Ctx) – TileDB context
- close()¶
Closes this array, flushing all buffered data.
- consolidate(config=None, key=None, fragment_uris=None, timestamp=None)¶
Consolidates fragments of an array object for increased read performance.
Overview: https://docs.tiledb.com/main/concepts/internal-mechanics/consolidation
- Parameters:
config (tiledb.Config) – The TileDB Config with consolidation parameters set
key (str or bytes) – (default None) encryption key to decrypt an encrypted array
fragment_uris – (default None) Consolidate the array using a list of fragment _names_ (note: the __ts1_ts2_<label>_<ver> fragment name form alone, not the full path(s))
timestamp (tuple (int, int)) – (default None) If not None, consolidate the array using the given tuple(int, int) UNIX seconds range (inclusive). This argument will be ignored if fragment_uris is passed.
- Raises:
Rather than passing the timestamp into this function, it may be set with the config parameters “sm.vacuum.timestamp_start” and “sm.vacuum.timestamp_end”, which take a time in UNIX seconds. If both the config parameters and this function’s timestamp argument are set, the timestamp argument is used.
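The fragment_uris parameter above expects bare fragment names rather than full paths. The sketch below shows one way to derive a name from a full fragment URI and to read the two timestamps out of the __ts1_ts2_<label>_<ver> form. It uses only the standard library; the helper names and the example URI are hypothetical, and the parsing assumes the name follows exactly the form quoted above.

```python
import os


def fragment_name(fragment_uri: str) -> str:
    """Return the last path component of a fragment URI."""
    return os.path.basename(fragment_uri.rstrip("/"))


def fragment_timestamps(name: str) -> tuple:
    """Parse (ts1, ts2) from a name of the form __ts1_ts2_<label>_<ver>."""
    # "__1_4_abc_21".split("_") -> ["", "", "1", "4", "abc", "21"]
    parts = name.split("_")
    return int(parts[2]), int(parts[3])


# Hypothetical URI, for illustration only.
uri = "file:///tmp/my_array/__fragments/__2_4_0123abcd_21"
name = fragment_name(uri)
print(name)                       # __2_4_0123abcd_21
print(fragment_timestamps(name))  # (2, 4)
```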
- classmethod create(uri, schema, key=None, overwrite=False, ctx=None)¶
Creates a TileDB Array at the given URI
- Parameters:
uri (str) – URI at which to create the new empty array.
schema (ArraySchema) – Schema for the array
key (str) – (default None) Encryption key to use for array
overwrite (bool) – (default False) Overwrite the array if it already exists
ctx (Ctx) – (default None) Optional TileDB Ctx used when creating the array, by default uses the ArraySchema’s associated context (not necessarily tiledb.default_ctx).
- static delete_array(uri, ctx=None)¶
Delete the given array.
Example:
>>> import tiledb, tempfile, numpy as np
>>> path = tempfile.mkdtemp()
>>> with tiledb.from_numpy(path, np.zeros(4), timestamp=1) as A:
...     pass
>>> tiledb.array_exists(path)
True
>>> tiledb.Array.delete_array(path)
>>> tiledb.array_exists(path)
False
- static delete_fragments(uri, timestamp_start, timestamp_end, ctx=None)¶
Delete a range of fragments from timestamp_start to timestamp_end. The array needs to be opened in ‘m’ mode as shown in the example below.
- Parameters:
Example:
>>> import tiledb, tempfile, numpy as np
>>> path = tempfile.mkdtemp()
>>> # Create the array with an initial all-zero fragment at timestamp 1
>>> with tiledb.from_numpy(path, np.zeros(4), timestamp=1) as A:
...     pass
>>> with tiledb.open(path, 'w', timestamp=2) as A:
...     A[:] = np.ones(4, dtype=np.int64)
>>> with tiledb.open(path, 'r') as A:
...     A[:]
array([1., 1., 1., 1.])
>>> tiledb.Array.delete_fragments(path, 2, 2)
>>> with tiledb.open(path, 'r') as A:
...     A[:]
array([0., 0., 0., 0.])
- property df¶
Retrieve data cells as a Pandas dataframe, with multi-range, domain-inclusive indexing using multi_index.
- Parameters:
selection (list) – Per dimension, a scalar, slice, or list of scalars or slice objects. Scalars and slice components should match the type of the underlying Dimension.
- Returns:
dict of {‘attribute’: result}. Coords are included by default for Sparse arrays only (use Array.query(coords=<>) to select).
- Raises:
IndexError – invalid or unsupported index selection
- Raises:
df[] accepts, for each dimension, a scalar, slice, or list of scalars or slice objects. Each item is interpreted as a point (scalar) or range (slice) used to query the array on the corresponding dimension.
** Example **
>>> import tiledb, tempfile, numpy as np, pandas as pd
>>>
>>> with tempfile.TemporaryDirectory() as tmp:
...     data = {'col1_f': np.arange(0.0,1.0,step=0.1), 'col2_int': np.arange(10)}
...     df = pd.DataFrame.from_dict(data)
...     tiledb.from_pandas(tmp, df)
...     A = tiledb.open(tmp)
...     A.df[1]
...     A.df[1:5]
   col1_f  col2_int
1     0.1         1
   col1_f  col2_int
1     0.1         1
2     0.2         2
3     0.3         3
4     0.4         4
5     0.5         5
- property dtype¶
The NumPy dtype of the specified attribute
- enum(name)¶
Return the Enumeration from the attribute name.
- Parameters:
name – attribute name
- Return type:
Enumeration
- property isopen¶
True if this array is currently open.
- property iswritable¶
This array is currently opened as writable.
- label_index(labels)¶
Retrieve data cells with multi-range, domain-inclusive indexing by label. Returns the cross-product of the ranges.
Accepts a scalar, slice, or list of scalars per label for querying on the corresponding dimensions. For multidimensional arrays querying by labels only on a subset of dimensions, : should be passed in place for any labels preceding custom ranges.
** Example **
>>> import tiledb, numpy as np, tempfile
>>> from collections import OrderedDict
>>> dim1 = tiledb.Dim("d1", domain=(1, 4))
>>> dim2 = tiledb.Dim("d2", domain=(1, 3))
>>> dom = tiledb.Domain(dim1, dim2)
>>> att = tiledb.Attr("a1", dtype=np.int64)
>>> dim_labels = {
...     0: {"l1": dim1.create_label_schema("decreasing", np.int64)},
...     1: {
...         "l2": dim2.create_label_schema("increasing", np.int64),
...         "l3": dim2.create_label_schema("increasing", np.float64),
...     },
... }
>>> schema = tiledb.ArraySchema(domain=dom, attrs=(att,), dim_labels=dim_labels)
>>> with tempfile.TemporaryDirectory() as tmp:
...     tiledb.Array.create(tmp, schema)
...
...     a1_data = np.reshape(np.arange(1, 13), (4, 3))
...     l1_data = np.arange(4, 0, -1)
...     l2_data = np.arange(-1, 2)
...     l3_data = np.linspace(0, 1.0, 3)
...
...     with tiledb.open(tmp, "w") as A:
...         A[:] = {"a1": a1_data, "l1": l1_data, "l2": l2_data, "l3": l3_data}
...
...     with tiledb.open(tmp, "r") as A:
...         np.testing.assert_equal(
...             A.label_index(["l1"])[3:4],
...             OrderedDict({"l1": [4, 3], "a1": [[1, 2, 3], [4, 5, 6]]}),
...         )
...         np.testing.assert_equal(
...             A.label_index(["l1", "l3"])[2, 0.5:1.0],
...             OrderedDict(
...                 {"l3": [0.5, 1.0], "l1": [2], "a1": [[8, 9]]}
...             ),
...         )
...         np.testing.assert_equal(
...             A.label_index(["l2"])[:, -1:0],
...             OrderedDict(
...                 {"l2": [-1, 0],
...                  "a1": [[1, 2], [4, 5], [7, 8], [10, 11]]},
...             ),
...         )
...         np.testing.assert_equal(
...             A.label_index(["l3"])[:, 0.5:1.0],
...             OrderedDict(
...                 {"l3": [0.5, 1.],
...                  "a1": [[2, 3], [5, 6], [8, 9], [11, 12]]},
...             ),
...         )
- Parameters:
labels – List of labels to use when querying. Can only use at most one label per dimension.
selection (list) – Per dimension, a scalar, slice, or list of scalars. Each item is interpreted as a point (scalar) or range (slice) used to query the array on the corresponding dimension.
- Returns:
dict of {‘label/attribute’: result}.
- Raises:
- classmethod load_typed(uri, mode='r', key=None, timestamp=None, attr=None, ctx=None)¶
Return a {Dense,Sparse}Array instance from a pre-opened Array (internal)
- property meta: Metadata¶
- Returns:
The Array’s metadata as a key-value structure
- Return type:
Metadata
- property mode¶
The mode this array was opened with.
- property multi_index¶
Retrieve data cells with multi-range, domain-inclusive indexing. Returns the cross-product of the ranges.
- Parameters:
selection (list) – Per dimension, a scalar, slice, or list of scalars or slice objects. Scalars and slice components should match the type of the underlying Dimension.
- Returns:
dict of {‘attribute’: result}. Coords are included by default for Sparse arrays only (use Array.query(coords=<>) to select).
- Raises:
IndexError – invalid or unsupported index selection
- Raises:
multi_index[] accepts, for each dimension, a scalar, slice, or list of scalars or slice objects. Each item is interpreted as a point (scalar) or range (slice) used to query the array on the corresponding dimension.
Unlike NumPy array indexing, multi_index respects TileDB’s range semantics: slice ranges are inclusive of the start- and end-point, and negative ranges do not wrap around (because a TileDB dimension may have a negative domain).
See also: https://docs.tiledb.com/main/api-usage/reading-arrays/multi-range-subarrays
** Example **
>>> import tiledb, tempfile, numpy as np
>>>
>>> with tempfile.TemporaryDirectory() as tmp:
...     A = tiledb.from_numpy(tmp, np.eye(4) * [1,2,3,4])
...     A.multi_index[1]
...     A.multi_index[1,1]
...     # return row 0 and 2
...     A.multi_index[[0,2]]
...     # return rows 0 and 2 intersecting column 2
...     A.multi_index[[0,2], 2]
...     # return rows 0:2 intersecting columns 0:2
...     A.multi_index[slice(0,2), slice(0,2)]
OrderedDict(...''... array([[0., 2., 0., 0.]])...)
OrderedDict(...''... array([[2.]])...)
OrderedDict(...''... array([[1., 0., 0., 0.],
       [0., 0., 3., 0.]])...)
OrderedDict(...''... array([[0.],
       [3.]])...)
OrderedDict(...''... array([[1., 0., 0.],
       [0., 2., 0.],
       [0., 0., 3.]])...)
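The two rules that distinguish multi_index from NumPy indexing (inclusive slice endpoints and a cross-product across dimensions) can be modelled without tiledb. The sketch below is a pure-Python illustration for integer coordinates only; the helper names expand and cross_product are hypothetical and are not part of the library.

```python
from itertools import product


def expand(selection):
    """Expand one dimension's selection into the coordinates it selects.

    Accepts a scalar, a slice, or a list of scalars and slices. Slices are
    treated as inclusive of both endpoints, and negative values are taken
    literally rather than wrapping around.
    """
    items = selection if isinstance(selection, list) else [selection]
    coords = []
    for item in items:
        if isinstance(item, slice):
            coords.extend(range(item.start, item.stop + 1))  # inclusive stop
        else:
            coords.append(item)
    return coords


def cross_product(*selections):
    """Return every coordinate tuple selected across all dimensions."""
    return list(product(*(expand(s) for s in selections)))


print(expand(slice(0, 2)))        # [0, 1, 2]
print(expand(slice(-1, 0)))       # [-1, 0]
print(cross_product([0, 2], 2))   # [(0, 2), (2, 2)]
```

Under this model, slice(0, 2) on both dimensions selects a 3 x 3 block, which matches the shape of the last result in the example above.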
- property nattr¶
The number of attributes of this array.
- property ndim¶
The number of dimensions of this array.
- nonempty_domain()¶
Return the minimum bounding domain which encompasses nonempty values.
- property ptr¶
Return the underlying C++ TileDB array object pointer
- reopen()¶
Reopens this array.
This is useful when the array is updated after it was opened. To sync up with the updates, the user must either close the array and open it again, or just use reopen() without closing. reopen will generally be faster than a close-then-open.
- property schema¶
The ArraySchema for this array.
- property shape¶
The shape of this array.
- property timestamp_range¶
Returns the timestamp range the array is opened at
- Return type:
- Returns:
tiledb timestamp range at which point the array was opened
- upgrade_version(config=None)¶
Upgrades an array to the latest format version.
- Parameters:
config – (default None) Configuration parameters for the upgrade (nullptr means default, which will use the config from ctx).
- Raises:
- view_attr()¶
The view attribute of this array.
- tiledb.consolidate(uri, config=None, ctx=None, fragment_uris=None, timestamp=None)¶
Consolidates TileDB array fragments for improved read performance
- Parameters:
uri (str) – URI to the TileDB Array
key (str) – (default None) Key to decrypt array if the array is encrypted
config (tiledb.Config) – The TileDB Config with consolidation parameters set
ctx (tiledb.Ctx) – (default None) The TileDB Context
fragment_uris – (default None) Consolidate the array using a list of fragment file names
timestamp – (default None) If not None, consolidate the array using the given tuple(int, int) UNIX seconds range (inclusive). This argument will be ignored if fragment_uris is passed.
- Return type:
- Returns:
path (URI) to the consolidated TileDB Array
- Raises:
TypeError – cannot convert path to unicode string
- Raises:
Rather than passing the timestamp into this function, it may be set with the config parameters “sm.vacuum.timestamp_start” and “sm.vacuum.timestamp_end”, which take a time in UNIX seconds. If both the config parameters and this function’s timestamp argument are set, the timestamp argument is used.
Example:
>>> import tiledb, tempfile, numpy as np, os
>>> path = tempfile.mkdtemp()
>>> with tiledb.from_numpy(path, np.zeros(4), timestamp=1) as A:
...     pass
>>> with tiledb.open(path, 'w', timestamp=2) as A:
...     A[:] = np.ones(4, dtype=np.int64)
>>> with tiledb.open(path, 'w', timestamp=3) as A:
...     A[:] = np.ones(4, dtype=np.int64)
>>> with tiledb.open(path, 'w', timestamp=4) as A:
...     A[:] = np.ones(4, dtype=np.int64)
>>> len(tiledb.array_fragments(path))
4
>>> fragment_names = [
...     os.path.basename(f) for f in tiledb.array_fragments(path).uri
... ]
>>> array_uri = tiledb.consolidate(
...     path, fragment_uris=[fragment_names[1], fragment_names[3]]
... )
>>> len(tiledb.array_fragments(path))
3
- tiledb.vacuum(uri, config=None, ctx=None, timestamp=None)¶
Vacuum underlying array fragments after consolidation.
- Parameters:
uri (str) – URI of array to be vacuumed
config – Override the context configuration for vacuuming. Defaults to None, inheriting the context parameters.
ctx (tiledb.Ctx) – (optional) Context. Defaults to tiledb.default_ctx().
- Raises:
TypeError – cannot convert uri to unicode string
- Raises:
The operation of this function is controlled by the “sm.vacuum.mode” parameter, which accepts the values fragments, fragment_meta, array_meta, and commits. Rather than passing the timestamp into this function, it may be set by using “sm.vacuum.timestamp_start” and “sm.vacuum.timestamp_end”, which take a time in UNIX seconds. If both the config parameters and this function’s timestamp argument are set, the timestamp argument is used.
Example:
>>> import tiledb, numpy as np
>>> import tempfile
>>> path = tempfile.mkdtemp()
>>> with tiledb.from_numpy(path, np.random.rand(4)) as A:
...     pass # make sure to close
>>> with tiledb.open(path, 'w') as A:
...     for i in range(4):
...         A[:] = np.ones(4, dtype=np.int64) * i
>>> paths = tiledb.VFS().ls(path)
>>> # should be 12 (2 base files + 2*5 fragment+ok files)
>>> (); len(paths); ()
(...)
>>> () ; tiledb.consolidate(path) ; ()
(...)
>>> tiledb.vacuum(path)
>>> paths = tiledb.VFS().ls(path)
>>> # should now be 4 (base files + 2 fragment+ok files)
>>> (); len(paths); ()
(...)
Dense Array¶
- class tiledb.DenseArray(*args, **kw)¶
- __getitem__(selection)¶
Retrieve data cells for an item or region of the array.
- Parameters:
selection (tuple) – An int index, slice or tuple of integer/slice objects, specifying the selected subarray region for each dimension of the DenseArray.
- Return type:
numpy.ndarray or collections.OrderedDict
- Returns:
If the dense array has a single attribute then a Numpy array of corresponding shape/dtype is returned for that attribute. If the array has multiple attributes, a collections.OrderedDict is returned with dense Numpy subarrays for each attribute.
- Raises:
IndexError – invalid or unsupported index selection
- Raises:
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     # Creates array 'array' on disk.
...     A = tiledb.from_numpy(tmp + "/array", np.ones((100, 100)))
...     # Many aspects of Numpy's fancy indexing are supported:
...     A[1:10, ...].shape
...     A[1:10, 20:99].shape
...     A[1, 2].shape
(9, 100)
(9, 79)
()
>>> # Subselect on attributes when reading:
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     schema = tiledb.ArraySchema(domain=dom,
...         attrs=(tiledb.Attr(name="a1", dtype=np.int64),
...                tiledb.Attr(name="a2", dtype=np.int64)))
...     tiledb.DenseArray.create(tmp + "/array", schema)
...     with tiledb.DenseArray(tmp + "/array", mode='w') as A:
...         A[0:10] = {"a1": np.zeros((10)), "a2": np.ones((10))}
...     with tiledb.DenseArray(tmp + "/array", mode='r') as A:
...         # Access specific attributes individually.
...         A[0:5]["a1"]
...         A[0:5]["a2"]
array([0, 0, 0, 0, 0])
array([1, 1, 1, 1, 1])
- __setitem__(selection, val)¶
Set / update dense data cells
- Parameters:
selection (tuple) – An int index, slice or tuple of integer/slice objects, specifying the selected subarray region for each dimension of the DenseArray.
val (dict or numpy.ndarray) – a dictionary of array attribute values; the values must be convertible to n-d numpy arrays. If the array has a single attribute, an n-d numpy array is accepted.
- Raises:
IndexError – invalid or unsupported index selection
ValueError – value / coordinate length mismatch
- Raises:
Example:
>>> import tiledb, numpy as np, tempfile
>>> # Write to single-attribute 2D array
>>> with tempfile.TemporaryDirectory() as tmp:
...     # Create an array initially with all zero values
...     with tiledb.from_numpy(tmp + "/array", np.zeros((2, 2))) as A:
...         pass
...     with tiledb.DenseArray(tmp + "/array", mode='w') as A:
...         # Write to the single (anonymous) attribute
...         A[:] = np.array(([1,2], [3,4]))
>>>
>>> # Write to multi-attribute 2D array
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(
...         tiledb.Dim(domain=(0, 1), tile=2, dtype=np.uint64),
...         tiledb.Dim(domain=(0, 1), tile=2, dtype=np.uint64))
...     schema = tiledb.ArraySchema(domain=dom,
...         attrs=(tiledb.Attr(name="a1", dtype=np.int64),
...                tiledb.Attr(name="a2", dtype=np.int64)))
...     tiledb.DenseArray.create(tmp + "/array", schema)
...     with tiledb.DenseArray(tmp + "/array", mode='w') as A:
...         # Write to each attribute
...         A[0:2, 0:2] = {"a1": np.array(([-3, -4], [-5, -6])),
...                        "a2": np.array(([1, 2], [3, 4]))}
- query(attrs=None, cond=None, dims=None, coords=False, order='C', use_arrow=None, return_arrow=False, return_incomplete=False)¶
Construct a proxy Query object for easy subarray queries of cells for an item or region of the array across one or more attributes.
Optionally subselect over attributes, return dense result coordinate values, and specify a result layout / cell-order.
- Parameters:
attrs – the DenseArray attributes to subselect over. If attrs is None (default) all array attributes will be returned. Array attributes can be defined by name or by positional index.
cond – the str expression to filter attributes or dimensions on. The expression must be parsable by tiledb.QueryCondition(). See help(tiledb.QueryCondition) for more details.
dims – the DenseArray dimensions to subselect over. If dims is None (default) then no dimensions are returned, unless coords=True.
coords – if True, return array of coordinate values (default False).
order – ‘C’, ‘F’, ‘U’, or ‘G’ (row-major, col-major, unordered, TileDB global order)
mode – “r” to read (default), “d” to delete
use_arrow – if True, return dataframes via PyArrow if applicable.
return_arrow – if True, return results as a PyArrow Table if applicable.
return_incomplete –
if True, initialize and return an iterable Query object over the indexed range. Consuming this iterable returns a result set for each TileDB incomplete query. See usage example in ‘examples/incomplete_iteration.py’. To retrieve the estimated result sizes for the query ranges, use:
A.query(…, return_incomplete=True)[…].est_result_size()
If False (default False), queries will be internally run to completion by resizing buffers and resubmitting until query is complete.
- Returns:
A proxy Query object that can be used for indexing into the DenseArray over the defined attributes, in the given result layout (order).
- Raises:
ValueError – array is not opened for reads (mode = ‘r’)
- Raises:
Example:
>>> import tiledb, numpy as np, tempfile
>>> # Subselect on attributes when reading:
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     schema = tiledb.ArraySchema(domain=dom,
...         attrs=(tiledb.Attr(name="a1", dtype=np.int64),
...                tiledb.Attr(name="a2", dtype=np.int64)))
...     tiledb.DenseArray.create(tmp + "/array", schema)
...     with tiledb.DenseArray(tmp + "/array", mode='w') as A:
...         A[0:10] = {"a1": np.zeros((10)), "a2": np.ones((10))}
...     with tiledb.DenseArray(tmp + "/array", mode='r') as A:
...         # Access specific attributes individually.
...         np.testing.assert_equal(A.query(attrs=("a1",))[0:5],
...                                 {"a1": np.zeros(5)})
Sparse Array¶
- class tiledb.SparseArray(*args, **kw)¶
- __getitem__(selection)¶
Retrieve nonempty cell data for an item or region of the array
- Parameters:
selection (tuple) – An int index, slice or tuple of integer/slice objects, specifying the selected subarray region for each dimension of the SparseArray.
- Return type:
- Returns:
An OrderedDict is returned with dimension and attribute names as keys. Nonempty attribute values are returned as Numpy 1-d arrays.
- Raises:
IndexError – invalid or unsupported index selection
- Raises:
Example:
>>> import tiledb, numpy as np, tempfile
>>> from collections import OrderedDict
>>> # Write to multi-attribute 2D array
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(
...         tiledb.Dim(name="y", domain=(0, 9), tile=2, dtype=np.uint64),
...         tiledb.Dim(name="x", domain=(0, 9), tile=2, dtype=np.uint64))
...     schema = tiledb.ArraySchema(domain=dom, sparse=True,
...         attrs=(tiledb.Attr(name="a1", dtype=np.int64),
...                tiledb.Attr(name="a2", dtype=np.int64)))
...     tiledb.SparseArray.create(tmp + "/array", schema)
...     with tiledb.SparseArray(tmp + "/array", mode='w') as A:
...         # Write in the two cells (0,0) and (2,3) only.
...         I, J = [0, 2], [0, 3]
...         # Write to each attribute
...         A[I, J] = {"a1": np.array([1, 2]),
...                    "a2": np.array([3, 4])}
...     with tiledb.SparseArray(tmp + "/array", mode='r') as A:
...         # Return an OrderedDict with values and coordinates
...         np.testing.assert_equal(A[0:3, 0:10], OrderedDict({'a1': np.array([1, 2]),
...             'a2': np.array([3, 4]), 'y': np.array([0, 2], dtype=np.uint64),
...             'x': np.array([0, 3], dtype=np.uint64)}))
...         # Return just the "x" coordinates values
...         A[0:3, 0:10]["x"]
array([0, 3], dtype=uint64)
With a floating-point array domain, index bounds are inclusive, e.g.:
>>> # Return nonempty cells within a floating point array domain (fp index bounds are inclusive):
>>> # A[5.0:579.9]
- __setitem__(selection, val)¶
Set / update sparse data cells
- Parameters:
selection (tuple) – N coordinate value arrays (dim0, dim1, …) where N is the ndim of the SparseArray. The format follows numpy sparse (point) indexing semantics.
val (dict or numpy.ndarray) – a dictionary of nonempty array attribute values; the values must be convertible to 1-d numpy arrays. If the array has a single attribute, a 1-d numpy array is accepted.
- Raises:
IndexError – invalid or unsupported index selection
ValueError – value / coordinate length mismatch
- Raises:
Example:
>>> import tiledb, numpy as np, tempfile
>>> # Write to multi-attribute 2D array
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(
...         tiledb.Dim(domain=(0, 1), tile=2, dtype=np.uint64),
...         tiledb.Dim(domain=(0, 1), tile=2, dtype=np.uint64))
...     schema = tiledb.ArraySchema(domain=dom, sparse=True,
...         attrs=(tiledb.Attr(name="a1", dtype=np.int64),
...                tiledb.Attr(name="a2", dtype=np.int64)))
...     tiledb.SparseArray.create(tmp + "/array", schema)
...     with tiledb.SparseArray(tmp + "/array", mode='w') as A:
...         # Write in the corner cells (0,0) and (1,1) only.
...         I, J = [0, 1], [0, 1]
...         # Write to each attribute
...         A[I, J] = {"a1": np.array([1, 2]),
...                    "a2": np.array([3, 4])}
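The point-indexing convention used by A[I, J] = {...} pairs the i-th entry of every coordinate array with the i-th entry of every attribute array. The following pure-Python model illustrates that pairing and the documented length-mismatch error. It does not call tiledb; the function name pair_cells is hypothetical, and the sketch only models the pairing rule, not how the library stores data.

```python
def pair_cells(coords, values):
    """Pair N coordinate lists with per-attribute value lists, cell by cell.

    coords: tuple of N equally long lists, one per dimension.
    values: dict mapping attribute name -> list with one value per cell.
    Returns a dict mapping each coordinate tuple to its attribute values.
    """
    n_cells = len(coords[0])
    if any(len(c) != n_cells for c in coords):
        raise ValueError("coordinate arrays must have equal length")
    for name, vals in values.items():
        if len(vals) != n_cells:
            raise ValueError(f"{name!r}: value / coordinate length mismatch")
    cells = {}
    for i, point in enumerate(zip(*coords)):
        cells[point] = {name: vals[i] for name, vals in values.items()}
    return cells


I, J = [0, 1], [0, 1]
print(pair_cells((I, J), {"a1": [1, 2], "a2": [3, 4]}))
# {(0, 0): {'a1': 1, 'a2': 3}, (1, 1): {'a1': 2, 'a2': 4}}
```

Only the cells (0, 0) and (1, 1) are produced; the off-diagonal cells (0, 1) and (1, 0) are never written, which is what distinguishes point indexing from a slice assignment.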
- query(attrs=None, cond=None, dims=None, index_col=True, coords=None, order='U', use_arrow=None, return_arrow=None, return_incomplete=False)¶
Construct a proxy Query object for easy subarray queries of cells for an item or region of the array across one or more attributes.
Optionally subselect over attributes, return dense result coordinate values, and specify a result layout / cell-order.
- Parameters:
attrs – the SparseArray attributes to subselect over. If attrs is None (default) all array attributes will be returned. Array attributes can be defined by name or by positional index.
cond – the str expression to filter attributes or dimensions on. The expression must be parsable by tiledb.QueryCondition(). See help(tiledb.QueryCondition) for more details.
dims – the SparseArray dimensions to subselect over. If dims is None (default) then all dimensions are returned, unless coords=False.
index_col – For dataframe queries, override the saved index information, and only set specified index(es) in the final dataframe, or None.
coords – (deprecated) if True, return array of coordinate value (default False).
order – ‘C’, ‘F’, or ‘G’ (row-major, col-major, tiledb global order)
mode – “r” to read
use_arrow – if True, return dataframes via PyArrow if applicable.
return_arrow – if True, return results as a PyArrow Table if applicable.
- Returns:
A proxy Query object that can be used for indexing into the SparseArray over the defined attributes, in the given result layout (order).
Example:
>>> import tiledb, numpy as np, tempfile
>>> from collections import OrderedDict
>>> # Write to multi-attribute 2D array
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(
...         tiledb.Dim(name="y", domain=(0, 9), tile=2, dtype=np.uint64),
...         tiledb.Dim(name="x", domain=(0, 9), tile=2, dtype=np.uint64))
...     schema = tiledb.ArraySchema(domain=dom, sparse=True,
...         attrs=(tiledb.Attr(name="a1", dtype=np.int64),
...                tiledb.Attr(name="a2", dtype=np.int64)))
...     tiledb.SparseArray.create(tmp + "/array", schema)
...     with tiledb.SparseArray(tmp + "/array", mode='w') as A:
...         # Write in the two cells (0,0) and (2,3) only.
...         I, J = [0, 2], [0, 3]
...         # Write to each attribute
...         A[I, J] = {"a1": np.array([1, 2]),
...                    "a2": np.array([3, 4])}
...     with tiledb.SparseArray(tmp + "/array", mode='r') as A:
...         np.testing.assert_equal(A.query(attrs=("a1",), coords=False, order='G')[0:3, 0:10],
...                                 OrderedDict({'a1': np.array([1, 2])}))
Query¶
- class tiledb.Query(array: Array, ctx: Ctx | None = None, attrs: Sequence[str] | Sequence[int] | None = None, cond: str | None = None, dims: bool | Sequence[str] = False, has_coords: bool = False, index_col: bool | Sequence[int] | None = True, order: str | None = None, use_arrow: bool | None = None, return_arrow: bool = False, return_incomplete: bool = False)¶
Represents a TileDB query.
- agg(aggs)¶
Calculate an aggregate operation for a given attribute. Available operations are sum, min, max, mean, count, and null_count (for nullable attributes only). Aggregates may be combined with other query operations such as query conditions and slicing.
The input may be a single operation, a list of operations, or a dictionary with attribute mapping to a single operation or list of operations.
Results of max and min can be undefined. This occurs when a nullable attribute contains only nulled data at the given coordinates, or when no data is read for the given query (e.g. query conditions that do not match any values, or coordinates that contain no data). In these cases, invalid results are represented as np.nan for attributes of floating point types and None for integer types.
>>> import tiledb, tempfile, numpy as np
>>> path = tempfile.mkdtemp()
>>> with tiledb.from_numpy(path, np.arange(1, 10)) as A:
...     pass
>>> # Note that tiledb.from_numpy creates anonymous attributes, so the
>>> # name of the attribute is represented as an empty string
>>> with tiledb.open(path, 'r') as A:
...     A.query().agg("sum")[:]
45
>>> with tiledb.open(path, 'r') as A:
...     A.query(cond="attr('') < 5").agg(["count", "mean"])[:]
{'count': 9, 'mean': 2.5}
>>> with tiledb.open(path, 'r') as A:
...     A.query().agg({"": ["max", "min"]})[2:7]
{'max': 7, 'min': 3}
- Parameters:
aggs – The input attributes and operations to apply aggregations on
- Returns:
single value for single operation on one attribute, a dictionary of attribute keys associated with a single value for a single operation across multiple attributes, or a dictionary of attribute keys that maps to a dictionary of operation labels with the associated value
- property attrs¶
List of attributes to include in Query.
- property cond¶
QueryCondition used to filter attributes or dimensions in Query.
- property df¶
Apply Array.multi_index with query parameters and return result as a Pandas dataframe.
- property dims¶
List of dimensions to include in Query.
- property domain_index¶
Apply Array.domain_index with query parameters.
- get_stats(print_out=True, json=False)¶
Retrieves the stats from a TileDB query.
- Parameters:
print_out – Print string to console (default True), or return as string
json – Return stats JSON object (default: False)
- property index_col¶
List of columns to set as index for dataframe queries, or None.
- label_index(labels)¶
Apply Array.label_index with query parameters.
- property multi_index¶
Apply Array.multi_index with query parameters.
- property order¶
Return underlying Array order.
- subarray() Subarray ¶
Subarray with the ranges this query is on.
- Return type:
Subarray
- submit()¶
An alias for calling the regular indexer [:]
Query Condition¶
- class tiledb.QueryCondition(expression: str, ctx: ~tiledb.ctx.Ctx = <factory>)¶
Class representing a TileDB query condition object for attribute and dimension (sparse arrays only) filtering pushdown.
A query condition is set with a string representing an expression as defined by the grammar below. A more straightforward example of usage is given beneath.
When querying a sparse array, only the values that satisfy the given condition are returned (coupled with their associated coordinates). An example may be found in examples/query_condition_sparse.py.
For dense arrays, the given shape of the query matches the shape of the output array. Values that DO NOT satisfy the given condition are filled with the TileDB default fill value. Different attribute and dimension types have different default fill values as outlined here (https://docs.tiledb.com/main/background/internal-mechanics/writing#default-fill-values). An example may be found in examples/query_condition_dense.py.
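The difference between the sparse and dense behaviour described above can be modelled in a few lines of plain Python. This is an illustrative sketch only and does not call tiledb: the function names are hypothetical, and FILL is an arbitrary stand-in for whatever default fill value applies to the attribute type.

```python
FILL = -1  # hypothetical stand-in for the attribute's default fill value


def dense_filter(values, predicate, fill=FILL):
    """Dense model: output keeps the input shape; non-matching cells are filled."""
    return [v if predicate(v) else fill for v in values]


def sparse_filter(coords, values, predicate):
    """Sparse model: only matching cells are returned, with their coordinates."""
    return [(c, v) for c, v in zip(coords, values) if predicate(v)]


values = [1, 7, 3, 9]
print(dense_filter(values, lambda v: v < 5))
# [1, -1, 3, -1]
print(sparse_filter([10, 20, 30, 40], values, lambda v: v < 5))
# [(10, 1), (30, 3)]
```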
A query condition is made up of one or more Boolean expressions. Multiple Boolean expressions are chained together with Boolean operators. The or_op Boolean operator is given lower precedence than and_op.
A Bitwise expression may either be a comparison expression or membership expression.
A Boolean expression may either be a comparison expression or membership expression.
A comparison expression contains a comparison operator. The operator works on a TileDB attribute or dimension name (hereafter referred to as a “TileDB variable”) and value.
All comparison operators are supported.
Bitwise operators are given higher precedence than comparison operators. Boolean operators are given lower precedence than comparison operators.
If an attribute name has special characters in it, you can wrap namehere in attr("namehere").
A membership expression contains the membership operator, in. The operator works on a TileDB variable and list of values.
TileDB variable names are valid Python variable names or a string wrapped in attr() or dim().
Values are any Python-valid number or string. datetime64 values should first be cast to UNIX seconds. Values may also be cast with val().
Example:
>>> with tiledb.open(uri, mode="r") as A:
>>>     # Select cells where `foo` is greater than 5, or where `b a r` equals
>>>     # the string "asdf" and `baz` is at most 1.0.
>>>     # Note precedence is equivalent to:
>>>     # tiledb.QueryCondition("foo > 5 or ('asdf' == attr('b a r') and baz <= val(1.0))")
>>>     A.query(cond=tiledb.QueryCondition("foo > 5 or 'asdf' == attr('b a r') and baz <= val(1.0)"))
>>>
>>>     # Select cells where the values for `foo` are equal to 1, 2, or 3.
>>>     # Note this is equivalent to:
>>>     # tiledb.QueryCondition("foo == 1 or foo == 2 or foo == 3")
>>>     A.query(cond=tiledb.QueryCondition("foo in [1, 2, 3]"))
>>>
>>>     # Example showing that bitwise operators (| ^ &) are given higher precedence than comparison operators
>>>     # and comparison operators are given higher precedence than logical operators.
>>>     # Note this is equivalent to:
>>>     # tiledb.QueryCondition("(foo == 1) or ((foo == 2) and ('xyz' == attr('b a r')) and ((foo & 1) == 0))")
>>>     A.query(cond=tiledb.QueryCondition("foo == 1 or foo == 2 and 'xyz' == attr('b a r') and foo & 1 == 0"))
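Because datetime values must be cast to UNIX seconds before they appear in a condition string, a small sketch of that conversion may help. It uses the standard library's datetime as a stand-in for a NumPy datetime64 value. The attribute name event_time and the helper unix_seconds are hypothetical.

```python
from datetime import datetime, timezone


def unix_seconds(moment: datetime) -> int:
    """Whole seconds since 1970-01-01T00:00:00Z for a timezone-aware datetime."""
    return int(moment.timestamp())


# Hypothetical cutoff for a hypothetical attribute named `event_time`.
cutoff = datetime(2021, 1, 1, tzinfo=timezone.utc)
condition = f"event_time >= {unix_seconds(cutoff)}"
print(condition)  # event_time >= 1609459200
```

The resulting string can then be used as the query condition expression, as in the example above.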
Group¶
- class tiledb.Group(uri: str, mode: str = 'r', config: Config = None, ctx: Ctx | None = None)¶
Support for organizing multiple arrays in arbitrary directory hierarchies.
Group members may be any number of nested groups and arrays. Members are stored as tiledb.Objects which indicate the member’s URI and type.
Groups may contain associated metadata similar to array metadata where keys are strings. Singleton values may be of type int, float, str, or bytes. Multiple values of the same type may be placed in containers of type list, tuple, or 1-D np.ndarray. The values within containers are limited to type int or float.
See more at: https://docs.tiledb.com/main/background/key-concepts-and-data-format#arrays-and-groups
- Parameters:
uri (str) – The URI to the Group
mode (str) – Read mode (‘r’), write mode (‘w’), or modify exclusive (‘m’)
ctx (tiledb.Ctx) – A TileDB context
Example:
>>> # Create a group
>>> grp_path = "root_group"
>>> tiledb.Group.create(grp_path)
>>> grp = tiledb.Group(grp_path, "w")
>>>
>>> # Create an array and add as a member to the group
>>> array_path = "array.tdb"
>>> domain = tiledb.Domain(tiledb.Dim(domain=(1, 8), tile=2))
>>> a1 = tiledb.Attr("val", dtype="f8")
>>> schema = tiledb.ArraySchema(domain=domain, attrs=(a1,))
>>> tiledb.Array.create(array_path, schema)
>>> grp.add(array_path)
>>>
>>> # Create a group and add as a subgroup
>>> subgrp_path = "sub_group"
>>> tiledb.Group.create(subgrp_path)
>>> grp.add(subgrp_path)
>>>
>>> # Add metadata to the group
>>> grp.meta["ints"] = [1, 2, 3]
>>> grp.meta["str"] = "string_metadata"
>>> grp.close()
>>>
>>> grp.open("r")
>>> # Dump all the members in string format
>>> mbrs_repr = repr(grp)
>>> # Or create a list of Objects in the Group
>>> mbrs_iter = list(grp)
>>> # Get the first member's uri and type
>>> member_uri, member_type = grp[0].uri, grp[0].type
>>> grp.close()
>>>
>>> # Remove the subgroup
>>> grp.open("w")
>>> grp.remove(subgrp_path)
>>> grp.close()
>>>
>>> # Delete the subgroup; delete() acts on the Group it is called on,
>>> # which must be opened in 'm' mode
>>> subgrp = tiledb.Group(subgrp_path, "m")
>>> subgrp.delete()
>>> subgrp.close()
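The metadata value rules described above (singleton int, float, str, or bytes; containers of a single int or float type) can be sketched as a pure-Python check. The helper is_valid_group_metadata_value is hypothetical and not part of TileDB-Py. It covers list and tuple only, leaving out 1-D np.ndarray to stay dependency-free, and it treats empty containers as invalid, which is an assumption of this sketch.

```python
from typing import Any

SCALAR_TYPES = (int, float, str, bytes)  # allowed singleton types
ELEMENT_TYPES = (int, float)             # allowed types inside containers


def is_valid_group_metadata_value(value: Any) -> bool:
    """Check a value against the documented metadata typing rules."""
    if isinstance(value, SCALAR_TYPES):
        return True
    if isinstance(value, (list, tuple)) and value:
        first_type = type(value[0])
        # All elements must share one type, and that type must be int or float.
        return first_type in ELEMENT_TYPES and all(
            type(item) is first_type for item in value
        )
    return False


print(is_valid_group_metadata_value([1, 2, 3]))          # True
print(is_valid_group_metadata_value("string_metadata"))  # True
print(is_valid_group_metadata_value([1, 2.0]))           # False: mixed types
print(is_valid_group_metadata_value(["a", "b"]))         # False: str in a container
```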
- __getitem__(member)¶
Retrieve a member from the Group as an Object.
- __contains__(member)¶
- Returns:
Whether the Group contains a member with the given name
- Return type:
bool
- close()¶
Close a Group.
- static consolidate_metadata(uri: str, config: Config = None, ctx: Ctx | None = None)¶
Consolidate the group metadata.
- static create(uri: str, ctx: Ctx | None = None)¶
Create a new Group.
- Parameters:
uri (str) – The URI of the Group to be created
ctx (tiledb.Ctx) – A TileDB context
- delete(recursive: bool = False)¶
Delete a Group. The group needs to be opened in ‘m’ mode.
- Parameters:
recursive (bool) – (default False) If True, all data inside the Group is deleted as well
- property meta: Metadata¶
- Returns:
The Group’s metadata as a key-value structure
- Return type:
Metadata
- property mode: str¶
- Returns:
Read mode (‘r’), write mode (‘w’), or modify exclusive (‘m’)
- Return type:
str
- open(mode: str = 'r')¶
Open a Group in read mode (“r”), write mode (“w”), or modify exclusive mode (“m”).
- Parameters:
mode (str) – Read mode (‘r’), write mode (‘w’), or modify exclusive (‘m’)
Object¶
Object Management¶
- tiledb.array_exists(uri, isdense=False, issparse=False, ctx=None)¶
Check if an array exists and is openable at the given URI
- tiledb.group_create(uri: str, ctx: Ctx | None = None)¶
Create a new Group.
- Parameters:
uri (str) – The URI of the Group to be created
ctx (tiledb.Ctx) – A TileDB context
- tiledb.object_type(uri: str, ctx: Ctx = None) → str | None¶
Returns the TileDB object type at the specified URI as a string.
- Parameters:
uri – URI of the TileDB resource
ctx – The TileDB Context
- Returns:
object type string (“array” or “group”) or None if invalid TileDB object
- tiledb.remove(uri: str, ctx: Ctx = None)¶
Removes (deletes) the TileDB object at the specified URI
- Parameters:
uri – URI of the TileDB resource
ctx – The TileDB Context
- Raises:
tiledb.TileDBError
- tiledb.move(old_uri: str, new_uri: str, ctx: Ctx = None)¶
Moves a TileDB resource (group, array, key-value).
- Parameters:
old_uri – URI of the TileDB resource to move
new_uri – URI of the destination
ctx – The TileDB Context
- Raises:
tiledb.TileDBError
- tiledb.ls(uri: str, func: Callable, ctx: Ctx = None)¶
Lists TileDB resources that have a prefix of uri (one level deep) and applies a callback to each.
- Parameters:
uri – URI of TileDB group object
func – callback to execute on every listed TileDB resource, resource URI and object type label are passed as arguments to the callback
ctx – A TileDB Context
- tiledb.walk(uri: str, func: Callable, order: str = 'preorder', ctx: Ctx = None)¶
Recursively visits TileDB resources and applies a callback to resources that have a prefix of uri.
- Parameters:
uri – URI of TileDB group object
func – callback to execute on every listed TileDB resource, resource URI and object type label are passed as arguments to the callback
ctx – The TileDB context
order – ‘preorder’ (default) or ‘postorder’ tree traversal
- Raises:
ValueError – unknown order
tiledb.TileDBError
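The difference between the two order values can be sketched in pure Python. The hierarchy, the URIs, and the helper walk_sketch below are hypothetical stand-ins; the sketch does not call tiledb.walk and only illustrates that a group is reported before its members in 'preorder' and after them in 'postorder'.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical hierarchy standing in for stored TileDB objects:
# group URI -> list of (child URI, object type label).
HIERARCHY: Dict[str, List[Tuple[str, str]]] = {
    "root": [("root/arr1", "array"), ("root/sub", "group")],
    "root/sub": [("root/sub/arr2", "array")],
}


def walk_sketch(uri: str, func: Callable[[str, str], None], order: str = "preorder") -> None:
    """Visit everything below `uri`, calling func(child_uri, child_type)."""
    if order not in ("preorder", "postorder"):
        raise ValueError("unknown order")
    for child_uri, child_type in HIERARCHY.get(uri, []):
        if order == "preorder":
            func(child_uri, child_type)  # report a group before its members
        walk_sketch(child_uri, func, order)
        if order == "postorder":
            func(child_uri, child_type)  # report a group after its members


preorder: List[str] = []
walk_sketch("root", lambda uri, kind: preorder.append(uri), "preorder")
postorder: List[str] = []
walk_sketch("root", lambda uri, kind: postorder.append(uri), "postorder")

print(preorder)   # ['root/arr1', 'root/sub', 'root/sub/arr2']
print(postorder)  # ['root/arr1', 'root/sub/arr2', 'root/sub']
```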
Fragment Info¶
- class tiledb.FragmentInfoList(array_uri, include_mbrs=False, ctx=None)¶
Class representing an ordered list of FragmentInfo objects.
- Parameters:
array_uri (str) – URI for the TileDB array (any supported TileDB URI)
include_mbrs (bool) – (default False) include minimum bounding rectangles in FragmentInfo result
ctx (tiledb.Ctx) – A TileDB context
- Variables:
uri – URIs of fragments
version – Fragment version of each fragment
nonempty_domain – Non-empty domain of each fragment
cell_num – Number of cells in each fragment
timestamp_range – Timestamp range of when each fragment was written
sparse – For each fragment, True if fragment is sparse, else False
has_consolidated_metadata – For each fragment, True if fragment has consolidated fragment metadata, else False
unconsolidated_metadata_num – Number of unconsolidated metadata fragments in each fragment
to_vacuum – URIs of already consolidated fragments to vacuum
mbrs – (TileDB Embedded 2.5.0+ only) The minimum bounding rectangle of each fragment; only present when include_mbrs=True
array_schema_name – (TileDB Embedded 2.5.0+ only) The array schema’s name
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     # The array will be 4x4 with dimensions "rows" and "cols", with domain [1,4] and space tiles 2x2
...     dom = tiledb.Domain(
...         tiledb.Dim(name="rows", domain=(1, 4), tile=2, dtype=np.int32),
...         tiledb.Dim(name="cols", domain=(1, 4), tile=2, dtype=np.int32),
...     )
...     # The array will be dense with a single attribute "a" so each (i,j) cell can store an integer.
...     schema = tiledb.ArraySchema(
...         domain=dom, sparse=False, attrs=[tiledb.Attr(name="a", dtype=np.int32)]
...     )
...     # Set URI of the array
...     uri = tmp + "/array"
...     # Create the (empty) array on disk.
...     tiledb.Array.create(uri, schema)
...
...     # Write three fragments to the array
...     with tiledb.DenseArray(uri, mode="w") as A:
...         A[1:3, 1:5] = np.array(([[1, 2, 3, 4], [5, 6, 7, 8]]))
...     with tiledb.DenseArray(uri, mode="w") as A:
...         A[2:4, 2:4] = np.array(([101, 102], [103, 104]))
...     with tiledb.DenseArray(uri, mode="w") as A:
...         A[3:4, 4:5] = np.array(([202]))
...
...     # tiledb.array_fragments() requires TileDB-Py version > 0.8.5
...     fragments_info = tiledb.array_fragments(uri)
...
...     "====== FRAGMENTS INFO ======"
...     f"number of fragments: {len(fragments_info)}"
...     f"nonempty domains: {fragments_info.nonempty_domain}"
...     f"sparse fragments: {fragments_info.sparse}"
...
...     for fragment in fragments_info:
...         f"===== FRAGMENT NUMBER {fragment.num} ====="
...         f"is sparse: {fragment.sparse}"
...         f"cell num: {fragment.cell_num}"
...         f"has consolidated metadata: {fragment.has_consolidated_metadata}"
...         f"nonempty domain: {fragment.nonempty_domain}"
'====== FRAGMENTS INFO ======'
'number of fragments: 3'
'nonempty domains: (((1, 2), (1, 4)), ((2, 3), (2, 3)), ((3, 3), (4, 4)))'
'sparse fragments: (False, False, False)'
'===== FRAGMENT NUMBER 0 ====='
'is sparse: False'
'cell num: 8'
'has consolidated metadata: False'
'nonempty domain: ((1, 2), (1, 4))'
'===== FRAGMENT NUMBER 1 ====='
'is sparse: False'
'cell num: 16'
'has consolidated metadata: False'
'nonempty domain: ((2, 3), (2, 3))'
'===== FRAGMENT NUMBER 2 ====='
'is sparse: False'
'cell num: 4'
'has consolidated metadata: False'
'nonempty domain: ((3, 3), (4, 4))'
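The cell num values in the output above (8, 16, 4) are larger than the number of cells assigned by the second and third writes. One way to account for them, assuming a dense fragment covers whole space tiles, is to round each non-empty range out to tile boundaries. The sketch below is pure Python arithmetic with a hypothetical helper name; it does not call TileDB.

```python
from math import prod
from typing import Sequence, Tuple


def tile_aligned_cell_count(
    nonempty_domain: Sequence[Tuple[int, int]],
    domain_starts: Sequence[int],
    tile_extents: Sequence[int],
) -> int:
    """Count cells after expanding each inclusive range to whole space tiles."""
    per_dimension = []
    for (low, high), start, extent in zip(nonempty_domain, domain_starts, tile_extents):
        first_tile = (low - start) // extent   # index of the tile holding `low`
        last_tile = (high - start) // extent   # index of the tile holding `high`
        per_dimension.append((last_tile - first_tile + 1) * extent)
    return prod(per_dimension)


# Values taken from the example: domain starts at 1 in both dimensions, 2x2 tiles.
STARTS = (1, 1)
EXTENTS = (2, 2)
NONEMPTY_DOMAINS = [((1, 2), (1, 4)), ((2, 3), (2, 3)), ((3, 3), (4, 4))]

counts = [tile_aligned_cell_count(d, STARTS, EXTENTS) for d in NONEMPTY_DOMAINS]
print(counts)  # [8, 16, 4]
```

Under that assumption, the second write touches four 2x2 tiles (16 cells) even though only four cells were assigned.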
- class tiledb.FragmentInfo(fragments: FragmentInfoList, num)¶
Class representing the metadata for a single fragment. See tiledb.FragmentInfoList for example of usage.
- Variables:
uri – URI of the fragment
version – Fragment version of the fragment
nonempty_domain – Non-empty domain of the fragment
cell_num – Number of cells in the fragment
timestamp_range – Timestamp range of when the fragment was written
sparse – True if the fragment is sparse, else False
has_consolidated_metadata – True if the fragment has consolidated fragment metadata, else False
unconsolidated_metadata_num – Number of unconsolidated metadata fragments
to_vacuum – URIs of already consolidated fragments to vacuum
mbrs – (TileDB Embedded 2.5.0+ only) The minimum bounding rectangles of the fragment; only present when include_mbrs=True
array_schema_name – (TileDB Embedded 2.5.0+ only) The array schema’s name
Enumeration¶
- class tiledb.Enumeration(name: str, ordered: bool, values: Sequence[Any] | None = None, dtype: dtype | None = None, ctx: Ctx | None = None)¶
Represents a TileDB Enumeration.
- property dtype: dtype¶
Numpy dtype representation of the enumeration type.
- Return type:
numpy.dtype
- extend(values: Sequence[Any]) → Enumeration¶
Add additional values to the enumeration.
- Parameters:
values – The values to add to the enumeration
- Return type:
Enumeration
Exceptions¶
- exception tiledb.TileDBError¶
VFS¶
- class tiledb.VFS(config: Config | dict = None, ctx: Ctx | None = None)¶
TileDB VFS class
Encapsulates the TileDB VFS module instance with a specific configuration (config).
- Parameters:
ctx (tiledb.Ctx) – The TileDB Context
config (tiledb.Config or dict) – Override ctx VFS configurations with updated values in config.
- close(file: FileHandle)¶
Closes a VFS FileHandle object.
- Parameters:
file (FileIO) – An opened VFS FileIO
- Return type:
FileHandle
- Returns:
closed VFS FileHandle
- Raises:
tiledb.TileDBError
- copy_dir(old_uri: str | bytes | PathLike, new_uri: str | bytes | PathLike)¶
Copies a TileDB directory from an old URI to a new URI.
- copy_file(old_uri: str | bytes | PathLike, new_uri: str | bytes | PathLike)¶
Copies a TileDB file from an old URI to a new URI.
- create_bucket(uri: str | bytes | PathLike)¶
Creates an object store bucket with the input URI.
- Parameters:
uri (str) – Input URI of the bucket
- create_dir(uri: str | bytes | PathLike)¶
Create a directory at the specified input URI.
- Parameters:
uri (str) – Input URI of the directory
- empty_bucket(uri: str | bytes | PathLike)¶
Empty an object store bucket.
- Parameters:
uri (str) – Input URI of the bucket
- ls(uri: str | bytes | PathLike, recursive: bool = False) → List[str]¶
Retrieves the children in directory uri. By default (recursive=False) this function is non-recursive, i.e., it lists only the entries one level below uri.
- ls_recursive(uri: str | bytes | PathLike, callback: Callable[[str, int], bool] | None = None)¶
Recursively lists objects at the input URI, invoking the provided callback on each entry gathered. The callback is passed the data pointer provided on each invocation and is responsible for writing the collected results into this structure. If the callback returns True, the walk will continue. If False, the walk will stop. If an error is thrown, the walk will stop and the error will be propagated to the caller using std::throw_with_nested.
Currently only local, S3, Azure and GCS are supported.
- Parameters:
uri (str) – Input URI of the directory
callback – Callback function to invoke on each entry
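The continue/stop contract of the callback can be sketched without TileDB. In the stand-in below, ls_recursive_sketch is a hypothetical driver that feeds (path, size) pairs to a callback. Reading the two callback arguments as the entry's path and its size in bytes is an assumption based on the Callable[[str, int], bool] signature.

```python
from typing import Callable, Iterable, List, Tuple


def ls_recursive_sketch(
    entries: Iterable[Tuple[str, int]],
    callback: Callable[[str, int], bool],
) -> None:
    """Invoke callback(path, size) per entry; stop once it returns False."""
    for path, size in entries:
        if not callback(path, size):
            break


# Hypothetical listing results.
ENTRIES = [("dir/a.bin", 10), ("dir/sub/b.bin", 5000), ("dir/sub/c.bin", 20)]
collected: List[str] = []


def stop_after_first_large(path: str, size: int) -> bool:
    collected.append(path)  # the callback gathers results itself
    return size < 1000      # True: keep walking, False: stop


ls_recursive_sketch(ENTRIES, stop_after_first_large)
print(collected)  # ['dir/a.bin', 'dir/sub/b.bin']
```

The callback still sees the entry on which it returns False; the walk simply does not continue past it.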
- move_dir(old_uri: str | bytes | PathLike, new_uri: str | bytes | PathLike)¶
Renames a TileDB directory from an old URI to a new URI.
- move_file(old_uri: str | bytes | PathLike, new_uri: str | bytes | PathLike)¶
Renames a TileDB file from an old URI to a new URI.
- open(uri: str | bytes | PathLike, mode: str = 'rb')¶
Opens a VFS file resource for reading / writing / appends at URI.
If the file did not exist upon opening, a new file is created.
- Parameters:
uri (str) – URI of VFS file resource
mode (str) – ‘rb’ for opening the file to read, ‘wb’ to write, ‘ab’ to append
- Return type:
FileHandle
- Returns:
TileDB FileIO
- Raises:
TypeError – cannot convert uri to unicode string
ValueError – invalid mode
tiledb.TileDBError
- read(file: FileHandle, offset: int, nbytes: int) → bytes¶
Read nbytes from an opened VFS FileHandle at a given offset.
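- Parameters:
file (FileHandle) – An opened VFS FileHandle
offset (int) – Offset in bytes at which to start reading
nbytes (int) – Number of bytes to read
- Return type:
bytes
- Returns:
read bytes
- Raises:
tiledb.TileDBError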
- remove_bucket(uri: str | bytes | PathLike)¶
Deletes an object store bucket with the input URI.
- Parameters:
uri (str) – Input URI of the bucket
- remove_dir(uri: str | bytes | PathLike)¶
Removes a directory (recursively) with the input URI.
- Parameters:
uri (str) – Input URI of the directory
- remove_file(uri: str | bytes | PathLike)¶
Removes a file with the input URI.
- Parameters:
uri (str) – Input URI of the file
- supports(scheme: str) → bool¶
Returns true if the given URI scheme (storage backend) is supported.
- Parameters:
scheme (str) – scheme component of a VFS resource URI (ex. ‘file’ / ‘hdfs’ / ‘s3’)
- Return type:
bool
- Returns:
True if the linked libtiledb version supports the storage backend, False otherwise
- Raises:
ValueError – VFS storage backend is not supported
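Since supports expects only the scheme component of a URI, one way to obtain it is with the standard library's urllib.parse. The helper uri_scheme is hypothetical, and treating a scheme-less local path as 'file' is an assumption of this sketch rather than documented TileDB behavior.

```python
from urllib.parse import urlsplit


def uri_scheme(uri: str, default: str = "file") -> str:
    """Return the scheme component of a URI, or `default` when there is none."""
    scheme = urlsplit(uri).scheme
    return scheme or default


print(uri_scheme("s3://bucket/array"))     # s3
print(uri_scheme("hdfs://namenode/path"))  # hdfs
print(uri_scheme("/tmp/array"))            # file (assumed default)
```

The returned string could then be passed as the scheme argument of supports.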
- touch(uri: str | bytes | PathLike)¶
Touches a file with the input URI, i.e., creates a new empty file.
- Parameters:
uri (str) – Input URI of the file
- class tiledb.FileIO(vfs: VFS, uri: str | bytes | PathLike, mode: str = 'rb')¶
TileDB FileIO class that encapsulates files opened by tiledb.VFS. The file operations are meant to mimic Python’s built-in file I/O methods.
- flush()¶
Force the data to be written to the file.
- property mode: str¶
- Return type:
str
- Returns:
Whether the file is in read mode (“rb”), write mode (“wb”), or append mode (“ab”)
- readable() → bool¶
- Return type:
bool
- Returns:
True if the file is readable (i.e. “rb” mode), otherwise False
- readinto(buff: bytes | bytearray | memoryview) → int¶
Read bytes into a pre-allocated, writable, bytes-like object, and return the number of bytes read.
- Parameters:
buff (bytes | bytearray | memoryview) – A pre-allocated, writable object that supports the byte buffer protocol
- Return type:
int
- Returns:
The number of bytes read
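Because FileIO is meant to mimic Python's built-in file I/O, the readinto pattern can be sketched with io.BytesIO standing in for a tiledb.FileIO. The helper read_all is hypothetical, and whether FileIO.readinto signals end of file by returning 0, as the built-in streams do, is an assumption here.

```python
import io
from typing import BinaryIO


def read_all(stream: BinaryIO, chunk_size: int = 4) -> bytes:
    """Drain a stream through one reusable, pre-allocated buffer."""
    buffer = bytearray(chunk_size)  # writable and reused on every call
    chunks = []
    while True:
        count = stream.readinto(buffer)  # number of bytes actually written
        if not count:
            break                        # 0 means end of stream
        chunks.append(bytes(buffer[:count]))  # keep only the filled part
    return b"".join(chunks)


data = read_all(io.BytesIO(b"hello world"))
print(data)  # b'hello world'
```

Only the first count bytes of the buffer are valid after each call, which is why the sketch slices before copying.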
Filestore¶
- class tiledb.Filestore(uri: str, ctx: Ctx | None = None)¶
Functions to set and get data to and from a TileDB Filestore Array.
A Filestore Array may be created using ArraySchema.from_file combined with Array.create.
- Parameters:
uri (str) – The URI to the TileDB Filestore Array
ctx (tiledb.Ctx) – A TileDB context
- static copy_from(filestore_array_uri: str, file_uri: str, mime_type: str = 'AUTODETECT', ctx: Ctx | None = None) → None¶
Copy data from a file to a Filestore Array.
- Parameters:
filestore_array_uri (str) – The URI to the TileDB Filestore Array
file_uri (str) – URI of the file to import
mime_type (str) – MIME types are “AUTODETECT” (default), “image/tiff”, “application/pdf”
ctx (tiledb.Ctx) – A TileDB context
- static copy_to(filestore_array_uri: str, file_uri: str, ctx: Ctx | None = None) → None¶
Copy data from a Filestore Array to a file.
- Parameters:
filestore_array_uri (str) – The URI to the TileDB Filestore Array
file_uri (str) – URI of the file to write the exported data to
ctx (tiledb.Ctx) – A TileDB context
- write(buffer: ByteString, mime_type: str = 'AUTODETECT') → None¶
Import data from an object that supports the buffer protocol to a Filestore Array.
- Parameters:
buffer (ByteString) – Data of type bytes, bytearray, memoryview, etc.
mime_type (str) – MIME types are “AUTODETECT” (default), “image/tiff”, “application/pdf”
Version¶
Statistics¶
- tiledb.stats_enable()¶
Enable TileDB internal statistics.
- tiledb.stats_disable()¶
Disable TileDB internal statistics.
- tiledb.stats_reset()¶
Reset all TileDB internal statistics to 0.
- tiledb.stats_dump(version=True, print_out=True, include_python=True, json=False, verbose=True)¶
Return TileDB internal statistics as a string.
- Parameters:
include_python – Include TileDB-Py statistics
print_out – Print string to console (default True), or return as string
version – Include TileDB Embedded and TileDB-Py versions (default: True)
json – Return stats JSON object (default: False)
verbose – Print extended internal statistics (default: True)