TileDB Python API Reference
Warning
The Python interface to TileDB is still under development and the API is subject to change.
Modules
Typical usage of the Python interface to TileDB will use the top-level module tiledb, e.g.:
import tiledb
There is also a submodule, libtiledb, which contains the necessary bindings to the underlying TileDB native library. Most of the time you will not need to interact with tiledb.libtiledb unless you need native-library-specific information, e.g. the version number:
import tiledb
tiledb.libtiledb.version() # Native TileDB library version number
Getting Started
Arrays may be opened with the tiledb.open function:
tiledb.open(uri, mode='r', key=None, attr=None, config=None, timestamp=None, ctx=None)
Open a TileDB array at the given URI.
Parameters: - uri – any TileDB supported URI
- timestamp – array timestamp to open, int or None. See the TileDB time traveling documentation for detailed functionality description.
- key – encryption key, str or None
- mode (str) – (default ‘r’) Open the array object in read ‘r’ or write ‘w’ mode
- attr – attribute name to select from a multi-attribute array, str or None
- config – TileDB config dictionary, dict or None
- ctx – TileDB context, tiledb.Ctx or None
Returns: open TileDB {Sparse,Dense}Array object
Data import helpers
tiledb.from_numpy(uri, array, config=None, ctx=None, **kwargs)
Write a NumPy array into a TileDB DenseArray, returning a read-only DenseArray instance.
Parameters: - uri (str) – URI for the TileDB array (any supported TileDB URI)
- array (numpy.ndarray) – dense numpy array to persist
- config – TileDB config dictionary, dict or None
- ctx (tiledb.Ctx) – A TileDB Context
- kwargs – additional arguments to pass to the DenseArray constructor
Returns: An open DenseArray (read mode) with a single anonymous attribute
Raises: TypeError – cannot convert uri to unicode string
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     # Creates array 'array' on disk.
...     with tiledb.DenseArray.from_numpy(tmp + "/array", np.array([1.0, 2.0, 3.0])) as A:
...         pass
tiledb.from_csv(uri: str, csv_file: Union[str, List[str]], **kwargs)
Create a TileDB array at the given URI from a CSV file or list of files.
Parameters: - uri – URI for new TileDB array
- csv_file – input CSV file or list of CSV files. Note: multi-file ingestion requires a chunksize argument. Files will be read in batches of at least chunksize rows before writing to the TileDB array.
Keyword Arguments: - Any pandas.read_csv supported keyword argument
- ctx - A TileDB context
- sparse - (default True) Create sparse schema
- index_dims - Set the df index using a list of existing column names
- allows_duplicates - Generated schema should allow duplicates
- mode - (default ingest) Ingestion mode: ingest, schema_only, append
- attr_filters - FilterList to apply to Attributes: FilterList or Dict[str -> FilterList] for any attribute(s). Unspecified attributes will use default.
- dim_filters - FilterList to apply to Dimensions: FilterList or Dict[str -> FilterList] for any dimension(s). Unspecified dimensions will use default.
- offsets_filters - FilterList to apply to all offsets
- full_domain - Dimensions should be created with full range of the dtype
- tile - Dimension tiling: accepts either an int that applies the tiling to all dimensions or a dict(“dim_name”: int) to specifically assign tiling to a given dimension
- row_start_idx - Start index to start new write (for row-indexed ingestions).
- fillna - Value to use to fill holes
- column_types - Dictionary of {column_name: dtype} to apply dtypes to columns
- varlen_types - A set of {dtypes}; any column whose dtype is in the set is converted to a variable-length attribute
- capacity - Schema capacity.
- date_spec - Dictionary of {column_name: format_spec} to apply to date/time columns which are not correctly inferred by pandas ‘parse_dates’. Format must be specified using the Python format codes: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
- cell_order - (default ‘row-major’) Schema cell order: ‘row-major’, ‘col-major’, or ‘hilbert’
- tile_order - (default ‘row-major’) Schema tile order: ‘row-major’ or ‘col-major’
- timestamp - Write TileDB array at specific timestamp.
Returns: None
Example:
>>> import tiledb
>>> tiledb.from_csv("iris.tldb", "iris.csv")
>>> tiledb.object_type("iris.tldb")
'array'
tiledb.from_pandas(uri: str, dataframe: pd.DataFrame, **kwargs)
Create a TileDB array at the given URI from a Pandas dataframe.
Supports most Pandas series types, including nullable integers and bools.
Parameters: - uri – URI for new TileDB array
- dataframe – pandas DataFrame
- mode – Creation mode, one of ‘ingest’ (default), ‘schema_only’, ‘append’
Keyword Arguments: - Any pandas.read_csv supported keyword argument
- ctx - A TileDB context
- sparse - (default True) Create sparse schema
- chunksize - (default None) Maximum number of rows to read at a time. Note that this is also a pandas.read_csv argument, which tiledb.from_csv checks for in order to correctly read a file batchwise.
- index_dims - Set the df index using a list of existing column names
- allows_duplicates - Generated schema should allow duplicates
- mode - (default ingest) Ingestion mode: ingest, schema_only, append
- attr_filters - FilterList to apply to Attributes: FilterList or Dict[str -> FilterList] for any attribute(s). Unspecified attributes will use default.
- dim_filters - FilterList to apply to Dimensions: FilterList or Dict[str -> FilterList] for any dimension(s). Unspecified dimensions will use default.
- offsets_filters - FilterList to apply to all offsets
- full_domain - Dimensions should be created with full range of the dtype
- tile - Dimension tiling: accepts either an int that applies the tiling to all dimensions or a dict(“dim_name”: int) to specifically assign tiling to a given dimension
- row_start_idx - Start index to start new write (for row-indexed ingestions).
- fillna - Value to use to fill holes
- column_types - Dictionary of {column_name: dtype} to apply dtypes to columns
- varlen_types - A set of {dtypes}; any column whose dtype is in the set is converted to a variable-length attribute
- capacity - Schema capacity.
- date_spec - Dictionary of {column_name: format_spec} to apply to date/time columns which are not correctly inferred by pandas ‘parse_dates’. Format must be specified using the Python format codes: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
- cell_order - (default ‘row-major’) Schema cell order: ‘row-major’, ‘col-major’, or ‘hilbert’
- tile_order - (default ‘row-major’) Schema tile order: ‘row-major’ or ‘col-major’
- timestamp - Write TileDB array at specific timestamp.
Returns: None
Context
class tiledb.Ctx(config=None)
Class representing a TileDB context.
A TileDB context wraps a TileDB storage manager.
Parameters: config (tiledb.Config or dict) – Initialize Ctx with given config parameters
config(self)
Returns the Config instance associated with the Ctx.
get_stats(self, print_out=True, json=False)
Retrieves the stats from a TileDB context.
Parameters: - print_out – Print string to console (default True), or return as string
- json – Return stats JSON object (default: False)
set_tag(self, key, value)
Sets a (string, string) “tag” on the Ctx (internal).
tiledb.default_ctx(config: Union[Config, dict] = None) → Ctx
Returns, and optionally initializes, the default tiledb.Ctx context variable.
This Ctx object is used by Python API functions when no ctx keyword argument is provided. Most API functions accept an optional ctx kwarg, but that is typically only necessary in advanced usage with multiple contexts per program.
For initialization, this function must be called before any other tiledb functions. The initialization call accepts a tiledb.Config object to override the defaults for process-global parameters.
Parameters: config – tiledb.Config object or dictionary with config parameters.
Returns: Ctx
Config
class tiledb.Config(params=None, path=None)
TileDB Config class
The Config object stores configuration parameters for both TileDB Embedded and TileDB-Py.
For TileDB Embedded parameters, see:
The following configuration options are supported by TileDB-Py:
py.init_buffer_bytes:
Initial allocation size in bytes for attribute and dimension buffers. If the result size exceeds the pre-allocated buffer(s), the query will return incomplete, and TileDB-Py will allocate larger buffers and resubmit. Specifying a sufficiently large buffer size will often improve performance. Default 10 MB (1024**2 * 10).
py.use_arrow:
Use pyarrow from the Apache Arrow project to convert query results into Pandas dataframe format when requested. Default True.
py.deduplicate:
Attempt to deduplicate Python objects during buffer conversion to Python. Deduplication may reduce memory usage for datasets with many identical strings, at the cost of some performance reduction due to hash calculation/lookup for each object.
Unknown parameters will be ignored!
Parameters: - params (dict) – dict-like object of parameter/value pairs to set
- path (str) – local path to a persisted Config file to load
clear(self)
Unsets all Config parameters (returns them to their default values).
dict(self, prefix=u'')
Returns a dict representation of a Config object.
Parameters: prefix (str) – return only parameters with a given prefix Return type: dict Returns: Config parameter / values as a Python dict
from_file(self, path)
Update a Config object with parameters from a persisted config file.
Parameters: path – A local Config file path
get(self, key, raise_keyerror=True)
items(self, prefix=u'')
Returns an iterator object over Config (parameter, value) pairs.
Parameters: prefix (str) – return only parameters with a given prefix Return type: ConfigItems Returns: iterator over Config parameter, value tuples
keys(self, prefix=u'')
Returns an iterator object over Config parameters (keys).
Parameters: prefix (str) – return only parameters with a given prefix Return type: ConfigKeys Returns: iterator over Config parameter string keys
static load(uri)
Constructs a Config class instance from config parameters loaded from a local Config file.
Parameters: uri (str) – a local URI config file path Return type: tiledb.Config Returns: A TileDB Config instance with persisted parameter values Raises: TypeError – uri cannot be converted to a unicode string Raises: tiledb.TileDBError
save(self, uri)
Persist Config parameter values to a config file.
Parameters: uri (str) – a local URI config file path Raises: TypeError – uri cannot be converted to a unicode string Raises: tiledb.TileDBError
update(self, odict)
Update a Config object with parameter/value pairs from a dict-like object.
Parameters: odict – dict-like object containing parameter/value pairs to update the Config.
Array Schema
class tiledb.ArraySchema(domain=None, attrs=(), cell_order='row-major', tile_order='row-major', capacity=0, coords_filters=None, offsets_filters=None, validity_filters=None, allows_duplicates=False, sparse=False, Ctx ctx=None)
Schema class for TileDB dense / sparse array representations.
Parameters: - domain – Domain of schema
- attrs – tuple of tiledb.Attr instances
- cell_order ('row-major' (default) or 'C', 'col-major' or 'F' or 'hilbert') – TileDB label for cell layout
- tile_order ('row-major' (default) or 'C', 'col-major' or 'F') – TileDB label for tile layout
- capacity (int) – tile cell capacity
- coords_filters (tiledb.FilterList) – (default None) coordinates filter list
- offsets_filters (tiledb.FilterList) – (default None) offsets filter list
- validity_filters (tiledb.FilterList) – (default None) validity filter list
- allows_duplicates (bool) – True if duplicates are allowed
- sparse (bool) – True if schema is sparse, else False (set by SparseArray and DenseArray derived classes)
- ctx (tiledb.Ctx) – A TileDB Context
allows_duplicates
Returns True if the (sparse) array allows duplicates.
attr(self, key)
Returns an Attr instance given an int index or string label.
Parameters: key (int or str) – attribute index (positional or associative) Return type: tiledb.Attr Returns: The ArraySchema attribute at index or with the given name (label) Raises: TypeError – invalid key type
attr_or_dim_dtype(self, unicode name)
capacity
The array capacity.
Return type: int Raises: tiledb.TileDBError
cell_order
The cell order layout of the array.
check(self)
Checks the correctness of the array schema.
Return type: None Raises: tiledb.TileDBError if invalid
coords_compressor
The compressor label and level for the array’s coordinates.
Return type: tuple(str, int) Raises: tiledb.TileDBError
coords_filters
The FilterList for the array’s coordinates.
Return type: tiledb.FilterList Raises: tiledb.TileDBError
domain
The Domain associated with the array.
Return type: tiledb.Domain Raises: tiledb.TileDBError
dump(self)
Dumps a string representation of the array object to standard output (stdout).
static from_file(uri=None, Ctx ctx=None)
Create an ArraySchema for a Filestore Array from a given file. If a uri is not given, then create a default schema.
has_attr(self, name)
Returns True if the given name is an Attribute of the ArraySchema.
Parameters: name – attribute name Return type: bool
static load(uri, Ctx ctx=None, key=None)
nattr
The number of array attributes.
Return type: int Raises: tiledb.TileDBError
offsets_compressor
The compressor label and level for the array’s variable-length attribute offsets.
Return type: tuple(str, int) Raises: tiledb.TileDBError
offsets_filters
The FilterList for the array’s variable-length attribute offsets.
Return type: tiledb.FilterList Raises: tiledb.TileDBError
shape
The array’s shape.
Return type: tuple(numpy scalar, numpy scalar) Raises: TypeError – floating point (inexact) domain
sparse
True if the array is a sparse array representation.
Return type: bool Raises: tiledb.TileDBError
tile_order
The tile order layout of the array.
Return type: str Raises: tiledb.TileDBError
validity_filters
The FilterList for the array’s validity.
Return type: tiledb.FilterList Raises: tiledb.TileDBError
version
The array’s schema version.
Return type: int Raises: tiledb.TileDBError
tiledb.empty_like(uri, arr, config=None, key=None, tile=None, ctx=None)
Create and return an empty, writeable DenseArray with schema based on a NumPy-array like object.
Parameters: - uri – array URI
- arr – NumPy ndarray, or shape tuple
- config – (optional, deprecated) configuration to apply to new Ctx
- key – (optional) encryption key, if applicable
- tile – (optional) tiling of generated array
- ctx – (optional) TileDB Ctx
Returns: An empty, writeable DenseArray
Attribute
class tiledb.Attr(name=u'', dtype=np.float64, fill=None, var=None, nullable=False, filters=None, Ctx ctx=None)
Class representing a TileDB array attribute.
Parameters: - ctx (tiledb.Ctx) – A TileDB Context
- name (str) – Attribute name, empty if anonymous
- dtype (numpy.dtype) – Attribute value datatype
- nullable – Attribute is nullable
- fill – Fill value for unset cells.
- var – Attribute is variable-length (automatic for byte/string types)
- filters (FilterList) – List of filters to apply
Raises: TypeError – invalid dtype
compressor
String label of the attribute’s compressor and compressor level.
Return type: tuple(str, int) Raises: tiledb.TileDBError
dtype
Return numpy dtype object representing the Attr type.
Return type: numpy.dtype
dump(self)
Dumps a string representation of the Attr object to standard output (stdout).
fill
Fill value for unset cells of this attribute.
Return type: depends on dtype Raises: tiledb.TileDBError
filters
FilterList of the TileDB attribute.
Return type: tiledb.FilterList Raises: tiledb.TileDBError
isascii
True if the attribute is TileDB dtype TILEDB_STRING_ASCII.
Return type: bool Raises: tiledb.TileDBError
isnullable
True if the attribute is nullable.
Return type: bool Raises: tiledb.TileDBError
isvar
True if the attribute is variable length.
Return type: bool Raises: tiledb.TileDBError
name
Attribute string name, empty string if the attribute is anonymous.
Return type: str Raises: tiledb.TileDBError
ncells
The number of cells (scalar values) for a given attribute value.
Return type: int Raises: tiledb.TileDBError
Filters
class tiledb.FilterList(filters: Sequence[tiledb.filter.Filter] = None, chunksize: int = None, ctx: Ctx = None, is_capsule: bool = False)
An ordered list of Filter objects for filtering TileDB data.
FilterLists contain zero or more Filters, used for filtering attribute data, the array coordinate data, etc.
Parameters: - ctx (tiledb.Ctx) – A TileDB context
- filters – An iterable of Filter objects to add.
- chunksize (int) – (default None) chunk size used by the filter list in bytes
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     # Create several filters
...     gzip_filter = tiledb.GzipFilter()
...     bw_filter = tiledb.BitWidthReductionFilter()
...     # Create a filter list that will first perform bit width reduction, then gzip compression.
...     filters = tiledb.FilterList([bw_filter, gzip_filter])
...     a1 = tiledb.Attr(name="a1", dtype=np.int64, filters=filters)
...     # Create a second attribute filtered only by gzip compression.
...     a2 = tiledb.Attr(name="a2", dtype=np.int64,
...                      filters=tiledb.FilterList([gzip_filter]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1, a2))
...     tiledb.DenseArray.create(tmp + "/array", schema)
__getitem__(idx)
Gets a copy of the filter in the list at the given index.
Parameters: idx (int or slice) – index into the FilterList Returns: A filter at given index / slice Raises: IndexError – invalid index Raises: tiledb.TileDBError
append(filter: tiledb.filter.Filter)
Parameters: filter (Filter) – the filter to append into the FilterList
Raises: ValueError – filter argument incorrect type
class tiledb.GzipFilter(level: int = -1, ctx: Ctx = None)
Filter that compresses using gzip.
Parameters: - ctx (tiledb.Ctx) – TileDB Ctx
- level (int) – (default -1) the compressor level
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.GzipFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
class tiledb.ZstdFilter(level: int = -1, ctx: Ctx = None)
Filter that compresses using zstd.
Parameters: - ctx (tiledb.Ctx) – TileDB Ctx
- level (int) – (default -1) the compressor level
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.ZstdFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
class tiledb.LZ4Filter(level: int = -1, ctx: Ctx = None)
Filter that compresses using lz4.
Parameters: - ctx (tiledb.Ctx) – TileDB Ctx
- level (int) – (default -1) the compressor level
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.LZ4Filter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
class tiledb.Bzip2Filter(level: int = -1, ctx: Ctx = None)
Filter that compresses using bzip2.
Parameters: level (int) – (default -1) the compressor level
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.Bzip2Filter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
class tiledb.RleFilter(level: int = -1, ctx: Ctx = None)
Filter that compresses using run-length encoding (RLE).
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.RleFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
class tiledb.DoubleDeltaFilter(level: int = -1, ctx: Ctx = None)
Filter that performs double-delta encoding.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.DoubleDeltaFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
class tiledb.BitShuffleFilter(ctx: Ctx = None)
Filter that performs a bit shuffle transformation.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.BitShuffleFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
class tiledb.ByteShuffleFilter(ctx: Ctx = None)
Filter that performs a byte shuffle transformation.
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.ByteShuffleFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
class tiledb.BitWidthReductionFilter(window: int = -1, ctx: Ctx = None)
Filter that performs bit-width reduction.
Parameters: - ctx (tiledb.Ctx) – A TileDB Context
- window (int) – (default -1) max window size for the filter
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.BitWidthReductionFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
class tiledb.PositiveDeltaFilter(window: int = -1, ctx: Ctx = None)
Filter that performs positive-delta encoding.
Parameters: - ctx (tiledb.Ctx) – A TileDB Context
- window (int) – (default -1) the max window for the filter
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     a1 = tiledb.Attr(name="a1", dtype=np.int64,
...                      filters=tiledb.FilterList([tiledb.PositiveDeltaFilter()]))
...     schema = tiledb.ArraySchema(domain=dom, attrs=(a1,))
...     tiledb.DenseArray.create(tmp + "/array", schema)
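Dimension
class tiledb.Dim(name=u'__dim_0', domain=None, tile=None, filters=None, dtype=np.uint64, var=None, Ctx ctx=None)
Class representing a dimension of a TileDB Array.
Parameters: - name (str) – the dimension name
- domain (tuple) – the dimension’s (inclusive) domain, as a (lower bound, upper bound) tuple
- tile – the dimension’s tile extent
- filters (FilterList) – list of filters to apply
- dtype – the Dim numpy dtype object, type object, or string that can be coerced into a numpy dtype object
- var (bool) – dimension is variable-length
- ctx (tiledb.Ctx) – A TileDB Context
Raises: - ValueError – invalid domain or tile extent
- TypeError – invalid domain, tile extent, or dtype type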
domain
The dimension (inclusive) domain.
The dimension’s domain is defined by a (lower bound, upper bound) tuple.
Return type: tuple(numpy scalar, numpy scalar)
dtype
Numpy dtype representation of the dimension type.
Return type: numpy.dtype
filters
FilterList of the TileDB dimension.
Return type: tiledb.FilterList Raises: tiledb.TileDBError
isvar
True if the dimension is variable length.
Return type: bool Raises: tiledb.TileDBError
name
The dimension label string.
Anonymous dimensions return a default string representation based on the dimension index.
Return type: str
shape
The shape of the dimension given the dimension’s domain.
Note: The shape is only valid for integer and datetime dimension domains.
Return type: tuple(numpy scalar, numpy scalar) Raises: TypeError – floating point (inexact) domain
size
The size of the dimension domain (number of cells along dimension).
Return type: int Raises: TypeError – floating point (inexact) domain
tile
The tile extent of the dimension.
Return type: numpy scalar or np.timedelta64
Domain
class tiledb.Domain(Ctx ctx=None, *dims)
Class representing the domain of a TileDB Array.
Parameters: - dims – one or more tiledb.Dim objects up to the Domain’s ndim
- ctx (tiledb.Ctx) – A TileDB Context
Raises: TypeError – All dimensions must have the same dtype
dim(self, dim_id)
Returns a Dim object from the domain given the dimension’s index or name.
Parameters: dim_id – dimension index (int) or name (str) Raises: tiledb.TileDBError
dtype
The numpy dtype of the domain’s dimension type.
Return type: numpy.dtype
dump(self)
Dumps a string representation of the domain object to standard output (stdout).
has_dim(self, name)
Returns True if the Domain has a Dimension with the given name.
Parameters: name – name of Dimension Return type: bool
homogeneous
Returns True if the domain’s dimension types are homogeneous.
Array
class tiledb.libtiledb.Array(uri, mode='r', key=None, timestamp=None, attr=None, Ctx ctx=None)
Base class for TileDB array objects.
Defines common properties/functionality for the different array types. When an Array instance is initialized, the array is opened with the specified mode.
Parameters: - uri (str) – URI of array to open
- mode (str) – (default ‘r’) Open the array object in read ‘r’ or write ‘w’ mode
- key (str) – (default None) If not None, encryption key to decrypt the array
- timestamp (int or tuple) – (default None) If int, open the array at a given TileDB timestamp. If tuple, open at the given start and end TileDB timestamps.
- attr (str) – (default None) open one attribute of the array; indexing a dense array will return a NumPy ndarray directly rather than a dictionary.
- ctx (Ctx) – TileDB context
attr(self, key)
Returns an Attr instance given an int index or string label.
Parameters: key (int or str) – attribute index (positional or associative) Return type: Attr
Returns: The array attribute at index or with the given name (label) Raises: TypeError – invalid key type
close(self)
Closes this array, flushing all buffered data.
consolidate(self, Config config=None, key=None, timestamp=None)
Consolidates fragments of an array object for increased read performance.
Overview: https://docs.tiledb.com/main/concepts/internal-mechanics/consolidation
Parameters: - config (tiledb.Config) – The TileDB Config with consolidation parameters set
- key (str or bytes) – (default None) encryption key to decrypt an encrypted array
- timestamp (tuple (int, int)) – (default None) If not None, consolidate the array using the given tuple(int, int) timestamp range (inclusive), in milliseconds since the UNIX epoch
Rather than passing the timestamp into this function, it may be set with the config parameters “sm.vacuum.timestamp_start” and “sm.vacuum.timestamp_end”, which take a time in milliseconds since the UNIX epoch. If both are set, this function’s timestamp argument will be used.
coords_dtype
Returns the numpy record array dtype of the array coordinates.
Return type: numpy.dtype Returns: coord array record dtype
create(type cls, uri, ArraySchema schema, key=None, overwrite=False, Ctx ctx=None)
Creates a TileDB Array at the given URI.
Parameters: - uri (str) – URI at which to create the new empty array.
- schema (ArraySchema) – Schema for the array
- key (str) – (default None) Encryption key to use for array
- overwrite (bool) – (default False) Overwrite the array if it already exists
- ctx (Ctx) – (default None) Optional TileDB Ctx used when creating the array; by default uses the ArraySchema’s associated context (not necessarily tiledb.default_ctx).
df
Retrieve data cells as a Pandas dataframe, with multi-range, domain-inclusive indexing using multi_index.
Parameters: selection (list) – Per dimension, a scalar, slice, or list of scalars or slice objects. Scalars and slice components should match the type of the underlying Dimension.
Returns: pandas.DataFrame of the selected cells. Coords are included by default for Sparse arrays only (use Array.query(coords=<>) to select).
Raises: IndexError – invalid or unsupported index selection Raises: tiledb.TileDBError
df[] accepts, for each dimension, a scalar, slice, or list of scalars or slice objects. Each item is interpreted as a point (scalar) or range (slice) used to query the array on the corresponding dimension.
Example:
>>> import tiledb, tempfile, numpy as np, pandas as pd
>>>
>>> with tempfile.TemporaryDirectory() as tmp:
...     data = {'col1_f': np.arange(0.0, 1.0, step=0.1), 'col2_int': np.arange(10)}
...     df = pd.DataFrame.from_dict(data)
...     tiledb.from_pandas(tmp, df)
...     A = tiledb.open(tmp)
...     A.df[1]
...     A.df[1:5]
   col1_f  col2_int
1     0.1         1
   col1_f  col2_int
1     0.1         1
2     0.2         2
3     0.3         3
4     0.4         4
5     0.5         5
dim(self, dim_id)
Returns a Dim instance given a dimension index or name.
Parameters: dim_id (int or str) – dimension index (positional) or name Return type: Dim
Returns: The array dimension at index or with the given name Raises: TypeError – invalid key type
domain
The Domain of this array.
dtype
The NumPy dtype of the specified attribute.
dump(self)
isopen
True if this array is currently open.
iswritable
True if this array is currently opened as writable.
static load_typed(uri, mode='r', key=None, timestamp=None, attr=None, Ctx ctx=None)
Return a {Dense,Sparse}Array instance from a pre-opened Array (internal).
meta
Return array metadata instance.
Return type: tiledb.Metadata
mode
The mode this array was opened with.
multi_index
Retrieve data cells with multi-range, domain-inclusive indexing. Returns the cross-product of the ranges.
Parameters: selection (list) – Per dimension, a scalar, slice, or list of scalars or slice objects. Scalars and slice components should match the type of the underlying Dimension.
Returns: dict of {‘attribute’: result}. Coords are included by default for Sparse arrays only (use Array.query(coords=<>) to select).
Raises: IndexError – invalid or unsupported index selection Raises: tiledb.TileDBError
multi_index[] accepts, for each dimension, a scalar, slice, or list of scalars or slice objects. Each item is interpreted as a point (scalar) or range (slice) used to query the array on the corresponding dimension.
Unlike NumPy array indexing, multi_index respects TileDB’s range semantics: slice ranges are inclusive of the start- and end-point, and negative ranges do not wrap around (because a TileDB dimension may have a negative domain).
See also: https://docs.tiledb.com/main/api-usage/reading-arrays/multi-range-subarrays
Example:
>>> import tiledb, tempfile, numpy as np
>>>
>>> with tempfile.TemporaryDirectory() as tmp:
...     A = tiledb.DenseArray.from_numpy(tmp, np.eye(4) * [1,2,3,4])
...     A.multi_index[1]
...     A.multi_index[1,1]
...     # return rows 0 and 2
...     A.multi_index[[0,2]]
...     # return rows 0 and 2 intersecting column 2
...     A.multi_index[[0,2], 2]
...     # return rows 0 through 2 intersecting columns 0 through 2 (inclusive)
...     A.multi_index[slice(0,2), slice(0,2)]
OrderedDict([('', array([[0., 2., 0., 0.]]))])
OrderedDict([('', array([[2.]]))])
OrderedDict([('', array([[1., 0., 0., 0.],
       [0., 0., 3., 0.]]))])
OrderedDict([('', array([[0.],
       [3.]]))])
OrderedDict([('', array([[1., 0., 0.],
       [0., 2., 0.],
       [0., 0., 3.]]))])
-
nattr
¶ The number of attributes of this array.
-
ndim
¶ The number of dimensions of this array.
-
nonempty_domain
(self)¶ Return the minimum bounding domain which encompasses nonempty values.
Return type: tuple(tuple(numpy scalar, numpy scalar), ..) Returns: A tuple of (inclusive) domain extent tuples that contain all nonempty cells, or None if the array is empty
-
reopen
(self, timestamp=None)¶ Reopens this array.
This is useful when the array is updated after it was opened. To sync up with the updates, the user must either close the array and open it again, or call reopen() without closing it. reopen() will generally be faster than a close-then-open.
-
schema
¶ The ArraySchema for this array.
-
set_query
(self, serialized_query)¶
-
shape
¶ The shape of this array.
-
subarray
(self, selection, attrs=None, coords=False, order=None)¶
-
timestamp
¶ Deprecated in 0.9.2. Use timestamp_range instead.
Returns the timestamp the array is opened at
Return type: int Returns: tiledb timestamp at which point the array was opened
-
timestamp_range
¶ Returns the timestamp range the array is opened at
Return type: tuple Returns: tiledb timestamp range at which point the array was opened
-
uri
¶ Returns the URI of the array
-
view_attr
¶ The view attribute of this array.
-
tiledb.
consolidate
(uri, key=None, Config config=None, Ctx ctx=None, timestamp=None)¶ Consolidates TileDB array fragments for improved read performance
Parameters: - uri (str) – URI to the TileDB Array
- key (str) – (default None) Key to decrypt array if the array is encrypted
- config (tiledb.Config) – The TileDB Config with consolidation parameters set
- ctx (tiledb.Ctx) – (default None) The TileDB Context
- timestamp – (default None) If not None, consolidate the array using the given tuple(int, int) timestamp range (inclusive), in milliseconds since the UNIX epoch
Return type: str Returns: path (URI) to the consolidated TileDB Array
Raises: TypeError – cannot convert path to unicode string
Raises: tiledb.TileDBError
Rather than passing the timestamp into this function, it may be set with the config parameters “sm.consolidation.timestamp_start” and “sm.consolidation.timestamp_end”, which take a time in milliseconds since the UNIX epoch. If both the config parameters and the timestamp argument are set, the timestamp argument takes precedence.
-
tiledb.
vacuum
(uri, Config config=None, Ctx ctx=None, timestamp=None)¶ Vacuum underlying array fragments after consolidation.
Parameters: - uri (str) – URI of array to be vacuumed
- config – Override the context configuration for vacuuming. Defaults to None, inheriting the context parameters.
- ctx (tiledb.Ctx) – (optional) TileDB context. Defaults to tiledb.default_ctx().
- timestamp – (default None) If not None, vacuum the array using the given tuple(int, int) timestamp range (inclusive)
Raises: TypeError – cannot convert uri to unicode string
Raises: tiledb.TileDBError
The operation of this function is controlled by the “sm.vacuum.mode” parameter, which accepts the values fragments, fragment_meta, and array_meta. Rather than passing the timestamp into this function, it may be set with the config parameters “sm.vacuum.timestamp_start” and “sm.vacuum.timestamp_end”, which take a time in milliseconds since the UNIX epoch. If both the config parameters and the timestamp argument are set, the timestamp argument takes precedence.
Example:
>>> import tiledb, numpy as np
>>> import tempfile
>>> path = tempfile.mkdtemp()
>>> with tiledb.from_numpy(path, np.random.rand(4)) as A:
...     pass # make sure to close
>>> with tiledb.open(path, 'w') as A:
...     for i in range(4):
...         A[:] = np.ones(4, dtype=np.int64) * i
>>> paths = tiledb.VFS().ls(path)
>>> # should be 12 (2 base files + 2*5 fragment+ok files)
>>> (); len(paths); () # doctest:+ELLIPSIS
(...)
>>> () ; tiledb.consolidate(path) ; () # doctest:+ELLIPSIS
(...)
>>> tiledb.vacuum(path)
>>> paths = tiledb.VFS().ls(path)
>>> # should now be 4 (2 base files + 2 fragment+ok files)
>>> (); len(paths); () # doctest:+ELLIPSIS
(...)
Dense Array¶
-
class
tiledb.
DenseArray
¶ Class representing a dense TileDB array.
Inherits properties and methods of tiledb.Array and implements __setitem__ and __getitem__ for dense array indexing and assignment.
__getitem__
(selection)¶ Retrieve data cells for an item or region of the array.
Parameters: selection (tuple) – An int index, slice or tuple of integer/slice objects, specifying the selected subarray region for each dimension of the DenseArray.
Return type: numpy.ndarray or collections.OrderedDict
Returns: If the dense array has a single attribute then a Numpy array of corresponding shape/dtype is returned for that attribute. If the array has multiple attributes, a collections.OrderedDict is returned with dense Numpy subarrays for each attribute.
Raises: IndexError – invalid or unsupported index selection
Raises: tiledb.TileDBError
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     # Creates array 'array' on disk.
...     A = tiledb.DenseArray.from_numpy(tmp + "/array", np.ones((100, 100)))
...     # Many aspects of Numpy's fancy indexing are supported:
...     A[1:10, ...].shape
...     A[1:10, 20:99].shape
...     A[1, 2].shape
(9, 100)
(9, 79)
()
>>> # Subselect on attributes when reading:
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     schema = tiledb.ArraySchema(domain=dom,
...         attrs=(tiledb.Attr(name="a1", dtype=np.int64),
...                tiledb.Attr(name="a2", dtype=np.int64)))
...     tiledb.DenseArray.create(tmp + "/array", schema)
...     with tiledb.DenseArray(tmp + "/array", mode='w') as A:
...         A[0:10] = {"a1": np.zeros((10)), "a2": np.ones((10))}
...     with tiledb.DenseArray(tmp + "/array", mode='r') as A:
...         # Access specific attributes individually.
...         A[0:5]["a1"]
...         A[0:5]["a2"]
array([0, 0, 0, 0, 0])
array([1, 1, 1, 1, 1])
-
__setitem__
(selection, value)¶ Set / update dense data cells
Parameters: - selection (tuple) – An int index, slice or tuple of integer/slice objects, specifying the selected subarray region for each dimension of the DenseArray.
- value (dict or numpy.ndarray) – a dictionary of array attribute values; values must be convertible to n-d numpy arrays. If the array has a single attribute, an n-d numpy array is also accepted.
Raises: - IndexError – invalid or unsupported index selection
- ValueError – value / coordinate length mismatch
Raises: tiledb.TileDBError
Example:
>>> import tiledb, numpy as np, tempfile
>>> # Write to single-attribute 2D array
>>> with tempfile.TemporaryDirectory() as tmp:
...     # Create an array initially with all zero values
...     with tiledb.DenseArray.from_numpy(tmp + "/array", np.zeros((2, 2))) as A:
...         pass
...     with tiledb.DenseArray(tmp + "/array", mode='w') as A:
...         # Write to the single (anonymous) attribute
...         A[:] = np.array(([1,2], [3,4]))
>>>
>>> # Write to multi-attribute 2D array
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(
...         tiledb.Dim(domain=(0, 1), tile=2, dtype=np.uint64),
...         tiledb.Dim(domain=(0, 1), tile=2, dtype=np.uint64))
...     schema = tiledb.ArraySchema(domain=dom,
...         attrs=(tiledb.Attr(name="a1", dtype=np.int64),
...                tiledb.Attr(name="a2", dtype=np.int64)))
...     tiledb.DenseArray.create(tmp + "/array", schema)
...     with tiledb.DenseArray(tmp + "/array", mode='w') as A:
...         # Write to each attribute
...         A[0:2, 0:2] = {"a1": np.array(([-3, -4], [-5, -6])),
...                        "a2": np.array(([1, 2], [3, 4]))}
-
query
(self, attrs=None, attr_cond=None, dims=None, coords=False, order='C', use_arrow=None, return_arrow=False, return_incomplete=False)¶ Construct a proxy Query object for easy subarray queries of cells for an item or region of the array across one or more attributes.
Optionally subselect over attributes, return dense result coordinate values, and specify a result layout / cell order.
Parameters: - attrs – the DenseArray attributes to subselect over. If attrs is None (default) all array attributes will be returned. Array attributes can be defined by name or by positional index.
- attr_cond – the QueryCondition to filter attributes on.
- dims – the DenseArray dimensions to subselect over. If dims is None (default) then no dimensions are returned, unless coords=True.
- coords – if True, return array of coordinate values (default False).
- order – ‘C’, ‘F’, ‘U’, or ‘G’ (row-major, col-major, unordered, TileDB global order)
- use_arrow – if True, return dataframes via PyArrow if applicable.
- return_arrow – if True, return results as a PyArrow Table if applicable.
- return_incomplete – if True, initialize and return an iterable Query object over the indexed range. Consuming this iterable returns a result set for each TileDB incomplete query. See usage example in ‘examples/incomplete_iteration.py’. To retrieve the estimated result sizes for the query ranges, use: A.query(…, return_incomplete=True)[…].est_result_size(). If False (default False), queries will be internally run to completion by resizing buffers and resubmitting until the query is complete.
Returns: A proxy Query object that can be used for indexing into the DenseArray over the defined attributes, in the given result layout (order).
Raises: ValueError – array is not opened for reads (mode = ‘r’)
Raises: tiledb.TileDBError
Example:
>>> import tiledb, numpy as np, tempfile
>>> # Subselect on attributes when reading:
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(tiledb.Dim(domain=(0, 9), tile=2, dtype=np.uint64))
...     schema = tiledb.ArraySchema(domain=dom,
...         attrs=(tiledb.Attr(name="a1", dtype=np.int64),
...                tiledb.Attr(name="a2", dtype=np.int64)))
...     tiledb.DenseArray.create(tmp + "/array", schema)
...     with tiledb.DenseArray(tmp + "/array", mode='w') as A:
...         A[0:10] = {"a1": np.zeros((10)), "a2": np.ones((10))}
...     with tiledb.DenseArray(tmp + "/array", mode='r') as A:
...         # Access specific attributes individually.
...         A.query(attrs=("a1",))[0:5]
OrderedDict([('a1', array([0, 0, 0, 0, 0]))])
-
Sparse Array¶
-
class
tiledb.
SparseArray
¶ Class representing a sparse TileDB array.
Inherits properties and methods of tiledb.Array and implements __setitem__ and __getitem__ for sparse array indexing and assignment.
__getitem__
(selection)¶ Retrieve nonempty cell data for an item or region of the array
Parameters: selection (tuple) – An int index, slice or tuple of integer/slice objects, specifying the selected subarray region for each dimension of the SparseArray. Return type: collections.OrderedDict
Returns: An OrderedDict is returned with dimension and attribute names as keys. Nonempty attribute values are returned as Numpy 1-d arrays. Raises: IndexError – invalid or unsupported index selection Raises: tiledb.TileDBError
Example:
>>> import tiledb, numpy as np, tempfile
>>> # Write to multi-attribute 2D array
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(
...         tiledb.Dim(name="y", domain=(0, 9), tile=2, dtype=np.uint64),
...         tiledb.Dim(name="x", domain=(0, 9), tile=2, dtype=np.uint64))
...     schema = tiledb.ArraySchema(domain=dom, sparse=True,
...         attrs=(tiledb.Attr(name="a1", dtype=np.int64),
...                tiledb.Attr(name="a2", dtype=np.int64)))
...     tiledb.SparseArray.create(tmp + "/array", schema)
...     with tiledb.SparseArray(tmp + "/array", mode='w') as A:
...         # Write in the two cells (0,0) and (2,3) only.
...         I, J = [0, 2], [0, 3]
...         # Write to each attribute
...         A[I, J] = {"a1": np.array([1, 2]),
...                    "a2": np.array([3, 4])}
...     with tiledb.SparseArray(tmp + "/array", mode='r') as A:
...         # Return an OrderedDict with values and coordinates
...         A[0:3, 0:10]
...         # Return just the "x" coordinate values
...         A[0:3, 0:10]["x"]
OrderedDict([('a1', array([1, 2])), ('a2', array([3, 4])), ('y', array([0, 2], dtype=uint64)), ('x', array([0, 3], dtype=uint64))])
array([0, 3], dtype=uint64)
With a floating-point array domain, index bounds are inclusive, e.g.:
>>> # Return nonempty cells within a floating point array domain (fp index bounds are inclusive):
>>> # A[5.0:579.9]
-
__setitem__
(selection, value)¶ Set / update sparse data cells
Parameters: - selection (tuple) – N coordinate value arrays (dim0, dim1, …) where N is the ndim of the SparseArray. The format follows numpy sparse (point) indexing semantics.
- value (dict or numpy.ndarray) – a dictionary of nonempty array attribute values; values must be convertible to 1-d numpy arrays. If the array has a single attribute, a 1-d numpy array is also accepted.
Raises: - IndexError – invalid or unsupported index selection
- ValueError – value / coordinate length mismatch
Raises: tiledb.TileDBError
Example:
>>> import tiledb, numpy as np, tempfile
>>> # Write to multi-attribute 2D array
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(
...         tiledb.Dim(domain=(0, 1), tile=2, dtype=np.uint64),
...         tiledb.Dim(domain=(0, 1), tile=2, dtype=np.uint64))
...     schema = tiledb.ArraySchema(domain=dom, sparse=True,
...         attrs=(tiledb.Attr(name="a1", dtype=np.int64),
...                tiledb.Attr(name="a2", dtype=np.int64)))
...     tiledb.SparseArray.create(tmp + "/array", schema)
...     with tiledb.SparseArray(tmp + "/array", mode='w') as A:
...         # Write in the corner cells (0,0) and (1,1) only.
...         I, J = [0, 1], [0, 1]
...         # Write to each attribute
...         A[I, J] = {"a1": np.array([1, 2]),
...                    "a2": np.array([3, 4])}
-
query
(self, attrs=None, attr_cond=None, dims=None, index_col=True, coords=None, order='U', use_arrow=None, return_arrow=None, return_incomplete=False)¶ Construct a proxy Query object for easy subarray queries of cells for an item or region of the array across one or more attributes.
Optionally subselect over attributes, return dense result coordinate values, and specify a result layout / cell order.
Parameters: - attrs – the SparseArray attributes to subselect over. If attrs is None (default) all array attributes will be returned. Array attributes can be defined by name or by positional index.
- attr_cond – the QueryCondition to filter attributes on.
- dims – the SparseArray dimensions to subselect over. If dims is None (default) then all dimensions are returned, unless coords=False.
- index_col – For dataframe queries, override the saved index information, and only set specified index(es) in the final dataframe, or None.
- coords – (deprecated) if True, return array of coordinate value (default False).
- order – ‘C’, ‘F’, ‘U’, or ‘G’ (row-major, col-major, unordered, TileDB global order)
- use_arrow – if True, return dataframes via PyArrow if applicable.
- return_arrow – if True, return results as a PyArrow Table if applicable.
Returns: A proxy Query object that can be used for indexing into the SparseArray over the defined attributes, in the given result layout (order).
Example:
>>> import tiledb, numpy as np, tempfile
>>> # Write to multi-attribute 2D array
>>> with tempfile.TemporaryDirectory() as tmp:
...     dom = tiledb.Domain(
...         tiledb.Dim(name="y", domain=(0, 9), tile=2, dtype=np.uint64),
...         tiledb.Dim(name="x", domain=(0, 9), tile=2, dtype=np.uint64))
...     schema = tiledb.ArraySchema(domain=dom, sparse=True,
...         attrs=(tiledb.Attr(name="a1", dtype=np.int64),
...                tiledb.Attr(name="a2", dtype=np.int64)))
...     tiledb.SparseArray.create(tmp + "/array", schema)
...     with tiledb.SparseArray(tmp + "/array", mode='w') as A:
...         # Write in the two cells (0,0) and (2,3) only.
...         I, J = [0, 2], [0, 3]
...         # Write to each attribute
...         A[I, J] = {"a1": np.array([1, 2]),
...                    "a2": np.array([3, 4])}
...     with tiledb.SparseArray(tmp + "/array", mode='r') as A:
...         A.query(attrs=("a1",), coords=False, order='G')[0:3, 0:10]
OrderedDict([('a1', array([1, 2]))])
-
Query Condition¶
-
class
tiledb.
QueryCondition
(expression: str, ctx: tiledb.libtiledb.Ctx = <factory>)¶ Class representing a TileDB query condition object for attribute filtering pushdown. Set the query condition with a string representing an expression as defined by the grammar below. A more straightforward usage example is given after the grammar.
BNF:
A query condition is made up of one or more Boolean expressions. Multiple Boolean expressions are chained together with Boolean operators. The or_op Boolean operators are given lower precedence than and_op.
query_cond ::= bool_term | query_cond or_op bool_term
bool_term ::= bool_expr | bool_term and_op bool_expr
The logical “and” and bitwise “&” Boolean operators are given equal precedence.
and_op ::= and | &
Likewise, “or” and “|” are given equal precedence.
or_op ::= or | |
We intend to support “not” in future releases.
A Boolean expression contains a comparison operator. The operator works on a TileDB attribute name and value.
bool_expr ::= attr compare_op val | val compare_op attr | val compare_op attr compare_op val
All comparison operators are supported.
compare_op ::= < | > | <= | >= | == | !=
TileDB attribute names are valid Python identifiers or strings cast with attr().
attr ::= <variable> | attr(<str>)
Values are any Python-valid number or string. They may also be cast with val().
val ::= <num> | <str> | val(val)
Example:
>>> with tiledb.open(uri, mode="r") as A:
...     # Select cells where the attribute values for `foo` are greater than 5,
...     # or where `b a r` equals the string "asdf" and `baz` is at most 1.0.
...     # Note precedence is equivalent to:
...     # (foo > 5 or ('asdf' == attr('b a r') and baz <= val(1.0)))
...     qc = tiledb.QueryCondition("foo > 5 or 'asdf' == attr('b a r') and baz <= val(1.0)")
...     # Index the returned Query proxy to run the query.
...     A.query(attr_cond=qc)[:]
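The following pure-NumPy sketch makes the precedence rule concrete. It is not how TileDB evaluates conditions internally. It computes which cells the example expression would select for some made-up attribute values, spelling out the grouping in which “and” binds tighter than “or” and contrasting it with the other grouping.

```python
import numpy as np

# Made-up values of three attributes for five cells.
foo = np.array([7, 1, 2, 9, 3])
bar = np.array(["x", "asdf", "asdf", "asdf", "y"])  # stands in for attr('b a r')
baz = np.array([5.0, 0.5, 2.0, 0.1, 0.0])

# "foo > 5 or 'asdf' == attr('b a r') and baz <= val(1.0)"
# groups as: (foo > 5) or (('asdf' == bar) and (baz <= 1.0))
selected = (foo > 5) | ((bar == "asdf") & (baz <= 1.0))

# The other grouping, ((foo > 5) or ('asdf' == bar)) and (baz <= 1.0), differs.
other = ((foo > 5) | (bar == "asdf")) & (baz <= 1.0)

print(np.flatnonzero(selected))  # [0 1 3]
print(np.flatnonzero(other))     # [1 3]
```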
Group¶
-
class
tiledb.
Group
(uri: str, mode: str = 'r', ctx: Ctx = None)¶ Support for organizing multiple arrays in arbitrary directory hierarchies.
Group members may be any number of nested groups and arrays. Members are stored as tiledb.Objects which indicate the member’s URI and type.
Groups may contain associated metadata similar to array metadata where keys are strings. Singleton values may be of type int, float, str, or bytes. Multiple values of the same type may be placed in containers of type list, tuple, or 1-D np.ndarray. The values within containers are limited to type int or float.
See more at: https://docs.tiledb.com/main/background/key-concepts-and-data-format#arrays-and-groups
Parameters: - uri (str) – The URI to the Group
- mode (str) – Read mode (‘r’) or write mode (‘w’)
- ctx (tiledb.Ctx) – A TileDB context
Example:
>>> # Create a group
>>> grp_path = "root_group"
>>> tiledb.Group.create(grp_path)
>>> grp = tiledb.Group(grp_path, "w")
>>>
>>> # Create an array and add as a member to the group
>>> array_path = "array.tdb"
>>> domain = tiledb.Domain(tiledb.Dim(domain=(1, 8), tile=2))
>>> a1 = tiledb.Attr("val", dtype="f8")
>>> schema = tiledb.ArraySchema(domain=domain, attrs=(a1,))
>>> tiledb.Array.create(array_path, schema)
>>> grp.add(array_path)
>>>
>>> # Create a group and add as a subgroup
>>> subgrp_path = "sub_group"
>>> tiledb.Group.create(subgrp_path)
>>> grp.add(subgrp_path)
>>>
>>> # Add metadata to the group
>>> grp.meta["ints"] = [1, 2, 3]
>>> grp.meta["str"] = "string_metadata"
>>> grp.close()
>>>
>>> grp.open("r")
>>> # Dump all the members in string format
>>> mbrs_repr = repr(grp)
>>> # Or create a list of Objects in the Group
>>> mbrs_iter = list(grp)
>>> # Get the first member's uri and type
>>> member_uri, member_type = grp[0].uri, grp[0].type
>>> grp.close()
>>>
>>> # Remove the subgroup
>>> grp.open("w")
>>> grp.remove(subgrp_path)
>>> grp.close()
-
__getitem__
(member)¶ Retrieve a member from the Group as an Object.
Parameters: member (Union[int, str]) – The index or name of the member Returns: The member as an Object Return type: Object
-
__contains__
(member)¶ Returns: Whether the Group contains a member with the given name Return type: bool
-
class
GroupMetadata
(group: tiledb.group.Group)¶ Holds metadata for the associated Group in a dictionary-like structure.
-
clear
() → None. Remove all items from D.¶
-
pop
(k[, d]) → v, remove specified key and return the corresponding value.¶ If key is not found, d is returned if given, otherwise KeyError is raised.
-
popitem
() → (k, v), remove and return some (key, value) pair¶ as a 2-tuple; but raise KeyError if D is empty.
-
setdefault
(k[, d]) → D.get(k,d), also set D[k]=d if k not in D¶
-
-
add
(uri: str, name: str = None, relative: bool = False)¶ Adds a member to the Group.
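Parameters: - uri (str) – The URI of the member to add
- name (str) – (default None) An optional name for the member
- relative (bool) – (default False) Whether uri is relative to the Group’s URI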
-
close
()¶ Close a Group.
-
static
create
(uri: str, ctx: Ctx = None)¶ Create a new Group.
Parameters: - uri (str) – The URI of the Group to be created
- ctx (tiledb.Ctx) – A TileDB context
-
meta
¶ Returns: The Group’s metadata as a key-value structure Return type: GroupMetadata
-
open
(mode: str = 'r')¶ Open a Group in read mode (“r”) or write mode (“w”).
Parameters: mode (str) – Read mode (‘r’) or write mode (‘w’)
-
class
tiledb.Group.
GroupMetadata
(group: tiledb.group.Group) Holds metadata for the associated Group in a dictionary-like structure.
-
__setitem__
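(key, value)¶ Parameters: - key (str) – Key of the Group metadata entry
- value (Union[int, float, str, bytes, list, tuple, np.ndarray]) – Value of the Group metadata entry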
-
__getitem__
(key)¶ Parameters: key (str) – Key of the Group metadata entry Return type: Union[int, float, str, bytes, np.ndarray] Returns: The value associated with the key
-
__delitem__
(key)¶ Removes the entry from the Group metadata.
Parameters: key (str) – Key of the Group metadata entry
-
__contains__
(key)¶ Parameters: key (str) – Key of the Group metadata entry Return type: bool Returns: True if the key is in the Group metadata, otherwise False
-
clear
() → None. Remove all items from D.
-
pop
(k[, d]) → v, remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised.
-
popitem
() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
-
setdefault
(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
-
Object¶
Object Management¶
-
tiledb.
array_exists
(uri, isdense=False, issparse=False)¶ Check if an array exists and can be opened at the given URI.
Optionally restrict to isdense or issparse array types.
-
tiledb.
group_create
(uri: str, ctx: Ctx = None)¶ Create a new Group.
Parameters: - uri (str) – The URI of the Group to be created
- ctx (tiledb.Ctx) – A TileDB context
-
tiledb.
object_type
(uri, Ctx ctx=None)¶ Returns the TileDB object type at the specified path (URI)
Parameters: - uri (str) – URI of the TileDB resource
- ctx (tiledb.Ctx) – The TileDB Context
Return type: str or None
Returns: object type string (“array” or “group”), or None if the URI is not a TileDB object
Raises: TypeError – cannot convert path to unicode string
-
tiledb.
remove
(uri, Ctx ctx=None)¶ Removes (deletes) the TileDB object at the specified path (URI)
Parameters: - uri (str) – URI of the TileDB resource
- ctx (tiledb.Ctx) – The TileDB Context
Raises: TypeError – uri cannot be converted to a unicode string
Raises: tiledb.TileDBError
-
tiledb.
move
(old_uri, new_uri, Ctx ctx=None)¶ Moves a TileDB resource (group, array, key-value).
Parameters: - ctx (tiledb.Ctx) – The TileDB Context
- old_uri (str) – path (URI) of the TileDB resource to move
- new_uri (str) – path (URI) of the destination
Raises: TypeError – uri cannot be converted to a unicode string
Raises: tiledb.TileDBError
-
tiledb.
ls
(path, func, Ctx ctx=None)¶ Lists the TileDB resources that have a prefix of path (one level deep) and applies a callback to each of them.
Parameters: - path (str) – URI of TileDB group object
- func (function) – callback to execute on every listed TileDB resource, URI resource path and object type label are passed as arguments to the callback
- ctx (tiledb.Ctx) – TileDB context
Raises: TypeError – cannot convert path to unicode string
Raises: tiledb.TileDBError
-
tiledb.
walk
(path, func, order='preorder', Ctx ctx=None)¶ Recursively visits the TileDB resources that have a prefix of path and applies a callback to each of them.
Parameters: - path (str) – URI of TileDB group object
- func (function) – callback to execute on every listed TileDB resource, URI resource path and object type label are passed as arguments to the callback
- ctx (tiledb.Ctx) – The TileDB context
- order (str) – ‘preorder’ (default) or ‘postorder’ tree traversal
Raises: - TypeError – cannot convert path to unicode string
- ValueError – unknown order
Raises: tiledb.TileDBError
Fragment Info¶
-
class
tiledb.
FragmentInfoList
(array_uri, include_mbrs=False, ctx=None)¶ Class representing an ordered list of FragmentInfo objects.
Parameters: - array_uri (str) – URI for the TileDB array (any supported TileDB URI)
- include_mbrs (bool) – (default False) include minimum bounding rectangles in FragmentInfo result
- ctx (tiledb.Ctx) – A TileDB context
Variables: - uri – URIs of fragments
- version – Fragment version of each fragment
- nonempty_domain – Non-empty domain of each fragment
- cell_num – Number of cells in each fragment
- timestamp_range – Timestamp range of when each fragment was written
- sparse – For each fragment, True if fragment is sparse, else False
- has_consolidated_metadata – For each fragment, True if fragment has consolidated fragment metadata, else False
- unconsolidated_metadata_num – Number of unconsolidated metadata fragments in each fragment
- to_vacuum – URIs of already consolidated fragments to vacuum
- mbrs – (TileDB Embedded 2.5.0+ only) The minimum bounding rectangle of each fragment; only present when include_mbrs=True
- array_schema_name – (TileDB Embedded 2.5.0+ only) The array schema’s name
Example:
>>> import tiledb, numpy as np, tempfile
>>> with tempfile.TemporaryDirectory() as tmp:
...     # The array will be 4x4 with dimensions "rows" and "cols", with domain [1,4] and space tiles 2x2
...     dom = tiledb.Domain(
...         tiledb.Dim(name="rows", domain=(1, 4), tile=2, dtype=np.int32),
...         tiledb.Dim(name="cols", domain=(1, 4), tile=2, dtype=np.int32),
...     )
...     # The array will be dense with a single attribute "a" so each (i,j) cell can store an integer.
...     schema = tiledb.ArraySchema(
...         domain=dom, sparse=False, attrs=[tiledb.Attr(name="a", dtype=np.int32)]
...     )
...     # Set URI of the array
...     uri = tmp + "/array"
...     # Create the (empty) array on disk.
...     tiledb.Array.create(uri, schema)
...
...     # Write three fragments to the array
...     with tiledb.DenseArray(uri, mode="w") as A:
...         A[1:3, 1:5] = np.array(([1, 2, 3, 4, 5, 6, 7, 8]))
...     with tiledb.DenseArray(uri, mode="w") as A:
...         A[2:4, 2:4] = np.array(([101, 102, 103, 104]))
...     with tiledb.DenseArray(uri, mode="w") as A:
...         A[3:4, 4:5] = np.array(([202]))
...
...     # tiledb.array_fragments() requires TileDB-Py version > 0.8.5
...     fragments_info = tiledb.array_fragments(uri)
...
...     "====== FRAGMENTS INFO ======"
...     f"number of fragments: {len(fragments_info)}"
...     f"nonempty domains: {fragments_info.nonempty_domain}"
...     f"sparse fragments: {fragments_info.sparse}"
...
...     for fragment in fragments_info:
...         f"===== FRAGMENT NUMBER {fragment.num} ====="
...         f"is sparse: {fragment.sparse}"
...         f"cell num: {fragment.cell_num}"
...         f"has consolidated metadata: {fragment.has_consolidated_metadata}"
...         f"nonempty domain: {fragment.nonempty_domain}"
'====== FRAGMENTS INFO ======'
'number of fragments: 3'
'nonempty domains: (((1, 2), (1, 4)), ((2, 3), (2, 3)), ((3, 3), (4, 4)))'
'sparse fragments: (False, False, False)'
'===== FRAGMENT NUMBER 0 ====='
'is sparse: False'
'cell num: 8'
'has consolidated metadata: False'
'nonempty domain: ((1, 2), (1, 4))'
'===== FRAGMENT NUMBER 1 ====='
'is sparse: False'
'cell num: 16'
'has consolidated metadata: False'
'nonempty domain: ((2, 3), (2, 3))'
'===== FRAGMENT NUMBER 2 ====='
'is sparse: False'
'cell num: 4'
'has consolidated metadata: False'
'nonempty domain: ((3, 3), (4, 4))'
-
class
tiledb.
FragmentInfo
(fragments: tiledb.fragment.FragmentInfoList, num)¶ Class representing the metadata for a single fragment. See tiledb.FragmentInfoList for an example of usage.
Variables: - uri – URI of the fragment
- version – Fragment version
- nonempty_domain – Non-empty domain of the fragment
- cell_num – Number of cells in the fragment
- timestamp_range – Timestamp range of when the fragment was written
- sparse – True if the fragment is sparse, else False
- has_consolidated_metadata – True if the fragment has consolidated fragment metadata, else False
- unconsolidated_metadata_num – Number of unconsolidated metadata fragments
- to_vacuum – URIs of already consolidated fragments to vacuum
- mbrs – (TileDB Embedded 2.5.0+ only) The minimum bounding rectangle of the fragment; only present when include_mbrs=True
- array_schema_name – (TileDB Embedded 2.5.0+ only) The array schema’s name
VFS¶
-
class
tiledb.
VFS
(config: Union[Config, dict] = None, ctx: Ctx = None)¶ TileDB VFS class
Encapsulates the TileDB VFS module instance with a specific configuration (config).
Parameters: - ctx (tiledb.Ctx) – The TileDB Context
- config (tiledb.Config or dict) – Override ctx VFS configurations with updated values in config.
-
close
(file: tiledb.cc.FileHandle)¶ Closes a VFS FileHandle object.
Parameters: file (FileIO) – An opened VFS FileIO Return type: FileIO Returns: closed VFS FileHandle Raises: tiledb.TileDBError
-
config
() → Config¶ Return type: tiledb.Config Returns: config associated with the VFS object
-
copy_dir
(old_uri: str, new_uri: str)¶ Copies a TileDB directory from an old URI to a new URI.
Parameters: - old_uri (str) – URI of the source directory
- new_uri (str) – URI of the destination directory
-
copy_file
(old_uri: str, new_uri: str)¶ Copies a TileDB file from an old URI to a new URI.
Parameters: - old_uri (str) – URI of the source file
- new_uri (str) – URI of the destination file
-
create_bucket
(uri: str)¶ Creates an object store bucket with the input URI.
Parameters: uri (str) – Input URI of the bucket
-
create_dir
(uri: str)¶ Creates a directory with the input URI.
Parameters: uri (str) – Input URI of the directory
-
ctx
() → Ctx¶ Return type: tiledb.Ctx Returns: context associated with the VFS object
-
dir_size
(uri: str) → int¶ Parameters: uri (str) – Input URI of the directory Return type: int Returns: The size of a directory with the input URI
-
empty_bucket
(uri: str)¶ Empty an object store bucket.
Parameters: uri (str) – Input URI of the bucket
-
file_size
(uri: str) → int¶ Parameters: uri (str) – Input URI of the file Return type: int Returns: The size of a file with the input URI
-
is_bucket
(uri: str) → bool¶ Parameters: uri (str) – Input URI of the bucket Return type: bool Returns: True if an object store bucket with the input URI exists, False otherwise
-
is_dir
(uri: str) → bool¶ Parameters: uri (str) – Input URI of the directory Return type: bool Returns: True if a directory with the input URI exists, False otherwise
-
is_empty_bucket
(uri: str) → bool¶ Parameters: uri (str) – Input URI of the bucket Return type: bool Returns: True if an object store bucket is empty, False otherwise
-
is_file
(uri: str) → bool¶ Parameters: uri (str) – Input URI of the file Return type: bool Returns: True if a file with the input URI exists, False otherwise
-
ls
(uri: str) → List[str]¶ Retrieves the children in directory uri. This function is non-recursive, i.e., it lists only the entries one level below uri.
Parameters: uri (str) – Input URI of the directory Return type: List[str] Returns: The children in directory uri
-
move_dir
(old_uri: str, new_uri: str)¶ Renames a TileDB directory from an old URI to a new URI.
Parameters: - old_uri (str) – URI of the directory to rename
- new_uri (str) – New URI of the directory
-
move_file
(old_uri: str, new_uri: str)¶ Renames a TileDB file from an old URI to a new URI.
Parameters: - old_uri (str) – URI of the file to rename
- new_uri (str) – New URI of the file
-
open
(uri: str, mode: str = 'rb')¶ Opens a VFS file resource at URI for reading, writing, or appending.
If the file did not exist upon opening, a new file is created.
Parameters: - uri (str) – URI of VFS file resource
- mode (str) – ‘rb’ for opening the file to read, ‘wb’ to write, ‘ab’ to append
Return type: FileIO
Returns: TileDB FileIO
Raises: - TypeError – cannot convert uri to unicode string
- ValueError – invalid mode
Raises: tiledb.TileDBError
-
read
(file: tiledb.cc.FileHandle, offset: int, nbytes: int) → bytes¶ Read nbytes from an opened VFS FileHandle at a given offset.
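Parameters: - file (FileHandle) – An opened VFS FileHandle
- offset (int) – offset position in bytes to read from
- nbytes (int) – number of bytes to read
Return type: bytes
Returns: read bytes
Raises: tiledb.TileDBError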
-
remove_bucket
(uri: str)¶ Deletes an object store bucket with the input URI.
Parameters: uri (str) – Input URI of the bucket
-
remove_dir
(uri: str)¶ Removes a directory (recursively) with the input URI.
Parameters: uri (str) – Input URI of the directory
-
remove_file
(uri: str)¶ Removes a file with the input URI.
Parameters: uri (str) – Input URI of the file
-
supports
(scheme: str) → bool¶ Returns true if the given URI scheme (storage backend) is supported.
Parameters: scheme (str) – scheme component of a VFS resource URI (ex. ‘file’ / ‘hdfs’ / ‘s3’) Return type: bool Returns: True if the linked libtiledb version supports the storage backend, False otherwise Raises: ValueError – VFS storage backend is not supported
-
touch
(uri: str)¶ Touches a file with the input URI, i.e., creates a new empty file.
Parameters: uri (str) – Input URI of the file
-
class
tiledb.
FileIO
(vfs: tiledb.vfs.VFS, uri: str, mode: str = 'rb')¶ TileDB FileIO class that encapsulates files opened by tiledb.VFS. The file operations are meant to mimic Python’s built-in file I/O methods.
-
flush
()¶ Force the data to be written to the file.
-
mode
¶ Return type: str Returns: Whether the file is in read mode (“rb”), write mode (“wb”), or append mode (“ab”)
-
read
(size: int = -1) → bytes¶ Read the file from the current pointer position.
Parameters: size (int) – Number of bytes to read. The default of -1 reads until the end of the file. Return type: bytes Returns: The bytes read from the file
-
readable
() → bool¶ Return type: bool Returns: True if the file is readable (i.e. “rb” mode), otherwise False
-
seek
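(offset: int, whence: int = 0)¶ Parameters: - offset (int) – Byte offset to move the file pointer to, relative to whence
- whence (int) – Reference point. A whence value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. whence can be omitted and defaults to 0.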
-
tell
() → int¶ Return type: int Returns: The current position in the file, in bytes
-
Filestore¶
-
class
tiledb.
Filestore
(uri: str, ctx: Ctx = None)¶ Functions to set and get data to and from a TileDB Filestore Array.
Parameters: - uri (str) – The URI to the TileDB Filestore Array
- ctx (tiledb.Ctx) – A TileDB context
-
static
copy_from
(filestore_array_uri: str, file_uri: str, mime_type: str = 'AUTODETECT', ctx: Ctx = None) → None¶ Copy data from a file to a Filestore Array.
Parameters: - filestore_array_uri (str) – The URI to the TileDB Filestore Array
- file_uri (str) – URI of the file to import
- mime_type (str) – MIME types are “AUTODETECT” (default), “image/tiff”, “application/pdf”
- ctx (tiledb.Ctx) – A TileDB context
-
static
copy_to
(filestore_array_uri: str, file_uri: str, ctx: Ctx = None) → None¶ Copy data from a Filestore Array to a file.
Parameters: - filestore_array_uri (str) – The URI to the TileDB Filestore Array
- file_uri (str) – URI of the destination file
- ctx (tiledb.Ctx) – A TileDB context
-
read
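(offset: int = 0, size: int = -1) → bytes¶ Parameters: - offset (int) – Byte position at which to begin reading (default: 0, the beginning of the Filestore)
- size (int) – Number of bytes to read; the default of -1 reads to the end of the Filestore
Return type: bytes
Returns: Data from the Filestore Array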
-
uri_import
(uri: str, mime_type: str = 'AUTODETECT') → None¶ Import data from the file at uri into a Filestore Array.
Parameters: - uri (str) – URI of the file to import
- mime_type (str) – MIME types are “AUTODETECT” (default), “image/tiff”, “application/pdf”
-
write
(buffer: ByteString, mime_type: str = 'AUTODETECT') → None¶ Import data from an object that supports the buffer protocol to a Filestore Array.
Parameters: - buffer (ByteString) – Data of type bytes, bytearray, memoryview, etc.
- mime_type (str) – MIME types are “AUTODETECT” (default), “image/tiff”, “application/pdf”
Version¶
Statistics¶
-
tiledb.
stats_enable
()¶ Enable TileDB internal statistics.
-
tiledb.
stats_disable
()¶ Disable TileDB internal statistics.
-
tiledb.
stats_reset
()¶ Reset all TileDB internal statistics to 0.
-
tiledb.
stats_dump
(version=True, print_out=True, include_python=True, json=False, verbose=True)¶ Dump TileDB internal statistics, either printing them to the console or returning them as a string.
Parameters: - version – Include TileDB Embedded and TileDB-Py versions (default: True)
- print_out – Print the statistics to the console (default: True); if False, return them as a string
- include_python – Include TileDB-Py statistics (default: True)
- json – Return the statistics as a JSON object (default: False)
- verbose – Print extended internal statistics (default: True)