Configuration Parameters
This tutorial shows how to set and get TileDB config parameters, and summarizes all current config parameters along with what each one does.
You can create a config object and pass it to either a TileDB context or VFS object as follows:
C++
// Create config object
Config config;
// Set/Get config to/from ctx
Context ctx(config);
Config config_ctx = ctx.config();
// Set/Get config to/from VFS
VFS vfs(ctx, config);
Config config_vfs = vfs.config();
Python
# Create config object
config = tiledb.Config()
# Set/get config to/from ctx
ctx = tiledb.Ctx(config)
config_ctx = ctx.config()
# Set/get config to/from VFS
vfs = tiledb.VFS(config, ctx=ctx)
config_vfs = vfs.config()
Running the config code example (config.cc for C++, config.py for Python) produces the output shown below. The rest of the tutorial discusses the various ways this program uses config objects and explains the output.
C++
$ g++ -std=c++11 config.cc -o config_cpp -ltiledb
$ ./config_cpp
Tile cache size: 10000000
Default settings:
"sm.check_coord_dups" : "true"
"sm.check_coord_oob" : "true"
"sm.check_global_order" : "true"
"sm.consolidation.amplification" : "1"
"sm.consolidation.buffer_size" : "50000000"
"sm.consolidation.step_max_frags" : "4294967295"
"sm.consolidation.step_min_frags" : "4294967295"
"sm.consolidation.step_size_ratio" : "0"
"sm.consolidation.steps" : "4294967295"
"sm.dedup_coords" : "false"
"sm.enable_signal_handlers" : "true"
"sm.memory_budget" : "5368709120"
"sm.memory_budget_var" : "10737418240"
"sm.num_async_threads" : "1"
"sm.num_reader_threads" : "1"
"sm.num_tbb_threads" : "-1"
"sm.num_writer_threads" : "1"
"sm.tile_cache_size" : "10000000"
"vfs.file.max_parallel_ops" : "8"
"vfs.hdfs.kerb_ticket_cache_path" : ""
"vfs.hdfs.name_node_uri" : ""
"vfs.hdfs.username" : ""
"vfs.min_batch_gap" : "512000"
"vfs.min_batch_size" : "20971520"
"vfs.min_parallel_size" : "10485760"
"vfs.num_threads" : "8"
"vfs.s3.aws_access_key_id" : ""
"vfs.s3.aws_secret_access_key" : ""
"vfs.s3.connect_max_tries" : "5"
"vfs.s3.connect_scale_factor" : "25"
"vfs.s3.connect_timeout_ms" : "3000"
"vfs.s3.endpoint_override" : ""
"vfs.s3.max_parallel_ops" : "8"
"vfs.s3.multipart_part_size" : "5242880"
"vfs.s3.proxy_host" : ""
"vfs.s3.proxy_password" : ""
"vfs.s3.proxy_port" : "0"
"vfs.s3.proxy_scheme" : "https"
"vfs.s3.proxy_username" : ""
"vfs.s3.region" : "us-east-1"
"vfs.s3.request_timeout_ms" : "3000"
"vfs.s3.scheme" : "https"
"vfs.s3.use_virtual_addressing" : "true"
VFS S3 settings:
"aws_access_key_id" : ""
"aws_secret_access_key" : ""
"connect_max_tries" : "5"
"connect_scale_factor" : "25"
"connect_timeout_ms" : "3000"
"endpoint_override" : ""
"max_parallel_ops" : "8"
"multipart_part_size" : "5242880"
"proxy_host" : ""
"proxy_password" : ""
"proxy_port" : "0"
"proxy_scheme" : "https"
"proxy_username" : ""
"region" : "us-east-1"
"request_timeout_ms" : "3000"
"scheme" : "https"
"use_virtual_addressing" : "true"
Tile cache size after loading from file: 0
Python
$ python config.py
Tile cache size: 10000000
Default settings:
"sm.check_coord_dups" : "true"
"sm.check_coord_oob" : "true"
"sm.check_global_order" : "true"
"sm.consolidation.amplification" : "1"
"sm.consolidation.buffer_size" : "50000000"
"sm.consolidation.step_max_frags" : "4294967295"
"sm.consolidation.step_min_frags" : "4294967295"
"sm.consolidation.step_size_ratio" : "0"
"sm.consolidation.steps" : "4294967295"
"sm.dedup_coords" : "false"
"sm.enable_signal_handlers" : "true"
"sm.memory_budget" : "5368709120"
"sm.memory_budget_var" : "10737418240"
"sm.num_async_threads" : "1"
"sm.num_reader_threads" : "1"
"sm.num_tbb_threads" : "-1"
"sm.num_writer_threads" : "1"
"sm.tile_cache_size" : "10000000"
"vfs.file.max_parallel_ops" : "8"
"vfs.hdfs.kerb_ticket_cache_path" : ""
"vfs.hdfs.name_node_uri" : ""
"vfs.hdfs.username" : ""
"vfs.min_batch_gap" : "512000"
"vfs.min_batch_size" : "20971520"
"vfs.min_parallel_size" : "10485760"
"vfs.num_threads" : "8"
"vfs.s3.aws_access_key_id" : ""
"vfs.s3.aws_secret_access_key" : ""
"vfs.s3.connect_max_tries" : "5"
"vfs.s3.connect_scale_factor" : "25"
"vfs.s3.connect_timeout_ms" : "3000"
"vfs.s3.endpoint_override" : ""
"vfs.s3.max_parallel_ops" : "8"
"vfs.s3.multipart_part_size" : "5242880"
"vfs.s3.proxy_host" : ""
"vfs.s3.proxy_password" : ""
"vfs.s3.proxy_port" : "0"
"vfs.s3.proxy_scheme" : "https"
"vfs.s3.proxy_username" : ""
"vfs.s3.region" : "us-east-1"
"vfs.s3.request_timeout_ms" : "3000"
"vfs.s3.scheme" : "https"
"vfs.s3.use_virtual_addressing" : "true"
VFS S3 settings:
"aws_access_key_id" : ""
"aws_secret_access_key" : ""
"connect_max_tries" : "5"
"connect_scale_factor" : "25"
"connect_timeout_ms" : "3000"
"endpoint_override" : ""
"max_parallel_ops" : "8"
"multipart_part_size" : "5242880"
"proxy_host" : ""
"proxy_password" : ""
"proxy_port" : "0"
"proxy_scheme" : "https"
"proxy_username" : ""
"region" : "us-east-1"
"request_timeout_ms" : "3000"
"scheme" : "https"
"use_virtual_addressing" : "true"
Tile cache size after loading from file: 0
Setting/Getting config parameters
The TileDB config object is a simplified, in-memory key-value store/map, which accepts only string keys and values. The code below simply sets two parameters and gets the value of a third parameter. We explain the TileDB parameters at the end of this tutorial.
C++
Config config;
// Set value
config["vfs.s3.connect_timeout_ms"] = 5000;
// Append parameter segments with successive []
config["vfs."]["s3."]["endpoint_override"] = "localhost:8888";
// Get value
std::string tile_cache_size = config["sm.tile_cache_size"];
std::cout << "Tile cache size: " << tile_cache_size << "\n\n";
Python
config = tiledb.Config()
# Set value
config["vfs.s3.connect_timeout_ms"] = 5000
# Get value
tile_cache_size = config["sm.tile_cache_size"]
print("Tile cache size: %s" % str(tile_cache_size))
The above code snippet produces the following output in our program:
Tile cache size: 10000000
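Since the config accepts only string keys and values, a number such as 5000 is held as text once it is set. The following is a toy, pure-Python model of that behavior under stated assumptions: the class name `StringConfig` is hypothetical and this is not TileDB's implementation, only a sketch of a string-only key-value store.

```python
class StringConfig:
    """Toy model of a string-only key-value store (hypothetical, not TileDB code)."""

    def __init__(self, defaults=None):
        # Copy the defaults, converting every key and value to a string.
        self._params = {str(k): str(v) for k, v in (defaults or {}).items()}

    def __setitem__(self, key, value):
        # Every value is converted to its string form on the way in.
        self._params[str(key)] = str(value)

    def __getitem__(self, key):
        return self._params[key]


# Start from one default taken from the output above.
config = StringConfig({"sm.tile_cache_size": "10000000"})

# Setting an integer stores the text "5000".
config["vfs.s3.connect_timeout_ms"] = 5000
print(repr(config["vfs.s3.connect_timeout_ms"]))  # '5000'

# Reading a numeric parameter therefore needs an explicit conversion.
print(int(config["sm.tile_cache_size"]) // 1000000)  # 10
```

The practical consequence is that any arithmetic on a retrieved parameter needs an explicit conversion such as `int(...)` first.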
Iterating over config parameters
TileDB also lets you iterate over the configuration parameters. The code below prints the default parameters of a config object, because it iterates before any parameter value is changed.
C++
Config config;
std::cout << "Default settings:\n";
for (auto& p : config) {
  std::cout << "\"" << p.first << "\" : \"" << p.second << "\"\n";
}
Python
config = tiledb.Config()
print("\nDefault settings:")
for p in config.items():
    print("\"%s\" : \"%s\"" % (p[0], p[1]))
The corresponding output is (note that we ran this on a machine with 8 cores):
Default settings:
"sm.check_coord_dups" : "true"
"sm.check_coord_oob" : "true"
"sm.check_global_order" : "true"
"sm.consolidation.amplification" : "1"
"sm.consolidation.buffer_size" : "50000000"
"sm.consolidation.step_max_frags" : "4294967295"
"sm.consolidation.step_min_frags" : "4294967295"
"sm.consolidation.step_size_ratio" : "0"
"sm.consolidation.steps" : "4294967295"
"sm.dedup_coords" : "false"
"sm.enable_signal_handlers" : "true"
"sm.memory_budget" : "5368709120"
"sm.memory_budget_var" : "10737418240"
"sm.num_async_threads" : "1"
"sm.num_reader_threads" : "1"
"sm.num_tbb_threads" : "-1"
"sm.num_writer_threads" : "1"
"sm.tile_cache_size" : "10000000"
"vfs.file.max_parallel_ops" : "8"
"vfs.hdfs.kerb_ticket_cache_path" : ""
"vfs.hdfs.name_node_uri" : ""
"vfs.hdfs.username" : ""
"vfs.min_batch_gap" : "512000"
"vfs.min_batch_size" : "20971520"
"vfs.min_parallel_size" : "10485760"
"vfs.num_threads" : "8"
"vfs.s3.aws_access_key_id" : ""
"vfs.s3.aws_secret_access_key" : ""
"vfs.s3.connect_max_tries" : "5"
"vfs.s3.connect_scale_factor" : "25"
"vfs.s3.connect_timeout_ms" : "3000"
"vfs.s3.endpoint_override" : ""
"vfs.s3.max_parallel_ops" : "8"
"vfs.s3.multipart_part_size" : "5242880"
"vfs.s3.proxy_host" : ""
"vfs.s3.proxy_password" : ""
"vfs.s3.proxy_port" : "0"
"vfs.s3.proxy_scheme" : "https"
"vfs.s3.proxy_username" : ""
"vfs.s3.region" : "us-east-1"
"vfs.s3.request_timeout_ms" : "3000"
"vfs.s3.scheme" : "https"
"vfs.s3.use_virtual_addressing" : "true"
TileDB also lets you iterate over only the config parameters that share a given prefix, as follows:
C++
Config config;
// Print only the S3 settings
std::cout << "\nVFS S3 settings:\n";
for (auto i = config.begin("vfs.s3."); i != config.end(); ++i) {
  auto& p = *i;
  std::cout << "\"" << p.first << "\" : \"" << p.second << "\"\n";
}
Python
config = tiledb.Config()
# Print only the S3 settings.
print("\nVFS S3 settings:")
for p in config.items("vfs.s3."):
    print("\"%s\" : \"%s\"" % (p[0], p[1]))
The above produces the following output. Observe that the prefix is stripped from the retrieved parameter names.
VFS S3 settings:
"aws_access_key_id" : ""
"aws_secret_access_key" : ""
"connect_max_tries" : "5"
"connect_scale_factor" : "25"
"connect_timeout_ms" : "3000"
"endpoint_override" : ""
"max_parallel_ops" : "8"
"multipart_part_size" : "5242880"
"proxy_host" : ""
"proxy_password" : ""
"proxy_port" : "0"
"proxy_scheme" : "https"
"proxy_username" : ""
"region" : "us-east-1"
"request_timeout_ms" : "3000"
"scheme" : "https"
"use_virtual_addressing" : "true"
Saving/Loading config to/from file
You can save the configuration parameters you used in your program into a (local) text file, and subsequently load them from the file into a new TileDB config if needed as follows:
C++
// Save to file
Config config;
config["sm.tile_cache_size"] = 0;
config.save_to_file("tiledb_config.txt");
// Load from file
Config config_load("tiledb_config.txt");
std::string tile_cache_size = config_load["sm.tile_cache_size"];
std::cout << "\nTile cache size after loading from file: " << tile_cache_size
<< "\n";
Python
# Save to file
config = tiledb.Config()
config["sm.tile_cache_size"] = 0
config.save("tiledb_config.txt")
# Load from file
config_load = tiledb.Config.load("tiledb_config.txt")
print("\nTile cache size after loading from file: %s" % str(config_load["sm.tile_cache_size"]))
The above code creates a config object, changes the tile cache size to 0, and saves the entire configuration into a file. Next, it creates a new config, loading the values from the created file. Running the program produces the following output. Observe that the loaded tile cache size value is 0, which is the value we set before saving the config to the file.
Tile cache size after loading from file: 0
Inspecting the contents of the exported config file, we get the following:
$ cat tiledb_config.txt
sm.check_coord_dups true
sm.check_coord_oob true
sm.check_global_order true
sm.consolidation.amplification 1
sm.consolidation.buffer_size 50000000
sm.consolidation.step_max_frags 4294967295
sm.consolidation.step_min_frags 4294967295
sm.consolidation.step_size_ratio 0
sm.consolidation.steps 4294967295
sm.dedup_coords false
sm.enable_signal_handlers true
sm.memory_budget 5368709120
sm.memory_budget_var 10737418240
sm.num_async_threads 1
sm.num_reader_threads 1
sm.num_tbb_threads -1
sm.num_writer_threads 1
sm.tile_cache_size 0
vfs.file.max_parallel_ops 8
vfs.min_batch_gap 512000
vfs.min_batch_size 20971520
vfs.min_parallel_size 10485760
vfs.num_threads 8
vfs.s3.connect_max_tries 5
vfs.s3.connect_scale_factor 25
vfs.s3.connect_timeout_ms 3000
vfs.s3.max_parallel_ops 8
vfs.s3.multipart_part_size 5242880
vfs.s3.proxy_port 0
vfs.s3.proxy_scheme https
vfs.s3.region us-east-1
vfs.s3.request_timeout_ms 3000
vfs.s3.scheme https
vfs.s3.use_virtual_addressing true
Observe that config parameters that have an empty string as a value are not exported (e.g., vfs.s3.proxy_host). Note also that vfs.s3.proxy_username and vfs.s3.proxy_password are not exported for security purposes.
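The export rules described above can be sketched in plain Python. This is only an illustration, assuming the file holds one `name value` pair per line separated by a space, as in the listing above. The helper names `export_config_text` and `parse_config_text` are hypothetical, and the real loader may accept syntax this sketch does not handle.

```python
# Parameters the tutorial says are never exported, for security purposes.
HIDDEN = ("vfs.s3.proxy_username", "vfs.s3.proxy_password")


def export_config_text(params):
    """Render params as 'name value' lines, skipping empty and hidden entries."""
    lines = []
    for name in sorted(params):
        if params[name] == "" or name in HIDDEN:
            continue
        lines.append("%s %s" % (name, params[name]))
    return "\n".join(lines) + "\n"


def parse_config_text(text):
    """Parse 'name value' lines back into a dict of strings."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The name is everything before the first space; the rest is the value.
        name, _, value = line.partition(" ")
        params[name] = value.strip()
    return params


params = {
    "sm.tile_cache_size": "0",
    "vfs.s3.proxy_host": "",           # empty value: not exported
    "vfs.s3.proxy_username": "alice",  # hypothetical value: hidden on export
    "vfs.s3.region": "us-east-1",
}
text = export_config_text(params)
loaded = parse_config_text(text)
print(text, end="")
```

In this sketch only `sm.tile_cache_size` and `vfs.s3.region` survive the round trip, which mirrors why the exported file above is shorter than the full default listing.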
Summary of Parameters
Below we provide a table with all the TileDB configuration parameters, along with their description and default values.
Parameter | Default Value | Description
sm.check_coord_dups | true | This is applicable only if
sm.check_coord_oob | true | If
sm.check_global_order | true | If
sm.consolidation.amplification | 1 | The factor by which the size of the dense fragment resulting from consolidating a set of fragments (containing at least one dense fragment) can be amplified. This is important when the union of the non-empty domains of the fragments to be consolidated has a lot of empty cells, which the consolidated fragment will have to fill with the special fill value (since the resulting fragment is dense).
sm.consolidation.buffer_size | 50000000 | The size (in bytes) of the attribute buffers used during consolidation.
sm.consolidation.step_max_frags | 4294967295 | The maximum number of fragments to consolidate in a single step.
sm.consolidation.step_min_frags | 4294967295 | The minimum number of fragments to consolidate in a single step.
sm.consolidation.step_size_ratio | 0 | The size ratio of two (“adjacent”) fragments must be larger than this value to be considered for consolidation in a single step.
sm.consolidation.steps | 4294967295 | The number of consolidation steps to be performed when executing the consolidation algorithm.
sm.dedup_coords | false | If
sm.enable_signal_handlers | true | Determines whether or not TileDB will install signal handlers.
sm.memory_budget | 5368709120 | The memory budget for tiles of fixed-sized attributes (or offsets for var-sized attributes) to be fetched during reads.
sm.memory_budget_var | 10737418240 | The memory budget for tiles of var-sized attributes to be fetched during reads.
sm.num_async_threads | 1 | The number of threads allocated for async queries.
sm.num_reader_threads | 1 | The number of threads allocated for filesystem read operations.
sm.num_writer_threads | 1 | The number of threads allocated for filesystem write operations.
sm.num_tbb_threads | -1 | The number of threads allocated for the TBB thread pool (if TBB is enabled). Note: this is a whole-program setting. Usually this should not be modified from the default. See also the documentation for TBB’s
sm.tile_cache_size | 10000000 | The tile cache size in bytes.
vfs.num_threads | # of cores | The number of threads allocated for VFS operations (any backend), per VFS instance.
vfs.file.max_parallel_ops | 8 on the 8-core machine above | The maximum number of parallel operations on objects with
 |  | If set to
vfs.min_batch_gap | 512000 | The minimum number of bytes between two VFS read batches.
vfs.min_batch_size | 20971520 | The minimum number of bytes in a VFS read operation.
vfs.min_parallel_size | 10485760 | The minimum number of bytes in a parallel VFS operation (except parallel S3 writes, which are controlled by
vfs.s3.connect_max_tries | 5 | The maximum tries for a connection. Any
vfs.s3.connect_scale_factor | 25 | The scale factor for exponential backoff when connecting to S3. Any
vfs.s3.connect_timeout_ms | 3000 | The connection timeout in ms. Any
vfs.s3.endpoint_override | "" | The S3 endpoint, if S3 is enabled.
vfs.s3.max_parallel_ops | 8 on the 8-core machine above | The maximum number of S3 backend parallel operations.
vfs.s3.multipart_part_size | 5242880 | The part size (in bytes) used in S3 multipart writes. Any
vfs.s3.proxy_host | "" | The S3 proxy host.
vfs.s3.proxy_password | "" | The S3 proxy password.
vfs.s3.proxy_port | 0 | The S3 proxy port.
vfs.s3.proxy_scheme | https | The S3 proxy scheme.
vfs.s3.proxy_username | "" | The S3 proxy username.
vfs.s3.region | us-east-1 | The S3 region.
vfs.s3.aws_access_key_id | "" | The AWS access key id (AWS_ACCESS_KEY_ID)
vfs.s3.aws_secret_access_key | "" | The AWS access secret (AWS_SECRET_ACCESS_KEY)
vfs.s3.request_timeout_ms | 3000 | The request timeout in ms. Any
vfs.s3.scheme | https | The S3 scheme.
vfs.s3.use_virtual_addressing | true | Determines whether to use virtual addressing or not.
vfs.hdfs.kerb_ticket_cache_path | "" | Path to the Kerberos ticket cache when connecting to an HDFS cluster.
vfs.hdfs.name_node_uri | "" | Optional namenode URI to use (TileDB will use
vfs.hdfs.username | "" | Username to use when connecting to the HDFS cluster.
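Several of the defaults are byte counts that are hard to read at a glance. As a small aid, the sketch below converts a few of the values printed earlier in this tutorial into binary units. The helper name `to_binary_units` is hypothetical and not part of TileDB; the numbers are copied from the default settings output above.

```python
def to_binary_units(num_bytes):
    """Format a byte count using 1024-based units (B, KiB, MiB, GiB, TiB)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024.
        if size < 1024 or unit == units[-1]:
            return "%.2f %s" % (size, unit)
        size /= 1024


# Byte-valued defaults as they appear in the output above (stored as strings).
defaults = {
    "sm.memory_budget": "5368709120",
    "sm.memory_budget_var": "10737418240",
    "sm.tile_cache_size": "10000000",
    "vfs.min_batch_size": "20971520",
    "vfs.s3.multipart_part_size": "5242880",
}

readable = {name: to_binary_units(int(value)) for name, value in defaults.items()}
for name in sorted(readable):
    print("%s = %s" % (name, readable[name]))
```

Read this way, the memory budgets come to 5 GiB and 10 GiB, the S3 multipart part size to 5 MiB, and the tile cache to roughly 9.54 MiB, since 10000000 is a decimal rather than a power-of-two figure.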