.\" $OpenBSD: btree.3,v 1.2 2010/06/02 14:45:59 deraadt Exp $ .\" .\" Copyright (c) 2009, 2010 Martin Hedenfalk .\" .\" Permission to use, copy, modify, and distribute this software for any .\" purpose with or without fee is hereby granted, provided that the above .\" copyright notice and this permission notice appear in all copies. .\" .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. .\" .Dd $Mdocdate: June 2 2010 $ .Dt BTREE 3 .Os .Sh NAME .Nm btree_open , .Nm btree_open_fd , .Nm btree_close , .Nm btree_txn_begin , .Nm btree_txn_get , .Nm btree_txn_put , .Nm btree_txn_del , .Nm btree_txn_commit , .Nm btree_txn_abort , .Nm btree_get , .Nm btree_put , .Nm btree_del , .Nm btree_txn_cursor_open , .Nm btree_cursor_open , .Nm btree_cursor_close , .Nm btree_cursor_get , .Nm btree_stat , .Nm btree_compact , .Nm btree_revert , .Nm btree_sync , .Nm btree_set_cache_size , .Nm btree_get_flags , .Nm btree_get_path , .Nm btree_cmp , .Nm btval_reset .Nd Append-only prefix B+Tree database library. .Sh SYNOPSIS .Fd #include .Ft "struct btree *" .Fn "btree_open_fd" "int fd" "uint32_t flags" .Ft "struct btree *" .Fn "btree_open" "const char *path" "uint32_t flags" "mode_t mode" .Ft "void" .Fn "btree_close" "struct btree *bt" .Ft "struct btree_txn *" .Fn "btree_txn_begin" "struct btree *bt" "int rdonly" .Ft "int" .Fn "btree_txn_get" "struct btree *bt" "struct btree_txn *" "struct btval *key" "struct btval *data" .Ft "int" .Fn "btree_txn_put" "struct btree *bt" "struct btree_txn *" "struct btval *key" "struct btval *data" "unsigned flags" .Ft "int" .Fn "btree_txn_del" "struct btree *bt" "struct btree_txn *" "struct btval *key" "struct btval *data" .Ft "int" .Fn "btree_txn_commit" "struct btree_txn *txn" .Ft "void" .Fn "btree_txn_abort" "struct btree_txn *txn" .Ft "int" .Fn "btree_get" "struct btree *bt" "struct btval *key" "struct btval *data" .Ft "int" .Fn "btree_put" "struct btree *bt" "struct btval *key" "struct btval *data" "unsigned flags" .Ft "int" .Fn "btree_del" "struct btree *bt" "struct btval *key" "struct btval *data" .Ft "struct cursor *" .Fn "btree_txn_cursor_open" "struct btree *bt" "struct btree_txn *txn" .Ft "struct cursor *" .Fn "btree_cursor_open" "struct btree *bt" .Ft "void" .Fn "btree_cursor_close" "struct cursor *cursor" .Ft "int" .Fn "btree_cursor_get" "struct cursor *cursor" "struct btval *key" "struct btval *data" "enum cursor_op op" .Ft "struct btree_stat *" .Fn "btree_stat" "struct btree *bt" .Ft "int" .Fn "btree_compact" "struct btree *bt" .Ft "int" .Fn "btree_revert" "struct btree *bt" .Ft "int" .Fn "btree_sync" "struct btree *bt" .Ft "void" .Fn "btree_set_cache_size" "struct btree *bt" "unsigned int cache_size" .Ft "unsigned int" .Fn "btree_get_flags" "struct btree *bt" .Ft "const char *" .Fn "btree_get_path" "struct btree *bt" .Ft "int" .Fn "btree_cmp" "struct btree *bt" "const struct btval *a" "const struct btval *b" .Ft "void" .Fn "btval_reset" "struct btval *btv" .Sh DESCRIPTION The database is implemented as a modified prefix B+tree. Instead of modifying the database file inplace, each update appends any modified pages at the end of the file. The last block of the file contains metadata and a pointer to the root page. .Pp Append-only writing gives the following properties: .Bl -enum .It No locks. Since all writes are appended to the end of the file, multiple readers can continue reading from the tree as it was when they started. This snapshot view might contain outdated versions of entries. .It Resistance to corruption. The file content is never modified. When opening a database file, the last good meta-data page is searched by scanning backwards. If there is trailing garbage in the file, it will be skipped. .It Hot backups. Backups can be made on a running server simply by copying the files. There is no risk for inconsistencies. .El .Pp The drawback is that it wastes space. A 4-level B+tree database will write at least 5 new pages on each update, including the meta-data page. With 4 KiB pagesize, the file would grow by 20 KiB on each update. .Pp To reclaim the wasted space, the database can be compacted. The compaction process opens the database and traverses the tree. Each active page is then written to a new file. When complete, the old file is replaced. Any updates committed after the file was opened will be lost. All processes using the file should re-open it. .Sh CURSORS A new cursor may be opened with a call to .Fn btree_txn_cursor_open or .Fn btree_cursor_open . The latter is implemented as a macro to the former with a NULL .Ar txn argument. Multiple cursors may be open simultaneously. The cursor must be closed with .Fn btree_cursor_close after use. .Pp The cursor can be placed at a specific key by setting .Ar op to BT_CURSOR and filling in the .Ar key argument. The cursor will be placed at the smallest key greater or equal to the specified key. .Pp The database may be traversed from the first key to the last by calling .Fn btree_cursor_get with .Ar op initially set to BT_FIRST and then set to BT_NEXT. If the cursor is not yet initialized, ie .Fn btree_cursor_get has not yet been called with .Ar op set to BT_FIRST or BT_CURSOR, then BT_NEXT behaves as BT_FIRST. .Sh TRANSACTIONS There are two types of transactions: write and read-only transactions. Only one write transaction is allowed at a time. A read-only transaction allows the grouping of several read operations to see a consistent state of the database. .Pp A transaction is started with .Fn btree_txn_begin . If the .Ar rdonly parameter is 0, a write transaction is started and an exclusive lock is taken on the file. No lock is taken for read-only transactions. .Pp The transaction is ended either with .Fn btree_txn_commit or .Fn btree_txn_abort . The .Ft btree_txn pointer must not be accessed afterwards. .Sh RETURN VALUES The .Fn btree_txn_get , .Fn btree_txn_put , .Fn btree_txn_del , .Fn btree_txn_commit , .Fn btree_get , .Fn btree_put , .Fn btree_del , .Fn btree_cursor_get , .Fn btree_compact and .Fn btree_revert functions all return BT_SUCCESS on success and BT_FAIL on failure. .Pp .Fn btree_txn_put and .Fn btree_put returns BT_EXISTS if the key already exists and BT_NOOVERWRITE was not passed in the .Ar flags argument. .Pp .Fn btree_txn_get , .Fn btree_txn_del , .Fn btree_get , .Fn btree_del and .Fn btree_cursor_get returns BT_NOTFOUND if the specified key was not found. .Pp All functions returning pointers to structs returns NULL on error. .Sh SEE ALSO .Pp .Sh AUTHORS The .Nm btree library was written by .An Martin Hedenfalk Aq martin@bzero.se .Sh BUGS This manpage is incomplete. The page size can't be changed. Byte order is assumed never to change.