1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
|
.\" $OpenBSD: btree.3,v 1.2 2010/06/02 14:45:59 deraadt Exp $
.\"
.\" Copyright (c) 2009, 2010 Martin Hedenfalk <martin@bzero.se>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: June 2 2010 $
.Dt BTREE 3
.Os
.Sh NAME
.Nm btree_open ,
.Nm btree_open_fd ,
.Nm btree_close ,
.Nm btree_txn_begin ,
.Nm btree_txn_get ,
.Nm btree_txn_put ,
.Nm btree_txn_del ,
.Nm btree_txn_commit ,
.Nm btree_txn_abort ,
.Nm btree_get ,
.Nm btree_put ,
.Nm btree_del ,
.Nm btree_txn_cursor_open ,
.Nm btree_cursor_open ,
.Nm btree_cursor_close ,
.Nm btree_cursor_get ,
.Nm btree_stat ,
.Nm btree_compact ,
.Nm btree_revert ,
.Nm btree_sync ,
.Nm btree_set_cache_size ,
.Nm btree_get_flags ,
.Nm btree_get_path ,
.Nm btree_cmp ,
.Nm btval_reset
.Nd Append-only prefix B+Tree database library.
.Sh SYNOPSIS
.Fd #include <btree.h>
.Ft "struct btree *"
.Fn "btree_open_fd" "int fd" "uint32_t flags"
.Ft "struct btree *"
.Fn "btree_open" "const char *path" "uint32_t flags" "mode_t mode"
.Ft "void"
.Fn "btree_close" "struct btree *bt"
.Ft "struct btree_txn *"
.Fn "btree_txn_begin" "struct btree *bt" "int rdonly"
.Ft "int"
.Fn "btree_txn_get" "struct btree *bt" "struct btree_txn *" "struct btval *key" "struct btval *data"
.Ft "int"
.Fn "btree_txn_put" "struct btree *bt" "struct btree_txn *" "struct btval *key" "struct btval *data" "unsigned flags"
.Ft "int"
.Fn "btree_txn_del" "struct btree *bt" "struct btree_txn *" "struct btval *key" "struct btval *data"
.Ft "int"
.Fn "btree_txn_commit" "struct btree_txn *txn"
.Ft "void"
.Fn "btree_txn_abort" "struct btree_txn *txn"
.Ft "int"
.Fn "btree_get" "struct btree *bt" "struct btval *key" "struct btval *data"
.Ft "int"
.Fn "btree_put" "struct btree *bt" "struct btval *key" "struct btval *data" "unsigned flags"
.Ft "int"
.Fn "btree_del" "struct btree *bt" "struct btval *key" "struct btval *data"
.Ft "struct cursor *"
.Fn "btree_txn_cursor_open" "struct btree *bt" "struct btree_txn *txn"
.Ft "struct cursor *"
.Fn "btree_cursor_open" "struct btree *bt"
.Ft "void"
.Fn "btree_cursor_close" "struct cursor *cursor"
.Ft "int"
.Fn "btree_cursor_get" "struct cursor *cursor" "struct btval *key" "struct btval *data" "enum cursor_op op"
.Ft "struct btree_stat *"
.Fn "btree_stat" "struct btree *bt"
.Ft "int"
.Fn "btree_compact" "struct btree *bt"
.Ft "int"
.Fn "btree_revert" "struct btree *bt"
.Ft "int"
.Fn "btree_sync" "struct btree *bt"
.Ft "void"
.Fn "btree_set_cache_size" "struct btree *bt" "unsigned int cache_size"
.Ft "unsigned int"
.Fn "btree_get_flags" "struct btree *bt"
.Ft "const char *"
.Fn "btree_get_path" "struct btree *bt"
.Ft "int"
.Fn "btree_cmp" "struct btree *bt" "const struct btval *a" "const struct btval *b"
.Ft "void"
.Fn "btval_reset" "struct btval *btv"
.Sh DESCRIPTION
The database is implemented as a modified prefix B+tree.
Instead of modifying the database file inplace,
each update appends any modified pages at the end of the file.
The last block of the file contains metadata and a pointer to the root page.
.Pp
Append-only writing gives the following properties:
.Bl -enum
.It
No locks.
Since all writes are appended to the end of the file, multiple
readers can continue reading from the tree as it was when they
started.
This snapshot view might contain outdated versions of entries.
.It
Resistance to corruption.
The file content is never modified.
When opening a database file, the last good meta-data page is searched
by scanning backwards.
If there is trailing garbage in the file, it will be skipped.
.It
Hot backups.
Backups can be made on a running server simply by copying the files.
There is no risk for inconsistencies.
.El
.Pp
The drawback is that it wastes space.
A 4-level B+tree database will write at least 5 new pages on each update,
including the meta-data page.
With 4 KiB pagesize, the file would grow by 20 KiB on each update.
.Pp
To reclaim the wasted space, the database can be compacted.
The compaction process opens the database and traverses the tree.
Each active page is then written to a new file.
When complete, the old file is replaced.
Any updates committed after the file was opened will be lost.
All processes using the file should re-open it.
.Sh CURSORS
A new cursor may be opened with a call to
.Fn btree_txn_cursor_open
or
.Fn btree_cursor_open .
The latter is implemented as a macro to the former with a NULL
.Ar txn
argument.
Multiple cursors may be open simultaneously.
The cursor must be closed with
.Fn btree_cursor_close
after use.
.Pp
The cursor can be placed at a specific key by setting
.Ar op
to BT_CURSOR and filling in the
.Ar key
argument.
The cursor will be placed at the smallest key greater or equal to
the specified key.
.Pp
The database may be traversed from the first key to the last by calling
.Fn btree_cursor_get
with
.Ar op
initially set to BT_FIRST and then set to BT_NEXT.
If the cursor is not yet initialized, ie
.Fn btree_cursor_get
has not yet been called with
.Ar op
set to BT_FIRST or BT_CURSOR, then BT_NEXT behaves as BT_FIRST.
.Sh TRANSACTIONS
There are two types of transactions: write and read-only transactions.
Only one write transaction is allowed at a time.
A read-only transaction allows the grouping of several read operations
to see a consistent state of the database.
.Pp
A transaction is started with
.Fn btree_txn_begin .
If the
.Ar rdonly
parameter is 0, a write transaction is started and an exclusive lock
is taken on the file.
No lock is taken for read-only transactions.
.Pp
The transaction is ended either with
.Fn btree_txn_commit
or
.Fn btree_txn_abort .
The
.Ft btree_txn
pointer must not be accessed afterwards.
.Sh RETURN VALUES
The
.Fn btree_txn_get ,
.Fn btree_txn_put ,
.Fn btree_txn_del ,
.Fn btree_txn_commit ,
.Fn btree_get ,
.Fn btree_put ,
.Fn btree_del ,
.Fn btree_cursor_get ,
.Fn btree_compact
and
.Fn btree_revert
functions all return BT_SUCCESS on success and BT_FAIL on failure.
.Pp
.Fn btree_txn_put
and
.Fn btree_put
returns BT_EXISTS if the key already exists and BT_NOOVERWRITE was not
passed in the
.Ar flags
argument.
.Pp
.Fn btree_txn_get ,
.Fn btree_txn_del ,
.Fn btree_get ,
.Fn btree_del
and
.Fn btree_cursor_get
returns BT_NOTFOUND if the specified key was not found.
.Pp
All functions returning pointers to structs returns NULL on error.
.Sh SEE ALSO
.Pp
.Sh AUTHORS
The
.Nm btree
library was written by
.An Martin Hedenfalk Aq martin@bzero.se
.Sh BUGS
This manpage is incomplete.
The page size can't be changed.
Byte order is assumed never to change.
|