diff --git a/srcs/libs/docs/libuv-1.41.0.chm b/srcs/libs/docs/libuv-1.41.0.chm
deleted file mode 100644
index 31216ef..0000000
Binary files a/srcs/libs/docs/libuv-1.41.0.chm and /dev/null differ
diff --git a/srcs/libs/docs/libuv-1.44.2.pdf b/srcs/libs/docs/libuv-1.44.2.pdf
new file mode 100644
index 0000000..637307d
Binary files /dev/null and b/srcs/libs/docs/libuv-1.44.2.pdf differ
diff --git a/srcs/libs/docs/sdc.md b/srcs/libs/docs/sdc.md
deleted file mode 100644
index 4bfc66c..0000000
--- a/srcs/libs/docs/sdc.md
+++ /dev/null
@@ -1,842 +0,0 @@
-Simple Dynamic Strings
-===
-
-**Notes about verison 2**: this is an updated version of SDS in an attempt to finally unify Redis, Disque, Hiredis, and
-the stand alone SDS versions. This version is **NOT* binary compatible** with SDS verison 1, but the API is 99%
-compatible so switching to the new lib should be trivial.
-
-Note that this version of SDS may be a slower with certain workloads, but uses less memory compared to V1 since header
-size is dynamic and depends to the string to alloc.
-
-Moreover it includes a few more API functions, notably `sdscatfmt` which is a faster version of `sdscatprintf` that can
-be used for the simpler cases in order to avoid the libc `printf` family functions performance penalty.
-
-How SDS stirngs work
-===
-
-SDS is a string library for C designed to augment the limited libc string handling functionalities by adding heap
-allocated strings that are:
-
-* Simpler to use.
-* Binary safe.
-* Computationally more efficient.
-* But yet... Compatible with normal C string functions.
-
-This is achieved using an alternative design in which instead of using a C structure to represent a string, we use a
-binary prefix that is stored before the actual pointer to the string that is returned by SDS to the user.
-
-    +--------+-------------------------------+-----------+
-    | Header | Binary safe C alike string... | Null term |
-    +--------+-------------------------------+-----------+
-             |
-             `-> Pointer returned to the user.
-
-Because of meta data stored before the actual returned pointer as a prefix, and because of every SDS string implicitly
-adding a null term at the end of the string regardless of the actual content of the string, SDS strings work well
-together with C strings and the user is free to use them interchangeably with real-only functions that access the string
-in read-only.
-
-SDS was a C string I developed in the past for my everyday C programming needs, later it was moved into Redis where it
-is used extensively and where it was modified in order to be suitable for high performance operations. Now it was
-extracted from Redis and forked as a stand alone project.
-
-Because of its many years life inside Redis, SDS provides both higher level functions for easy strings manipulation in
-C, but also a set of low level functions that make it possible to write high performance code without paying a penalty
-for using an higher level string library.
-
-Advantages and disadvantages of SDS
-===
-
-Normally dynamic string libraries for C are implemented using a structure that defines the string. The structure has a
-pointer field that is managed by the string function, so it looks like this:
-
-```c
-struct yourAverageStringLibrary {
-    char *buf;
-    size_t len;
-    ... possibly more fields here ...
-};
-```
-
-SDS strings are already mentioned don't follow this schema, and are instead a single allocation with a prefix that
-lives *before* the address actually returned for the string.
-
-There are advantages and disadvantages with this approach over the traditional approach:
-
-**Disadvantage #1**: many functions return the new string as value, since sometimes SDS requires to create a new string
-with more space, so the most SDS API calls look like this:
-
-```c
-s = sdscat(s,"Some more data");
-```
-
-As you can see `s` is used as input for `sdscat` but is also set to the value returned by the SDS API call, since we are
-not sure if the call modified the SDS string we passed or allocated a new one. Not remembering to assign back the return
-value of `sdscat` or similar functions to the variable holding the SDS string will result in a bug.
-
-**Disadvantage #2**: if an SDS string is shared in different places in your program you have to modify all the
-references when you modify the string. However most of the times when you need to share SDS strings it is much better to
-encapsulate them into structures with a `reference count` otherwise it is too easy to incur into memory leaks.
-
-**Advantage #1**: you can pass SDS strings to functions designed for C functions without accessing a struct member or
-calling a function, like this:
-
-```c
-printf("%s\n", sds_string);
-```
-
-In most other libraries this will be something like:
-
-```c
-printf("%s\n", string->buf);
-```
-
-Or:
-
-```c
-printf("%s\n", getStringPointer(string));
-```
-
-**Advantage #2**: accessing individual chars is straightforward. C is a low level language so this is an important
-operation in many programs. With SDS strings accessing individual chars is very natural:
-
-```c
-printf("%c %c\n", s[0], s[1]);
-```
-
-With other libraries your best chance is to assign `string->buf` (or call the function to get the string pointer) to
-a `char` pointer and work with this. However since the other libraries may reallocate the buffer implicitly every time
-you call a function that may modify the string you have to get a reference to the buffer again.
-
-**Advantage #3**: single allocation has better cache locality. Usually when you access a string created by a string
-library using a structure, you have two different allocations for the structure representing the string, and the actual
-buffer holding the string. Over the time the buffer is reallocated, and it is likely that it ends in a totally different
-part of memory compared to the structure itself. Since modern programs performances are often dominated by cache misses,
-SDS may perform better in many workloads.
-
-SDS basics
-===
-
-The type of SDS strings is just the char pointer `char *`. However SDS defines an `sds` type as alias of `char *` in its
-header file: you should use the
-`sds` type in order to make sure you remember that a given variable in your program holds an SDS string and not a C
-string, however this is not mandatory.
-
-This is the simplest SDS program you can write that does something:
-
-```c
-sds mystring = sdsnew("Hello World!");
-printf("%s\n", mystring);
-sdsfree(mystring);
-
-output> Hello World!
-```
-
-The above small program already shows a few important things about SDS:
-
-* SDS strings are created, and heap allocated, via the `sdsnew()` function, or other similar functions that we'll see in
-  a moment.
-* SDS strings can be passed to `printf()` like any other C string.
-* SDS strings require to be freed with `sdsfree()`, since they are heap allocated.
-
-Creating SDS strings
----
-
-```c
-sds sdsnewlen(const void *init, size_t initlen);
-sds sdsnew(const char *init);
-sds sdsempty(void);
-sds sdsdup(const sds s);
-```
-
-There are many ways to create SDS strings:
-
-* The `sdsnew` function creates an SDS string starting from a C null terminated string. We already saw how it works in
-  the above example.
-* The `sdsnewlen` function is similar to `sdsnew` but instead of creating the string assuming that the input string is
-  null terminated, it gets an additional length parameter. This way you can create a string using binary data:
-
-  ```c
-  char buf[3];
-  sds mystring;
-
-  buf[0] = 'A';
-  buf[1] = 'B';
-  buf[2] = 'C';
-  mystring = sdsnewlen(buf,3);
-  printf("%s of len %d\n", mystring, (int) sdslen(mystring));
-
-  output> ABC of len 3
-  ```
-
-  Note: `sdslen` return value is casted to `int` because it returns a `size_t`
-  type. You can use the right `printf` specifier instead of casting.
-
-* The `sdsempty()` function creates an empty zero-length string:
-
-  ```c
-  sds mystring = sdsempty();
-  printf("%d\n", (int) sdslen(mystring));
-
-  output> 0
-  ```
-
-* The `sdsdup()` function duplicates an already existing SDS string:
-
-  ```c
-  sds s1, s2;
-
-  s1 = sdsnew("Hello");
-  s2 = sdsdup(s1);
-  printf("%s %s\n", s1, s2);
-
-  output> Hello Hello
-  ```
-
-Obtaining the string length
----
-
-```c
-size_t sdslen(const sds s);
-```
-
-In the examples above we already used the `sdslen` function in order to get the length of the string. This function
-works like `strlen` of the libc except that:
-
-* It runs in constant time since the length is stored in the prefix of SDS strings, so calling `sdslen` is not expensive
-  even when called with very large strings.
-* The function is binary safe like any other SDS string function, so the length is the true length of the string
-  regardless of the content, there is no problem if the string includes null term characters in the middle.
-
-As an example of the binary safeness of SDS strings, we can run the following code:
-
-```c
-sds s = sdsnewlen("A\0\0B",4);
-printf("%d\n", (int) sdslen(s));
-
-output> 4
-```
-
-Note that SDS strings are always null terminated at the end, so even in that case `s[4]` will be a null term, however
-printing the string with `printf`
-would result in just `"A"` to be printed since libc will treat the SDS string like a normal C string.
-
-Destroying strings
----
-
-```c
-void sdsfree(sds s);
-```
-
-The destroy an SDS string there is just to call `sdsfree` with the string pointer. However note that empty strings
-created with `sdsempty` need to be destroyed as well otherwise they'll result into a memory leak.
-
-The function `sdsfree` does not perform any operation if instead of an SDS string pointer, `NULL` is passed, so you
-don't need to check for `NULL` explicitly before calling it:
-
-```c
-if (string) sdsfree(string); /* Not needed. */
-sdsfree(string); /* Same effect but simpler. */
-```
-
-Concatenating strings
----
-
-Concatenating strings to other strings is likely the operation you will end using the most with a dynamic C string
-library. SDS provides different functions to concatenate strings to existing strings.
-
-```c
-sds sdscatlen(sds s, const void *t, size_t len);
-sds sdscat(sds s, const char *t);
-```
-
-The main string concatenation functions are `sdscatlen` and `sdscat` that are identical, the only difference being
-that `sdscat` does not have an explicit length argument since it expects a null terminated string.
-
-```c
-sds s = sdsempty();
-s = sdscat(s, "Hello ");
-s = sdscat(s, "World!");
-printf("%s\n", s);
-
-output> Hello World!
-```
-
-Sometimes you want to cat an SDS string to another SDS string, so you don't need to specify the length, but at the same
-time the string does not need to be null terminated but can contain any binary data. For this there is a special
-function:
-
-```c
-sds sdscatsds(sds s, const sds t);
-```
-
-Usage is straightforward:
-
-```c
-sds s1 = sdsnew("aaa");
-sds s2 = sdsnew("bbb");
-s1 = sdscatsds(s1,s2);
-sdsfree(s2);
-printf("%s\n", s1);
-
-output> aaabbb
-```
-
-Sometimes you don't want to append any special data to the string, but you want to make sure that there are at least a
-given number of bytes composing the whole string.
-
-```c
-sds sdsgrowzero(sds s, size_t len);
-```
-
-The `sdsgrowzero` function will do nothing if the current string length is already `len` bytes, otherwise it will
-enlarge the string to `len` just padding it with zero bytes.
-
-```c
-sds s = sdsnew("Hello");
-s = sdsgrowzero(s,6);
-s[5] = '!'; /* We are sure this is safe because of sdsgrowzero() */
-printf("%s\n', s);
-
-output> Hello!
-```
-
-Formatting strings
----
-
-There is a special string concatenation function that accepts a `printf` alike format specifier and cats the formatted
-string to the specified string.
-
-```c
-sds sdscatprintf(sds s, const char *fmt, ...) {
-```
-
-Example:
-
-```c
-sds s;
-int a = 10, b = 20;
-s = sdsnew("The sum is: ");
-s = sdscatprintf(s,"%d+%d = %d",a,b,a+b);
-```
-
-Often you need to create SDS string directly from `printf` format specifiers. Because `sdscatprintf` is actually a
-function that concatenates strings all you need is to concatenate your string to an empty string:
-
-```c
-char *name = "Anna";
-int loc = 2500;
-sds s;
-s = sdscatprintf(sdsempty(), "%s wrote %d lines of LISP\n", name, loc);
-```
-
-You can use `sdscatprintf` in order to convert numbers into SDS strings:
-
-```c
-int some_integer = 100;
-sds num = sdscatprintf(sdsempty(),"%d\n", some_integer);
-```
-
-However this is slow and we have a special function to make it efficient.
-
-Fast number to string operations
----
-
-Creating an SDS string from an integer may be a common operation in certain kind of programs, and while you may do this
-with `sdscatprintf` the performance hit is big, so SDS provides a specialized function.
-
-```c
-sds sdsfromlonglong(long long value);
-```
-
-Use it like this:
-
-```c
-sds s = sdsfromlonglong(10000);
-printf("%d\n", (int) sdslen(s));
-
-output> 5
-```
-
-Trimming strings and getting ranges
----
-
-String trimming is a common operation where a set of characters are removed from the left and the right of the string.
-Another useful operation regarding strings is the ability to just take a range out of a larger string.
-
-```c
-void sdstrim(sds s, const char *cset);
-void sdsrange(sds s, int start, int end);
-```
-
-SDS provides both the operations with the `sdstrim` and `sdsrange` functions. However note that both functions work
-differently than most functions modifying SDS strings since the return value is null: basically those functions always
-destructively modify the passed SDS string, never allocating a new one, because both trimming and ranges will never need
-more room: the operations can only remove characters from the original strings.
-
-Because of this behavior, both functions are fast and don't involve reallocation.
-
-This is an example of string trimming where newlines and spaces are removed from an SDS strings:
-
-```c
-sds s = sdsnew(" my string\n\n ");
-sdstrim(s," \n");
-printf("-%s-\n",s);
-
-output> -my string-
-```
-
-Basically `sdstrim` takes the SDS string to trim as first argument, and a null terminated set of characters to remove
-from left and right of the string. The characters are removed as long as they are not interrupted by a character that is
-not in the list of characters to trim: this is why the space between
-`"my"` and `"string"` was preserved in the above example.
-
-Taking ranges is similar, but instead to take a set of characters, it takes to indexes, representing the start and the
-end as specified by zero-based indexes inside the string, to obtain the range that will be retained.
-
-```c
-sds s = sdsnew("Hello World!");
-sdsrange(s,1,4);
-printf("-%s-\n");
-
-output> -ello-
-```
-
-Indexes can be negative to specify a position starting from the end of the string, so that `-1` means the last
-character, `-2` the penultimate, and so forth:
-
-```c
-sds s = sdsnew("Hello World!");
-sdsrange(s,6,-1);
-printf("-%s-\n");
-sdsrange(s,0,-2);
-printf("-%s-\n");
-
-output> -World!-
-output> -World-
-```
-
-`sdsrange` is very useful when implementing networking servers processing a protocol or sending messages. For example
-the following code is used implementing the write handler of the Redis Cluster message bus between nodes:
-
-```c
-void clusterWriteHandler(..., int fd, void *privdata, ...) {
-    clusterLink *link = (clusterLink*) privdata;
-    ssize_t nwritten = write(fd, link->sndbuf, sdslen(link->sndbuf));
-    if (nwritten <= 0) {
-        /* Error handling... */
-    }
-    sdsrange(link->sndbuf,nwritten,-1);
-    ... more code here ...
-}
-```
-
-Every time the socket of the node we want to send the message to is writable we attempt to write as much bytes as
-possible, and we use `sdsrange` in order to remove from the buffer what was already sent.
-
-The function to queue new messages to send to some node in the cluster will simply use `sdscatlen` in order to put more
-data in the send buffer.
-
-Note that the Redis Cluster bus implements a binary protocol, but since SDS is binary safe this is not a problem, so the
-goal of SDS is not just to provide an high level string API for the C programmer but also dynamically allocated buffers
-that are easy to manage.
-
-String copying
----
-
-The most dangerous and infamus function of the standard C library is probably
-`strcpy`, so perhaps it is funny how in the context of better designed dynamic string libraries the concept of copying
-strings is almost irrelevant. Usually what you do is to create strings with the content you want, or concatenating more
-content as needed.
-
-However SDS features a string copy function that is useful in performance critical code sections, however I guess its
-practical usefulness is limited as the function never managed to get called in the context of the 50k lines of code
-composing the Redis code base.
-
-```c
-sds sdscpylen(sds s, const char *t, size_t len);
-sds sdscpy(sds s, const char *t);
-```
-
-The string copy function of SDS is called `sdscpylen` and works like that:
-
-```c
-s = sdsnew("Hello World!");
-s = sdscpylen(s,"Hello Superman!",15);
-```
-
-As you can see the function receives as input the SDS string `s`, but also returns an SDS string. This is common to many
-SDS functions that modify the string: this way the returned SDS string may be the original one modified or a newly
-allocated one (for example if there was not enough room in the old SDS string).
-
-The `sdscpylen` will simply replace what was in the old SDS string with the new data you pass using the pointer and
-length argument. There is a similar function called `sdscpy` that does not need a length but expects a null terminated
-string instead.
-
-You may wonder why it makes sense to have a string copy function in the SDS library, since you can simply create a new
-SDS string from scratch with the new value instead of copying the value in an existing SDS string. The reason is
-efficiency: `sdsnewlen` will always allocate a new string while `sdscpylen` will try to reuse the existing string if
-there is enough room to old the new content specified by the user, and will allocate a new one only if needed.
-
-Quoting strings
----
-
-In order to provide consistent output to the program user, or for debugging purposes, it is often important to turn a
-string that may contain binary data or special characters into a quoted string. Here for quoted string we mean the
-common format for String literals in programming source code. However today this format is also part of the well known
-serialization formats like JSON and CSV, so it definitely escaped the simple gaol of representing literals strings in
-the source code of programs.
-
-An example of quoted string literal is the following:
-
-```c
-"\x00Hello World\n"
-```
-
-The first byte is a zero byte while the last byte is a newline, so there are two non alphanumerical characters inside
-the string.
-
-SDS uses a concatenation function for this goal, that concatenates to an existing string the quoted string
-representation of the input string.
-
-```c
-sds sdscatrepr(sds s, const char *p, size_t len);
-```
-
-The `scscatrepr` (where `repr` means *representation*) follows the usualy SDS string function rules accepting a char
-pointer and a length, so you can use it with SDS strings, normal C strings by using strlen() as `len` argument, or
-binary data. The following is an example usage:
-
-```c
-sds s1 = sdsnew("abcd");
-sds s2 = sdsempty();
-s[1] = 1;
-s[2] = 2;
-s[3] = '\n';
-s2 = sdscatrepr(s2,s1,sdslen(s1));
-printf("%s\n", s2);
-
-output> "a\x01\x02\n"
-```
-
-This is the rules `sdscatrepr` uses for conversion:
-
-* `\` and `"` are quoted with a backslash.
-* It quotes special characters `'\n'`, `'\r'`, `'\t'`, `'\a'` and `'\b'`.
-* All the other non printable characters not passing the `isprint` test are quoted in `\x..` form, that is: backslash
-  followed by `x` followed by two digit hex number representing the character byte value.
-* The function always adds initial and final double quotes characters.
-
-There is an SDS function that is able to perform the reverse conversion and is documented in the *Tokenization*
-paragraph below.
-
-Tokenization
----
-
-Tokenization is the process of splitting a larger string into smaller strings. In this specific case, the split is
-performed specifying another string that acts as separator. For example in the following string there are two substrings
-that are separated by the `|-|` separator:
-
-```
-foo|-|bar|-|zap
-```
-
-A more common separator that consists of a single character is the comma:
-
-```
-foo,bar,zap
-```
-
-In many progrems it is useful to process a line in order to obtain the sub strings it is composed of, so SDS provides a
-function that returns an array of SDS strings given a string and a separator.
-
-```c
-sds *sdssplitlen(const char *s, int len, const char *sep, int seplen, int *count);
-void sdsfreesplitres(sds *tokens, int count);
-```
-
-As usually the function can work with both SDS strings or normal C strings. The first two arguments `s` and `len`
-specify the string to tokenize, and the other two arguments `sep` and `seplen` the separator to use during the
-tokenization. The final argument `count` is a pointer to an integer that will be set to the number of tokens (sub
-strings) returned.
-
-The return value is a heap allocated array of SDS strings.
-
-```c
-sds *tokens;
-int count, j;
-
-sds line = sdsnew("Hello World!");
-tokens = sdssplitlen(line,sdslen(line)," ",1,&count);
-
-for (j = 0; j < count; j++)
-    printf("%s\n", tokens[j]);
-sdsfreesplitres(tokens,count);
-
-output> Hello
-output> World!
-```
-
-The returned array is heap allocated, and the single elements of the array are normal SDS strings. You can free
-everything calling `sdsfreesplitres`
-as in the example. Alternativey you are free to release the array yourself using the `free` function and use and/or free
-the individual SDS strings as usually.
-
-A valid approach is to set the array elements you reused in some way to
-`NULL`, and use `sdsfreesplitres` to free all the rest.
-
-Command line oriented tokenization
----
-
-Splitting by a separator is a useful operation, but usually it is not enough to perform one of the most common tasks
-involving some non trivial string manipulation, that is, implementing a **Command Line Interface** for a program.
-
-This is why SDS also provides an additional function that allows you to split arguments provided by the user via the
-keyboard in an interactive manner, or via a file, network, or any other mean, into tokens.
-
-```c
-sds *sdssplitargs(const char *line, int *argc);
-```
-
-The `sdssplitargs` function returns an array of SDS strings exactly like
-`sdssplitlen`. The function to free the result is also identical, and is
-`sdsfreesplitres`. The difference is in the way the tokenization is performed.
-
-For example if the input is the following line:
-
-```
-call "Sabrina" and "Mark Smith\n"
-```
-
-The function will return the following tokens:
-
-* "call"
-* "Sabrina"
-* "and"
-* "Mark Smith\n"
-
-Basically different tokens need to be separated by one or more spaces, and every single token can also be a quoted
-string in the same format that
-`sdscatrepr` is able to emit.
-
-String joining
----
-
-There are two functions doing the reverse of tokenization by joining strings into a single one.
-
-```c
-sds sdsjoin(char **argv, int argc, char *sep, size_t seplen);
-sds sdsjoinsds(sds *argv, int argc, const char *sep, size_t seplen);
-```
-
-The two functions take as input an array of strings of length `argc` and a separator and its length, and produce as
-output an SDS string consisting of all the specified strings separated by the specified separator.
-
-The difference between `sdsjoin` and `sdsjoinsds` is that the former accept C null terminated strings as input while the
-latter requires all the strings in the array to be SDS strings. However because of this only `sdsjoinsds` is able to
-deal with binary data.
-
-```c
-char *tokens[3] = {"foo","bar","zap"};
-sds s = sdsjoin(tokens,3,"|",1);
-printf("%s\n", s);
-
-output> foo|bar|zap
-```
-
-Error handling
----
-
-All the SDS functions that return an SDS pointer may also return `NULL` on out of memory, this is basically the only
-check you need to perform.
- -However many modern C programs handle out of memory simply aborting the program so you may want to do this as well by -wrapping `malloc` and other related memory allocation calls directly. - -SDS internals and advanced usage -=== - -At the very beginning of this documentation it was explained how SDS strings are allocated, however the prefix stored -before the pointer returned to the user was classified as an *header* without further details. For an advanced usage it -is better to dig more into the internals of SDS and show the structure implementing it: - -```c -struct sdshdr { - int len; - int free; - char buf[]; -}; -``` - -As you can see, the structure may resemble the one of a conventional string library, however the `buf` field of the -structure is different since it is not a pointer but an array without any length declared, so `buf` actually points at -the first byte just after the `free` integer. So in order to create an SDS string we just allocate a piece of memory -that is as large as the -`sdshdr` structure plus the length of our string, plus an additional byte for the mandatory null term that every SDS -string has. - -The `len` field of the structure is quite obvious, and is the current length of the SDS string, always computed every -time the string is modified via SDS function calls. The `free` field instead represents the amount of free memory in the -current allocation that can be used to store more characters. - -So the actual SDS layout is this one: - - +------------+------------------------+-----------+---------------\ - | Len | Free | H E L L O W O R L D \n | Null term | Free space \ - +------------+------------------------+-----------+---------------\ - | - `-> Pointer returned to the user. - -You may wonder why there is some free space at the end of the string, it looks like a waste. 
Actually after a new SDS -string is created, there is no free space at the end at all: the allocation will be as small as possible to just hold -the header, string, and null term. However other access patterns will create extra free space at the end, like in the -following program: - -```c -s = sdsempty(); -s = sdscat(s,"foo"); -s = sdscat(s,"bar"); -s = sdscat(s,"123"); -``` - -Since SDS tries to be efficient it can't afford to reallocate the string every time new data is appended, since this -would be very inefficient, so it uses the **preallocation of some free space** every time you enlarge the string. - -The preallocation algorithm used is the following: every time the string is reallocated in order to hold more bytes, the -actual allocation size performed is two times the minimum required. So for instance if the string currently is holding -30 bytes, and we concatenate 2 more bytes, instead of allocating 32 bytes in total SDS will allocate 64 bytes. - -However there is an hard limit to the allocation it can perform ahead, and is defined by `SDS_MAX_PREALLOC`. SDS will -never allocate more than 1MB of additional space (by default, you can change this default). - -Shrinking strings ---- - -```c -sds sdsRemoveFreeSpace(sds s); -size_t sdsAllocSize(sds s); -``` - -Sometimes there are class of programs that require to use very little memory. After strings concatenations, trimming, -ranges, the string may end having a non trivial amount of additional space at the end. - -It is possible to resize a string back to its minimal size in order to hold the current content by using the -function `sdsRemoveFreeSpace`. - -```c -s = sdsRemoveFreeSpace(s); -``` - -There is also a function that can be used in order to get the size of the total allocation for a given string, and is -called `sdsAllocSize`. - -```c -sds s = sdsnew("Ladies and gentlemen"); -s = sdscat(s,"... 
welcome to the C language."); -printf("%d\n", (int) sdsAllocSize(s)); -s = sdsRemoveFreeSpace(s); -printf("%d\n", (int) sdsAllocSize(s)); - -output> 109 -output> 59 -``` - -NOTE: SDS Low level API use cammelCase in order to warn you that you are playing with the fire. - -Manual modifications of SDS strings ---- - - void sdsupdatelen(sds s); - -Sometimes you may want to hack with an SDS string manually, without using SDS functions. In the following example we -implicitly change the length of the string, however we want the logical length to reflect the null terminated C string. - -The function `sdsupdatelen` does just that, updating the internal length information for the specified string to the -length obtained via `strlen`. - -```c -sds s = sdsnew("foobar"); -s[2] = '\0'; -printf("%d\n", sdslen(s)); -sdsupdatelen(s); -printf("%d\n", sdslen(s)); - -output> 6 -output> 2 -``` - -Sharing SDS strings ---- - -If you are writing a program in which it is advantageous to share the same SDS string across different data structures, -it is absolutely advised to encapsulate SDS strings into structures that remember the number of references of the -string, with functions to increment and decrement the number of references. - -This approach is a memory management technique called *reference counting* and in the context of SDS has two advantages: - -* It is less likely that you'll create memory leaks or bugs due to non freeing SDS strings or freeing already freed - strings. -* You'll not need to update every reference to an SDS string when you modify it (since the new SDS string may point to a - different memory location). - -While this is definitely a very common programming technique I'll outline the basic ideas here. You create a structure -like that: - -```c -struct mySharedStrings { - int refcount; - sds string; -} -``` - -When new strings are created, the structure is allocated and returned with -`refcount` set to 1. 
The you have two functions to change the reference count of the shared string: - -* `incrementStringRefCount` will simply increment `refcount` of 1 in the structure. It will be called every time you add - a reference to the string on some new data structure, variable, or whatever. -* `decrementStringRefCount` is used when you remove a reference. This function is however special since when - the `refcount` drops to zero, it automatically frees the SDS string, and the `mySharedString` structure as well. - -Interactions with heap checkers ---- - -Because SDS returns pointers into the middle of memory chunks allocated with -`malloc`, heap checkers may have issues, however: - -* The popular Valgrind program will detect SDS strings are *possibly lost* memory and never as *definitely lost*, so it - is easy to tell if there is a leak or not. I used Valgrind with Redis for years and every real leak was consistently - detected as "definitely lost". -* OSX instrumentation tools don't detect SDS strings as leaks but are able to correctly handle pointers pointing to the - middle of memory chunks. - -Zero copy append from syscalls ----- - -At this point you should have all the tools to dig more inside the SDS library by reading the source code, however there -is an interesting pattern you can mount using the low level API exported, that is used inside Redis in order to improve -performances of the networking code. - -Using `sdsIncrLen()` and `sdsMakeRoomFor()` it is possible to mount the following schema, to cat bytes coming from the -kernel to the end of an sds string without copying into an intermediate buffer: - -```c -oldlen = sdslen(s); -s = sdsMakeRoomFor(s, BUFFER_SIZE); -nread = read(fd, s+oldlen, BUFFER_SIZE); -... check for nread <= 0 and handle it ... -sdsIncrLen(s, nread); -``` - -`sdsIncrLen` is documented inside the source code of `sds.c`. - -Embedding SDS into your project -=== - -This is as simple as copying the `sds.c` and `sds.h` files inside your project. 
The source code is small and every C99 -compiler should deal with it without issues. - -Credits and license -=== - -SDS was created by Salvatore Sanfilippo and is released under the BSD two-clause license. See the LICENSE file in this -source distribution for more information. diff --git a/srcs/libs/docs/sds.pdf b/srcs/libs/docs/sds.pdf new file mode 100644 index 0000000..8dba0c9 Binary files /dev/null and b/srcs/libs/docs/sds.pdf differ diff --git a/srcs/libs/docs/userguide.txt b/srcs/libs/docs/userguide.txt deleted file mode 100644 index 693cff1..0000000 --- a/srcs/libs/docs/userguide.txt +++ /dev/null @@ -1,1903 +0,0 @@ -uthash User Guide -================= -Troy D. Hanson, Arthur O'Dwyer -v2.3.0, February 2021 - -To download uthash, follow this link back to the -https://github.com/troydhanson/uthash[GitHub project page]. -Back to my http://troydhanson.github.io/[other projects]. - -A hash in C ------------ -This document is written for C programmers. Since you're reading this, chances -are that you know a hash is used for looking up items using a key. In scripting -languages, hashes or "dictionaries" are used all the time. In C, hashes don't -exist in the language itself. This software provides a hash table for C -structures. - -What can it do? -~~~~~~~~~~~~~~~~~ -This software supports these operations on items in a hash table: - -1. add/replace -2. find -3. delete -4. count -5. iterate -6. sort - -Is it fast? -~~~~~~~~~~~ -Add, find and delete are normally constant-time operations. This is influenced -by your key domain and the hash function. - -This hash aims to be minimalistic and efficient. It's around 1000 lines of C. -It inlines automatically because it's implemented as macros. It's fast as long -as the hash function is suited to your keys. You can use the default hash -function, or easily compare performance and choose from among several other -<>. - -Is it a library? -~~~~~~~~~~~~~~~~ -No, it's just a single header file: `uthash.h`.
All you need to do is copy -the header file into your project, and: - - #include "uthash.h" - -Since uthash is a header file only, there is no library code to link against. - -C/C++ and platforms -~~~~~~~~~~~~~~~~~~~ -This software can be used in C and C++ programs. It has been tested on: - - * Linux - * Windows using Visual Studio 2008 and 2010 - * Solaris - * OpenBSD - * FreeBSD - * Android - -Test suite -^^^^^^^^^^ -To run the test suite, enter the `tests` directory. Then, - - * on Unix platforms, run `make` - * on Windows, run the "do_tests_win32.cmd" batch file. (You may edit the - batch file if your Visual Studio is installed in a non-standard location). - -BSD licensed -~~~~~~~~~~~~ -This software is made available under the -link:license.html[revised BSD license]. -It is free and open source. - -Download uthash -~~~~~~~~~~~~~~~ -Follow the links on https://github.com/troydhanson/uthash to clone uthash or get a zip file. - -Getting help -~~~~~~~~~~~~ -Please use the https://groups.google.com/d/forum/uthash[uthash Google Group] to -ask questions. You can email it at uthash@googlegroups.com. - -Contributing -~~~~~~~~~~~~ -You may submit pull requests through GitHub. However, the maintainers of uthash -value keeping it unchanged, rather than adding bells and whistles. - -Extras included -~~~~~~~~~~~~~~~ -Three "extras" come with uthash. These provide lists, dynamic arrays and -strings: - - * link:utlist.html[utlist.h] provides linked list macros for C structures. - * link:utarray.html[utarray.h] implements dynamic arrays using macros. - * link:utstring.html[utstring.h] implements a basic dynamic string. - -History -~~~~~~~ -I wrote uthash in 2004-2006 for my own purposes. Originally it was hosted on -SourceForge. Uthash was downloaded around 30,000 times between 2006-2013 then -transitioned to GitHub. It's been incorporated into commercial software, -academic research, and into other open-source software. 
It has also been added -to the native package repositories for a number of Unix-y distros. - -When uthash was written, there were fewer options for doing generic hash tables -in C than exist today. There are faster hash tables, more memory-efficient hash -tables, with very different API's today. But, like driving a minivan, uthash is -convenient, and gets the job done for many purposes. - -As of July 2016, uthash is maintained by Arthur O'Dwyer. - -Your structure --------------- - -In uthash, a hash table is comprised of structures. Each structure represents a -key-value association. One or more of the structure fields constitute the key. -The structure pointer itself is the value. - -.Defining a structure that can be hashed ----------------------------------------------------------------------- -#include "uthash.h" - -struct my_struct { - int id; /* key */ - char name[10]; - UT_hash_handle hh; /* makes this structure hashable */ -}; ----------------------------------------------------------------------- - -Note that, in uthash, your structure will never be moved or copied into another -location when you add it into a hash table. This means that you can keep other -data structures that safely point to your structure-- regardless of whether you -add or delete it from a hash table during your program's lifetime. - -The key -~~~~~~~ -There are no restrictions on the data type or name of the key field. The key -can also comprise multiple contiguous fields, having any names and data types. - -.Any data type... really? -***************************************************************************** -Yes, your key and structure can have any data type. Unlike function calls with -fixed prototypes, uthash consists of macros-- whose arguments are untyped-- and -thus able to work with any type of structure or key. -***************************************************************************** - -Unique keys -^^^^^^^^^^^ -As with any hash, every item must have a unique key. 
Your application must -enforce key uniqueness. Before you add an item to the hash table, you must -first know (if in doubt, check!) that the key is not already in use. You -can check whether a key already exists in the hash table using `HASH_FIND`. - -The hash handle -~~~~~~~~~~~~~~~ -The `UT_hash_handle` field must be present in your structure. It is used for -the internal bookkeeping that makes the hash work. It does not require -initialization. It can be named anything, but you can simplify matters by -naming it `hh`. This allows you to use the easier "convenience" macros to add, -find and delete items. - -A word about memory -~~~~~~~~~~~~~~~~~~~ - -Overhead -^^^^^^^^ -The hash handle consumes about 32 bytes per item on a 32-bit system, or 56 bytes -per item on a 64-bit system. The other overhead costs-- the buckets and the -table-- are negligible in comparison. You can use `HASH_OVERHEAD` to get the -overhead size, in bytes, for a hash table. See <>. - -How clean up occurs -^^^^^^^^^^^^^^^^^^^ -Some have asked how uthash cleans up its internal memory. The answer is simple: -'when you delete the final item' from a hash table, uthash releases all the -internal memory associated with that hash table, and sets its pointer to NULL. - - -Hash operations ---------------- - -This section introduces the uthash macros by example. For a more succinct -listing, see <>. - -.Convenience vs. general macros: -***************************************************************************** -The uthash macros fall into two categories. The 'convenience' macros can be used -with integer, pointer or string keys (and require that you chose the conventional -name `hh` for the `UT_hash_handle` field). The convenience macros take fewer -arguments than the general macros, making their usage a bit simpler for these -common types of keys. - -The 'general' macros can be used for any types of keys, or for multi-field keys, -or when the `UT_hash_handle` has been named something other than `hh`. 
These -macros take more arguments and offer greater flexibility in return. But if the -convenience macros suit your needs, use them-- your code will be more readable. -***************************************************************************** - -Declare the hash -~~~~~~~~~~~~~~~~ -Your hash must be declared as a `NULL`-initialized pointer to your structure. - - struct my_struct *users = NULL; /* important! initialize to NULL */ - -Add item -~~~~~~~~ -Allocate and initialize your structure as you see fit. The only aspect -of this that matters to uthash is that your key must be initialized to -a unique value. Then call `HASH_ADD`. (Here we use the convenience macro -`HASH_ADD_INT`, which offers simplified usage for keys of type `int`). - -.Add an item to a hash ----------------------------------------------------------------------- -void add_user(int user_id, char *name) { - struct my_struct *s; - - s = malloc(sizeof(struct my_struct)); - s->id = user_id; - strcpy(s->name, name); - HASH_ADD_INT(users, id, s); /* id: name of key field */ -} ----------------------------------------------------------------------- - -The first parameter to `HASH_ADD_INT` is the hash table, and the -second parameter is the 'name' of the key field. Here, this is `id`. The -last parameter is a pointer to the structure being added. - -[[validc]] -.Wait.. the field name is a parameter? -******************************************************************************* -If you find it strange that `id`, which is the 'name of a field' in the -structure, can be passed as a parameter... welcome to the world of macros. Don't -worry; the C preprocessor expands this to valid C code. -******************************************************************************* - -Key must not be modified while in-use -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Once a structure has been added to the hash, do not change the value of its key. -Instead, delete the item from the hash, change the key, and then re-add it. 
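The reason behind this rule can be illustrated with a toy table. The sketch below is not uthash and does not reflect its internals; `toy_item`, `toy_add`, `toy_find`, `toy_del` and `toy_rekey` are hypothetical names invented for this illustration. It assumes a fixed number of buckets chosen by `id % NBUCKETS`. An item is filed under the bucket computed from the key it had when it was added, so changing the key in place leaves it filed in the wrong place, and lookups by either the old or the new key miss it.

```c
#include <stddef.h>

#define NBUCKETS 8

/* Hypothetical toy item: a stand-in for a hashed structure, not uthash's layout. */
struct toy_item {
    int id;                 /* key */
    struct toy_item *next;  /* chain of items sharing one bucket */
};

static struct toy_item *buckets[NBUCKETS];

/* Bucket index derived from the key value. */
static unsigned toy_bucket(int id) { return (unsigned)id % NBUCKETS; }

/* File the item under the bucket of its current key. */
void toy_add(struct toy_item *it) {
    unsigned b = toy_bucket(it->id);
    it->next = buckets[b];
    buckets[b] = it;
}

/* Search only the bucket that the requested key maps to. */
struct toy_item *toy_find(int id) {
    struct toy_item *it;
    for (it = buckets[toy_bucket(id)]; it != NULL; it = it->next)
        if (it->id == id) return it;
    return NULL;
}

/* Unlink the item from the bucket of its current key. */
void toy_del(struct toy_item *it) {
    struct toy_item **pp = &buckets[toy_bucket(it->id)];
    while (*pp != NULL && *pp != it) pp = &(*pp)->next;
    if (*pp == it) *pp = it->next;
}

/* Safe re-keying: delete, change the key, then re-add. */
void toy_rekey(struct toy_item *it, int new_id) {
    toy_del(it);
    it->id = new_id;
    toy_add(it);
}
```

With uthash itself, the same delete, change, re-add sequence would be written with `HASH_DEL` followed by `HASH_ADD_INT` on the structure whose key changed.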
- -Checking uniqueness -^^^^^^^^^^^^^^^^^^^ -In the example above, we didn't check to see if `user_id` was already a key -of some existing item in the hash. *If there's any chance that duplicate keys -could be generated by your program, you must explicitly check the uniqueness* -before adding the key to the hash. If the key is already in the hash, you can -simply modify the existing structure in the hash rather than adding the item. -'It is an error to add two items with the same key to the hash table'. - -Let's rewrite the `add_user` function to check whether the id is in the hash. -Only if the id is not present in the hash, do we create the item and add it. -Otherwise we just modify the structure that already exists. - - void add_user(int user_id, char *name) { - struct my_struct *s; - - HASH_FIND_INT(users, &user_id, s); /* id already in the hash? */ - if (s == NULL) { - s = (struct my_struct *)malloc(sizeof *s); - s->id = user_id; - HASH_ADD_INT(users, id, s); /* id: name of key field */ - } - strcpy(s->name, name); - } - - -Why doesn't uthash check key uniqueness for you? It saves the cost of a hash -lookup for those programs which don't need it- for example, programs whose keys -are generated by an incrementing, non-repeating counter. - -However, if replacement is a common operation, it is possible to use the -`HASH_REPLACE` macro. This macro, before adding the item, will try to find an -item with the same key and delete it first. It also returns a pointer to the -replaced item, so the user has a chance to de-allocate its memory. - -Passing the hash pointer into functions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -In the example above `users` is a global variable, but what if the caller wanted -to pass the hash pointer 'into' the `add_user` function? At first glance it would -appear that you could simply pass `users` as an argument, but that won't work -right. - - /* bad */ - void add_user(struct my_struct *users, int user_id, char *name) { - ... 
- HASH_ADD_INT(users, id, s); - } - -You really need to pass 'a pointer' to the hash pointer: - - /* good */ - void add_user(struct my_struct **users, int user_id, char *name) { ... - ... - HASH_ADD_INT(*users, id, s); - } - -Note that we dereferenced the pointer in the `HASH_ADD` also. - -The reason it's necessary to deal with a pointer to the hash pointer is simple: -the hash macros modify it (in other words, they modify the 'pointer itself' not -just what it points to). - - -Replace item -~~~~~~~~~~~~ -`HASH_REPLACE` macros are equivalent to HASH_ADD macros except they attempt -to find and delete the item first. If it finds and deletes an item, it will -also return that items pointer as an output parameter. - - -Find item -~~~~~~~~~ -To look up a structure in a hash, you need its key. Then call `HASH_FIND`. -(Here we use the convenience macro `HASH_FIND_INT` for keys of type `int`). - -.Find a structure using its key ----------------------------------------------------------------------- -struct my_struct *find_user(int user_id) { - struct my_struct *s; - - HASH_FIND_INT(users, &user_id, s); /* s: output pointer */ - return s; -} ----------------------------------------------------------------------- - -Here, the hash table is `users`, and `&user_id` points to the key (an integer -in this case). Last, `s` is the 'output' variable of `HASH_FIND_INT`. The -final result is that `s` points to the structure with the given key, or -is `NULL` if the key wasn't found in the hash. - -[NOTE] -The middle argument is a 'pointer' to the key. You can't pass a literal key -value to `HASH_FIND`. Instead assign the literal value to a variable, and pass -a pointer to the variable. - - -Delete item -~~~~~~~~~~~ -To delete a structure from a hash, you must have a pointer to it. (If you only -have the key, first do a `HASH_FIND` to get the structure pointer). 
- -.Delete an item from a hash ----------------------------------------------------------------------- -void delete_user(struct my_struct *user) { - HASH_DEL(users, user); /* user: pointer to deletee */ - free(user); /* optional; it's up to you! */ -} ----------------------------------------------------------------------- - -Here again, `users` is the hash table, and `user` is a pointer to the -structure we want to remove from the hash. - -uthash never frees your structure -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Deleting a structure just removes it from the hash table-- it doesn't `free` -it. The choice of when to free your structure is entirely up to you; uthash -will never free your structure. For example when using `HASH_REPLACE` macros, -a replaced output argument is returned back, in order to make it possible for -the user to de-allocate it. - -Delete can change the pointer -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The hash table pointer (which initially points to the first item added to the -hash) can change in response to `HASH_DEL` (i.e. if you delete the first item -in the hash table). - -Iterative deletion -^^^^^^^^^^^^^^^^^^ -The `HASH_ITER` macro is a deletion-safe iteration construct which expands -to a simple 'for' loop. - -.Delete all items from a hash ----------------------------------------------------------------------- -void delete_all() { - struct my_struct *current_user, *tmp; - - HASH_ITER(hh, users, current_user, tmp) { - HASH_DEL(users, current_user); /* delete; users advances to next */ - free(current_user); /* optional- if you want to free */ - } -} ----------------------------------------------------------------------- - -All-at-once deletion -^^^^^^^^^^^^^^^^^^^^ -If you only want to delete all the items, but not free them or do any -per-element clean up, you can do this more efficiently in a single operation: - - HASH_CLEAR(hh, users); - -Afterward, the list head (here, `users`) will be set to `NULL`. 
- -Count items -~~~~~~~~~~~ - -The number of items in the hash table can be obtained using `HASH_COUNT`: - -.Count of items in the hash table ----------------------------------------------------------------------- -unsigned int num_users; -num_users = HASH_COUNT(users); -printf("there are %u users\n", num_users); ----------------------------------------------------------------------- - -Incidentally, this works even if the list head (here, `users`) is `NULL`, in -which case the count is 0. - -Iterating and sorting -~~~~~~~~~~~~~~~~~~~~~ - -You can loop over the items in the hash by starting from the beginning and -following the `hh.next` pointer. - -.Iterating over all the items in a hash ----------------------------------------------------------------------- -void print_users() { - struct my_struct *s; - - for (s = users; s != NULL; s = s->hh.next) { - printf("user id %d: name %s\n", s->id, s->name); - } -} ----------------------------------------------------------------------- - -There is also an `hh.prev` pointer you could use to iterate backwards through -the hash, starting from any known item. - -[[deletesafe]] -Deletion-safe iteration -^^^^^^^^^^^^^^^^^^^^^^^ -In the example above, it would not be safe to delete and free `s` in the body -of the 'for' loop, (because `s` is dereferenced each time the loop iterates). -This is easy to rewrite correctly (by copying the `s->hh.next` pointer to a -temporary variable 'before' freeing `s`), but it comes up often enough that a -deletion-safe iteration macro, `HASH_ITER`, is included. It expands to a -`for`-loop header. Here is how it could be used to rewrite the last example: - - struct my_struct *s, *tmp; - - HASH_ITER(hh, users, s, tmp) { - printf("user id %d: name %s\n", s->id, s->name); - /* ... it is safe to delete and free s here */ - } - -.A hash is also a doubly-linked list. 
-******************************************************************************* -Iterating backward and forward through the items in the hash is possible -because of the `hh.prev` and `hh.next` fields. All the items in the hash can -be reached by repeatedly following these pointers, thus the hash is also a -doubly-linked list. -******************************************************************************* - -If you're using uthash in a C++ program, you need an extra cast on the `for` -iterator, e.g., `s = static_cast<my_struct*>(s->hh.next)`. - -Sorting -^^^^^^^ -The items in the hash are visited in "insertion order" when you follow the -`hh.next` pointer. You can sort the items into a new order using `HASH_SORT`. - - HASH_SORT(users, name_sort); - -The second argument is a pointer to a comparison function. It must accept two -pointer arguments (the items to compare), and must return an `int` which is -less than zero, zero, or greater than zero, if the first item sorts before, -equal to, or after the second item, respectively. (This is the same convention -used by `strcmp` or `qsort` in the standard C library). - - int sort_function(void *a, void *b) { - /* compare a to b (cast a and b appropriately) - * return (int) -1 if (a < b) - * return (int) 0 if (a == b) - * return (int) 1 if (a > b) - */ - } - -Below, `name_sort` and `id_sort` are two examples of sort functions. - -.Sorting the items in the hash ----------------------------------------------------------------------- -int name_sort(struct my_struct *a, struct my_struct *b) { - return strcmp(a->name, b->name); -} - -int id_sort(struct my_struct *a, struct my_struct *b) { - return (a->id - b->id); -} - -void sort_by_name() { - HASH_SORT(users, name_sort); -} - -void sort_by_id() { - HASH_SORT(users, id_sort); -} ----------------------------------------------------------------------- - -When the items in the hash are sorted, the first item may change position.
In -the example above, `users` may point to a different structure after calling -`HASH_SORT`. - -A complete example -~~~~~~~~~~~~~~~~~~ - -We'll repeat all the code and embellish it with a `main()` function to form a -working example. - -If this code was placed in a file called `example.c` in the same directory as -`uthash.h`, it could be compiled and run like this: - - cc -o example example.c - ./example - -Follow the prompts to try the program. - -.A complete program ----------------------------------------------------------------------- -#include <stdio.h> /* gets */ -#include <stdlib.h> /* atoi, malloc */ -#include <string.h> /* strcpy */ -#include "uthash.h" - -struct my_struct { - int id; /* key */ - char name[10]; - UT_hash_handle hh; /* makes this structure hashable */ -}; - -struct my_struct *users = NULL; - -void add_user(int user_id, char *name) { - struct my_struct *s; - - HASH_FIND_INT(users, &user_id, s); /* id already in the hash? */ - if (s == NULL) { - s = (struct my_struct *)malloc(sizeof *s); - s->id = user_id; - HASH_ADD_INT(users, id, s); /* id: name of key field */ - } - strcpy(s->name, name); -} - -struct my_struct *find_user(int user_id) { - struct my_struct *s; - - HASH_FIND_INT(users, &user_id, s); /* s: output pointer */ - return s; -} - -void delete_user(struct my_struct *user) { - HASH_DEL(users, user); /* user: pointer to deletee */ - free(user); -} - -void delete_all() { - struct my_struct *current_user, *tmp; - - HASH_ITER(hh, users, current_user, tmp) { - HASH_DEL(users, current_user); /* delete it (users advances to next) */ - free(current_user); /* free it */ - } -} - -void print_users() { - struct my_struct *s; - - for (s = users; s != NULL; s = (struct my_struct*)(s->hh.next)) { - printf("user id %d: name %s\n", s->id, s->name); - } -} - -int name_sort(struct my_struct *a, struct my_struct *b) { - return strcmp(a->name, b->name); -} - -int id_sort(struct my_struct *a, struct my_struct *b) { - return (a->id - b->id); -} - -void sort_by_name() { -
HASH_SORT(users, name_sort); -} - -void sort_by_id() { - HASH_SORT(users, id_sort); -} - -int main(int argc, char *argv[]) { - char in[10]; - int id = 1, running = 1; - struct my_struct *s; - unsigned num_users; - - while (running) { - printf(" 1. add user\n"); - printf(" 2. add/rename user by id\n"); - printf(" 3. find user\n"); - printf(" 4. delete user\n"); - printf(" 5. delete all users\n"); - printf(" 6. sort items by name\n"); - printf(" 7. sort items by id\n"); - printf(" 8. print users\n"); - printf(" 9. count users\n"); - printf("10. quit\n"); - gets(in); - switch(atoi(in)) { - case 1: - printf("name?\n"); - add_user(id++, gets(in)); - break; - case 2: - printf("id?\n"); - gets(in); id = atoi(in); - printf("name?\n"); - add_user(id, gets(in)); - break; - case 3: - printf("id?\n"); - s = find_user(atoi(gets(in))); - printf("user: %s\n", s ? s->name : "unknown"); - break; - case 4: - printf("id?\n"); - s = find_user(atoi(gets(in))); - if (s) delete_user(s); - else printf("id unknown\n"); - break; - case 5: - delete_all(); - break; - case 6: - sort_by_name(); - break; - case 7: - sort_by_id(); - break; - case 8: - print_users(); - break; - case 9: - num_users = HASH_COUNT(users); - printf("there are %u users\n", num_users); - break; - case 10: - running = 0; - break; - } - } - - delete_all(); /* free any structures */ - return 0; -} ----------------------------------------------------------------------- - -This program is included in the distribution in `tests/example.c`. You can run -`make example` in that directory to compile it easily. - -Standard key types ------------------- -This section goes into specifics of how to work with different kinds of keys. -You can use nearly any type of key-- integers, strings, pointers, structures, etc. - -[NOTE] -.A note about float -================================================================================ -You can use floating point keys. 
This comes with the same caveats as with any -program that tests floating point equality. In other words, even the tiniest -difference in two floating point numbers makes them distinct keys. -================================================================================ - -Integer keys -~~~~~~~~~~~~ -The preceding examples demonstrated use of integer keys. To recap, use the -convenience macros `HASH_ADD_INT` and `HASH_FIND_INT` for structures with -integer keys. (The other operations such as `HASH_DELETE` and `HASH_SORT` are -the same for all types of keys). - -String keys -~~~~~~~~~~~ -If your structure has a string key, the operations to use depend on whether your -structure 'points to' the key (`char *`) or the string resides `within` the -structure (`char a[10]`). *This distinction is important*. As we'll see below, -you need to use `HASH_ADD_KEYPTR` when your structure 'points' to a key (that is, -the key itself is 'outside' of the structure); in contrast, use `HASH_ADD_STR` -for a string key that is contained *within* your structure. - -[NOTE] -.char[ ] vs. char* -================================================================================ -The string is 'within' the structure in the first example below-- `name` is a -`char[10]` field. In the second example, the key is 'outside' of the -structure-- `name` is a `char *`. So the first example uses `HASH_ADD_STR` but -the second example uses `HASH_ADD_KEYPTR`. For information on this macro, see -the <>. 
-================================================================================ - -String 'within' structure -^^^^^^^^^^^^^^^^^^^^^^^^^ - -.A string-keyed hash (string within structure) ----------------------------------------------------------------------- -#include <string.h> /* strcpy */ -#include <stdlib.h> /* malloc */ -#include <stdio.h> /* printf */ -#include "uthash.h" - -struct my_struct { - char name[10]; /* key (string is WITHIN the structure) */ - int id; - UT_hash_handle hh; /* makes this structure hashable */ -}; - - -int main(int argc, char *argv[]) { - const char *names[] = { "joe", "bob", "betty", NULL }; - struct my_struct *s, *tmp, *users = NULL; - - for (int i = 0; names[i]; ++i) { - s = (struct my_struct *)malloc(sizeof *s); - strcpy(s->name, names[i]); - s->id = i; - HASH_ADD_STR(users, name, s); - } - - HASH_FIND_STR(users, "betty", s); - if (s) printf("betty's id is %d\n", s->id); - - /* free the hash table contents */ - HASH_ITER(hh, users, s, tmp) { - HASH_DEL(users, s); - free(s); - } - return 0; -} ----------------------------------------------------------------------- - -This example is included in the distribution in `tests/test15.c`.
It prints: - - betty's id is 2 - -String 'pointer' in structure -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Now, here is the same example but using a `char *` key instead of `char [ ]`: - -.A string-keyed hash (structure points to string) ----------------------------------------------------------------------- -#include <string.h> /* strcpy */ -#include <stdlib.h> /* malloc */ -#include <stdio.h> /* printf */ -#include "uthash.h" - -struct my_struct { - const char *name; /* key */ - int id; - UT_hash_handle hh; /* makes this structure hashable */ -}; - - -int main(int argc, char *argv[]) { - const char *names[] = { "joe", "bob", "betty", NULL }; - struct my_struct *s, *tmp, *users = NULL; - - for (int i = 0; names[i]; ++i) { - s = (struct my_struct *)malloc(sizeof *s); - s->name = names[i]; - s->id = i; - HASH_ADD_KEYPTR(hh, users, s->name, strlen(s->name), s); - } - - HASH_FIND_STR(users, "betty", s); - if (s) printf("betty's id is %d\n", s->id); - - /* free the hash table contents */ - HASH_ITER(hh, users, s, tmp) { - HASH_DEL(users, s); - free(s); - } - return 0; -} ----------------------------------------------------------------------- - -This example is included in `tests/test40.c`. - -Pointer keys -~~~~~~~~~~~~ -Your key can be a pointer. To be very clear, this means the 'pointer itself' -can be the key (in contrast, if the thing 'pointed to' is the key, this is a -different use case handled by `HASH_ADD_KEYPTR`). - -Here is a simple example where a structure has a pointer member, called `key`.
- -.A pointer key ----------------------------------------------------------------------- -#include <stdio.h> -#include <stdlib.h> -#include "uthash.h" - -typedef struct { - void *key; - int i; - UT_hash_handle hh; -} el_t; - -el_t *hash = NULL; -char *someaddr = NULL; - -int main() { - el_t *d; - el_t *e = (el_t *)malloc(sizeof *e); - if (!e) return -1; - e->key = (void*)someaddr; - e->i = 1; - HASH_ADD_PTR(hash, key, e); - HASH_FIND_PTR(hash, &someaddr, d); - if (d) printf("found\n"); - - /* release memory */ - HASH_DEL(hash, e); - free(e); - return 0; -} ----------------------------------------------------------------------- - -This example is included in `tests/test57.c`. Note that the end of the program -deletes the element out of the hash, (and since no more elements remain in the -hash), uthash releases its internal memory. - -Structure keys -~~~~~~~~~~~~~~ -Your key field can have any data type. To uthash, it is just a sequence of -bytes. Therefore, even a nested structure can be used as a key. We'll use the -general macros `HASH_ADD` and `HASH_FIND` to demonstrate. - -NOTE: Structures contain padding (wasted internal space used to fulfill -alignment requirements for the members of the structure). These padding bytes -'must be zeroed' before adding an item to the hash or looking up an item. -Therefore always zero the whole structure before setting the members of -interest. The example below does this-- see the two calls to `memset`. - -.A key which is a structure ----------------------------------------------------------------------- -#include <stdlib.h> -#include <stdio.h> -#include "uthash.h" - -typedef struct { - char a; - int b; -} record_key_t; - -typedef struct { - record_key_t key; - /* ... other data ...
*/ - UT_hash_handle hh; -} record_t; - -int main(int argc, char *argv[]) { - record_t l, *p, *r, *tmp, *records = NULL; - - r = (record_t *)malloc(sizeof *r); - memset(r, 0, sizeof *r); - r->key.a = 'a'; - r->key.b = 1; - HASH_ADD(hh, records, key, sizeof(record_key_t), r); - - memset(&l, 0, sizeof(record_t)); - l.key.a = 'a'; - l.key.b = 1; - HASH_FIND(hh, records, &l.key, sizeof(record_key_t), p); - - if (p) printf("found %c %d\n", p->key.a, p->key.b); - - HASH_ITER(hh, records, p, tmp) { - HASH_DEL(records, p); - free(p); - } - return 0; -} - ----------------------------------------------------------------------- - -This usage is nearly the same as use of a compound key explained below. - -Note that the general macros require the name of the `UT_hash_handle` to be -passed as the first argument (here, this is `hh`). The general macros are -documented in <>. - -Advanced Topics ---------------- - -Compound keys -~~~~~~~~~~~~~ -Your key can even comprise multiple contiguous fields. - -.A multi-field key ----------------------------------------------------------------------- -#include <stdlib.h> /* malloc */ -#include <stddef.h> /* offsetof */ -#include <stdio.h> /* printf */ -#include <string.h> /* memset */ -#include "uthash.h" - -#define UTF32 1 - -typedef struct { - UT_hash_handle hh; - int len; - char encoding; /* these two fields */ - int text[]; /* comprise the key */ -} msg_t; - -typedef struct { - char encoding; - int text[]; -} lookup_key_t; - -int main(int argc, char *argv[]) { - unsigned keylen; - msg_t *msg, *tmp, *msgs = NULL; - lookup_key_t *lookup_key; - - int beijing[] = {0x5317, 0x4eac}; /* UTF-32LE for 北京 */ - - /* allocate and initialize our structure */ - msg = (msg_t *)malloc(sizeof(msg_t) + sizeof(beijing)); - memset(msg, 0, sizeof(msg_t)+sizeof(beijing)); /* zero fill */ - msg->len = sizeof(beijing); - msg->encoding = UTF32; - memcpy(msg->text, beijing, sizeof(beijing)); - - /* calculate the key length including padding, using formula */ - keylen = offsetof(msg_t, text) /* offset of
last key field */ - + sizeof(beijing) /* size of last key field */ - - offsetof(msg_t, encoding); /* offset of first key field */ - - /* add our structure to the hash table */ - HASH_ADD(hh, msgs, encoding, keylen, msg); - - /* look it up to prove that it worked :-) */ - msg = NULL; - - lookup_key = (lookup_key_t *)malloc(sizeof(*lookup_key) + sizeof(beijing)); - memset(lookup_key, 0, sizeof(*lookup_key) + sizeof(beijing)); - lookup_key->encoding = UTF32; - memcpy(lookup_key->text, beijing, sizeof(beijing)); - HASH_FIND(hh, msgs, &lookup_key->encoding, keylen, msg); - if (msg) printf("found \n"); - free(lookup_key); - - HASH_ITER(hh, msgs, msg, tmp) { - HASH_DEL(msgs, msg); - free(msg); - } - return 0; -} ----------------------------------------------------------------------- - -This example is included in the distribution in `tests/test22.c`. - -If you use multi-field keys, recognize that the compiler pads adjacent fields -(by inserting unused space between them) in order to fulfill the alignment -requirement of each field. For example a structure containing a `char` followed -by an `int` will normally have 3 "wasted" bytes of padding after the char, in -order to make the `int` field start on a multiple-of-4 address (4 is the length -of the int). - -[[multifield_note]] -.Calculating the length of a multi-field key: -******************************************************************************* -To determine the key length when using a multi-field key, you must include any -intervening structure padding the compiler adds for alignment purposes. - -An easy way to calculate the key length is to use the `offsetof` macro from -``. The formula is: - - key length = offsetof(last_key_field) - + sizeof(last_key_field) - - offsetof(first_key_field) - -In the example above, the `keylen` variable is set using this formula. 
-*******************************************************************************
-
-When dealing with a multi-field key, you must zero-fill your structure before
-`HASH_ADD`'ing it to a hash table, or using its fields in a `HASH_FIND` key.
-
-In the previous example, `memset` is used to initialize the structure by
-zero-filling it. This zeroes out any padding between the key fields. If we
-didn't zero-fill the structure, this padding would contain indeterminate values.
-Those values would lead to `HASH_FIND` failures, because two "identical" keys
-appear to mismatch if there are any differences within their padding.
-
-Alternatively, you can customize the global <<hash_keycompare,key comparison function>>
-and <<hash_functions,key hashing function>> to ignore the padding in your key.
-See <<hash_keycompare>>.
-
-[[multilevel]]
-Multi-level hash tables
-~~~~~~~~~~~~~~~~~~~~~~~
-A multi-level hash table arises when each element of a hash table contains its
-own secondary hash table. There can be any number of levels. In a scripting
-language you might see:
-
-    $items{bob}{age}=37
-
-The C program below builds this example in uthash: the hash table is called
-`items`. It contains one element (`bob`) whose own hash table contains one
-element (`age`) with value 37. No special functions are necessary to build
-a multi-level hash table.
-
-While this example represents both levels (`bob` and `age`) using the same
-structure, it would also be fine to use two different structure definitions.
-It would also be fine if there were three or more levels instead of two.
-
-.Multi-level hash table
-----------------------------------------------------------------------
-#include <stdio.h>
-#include <string.h>
-#include <stdlib.h>
-#include "uthash.h"
-
-/* hash of hashes */
-typedef struct item {
-  char name[10];
-  struct item *sub;
-  int val;
-  UT_hash_handle hh;
-} item_t;
-
-item_t *items = NULL;
-
-int main(int argc, char *argv[]) {
-  item_t *item1, *item2, *tmp1, *tmp2;
-
-  /* make initial element */
-  item_t *i = malloc(sizeof(*i));
-  strcpy(i->name, "bob");
-  i->sub = NULL;
-  i->val = 0;
-  HASH_ADD_STR(items, name, i);
-
-  /* add a sub hash table off this element */
-  item_t *s = malloc(sizeof(*s));
-  strcpy(s->name, "age");
-  s->sub = NULL;
-  s->val = 37;
-  HASH_ADD_STR(i->sub, name, s);
-
-  /* iterate over hash elements */
-  HASH_ITER(hh, items, item1, tmp1) {
-    HASH_ITER(hh, item1->sub, item2, tmp2) {
-      printf("$items{%s}{%s} = %d\n", item1->name, item2->name, item2->val);
-    }
-  }
-
-  /* clean up both hash tables */
-  HASH_ITER(hh, items, item1, tmp1) {
-    HASH_ITER(hh, item1->sub, item2, tmp2) {
-      HASH_DEL(item1->sub, item2);
-      free(item2);
-    }
-    HASH_DEL(items, item1);
-    free(item1);
-  }
-
-  return 0;
-}
-----------------------------------------------------------------------
-The example above is included in `tests/test59.c`.
-
-[[multihash]]
-Items in several hash tables
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-A structure can be added to more than one hash table. A few reasons you might do
-this include:
-
-- each hash table may use a different key;
-- each hash table may have its own sort order;
-- or you might simply use multiple hash tables for grouping purposes. E.g.,
-  you could have users in an `admin_users` and a `users` hash table.
-
-Your structure needs to have a `UT_hash_handle` field for each hash table to
-which it might be added. You can name them anything.
E.g., - - UT_hash_handle hh1, hh2; - -Items with multiple keys -~~~~~~~~~~~~~~~~~~~~~~~~ -You might create a hash table keyed on an ID field, and another hash table keyed -on username (if usernames are unique). You can add the same user structure to -both hash tables (without duplication of the structure), allowing lookup of a -user structure by their name or ID. The way to achieve this is to have a -separate `UT_hash_handle` for each hash to which the structure may be added. - -.A structure with two different keys ----------------------------------------------------------------------- -struct my_struct { - int id; /* first key */ - char username[10]; /* second key */ - UT_hash_handle hh1; /* handle for first hash table */ - UT_hash_handle hh2; /* handle for second hash table */ -}; ----------------------------------------------------------------------- - -In the example above, the structure can now be added to two separate hash -tables. In one hash, `id` is its key, while in the other hash, `username` is -its key. (There is no requirement that the two hashes have different key -fields. They could both use the same key, such as `id`). - -Notice the structure has two hash handles (`hh1` and `hh2`). In the code -below, notice that each hash handle is used exclusively with a particular hash -table. (`hh1` is always used with the `users_by_id` hash, while `hh2` is -always used with the `users_by_name` hash table). 
- -.Two keys on a structure ----------------------------------------------------------------------- - struct my_struct *users_by_id = NULL, *users_by_name = NULL, *s; - int i; - char *name; - - s = malloc(sizeof(struct my_struct)); - s->id = 1; - strcpy(s->username, "thanson"); - - /* add the structure to both hash tables */ - HASH_ADD(hh1, users_by_id, id, sizeof(int), s); - HASH_ADD(hh2, users_by_name, username, strlen(s->username), s); - - /* find user by ID in the "users_by_id" hash table */ - i = 1; - HASH_FIND(hh1, users_by_id, &i, sizeof(int), s); - if (s) printf("found id %d: %s\n", i, s->username); - - /* find user by username in the "users_by_name" hash table */ - name = "thanson"; - HASH_FIND(hh2, users_by_name, name, strlen(name), s); - if (s) printf("found user %s: %d\n", name, s->id); ----------------------------------------------------------------------- - - -Sorted insertion of new items -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -If you would like to maintain a sorted hash you have two options. The first -option is to use the HASH_SRT() macro, which will sort any unordered list in -'O(n log(n))'. This is the best strategy if you're just filling up a hash -table with items in random order with a single final HASH_SRT() operation -when all is done. Obviously, this won't do what you want if you need -the list to be in an ordered state at times between insertion of -items. You can use HASH_SRT() after every insertion operation, but that will -yield a computational complexity of 'O(n^2 log n)'. - -The second route you can take is via the in-order add and replace macros. 
-The `HASH_ADD_INORDER*` macros work just like their `HASH_ADD*` counterparts, but -with an additional comparison-function argument: - - int name_sort(struct my_struct *a, struct my_struct *b) { - return strcmp(a->name, b->name); - } - - HASH_ADD_KEYPTR_INORDER(hh, items, &item->name, strlen(item->name), item, name_sort); - -New items are sorted at insertion time in 'O(n)', thus resulting in a -total computational complexity of 'O(n^2)' for the creation of the hash -table with all items. -For in-order add to work, the list must be in an ordered state before -insertion of the new item. - -Several sort orders -~~~~~~~~~~~~~~~~~~~ -It comes as no surprise that two hash tables can have different sort orders, but -this fact can also be used advantageously to sort the 'same items' in several -ways. This is based on the ability to store a structure in several hash tables. - -Extending the previous example, suppose we have many users. We have added each -user structure to the `users_by_id` hash table and the `users_by_name` hash table. -(To reiterate, this is done without the need to have two copies of each structure.) -Now we can define two sort functions, then use `HASH_SRT`. - - int sort_by_id(struct my_struct *a, struct my_struct *b) { - if (a->id == b->id) return 0; - return (a->id < b->id) ? -1 : 1; - } - - int sort_by_name(struct my_struct *a, struct my_struct *b) { - return strcmp(a->username, b->username); - } - - HASH_SRT(hh1, users_by_id, sort_by_id); - HASH_SRT(hh2, users_by_name, sort_by_name); - -Now iterating over the items in `users_by_id` will traverse them in id-order -while, naturally, iterating over `users_by_name` will traverse them in -name-order. The items are fully forward-and-backward linked in each order. -So even for one set of users, we might store them in two hash tables to provide -easy iteration in two different sort orders. 
- -Bloom filter (faster misses) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Programs that generate a fair miss rate (`HASH_FIND` that result in `NULL`) may -benefit from the built-in Bloom filter support. This is disabled by default, -because programs that generate only hits would incur a slight penalty from it. -Also, programs that do deletes should not use the Bloom filter. While the -program would operate correctly, deletes diminish the benefit of the filter. -To enable the Bloom filter, simply compile with `-DHASH_BLOOM=n` like: - - -DHASH_BLOOM=27 - -where the number can be any value up to 32 which determines the amount of memory -used by the filter, as shown below. Using more memory makes the filter more -accurate and has the potential to speed up your program by making misses bail -out faster. - -.Bloom filter sizes for selected values of n -[width="50%",cols="10m,30",grid="none",options="header"] -|===================================================================== -| n | Bloom filter size (per hash table) -| 16 | 8 kilobytes -| 20 | 128 kilobytes -| 24 | 2 megabytes -| 28 | 32 megabytes -| 32 | 512 megabytes -|===================================================================== - -Bloom filters are only a performance feature; they do not change the results of -hash operations in any way. The only way to gauge whether or not a Bloom filter -is right for your program is to test it. Reasonable values for the size of the -Bloom filter are 16-32 bits. - -Select -~~~~~~ -An experimental 'select' operation is provided that inserts those items from a -source hash that satisfy a given condition into a destination hash. This -insertion is done with somewhat more efficiency than if this were using -`HASH_ADD`, namely because the hash function is not recalculated for keys of the -selected items. This operation does not remove any items from the source hash. -Rather the selected items obtain dual presence in both hashes. 
The destination
-hash may already have items in it; the selected items are added to it. In order
-for a structure to be usable with `HASH_SELECT`, it must have two or more hash
-handles. (As described <<multihash,here>>, a structure can exist in many
-hash tables at the same time; it must have a separate hash handle for each one).
-
-    typedef struct {
-      int id;
-      UT_hash_handle hh;  /* handle for users hash */
-      UT_hash_handle ah;  /* handle for admins hash */
-    } user_t;
-
-    user_t *users = NULL;   /* hash table of users */
-    user_t *admins = NULL;  /* hash table of admins */
-
-Now suppose we have added some users, and want to select just the administrator
-users who have IDs less than 1024.
-
-    #define is_admin(x) (((user_t*)x)->id < 1024)
-    HASH_SELECT(ah, admins, hh, users, is_admin);
-
-The first two parameters are the 'destination' hash handle and hash table, the
-second two parameters are the 'source' hash handle and hash table, and the last
-parameter is the 'select condition'. Here we used a macro `is_admin(x)` but we
-could just as well have used a function.
-
-    int is_admin(const void *userv) {
-      const user_t *user = (const user_t*)userv;
-      return (user->id < 1024) ? 1 : 0;
-    }
-
-If the select condition always evaluates to true, this operation is
-essentially a 'merge' of the source hash into the destination hash.
-
-`HASH_SELECT` adds items to the destination without removing them from
-the source; the source hash table remains unchanged. The destination hash table
-must not be the same as the source hash table.
-
-An example of using `HASH_SELECT` is included in `tests/test36.c`.
-
-[[hash_keycompare]]
-Specifying an alternate key comparison function
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-When you call `HASH_FIND(hh, head, intfield, sizeof(int), out)`, uthash will
-first call <<hash_functions,`HASH_FUNCTION`>>`(intfield, sizeof(int), hashvalue)` to
-determine the bucket `b` in which to search, and then, for each element `elt`
-of bucket `b`, uthash will evaluate
-`elt->hh.hashv == hashvalue && elt->hh.keylen == sizeof(int) && HASH_KEYCMP(intfield, elt->hh.key, sizeof(int)) == 0`.
-`HASH_KEYCMP` should return `0` to indicate that `elt` is a match and should be
-returned, and any non-zero value to indicate that the search for a matching
-element should continue.
-
-By default, uthash defines `HASH_KEYCMP` as an alias for `memcmp`. On platforms
-that do not provide `memcmp`, you can substitute your own implementation.
-
-----------------------------------------------------------------------------
-#undef HASH_KEYCMP
-#define HASH_KEYCMP(a,b,len) bcmp(a, b, len)
-----------------------------------------------------------------------------
-
-Another reason to substitute your own key comparison function is if your "key" is not
-trivially comparable. In this case you will also need to substitute your own `HASH_FUNCTION`.
-
-----------------------------------------------------------------------------
-struct Key {
-    short s;
-    /* 2 bytes of padding */
-    float f;
-};
-/* do not compare the padding bytes; do not use memcmp on floats */
-unsigned key_hash(struct Key *k) { return k->s + (unsigned)k->f; }
-bool key_equal(struct Key *a, struct Key *b) { return a->s == b->s && a->f == b->f; }
-
-#define HASH_FUNCTION(s,len,hashv) (hashv) = key_hash((struct Key *)s)
-#define HASH_KEYCMP(a,b,len) (!key_equal((struct Key *)a, (struct Key *)b))
-----------------------------------------------------------------------------
-
-Another reason to substitute your own key comparison function is to trade off
-correctness for raw speed.
During its linear search of a bucket, uthash always -compares the 32-bit `hashv` first, and calls `HASH_KEYCMP` only if the `hashv` -compares equal. This means that `HASH_KEYCMP` is called at least once per -successful find. Given a good hash function, we expect the `hashv` comparison to -produce a "false positive" equality only once in four billion times. Therefore, -we expect `HASH_KEYCMP` to produce `0` most of the time. If we expect many -successful finds, and our application doesn't mind the occasional false positive, -we might substitute a no-op comparison function: - ----------------------------------------------------------------------------- -#undef HASH_KEYCMP -#define HASH_KEYCMP(a,b,len) 0 /* occasionally wrong, but very fast */ ----------------------------------------------------------------------------- - -Note: The global equality-comparison function `HASH_KEYCMP` has no relationship -at all to the lessthan-comparison function passed as a parameter to `HASH_ADD_INORDER`. - -[[hash_functions]] -Built-in hash functions -~~~~~~~~~~~~~~~~~~~~~~~ -Internally, a hash function transforms a key into a bucket number. You don't -have to take any action to use the default hash function, currently Jenkins. - -Some programs may benefit from using another of the built-in hash functions. -There is a simple analysis utility included with uthash to help you determine -if another hash function will give you better performance. - -You can use a different hash function by compiling your program with -`-DHASH_FUNCTION=HASH_xyz` where `xyz` is one of the symbolic names listed -below. 
E.g., - - cc -DHASH_FUNCTION=HASH_BER -o program program.c - -.Built-in hash functions -[width="50%",cols="^5m,20",grid="none",options="header"] -|=============================================================================== -|Symbol | Name -|JEN | Jenkins (default) -|BER | Bernstein -|SAX | Shift-Add-Xor -|OAT | One-at-a-time -|FNV | Fowler/Noll/Vo -|SFH | Paul Hsieh -|=============================================================================== - -Which hash function is best? -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -You can easily determine the best hash function for your key domain. To do so, -you'll need to run your program once in a data-collection pass, and then run -the collected data through an included analysis utility. - -First you must build the analysis utility. From the top-level directory, - - cd tests/ - make - -We'll use `test14.c` to demonstrate the data-collection and analysis steps -(here using `sh` syntax to redirect file descriptor 3 to a file): - -.Using keystats --------------------------------------------------------------------------------- -% cc -DHASH_EMIT_KEYS=3 -I../src -o test14 test14.c -% ./test14 3>test14.keys -% ./keystats test14.keys -fcn ideal% #items #buckets dup% fl add_usec find_usec del-all usec ---- ------ ---------- ---------- ----- -- ---------- ---------- ------------ -SFH 91.6% 1219 256 0% ok 92 131 25 -FNV 90.3% 1219 512 0% ok 107 97 31 -SAX 88.7% 1219 512 0% ok 111 109 32 -OAT 87.2% 1219 256 0% ok 99 138 26 -JEN 86.7% 1219 256 0% ok 87 130 27 -BER 86.2% 1219 256 0% ok 121 129 27 --------------------------------------------------------------------------------- - -[NOTE] -The number 3 in `-DHASH_EMIT_KEYS=3` is a file descriptor. Any file descriptor -that your program doesn't use for its own purposes can be used instead of 3. -The data-collection mode enabled by `-DHASH_EMIT_KEYS=x` should not be used in -production code. - -Usually, you should just pick the first hash function that is listed. Here, this -is `SFH`. 
This is the function that provides the most even distribution for
-your keys. If several have the same `ideal%`, then choose the fastest one
-according to the `find_usec` column.
-
-keystats column reference
-^^^^^^^^^^^^^^^^^^^^^^^^^
-fcn::
-    symbolic name of hash function
-ideal%::
-    The percentage of items in the hash table which can be looked up within an
-    ideal number of steps. (Further explained below).
-#items::
-    the number of keys that were read in from the emitted key file
-#buckets::
-    the number of buckets in the hash after all the keys were added
-dup%::
-    the percent of duplicate keys encountered in the emitted key file.
-    Duplicate keys are filtered out to maintain key uniqueness. (Duplicates
-    are normal. For example, if the application adds an item to a hash,
-    deletes it, then re-adds it, the key is written twice to the emitted file.)
-flags::
-    this is either `ok`, or `nx` (noexpand) if the expansion inhibited flag is
-    set, described in <<expansion,Expansion internals>>. It is not recommended
-    to use a hash function that has the `noexpand` flag set.
-add_usec::
-    the clock time in microseconds required to add all the keys to a hash
-find_usec::
-    the clock time in microseconds required to look up every key in the hash
-del-all usec::
-    the clock time in microseconds required to delete every item in the hash
-
-[[ideal]]
-ideal%
-^^^^^^
-
-.What is ideal%?
-*****************************************************************************
-The 'n' items in a hash are distributed into 'k' buckets. Ideally each bucket
-would contain an equal share '(n/k)' of the items. In other words, the maximum
-linear position of any item in a bucket chain would be 'n/k' if every bucket is
-equally used. If some buckets are overused and others are underused, the
-overused buckets will contain items whose linear position surpasses 'n/k'.
-Such items are considered non-ideal.
-
-As you might guess, `ideal%` is the percentage of ideal items in the hash.
These
-items have favorable linear positions in their bucket chains. As `ideal%`
-approaches 100%, the hash table approaches constant-time lookup performance.
-*****************************************************************************
-
-[[hashscan]]
-hashscan
-~~~~~~~~
-NOTE: This utility is only available on Linux, and on FreeBSD (8.1 and up).
-
-A utility called `hashscan` is included in the `tests/` directory. It
-is built automatically when you run `make` in that directory. This tool
-examines a running process and reports on the uthash tables that it finds in
-that program's memory. It can also save the keys from each table in a format
-that can be fed into `keystats`.
-
-Here is an example of using `hashscan`. First ensure that it is built:
-
-    cd tests/
-    make
-
-Since `hashscan` needs a running program to inspect, we'll start up a simple
-program that makes a hash table and then sleeps as our test subject:
-
-    ./test_sleep &
-    pid: 9711
-
-Now that we have a test program, let's run `hashscan` on it:
-
-    ./hashscan 9711
-    Address            ideal    items  buckets mc fl bloom/sat fcn keys saved to
-    ------------------ ----- -------- -------- -- -- --------- --- -------------
-    0x862e038            81%    10000     4096 11 ok 16    14% JEN
-
-If we wanted to copy out all its keys for external analysis using `keystats`,
-add the `-k` flag:
-
-    ./hashscan -k 9711
-    Address            ideal    items  buckets mc fl bloom/sat fcn keys saved to
-    ------------------ ----- -------- -------- -- -- --------- --- -------------
-    0x862e038            81%    10000     4096 11 ok 16    14% JEN /tmp/9711-0.key
-
-Now we could run `./keystats /tmp/9711-0.key` to analyze which hash function
-has the best characteristics on this set of keys.
-
-hashscan column reference
-^^^^^^^^^^^^^^^^^^^^^^^^^
-Address::
-    virtual address of the hash table
-ideal::
-    The percentage of items in the table which can be looked up within an ideal
-    number of steps. See <<ideal,ideal%>> in the `keystats` section.
-items:: - number of items in the hash table -buckets:: - number of buckets in the hash table -mc:: - the maximum chain length found in the hash table (uthash usually tries to - keep fewer than 10 items in each bucket, or in some cases a multiple of 10) -fl:: - flags (either `ok`, or `NX` if the expansion-inhibited flag is set) -bloom/sat:: - if the hash table uses a Bloom filter, this is the size (as a power of two) - of the filter (e.g. 16 means the filter is 2^16 bits in size). The second - number is the "saturation" of the bits expressed as a percentage. The lower - the percentage, the more potential benefit to identify cache misses quickly. -fcn:: - symbolic name of hash function -keys saved to:: - file to which keys were saved, if any - -.How hashscan works -***************************************************************************** -When hashscan runs, it attaches itself to the target process, which suspends -the target process momentarily. During this brief suspension, it scans the -target's virtual memory for the signature of a uthash hash table. It then -checks if a valid hash table structure accompanies the signature and reports -what it finds. When it detaches, the target process resumes running normally. -The hashscan is performed "read-only"-- the target process is not modified. -Since hashscan is analyzing a momentary snapshot of a running process, it may -return different results from one run to another. -***************************************************************************** - -[[expansion]] -Expansion internals -~~~~~~~~~~~~~~~~~~~ -Internally this hash manages the number of buckets, with the goal of having -enough buckets so that each one contains only a small number of items. - -.Why does the number of buckets matter? -******************************************************************************** -When looking up an item by its key, this hash scans linearly through the items -in the appropriate bucket. 
In order for the linear scan to run in constant -time, the number of items in each bucket must be bounded. This is accomplished -by increasing the number of buckets as needed. -******************************************************************************** - -Normal expansion -^^^^^^^^^^^^^^^^ -This hash attempts to keep fewer than 10 items in each bucket. When an item is -added that would cause a bucket to exceed this number, the number of buckets in -the hash is doubled and the items are redistributed into the new buckets. In an -ideal world, each bucket will then contain half as many items as it did before. - -Bucket expansion occurs automatically and invisibly as needed. There is -no need for the application to know when it occurs. - -Per-bucket expansion threshold -++++++++++++++++++++++++++++++ -Normally all buckets share the same threshold (10 items) at which point bucket -expansion is triggered. During the process of bucket expansion, uthash can -adjust this expansion-trigger threshold on a per-bucket basis if it sees that -certain buckets are over-utilized. - -When this threshold is adjusted, it goes from 10 to a multiple of 10 (for that -particular bucket). The multiple is based on how many times greater the actual -chain length is than the ideal length. It is a practical measure to reduce -excess bucket expansion in the case where a hash function over-utilizes a few -buckets but has good overall distribution. However, if the overall distribution -gets too bad, uthash changes tactics. - -Inhibited expansion -^^^^^^^^^^^^^^^^^^^ -You usually don't need to know or worry about this, particularly if you used -the `keystats` utility during development to select a good hash for your keys. - -A hash function may yield an uneven distribution of items across the buckets. -In moderation this is not a problem. Normal bucket expansion takes place as -the chain lengths grow. 
But when significant imbalance occurs (because the hash -function is not well suited to the key domain), bucket expansion may be -ineffective at reducing the chain lengths. - -Imagine a very bad hash function which always puts every item in bucket 0. No -matter how many times the number of buckets is doubled, the chain length of -bucket 0 stays the same. In a situation like this, the best behavior is to -stop expanding, and accept 'O(n)' lookup performance. This is what uthash -does. It degrades gracefully if the hash function is ill-suited to the keys. - -If two consecutive bucket expansions yield `ideal%` values below 50%, uthash -inhibits expansion for that hash table. Once set, the 'bucket expansion -inhibited' flag remains in effect as long as the hash has items in it. -Inhibited expansion may cause `HASH_FIND` to exhibit worse than constant-time -performance. - -Diagnostic hooks -^^^^^^^^^^^^^^^^ - -There are two "notification" hooks which get executed if uthash is -expanding buckets, or setting the 'bucket expansion inhibited' flag. -There is no need for the application to set these hooks or take action in -response to these events. They are mainly for diagnostic purposes. -Normally both of these hooks are undefined and thus compile away to nothing. - -The `uthash_expand_fyi` hook can be defined to execute code whenever -uthash performs a bucket expansion. - ----------------------------------------------------------------------------- -#undef uthash_expand_fyi -#define uthash_expand_fyi(tbl) printf("expanded to %u buckets\n", tbl->num_buckets) ----------------------------------------------------------------------------- - -The `uthash_noexpand_fyi` hook can be defined to execute code whenever -uthash sets the 'bucket expansion inhibited' flag. 
- ----------------------------------------------------------------------------- -#undef uthash_noexpand_fyi -#define uthash_noexpand_fyi(tbl) printf("warning: bucket expansion inhibited\n") ----------------------------------------------------------------------------- - -Hooks -~~~~~ -You don't need to use these hooks -- they are only here if you want to modify -the behavior of uthash. Hooks can be used to replace standard library functions -that might be unavailable on some platforms, to change how uthash allocates -memory, or to run code in response to certain internal events. - -The `uthash.h` header will define these hooks to default values, unless they -are already defined. It is safe either to `#undef` and redefine them -after including `uthash.h`, or to define them before inclusion; for -example, by passing `-Duthash_malloc=my_malloc` on the command line. - -Specifying alternate memory management functions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -By default, uthash uses `malloc` and `free` to manage memory. -If your application uses its own custom allocator, uthash can use them too. - ----------------------------------------------------------------------------- -#include "uthash.h" - -/* undefine the defaults */ -#undef uthash_malloc -#undef uthash_free - -/* re-define, specifying alternate functions */ -#define uthash_malloc(sz) my_malloc(sz) -#define uthash_free(ptr, sz) my_free(ptr) - -... ----------------------------------------------------------------------------- - -Notice that `uthash_free` receives two parameters. The `sz` parameter is for -convenience on embedded platforms that manage their own memory. - -Specifying alternate standard library functions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Uthash also uses `strlen` (in the `HASH_FIND_STR` convenience macro, for -example) and `memset` (used only for zeroing memory). On platforms that do not -provide these functions, you can substitute your own implementations. 
- ----------------------------------------------------------------------------- -#undef uthash_bzero -#define uthash_bzero(a, len) my_bzero(a, len) - -#undef uthash_strlen -#define uthash_strlen(s) my_strlen(s) ----------------------------------------------------------------------------- - -Out of memory -^^^^^^^^^^^^^ -If memory allocation fails (i.e., the `uthash_malloc` function returns `NULL`), -the default behavior is to terminate the process by calling `exit(-1)`. This -can be modified by re-defining the `uthash_fatal` macro. - ----------------------------------------------------------------------------- -#undef uthash_fatal -#define uthash_fatal(msg) my_fatal_function(msg) ----------------------------------------------------------------------------- - -The fatal function should terminate the process or `longjmp` back to a safe -place. Note that an allocation failure may leave allocated memory that cannot -be recovered. After `uthash_fatal`, the hash table object should be considered -unusable; it might not be safe even to run `HASH_CLEAR` on the hash table -when it is in this state. - -To enable "returning a failure" if memory cannot be allocated, define the -macro `HASH_NONFATAL_OOM` before including the `uthash.h` header file. In this -case, `uthash_fatal` is not used; instead, each allocation failure results in -a single call to `uthash_nonfatal_oom(elt)` where `elt` is the address of the -element whose insertion triggered the failure. The default behavior of -`uthash_nonfatal_oom` is a no-op. - ----------------------------------------------------------------------------- -#undef uthash_nonfatal_oom -#define uthash_nonfatal_oom(elt) perhaps_recover((element_t *) elt) ----------------------------------------------------------------------------- - -Before the call to `uthash_nonfatal_oom`, the hash table is rolled back -to the state it was in prior to the problematic insertion; no memory is -leaked. 
It is safe to `throw` or `longjmp` out of the `uthash_nonfatal_oom` -handler. - -The `elt` argument will be of the correct pointer-to-element type, unless -`uthash_nonfatal_oom` is invoked from `HASH_SELECT`, in which case it will -be of `void*` type and must be cast before using. In any case, `elt->hh.tbl` -will be `NULL`. - -Allocation failure is possible only when adding elements to the hash table -(including the `ADD`, `REPLACE`, and `SELECT` operations). -`uthash_free` is not allowed to fail. - -Debug mode -~~~~~~~~~~ -If a program that uses this hash is compiled with `-DHASH_DEBUG=1`, a special -internal consistency-checking mode is activated. In this mode, the integrity -of the whole hash is checked following every add or delete operation. This is -for debugging the uthash software only, not for use in production code. - -In the `tests/` directory, running `make debug` will run all the tests in -this mode. - -In this mode, any internal errors in the hash data structure will cause a -message to be printed to `stderr` and the program to exit. - -The `UT_hash_handle` data structure includes `next`, `prev`, `hh_next` and -`hh_prev` fields. The former two fields determine the "application" ordering -(that is, insertion order-- the order the items were added). The latter two -fields determine the "bucket chain" order. These link the `UT_hash_handles` -together in a doubly-linked list that is a bucket chain. 
- -Checks performed in `-DHASH_DEBUG=1` mode: - -- the hash is walked in its entirety twice: once in 'bucket' order and a - second time in 'application' order -- the total number of items encountered in both walks is checked against the - stored number -- during the walk in 'bucket' order, each item's `hh_prev` pointer is compared - for equality with the last visited item -- during the walk in 'application' order, each item's `prev` pointer is compared - for equality with the last visited item - -.Macro debugging: -******************************************************************************** -Sometimes it's difficult to interpret a compiler warning on a line which -contains a macro call. In the case of uthash, one macro can expand to dozens of -lines. In this case, it is helpful to expand the macros and then recompile. -By doing so, the warning message will refer to the exact line within the macro. - -Here is an example of how to expand the macros and then recompile. This uses the -`test1.c` program in the `tests/` subdirectory. - - gcc -E -I../src test1.c > /tmp/a.c - egrep -v '^#' /tmp/a.c > /tmp/b.c - indent /tmp/b.c - gcc -o /tmp/b /tmp/b.c - -The last line compiles the original program (test1.c) with all macros expanded. -If there was a warning, the referenced line number can be checked in `/tmp/b.c`. -******************************************************************************** - -Thread safety -~~~~~~~~~~~~~ -You can use uthash in a threaded program. But you must do the locking. Use a -read-write lock to protect against concurrent writes. It is ok to have -concurrent readers (since uthash 1.5). 
- -For example using pthreads you can create an rwlock like this: - - pthread_rwlock_t lock; - if (pthread_rwlock_init(&lock, NULL) != 0) fatal("can't create rwlock"); - -Then, readers must acquire the read lock before doing any `HASH_FIND` calls or -before iterating over the hash elements: - - if (pthread_rwlock_rdlock(&lock) != 0) fatal("can't get rdlock"); - HASH_FIND_INT(elts, &i, e); - pthread_rwlock_unlock(&lock); - -Writers must acquire the exclusive write lock before doing any update. Add, -delete, and sort are all updates that must be locked. - - if (pthread_rwlock_wrlock(&lock) != 0) fatal("can't get wrlock"); - HASH_DEL(elts, e); - pthread_rwlock_unlock(&lock); - -If you prefer, you can use a mutex instead of a read-write lock, but this will -reduce reader concurrency to a single thread at a time. - -An example program using uthash with a read-write lock is included in -`tests/threads/test1.c`. - -[[Macro_reference]] -Macro reference ---------------- - -Convenience macros -~~~~~~~~~~~~~~~~~~ -The convenience macros do the same thing as the generalized macros, but -require fewer arguments. - -In order to use the convenience macros, - -1. the structure's `UT_hash_handle` field must be named `hh`, and -2. 
for add or find, the key field must be of type `int` or `char[]` or pointer - -.Convenience macros -[width="90%",cols="10m,30m",grid="none",options="header"] -|=============================================================================== -|macro | arguments -|HASH_ADD_INT | (head, keyfield_name, item_ptr) -|HASH_REPLACE_INT | (head, keyfield_name, item_ptr, replaced_item_ptr) -|HASH_FIND_INT | (head, key_ptr, item_ptr) -|HASH_ADD_STR | (head, keyfield_name, item_ptr) -|HASH_REPLACE_STR | (head, keyfield_name, item_ptr, replaced_item_ptr) -|HASH_FIND_STR | (head, key_ptr, item_ptr) -|HASH_ADD_PTR | (head, keyfield_name, item_ptr) -|HASH_REPLACE_PTR | (head, keyfield_name, item_ptr, replaced_item_ptr) -|HASH_FIND_PTR | (head, key_ptr, item_ptr) -|HASH_DEL | (head, item_ptr) -|HASH_SORT | (head, cmp) -|HASH_COUNT | (head) -|=============================================================================== - -General macros -~~~~~~~~~~~~~~ - -These macros add, find, delete and sort the items in a hash. You need to -use the general macros if your `UT_hash_handle` is named something other -than `hh`, or if your key's data type isn't `int` or `char[]`. 
- -.General macros -[width="90%",cols="10m,30m",grid="none",options="header"] -|=============================================================================== -|macro | arguments -|HASH_ADD | (hh_name, head, keyfield_name, key_len, item_ptr) -|HASH_ADD_BYHASHVALUE | (hh_name, head, keyfield_name, key_len, hashv, item_ptr) -|HASH_ADD_KEYPTR | (hh_name, head, key_ptr, key_len, item_ptr) -|HASH_ADD_KEYPTR_BYHASHVALUE | (hh_name, head, key_ptr, key_len, hashv, item_ptr) -|HASH_ADD_INORDER | (hh_name, head, keyfield_name, key_len, item_ptr, cmp) -|HASH_ADD_BYHASHVALUE_INORDER | (hh_name, head, keyfield_name, key_len, hashv, item_ptr, cmp) -|HASH_ADD_KEYPTR_INORDER | (hh_name, head, key_ptr, key_len, item_ptr, cmp) -|HASH_ADD_KEYPTR_BYHASHVALUE_INORDER | (hh_name, head, key_ptr, key_len, hashv, item_ptr, cmp) -|HASH_REPLACE | (hh_name, head, keyfield_name, key_len, item_ptr, replaced_item_ptr) -|HASH_REPLACE_BYHASHVALUE | (hh_name, head, keyfield_name, key_len, hashv, item_ptr, replaced_item_ptr) -|HASH_REPLACE_INORDER | (hh_name, head, keyfield_name, key_len, item_ptr, replaced_item_ptr, cmp) -|HASH_REPLACE_BYHASHVALUE_INORDER | (hh_name, head, keyfield_name, key_len, hashv, item_ptr, replaced_item_ptr, cmp) -|HASH_FIND | (hh_name, head, key_ptr, key_len, item_ptr) -|HASH_FIND_BYHASHVALUE | (hh_name, head, key_ptr, key_len, hashv, item_ptr) -|HASH_DELETE | (hh_name, head, item_ptr) -|HASH_VALUE | (key_ptr, key_len, hashv) -|HASH_SRT | (hh_name, head, cmp) -|HASH_CNT | (hh_name, head) -|HASH_CLEAR | (hh_name, head) -|HASH_SELECT | (dst_hh_name, dst_head, src_hh_name, src_head, condition) -|HASH_ITER | (hh_name, head, item_ptr, tmp_item_ptr) -|HASH_OVERHEAD | (hh_name, head) -|=============================================================================== - -[NOTE] -`HASH_ADD_KEYPTR` is used when the structure contains a pointer to the -key, rather than the key itself. 
- -The `HASH_VALUE` and `..._BYHASHVALUE` macros are a performance mechanism mainly for the -special case of having different structures, in different hash tables, having -identical keys. It allows the hash value to be obtained once and then passed -in to the `..._BYHASHVALUE` macros, saving the expense of re-computing the hash value. - - -Argument descriptions -^^^^^^^^^^^^^^^^^^^^^ -hh_name:: - name of the `UT_hash_handle` field in the structure. Conventionally called - `hh`. -head:: - the structure pointer variable which acts as the "head" of the hash. So - named because it initially points to the first item that is added to the hash. -keyfield_name:: - the name of the key field in the structure. (In the case of a multi-field - key, this is the first field of the key). If you're new to macros, it - might seem strange to pass the name of a field as a parameter. See - <>. -key_len:: - the length of the key field in bytes. E.g. for an integer key, this is - `sizeof(int)`, while for a string key it's `strlen(key)`. (For a - multi-field key, see <>.) -key_ptr:: - for `HASH_FIND`, this is a pointer to the key to look up in the hash - (since it's a pointer, you can't directly pass a literal value here). For - `HASH_ADD_KEYPTR`, this is the address of the key of the item being added. -hashv:: - the hash value of the provided key. This is an input parameter for the - `..._BYHASHVALUE` macros, and an output parameter for `HASH_VALUE`. - Reusing a cached hash value can be a performance optimization if - you're going to do repeated lookups for the same key. -item_ptr:: - pointer to the structure being added, deleted, replaced, or looked up, or the current - pointer during iteration. This is an input parameter for the `HASH_ADD`, - `HASH_DELETE`, and `HASH_REPLACE` macros, and an output parameter for `HASH_FIND` - and `HASH_ITER`. (When using `HASH_ITER` to iterate, `tmp_item_ptr` - is another variable of the same type as `item_ptr`, used internally). 
-replaced_item_ptr:: - used in `HASH_REPLACE` macros. This is an output parameter that is set to point - to the replaced item (if no item is replaced it is set to NULL). -cmp:: - pointer to comparison function which accepts two arguments (pointers to - items to compare) and returns an int specifying whether the first item - should sort before, equal to, or after the second item (like `strcmp`). -condition:: - a function or macro which accepts a single argument (a void pointer to a - structure, which needs to be cast to the appropriate structure type). The - function or macro should evaluate to a non-zero value if the - structure should be "selected" for addition to the destination hash. - -// vim: set tw=80 wm=2 syntax=asciidoc: diff --git a/srcs/libs/docs/uthash.pdf b/srcs/libs/docs/uthash.pdf new file mode 100644 index 0000000..f4e8d94 Binary files /dev/null and b/srcs/libs/docs/uthash.pdf differ diff --git a/srcs/libs/docs/utlist.pdf b/srcs/libs/docs/utlist.pdf new file mode 100644 index 0000000..b225892 Binary files /dev/null and b/srcs/libs/docs/utlist.pdf differ diff --git a/srcs/libs/docs/utringbuffer.pdf b/srcs/libs/docs/utringbuffer.pdf new file mode 100644 index 0000000..ffa4923 Binary files /dev/null and b/srcs/libs/docs/utringbuffer.pdf differ diff --git a/srcs/libs/docs/utstack.pdf b/srcs/libs/docs/utstack.pdf new file mode 100644 index 0000000..e1c736d Binary files /dev/null and b/srcs/libs/docs/utstack.pdf differ diff --git a/srcs/libs/docs/utstring.pdf b/srcs/libs/docs/utstring.pdf new file mode 100644 index 0000000..23344ad Binary files /dev/null and b/srcs/libs/docs/utstring.pdf differ diff --git a/srcs/libs/docs/zlog-EN.pdf b/srcs/libs/docs/zlog-EN.pdf new file mode 100644 index 0000000..ec72674 Binary files /dev/null and b/srcs/libs/docs/zlog-EN.pdf differ diff --git a/srcs/libs/docs/zlog.lyx b/srcs/libs/docs/zlog.lyx index 76ea028..4b69257 100644 --- a/srcs/libs/docs/zlog.lyx +++ b/srcs/libs/docs/zlog.lyx @@ -7532,4 +7532,4 @@ $ ./test_record 
\end_layout \end_body -\end_document \ No newline at end of file +\end_document