GIT REPO

git clone http://www.coincoin169.org/git/ccsp.git ccsp

NAME

ccsp - a library for string pooling and interning

SYNOPSIS

#include <ccsp.h>

DESCRIPTION

Ccsp stands for CoinCoin String Pooling. It is a small library for string pooling and interning. The C header file exports three types: ccsp_strbuf, ccsp_pool and ccsp_string. Typically you first build a string using a ccsp_strbuf variable and its related functions, then you intern the string in a ccsp_pool variable to obtain a pointer of type ccsp_string. The ccsp_strbuf variables are provided only to simplify the construction of C-strings; you can also intern a standard C-string directly into any pool to obtain a pointer of type ccsp_string. The distinctive feature of the library is that if you intern the same C-string several times (either directly through a char* pointer or via ccsp_strbuf variables), you get several pointers of type ccsp_string that all point to the same structure, which contains only one copy of the C-string. With each interned C-string in a pool you can associate some data through a pointer of type void*, so that the pool acts somewhat like a mapping from C-strings to void* values. Note that two different variables of type ccsp_string that point to the same C-string within the same pool have the same value (of type void*).
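To make the idea of interning concrete, here is a minimal, self-contained sketch. It is not the ccsp implementation and does not use the ccsp API: the names toy_pool, toy_entry, toy_intern and toy_clear are hypothetical, and a plain linked list stands in for the hash table. It only illustrates that interning equal strings yields the same entry, and that a value attached to that entry is shared.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy pool: NOT the ccsp implementation or API. */
struct toy_entry {
  char *str;              /* the single stored copy of the string */
  void *value;            /* user data associated with the string */
  struct toy_entry *next; /* next entry in the list */
};

struct toy_pool {
  struct toy_entry *head;
};

/* Return the entry for s, creating it on first use.
   Returns NULL only if memory allocation fails. */
static struct toy_entry *toy_intern(struct toy_pool *p, const char *s) {
  struct toy_entry *e;
  size_t n;

  for (e = p->head; e != NULL; e = e->next)
    if (strcmp(e->str, s) == 0)
      return e; /* already interned: reuse the existing copy */

  e = malloc(sizeof *e);
  if (e == NULL)
    return NULL;
  n = strlen(s) + 1;
  e->str = malloc(n);
  if (e->str == NULL) {
    free(e);
    return NULL;
  }
  memcpy(e->str, s, n); /* deep copy, including the terminator */
  e->value = NULL;
  e->next = p->head;
  p->head = e;
  return e;
}

/* Release every entry of the pool. */
static void toy_clear(struct toy_pool *p) {
  while (p->head != NULL) {
    struct toy_entry *e = p->head;
    p->head = e->next;
    free(e->str);
    free(e);
  }
}
```

Calling toy_intern twice with equal strings, even when they are stored at different addresses, returns the same entry pointer, so comparing two interned strings reduces to comparing two pointers.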

MEMORY MANAGEMENT

For its internal memory management ccsp uses malloc, realloc and free by default, but you can supply your own memory management functions. This can be useful if the library is used by a program that has a garbage collector or specific alignment requirements. The functions given to ccsp must respect the following semantics: the replacements for malloc and realloc must behave exactly like malloc and realloc, except that they must always return a non-NULL pointer; errors must be handled within the functions themselves. The replacement for free must behave exactly like free.
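As an illustration of these requirements, the following sketch shows one possible set of replacement functions. The names checked_malloc, checked_realloc and checked_free are hypothetical, and aborting on failure is only one way to handle errors inside the function. Requesting at least one byte is a choice made here so that a zero-size request cannot produce a NULL return.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: like malloc, but never returns NULL. */
static void *checked_malloc(size_t size) {
  /* malloc(0) may legitimately return NULL, so ask for at least 1 byte. */
  void *ptr = malloc(size != 0 ? size : 1);
  if (ptr == NULL) {
    fputs("out of memory\n", stderr);
    abort(); /* the error is handled here, not by the caller */
  }
  return ptr;
}

/* Hypothetical wrapper: like realloc, but never returns NULL. */
static void *checked_realloc(void *old, size_t size) {
  void *ptr = realloc(old, size != 0 ? size : 1);
  if (ptr == NULL) {
    fputs("out of memory\n", stderr);
    abort();
  }
  return ptr;
}

/* Hypothetical wrapper: identical to free. */
static void checked_free(void *ptr) {
  free(ptr);
}
```

Functions of this shape have the signatures expected by the memory-function setter described in FUNCTION REFERENCE.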

FUNCTION REFERENCE

All variables must be initialized before being used. When a variable is no longer needed you may clear it with the appropriate function.

void ccsp_set_memory_functions(void *(*malloc_func)(size_t),void *(*realloc_func)(void*,size_t),void (*free_func)(void*));

Set the memory functions to be used by ccsp. The replacements for malloc and realloc must behave exactly like malloc and realloc, except that they must always return a non-NULL pointer; errors must be handled within the functions themselves. The replacement for free must behave exactly like free. If a NULL pointer is passed for any argument, the corresponding default ccsp memory function is used.

void ccsp_strbuf_empty(ccsp_strbuf buf);

Set the buffer buf to be the empty string "".

void ccsp_strbuf_init2(ccsp_strbuf buf,size_t mem);

Initialize the buffer buf with at least mem bytes of memory. This function must be called before doing anything else with buf.

void ccsp_strbuf_init(ccsp_strbuf buf);

Initialize the buffer buf. This function must be called before doing anything else with buf.

void ccsp_strbuf_inits(ccsp_strbuf_ptr buf,...);

Initialize the NULL-terminated list of buffers beginning with buf. This function must be called before doing anything else with buf or any element in the list.
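The NULL-terminated argument lists used by the inits and clears functions rely on the standard C variadic mechanism. The sketch below is not taken from ccsp: zero_all is a hypothetical function operating on plain int pointers, shown only to illustrate how such a list is walked and why the terminator matters.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical example: set every int in a NULL-terminated list to zero. */
static void zero_all(int *first, ...) {
  va_list ap;
  int *cur;

  va_start(ap, first);
  /* Walk the arguments until the NULL terminator is reached. */
  for (cur = first; cur != NULL; cur = va_arg(ap, int *))
    *cur = 0;
  va_end(ap);
}
```

A call looks like zero_all(&a, &b, (int *)NULL). Forgetting the terminator makes the function read past the supplied arguments, which is undefined behavior. Casting the terminator to the pointer type is the portable choice, because a bare NULL may be defined as a plain integer 0.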

void ccsp_strbuf_clear(ccsp_strbuf buf);

Clear the memory used by buf. You may use this function when you no longer need buf.

void ccsp_strbuf_clears(ccsp_strbuf_ptr buf,...);

Clear the NULL-terminated list of string buffers beginning with buf. You may use this function when you no longer need buf and the other string buffers in the list.

size_t ccsp_strbuf_len(ccsp_strbuf buf);

Return the length of the string contained in buf. Be aware that this function is nothing more than a simple call to strlen, so it does not take into account the charset of the string: it counts bytes, not characters.
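Because strlen counts bytes, the reported length differs from the number of characters as soon as the string uses a multibyte encoding. The following sketch assumes valid UTF-8 input; utf8_count is a hypothetical helper that is not part of ccsp. It counts code points by skipping UTF-8 continuation bytes.

```c
#include <string.h>

/* Hypothetical helper: count UTF-8 code points in a valid UTF-8 string
   by ignoring continuation bytes, which have the bit pattern 10xxxxxx. */
static size_t utf8_count(const char *s) {
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (((unsigned char)*s & 0xC0) != 0x80)
      n++;
  return n;
}
```

For the UTF-8 string "caf\xc3\xa9" (the word "café"), strlen returns 5 while utf8_count returns 4.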

void ccsp_strbuf_append(ccsp_strbuf buf,char c);

Append the character c to the string contained in buf.

char *ccsp_strbuf_get_cstring(ccsp_strbuf buf);

Return a pointer (char*) to the C-string contained in buf. Do not free this pointer, as doing so results in undefined behavior. If you want to free the memory occupied by the string contained in buf, use ccsp_strbuf_clear instead.

void ccsp_string_init(ccsp_string s);

Initialize s. This function must be called before doing anything else with s.

void ccsp_string_inits(ccsp_string_ptr s,...);

Initialize the NULL-terminated list of string pointers beginning with s. This function must be called before doing anything else with s or any element in the list.

void ccsp_string_clear(ccsp_string s);

Clear the memory used by s. You may use this function when you no longer need s.

void ccsp_string_clears(ccsp_string_ptr s,...);

Clear the NULL-terminated list of string pointers beginning with s. You may use this function when you no longer need s and the other string pointers in the list.

void ccsp_string_copy(ccsp_string r,ccsp_string s);

Copy the string pointer s into r.

void ccsp_pool_init2(ccsp_pool p,size_t n,size_t (*hash_func)(const char*,size_t));

Initialize the pool of strings p. Internally the pool is implemented with a hash table whose size is given by the argument n. The hash function used for the pool is given by hash_func. The hash function must have the following signature: its first argument is a pointer to the string whose hash is to be computed, and its second argument gives an upper bound on the hash values. If hash_func is NULL, the default hash function djb2 is used. This function must be called before doing anything else with p.
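As an illustration, here is a sketch of a custom hash function with the required signature, based on the classic djb2 recurrence (start at 5381, then multiply by 33 and add each byte). The name my_djb2 is hypothetical, and this is not necessarily how ccsp implements its default. It also assumes that the upper bound is exclusive, that is, that the result must lie in the range 0 to bound - 1.

```c
#include <stddef.h>

/* Hypothetical hash function with the signature expected for hash_func.
   Assumption: the returned value must be strictly less than bound. */
static size_t my_djb2(const char *str, size_t bound) {
  const unsigned char *p = (const unsigned char *)str;
  size_t hash = 5381;

  /* djb2 recurrence: hash = hash * 33 + byte (wraps around on overflow). */
  for (; *p != '\0'; p++)
    hash = hash * 33 + *p;

  /* Reduce into [0, bound); a bound of 0 is treated as "always 0". */
  return bound != 0 ? hash % bound : 0;
}
```

A function of this shape could be passed as the hash_func argument. Equal strings always produce equal hashes, which is the only property the hash table strictly needs for correctness.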

void ccsp_pool_init(ccsp_pool p);

Initialize pool p. This function must be called before doing anything else with p.

void ccsp_pool_inits(ccsp_pool_ptr p,...);

Initialize the NULL-terminated list of pools beginning with p. This function must be called before doing anything else with p or any element in the list.

void ccsp_pool_clear(ccsp_pool p);

Clear the memory used by p. You may use this function when you no longer need p. Note that a string pointer (ccsp_string) that points to a string in the pool p must not be used after p has been cleared, otherwise undefined behavior occurs. For cleaner code I recommend clearing all string pointers that point into p before clearing p.

void ccsp_pool_clears(ccsp_pool_ptr p,...);

Clear the NULL-terminated list of pools beginning with p. You may use this function when you no longer need p and any other element in the list. The same remark as for ccsp_pool_clear applies to ccsp_string variables pointing into any pool in the list.

void ccsp_intern_strbuf(ccsp_string r,ccsp_pool p,ccsp_strbuf buf);

Intern the string contained in buf in pool p and fill the string variable r so that it points to the string within the pool p.

void ccsp_intern_cstring(ccsp_string r,ccsp_pool p,const char *s);

Intern the C-string pointed to by s in pool p and fill the string variable r so that it points to the string within the pool p. This function makes a deep copy of s, so you may dispose of s as you wish afterwards.

void ccsp_intern_cstring_alias(ccsp_string r,ccsp_pool p,char *s);

Intern the C-string pointed to by s in pool p and fill the string variable r so that it points to the string within the pool p. Note that this function does not make a deep copy of s; it is therefore strongly recommended to leave the C-string pointed to by s untouched after a call to this function, otherwise undefined behavior occurs. Unless you know what you are doing, please use ccsp_intern_cstring instead.

void ccsp_string_set_value(ccsp_string s,void *value);

If s points to a string in a pool, this function associates the value value with that string. If s does not point to any string, this function does nothing.

void *ccsp_string_get_value(ccsp_string s);

Return the value of the string pointed to by s. If s does not point to any string in any pool then NULL is returned by this function.

char *ccsp_string_get_cstring(ccsp_string s);

Return a pointer (char*) to the C-string pointed to by s. If s does not point to any string, typically just after a call to ccsp_string_init, this function returns NULL. Do not free this pointer, as doing so results in undefined behavior. If you want to free the memory occupied by the string pointed to by s, use ccsp_string_clear instead.

AN EXAMPLE PROGRAM

The following program shows the use of the library. It interns the same C-string twice, in two different ways, and shows that the two resulting pointers of type ccsp_string both point to the same C-string in memory. A value is then set through one string pointer (ccsp_string) and read back through the other. As the two string pointers point to the same C-string, they share the value.

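#include <stdio.h>
#include <ccsp.h>

int main(void) {
  ccsp_pool p;
  ccsp_strbuf buf;
  ccsp_string s1,s2;

  /* Every variable must be initialized before use. */
  ccsp_string_init(s1);
  ccsp_string_init(s2);
  ccsp_strbuf_init(buf);
  ccsp_pool_init(p);

  /* Intern "abc" directly from a C-string... */
  ccsp_intern_cstring(s1,p,"abc");
  /* ...and once more, built character by character in a string buffer. */
  ccsp_strbuf_append(buf,'a');
  ccsp_strbuf_append(buf,'b');
  ccsp_strbuf_append(buf,'c');
  ccsp_intern_strbuf(s2,p,buf);

  /* Both string pointers refer to the same copy in memory. */
  fprintf(stdout,"s1 == s2 ? %d\n",
          ccsp_string_get_cstring(s1) == ccsp_string_get_cstring(s2));
  fflush(stdout);

  /* A value set through s1 is visible through s2. */
  ccsp_string_set_value(s1,(void*)-1);
  fprintf(stdout,"s1.value == s2.value ? %d\n",
          ccsp_string_get_value(s1) == ccsp_string_get_value(s2));
  fflush(stdout);

  /* Clear the string pointers before the pool they point into. */
  ccsp_string_clear(s1);
  ccsp_string_clear(s2);
  ccsp_strbuf_clear(buf);
  ccsp_pool_clear(p);
  return 0;
}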

A WORD ON THE LIBRARY EFFICIENCY

The main goal of the library is to be memory efficient, that is, to use as little memory as possible, although there is certainly room for improvement. I did not design the library for speed. Of course I care about speed but, for this library and for the moment, not at a low level. I have implemented a hash table to get roughly O(1) time for all operations, although the constant hidden behind the big-O may be large. You are of course welcome to improve the library, and I will be happy to receive your feedback or patches.

LICENSE

Ccsp and its documentation are placed under the GPL.

Copyright (C) 2014 Guillaume Quintin

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

AUTHOR

Written by Guillaume Quintin (quintin@lix.polytechnique.fr).

SEE ALSO

malloc(3), realloc(3), free(3), strlen(3), djb2(http://www.cse.yorku.ca/~oz/hash.html), GPL(7).