Based on "Programming Pearls" page 161
The favorite data structure in the implementation of STL's sets and maps is a balanced search tree. These containers are based on these balanced trees to perform better in find and retrieval procedures; as a result, implementation of a dictionary requires the usage of such a mechanism that is not necessary optimized in "insertion" scenarios, as an alternative we can use a custom hash table + hash algorithm that performs much faster in inserting the words into a dictionary. Here is a basic comparison of these two methods STL's map container
#include #include
Running on a text file of the Bible The result is: Clock Before Read Map:1 Clock After Read Map:8428 Clock Before Write Map:8430 Clock After Write Map:8830 Press any key to continue . . . The following version uses a cusom hash table + hash algorithm, notice the read time
#include #include #include #include using namespace std; struct node { char *word; int count; node* next; }; #define NHASH 22189 #define MULT 31 node* bin[NHASH]; unsigned int hash(const char *p) { unsigned int h = 0; for(; *p; p++) h = MULT * h + *p; return h % NHASH; } void incword(string word) { int hashCode = hash(word.data()); node* hashNode; for( hashNode = bin[hashCode]; hashNode != NULL; hashNode = hashNode->next) { if(strcmp(word.data(), hashNode->word) == 0) { hashNode->count++; return; } } hashNode = new node(); hashNode->count = 1; hashNode->word = new char[word.size()]; strcpy(hashNode->word, word.data()); hashNode->next = bin[hashCode]; bin[hashCode] = hashNode; } int main(int argc, char* argv[]) { const static char* InputFilePath = argv[1]; const static char* OutputFilePath = argv[2]; if(argc != 3) { cout << "The first argument is the input file path," " and the second one is the output file path" << counter =" 0;">> word) { incword(word); } cout << "Clock After Read Hash:" << counter =" 0;" hashnode =" bin[counter];" hashnode =" hashNode-">next) { fileOutputHash <<>word << "\t\t\t\t" <<>count <<>
Output: Clock Before Read Map:1 Clock After Read Map:8428 Clock Before Write Map:8430 Clock After Write Map:8830 Press any key to continue . . .
No comments:
Post a Comment