Friday, July 16, 2010

Hash Table vs. STL's map

Based on "Programming Pearls" page 161
The favorite data structure in the implementation of STL's sets and maps is a balanced search tree. These containers are based on these balanced trees to perform better in find and retrieval procedures; as a result, implementation of a dictionary requires the usage of such a mechanism that is not necessary optimized in "insertion" scenarios, as an alternative we can use a custom hash table + hash algorithm that performs much faster in inserting the words into a dictionary. Here is a basic comparison of these two methods STL's map container
 #include  #include  #include  #include  #include   using namespace std;  int main(int argc, char* argv[]) {   const static char* InputFilePath  = argv[1];   const static char* OutputFilePath = argv[2];    if(argc != 3)   {     cout << "The first argument is the input file path,"       " and the second one is the output file path"       << int=""> mapWords;    //Read Data   ifstream fileInput(InputFilePath);   string word;    cout << "Clock Before Read Map:" <<>> word)     mapWords[word]++;    cout << "Clock After Read Map:" << int="">::iterator mapIterator;    cout << "Clock Before Write Map:" << mapiterator =" mapWords.begin();">first                    << "\t\t\t\t"                    <<>second                   <<>
Running on a text file of the Bible The result is: Clock Before Read Map:1 Clock After Read Map:8428 Clock Before Write Map:8430 Clock After Write Map:8830 Press any key to continue . . . The following version uses a cusom hash table + hash algorithm, notice the read time
 #include  #include  #include  #include   using namespace std;  struct node {   char *word;   int count;   node* next; };  #define NHASH 22189 #define MULT 31 node* bin[NHASH];  unsigned int hash(const char *p) {   unsigned int h = 0;   for(; *p; p++)     h = MULT * h + *p;   return h % NHASH; }  void incword(string word) {   int hashCode = hash(word.data());   node* hashNode;   for(     hashNode = bin[hashCode];     hashNode != NULL;     hashNode = hashNode->next)   {     if(strcmp(word.data(), hashNode->word) == 0)     {       hashNode->count++;       return;     }   }    hashNode = new node();   hashNode->count = 1;   hashNode->word = new char[word.size()];   strcpy(hashNode->word, word.data());   hashNode->next = bin[hashCode];   bin[hashCode] = hashNode; }  int main(int argc, char* argv[]) {   const static char* InputFilePath  = argv[1];   const static char* OutputFilePath = argv[2];    if(argc != 3)   {     cout << "The first argument is the input file path,"       " and the second one is the output file path"       << counter =" 0;">> word)   {     incword(word);   }    cout << "Clock After Read Hash:" << counter =" 0;" hashnode =" bin[counter];" hashnode =" hashNode-">next)     {       fileOutputHash          <<>word          << "\t\t\t\t"          <<>count          <<>
Output: Clock Before Read Map:1 Clock After Read Map:8428 Clock Before Write Map:8430 Clock After Write Map:8830 Press any key to continue . . .

No comments:

Post a Comment