Add min_freq and freq_type to textstat_collocations() #68

koheiw · 2024-01-04T19:02:02Z

It is sometimes difficult to set min_count because we don't know how many collocations there are in the corpus. If the value is too low, we have to wait a long time for the computation to finish.

How about adding min_freq and freq_type = c("count", "prop", "rank", "quantile"), in a similar way to dfm_trim()? It would only need to set min_count based on the distribution of the counts in counts_seq.

quanteda.textstats/src/collocations.cpp

Lines 287 to 290 in 68a8489

    
    for (auto it = counts_seq.begin(); it != counts_seq.end(); ++it) {
        // convert to a vector for faster iteration
        seqs_all.push_back(std::make_pair(it->first, it->second.first));
        if (it->second.first < count_min) continue;
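
As a rough illustration of the proposal, the sketch below derives an absolute count threshold from min_freq and freq_type once the counts in counts_seq are known. The helper name resolve_count_min and the exact meaning given to each freq_type are assumptions made for this example. They are not taken from the package and are not guaranteed to match dfm_trim().

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of quanteda.textstats): converts
// (min_freq, freq_type) into an absolute count threshold.
// `counts` holds one observed count per candidate sequence and is taken
// by value because it is reordered internally.
unsigned int resolve_count_min(std::vector<unsigned int> counts,
                               double min_freq,
                               const std::string &freq_type) {
    if (min_freq < 0) {
        throw std::invalid_argument("min_freq must be non-negative");
    }
    if (freq_type == "count") {
        // Already an absolute count.
        return static_cast<unsigned int>(std::ceil(min_freq));
    }
    if (counts.empty()) return 0;

    if (freq_type == "prop") {
        // Assumed meaning: share of the total count over all sequences.
        double total = std::accumulate(counts.begin(), counts.end(), 0.0);
        return static_cast<unsigned int>(std::ceil(min_freq * total));
    }
    if (freq_type == "rank") {
        // Assumed meaning: rank 1 is the most frequent sequence; the
        // threshold is the count of the sequence at that rank.
        std::size_t k = static_cast<std::size_t>(min_freq);
        if (k < 1) k = 1;
        k = std::min(k, counts.size());
        std::nth_element(counts.begin(), counts.begin() + (k - 1),
                         counts.end(), std::greater<unsigned int>());
        return counts[k - 1];
    }
    if (freq_type == "quantile") {
        // Assumed meaning: nearest-rank quantile of the count distribution.
        if (min_freq > 1) {
            throw std::invalid_argument("quantile must be in [0, 1]");
        }
        std::sort(counts.begin(), counts.end());
        std::size_t idx = static_cast<std::size_t>(
            std::ceil(min_freq * static_cast<double>(counts.size())));
        if (idx > 0) --idx;
        return counts[idx];
    }
    throw std::invalid_argument("unknown freq_type: " + freq_type);
}
```

Under these assumptions, the returned value could be used as count_min in the loop above, after the counts have been copied out of counts_seq.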

koheiw added the enhancement label on Jan 4, 2024
