Saturday, February 4, 2012

Hash de-duplication in Perl


If I were to ask you to take a list of values and remove duplicates, how would you do it? The most obvious way would be to compare every value to every other value and removing anything equivalent. However this is probably the most inefficient way as well.

Thankfully, a hash data structure has some properties that helps us with this kind of task. Due to the fact that hash keys cannot be duplicated we can just load every value as a hash key with a value. I usually choose 1.


If you run the code above you should get similar output to:


As you can see, all duplicates have been removed.

Every time you insert an item from the array as the hash key it's going to assign a "1" to that value. So once a duplicate comes along it will just write another "1" into the same hash key. Happy de-duplicating.

3 comments:

  1. an even shorter way to get the hash filled could be:

    my %number_hash = map { $_ => 1 } @list;

    and yet shorter:

    use List::MoreUtils qw(uniq);
    my @unique_numbers = uniq @list;

    ReplyDelete
  2. Instead of filling the hash with a loop or map you could use:

    my %number_hash ;
    @number_hash{@list} = undef ;
    my @unique_numbers = keys %number_hash ;

    ReplyDelete
  3. This is a common problem and it's great to see a nice, succinct solution.

    ReplyDelete