LogIn
I don't have account.

HashSet in C#

DevSniper

146 Views

In modern C# development, choosing the right data structure can drastically improve the performance and readability of your code. When you need to store unique items, perform fast lookups or handle set operations like union and intersection the HashSet<T> is the collection of choice in .NET.

HashSet<T> is part of the System.Collections.Generic namespace and provides a highly optimized, hash-based implementation for storing unordered, distinct elements.

In this article we covers everything about HashSet in C# from concepts and internal workings to practical examples, advanced operations and performance analysis. By the end, you’ll have expert-level insight into when and how to use HashSet effectively in your applications.

What is a HashSet in C#?

A HashSet<T> is a generic collection in C# that stores unique elements and provides high-performance operations such as lookup, addition and removal, all typically in O(1) average time complexity. Unlike a List<T>, it does not allow duplicate values and does not maintain any specific order of elements.

Key characteristics of HashSet<T>

  • Unique Elements Only : Automatically prevents duplicates.
  • High Performance : Most operations (Add, Remove, Contains) have O(1) average time complexity.
  • Unordered Collection : Elements are not stored in insertion order.
  • Set Operations Support : You can perform union, intersection, difference and symmetric difference with other sets.
  • Type Safety : Generic type ensures compile-time safety.
  • Thread Safety : Not thread-safe by default.
  • Internal Structure : Uses hashing and buckets for quick element access.

Namespace :- System.Collections.Generic

Assembly :- System.Collections.dll

Signature :-
public class HashSet<T> :
   System.Collections.Generic.ICollection<T>,
   System.Collections.Generic.IEnumerable<T>,
   System.Collections.Generic.IReadOnlyCollection<T>,
   System.Collections.Generic.IReadOnlySet<T>,
   System.Collections.Generic.ISet<T>,
   System.Runtime.Serialization.IDeserializationCallback,
   System.Runtime.Serialization.ISerializable
  • HashSet provides constant time complexity O(1) for basic operations like add, remove and contains.
  • HashSet does not store duplicate elements. if you add duplicate values in the set, only one instance of that value will be available other will be discarded.
  • HashSet support set operations like union , intersection and difference. These operations are useful in scenarios such as to finding common element within sets or combining multiple sets
  • HashSet can store user-defined objects also
  • HashSet does not maintain order of element it uses hashing mechanism to store data for faster lookup
  • HashSet is not inherently thread-safe from multiple threads.

How HashSet Stores Data Internally in C#

To understand why and How HashSet<T> is fast and efficient, it’s important to look under the hood and see how it stores data internally.

HashSet in C# is built on top of a hash-based architecture, similar to a Dictionary<TKey, TValue> but it only stores keys (no values).

1. Internal Components

Internally, a HashSet maintains two core data structures

  • Buckets Array (int[] buckets)

    Each element in this array represents a bucket index that points to a chain of entries with the same hash code.

  • Entries Array (Entry[] entries)

    Each entry stores the actual data item along with its hash and a pointer to the next entry in case of collisions.

    Copy
    struct Entry
    {
        public int hashCode;   // Hash code of the element
        public int next;       // Index of the next entry in the same bucket
        public T value;        // The stored element
    }

2. Hashing Process

When you call Add(item)

  • Compute the hash code : The HashSet calculates the hash code of the item using its GetHashCode() method.
  • Transform the hash code: The raw hash code is typically masked (e.g. hashCode & 0x7FFFFFFF) to ensure a non-negative index.
  • After that computes the bucket index, by
    Copy
    int bucketIndex = hashCode % buckets.Length;
  • The item is placed in the corresponding bucket.
  • If two items have the same hash index (a collision), HashSet uses linked chaining. it links the new entry to the previous one via the next field.

3. Collision Handling (Chaining)

If multiple elements hash to the same bucket

  • They are stored as a linked list in the entries array.
  • The next field in each Entry points to the next entry with the same bucket index.
Example
Copy
buckets: [-1, -1, 0, -1, -1]
entries:
[0] -> hash=12, value="apple", next=-1
[1] -> hash=17, value="banana", next=-1

If a new item "grape" hashes to the same bucket as "apple", it gets linked

Copy
buckets[2] = 2 (grape)
entries[2] = { hash=12, value="grape", next=0 }  // points to apple

4. Lookup (Contains)

When you call Contains(item)

  • Compute the hash code -> find the bucket.
  • Traverse the linked entries in that bucket to find a match using Equals().
  • Time complexity :- Average case : O(1), Worst case (many collisions) : O(n)

5. Resizing & Load Factor

To maintain performance, HashSet<T> will resize its internal storage when the number of items grows beyond a certain threshold (load factor * buckets count). Although .NET doesn’t publicly document the exact threshold, the general pattern is

  • When Count exceeds _buckets.Length * LoadFactor (often ~0.75), allocate a new buckets array (typically a larger prime or next appropriate size) plus a new slots array.
  • Rehash all existing elements (i.e. recompute bucket index using new size) and copy them into the new structure, re-linking chains.
  • This ensures the average bucket chain length remains small (hence average O(1) operations).
  • Resize cost is O(n) (for rehashing), but amortised over many inserts remains efficient.

This ensures operations remain efficient even as data grows.

6. Equality Comparer

HashSet can also use a custom equality comparer for flexible comparison and hashing logic

Copy
HashSet<string> set = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

This controls how items are compared and how their hash codes are computed.

Step-by-Step Visualization Example of HashSet Internal Storage

Step Operation Bucket Calculation Buckets (buckets[]) Entries (index -> [hash, value, next])
1 Start (Empty HashSet) [-1, -1, -1, -1, -1] (empty)
2 Insert "apple" (hash=12) 12 % 5 = 2 -> bucket 2 [-1, -1, 0, -1, -1] 0 -> [12, apple, -1]
3 Insert "banana" (hash=7) 7 % 5 = 2 -> collision with “apple”, becomes new head [-1, -1, 1, -1, -1] 1 -> [7, banana, 0]
0 -> [12, apple, -1]
4 Insert "grape" (hash=3) 3 % 5 = 3 [-1, -1, 1, 2, -1] 2 -> [3, grape, -1]
1 -> [7, banana, 0]
0 -> [12, apple, -1]
5 Insert "lemon" (hash=18) 18 % 5 = 3 -> collision with “grape”, becomes new head [-1, -1, 1, 3, -1] 3 -> [18, lemon, 2]
2 -> [3, grape, -1]
1 -> [7, banana, 0]
0 -> [12, apple, -1]
6 Insert "peach" (hash=4) 4 % 5 = 4 -> no collision [-1, -1, 1, 3, 4] 4 -> [4, peach, -1]
3 -> [18, lemon, 2]
2 -> [3, grape, -1]
1 -> [7, banana, 0]
0 -> [12, apple, -1]
7 Remove "peach" 4 % 5 = 4 -> clear bucket 4 [-1, -1, 1, 3, -1] 4 -> (free)
3 -> [18, lemon, 2]
2 -> [3, grape, -1]
1 -> [7, banana, 0]
0 -> [12, apple, -1]
8 Remove "banana" 7 % 5 = 2 -> bucket 2 now points to “apple” [-1, -1, 0, 3, -1] 1 -> (free)
4 -> (free)
3 -> [18, lemon, 2]
2 -> [3, grape, -1]
0 -> [12, apple, -1]

When to Use HashSet in C#

  • You need unique elements : If duplicates are not allowed or you want to enforce uniqueness automatically, HashSet is a great fit.
  • You want fast lookups / existence checks : Checking whether an item exists in a HashSet is very efficient (average O(1)).
  • You need to perform set operations (union, intersection, difference) : HashSet supports operations like UnionWith, IntersectWith, ExceptWith.
  • Order of items does not matter : If you don’t care about preserving insertion order or sorting, HashSet’s unordered nature is fine.

HashSet Constructors in C#

1. HashSet<T>()

Initializes a new empty HashSet<T> using the default equality comparer for the element type.

Copy
var set = new HashSet<string>();

Usage: Use when you want a simple, empty set with default settings.

2. HashSet<T>(IEnumerable<T> collection)

Initializes a new HashSet<T> that contains elements copied from the specified collection, using the default equality comparer. Duplicates in the collection are ignored.

Copy
var list = new List<int> { 1, 2, 2, 3 };
var set = new HashSet<int>(list);

Usage: Use when you want to create a set from an existing collection, ensuring uniqueness.

3. HashSet<T>(IEnumerable<T> collection, IEqualityComparer<T> comparer)

Initializes a new HashSet<T> that contains elements copied from the specified collection, using the specified equality comparer. Duplicates in the collection are ignored.

Copy
var list = new List<string> { "apple", "Apple", "banana" };
var set = new HashSet<string>(list, StringComparer.OrdinalIgnoreCase);

Usage: Use when you want to create a set from an existing collection with a custom equality comparer.

4. HashSet<T>(IEqualityComparer<T> comparer)

Initializes a new empty HashSet<T> using the specified equality comparer.

Copy
var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

Usage: Use when you want an empty set with a custom equality comparer.

5. HashSet<T>(int capacity)

Initializes a new empty HashSet<T> with a specified initial capacity, using the default equality comparer.

Copy
var set = new HashSet<int>(100);

Usage: Use when you know the approximate number of elements in advance to optimize performance.

6. HashSet<T>(int capacity, IEqualityComparer<T> comparer)

Initializes a new empty HashSet<T> with a specified initial capacity and a specified equality comparer.

Copy
var set = new HashSet<string>(50, StringComparer.OrdinalIgnoreCase);

Usage: Use when you know the approximate number of elements and want a custom equality comparer.

HashSet Properties in C#

1. Count

Gets the number of elements contained in the set. This property provides the current size of the HashSet<T>.

Copy
var set = new HashSet<int> { 1, 2, 3 };
Console.WriteLine(set.Count); // Output: 3

Usage : Use when you need to determine how many unique elements are present in the set.

2. Comparer

Gets the IEqualityComparer<T> object used to determine equality for the values in the set. This property indicates how equality comparisons are performed within the HashSet<T>.

Copy
var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
Console.WriteLine(set.Comparer.Equals("apple", "APPLE")); // Output: True

Usage : Use when you need to understand or customize the equality logic used by the set.

Creating and Adding Elements to a HashSet in C#

A HashSet<T> stores unique elements, so any attempt to add duplicates is automatically ignored. You can populate a HashSet either using a collection initializer or the Add() method.

1. Using Collection Initializer

You can initialize a HashSet with elements directly at the time of creation using curly braces {}.

Copy
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        var languages = new HashSet<string> { "C#", "Java", "C++", "Python", "React" };
        Console.WriteLine("Languages in HashSet:");
        foreach (var lang in languages)
        {
            Console.WriteLine(lang);
        }
    }
}
C#
Java
C++
Python
React

2. Using Add() Method

You can also create an empty HashSet and add elements one by one using the Add() method.

Copy
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        var frameworks = new HashSet<string>();
        frameworks.Add("ASP.NET");
        frameworks.Add("Spring Boot");
        frameworks.Add("Django");
        frameworks.Add("Flask");
        frameworks.Add("ASP.NET"); // duplicate ignored
        Console.WriteLine("Frameworks in HashSet:");
        foreach (var f in frameworks)
        {
            Console.WriteLine(f);
        }
    }
}
ASP.NET
Spring Boot
Django
Flask