How to Efficiently Use LINQ Intersect in C# with Large Collections

Vishnu Reddy

Why This is Important

When you work with large datasets in C#, performance and memory can quickly become a concern. The LINQ Intersect method might look straightforward, but the way you use it can make a big difference. Depending on which list you call it on, your code can either run smoothly or end up using more time and memory than necessary. By understanding how Intersect works, you can write faster, more efficient C# code that handles big collections with ease.

Understanding this helps you:

  • Write efficient code for large collections.
  • Avoid hidden performance pitfalls.
  • Make better design decisions in real-world applications.

How Intersect Works

The LINQ Intersect method works like this:

  1. Builds a HashSet<T> from the sequence passed as the parameter.
  2. Iterates over the calling sequence, checking if each element exists in the HashSet.
  3. Returns each matching element only once, in the order it appears in the calling sequence. Duplicates are always removed; there is no option to keep them.
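
The three steps above can be sketched in a few lines. This is a simplified illustration, not the actual library source: the class name IntersectSketch and the method SimpleIntersect are hypothetical, and argument validation and custom comparers are left out. It assumes the set-then-scan behavior described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntersectSketch
{
    // Hypothetical helper that mirrors the steps above; not the real LINQ implementation.
    public static IEnumerable<T> SimpleIntersect<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        // Step 1: build a HashSet<T> from the sequence passed as the parameter.
        var set = new HashSet<T>(second);

        // Step 2: walk the calling sequence in its original order.
        foreach (var item in first)
        {
            // Step 3: Remove returns true only the first time a matching value
            // is seen, so each common element is yielded exactly once.
            if (set.Remove(item))
            {
                yield return item;
            }
        }
    }

    public static void Demo()
    {
        int[] first = { 3, 1, 3, 2 };
        int[] second = { 2, 3, 4 };

        Console.WriteLine(string.Join(", ", SimpleIntersect(first, second))); // 3, 2
        Console.WriteLine(string.Join(", ", first.Intersect(second)));        // 3, 2
    }
}
```

Removing a value from the set when it matches is one simple way to get both the membership check and the de-duplication from a single hash lookup per element.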

Complexity Analysis

  • Building a HashSet of size k → O(k) time + memory for k elements.
  • Iterating over a sequence of size l → l membership checks, each O(1) on average, so O(l) time.

Total → O(n + m) time, where n and m are the sizes of the two lists.

Optimal Choice

Both orders are asymptotically the same, but constants matter:

  • The HashSet should be built from the smaller collection (less memory + faster build).
  • The larger collection should be the one being iterated over.
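
When the sizes are only known at run time, the choice can be made by a small wrapper. The following is a minimal sketch; the class IntersectOrdering and the method IntersectLargerFirst are hypothetical names, and it assumes both inputs expose a cheap Count (for example List<T> or an array). Note that the result follows the order of whichever collection ends up being iterated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntersectOrdering
{
    // Hypothetical helper: iterates the larger collection and passes the
    // smaller one as the parameter, so the HashSet is built from the smaller one.
    public static IEnumerable<T> IntersectLargerFirst<T>(
        IReadOnlyCollection<T> a, IReadOnlyCollection<T> b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));

        // Count is O(1) for List<T> and arrays, so the comparison is cheap.
        return a.Count >= b.Count ? a.Intersect(b) : b.Intersect(a);
    }
}
```

A call such as IntersectOrdering.IntersectLargerFirst(list1, list2) then gives the same set of elements regardless of which argument is the larger one. If the output order matters to the caller, this wrapper is not a drop-in replacement for a plain Intersect call.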

Example

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        // Large list (n = 1,000,000)
        var list1 = Enumerable.Range(1, 1_000_000).ToList();
        // Smaller list (m = 1,000)
        var list2 = Enumerable.Range(500_000, 1_000).ToList();

        var sw = new Stopwatch();

        // Case 1: Larger list calls Intersect()
        sw.Start();
        var result1 = list1.Intersect(list2).ToList();
        sw.Stop();
        Console.WriteLine($"list1.Intersect(list2) => {sw.ElapsedMilliseconds} ms, Count = {result1.Count}");

        // Case 2: Smaller list calls Intersect()
        sw.Restart();
        var result2 = list2.Intersect(list1).ToList();
        sw.Stop();
        Console.WriteLine($"list2.Intersect(list1) => {sw.ElapsedMilliseconds} ms, Count = {result2.Count}");
    }
}

Output

Sample output from a single run (exact timings vary by machine and runtime):

list1.Intersect(list2) => 29 ms, Count = 1000
list2.Intersect(list1) => 42 ms, Count = 1000
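
A single Stopwatch measurement also includes one-time costs such as JIT compilation, so the numbers above are only indicative. The following is a rough sketch of a slightly steadier measurement, with one warm-up call followed by the median of several runs. The class IntersectTiming and the method MedianMilliseconds are hypothetical names, and the default of 11 runs is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class IntersectTiming
{
    // Hypothetical helper: runs the action once as a warm-up, then returns the
    // median elapsed time in milliseconds over the requested number of runs.
    public static double MedianMilliseconds(Action action, int runs = 11)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));

        action(); // warm-up call, excluded from the samples

        var samples = new List<double>(runs);
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            samples.Add(sw.Elapsed.TotalMilliseconds);
        }

        samples.Sort();
        return samples[samples.Count / 2];
    }

    public static void Demo()
    {
        var list1 = Enumerable.Range(1, 1_000_000).ToList();   // large list
        var list2 = Enumerable.Range(500_000, 1_000).ToList(); // small list

        double largeFirst = MedianMilliseconds(() => list1.Intersect(list2).ToList());
        double smallFirst = MedianMilliseconds(() => list2.Intersect(list1).ToList());

        Console.WriteLine($"list1.Intersect(list2) => {largeFirst:F2} ms (median)");
        Console.WriteLine($"list2.Intersect(list1) => {smallFirst:F2} ms (median)");
    }
}
```

For results worth publishing, a dedicated benchmarking library such as BenchmarkDotNet is likely a better fit than hand-rolled timing.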

Rule of Thumb

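When the two collections differ substantially in size, call .Intersect() on the larger collection and pass the smaller one as the parameter. Keep in mind that the result follows the order of the calling collection, so swapping the two can change the order of the output.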

This ensures:

  • Smaller HashSet build → less memory usage.
  • Faster execution, especially for large datasets.
