How to Efficiently Use LINQ Intersect in C# with Large Collections
Why This is Important
When you work with large datasets in C#, performance and memory can quickly become a concern. The LINQ Intersect method might look straightforward, but the way you use it can make a big difference. Depending on which list you call it on, your code can either run smoothly or end up using more time and memory than necessary.
Understanding how Intersect works helps you:
- Write efficient code for large collections.
- Avoid hidden performance pitfalls.
- Make better design decisions in real-world applications.
How Intersect Works
The LINQ Intersect method works like this:
- Builds a HashSet<T> from the sequence passed as the parameter.
- Iterates over the calling sequence, checking if each element exists in the HashSet.
- Returns the unique intersection (removes duplicates by default).
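The steps above can be sketched as a hand-written iterator. This is a simplified model for illustration, not the library's source: `IntersectSketch` and `IntersectDemo` are hypothetical names, and the sketch leaves out argument validation and custom comparers. Removing each match from the set is one simple way to get the "unique" behavior, because a value can only be removed once.

```csharp
using System;
using System.Collections.Generic;

static class IntersectDemo
{
    // Hypothetical helper: a simplified model of first.Intersect(second).
    public static IEnumerable<T> IntersectSketch<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        // Step 1: build a HashSet<T> from the sequence passed as the parameter.
        var set = new HashSet<T>(second);

        // Step 2: iterate over the calling sequence.
        foreach (var item in first)
        {
            // Step 3: Remove returns true only the first time a value is found,
            // so each common element is yielded at most once.
            if (set.Remove(item))
                yield return item;
        }
    }

    static void Main()
    {
        var first = new[] { 1, 2, 2, 3, 4 };
        var second = new[] { 2, 3, 3, 5 };

        Console.WriteLine(string.Join(", ", IntersectSketch(first, second))); // prints "2, 3"
    }
}
```

Note that the duplicate 2 in `first` and the duplicate 3 in `second` each appear only once in the result, and that the output follows the order of the calling sequence.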
Complexity Analysis
- Building a HashSet of size k → O(k) time + memory for k elements.
- Iterating over a sequence of size l → O(l) membership checks, each O(1) on average.
Total → O(k + l), that is, O(n + m) where n and m are the sizes of the two lists.
Optimal Choice
Both orders are asymptotically the same, but constants matter:
- The HashSet should be built from the smaller collection (less memory + faster build).
- The larger collection should be the one being iterated over.
Example
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        // Large list (n = 1,000,000)
        var list1 = Enumerable.Range(1, 1_000_000).ToList();
        // Smaller list (m = 1,000)
        var list2 = Enumerable.Range(500_000, 1_000).ToList();
        var sw = new Stopwatch();

        // Case 1: Larger list calls Intersect()
        sw.Start();
        var result1 = list1.Intersect(list2).ToList();
        sw.Stop();
        Console.WriteLine($"list1.Intersect(list2) => {sw.ElapsedMilliseconds} ms, Count = {result1.Count}");

        // Case 2: Smaller list calls Intersect()
        sw.Restart();
        var result2 = list2.Intersect(list1).ToList();
        sw.Stop();
        Console.WriteLine($"list2.Intersect(list1) => {sw.ElapsedMilliseconds} ms, Count = {result2.Count}");
    }
}
Output
list1.Intersect(list2) => 29 ms, Count = 1000
list2.Intersect(list1) => 42 ms, Count = 1000
These timings come from a single run and will vary with hardware and runtime; what matters is how the two lines compare with each other.
Rule of Thumb
Always call .Intersect() on the larger collection, passing the smaller one as the parameter.
This ensures:
- Smaller HashSet build → less memory usage.
- Faster execution, especially for large datasets.
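One way to apply this rule without thinking about argument order at every call site is a small wrapper that compares the two counts first. The sketch below is only an illustration: `IntersectLargerFirst` and `IntersectExtensions` are hypothetical names, not part of LINQ, and the helper assumes both inputs implement `ICollection<T>` so that `Count` is cheap to read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class IntersectExtensions
{
    // Hypothetical helper (not part of LINQ): iterates the larger collection
    // and passes the smaller one as the parameter, so the internal set stays small.
    public static IEnumerable<T> IntersectLargerFirst<T>(ICollection<T> a, ICollection<T> b)
    {
        return a.Count >= b.Count ? a.Intersect(b) : b.Intersect(a);
    }
}

static class Program
{
    static void Main()
    {
        var large = Enumerable.Range(1, 1_000_000).ToList();
        var small = new List<int> { 999_999, 5, 2_000_000 };

        // The caller can pass the collections in either order; the helper picks the order.
        var common = IntersectExtensions.IntersectLargerFirst(small, large).ToList();

        Console.WriteLine(string.Join(", ", common)); // prints "5, 999999"
    }
}
```

The result follows the order of whichever collection is iterated, here the larger one, so 5 comes before 999999 even though `small` lists them the other way round. If the order of the output matters to your code, swapping the operands is not a purely cosmetic change.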
