NeuroAgent

Why Identical Delegates Behave Differently in C#

Discover the reasons for the 10x performance difference between identical delegates in C#. Learn about JIT optimization, caching, and inlining.

Question

Why do identical delegates behave differently, with a 10x speed difference?

What on earth is going on here? The example creates two identical delegates (unless I missed something after checking a thousand combinations of possible causes), yet whichever one runs first is 10 times faster. Why does this happen? I stumbled on it by accident.

For simplicity, here’s an example where I measure the function’s performance. I wrote a For extension that executes a function once per array element. The effect only appears with this extension; it is not observed if you put the loop directly inside the lambda, as in m(()=>{for…});.

I did a bit of debugging, and it looks like the delegate that runs second doesn’t get inlined at all: instead of the two instructions “lea eax, [rcx + rdx]; ret”, it is an ordinary method call with 10-20 instructions, as in debug mode.

csharp
var arr = Enumerable.Range(0, 1_000_000).Select(i => (X: (uint)i, Y: 1u)).ToArray(); // doesn't depend on n, (tuple - artifact of cause search)
// identical delegates, the second in call order will be 10-100 times slower!!!!
var del = () => arr.For((a) => test1(a.Item1, a.Item2));
var del2 = () => arr.For((a) => test1(a.Item1, a.Item2));
m(del);    // depends on call order  
m(del2);

// some complex function up to 50 assembly lines of various instructions, effect observed somewhere
static uint test1(uint i, uint j)
{
    return i + j; 
}

public static class ArrayExtensions
{
    public static void For<T>(this T[] arr, Action<T> func)
    {
        int len = arr.Length; 
        foreach(var i in arr)
        {
            func(i);
        }
    }
}

public static void m(Action action, Action load = null)
{
    Stopwatch sw = Stopwatch.StartNew();
    load?.Invoke();
    action();  
    load?.Invoke();
    
    sw.Start();
    action();
    sw.Stop();
    int n = (int)(5_000 / (sw.Elapsed.TotalMilliseconds + 0.001f));
    if (n == 0)
        n = 1;
    List<double> times = new List<double>();
    for (int i = 0; i < n; i++)
    {
        load?.Invoke();
        sw.Restart();
        action();
        sw.Stop();
        times.Add(sw.Elapsed.TotalMilliseconds);
    }

    double sum = times.Sum();
    double average = sum / n;
    double err = Math.Sqrt(times.Aggregate(0.0, (acc, e) => acc += (e - average) * (e - average)) / (n - 1));
    Console.WriteLine($"Test func: {action.Method.Name}");
    Console.WriteLine($"Count invoke: {n}");
    Console.WriteLine("Standard deviation: {0,20:f4} ms", err);
    Console.WriteLine("Max:                {0,20:f4} ms", times.Max());
    Console.WriteLine("Min:                {0,20:f4} ms", times.Min());
    var c1 = Console.ForegroundColor;
    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.WriteLine("Mean time         : {0,20:f4} ms", average);
    Console.ForegroundColor = c1;
    Console.WriteLine();
}

Test results:

Test func: b__1
Count invoke: 5803
Standard deviation:               0.0146 ms
Max:                              0.3772 ms
Min:                              0.0362 ms
Mean time         :               0.0509 ms

Test func: b__1
Count invoke: 55555
Standard deviation:               0.0091 ms
Max:                              0.4715 ms
Min:                              0.0355 ms
Mean time         :               0.0420 ms

Test func: b__2
Count invoke: 3902
Standard deviation:               0.0589 ms
Max:                              0.9081 ms
Min:                              0.3880 ms
Mean time         :               0.4358 ms

Test func: b__2
Count invoke: 6124
Standard deviation:               0.0519 ms
Max:                              0.7931 ms
Min:                              0.3880 ms
Mean time         :               0.4263 ms

What is the cause of this behavior and how can it be fixed?

NeuroAgent

Two textually identical delegates can differ in speed by up to 10x because of how the .NET JIT compiler optimizes code based on what it has already executed, and because of how lambdas are compiled and cached. The cause lies in runtime optimization mechanics rather than in your code logic.

Main Performance Reasons

Your observation is explained by several key factors in how the .NET JIT compiler works:

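  1. Delegate caching: The C# compiler, not the JIT, caches the delegate created for a lambda that captures nothing, so it is allocated only once. As research indicates, “delegate caching created by lambda expressions depends on the compiler.” Each lambda expression in the source still gets its own method and its own delegate, even when the text is identical.

  2. First call vs subsequent calls: The first call of any method can be significantly slower because it pays for JIT compilation. With tiered compilation, the first version of the code is also unoptimized until the method has run often enough to be recompiled.

  3. Lack of inlining: A call through a delegate is an indirect call that the JIT generally does not inline. Research shows that “callbacks through a delegate are never inlined.” Newer runtimes with dynamic PGO (on by default since .NET 8) relax this: they can inline the target observed at a call site behind a guard check, which helps only the target that was seen while profiling.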

Role of JIT Compiler and Caching

The .NET JIT compiler applies several optimization strategies that directly affect delegate performance:

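Delegate caching: According to research, “C# compiler does optimize this by caching the delegate and re-using it for both calls.” This is done by the C# compiler at build time and applies to a single lambda expression that captures no variables. It does not merge two separate lambdas that happen to have the same body.

In your case, both calls go through the same compiled body of For<T>. The JIT likely optimizes that body while only the first inner lambda has been observed, so the first delegate gets a fast path with the lambda inlined. The second delegate in the call order doesn’t receive this optimization, because its inner lambda is a different method and the already optimized code falls back to an ordinary delegate call.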

Factors affecting caching:

  • Order of creation and invocation
  • Execution context (method where the delegate is created)
  • Presence of first-call optimizations
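
The compiler-side caching described above can be observed directly. The following is a minimal sketch, not part of the original question; the class and method names are hypothetical. It compares the delegate instances returned by a lambda that captures nothing with those returned by a lambda that captures a parameter.

```csharp
using System;

public static class DelegateCachingSketch
{
    // Non-capturing lambda: the C# compiler can keep the delegate in a
    // hidden static field and return the same instance on every call.
    public static Func<uint, uint, uint> NonCapturing() => (i, j) => i + j;

    // Capturing lambda: 'offset' is stored in a closure object, so each
    // call creates a new closure and a new delegate.
    public static Func<uint, uint> Capturing(uint offset) => i => i + offset;

    public static void Main()
    {
        // Typically True with current compilers (an implementation detail,
        // not a language guarantee).
        Console.WriteLine(ReferenceEquals(NonCapturing(), NonCapturing()));

        // False: two separate delegate instances.
        Console.WriteLine(ReferenceEquals(Capturing(1), Capturing(1)));
    }
}
```

This caching only saves allocations. It does not change how the JIT compiles the method that eventually invokes the delegate.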

Method Inlining and Delegates

One of the main reasons for performance differences is the lack of inlining for delegates:

csharp
// Direct method call (can be inlined)
uint direct = test1(i, j);

// Call through a delegate (an indirect call, usually not inlined)
Func<uint, uint, uint> add = (x, y) => test1(x, y);
uint viaDelegate = add(i, j);

As research shows, “callbacks through a delegate are never inlined even if the jitter optimizer could deduce what the delegate’s target method might be.” That statement describes older runtimes, but the effect is the same whenever inlining does not happen: instead of 2-3 assembly instructions, you might have 10-20 instructions when calling through a delegate.

Performance comparison (figures from a cited test, not from your code; they vary with runtime version and hardware):

  • Direct method call: ~324 ms (in test)
  • Call through delegate: ~1904 ms (in test)
  • Virtual call: ~2714 ms (in test)
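
To get comparable figures on your own machine, a small harness along these lines can help. It is a sketch under stated assumptions: the names CallCostSketch, SumDirect and SumViaDelegate are hypothetical, and the timing is deliberately simple. On runtimes with dynamic PGO the gap may be narrow here, because only one delegate target is ever seen.

```csharp
using System;
using System.Diagnostics;

public static class CallCostSketch
{
    public static uint Add(uint i, uint j) => i + j;

    public static uint SumDirect(uint[] data)
    {
        uint sum = 0;
        foreach (uint x in data)
            sum += Add(x, 1u);      // direct call; small enough for the JIT to inline
        return sum;
    }

    public static uint SumViaDelegate(uint[] data, Func<uint, uint, uint> add)
    {
        uint sum = 0;
        foreach (uint x in data)
            sum += add(x, 1u);      // call through the delegate
        return sum;
    }

    static double TimeMs(Action body)
    {
        var sw = Stopwatch.StartNew();
        body();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        var data = new uint[1_000_000];
        for (int k = 0; k < data.Length; k++) data[k] = (uint)k;
        Func<uint, uint, uint> add = Add;

        // Untimed warm-up runs to give the JIT a chance to optimize both loops.
        for (int r = 0; r < 50; r++) { SumDirect(data); SumViaDelegate(data, add); }

        double directMs = TimeMs(() => SumDirect(data));
        double delegateMs = TimeMs(() => SumViaDelegate(data, add));
        Console.WriteLine($"Direct call:  {directMs:f4} ms");
        Console.WriteLine($"Via delegate: {delegateMs:f4} ms");
    }
}
```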

Execution Order and Optimization

Your observation about call order confirms that the JIT compiler applies optimizations based on execution history:

csharp
var del = () => arr.For((a) => test1(a.Item1, a.Item2));
var del2 = () => arr.For((a) => test1(a.Item1, a.Item2));

m(del);    // Measured first: For<T> is likely optimized while only this lambda has been seen
m(del2);   // Measured second: runs through code already specialized for the first lambda

Why order matters:

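  1. With tiered compilation, the JIT first emits quick, unoptimized code and counts calls; with dynamic PGO it also records which target a delegate call site actually invokes
  2. After enough calls, the method is queued for recompilation with full optimizations
  3. The optimized code can be specialized for what was observed, for example by inlining the one lambda seen so far behind a guard check
  4. A different delegate, even a textually identical one, fails that check and takes the slower general path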
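
The last point matches the two different method names (b__1 and b__2) in your output. A minimal sketch, with hypothetical names, showing that two lambdas with the same text compile to separate methods and separate delegate objects:

```csharp
using System;

public static class IdenticalLambdasSketch
{
    static void Consume(int value) { }

    // Two lambda expressions with identical source text.
    public static (Action<int> First, Action<int> Second) Create()
    {
        Action<int> first = x => Consume(x);
        Action<int> second = x => Consume(x);
        return (first, second);
    }

    public static void Main()
    {
        var (first, second) = Create();

        // The compiler-generated method names differ.
        Console.WriteLine(first.Method.Name);
        Console.WriteLine(second.Method.Name);

        // Different target methods, so the delegates are not equal.
        Console.WriteLine(first.Method == second.Method);
        Console.WriteLine(first.Equals(second));
    }
}
```

Because the targets differ, a guard that checks for the first lambda’s method cannot succeed for the second one.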

Solutions and Best Practices

To achieve stable delegate performance, the following is recommended:

1. Avoid recreating identical delegates

csharp
// Bad: creating a new delegate in each iteration
// (the lambda captures i, so the delegate cannot be cached)
for (int i = 0; i < count; i++)
{
    Action perItem = () => Process(i);
    perItem();
}

// Good: create one delegate and reuse it
Action reused = () => { /* logic */ };
for (int i = 0; i < count; i++)
{
    reused();
}

2. Use direct method calls where possible

csharp
// Direct arithmetic for simple operations
uint direct = i + j;                   // Fast
// Instead of going through a delegate:
uint viaDelegate = addDelegate(i, j);  // Slower

3. JIT “warming up”

csharp
// Execute the delegate once before the main loop
// to "warm up" the JIT compiler
del();
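
A slightly fuller version of the same idea is sketched below. It assumes a hypothetical helper named MeanMs that runs untimed warm-up calls before the timed ones. Warm-up makes the numbers more stable, but it will not necessarily remove the order dependence when two delegates share the same For<T> body.

```csharp
using System;
using System.Diagnostics;

public static class WarmupSketch
{
    // Runs 'action' warmupRuns times without timing, then returns the mean
    // time in milliseconds over measuredRuns timed runs.
    public static double MeanMs(Action action, int warmupRuns, int measuredRuns)
    {
        for (int i = 0; i < warmupRuns; i++)
            action();

        var sw = new Stopwatch();
        double totalMs = 0;
        for (int i = 0; i < measuredRuns; i++)
        {
            sw.Restart();
            action();
            sw.Stop();
            totalMs += sw.Elapsed.TotalMilliseconds;
        }
        return totalMs / measuredRuns;
    }

    public static void Main()
    {
        var data = new int[100_000];
        long sink = 0;

        double mean = MeanMs(() =>
        {
            foreach (int x in data) sink += x + 1;
        }, warmupRuns: 100, measuredRuns: 1000);

        Console.WriteLine($"Mean time: {mean:f4} ms (sink = {sink})");
    }
}
```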

4. Use delegate caching

csharp
private static readonly Func<uint, uint, uint> CachedAdd = 
    (i, j) => i + j;

Detailed Analysis of Your Example

Your code demonstrates a classic case where the JIT compiler applies different optimization strategies:

Problem in the For method

csharp
public static void For<T>(this T[] arr, Action<T> func)
{
    int len = arr.Length; 
    foreach(var i in arr)
    {
        func(i);  // <-- Delegate call happens here
    }
}

Each call to func(i) is a delegate call. Unless the JIT manages to inline the target, that means an indirect call per element, and with 1,000,000 elements the overhead adds up.
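
One commonly used way to avoid the delegate call altogether is to pass the callback as a generic struct. This is a sketch of that technique, not something taken from the question: IElementAction, SumPairs and ForEachStruct are hypothetical names. The runtime compiles a separate copy of a generic method for each value-type argument, so the call below is a direct call that the JIT can inline regardless of which callback ran first.

```csharp
using System;
using System.Linq;

// Hypothetical callback interface standing in for Action<T>.
public interface IElementAction<T>
{
    void Invoke(T item);
}

// Hypothetical struct callback that accumulates X + Y over all elements.
public struct SumPairs : IElementAction<(uint X, uint Y)>
{
    public ulong Total;
    public void Invoke((uint X, uint Y) item) => Total += item.X + item.Y;
}

public static class StructCallbackSketch
{
    // 'ref' lets the caller read state the struct accumulated in the loop.
    public static void ForEachStruct<T, TAction>(this T[] arr, ref TAction action)
        where TAction : struct, IElementAction<T>
    {
        foreach (var item in arr)
            action.Invoke(item);   // direct call on a struct type, eligible for inlining
    }

    public static void Main()
    {
        var arr = Enumerable.Range(0, 1_000_000)
                            .Select(i => (X: (uint)i, Y: 1u))
                            .ToArray();
        var sum = new SumPairs();
        arr.ForEachStruct(ref sum);
        Console.WriteLine(sum.Total);
    }
}
```

The trade-off is more boilerplate than a lambda: each callback needs its own struct type.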

Analysis of results

Your results show:

  • First delegate: ~0.05 ms (fast)
  • Second delegate: ~0.43 ms (8.6x slower)

This suggests that the JIT compiler specialized the shared For<T> code for the first delegate’s lambda while it was being measured. The second delegate’s lambda is a different method, so it takes the non-inlined path.

Conclusion

The 10x performance difference between identical delegates occurs due to:

  1. Tiered JIT optimization: code is optimized based on what ran first, so the first delegate gets the fast path
  2. Lack of inlining: delegate calls are usually not inlined, creating additional instructions
  3. Separate methods: identical lambdas compile to different methods, and the JIT treats them as different targets
  4. Call order: early calls receive optimization benefits

Recommendations for improvement:

  • Create delegates once and reuse them
  • Avoid creating delegates inside hot loops
  • Use direct method calls for simple operations
  • Implement delegate caching for frequently used operations
  • Apply JIT “warming up” before main computations

This behavior is a normal characteristic of .NET JIT compiler operation, not an error in your code. For maximum performance, always consider the trade-off between delegate flexibility and direct method calls.

Sources

  1. Can the C# compiler or JIT optimize away a method call in a lambda expression? – Stack Overflow
  2. C# delegate compiler optimisation – Stack Overflow
  3. Are deterministically unchangable Actions, and Funcs Inlined by the JIT? – Stack Overflow
  4. What optimization hints can I give to the compiler/JIT? – Stack Overflow
  5. Compiled C# Lambda Expressions Performance – Stack Overflow