Clarify the situation with `ref` variables referring to `null`

Related: dotnet/roslyn#81574

Before `Span<T>` and more generally, `ref` fields in `ref struct`, were formally introduced in C#, a purely safe C# program (a program completely written in C# without `unsafe`) could never produce `ref` variables that refer to `null`. Therefore, the following optimization **was** valid:

```csharp
public static void Foo1(ref int arg) { int v = arg; }
public static void Foo2(ref int arg) { }
// Foo1 and Foo2 have the same meaning in, say, 2005.
// I've lost track of which feature was introduced when.
// 2005 (when C# 2.0 was released) is just a conservative estimation.
```

Since the introduction of `ref` fields in `ref struct`, the following is safe C#:

```csharp
public ref struct RefTo<T> { public ref T R; }

Foo1(ref default(RefTo<int>).R);
```

An underlying principle of C# language design is that it should minimize the number of undefined behaviors --- in particular, **for single-threaded safe C#, there shouldn't be any undefined behavior.**

Today, if there's code in `Foo1` that "use" `v`, then it will indeed trigger a load from `arg`, causing `NullReferenceException` if it's `null`. However, if the behavior is defined, the meaning of a statement should not be affected by what happens later, nor at the mercy of compiler optimizations (unless the method is compiled with exception relaxations, which doesn't apply here).

---

**Suggestion/Consideration 1.** Define what happens when obtaining the value of a variable reference to `null`, **or clearly state it as undefined.** If undefined, then the aforementioned optimization is still valid.

If defined, I argue it should result in `NullReferenceException`, for the following reasons.

- Today, IL-reading (meaning that there is `ldind.i4` in IL) a variable reference to `null` results in `NullReferenceException`, e.g., if you call another method passing its value as argument.
- It's the consistent with what happens when you access a member to a `null` object reference (i.e., `int unused = ((string)null).Length`).

If defined as `NullReferenceException`, then the optimization from `Foo1` to `Foo2` is no longer valid. Instead, the meaning of `Foo1` is

```csharp
// Foo1 and Foo3 should have the same meaning if we define
// reading from `null` variable reference throws `NullReferenceException`.
public static void Foo3(ref int arg)
{
  if (Unsafe.IsNullRef(ref arg))
    throw new NullReferenceException();
}
// Foo2 and Foo3 has the same meaning in 2005,
// since it wasn't possible for `ref int` to refer to `null`.
```

**Consideration 2.** What happens if you read a field to a value type referenced by `null`?

Note that in particular, this means

```csharp
[StructLayout(LayoutKind.Explicit)]
public struct SLarge
{
  [FieldOffset(70000)]
  public int V;
}
public static void Foo4(ref SLarge s)
{
  int v = s.V;
}
```

If it's defined to have `NullReferenceException`, then when compiled to native code, it must first trigger an access to `s`, then read `s.V`. Otherwise, since `s.V` is beyond 64KB, the code would throw `AccessViolationException` instead of the desired `NullReferenceException`.

**Consideration 3.** What happens if you take the address of a field to `null`?

```csharp
ref int i = ref default(RefTo<SLarge>).R.V;
```

**Consideration 4.** What happens if you invoke a method on `null`?

```csharp
ref int i = ref default(RefTo<int>).R;
i.Equals(null);
// Calling `i.ToString()` throws `NullReferenceException`
// because the first thing inside `ToString` is to read `m_value`.
```

---

My personal take on this.

When I created dotnet/roslyn#81574, I thought the right thing to do is to define accessing `null` variable reference as `NullReferenceException`, because that's consistent with what C# is.

Thinking about the method case, I'm not really sure. There's non-trivial penalty in generics code if we are to check for `null` before every method call.

```csharp
[MethodImpl(MethodImplOptions.NoInlining)]
public static void Foo4<T>(ref T arg)
  where T : IEquatable<int>
{
  arg.Equals(null);
}

default(RefTo<int>).R.Equals(null);
Foo4(ref default(RefTo<int>).R);
```

Today, both method calls succeed by passing `null` as `this` inside its body, because the method is implemented as follows:

```csharp
public override bool Equals(object o)
{
  return (o is int) && this.m_value == ((int)o).m_value;
}
```

When the other object is not `int`, the method doesn't read `this.m_value`, therefore completing successfully.

If we're to check for `null` before a method call, then the call site would be scattered with a lot of `cmp [rcx], ecx`.

I am more supportive for stating it as undefined behavior.

**Additional Consideration.** Maybe it should be defined as implementation-defined, constraining the possible consequences of it --- for example, one way to define it (not the current behavior!):

1. non-`volatile` reading from `ref` can be reordered or removed, as long as the meaning does not change in a single-threaded program in which `ref` refers to a valid, non-`null` location;
2. taking-address-of-field from `ref` can be reordered or removed, as long as ...;
3. non-`volatile` read-from-field from `ref` can be reordered or removed, as long as ...;
4. if a read or a take-address-of-field or a read-from-field is actually done, then it is guaranteed that a `NullReferenceException` is thrown if the variable reference refers to `null`;
5. invoking methods doesn't check for `null` variable reference.

Rules 1, 2, 3, 5 are consistent with what Roslyn does today. Rule 4 ensures that it's not `AccessViolationException`, nor arbitrary crazy behavior.

(Memo to self: The semantics of `_ = EXPR` involving `null` variable reference is a separate issue, dependent on the resolution of this issue. If this issue decides to undefine the meaning of reading from `null` variable reference, then the aforementioned discard semantics is a moot point. Otherwise, need to clarify whether `ref int r = ...; _ = r;` reads from `r`. I think yes, but a Roslyn dev doesn't think so.)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Clarify the situation with `ref` variables referring to `null` #1498

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Clarify the situation with ref variables referring to null #1498

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Clarify the situation with `ref` variables referring to `null` #1498