-
Notifications
You must be signed in to change notification settings - Fork 92
Description
Related: dotnet/roslyn#81574
Before Span<T> and more generally, ref fields in ref struct, were formally introduced in C#, a purely safe C# program (a program completely written in C# without unsafe) could never produce ref variables that refer to null. Therefore, the following optimization was valid:
public static void Foo1(ref int arg) { int v = arg; }
public static void Foo2(ref int arg) { }
// Foo1 and Foo2 have the same meaning in, say, 2005.
// I've lost track of which feature was introduced when.
// 2005 (when C# 2.0 was released) is just a conservative estimation.Since the introduction of ref fields in ref struct, the following is safe C#:
public ref struct RefTo<T> { public ref T R; }
Foo1(ref default(RefTo<int>).R);An underlying principle of C# language design is that it should minimize the number of undefined behaviors --- in particular, for single-threaded safe C#, there shouldn't be any undefined behavior.
Today, if there's code in Foo1 that "use" v, then it will indeed trigger a load from arg, causing NullReferenceException if it's null. However, if the behavior is defined, the meaning of a statement should not be affected by what happens later, nor at the mercy of compiler optimizations (unless the method is compiled with exception relaxations, which doesn't apply here).
Suggestion/Consideration 1. Define what happens when obtaining the value of a variable reference to null, or clearly state it as undefined. If undefined, then the aforementioned optimization is still valid.
If defined, I argue it should result in NullReferenceException, for the following reasons.
- Today, IL-reading (meaning that there is
ldind.i4in IL) a variable reference tonullresults inNullReferenceException, e.g., if you call another method passing its value as argument. - It's the consistent with what happens when you access a member to a
nullobject reference (i.e.,int unused = ((string)null).Length).
If defined as NullReferenceException, then the optimization from Foo1 to Foo2 is no longer valid. Instead, the meaning of Foo1 is
// Foo1 and Foo3 should have the same meaning if we define
// reading from `null` variable reference throws `NullReferenceException`.
public static void Foo3(ref int arg)
{
if (Unsafe.IsNullRef(ref arg))
throw new NullReferenceException();
}
// Foo2 and Foo3 has the same meaning in 2005,
// since it wasn't possible for `ref int` to refer to `null`.Consideration 2. What happens if you read a field to a value type referenced by null?
Note that in particular, this means
[StructLayout(LayoutKind.Explicit)]
public struct SLarge
{
[FieldOffset(70000)]
public int V;
}
public static void Foo4(ref SLarge s)
{
int v = s.V;
}If it's defined to have NullReferenceException, then when compiled to native code, it must first trigger an access to s, then read s.V. Otherwise, since s.V is beyond 64KB, the code would throw AccessViolationException instead of the desired NullReferenceException.
Consideration 3. What happens if you take the address of a field to null?
ref int i = ref default(RefTo<SLarge>).R.V;Consideration 4. What happens if you invoke a method on null?
ref int i = ref default(RefTo<int>).R;
i.Equals(null);
// Calling `i.ToString()` throws `NullReferenceException`
// because the first thing inside `ToString` is to read `m_value`.My personal take on this.
When I created dotnet/roslyn#81574, I thought the right thing to do is to define accessing null variable reference as NullReferenceException, because that's consistent with what C# is.
Thinking about the method case, I'm not really sure. There's non-trivial penalty in generics code if we are to check for null before every method call.
[MethodImpl(MethodImplOptions.NoInlining)]
public static void Foo4<T>(ref T arg)
where T : IEquatable<int>
{
arg.Equals(null);
}
default(RefTo<int>).R.Equals(null);
Foo4(ref default(RefTo<int>).R);Today, both method calls succeed by passing null as this inside its body, because the method is implemented as follows:
public override bool Equals(object o)
{
return (o is int) && this.m_value == ((int)o).m_value;
}When the other object is not int, the method doesn't read this.m_value, therefore completing successfully.
If we're to check for null before a method call, then the call site would be scattered with a lot of cmp [rcx], ecx.
I am more supportive for stating it as undefined behavior.
Additional Consideration. Maybe it should be defined as implementation-defined, constraining the possible consequences of it --- for example, one way to define it (not the current behavior!):
- non-
volatilereading fromrefcan be reordered or removed, as long as the meaning does not change in a single-threaded program in whichrefrefers to a valid, non-nulllocation; - taking-address-of-field from
refcan be reordered or removed, as long as ...; - non-
volatileread-from-field fromrefcan be reordered or removed, as long as ...; - if a read or a take-address-of-field or a read-from-field is actually done, then it is guaranteed that a
NullReferenceExceptionis thrown if the variable reference refers tonull; - invoking methods doesn't check for
nullvariable reference.
Rules 1, 2, 3, 5 are consistent with what Roslyn does today. Rule 4 ensures that it's not AccessViolationException, nor arbitrary crazy behavior.
(Memo to self: The semantics of _ = EXPR involving null variable reference is a separate issue, dependent on the resolution of this issue. If this issue decides to undefine the meaning of reading from null variable reference, then the aforementioned discard semantics is a moot point. Otherwise, need to clarify whether ref int r = ...; _ = r; reads from r. I think yes, but a Roslyn dev doesn't think so.)