Should We Initialize an Out Parameter Before a Method Returns?
A question as old as time...
Surely every C# developer has used out parameters. They seem extremely simple and clear. But is that really so? To kick things off, let's start with a self-test.
Let me remind you that out parameters must be initialized by the called method before exiting it.
Now look at the following code snippet and see if it compiles.
MyStruct - a value type:
If you confidently answered "yes" or "no", I invite you to keep reading, because things are not that clear-cut...
Let's start with a quick flashback. How did we even end up studying out parameters?
It all started with the development of another diagnostic rule for PVS-Studio. The idea of the diagnostic is as follows: one of the method parameters is of the CancellationToken type, and this parameter is not used in the method body. As a result, the program may not respond (or may respond too late) to cancellation, such as the user canceling an operation. When viewing the diagnostic's warnings, we found code that looks something like this:
Obviously, this was a false positive, so I asked a colleague to add another unit test "with out parameters". He added tests, including a test of this type:
First of all, I was interested in the tests with parameter initialization, but then I took a closer look at this one... And then it hit me! How does this code actually compile? Does it compile at all? It did. Then I realized I had an article coming up. :)
For the sake of experiment, we decided to change the CancellationToken to some other value type. For example, TimeSpan:
It does not compile. Well, that's to be expected. But why did the example with CancellationToken compile?
The Out Parameter Modifier
Let's recall what the out parameter modifier is. Here are the main points taken from docs.microsoft.com (out parameter modifier):
- The out keyword causes arguments to be passed by reference;
- Variables passed as out arguments do not have to be initialized before being passed in a method call. However, the called method is required to assign a value before the method returns.
Pay attention to the second sentence of the last point: the called method is required to assign a value before the method returns.
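To make the two documented rules concrete, here is a minimal sketch. The names OutParameterDemo and TryParsePort are hypothetical and exist only for this illustration: the caller passes an uninitialized local, and the callee assigns the out parameter on every path that returns.

```csharp
using System;

public static class OutParameterDemo
{
    // Hypothetical helper: the callee must assign 'port' on every path
    // that returns normally.
    public static bool TryParsePort(string text, out int port)
    {
        if (int.TryParse(text, out int value) && value >= 0 && value <= 65535)
        {
            port = value;
            return true;
        }

        // Removing this assignment makes the compiler report error CS0177.
        port = 0;
        return false;
    }

    public static void Main()
    {
        int port; // Not initialized: that is allowed for an out argument.
        bool ok = TryParsePort("8080", out port);
        Console.WriteLine(ok + " " + port); // prints "True 8080"
    }
}
```

Here int plays the role of an ordinary value type for which the rule is enforced as documented.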
Here is the question. What is the difference between the following three methods, and why does the last one compile, while the first and second do not?
So far, the pattern is not obvious. Maybe there are some exceptions that are described in the docs? For the CancellationToken type, for example. Although that would be a bit strange - what's so special about it? I did not find any information about this in the documentation above. Here's what the documentation suggests: For more information, see the C# Language Specification. The language specification is the definitive source for C# syntax and usage.
Well, let's see the specification. We are interested in the "Output parameters" section. Nothing new - it is all the same: Every output parameter of a method must be definitely assigned before the method returns.
Well, since the official documentation and specification of the language did not give us answers, we will have to dig into the compiler. :)
You can download the Roslyn source code from the project page on GitHub. For experiments, I took the master branch. We will work with the Compilers.sln solution. As a starting project for experiments, we use csc.csproj. You can even run it on a file with our tests to make sure that the problem is reproducible.
For the experiments we will use the following code:
To check that the error really takes place, we will build and run the compiler on the file with this code. And indeed - the error is right there: error CS0177: The out parameter 'obj' must be assigned to before control leaves the current method
By the way, this message can be a good starting point for diving into the code. The error code itself (CS0177) is probably generated dynamically, whereas the format string for the message is most likely somewhere in the resources. And this is true - we find the ERR_ParamUnassigned resource:
By the same name, we find the error code - ERR_ParamUnassigned = 177, as well as several places of use in the code. We are interested in the place where the error is added (the DefiniteAssignmentPass.ReportUnassignedOutParameter method):
Well, that seems like the place we're interested in! We set a breakpoint and make sure that this fragment is what we need. According to the results, Diagnostics will record exactly the message that we saw:
Well, that's great. And now let's change MyStruct to CancellationToken, aaand... We still enter this code execution branch, and the error is recorded in Diagnostics. This means it's still there! That's a twist!
Therefore, it is not enough to track the place where the compilation error is added - we have to explore it further.
After some digging in the code, we get to the DefiniteAssignmentPass.Analyze method that initiates the analysis run. Among other things, the method checks that the out parameters get initialized. In it, we find that the corresponding analysis runs twice:
There is an interesting condition below:
The case is gradually becoming clearer. We are trying to compile our code with MyStruct. With MyStruct, the strict and compat analyses produce the same number of diagnostics, and those diagnostics are issued.
If we change MyStruct to CancellationToken in our example, strictDiagnostics will contain 1 error (as we have already seen), and compatDiagnostics will have nothing.
As a result, the above condition is not met and the method execution is not interrupted. Where does the compilation error go? It turns out to be a simple warning:
What happens in our case with CancellationToken? The loop traverses strictDiagnostics. Recall that it contains one error, an uninitialized out parameter. The then branch of the if statement is not executed, because diagnostic.Severity is DiagnosticSeverity.Error and the compatDiagnosticSet collection is empty. The compilation error code is then mapped to a new code, that of a warning. After that, the warning is created and written to the resulting collection. This is how the compilation error turned into a warning. :)
By the way, this warning has a fairly low priority. So when you run the compiler, it may not be visible unless you set the flag for issuing warnings of the appropriate level.
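The downgrade logic described above can be modelled in a few lines. This is only a sketch of the behaviour as narrated, not the Roslyn code: diagnostics are represented by hypothetical value tuples, the class DiagnosticMergeSketch and the method Merge are made up for the illustration, and the "ERR_" to "WRN_" string mapping stands in for the real code mapping.

```csharp
using System;
using System.Collections.Generic;

public static class DiagnosticMergeSketch
{
    // Hypothetical stand-in for a compiler diagnostic: (code, isError, message).
    public static List<(string Code, bool IsError, string Message)> Merge(
        List<(string Code, bool IsError, string Message)> strict,
        List<(string Code, bool IsError, string Message)> compat)
    {
        // Both passes agree on the number of diagnostics:
        // report the strict ones unchanged.
        if (strict.Count == compat.Count)
            return new List<(string Code, bool IsError, string Message)>(strict);

        var compatSet = new HashSet<(string Code, bool IsError, string Message)>(compat);
        var result = new List<(string Code, bool IsError, string Message)>();

        foreach (var diagnostic in strict)
        {
            if (!diagnostic.IsError || compatSet.Contains(diagnostic))
            {
                // Not an error, or also found by the compat pass: keep as is.
                result.Add(diagnostic);
            }
            else
            {
                // Error found only by the strict pass: downgrade to a warning.
                string warningCode = diagnostic.Code.Replace("ERR_", "WRN_");
                result.Add((warningCode, false, diagnostic.Message));
            }
        }

        return result;
    }

    public static void Main()
    {
        var strict = new List<(string Code, bool IsError, string Message)>
        {
            ("ERR_ParamUnassigned", true,
             "The out parameter 'obj' must be assigned to before control leaves the current method")
        };
        var compat = new List<(string Code, bool IsError, string Message)>();

        var merged = Merge(strict, compat);
        Console.WriteLine(merged[0].Code + " " + merged[0].IsError); // prints "WRN_ParamUnassigned False"
    }
}
```

With one strict error and an empty compat list, which is the CancellationToken situation, the error comes out as a warning. With equal counts, which is the MyStruct situation, it stays an error.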
Let's run the compiler and specify an additional flag: csc.exe %pathToFile% -w:5
And we see the expected warning:
Now we have figured out where the compilation error disappears - it is replaced with a low-priority warning. However, we still have not answered the question of what makes CancellationToken special and how it differs from MyStruct. When analyzing the method with a MyStruct out parameter, compat analysis finds an error, whereas with a CancellationToken parameter it does not. Why is that?
Here I suggest grabbing a cup of tea or coffee, because we are about to get down to a painstaking investigation.
I hope you took the advice and got ready. So let's move on. :)
Remember the ReportUnassignedOutParameter method in which the compilation error was written? Let's look at the method that calls it:
The difference between strict and compat analysis when executing these methods is that in the first case the slot variable has the value 1, and in the second it is -1. Therefore, in the second case, the then branch of the if statement is not executed. Now we need to find out why slot has the value -1 in the second case.
Look at the method LocalDataFlowPass.VariableSlot:
In our case, _variableSlot does not contain a slot for the out parameter. Therefore, _variableSlot.TryGetValue(....) returns false. The code execution follows the alternative branch of the ?: operator, and the method returns -1. Now we need to understand why _variableSlot does not contain the out parameter.
After digging around, we find the LocalDataFlowPass.GetOrCreateSlot method. It looks like this:
The method shows that there are a number of conditions under which it returns -1 and no slot is added to _variableSlot. If there is no slot for a variable yet and all checks pass, an entry is made in _variableSlot: _variableSlot.Add(identifier, slot). We debug the code and see that during strict analysis all checks pass, whereas during compat analysis the method exits in the following if statement:
The value of the forceSlotEvenIfEmpty variable is false in both cases. The difference is in the value returned by the IsEmptyStructType method: for strict analysis it is false, for compat analysis – true.
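The interplay between the slot lookup and slot creation can be sketched as follows. This is a simplified model, not the compiler's code: the class SlotTableSketch, the string keys, and the slot numbering starting at 1 are assumptions for the illustration. Only the names _variableSlot, VariableSlot, GetOrCreateSlot and forceSlotEvenIfEmpty come from the text.

```csharp
using System;
using System.Collections.Generic;

public sealed class SlotTableSketch
{
    private readonly Dictionary<string, int> _variableSlot = new Dictionary<string, int>();
    private int _nextSlot = 1; // Assumption: numbering starts at 1.

    // Lookup only: -1 means no slot was ever created for this variable.
    public int VariableSlot(string identifier)
    {
        int slot;
        return _variableSlot.TryGetValue(identifier, out slot) ? slot : -1;
    }

    // Creates a slot unless the variable's type counts as an "empty structure".
    public int GetOrCreateSlot(string identifier, bool isEmptyStructType,
                               bool forceSlotEvenIfEmpty = false)
    {
        int slot;
        if (_variableSlot.TryGetValue(identifier, out slot))
            return slot;

        if (!forceSlotEvenIfEmpty && isEmptyStructType)
            return -1; // Nothing is added to _variableSlot.

        slot = _nextSlot++;
        _variableSlot.Add(identifier, slot);
        return slot;
    }
}

public static class SlotTableDemo
{
    public static void Main()
    {
        var strict = new SlotTableSketch();
        strict.GetOrCreateSlot("obj", isEmptyStructType: false);

        var compat = new SlotTableSketch();
        compat.GetOrCreateSlot("obj", isEmptyStructType: true);

        Console.WriteLine(strict.VariableSlot("obj") + " " + compat.VariableSlot("obj")); // prints "1 -1"
    }
}
```

The later lookup returns -1 in the compat case simply because the earlier creation step bailed out, so the out parameter is never checked.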
At this point I already have new questions and the desire to do some experiments. So it turns out that if the type of the out parameter is an "empty structure" (later we will get what this means), the compiler considers such code valid and does not generate an error, right? In our example, we remove the field from MyStruct and compile it.
And this code compiles successfully! Interesting... I can't recall any mention of such features in the documentation and specification. :)
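A minimal sketch of this experiment, assuming the method name CheckYourself and the parameter name obj used elsewhere in the article. The class EmptyStructDemo and the Main method are added only to make the sample a complete program. Behaviour may vary between compiler versions.

```csharp
using System;

public struct MyStruct
{
    // No instance fields. Adding one, for example 'private int _field;',
    // is expected to bring back error CS0177 for CheckYourself below.
}

public static class EmptyStructDemo
{
    public static void CheckYourself(out MyStruct obj)
    {
        // Do nothing: 'obj' is never assigned.
    }

    public static void Main()
    {
        MyStruct value;
        CheckYourself(out value);
        Console.WriteLine("CheckYourself returned without assigning obj");
    }
}
```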
Here comes another question: how does the code work when the type of the out parameter is CancellationToken? After all, this is clearly not an "empty structure". If you check out the code at referencesource.microsoft.com (link to CancellationToken), it becomes clear that this type contains methods, properties, and fields... Still not clear, let's keep digging.
Let's go back to the LocalDataFlowPass.IsEmptyStructType method:
Let's go deep (EmptyStructTypeCache.IsEmptyStructType):
And even deeper:
The code is executed by calling the EmptyStructTypeCache.CheckStruct method:
Here, the execution goes into the then branch of the if statement, as the typesWithMembersOfThisType collection is empty. Check out the EmptyStructTypeCache.IsEmptyStructType method, where it is passed as an argument.
We're getting some clarity here - now we understand what an "empty structure" is. Judging by the methods' names, this is a structure that does not contain instance fields. But let me remind you that there are instance fields in CancellationToken. So, we go the extra mile and check out the EmptyStructTypeCache.CheckStructInstanceFields method.
The method iterates over the instance members and gets an 'actualField' for each of them. If that value was obtained (field is not null), we check whether the type of this field is an "empty structure". If we find at least one field whose type is not an "empty structure", the original type is not an "empty structure" either. If all the instance fields are "empty structures", the original type is also considered an "empty structure".
We'll have to go a little deeper. Don't worry, our dive will be over soon, and we'll dot the i's. :)
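The recursive definition can be illustrated with a toy type model. Everything here (TypeKindSketch, TypeSketch, EmptyStructSketch) is hypothetical and only mirrors the rule as described. Fields that the compiler would skip are simply left out of the field list, and cyclic type graphs are not handled.

```csharp
using System;
using System.Collections.Generic;

public enum TypeKindSketch { Primitive, Reference, Struct }

// Hypothetical, heavily simplified description of a type.
public sealed class TypeSketch
{
    public readonly string Name;
    public readonly TypeKindSketch Kind;
    public readonly List<TypeSketch> InstanceFieldTypes = new List<TypeSketch>();

    public TypeSketch(string name, TypeKindSketch kind, params TypeSketch[] fieldTypes)
    {
        Name = name;
        Kind = kind;
        InstanceFieldTypes.AddRange(fieldTypes);
    }
}

public static class EmptyStructSketch
{
    // A struct is "empty" if every instance field is itself of an "empty" struct type.
    public static bool IsEmptyStructType(TypeSketch type)
    {
        if (type.Kind != TypeKindSketch.Struct)
            return false;

        foreach (TypeSketch fieldType in type.InstanceFieldTypes)
        {
            if (!IsEmptyStructType(fieldType))
                return false; // One non-empty field makes the whole type non-empty.
        }

        return true;
    }

    public static void Main()
    {
        var intType = new TypeSketch("int", TypeKindSketch.Primitive);
        var empty = new TypeSketch("Empty", TypeKindSketch.Struct);
        var wrapper = new TypeSketch("Wrapper", TypeKindSketch.Struct, empty, empty);
        var withInt = new TypeSketch("WithInt", TypeKindSketch.Struct, empty, intType);

        Console.WriteLine(IsEmptyStructType(empty));   // prints "True"
        Console.WriteLine(IsEmptyStructType(wrapper)); // prints "True"
        Console.WriteLine(IsEmptyStructType(withInt)); // prints "False"
    }
}
```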
Look at the method EmptyStructTypeCache.GetActualField:
Accordingly, for the CancellationToken type, we are interested in the SymbolKind.Field case branch. We can only get into it when analyzing the m_source member of this type, because the CancellationToken type contains only one instance field – m_source.
Let's look at what is computed in this case branch.
field.IsVirtualTupleField - false. We move on to the conditional operator and evaluate the conditional expression field.IsFixedSizeBuffer || ShouldIgnoreStructField(field, field.Type). field.IsFixedSizeBuffer is not our case; as expected, the value is false. As for the value returned by the ShouldIgnoreStructField(field, field.Type) call, it differs for strict and compat analysis. A quick reminder – we analyze the same field of the same type.
Here is the body of the EmptyStructTypeCache.ShouldIgnoreStructField method:
Let's see what is different for strict and compat analysis. Well, you may have already guessed on your own. :)
Strict analysis: _dev12CompilerCompatibility – false, hence the result of the entire expression is false. Compat analysis: the values of all subexpressions are true; the result of the entire expression is true.
And now we follow the chain of conclusions, rising to the top from the very end. :)
In compat analysis, we decide that the single instance field, m_source of the CancellationTokenSource type, should be ignored. Thus, CancellationToken is treated as an "empty structure", hence no slot is created for it, and no "empty structures" are written to the cache. Since there is no slot, we do not process the out parameter and do not record a compilation error when performing compat analysis. As a result, strict and compat analysis give different results, which is why the compilation error is downgraded to a low-priority warning.
That is, this is not some special processing of the CancellationToken type. There are a number of types for which leaving an out parameter uninitialized will not lead to a compilation error.
Let's try to see in practice which types will be successfully compiled. As usual, we take our typical method:
And try to substitute different types instead of MyType. We've already figured out that this code compiles successfully for CancellationToken and for an empty structure. What else?
If we use MyStruct2 instead of MyType, the code also compiles successfully.
When using this type, the code will compile successfully if MyExternalStruct is declared in an external assembly. If MyExternalStruct is declared in the same assembly with the CheckYourself method, it does not compile.
When using this type from an external assembly, the code no longer compiles, as we changed the access modifier of the _field field from private to public:
With this kind of change, the code will not compile either, since we changed the field type from String to int:
As you may have guessed, there is a certain scope for experimentation.
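The outcomes of these experiments are consistent with a simple model of the field check performed during compat analysis. The predicate below is only a sketch inferred from the behaviour described in this article. The class IgnoreFieldSketch, the method ShouldIgnoreField and its parameters are hypothetical and are not the compiler's API.

```csharp
using System;

public static class IgnoreFieldSketch
{
    // Assumed rule: a field is ignored only in compat mode, and only if it is
    // declared in another assembly, has a reference type, and is not accessible
    // from the assembly being compiled.
    public static bool ShouldIgnoreField(
        bool compatMode,
        bool declaredInOtherAssembly,
        bool fieldTypeIsReferenceType,
        bool accessibleFromCurrentAssembly)
    {
        return compatMode
            && declaredInOtherAssembly
            && fieldTypeIsReferenceType
            && !accessibleFromCurrentAssembly;
    }

    public static void Main()
    {
        // External struct, private String field: ignored, so the struct looks "empty".
        Console.WriteLine(ShouldIgnoreField(true, true, true, false));  // prints "True"
        // Same struct declared in the assembly being compiled.
        Console.WriteLine(ShouldIgnoreField(true, false, true, false)); // prints "False"
        // External struct, field made public.
        Console.WriteLine(ShouldIgnoreField(true, true, true, true));   // prints "False"
        // External struct, private field of type int.
        Console.WriteLine(ShouldIgnoreField(true, true, false, false)); // prints "False"
        // Strict analysis never ignores the field.
        Console.WriteLine(ShouldIgnoreField(false, true, true, false)); // prints "False"
    }
}
```

Only the first combination makes the field disappear from the analysis, which matches the single case where the uninitialized out parameter compiles.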
Generally speaking, out parameters must be initialized before the called method returns control to the caller. However, as practice shows, the compiler can make its own adjustments to this requirement. In some cases, a low-level warning will be issued instead of a compilation error. Why exactly this happens, we discussed in detail in the previous section.
But what about the types for which you can skip initializing out parameters? For example, parameter initialization is not required if the type is a structure with no fields, or if all of its fields are structures with no fields. Then there is the case of CancellationToken. This type is declared in an external library. Its only field, m_source, is of a reference type. The field itself is not accessible from external code. For these reasons the compilation is successful. You can come up with other similar types, leave out parameters of those types uninitialized, and still compile your code successfully.
Going back to the question from the beginning of the article:
Does this code compile? As you have already understood, neither 'Yes' nor 'No' is the correct answer. Depending on what MyStruct is, what fields are there, where the type is declared, etc. – this code can either compile or not compile.
What we went through today is diving into the compiler's source code to answer a seemingly simple question. I think we will repeat this experience soon, as the topic for the next similar article is already there. Stay in touch.
By the way, subscribe to my Twitter account, where I also post articles and other interesting findings. This way you won't miss anything exciting.
Published at DZone with permission of Sergey Vasiliev. See the original article here.