Variable Interpolation in Double-Quoted Strings [000913]

This document attempts to detail the interpolation behaviour of perl in double-quoted strings. It was originally written and tested using Perl 5.005_03, but I've just re-tested with 5.6.0.

There are a couple of error message differences (although I've left the old 5.005_03 errors because the differences are negligible) and one fairly alarming bug that has been introduced in 5.6.0 (clearly detailed in Ex 21).

It was born from a discussion on comp.lang.perl.misc. It was written by Jason King, a reasonably unexciting contributor to c.l.p.misc. If you have any questions after reading this document then don't mail me; I've only included my email address so people can report errors in this document or in the unlikely event that someone wants to say thanks (even in that case - keep it concise *8^).

Perhaps Randal L. Schwartz said it best when he said simply "Perl doesn't interpolate expressions, just variables". We'll see as we go along how accurate this is. It's certainly a good place to start, so we'll call it Theorem One (or T1 for short).

The pot is stirred.

Let's look at the original example that started all this, posted by Alan Curry in c.l.p.misc:

#!/usr/bin/perl -wl
use strict;

my @a=(1000, 10, 20, 30, 40, 50, 60);
my $i=2;
my $f=sub { return 4 };
my $r=\@a;

print "$i";
print "$a[0]";
print "@a";
print "$i+1";
print "$a[$i+1]";
print "$a[$i+1]+1";
print "$f->()+1";
print "$r->[1]";
print "$a[$f->()+1]+1";
print "@a[2..$f->()]";
__END__

I suggest that you copy this code, run it and have a look at the output before proceeding. See if you get the output that you expected, and if not, have a think about why not before going further.

We're going to go through each line and explain the interpolation that's happening. The first few will be boring and predictable - they're done for completeness and because I'm not making any assumptions about the Perl experience of the reader (not at this point anyway *8^).

Ex 1

print "$i";     # Output: 2

Simple interpolation, a variable is seen and its value is substituted. This conforms to T1.

Ex 2

print "$a[0]";      # Output: 1000

Slightly more complication than the above, but $a[0] is still a variable, so interpolation is done and the value substituted. For those that think of the [] as an index operator, Perl doesn't really see it this way when it comes to strings. Perl treats $a[0] as a single variable and so just interpolates it. This still conforms to T1 therefore - because no expression evaluation was done.

Ex 3

print "@a";     # Output: 1000 10 20 30 40 50 60

This is a simple one also, @a is clearly a variable, it's an array. So it's interpolated and the results substituted. T1 is still happy.

Ex 4

print "$i+1";     # Output: 2+1

From T1 we see that the variable interpolation is done, but the expression is not evaluated. So the $i is substituted, but the resulting string '2+1' is not evaluated any further.

Ex 5

print "$a[$i+1]";   # Output: 30

Here we see T1 break down. It said that expression evaluation is not done, but in this example clearly the 2+1 expression is evaluated. It might be clear to some that this is only done in the process of interpolating the variable, but to my mind T1 needs a little amendment, so let's create T2.

T2: The variables in a string are interpolated, but no operators are evaluated UNLESS the interpolation of a variable demands a value from that operator.

So does T2 cover the example at hand ? Perl sees the $a[something] variable, but to index the @a array it needs a value for 'something', so it evaluates $i+1. And extremely importantly: this evaluation is not done in string context, it is done as if that code was a normal line of code. Then with that value it gets the array element and substitutes that. T2 seems to work.
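As a minimal sketch of T2, reusing @a and $i from the listing above: the names $plain, $indexed, $doubled and $last are only labels chosen for this illustration. It contrasts an operator that no variable needs with subscripts that are evaluated as ordinary code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @a = (1000, 10, 20, 30, 40, 50, 60);
my $i = 2;

# No variable needs the '+', so it stays literal text.
my $plain = "$i+1";

# The subscript is ordinary Perl code, evaluated before the element is fetched.
my $indexed = "$a[$i+1]";
my $doubled = "$a[$i*2]";
my $last    = "$a[-1]";

print "$plain\n";     # 2+1
print "$indexed\n";   # 30
print "$doubled\n";   # 40
print "$last\n";      # 60
```

Anything that is legal as an array subscript in normal code seems to be fair game inside the brackets, which is exactly what T2 predicts.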

Ex 6

print "$a[$i+1]+1";   # Output: 30+1

Again, the $a[$i+1] part undergoes the same interpolation and evaluation as above, but the resulting string '30+1' is not processed any further, because by T2 we only evaluate operators IF a variable interpolation demands it. So T2 works here too.

Ex 7

print "$f->()+1";    # Output: CODE(0x9f9b8c)->()+1

Ok, so here's a strange one. We often think of references as being the value of the thing they reference, rather than values in their own right - with that sort of thinking you'll get bitten by this example. $f is a variable all unto itself, the dereference operator does not need to be evaluated to get a value for $f. Therefore it is not evaluated, the scalar value of $f is substituted into the string and the '->()+1' string is appended for output. T2 stands (although possibly not the way we'd usually prefer).
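If you actually wanted the subroutine called, here is a sketch of two common workarounds. Neither comes from the original thread; the "result: " label and the variable names are made up for the illustration. The second one leans on T2: '@{ ... }' is a variable interpolation, so whatever it needs gets evaluated.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $f = sub { return 4 };

# Only $f is interpolated; '->()+1' is left as literal text.
my $naive = "$f->()+1";

# Option 1: do the call outside the string and concatenate.
my $concat = "result: " . ($f->() + 1);

# Option 2: build an anonymous array from the expression and dereference it
# inside the string. Interpolating '@{ ... }' demands the value of its contents.
my $inline = "result: @{[ $f->() + 1 ]}";

print "$naive\n";    # something like CODE(0x9f9b8c)->()+1 (the address varies)
print "$concat\n";   # result: 5
print "$inline\n";   # result: 5
```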

Ex 8

print "$r->[1]";   # Output: 10

But what about this one, if what we said in Ex 7 was correct, then this should behave the same way. $r has a value in a string context just like $f did in Ex 7, and it doesn't demand a value from the dereference operator. So why is the dereference done ? This must contradict T2. Well, it certainly does. This would be the exception to that rule. You'll find that $hashref->{name} does the same thing. So let's write that exception into our theorem to make T3.

T3: The variables in a string are interpolated, but no operators are evaluated UNLESS the interpolation of a variable demands a value from that operator OR the operator is a dereference operator and is followed by either a '[' or a '{'.

Does that sound like an ugly exception ? Well I'm afraid it's true. We'll see that later in "How tightly do '[' and '{' bind ?", but for now let's test our T3 with the current example.

The dereference operator is followed by a '[' so the operator is evaluated, and the result substituted. So T3 is happy.

And before we move on, let's just double check Ex 7 (which also used the dereference operator) with T3. The dereference operator in that example was not followed by '[' or '{' so no evaluation was done. T3 is still happy.
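A short sketch of the T3 exception, assuming the @a and $i from the first listing. The $hashref variable and its 'name' key are hypothetical, added only so that the '{' case can be shown next to the '[' case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @a = (1000, 10, 20, 30, 40, 50, 60);
my $i = 2;
my $r = \@a;
my $hashref = { name => 'value' };   # hypothetical hash reference

# Arrow followed by '[': the dereference is performed.
my $element = "$r->[1]";

# The subscript is still evaluated as normal code, as in Ex 5.
my $computed = "$r->[$i+1]";

# Arrow followed by '{': the dereference is performed here too.
my $entry = "$hashref->{name}";

print "$element\n";    # 10
print "$computed\n";   # 30
print "$entry\n";      # value
```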

Ex 9

print "$a[$f->()+1]+1";    # Output: 50+1

Doesn't this look messy ? T3 doesn't think so. $a[something] is a variable that needs a value to complete interpolation. So $f->()+1 is evaluated. This gives us a value of 5, which we use to index the @a array to get a value that's then substituted into the string to give us '50+1', no further variable interpolation exists, so no further evaluation is required. T3 is happy.

Ex 10

print "@a[2..$f->()]";   # Output: 20 30 40

Again, a walk in the park for T3. @a[something] is an array slice and needs to have 2..$f->() evaluated before interpolation can be complete. The resulting value (2,3,4) is used to take the array slice. T3 is yawningly happy.
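For the curious, a sketch of what the slice interpolation amounts to. The join comparison and the use of the $" variable (Perl's list separator, a single space by default) are additions for illustration and were not part of the original thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @a = (1000, 10, 20, 30, 40, 50, 60);
my $f = sub { return 4 };

# The range 2..$f->() is evaluated as code, then the slice is interpolated.
my $slice = "@a[2..$f->()]";

# Roughly the same thing written out by hand.
my $joined = join ' ', @a[2 .. $f->()];

# The elements are joined with $", so changing it changes the output.
my $custom;
{
    local $" = ', ';
    $custom = "@a[2..$f->()]";
}

print "$slice\n";    # 20 30 40
print "$joined\n";   # 20 30 40
print "$custom\n";   # 20, 30, 40
```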

The trouble with Alan.

So it would appear that we've got a good theorem to use for predicting Perl's interpolation. Let's have a look at some more examples, also posted by Alan Curry (he's such a troublemaker *8^).

Again, I strongly urge you to copy-paste these examples into your favourite editor and run them before proceeding. It adds a lot of value to this discussion if you have already run the examples, and therefore have an expectation about the output of each one.

my %h=(foo=>'bar', bar=>'baz');
my $s='foo';

print "$h{'foo'}";
print "$h{$s}";
print "$h{$h{$s}}";
print "$h{\"$h{$s}\"}";

Ex 11

print "$h{'foo'}";    # Output: bar

Hash elements (just like array elements) are treated as variables in their own right. And the same rules apply as do those for array elements. So by T3 (in fact by T2 and T1) the above element is looked up in the %h hash and substituted into the string. No surprises.

Ex 12

print "$h{$s}";     # Output: bar

Here Perl sees $h{something} and knows that to find the hash element it must evaluate $s, it does that and gets $h{foo} which it looks up in %h and substitutes into the string. T3 is happy.

Ex 13

print "$h{$h{$s}}";   # Output: baz

Again, Perl sees $h{something} and therefore has to evaluate $h{$s}. It does this and then has $h{bar} which it substitutes into the string. T3 is not at all challenged by this.

Ex 14

print "$h{\"$h{$s}\"}";   # Output: baz

A little more complex. Here Perl sees $h{something} and needs to evaluate "$h{$s}" (notice that this code actually contains the quotes) before it can substitute the variable. Of course, in Perl a string is a valid piece of code and it will evaluate it. Therefore Perl evaluates the string "$h{$s}", which clearly requires more interpolation: it interpolates $h{$s} and gets bar, which it then uses to evaluate $h{bar} to get baz, which it then substitutes into the original string.

We'll mention it again here, that after the original variable $h{something} is seen in the original string, Perl then sets about evaluating 'something' in a non-string context. Perl needs a value for this 'something' before it can complete interpolation. In Ex 14 this 'something' happened to be another double-quoted string (a double-quoted string happens to be a valid expression in Perl), so it was evaluated and a second level of interpolation occurred. But this second interpolation had nothing to do with the first - it only occurred because Perl was looking for a value for 'something' and so evaluated 'something', which happened to be a string.

Just in case you don't understand what I mean by the second level of quotation and interpolation, let's look at another example which uses single quotes instead of double quotes.

Ex 15

print "$h{'$h{$s}'}";   # Output: Use of uninitialized value at - line 28.

If you run this code then you'll get the warning Use of uninitialized value at - line %d. This is because Perl did the following.

First it sees $h{something} and so evaluates the contents of 'something', which are '$h{$s}'. Now you'll remember me saying that this evaluation is done in a non-string context. Well, here it will be very obvious. The expression '$h{$s}' (note the single quotes, they're part of the expression) is not interpolated, so it literally has the value of the string $h{$s}, which Perl then attempts to look up in the %h hash. There is no such key, so the lookup yields an undefined value and we get a warning about using an uninitialised value.

This is in stark contrast to the previous example where there was a double-quoted string that was interpolated further.
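The contrast between the two kinds of inner quotes can be sketched as follows, using the %h and $s from above. The helper variables and the exists test are additions for illustration; exists is used so that the sketch runs without triggering the warning.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %h = (foo => 'bar', bar => 'baz');
my $s = 'foo';

# Double quotes inside the subscript: the inner string is interpolated first.
my $inner_double = "$h{$s}";
my $outer_double = "$h{\"$h{$s}\"}";

# Single quotes inside the subscript: the key is the literal text $h{$s}.
my $inner_single = '$h{$s}';
my $found = exists $h{'$h{$s}'} ? 'yes' : 'no';

print "$inner_double\n";   # bar
print "$outer_double\n";   # baz
print "$inner_single\n";   # $h{$s}
print "$found\n";          # no
```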

How tightly do '[' and '{' bind ?

You'll remember that when we wrote down T3 that we made the rather ugly exception concerning a dereference operator followed by either a '[' or a '{'. Here we're going to revisit that, and have a look at a few examples that will show you that this is in fact how Perl behaves.

The first examples that we're going to look at will not be using the dereference operator. We want to first see how tightly the '[' and '{' bind to a variable name. This is an important point to understand for anyone who thinks of '[' or '{' as an operator - they really are not treated as such - they're absent from the perlop manual for a reason.

Some of the examples below intentionally cause syntax errors, so we haven't listed them as a working program, just run them one at a time. With each example we're showing both the '[' version and the '{' version. All the examples assume the following simple declaration:

#!/usr/bin/perl -wl
# leave out the 'use strict' this time

my $a = 4;         # a simple scalar

Ex 16

print "$a+5";     # Output: 4+5

Nothing amazing about the results of this, as we've seen with T2 and T3 the + operator is not evaluated unless it's needed for a variable interpolation - here it clearly isn't so it's just part of the string.

Ex 17

print "$a[0]";      # Output: Use of uninitialized value at - line 2.
print "$a{name}";   # Output: Use of uninitialized value at - line 2.

What's going on here ? You'll get a warning (from the '-w') of Use of uninitialized value at - line %d. indicating that Perl's not using your $a but instead it's trying to evaluate the array element $a[0] (an element of @a) or the hash element $a{name} (an element of %a).

What does this mean ? It means that Perl binds very tightly to the '[' and '{' tokens. It doesn't do any checking of the symbol table, doesn't care that it skipped past the valid variable $a, it just sees the '[' or '{' and tries to evaluate the array or hash element.
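To make the binding visible rather than inferring it from a warning, here is a sketch in which an array and a hash named 'a' actually exist. @a and %a and their contents are hypothetical, declared only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $a = 4;                          # the scalar from the examples above
my @a = ('first element');          # hypothetical array sharing the name
my %a = (name => 'hash value');     # hypothetical hash sharing the name

my $scalar  = "$a";         # just the scalar
my $element = "$a[0]";      # element 0 of @a, not $a followed by '[0]'
my $entry   = "$a{name}";   # entry of %a, not $a followed by '{name}'

print "$scalar\n";    # 4
print "$element\n";   # first element
print "$entry\n";     # hash value
```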

Ex 18

print "$a[";      # Output: Missing right bracket at - line 2, within string
print "$a{";      # Output: Missing right bracket at - line 2, within string

You'd expect Perl at least to know what you're talking about here ! You haven't even included the full syntax for an array or hash element. But Perl is blind to this, all it sees is the '[' or '{', and it's then looking for the element. Believe it or not, you have to go to the lengths shown in Ex 20 to get the output we're after here.

Ex 19

print "$a->[";     # Output: Missing right bracket at - line 2, within string
print "$a->{";     # Output: Missing right bracket at - line 2, within string

Same deal with dereferenced array and hash elements. Basically if Perl sees that damn '[' or '{' directly after a variable name or a dereference arrow in the string then it jumps on it and begins evaluation, even when it has a perfectly good variable to work with.

Ex 20

print "$a" . "[";   # Output: 4[
print "$a" . "{";   # Output: 4{

Here Perl performs the interpolation before it evaluates the . (string concatenation) operator, so by the time Perl sees the '[' or the '{' it can't do anything with it, and finally we get our output.
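Concatenation is not the only way out. As a sketch (the backslash form is an addition here, not something from the original thread), escaping the bracket also stops Perl from treating it as the start of a subscript, because the character straight after the variable name is then a backslash rather than '[' or '{'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $a = 4;

# Concatenation, as in Ex 20: each string is interpolated on its own.
my $concat_bracket = "$a" . "[";
my $concat_brace   = "$a" . "{";

# Escaping: the variable name ends at the backslash, and '\[' is just '['.
my $escaped_bracket = "$a\[";
my $escaped_brace   = "$a\{";

print "$concat_bracket\n";    # 4[
print "$concat_brace\n";      # 4{
print "$escaped_bracket\n";   # 4[
print "$escaped_brace\n";     # 4{
```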

Ex 21

print "${a}[";      # 5.005_03 Output: 4[
                    # 5.6.0    Output: Name "main::a" used only once: possible typo at - line 3.
                    #                  Use of uninitialized value in concatenation (.) at - line 3.
                    #                  [
print "${a}{";      # 5.005_03 Output: 4{
                    # 5.6.0    Output: Name "main::a" used only once: possible typo at - line 3.
                    #                  Use of uninitialized value in concatenation (.) at - line 3.
                    #                  {

In 5.005_03 you can see that we had another way to tell perl that $a is its own variable and should not be confused with the '[' or '{'. But in 5.6.0 the braces seem to force perl to evaluate $a as a package variable (main::a) instead of our lexical $a, and hence there are two warnings and the value 4 is missing from the output.


That's all really, hope you understand a bit more about variable interpolation in strings.

Copyright © 1999 Jason King. All rights reserved.