5

I have a regular expression in my script which I wrote years ago.

I know what this regex does (looking for percentages higher than 80%), but I don't remember its meaning/principles. I see the ternary operator is used, and also last closed parenthesis match, but what I don't know for example is the meaning of ??:

qr/^(\d+)%$(??{$^N>= 80 ? '':'^'})/

Can anybody explain this regex for me?

  • and that's a rather tricky feature, btw. Why not if ( (/^([0-9]+)\s*%/)[0] > 80 ) ... – zdim Apr 17 at 19:10
  • @zdim nice one. Can't remember why I used embedded code. I will try your solution also. – taiko Apr 17 at 19:57
7

(??{code}) executes the code and then uses the result as a regular expression, which is matched at that point as if it had been written there. (The $ in front of it in your pattern is just the ordinary end-of-string anchor, not part of the construct.) Within the code, $^N holds whatever was matched by the most recently closed capture group, in this case (\d+).

So if the string consists of a number followed by %, ^(\d+)%$ matches that. It then executes $^N >= 80 ? '' : '^', with $^N holding the number. If the number is at least 80, the regexp is effectively ^(\d+)%$, and the whole match succeeds. But if the number is less than 80, it is effectively ^(\d+)%$^. Since the second ^ can't match at the end of the string, the regular expression no longer matches.

So this regular expression matches strings that consist of a percentage of at least 80.
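
To check this behaviour yourself, here is a minimal sketch that runs the pattern from the question against a few strings. The sample inputs are made up for illustration, and the variable name $at_least_80 is mine, not from the original script.

```perl
use strict;
use warnings;

# The pattern from the question (only spacing added inside the code block).
my $at_least_80 = qr/^(\d+)%$(??{ $^N >= 80 ? '' : '^' })/;

# Hypothetical inputs chosen to exercise both branches of the ternary,
# plus two strings that fail before the code block is ever reached.
for my $s ('85%', '80%', '79%', '5%', '85% full', 'n/a') {
    printf "%-10s %s\n", "'$s'", ($s =~ $at_least_80 ? 'match' : 'no match');
}
```

Only '85%' and '80%' should be reported as matches. '85% full' fails because of the $ anchor, before the embedded code gets a chance to run.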

  • fantastic, thank you very much. If you will ever write a book about programming, sign me in. I will pre-order :) – taiko Apr 17 at 19:48
10

Before I answer, I'd like to point out that use of the embedded Perl code form (??{...}) can be fraught with errors if you do not fully understand its implications. I've written Perl regexen for more than twenty years, and my natural tendency is to always code a "use case" like this as a filter on the result of the regex rather than embed Perl code directly into the regex. You have been warned.
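
For comparison, here is a rough sketch of what the "filter on the result" style could look like. The helper name percent_at_least and the sample values are hypothetical, not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $text is a whole-number percentage
# such as "85%" and the number is at least $min.
sub percent_at_least {
    my ($text, $min) = @_;
    if ($text =~ /^(\d+)%$/) {      # the regex only checks the shape
        return $1 >= $min ? 1 : 0;  # the threshold test is ordinary Perl
    }
    return 0;
}

# Made-up sample values.
for my $s ('85%', '80%', '79%', 'n/a') {
    printf "%-6s %s\n", $s, (percent_at_least($s, 80) ? 'yes' : 'no');
}
```

The threshold is now a plain argument, so changing it does not mean touching the pattern at all.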

Ok, let's pick this regex apart:

^           # start of text

(           # begin capture group
  \d+         # one or more digits 0-9
)           # end of capture group

%           # literal percent sign character

$           # end of text

(??{        # start embedded perl code

  $^N >= 80   # if last closed match group($^N) is greater than or equal to 80
    ? ''        # then return empty pattern ('') 
    : '^'       # else return start of text (^) pattern

})          # end embedded perl code

where $^N holds the text matched by the most recently closed capture group, and the (??{ ... }) construct executes the Perl code it wraps, then compiles the value it returns as a regex and matches it at that point in the original pattern.

So, in this example, we match one or more digits followed immediately by a percent sign character and the end of the text. Then, if the captured value is greater than or equal to 80, we evaluate an empty pattern against the text (effectively allowing the overall pattern to match, returning the captured value); if not, we evaluate the ^ (start of text) pattern, which cannot match at the end of the string, so the overall match fails and nothing is returned.
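
If the "most recently closed" wording around $^N is unclear, a small experiment may help. This is only a sketch: the strings and the variable names $siblings and $nested are invented, and it uses (?{ ... }), which runs code without inserting a pattern, purely to record the value.

```perl
use strict;
use warnings;

# Package variables, so the embedded code blocks can assign to them.
our ($siblings, $nested);

# Two groups side by side: (b) is the group that closed last.
'ab' =~ /(a)(b)(?{ $siblings = $^N })/;

# Nested groups: the inner (a) closes first, the outer group closes last,
# so the outer group is the one $^N reports.
'ab' =~ /((a)b)(?{ $nested = $^N })/;

print "siblings: $siblings\n";
print "nested:   $nested\n";
```

In the pattern from the question there is only one capture group, so $^N and $1 refer to the same text there.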

(Note that by adding the /x modifier to your Perl regex, you can embed comments directly in the pattern; whitespace in the pattern is then ignored as well. I've found this to be a great way to document complicated regexen.)
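
As a sketch of that /x style, the breakdown above could be written as a working pattern along these lines. The variable name $commented is mine. Two things to watch in comments inside a pattern: avoid the delimiter character (here /), and avoid $ or @, which Perl may try to interpolate.

```perl
use strict;
use warnings;

my $commented = qr/
    ^               # start of text
    (\d+)           # capture one or more digits
    %               # literal percent sign
    $               # end of text
    (??{            # embedded Perl code, its return value is used as a pattern
        $^N >= 80   # compare the text of the group closed most recently
            ? ''    # empty pattern, always matches
            : '^'   # start-of-text anchor, cannot match here
    })
/x;

# Made-up sample values.
for my $s ('92%', '41%') {
    print "$s: ", ($s =~ $commented ? 'match' : 'no match'), "\n";
}
```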

  • Oh and it should be noted that the capture group will match values greater than 100%, which is probably not what is intended. Change the capture group to 100|\d{1,2} to match values from 0 to 100. – Rob Raisch Apr 17 at 20:18
  • very good point. However in this case value can't be greater than 100%, as we are talking about DISK USAGE – taiko Apr 17 at 20:38
  • Sure, but the regex you want to understand will allow values greater than 100. – Rob Raisch Apr 17 at 20:40
  • true true, good to remember bigger picture :) – taiko Apr 19 at 11:33
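
For anyone who does want the 0 to 100 restriction suggested in the comments, here is a minimal sketch of the original pattern with the capture group swapped for 100|\d{1,2}. The variable name $bounded and the sample values are assumptions for illustration.

```perl
use strict;
use warnings;

# Same embedded-code check as in the question, but the capture group
# now accepts only 100 or a one- or two-digit number.
my $bounded = qr/^(100|\d{1,2})%$(??{ $^N >= 80 ? '' : '^' })/;

# Made-up sample values, including one above 100.
for my $s ('100%', '85%', '79%', '120%') {
    print "$s: ", ($s =~ $bounded ? 'match' : 'no match'), "\n";
}
```

With this version '120%' is rejected by the capture group itself, before the embedded code is consulted.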
