11

Is there a way to get the line number (and maybe filename) where a __DATA__ token was coded? Or some other way to know the actual line number in the original source file where a line of data read from the DATA filehandle came from?

Note that $. counts from 1 when reading from the DATA filehandle. So if the line number of the __DATA__ token were added to $. it would be what I'm looking for.

For example:

#!/usr/bin/perl
while (<DATA>) {
  my $n = $. + WHAT??;
  die "Invalid data at line $n\n" if /bad/;
}

__DATA__
something good
something bad

I want this to say "Invalid data at line 9", not "line 2" (which is what you get if $. is used by itself).

  • 1
    Just to clarify: I don't necessarily have control of the file containing the _DATA_ so any tricks which use special code in that file are not feasible. The information must be obtainable from wherever the filehandle is being read, which may be in another module (e.g. reading from filehandle otherpackagename::DATA). – jimav Apr 22 at 2:54
2

Perl keeps track of the file and line at which each symbol is created. A symbol is normally created when the parser/compiler first encounters it. But if __DATA__ is encountered before DATA is otherwise created, this will create the symbol. We can take advantage of this to set the line number associated with the file handle in DATA.

For the case where the Package::DATA handle is not used in Package.pm itself, the line number of the __DATA__ token could be obtained via B::GV->LINE on the DATA handle:

$ cat Foo.pm
package Foo;

1;
__DATA__
good
bad
$ perl -I. -MFoo -MB -e '
   my $ln = B::svref_2object(\*Foo::DATA)->LINE;
   warn "__DATA__ at line $ln\n";
   Foo::DATA->input_line_number($ln);
   while(<Foo::DATA>){ die "no good" unless /good/ }
'
__DATA__ at line 4
no good at -e line 1, <DATA> line 6.

In the case where the DATA handle is referenced in the file itself, a possible kludge would be to use an @INC hook:

$ cat DH.pm
package DH;

unshift @INC, sub {
        my ($sub, $fname) = @_;
        for(@INC){
                if(open my $fh, '<', my $fpath = "$_/$fname"){
                        $INC{$fname} = $fpath;
                        return \'', $fh, sub {
                                our (%ln, %pos);
                                if($_){ $pos{$fname} += length; ++$ln{$fname} }
                        }
                }
        }
};
$ cat Bar.pm
package Bar;

print while <DATA>;

1;
__DATA__
good
bad
$ perl -I. -MDH -MBar -e '
    my $fn = "Bar.pm";
    warn "__DATA__ at line $DH::ln{$fn} pos $DH::pos{$fn}\n";
    seek Bar::DATA, $DH::pos{$fn}, 0;
    Bar::DATA->input_line_number($DH::ln{$fn});
    while (<Bar::DATA>){ die "no good" unless /good/ }
'
good
bad
__DATA__ at line 6 pos 47
no good at -e line 6, <DATA> line 8.

Just for the sake of completion, in the case where you do have control over the file, all could be easily done with:

print "$.: $_" while <DATA>;
BEGIN { our $ln = __LINE__ + 1; DATA->input_line_number($ln) }
__DATA__
...

You can also use the first B::GV solution, provided that you reference the DATA handle via an eval:

use B;
my ($ln, $data) = eval q{B::svref_2object(\*DATA)->LINE, \*DATA}; die [email protected] if [email protected];
$data->input_line_number($ln);
print "$.: $_" while <$data>;
__DATA__
...

None of these solutions assumes that the source file are seekable (except if you want to read the DATA more than once, as I did in the second example), or try to reparse your files, etc.

  • perl will set the gp_line field of a GV (the line num which appears in error messages and such) to the first line which brings it into existence -- and if you don't use the eval in the last snippet, the line will be set to 2 instead of 5. – mosvy Apr 23 at 13:08
  • For most people, the "all could be easily done with" solution is probably sufficient, and the best solution presented. Pity it's hidden. (It doesn't appear to be sufficient for the OP, however.) – ikegami Apr 23 at 13:56
7

In systems that support /proc/<pid> virtual filesystems (e.g., Linux), you can do:

# find the file where <DATA> handle is read from
my $DATA_FILE = readlink("/proc/$$/fd/" . fileno(*DATA));

# find the line where DATA begins
open my $THIS, "<", $DATA_FILE;
my @THIS = <$THIS>;
my ($DATA_LINE) = grep { $THIS[$_] =~ /^__DATA__\b/ } 0 .. $#THIS;
  • 1) Don't need to reopen; just seek to 0. – ikegami Apr 22 at 12:34
  • 2) Note that this approach can give you a false positive (e.g. in the extremely rare even that \n__DATA__\n exists in a mutli-line string literal). – ikegami Apr 22 at 12:35
3

File don't actually have lines; they're just sequences of bytes. The OS doesn't even offer the capability of getting a line from a file, so it has no concept of line numbers.

Perl, on the other hand, does keep track of a line number for each handle. It is accessed via $..

However, the Perl handle DATA is created from a file descriptor that's already been moved to the start of the data —it's the file descriptor that Perl itself uses to load and parse the file— so there's no record of how many lines have already been read. So the line 1 of DATA is the first line after __DATA__.

To correct the line count, one must seek back to the start of the file, and read it line by line until the file handle is back at the same position it started.

#!/usr/bin/perl
use strict;
use warnings qw( all );

use Fcntl qw( SEEK_SET );

# Determines the line number at the current file position without using «$.».
# Corrects the value of «$.» and returns the line number.
# Sets «$.» to «1» and returns «undef» if unable to determine the line number.
# The handle is left pointing to the same position as when this was called, or this dies.
sub fix_line_number {
   my ($fh) = @_;
   ( my $initial_pos = tell($fh) ) >= 0
      or return undef;
   seek($fh, 0, SEEK_SET)
      or return undef;

   $. = 1;
   while (<$fh>) {
      ( my $pos = tell($fh) ) >= 0
         or last;

      if ($pos >= $initial_pos) {
         if ($pos > $initial_pos) {
            seek($fh, $initial_pos, SEEK_SET) 
               or die("Can't reset handle: $!\n");
         }

         return $.;
      }
   }

   seek($fh, $initial_pos, SEEK_SET)
      or die("Can't reset handle: $!\n");

   $. = 1;
   return undef;
}

my $prefix = fix_line_number(\*DATA) ? "" : "+";

while (<DATA>) {
   printf "%s:%s: %s", __FILE__, "$prefix$.", $_;
}

__DATA__
foo
bar
baz

Output:

$ ./a.pl
./a.pl:48: foo
./a.pl:49: bar
./a.pl:50: baz

$ perl <( cat a.pl )
/dev/fd/63:+1: foo
/dev/fd/63:+2: bar
/dev/fd/63:+3: baz
0

Comparing the end of the file to itself in reverse might do what you want:

#!/usr/bin/perl
open my $f, "<", $0;
my @lines;
my @dataLines;
push @lines ,$_ while <$f>;
close $f;
push @dataLines, $_ while <DATA>;

my @revLines= reverse @lines;
my @revDataLines=reverse @dataLines;
my [email protected];
my $offset=0;

$offset++ while ($revLines[$offset] eq $revDataLines[$offset]);
$count-=$offset;

print "__DATA__ section is at line $count\n";

__DATA__
Hello there
"Some other __DATA__
lkjasdlkjasdfklj
ljkasdf

Running give a output of :

__DATA__ section is at line 19

The above script reads itself (using $0 for file name) into the @lines array and reads the DATA file into the @dataLines array.

The arrays are reversed and then compared element by element until they are different. The number of lines are tracked in $offset and this is subtracted from the $count variable which is the number of lines in the file.

The result is the line number the DATA section starts at. Hope that helps.

0

Thank you @mosvy for the clever and general idea.

Below is a consolidated solution which works anywhere. It uses a symbolic reference instead of eval to avoid mentioning "DATA" at compile time, but otherwise uses the same ideas as mosvy.

The important point is that code in a package containing __DATA__ must not refer to the DATA symbol by name so that that symbol won't be created until the compiler sees the __DATA__ token. The way to avoid mentioning DATA is to use a filehandle ref created at run-time.

# Get the DATA filehandle for a package (default: the caller's), 
# fixed so that "$." provides the actual line number in the 
# original source file where the last-read line of data came
# from, rather than counting from 1.
#
# In scalar context, returns the fixed filehandle.
# In list context, returns ($fh, $filename)
#
# For this to work, a package containing __DATA__ must not 
# explicitly refer to the DATA symbol by name, so that the 
# DATA symbol (glob) will not yet be created when the compiler 
# encounters the __DATA__ token.
#
# Therefore, use the filehandle ref returned by this 
# function instead of DATA!
#
sub get_DATA_fh(;$) {
  my $pkg = $_[0] // caller;

  # Using a symbolic reference to avoid mentioning "DATA" at
  # compile time, in case we are reading our own module's __DATA__
  my $fh = do{ no strict 'refs'; *{"${pkg}::DATA"} };

  use B;
  $fh->input_line_number( B::svref_2object(\$fh)->LINE );

  wantarray ? ($fh, B::svref_2object(\$fh)->FILE) : $fh
}

Usage examples:

my $fh = get_DATA_fh;  # read my own __DATA__
while (<$fh>) { print "$. : $_"; }

or

my ($fh,$fname) = get_DATA_fh("Otherpackage");
while (<$fh>) {  
    print " $fname line $. : $_";
}

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

Not the answer you're looking for? Browse other questions tagged or ask your own question.