想想寻正的文章挺多的,删起来费事。将下面的脚本保存为deletejunks.pl, 假设新语丝的Web根目录是/var/www/localhost/htdocs/,那么输入
./deletejunks.pl /var/www/localhost/htdocs/ 西风独自凉 
列出所有西风的文章的前面几行接着是文章的完整路径,这时候可以检查检查,看看是不是有错误。
要删除的时候,输入:
./deletejunks.pl /var/www/localhost/htdocs/ 西风独自凉 2>deletelist && xargs -a deletelist rm
注意locale的设置,我的机器上设置的是zh_CN.gb2312
#!/usr/bin/perl
use encoding 'gb2312' , STDIN => 'gb2312', STDOUT => 'gb2312';
sub delete_by_author($) {
  my($txtfile) = @_;
  my($is_author) = 0;
  my($nonblank_line_number) = 1;
  open ARTICLE, "<:encoding(gb2312)", $txtfile;
  do {
      $line = <ARTICLE>;
      if ($line !~ /^\s+$/) {
        print $line;
        $nonblank_line_number ++;
        if ($line =~ /$ARGV[1]$/) {
          print {STDERR} $txtfile,"\n";
        }
      }
  ## search the top 4 non-blank lines in the article
  } while $nonblank_line_number < 5;
  close ARTICLE;
}
sub recurse($) {
  my($path) = @_;
  ## append a trailing / if it's not there
  $path .= '/' if($path !~ /\/$/);
  for my $eachFile ( glob($path.'*') ) {
    ## if the file is a directory
    if( -d $eachFile) {
      ## pass the directory to the routine ( recursion )
      recurse($eachFile);
    } else {
      #delete_by_author($eachFile) if($eachFile ~ /\.txt$/);
      if($eachFile =~ /\.txt$/) { delete_by_author($eachFile); }
    }
  }
}
if (! $ARGV[1]) {die "Usage: deletejunks.pl BaseDirectory Author \n";}
## initial call ... $ARGV[0] is the first command line argument
recurse($ARGV[0]);