Ad Zapping With Squid

Cameron Simpson <cs@zip.com.au>
SourceForge Project Page and mailing lists, Freshmeat Project Record, Changelog
Quick installs: Debian, FreeBSD, Gentoo, NetBSD.
Translations: Belorussian by Patricia Clausnitzer at FatCow, Hindi by Outshine Solutions.

This is a redirector for squid that intercepts advertising (banners, popup windows, flash animations, etc), page counters and some web bugs (as found). This has both aesthetic and bandwidth benefits. It's also easy to install. Note: you can use Apache instead of Squid if you like.

License

As of 13oct2002, this code is available under the terms of the BSD License.

It remains strongly my desire that this code is not used for censorship. By this, I mean that while it could be adapted to block all sorts of content, I wish that it not be so used without parallel provision of unblocked browsing. Feel free to protect yourself from stuff with this code; do not force your blinkered view onto others without their consent. There are instructions in this page for easy provision of zapped and unzapped browsing to users; please use them.

Previous licensing:
Until Wednesday 26may1999, this code was free for use by all. However, the Australian Government brought in some truly stupid and invasive legislation, so this code is now free except that it MAY NOT be used to enforce or support that legislation or other legislation of similar intent. I'm happy for people to use it to filter their own browsing, but not for people to force their morals onto others.

Background

For some time at my workplace we've been running an ad-zapping service on our web proxy. This page documents how it works, how to use it yourself, how to join the mailing list for updates of the pattern file, and the weirdnesses of our local setup (which you need not duplicate yourself).

Ad zapping is not a new idea. Basically you interpose between the reader and the web some kind of filter which replaces those annoying ad banners with something unobtrusive. (There are a few motivations for this; see this digression for mine.)

I first came across it at my ISP (Zip World - www.zip.com.au) a few years ago. Their technique was to use a complicated proxy.pac file. They supplied two: one which zapped ads and one which didn't. The zapping one was, I discovered, a piece of JavaScript which told your browser to go to one proxy for URLs matching known ad patterns and to the main proxy for everything else. The former proxy simply returned a placeholder GIF for everything asked of it. Initially I copied this for use at our site.

This method is a bit cumbersome. Firstly, you have to run a special web server to serve the placeholder GIF. Secondly, JavaScript interpreters are slow and (in Netscape at least) a tad buggy - eventually the browser gets flakey and may fall over. Thirdly, not all browsers support JavaScript and those that do needn't support proxy.pac files. Finally, the file was a pain to maintain and the size was making me fear for the sanity of the JavaScript interpreters.

Enter squid, arguably the best web proxy around. One great feature is the redirector. This is a program which reads request information on its input and writes (possibly) redirected information on its output. If activated, squid will consult it for every request, permitting easy interception of ads. All you have to do to activate it is place the line:

redirect_program /home/marshall/bin/squid_redirect
in your squid.conf file. Obviously, that pathname should be replaced by wherever you install the redirection program.

Attempt number 1 was a shell script. Short and effective, it was a simple while loop with a case statement. However, it seemed to have some scaling problems. Now it is a perl script called squid_redirect. In particular, because the expressions are compiled when the script starts, the redirector runs quite efficiently.
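
As a rough illustration of the redirector interface, here is a toy shell redirector in the spirit of that first attempt: a while loop around a case statement. It is a sketch only, not the real squid_redirect. The two URL patterns are made up for the example, and it assumes squid hands the redirector one request per line with the URL as the first word and expects either a replacement URL or a blank line (meaning "leave it alone") in reply.

```shell
#!/bin/sh
# Toy redirector sketch; NOT the real squid_redirect.
# Assumption: one request per input line, URL first; the reply is a
# replacement URL, or a blank line to leave the request unchanged.

placeholder=http://adzapper.sourceforge.net/zaps/ad.gif

toy_redirect() {
    while read -r url rest
    do
        case "$url" in
            # Hypothetical ad patterns, for illustration only.
            http://ads.example.com/* | */banners/*.gif)
                printf '%s\n' "$placeholder" ;;
            *)
                printf '\n' ;;
        esac
    done
}

# Demonstration: feed two fake requests through the loop.
printf '%s\n' \
    'http://ads.example.com/big.gif 10.0.0.1/- - GET' \
    'http://www.example.org/index.html 10.0.0.1/- - GET' |
toy_redirect
```

The first request comes back as the placeholder URL and the second as a blank line. A script like this spends its time walking the case patterns for every request, which is one plausible reason a long pattern list scales poorly in shell.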

Installation

The install is meant to be fairly easy: install the script, add one line to your squid.conf file, restart squid.

Microsoft Windows users should read the notes for Windows users below.
Smoothwall Firewall users may want to see Martin Pot's Smoothwall Ad Zap Installation Instructions.
There's also a less wordy quick'n'dirty installation kit here by Gaute Lund, with this readme file.

  1. Install squid.
    Frankly, this is worth doing even for a single user home system (squid's very easy to install, btw). Also, the ad-zapper is very useful when you're connected to the outside world with a modem link.

    Note: you can also use Apache instead.

    Note (minor security remark): of course your proxy (squid or apache) should not be available to the internet at large. Generally your proxy will be ok automatically, simply by being inside your firewall. However, if you install a proxy on some public machine you should make sure it has some sort of access control. If you're installing on a personal machine such as a laptop that is sometimes on a public net, probably your proxy should listen only on the local interface (127.0.0.1).
  2. Fetch the software.
    The easy thing to do is simply to fetch just the script for the default uncustomised install.
    Later, if you want to customise its behaviour, fetch this tarball: adzap-20110915.tar.gz which contains the redirector, a set of the replacement images and a wrapper script for customising the environment for the zapper.
    FreeBSD people: you can use the FreeBSD FRESHports port of the zapper.
    NetBSD people: you can use the NetBSD package of the zapper.
    Debian people: you can use the Debian package of the zapper.
    Gentoo people: you can fetch the zapper with emerge:
    emerge adzapper
  3. Install the redirector in some suitable spot (such as /usr/local/bin/squid_redirect).

    Note 1: The script must be executable. Run the command:

    chmod a+rx the-script
    when it's in place.

    Note 2: the first line of the script says:

    #!/usr/bin/perl
    You may want to change this to:
    #!/usr/local/bin/perl
    or suchlike if your perl isn't in /usr/bin. (Or put a symlink in /usr/bin - this may save you hassle with other perl scripts, many of which also expect a /usr/bin/perl.)

    Note 3: If you used a Windows box to fetch the script (eg via Internet Explorer) and then transferred it to the machine running your squid proxy then it's possible for the script to end up on your proxy in DOS text mode, which means it ends every line with a CR and a NL character (instead of just NL).
    If you suspect this, see this troubleshooting section.

  4. Insert the line:
    redirect_program /path/to/squid_redirect
    into the squid.conf file.
  5. Send a SIGHUP to your squid:
    kill -1 pid-of-squid
    You should also do this after you've updated the script, so that squid starts new instances of the redirector.
    Brent J. Nordquist <bjn@visi.com> notes that you can also say:
    squid -k reconfigure
    to do the same thing.
  6. Want to use a different placeholder image?
    Want to zap more than just ads?
  7. Help me keep the patterns up to date!
    Just keep half an eye on the zapping.
    If you find a page with an annoying amount of unzapped ads, let me know by email. Yes, that is my personal email address. No, do not worry that your message may be thought annoying. Also do not worry if I don't respond; sometimes I can be very slow on that; prod me again after a few weeks if you hear nothing.
    I will want to know the page itself as well as the ad image so I can sanity check and perhaps optimise or generalise the pattern. (No, I don't care where you browse; fear no censure!) I am more interested in zapping large or animated ads; static, small, cache-friendly ads are lower on the priority queue (and perhaps we should consider leaving them alone, to encourage their use). In particular, certain small, purely text, fast loading ads are not zapped by default; patterns for them are collected and maintained and the zapper can be told to zap them quite easily.
    If you find pages with content being zapped which should not be, also let me know by email.
    Just as with the above. Ad zapping is inherently a moving target. Some patterns will match things which are not ads. If people are going to use this facility, I must keep the patterns well tuned.
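
Steps 3 to 5 above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name install_zapper is made up, the paths in the usage comment (including /etc/squid/squid.conf) are examples rather than anything the zapper requires, and the function deliberately leaves the final "squid -k reconfigure" to you.

```shell
#!/bin/sh
# Sketch of install steps 3 to 5. install_zapper is a hypothetical
# helper; every path is passed in as a parameter.

# install_zapper SCRIPT DEST SQUID_CONF
install_zapper() {
    script=$1
    dest=$2
    conf=$3

    # Step 3: put the script in place and make it executable.
    cp "$script" "$dest" || return 1
    chmod a+rx "$dest" || return 1

    # Warn if the interpreter named on the #! line is missing.
    interp=$(sed -n '1s/^#! *\([^ ]*\).*/\1/p' "$dest")
    if [ -n "$interp" ] && [ ! -x "$interp" ]
    then
        echo "warning: $interp not found; edit line 1 of $dest" >&2
    fi

    # Step 4: add the redirect_program line unless one is already there.
    if ! grep -q '^redirect_program[[:space:]]' "$conf"
    then
        printf 'redirect_program %s\n' "$dest" >>"$conf"
    fi
    return 0
}

# Example, followed by step 5 (tell squid to reread its config):
#   install_zapper ./squid_redirect /usr/local/bin/squid_redirect /etc/squid/squid.conf
#   squid -k reconfigure
```

Running it a second time is harmless: the script is copied again but the config line is not duplicated.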

Notes for Windows users

It is possible to run squid and the zapper on a stand alone Windows box if your home LAN doesn't have a spare machine to run UNIX. An exchange with Carolyn Longfoot shows that the procedure is pretty well exactly the same as for UNIX except that the redirect_program line should look like this:
redirect_program C:/perl/bin/perl.exe c:/squid/etc/adzapper.pl
adjusting C:/perl/bin/perl.exe and c:/squid/etc/adzapper.pl to match your own install locations. This tip was obtained from the BannerFilter page. You will need SquidNT and ActivePerl or other versions of Squid and Perl for Windows; for example, you might run both under Cygwin. It is also mentioned in this thread from the squid-users mailing list.

Using Apache as your proxy instead of Squid

Johannes Berg supplied a small patch to support using Apache2 as a proxy instead of squid. His addition to the Debian README for this says:
Alternatively, you can also use adzapper with Apache2. This has the advantage of being IPv6 compatible. To do this, make Apache2 load mod_proxy and mod_rewrite and configure it as follows:

        ProxyRequests On
        RewriteEngine On
        RewriteLock /var/lock/apache2/rewrite-adzapper
        RewriteMap adzap prg:/usr/bin/adzapper.wrapper
        <Proxy *>
                Order deny,allow
                Deny from all
                Allow from localhost
                RewriteRule ^proxy:(.*)$ proxy:${adzap:$1|$1} [L]
        </Proxy>
Also, edit the new "ZAP_CHANGE_VALUE" configuration variable and set it to NULL:
ZAP_CHANGE_VALUE="NULL"

Customisation

If you start customising things I suggest you install the wrapzap script next to the redirector and use it to effect the customisations. It contains all the environment variables such as $ZAP_POSTMATCH that affect the zapper's behaviour, ready for adjustment.

Simply tell wrapzap the full install path of squid_redirect and tell the squid.conf file the full path of the wrapzap script instead of the zapper. Then modify wrapzap to suit. Remember that all scripts should have public read/execute permissions:

chmod a+rx scripts...

Using Different and Extra Pattern Files

You can use your own pattern files, too. Extra pattern files are specified by setting the $ZAP_PREMATCH and $ZAP_POSTMATCH environment variables to the full pathnames of two pattern files. Normally you would only need to set $ZAP_POSTMATCH.

The patterns in $ZAP_PREMATCH are consulted before the main pattern list and the patterns in $ZAP_POSTMATCH afterwards. Generally you use the latter to add extra patterns and only use the former to correct overzapping by some erroneous patterns in the main pattern file. If you find such, tell me! That way your $ZAP_PREMATCH file can usually be empty and stay that way.

Finally, you can have squid_redirect ignore its inbuilt pattern list completely and use your own by defining the environment variable $ZAP_MATCH.
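
To make the above concrete, here is a minimal sketch of a wrapper in the style of wrapzap that sets these variables. It is not the real wrapzap script; every pathname in it is hypothetical, and it assumes $ZAP_MATCH takes a pattern file pathname in the same way as the other two variables.

```shell
#!/bin/sh
# Sketch of a wrapper in the style of wrapzap; NOT the real script.
# All pathnames below are hypothetical.

zapper=/usr/local/bin/squid_redirect

# Extra patterns, consulted after the main pattern list.
ZAP_POSTMATCH=/usr/local/etc/adzap/postmatch
export ZAP_POSTMATCH

# Corrections for overzapping, consulted before the main list.
# Usually left unset.
# ZAP_PREMATCH=/usr/local/etc/adzap/prematch
# export ZAP_PREMATCH

# Replace the inbuilt pattern list entirely (rarely wanted).
# Assumption: this is a pathname like the two variables above.
# ZAP_MATCH=/usr/local/etc/adzap/match
# export ZAP_MATCH

# Dry run: show what the zapper would inherit.
env | grep '^ZAP_' | sort

# A real wrapper would finish by replacing itself with the zapper:
# exec "$zapper"
```

Keeping your additions in separate files like this means a fresh copy of squid_redirect can be dropped in without losing them.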

Pattern File Format

The syntax of the pattern file is as follows:

Using Different Placeholder Images

The default placeholder GIF is: http://adzapper.sourceforge.net/zaps/ad.gif This will actually work fine (once cached it's irrelevant that it's not on your site). However, if you wish a customised placeholder you can do a few things to control what is used. Most involve the setting of environment variables to indicate your desires.

The $ZAP_MODE variable can be set to the word "CLEAR" to cause the zapper to use "clear" versions of the replacement images and text. This will mean the ads just "vanish" from your pages. The only real downside to this is that if the zapper, through some mischance, replaces some useful markup on the page then it's not very apparent.

The $ZAP_BASE variable can be set to point to a web directory containing your own versions of the replacement images. Place files named ad.gif, adbg.gif, ad.swf, closepopup.html, counter.gif, no-op.html, no-op.js, and webbug.gif there. If you're using the "CLEAR" mode then you need files named x-clear.ext for every file x.ext listed above.

The default for $ZAP_BASE is http://adzapper.sourceforge.net/zaps. If you set the $ZAP_MODE variable to "CLEAR" then you will naturally want files named ad-clear.gif, closepopup-clear.html, no-op-clear.html, etcetera.
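
The x.ext to x-clear.ext naming rule is easy to get wrong by hand, so here is a small sketch that prints the file names a custom $ZAP_BASE directory would need in "CLEAR" mode. The helper name clear_name is made up for this example; the file list is the one given above.

```shell
#!/bin/sh
# clear_name FILE: print the "CLEAR" mode name for a replacement
# file, following the x.ext -> x-clear.ext rule.
# (clear_name is a hypothetical helper, not part of the zapper.)
clear_name() {
    printf '%s-clear.%s\n' "${1%.*}" "${1##*.}"
}

# List every file a custom $ZAP_BASE directory needs in CLEAR mode.
for f in ad.gif adbg.gif ad.swf closepopup.html counter.gif \
         no-op.html no-op.js webbug.gif
do
    clear_name "$f"
done
```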

You can replace classes of ad with specific replacements. The following classes are known: AD for inlined images, ADHTML for separate HTML pages inserted as an ad (usually via FRAME, IFRAME or ILAYER tags), ADJS for javascript programs used to generate ads, ADBG for background images containing ads, ADSWF for ads implemented as Shockwave animations, ADMP3 for ads implemented as MP3 audio, ADPOPUP for those mega-annoying ads which pop up on their own as new web pages, COUNTER for inlined visitor count images and WEBBUG for web bugs. Each of these words matches the keyword on the start of the lines in the configuration file. To control each you would set the variable $STUBURL_class to the URL of the specific replacement for that class.

For example, setting

STUBURL_AD=http://adzapper.sourceforge.net/zaps/ad-clear.gif
would cause the inlined images to be the "clear" version while leaving the other classes as normal. That ad-clear.gif is a transparent single pixel GIF donated by David Finster <dfinster@airmail.net>. Another image you might like is http://adzapper.sourceforge.net/zaps/ad-grey.gif, from Andrew Dalgleish <andrewd@axonet.com.au>, which is a low contrast replacement image which lets you see what's zapped without it standing out so much.

Zapping Things Other Than Ads

The default behaviour of the zapper is to zap ads only (the AD*, COUNTER and WEBBUG classes). However, I desire that it can be used to zap other animated annoyances like flashing "NEW!" icons and glowing line images used in place of the venerable <HR> horizontal rule markup. Accordingly, the pattern list contains patterns for more than just ads. By default, these extra patterns are ignored. To cause the zapper to start using a particular class of pattern, set the environment variable STUBURL_class to a suitable URL in the wrapzap script.

Zapping small text ads

There are currently two ad classes, ADHTMLTEXT and ADJSTEXT, that are supported but not active by default. They are for small, pure text, fast loading inline ads; these are the grey area where advertising (often the main revenue source for free sites) is present but as unobtrusive as is possible. Therefore, as shipped, the zapper does not zap them. However, by editing the wrapzap script to set STUBURL_ADHTMLTEXT and STUBURL_ADJSTEXT to the URLs used for STUBURL_ADHTML and STUBURL_ADJS these classes will be enabled.
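
A sketch of what that wrapzap edit might look like follows. The values given to STUBURL_ADHTML and STUBURL_ADJS here are assumptions made for the example (built from the no-op.html and no-op.js file names listed earlier); substitute whatever URLs your own setup uses for those two classes.

```shell
#!/bin/sh
# Sketch of a wrapzap edit enabling the two text-ad classes.
# ASSUMPTION: the STUBURL_ADHTML and STUBURL_ADJS fallbacks below
# are placeholders for illustration; use your own URLs.
ZAP_BASE=${ZAP_BASE:-http://adzapper.sourceforge.net/zaps}
STUBURL_ADHTML=${STUBURL_ADHTML:-$ZAP_BASE/no-op.html}
STUBURL_ADJS=${STUBURL_ADJS:-$ZAP_BASE/no-op.js}

# Reuse the same replacements for the small text-ad classes.
STUBURL_ADHTMLTEXT=$STUBURL_ADHTML
STUBURL_ADJSTEXT=$STUBURL_ADJS
export STUBURL_ADHTML STUBURL_ADJS STUBURL_ADHTMLTEXT STUBURL_ADJSTEXT
```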

Rewriting URLs

You can also use the rewrite facility to get the printer-friendly version of some pages. As with the extra pattern classes, the PRINT class is also off by default. To activate it, just set STUBURL_PRINT to "1" in wrapzap. You're free to add your own rewrite classes (or, of course, extend PRINT). These classes too need their STUBURL_* variables set and exported in wrapzap to turn them on.

Chaining Redirector Programs

[ People running on very small systems, such as a low end system running something like LEAF, should also see Andrew Liebeskind's nifty Adzap2Squirm script which translates the zapper patterns for use with Squirm. ]

Chris Lightfoot <chris@ex-parrot.com> wrote asking if I could make the zapper friendly to setups where people chain multiple redirection programs together (for example, to run both the ad zapper and another tool like SquidGuard). Then Adam Hope <a.hope@csl.gov.uk> wrote to say that they were chaining to another redirector which wanted the full 4 word input a redirector may expect.

The specification for the redirectors says unredirected URLs should be indicated with a blank line, which is no good for piping the output of one into the next. Accordingly, to chain redirectors a wrapper program is needed to pass URLs to each redirector in turn.

To chain redirectors:

  1. Fetch the scripts wrapzap and zapchain. Install them where you installed the squid_redirect script. Remember that all scripts should have public read/execute permissions:
    chmod a+rx scripts...
  2. As stated in the section on customising the zapper, the main purpose of wrapzap is to tune the behaviour of the zapper. However, it is also the hook for chaining things.
    1. Adjust the setting for zapper near the top of the script.
    2. Change the last lines of wrapzap from:
      exec "$zapper"
      # exec /path/to/zapchain "$zapper" /path/to/another/eg/squirm
      to:
      # exec "$zapper"
      exec /path/to/zapchain "$zapper" /path/to/another/eg/squirm
      and adjust the pathnames to suit. You may name as many different redirectors as you like, not just two.
  3. Change the squid config line:
    redirect_program /path/to/squid_redirect
    to be:
    redirect_program /path/to/wrapzap
This causes squid to run wrapzap, wrapzap to run zapchain and zapchain to run the various redirectors correctly.
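
The idea behind chaining can be sketched in a few lines of shell: hand the request to each redirector in turn, and treat a blank reply as "unchanged" so the next redirector still sees a URL. This sketch makes no claim about how zapchain itself is written. The names chain_request, zap_ads and to_https are invented for the example, and it starts each redirector afresh for every request, which is simple but slow.

```shell
#!/bin/sh
# Sketch of the chaining idea; NOT the real zapchain.

# chain_request "REQUEST LINE" redirector...
# Prints the final URL if any redirector changed it, else a blank line.
chain_request() {
    line=$1
    shift
    url=${line%% *}          # first word: the URL
    rest=${line#"$url"}      # remaining words, passed along untouched
    changed=

    for redir in "$@"
    do
        reply=$(printf '%s%s\n' "$url" "$rest" | "$redir")
        reply=${reply%% *}
        if [ -n "$reply" ] && [ "$reply" != "$url" ]
        then
            url=$reply
            changed=1
        fi
    done

    if [ -n "$changed" ]
    then
        printf '%s\n' "$url"
    else
        printf '\n'
    fi
}

# Two stand-in redirectors (hypothetical), for demonstration only.
zap_ads() {
    read -r u junk
    case $u in
        */ads/*) echo http://adzapper.sourceforge.net/zaps/ad.gif ;;
        *)       echo ;;
    esac
}
to_https() {
    read -r u junk
    case $u in
        http://secure.example.com/*) echo "https://${u#http://}" ;;
        *)                           echo ;;
    esac
}

chain_request 'http://secure.example.com/login 10.0.0.1/- - GET' zap_ads to_https
```

In the demonstration the first stand-in leaves the URL alone and the second rewrites it, so the chain as a whole reports the rewritten URL.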

Updates

Updates are normally as simple as fetching a new version of the script. Simply copy it over the existing script and restart your squid server. The command "squid -k reconfigure" will do this, as will sending a SIGHUP to the squid.
Note: if you keep your own set of extra patterns, see the customisation section - in particular the section on extra pattern files - for how to use the wrapzap script to keep these additions separate, so as not to be overwritten by the script update.
You have several choices about keeping up to date with the patterns (and the matching squid_redirect).

Using the zapper in proxy.pac files

[ Also see: Can I get my ISP to do this for me?, below. ]

If you have to support more than a few users, you may want to use a proxy.pac file. This is a file containing a JavaScript function used by a browser to decide which proxy to use (if any) on a per-URL basis. This is often known as "automatic proxy configuration", as all you tell the browser configuration is the URL of the proxy.pac file. Once you've set this up for each of your users, you can then control things by editing the central file. Both Netscape and Internet Explorer support proxy.pac files.

Can I get my ISP to do this for me?

If you petition them, maybe. The setup at their end is pretty easy. However, they may refuse. For example, ZipWorld no longer support the zapping service themselves; instead I now supply this service for Zip people who wish it. (Essentially, their legal people have raised the spectre of zapping somehow being construed as a kind of copyright violation. Personally I think that's daft; it's no different to browsing in text mode with lynx or with image loading off in a graphical browser).

Thus it may become a case of "do it yourself". However, at least in my case, ZipWorld were happy enough to up my disc limit a bit, let me run the zapper all the time (even when not logged in), and automate a monthly post to a local newsgroup to tell people about the zapper. Very cool!

Something to bear in mind if you implement this for an ISP (or anywhere where the zapper isn't behind a firewall): to avoid having their site hammered, ZipWorld asked me to limit access to the zapper at Zip to the list of IP address ranges that they own. To this end the ranges are in a file and the squid config for the zapper there says:

acl zipworldIP src "/home/cs/rc/squid/ip-ranges@zip"
acl zipworldDNS srcdomain zipworld.com.au zipworld.net.au zip.com.au zip.net.au zipworld.net pacific.net.au
I also customised the ERR_ACCESS_DENIED page that squid returns for unauthorised access.

Sites with the Zapper already installed

ZipWorld
People at Zip can simply use the prepackaged .pac file URL:
http://adzapper.sourceforge.net/rc/proxy-zip.pac
Users of other ISPs can contact me for details on how I set this up.

Here are a few example .pac files which I've set up for various sites. Each would require some customisation for your own site.

Why run an ad zapper?

There are a few reasons one might do this:

Other Similar Software

Mine is hardly the only alternative you have in this line. Google maintains a useful index. Other tools include:
Squid: Related Software
A listing of interesting software related to squid, including a few other redirectors.
Junkbuster
Josh Marshall <MarshallJ@switch.aust.com> briefly compares them:
Similarities:
Both filter out those annoying advertisement pages that waste time and bandwidth, meaning money (we're paying for that!). Both use a list of sites and regular expressions to eliminate these advertisements. Both redirect the image to a default, smaller image.

That's where the similarities end.

Differences:
Ad Zapper integrates much more nicely into squid. It is started from within squid (as many processes as you like) and is basically a URL redirector based on regular expressions that are contained inside the script.
Junkbuster runs as a separate daemon, and you have to use it as a hierarchical cache, with junkbuster as either the parent or child. I found having it as the parent (they document how to set it up as a child in the docs) to be the superior configuration. All fetches from an external web page must be redirected through junkbuster - which is quite slow compared to squid. Also the double handling makes for a slower transaction.

Ad Zapper zaps ads - that's it. Junkbuster also can filter out cookies and web pages (like those annoying small ones that advertise the free web pages the site is from). I have found junkbuster to be a little too constrictive. It can also do web anonymity and return wafers instead of cookies for you with "leave me alone" privacy messages in them for the web administrators.

My recommendation is this: If you want tight security then go for junkbuster. You're sacrificing some speed and some pages which simply won't load anymore since the pattern matching tries too hard. If you want performance without ads, go for Ad Zapper (you can even specify your own image, which you can't do with junkbuster).

I've noticed that the recent squid release (2.2STABLE4 as I type this) has anonymising facilities, so you can perhaps use those in conjunction with squid_redirect to get what you want.
Craig Sanders' <cas@taz.net.au> squid-redir tool.
Quite similar in intent and implementation to my own.
Cut The Crap
AtGuard
WebWasher
adzapper
(No, not my ad zapper; this one is by Adam Feuer, and coded in Python.)
SleezeBall
Another squid based redirector.
Squirm.
A general squid redirector which can be used for whatever purpose. It doesn't seem to come with prepackaged patterns for common purposes, and uses pure regexps as opposed to the more shell-like regexps I use (which are transliterated into real regexps).
pyredir by Don Baarda <abo@minkirri.apana.org.au>.
This is a Python based redirector with flexibility in mind, coded because Squirm (above) lacked some features. Also interesting is that he has added the ability to read my pattern files, so if you desire to keep the zapping while using pyredir you can do so trivially. (Note that if you go this way then bugfixes for missing or overzapped ads should still come to me - pyredir should pick up the changes as I make them, I think.)
Proxomitron
This is for Win32 systems (Win95, 98, ME, etc). It does more than ad zapping.
BannerFilter
Another ad filter redirector for squid. Like AdZapper, this can run under UNIX and Windows (in fact, the instructions for getting AdZapper running under Windows came from bannerFilter's home page:-).
http://www.redhatbox.org/squid/squid-bannerfilter.html [page dead?]
Squid-Bannerfilter mini-HOWTO
David Hill's instructions for setting up a transparent squid proxy with an ad zapper (happens to be mine, but any other redirector can readily be used). It was motivated by Telstra BigPond Cable's recent bandwidth caps.
Yet Another Filter Proxy
A proxy to filter out advertising banners and malicious script code from web sites by Andreas Gohr.
BannerFilter
Yet another redirector.
Privoxy
Privoxy is a web proxy with advanced filtering capabilities, based on Internet Junkbuster ™.
Proxomitron
A filtering/editing web proxy.
BFilter
jart's HTML-parsing heuristic ad filter

Offering your users a choice of zapped and unzapped browsing

The purpose of this is to permit un-zapped access to the web for those few who want it (marketing types, as it happens:-).

Using two ports on a single squid

Aidas Kasparas <kaspar@lifosa.com> pointed me at squid's redirector_access facility. To use this you make squid listen on two ports like this:
http_port 8080 8081
Then you say that only accesses to one of the ports use the redirector:
acl nobannerport myport 8080
redirector_access allow nobannerport
That way people using port 8080 will get the zapping service and people using port 8081 will get the raw, uglified web.

My double-layer squid setup

At work we run a double layer squid setup. One day I will replace it with the two port method above, but I'll describe it here anyway.

We have a double squid cache (once on the same machine, now on separate machines). The usual proxy for users is:

proxy:8080
which has no cache and the URL redirector in its config:
redirect_program /opt/UCSDsquid/bin/squid_redirect
This lives off the main, non-redirecting cache at:
proxy-raw:8080
which has a big cache. The proxy.pac file users use points them at:
PROXY proxy-noads:8080; PROXY proxy-raw:8080
and the proxy-raw.pac (which shows ads) says:
PROXY proxy-raw:8080
The CNAMEs proxy-noads and proxy-raw point at the zapping and nonzapping squids, respectively. The CNAME proxy points at the same machine proxy-noads does. That way the naive and memorable setup gets a zapped view of the web. If your site policy is different you can just point proxy at the nonzapping machine and publicise the zapper as an optional service.
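
For readers who have not seen one, a proxy.pac file of this kind is tiny. The sketch below generates the two variants from shell, using the proxy names from the setup above. FindProxyForURL is the standard entry point a browser calls in a proxy.pac file; the helper name write_pac is made up, and real files for a site would likely carry extra logic (such as bypassing the proxy for local hosts) that is not shown here.

```shell
#!/bin/sh
# write_pac PROXY_LIST: print a minimal proxy.pac that sends every
# request to the given proxy list.
# (write_pac is a hypothetical helper for this example.)
write_pac() {
    cat <<EOF
function FindProxyForURL(url, host) {
    return "$1";
}
EOF
}

# The zapping variant; it falls back to the raw proxy if the
# zapping one does not answer.
write_pac 'PROXY proxy-noads:8080; PROXY proxy-raw:8080'

# To produce both files:
#   write_pac 'PROXY proxy-noads:8080; PROXY proxy-raw:8080' >proxy.pac
#   write_pac 'PROXY proxy-raw:8080' >proxy-raw.pac
```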

Troubleshooting

This might be as unhelpful as Microsoft's online help, but hopefully not.

Basic checks:

  1. Make sure your squid proxy is working normally without the ad-zapper line in the config file.
  2. Make sure the squid_redirect script has public read and execute permission. Remember that all scripts should have public read/execute permissions:
    chmod a+rx scripts...
  3. Make sure the squid_redirect script is not in DOS text mode (if you fetched it from a Windows machine); see Note 3 under step 3 of the install steps for a fix for this.
  4. Examine your cache.log file for error messages from squid or the ad zapper.
Still stumped?
Here is a basic, untested, quick and dirty howto for setting this up from scratch if you haven't got squid running and have never used squid. Please attempt a normal squid install using their instructions (which come with the source) first! You should only need this if things fail obscurely and you're at a loss. It's just a sequence of things to do. Here goes:
Planning:
Find out your ISP's proxy server and port. It's traditional that the server is called proxy.your.isp.domain and that it listens on port 8080. If that's documented by your ISP's web pages, well and good. If you have to guess, try connecting to it:
telnet proxy.your.isp.domain 8080
If you don't get a connection, try port 3128 instead of 8080.
If you get a complaint that the hostname is unknown, you'll have to consult your ISP.
If you get a connection, check that it's actually a web proxy. Type:
GET http://www.zip.com.au/~cs/ HTTP/1.0
and press return twice. You should get an HTTP response (code 200 hopefully), some header lines, then some HTML. If you don't then that's not your ISP's proxy service, and you must contact them to find out the correct details.
Basic Sanity Checks:
Ensure your browser works with no proxies at all set up.
Ensure your browser works with its proxy setup to talk to your ISP's proxy service.
Squid:
Fetch the latest squid (2.2STABLE4 as I type this), build and install.
Edit the squid.conf file by walking through it from beginning to end in an editor, adjusting it to suit your host. In particular: Run "squid -z" to initialise your cache.
Run the squid startup script to set squid running.
Working?:
On your squid host, run the command:
netstat -an | grep -i listen
to check that squid (presumably) is listening on port 8080 on your machine.
As with your ISP's proxy, you should now test your proxy. Run the command:
telnet localhost 8080
to check, and issue the same GET command you used above to fetch a web page.
Test new squid:
Set your browser config to use the local machine (well, your squid host, which needn't be the same machine as where your browser runs), port 8080 as its proxy.
Ad zapping:
Add the ad-zapper line to the squid.conf, restart the squid server and test again.
Not working? Maybe the script came via a DOS or Windows box and is in DOS text mode?
This usually shows up as failure (by squid) to run the script, so first check your script is usable by running it by hand:
the-script </dev/null
That should do nothing, with no complaints. If this is greeted with messages like:
the-script: exec failed: No such file or directory
then you may have spurious CR characters in there. You can verify this with the command:
sed 1q the-script | od -c
which will print:
0000000 # ! / u s r / b i n / p e r l \n
0000020
for a good script and:
0000000 # ! / u s r / b i n / p e r l \r
0000020 \n
0000021
for a bad script (note that extra \r, which is a carriage return (CR)). These can be deleted with the tr command, viz:
tr -d '\015' <the-script >the-script.fixed
mv the-script.fixed the-script
chmod a+rx the-script
which makes a new copy without the CRs, replaces the original with the new one, and then restores the execute permission (the new copy is not created executable). The dos2unix(1) command can also be used for this task, if available.
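
The check and the fix can be wrapped up as two small shell functions. This is a sketch with invented names (has_cr and strip_cr); it uses the same od and tr commands as above, but writes the cleaned text back with cat rather than mv so that the script keeps its existing permissions.

```shell
#!/bin/sh
# has_cr FILE: succeed if the first line of FILE contains a CR.
# (has_cr and strip_cr are hypothetical helper names.)
has_cr() {
    sed 1q "$1" | od -c | grep -q '\\r'
}

# strip_cr FILE: delete every CR from FILE in place. Writing back
# with cat (rather than mv) keeps the file's permissions, so the
# script stays executable.
strip_cr() {
    tmp=$1.fixed.$$
    tr -d '\015' <"$1" >"$tmp" && cat "$tmp" >"$1"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example:
#   has_cr the-script && strip_cr the-script
```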