Checking if a remote file exists in Python

Normally, to check whether a remote web file exists I would call getcode() on the object returned by urllib.urlopen(), but that method was only added in Python 2.6. In Python 2.5 it's a little more interesting. Thankfully, wget's --spider option can help us out.

from subprocess import Popen, PIPE

def url_exists(url):
    # -S prints the server's response headers and --spider checks the URL
    # without downloading it; wget writes both to stderr.
    command = ["wget", "-S", "--spider", url]
    p = Popen(command, stdout=PIPE, stderr=PIPE)
    stdout, stderr = p.communicate()
    # Treat the page as missing if wget reported "ERROR 404".
    return 'ERROR 404' not in stderr
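For readers on a newer Python, here is a minimal sketch of the getcode() route mentioned at the top, written for Python 3 (an assumption; the post itself targets 2.5). It sends a HEAD request with urllib.request so nothing is downloaded, and reads response.status, the Python 3 spelling of getcode(). One difference matters: Python 2's urllib.urlopen returns a response whose getcode() is 404 for a missing page, while urllib.request raises HTTPError instead, so the sketch catches that exception. The timeout parameter is a hypothetical addition, and the check assumes the server answers HEAD requests (some reply 405 to HEAD even when GET would work).

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=10):
    """Return True if a HEAD request for url gets a successful answer.

    Assumes the server supports HEAD; some servers reply 405 to HEAD even
    when a GET for the same URL would succeed.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so .status is the final response's code.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except urllib.error.HTTPError:
        # urllib.request raises HTTPError for 4xx/5xx answers such as 404
        # rather than returning a response with that status code.
        return False
```

Connection failures still raise urllib.error.URLError, which keeps "the file is missing" separate from "the server could not be reached"; whether you want that distinction is a design choice.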
  • woxidu

    I noticed that this got the wrong answer for some OkCupid URLs that I threw at it. I don't think OKWS prints out "ERROR 404" for its 404 pages, but it does put "HTTP/1.0 404" in its response headers. I think it might work better to disregard whatever comes out of stdout/stderr and go by wget's return code instead: wget exits with 0 for pages it fetches successfully and with a nonzero code otherwise (newer versions use 8 when the server answers with an error such as a 404).

    While you're disregarding the output, you don't need Popen, PIPE, communicate() or any stdout/stderr handling at all. We can just use os.system for this (though I admit it's kind of nice having the command components in a Python list) and save a bunch of trouble. One key is wget's -q option, which makes it stop printing anything, even on error. Here's a simpler version I put together:


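    from os import system

    def url_exists(url):
        # -q silences wget and --spider checks the URL without saving it.
        # os.system runs the command through a shell, so a URL containing
        # characters such as & or ; would need quoting first.
        command = "wget -qS --spider %s" % url
        res = system(command)
        return not res  # "Success" is backwards for processes: 0 means it worked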

    Cheers!
    Eli

    PS: Does Blogger support code/pre blocks in comments?
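Following the comment's advice to trust wget's exit code, here is a minimal sketch that keeps the command as a list (as the original post does) but uses subprocess.call, which also exists in Python 2.5, so the URL never passes through a shell. The wget parameter is a hypothetical addition for pointing at a different binary. For the OKWS case, where the "HTTP/1.0 404" header is the only sign of a missing page, it also includes a hypothetical final_status helper that reads the status lines wget -S prints, assuming the usual -S output where each response's status line starts its indented header block.

```python
import re
import subprocess

# wget -S (--server-response) echoes each response's status line, indented,
# on stderr, e.g. "  HTTP/1.0 404 Not Found".
STATUS_LINE = re.compile(r"^\s*HTTP/\d(?:\.\d)?\s+(\d{3})", re.MULTILINE)


def url_exists(url, wget="wget"):
    """Return True if `wget --spider` exits with status 0 for url.

    The command is a list, so the URL is never seen by a shell and
    characters such as & or ; need no quoting.
    """
    # -q silences wget and --spider checks the URL without saving a file.
    # wget exits with 0 on success and nonzero otherwise (newer versions
    # use 8 when the server answers with an error status such as 404).
    return subprocess.call([wget, "-q", "--spider", url]) == 0


def final_status(wget_stderr):
    """Return the last HTTP status code in `wget -S` output, or None.

    wget prints one status line per response, so after a redirect the
    last one belongs to the page that was finally reached.
    """
    codes = STATUS_LINE.findall(wget_stderr)
    return int(codes[-1]) if codes else None
```

To see the actual code, run wget with -S --spider (without -q), capture stderr as text (in Python 3.7+, subprocess.run(..., capture_output=True, text=True).stderr), and pass it to final_status.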
