<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">
<channel>
<title>Comments on: Checking if a remote file exists in python</title>
<atom:link href="http://timbroder.com/2009/09/checking-if-remote-file-exists-in.html/feed" rel="self" type="application/rss+xml" />
<link>http://timbroder.com/2009/09/checking-if-remote-file-exists-in.html</link>
<description>code. comics. crossfit.</description>
<lastBuildDate>Thu, 11 Aug 2011 19:51:00 +0000</lastBuildDate>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<generator>http://wordpress.org/?v=3.3</generator>
<item>
<title>By: woxidu</title>
<link>http://timbroder.com/2009/09/checking-if-remote-file-exists-in.html/comment-page-1#comment-20</link>
<dc:creator>woxidu</dc:creator>
<pubDate>Thu, 01 Oct 2009 00:55:54 +0000</pubDate>
<guid isPermaLink="false">http://beta.timbroder.com/2009/09/30/checking-if-a-remote-file-exists-in-python/#comment-20</guid>
<description>I noticed that this got the wrong answer for some okcupid URLs that I threw at it. I don&#039;t think OKWS prints out &quot;ERROR 404&quot; for its 404 pages, but it does put &quot;HTTP/1.0 404&quot; in its response headers. I think it might work better if you disregard whatever&#039;s coming out of stdout/stderr and just go by wget&#039;s return code. I&#039;m pretty sure wget will return with 0 for all pages it fetches successfully and a nonzero code otherwise.&lt;br /&gt;&lt;br /&gt;While you&#039;re disregarding the output, you don&#039;t need Popen or PIPE or communicate or stdin/stderr or whatever. We can just use the os.system call for this (though I admit it&#039;s kind of nice having the command components in a python list) and save a bunch of trouble. One key is using wget&#039;s &#039;-q&#039; option which makes it stop outputting anything (even on error). Here&#039;s a simpler version I put together:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;from os import system&lt;br /&gt;def url_exists(url):&lt;br /&gt;    command = &quot;wget -qS --spider %s&quot; % url&lt;br /&gt;    res = system(command)&lt;br /&gt;    return not res # &quot;Success&quot; is backwards for processes&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Cheers!&lt;br /&gt;Eli&lt;br /&gt;&lt;br /&gt;&lt;i&gt;PS, does blogger support code/pre blocks in comments?&lt;/i&gt;</description>
<content:encoded>
<![CDATA[<p>I noticed that this got the wrong answer for some okcupid URLs that I threw at it. I don&#39;t think OKWS prints out &quot;ERROR 404&quot; for its 404 pages, but it does put &quot;HTTP/1.0 404&quot; in its response headers. I think it might work better if you disregard whatever&#39;s coming out of stdout/stderr and just go by wget&#39;s return code. I&#39;m pretty sure wget will return with 0 for all pages it fetches successfully and a nonzero code otherwise.</p>
<p>While you&#39;re disregarding the output, you don&#39;t need Popen or PIPE or communicate or stdin/stderr or whatever. We can just use the os.system call for this (though I admit it&#39;s kind of nice having the command components in a python list) and save a bunch of trouble. One key is using wget&#39;s &#39;-q&#39; option which makes it stop outputting anything (even on error). Here&#39;s a simpler version I put together:</p>
<p><b><br />from os import system<br />def url_exists(url):<br />    command = &quot;wget -qS --spider %s&quot; % url<br />    res = system(command)<br />    return not res # &quot;Success&quot; is backwards for processes<br /></b></p>
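<p>A minimal sketch of the same return-code idea, keeping the command components in a Python list so the URL never passes through a shell (with os.system, a URL containing an ampersand or a space would be split or interpreted by the shell). It assumes wget is installed and on the PATH. The <code>build_command</code> helper and the <code>run</code> parameter are hypothetical names added here for illustration, so the logic can be exercised without a network.</p>

```python
import subprocess


def build_command(url):
    """Return the wget invocation as an argument list (no shell involved)."""
    # -q silences all output; --spider checks the URL without downloading it.
    return ["wget", "-q", "--spider", url]


def url_exists(url, run=subprocess.call):
    """Return True if wget exits with status 0 for the given URL.

    `run` is a hypothetical injection point: any callable that takes the
    argument list and returns an integer exit status.
    """
    return run(build_command(url)) == 0
```

<p>Calling <code>url_exists("http://example.com/")</code> returns True when wget exits with 0 and False for any other exit status. If wget is missing, <code>subprocess.call</code> raises an OSError rather than returning a status.</p>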
<p>Cheers!<br />Eli</p>
<p><i>PS, does blogger support code/pre blocks in comments?</i></p>
]]>
</content:encoded>
</item>
</channel>
</rss>
