<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>No Content, No Fuss: Category net</title>
    <link>http://anil.recoil.org/blog/articles/category/net</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Anil Madhavapeddy</description>
    <item>
      <title>Deens, welcome to the Internet!</title>
      <description>&lt;p&gt;Inspired by finishing my PhD corrections (!) today, I decided to hook the DNS server from our &lt;a href="http://melange.recoil.org/"&gt;Melange&lt;/a&gt; project up to the Internet.  The authoritative server is called &lt;a href="http://melange.recoil.org/trac/browser/apps/deens/"&gt;deens&lt;/a&gt; (since the co-author is one &lt;a href="http://www.tjd.phlegethon.org/"&gt;Tim Deegan&lt;/a&gt;, geddit?), and is written in pure &lt;a href="http://caml.inria.fr/"&gt;OCaml&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is all rather experimental, to put it mildly, but I stuck in the zone file below, hooked it up as a delegate to our main name-servers, checked it against the &lt;a href="http://www.dnsreport.com/tools/dnsreport.ch?domain=deens.recoil.org"&gt;DNS Report&lt;/a&gt;, and it all seems to be working!&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ORIGIN deens.recoil.org. ;
$TTL    240
deens.recoil.org. 604800 IN SOA  (
    deens.recoil.org. anil.recoil.org.
    2006122401 3600 1800 3024000 1800
)
        IN  NS     ns1.deens.recoil.org.
        IN  NS     deensns.recoil.org.
ns1     IN  A      194.70.3.132
dynamic IN  CNAME  dynamic.recoil.org.
static  IN  CNAME  static.recoil.org.
anil    IN  CNAME  dynamic
stats   IN  CNAME  dynamic
&lt;/code&gt;&lt;/pre&gt;
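
&lt;p&gt;For anyone unfamiliar with zone-file shorthand, here is a rough Python sketch of how the relative names in the zone above expand against $ORIGIN, and how a CNAME chain is followed until it leaves the zone.  This is not part of deens; the helper names &lt;em&gt;qualify&lt;/em&gt; and &lt;em&gt;cname_chain&lt;/em&gt; are made up for illustration, and only the CNAME records are modelled.&lt;/p&gt;

```python
ORIGIN = "deens.recoil.org."

# CNAME records copied from the zone file; owner names and relative
# targets are written exactly as they appear there.
CNAMES = {
    "dynamic": "dynamic.recoil.org.",
    "static": "static.recoil.org.",
    "anil": "dynamic",
    "stats": "dynamic",
}


def qualify(name, origin=ORIGIN):
    """Return the fully qualified form of a zone-file name."""
    if name.endswith("."):
        return name  # a trailing dot means the name is already absolute
    return name + "." + origin


def cname_chain(name, cnames=CNAMES):
    """Follow CNAMEs from `name` until the target is not in this zone."""
    absolute = {qualify(k): qualify(v) for k, v in cnames.items()}
    chain = [qualify(name)]
    # Stop when there is no further alias, or if an alias would loop.
    while chain[-1] in absolute and absolute[chain[-1]] not in chain:
        chain.append(absolute[chain[-1]])
    return chain
```

&lt;p&gt;Calling &lt;em&gt;cname_chain("stats")&lt;/em&gt; walks from stats.deens.recoil.org. to dynamic.deens.recoil.org. and finally out of the zone to dynamic.recoil.org., because the bare target "dynamic" has no trailing dot and so is relative to the origin.&lt;/p&gt;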

&lt;p&gt;I also modified &lt;a href="http://stats.recoil.org/"&gt;stats.recoil.org&lt;/a&gt; to be an alias to &lt;em&gt;stats.deens.recoil.org&lt;/em&gt;, so all the requests for that domain will go via the deens setup.  You actually need a user/pass to access the site, but that doesn't matter; if it gets that far, the DNS bit has worked.&lt;/p&gt;

&lt;p&gt;There's still an awful lot of tedious work to get the server into a production-ready state, such as proper logging, more error handling and recovery, etc., but I really hope to find the time in 2007 to polish this up somewhat.  Performance is excellent already; faster than &lt;a href="http://www.isc.org/bind/"&gt;BIND&lt;/a&gt; by quite a lot, and it can optionally use more memory to cache responses to shoot up to crazy levels.&lt;/p&gt;

&lt;p&gt;Incidentally, the &lt;a href="http://melange.recoil.org/trac/browser/apps/mldig/"&gt;dig replacement&lt;/a&gt; utility also seems to be working fairly well, and &lt;a href="http://dave.recoil.org/"&gt;David Scott&lt;/a&gt; has been messing around with a &lt;a href="http://www.apple.com/macosx/features/bonjour/"&gt;Bonjour&lt;/a&gt; implementation that will get finished sometime in 2007 as well (honest!).&lt;/p&gt;</description>
      <pubDate>Sat, 30 Dec 2006 01:11:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:fa0b4534-23b5-4166-b078-d29a1a43f3ca</guid>
      <author>anil@recoil.org (Anil Madhavapeddy)</author>
      <link>http://anil.recoil.org/blog/articles/2006/12/30/deens-welcome-to-the-internet</link>
      <category>research</category>
      <category>hacking</category>
      <category>net</category>
    </item>
    <item>
      <title>Google Webmaster tools</title>
      <description>&lt;p&gt;The conversion of the Recoil web services to external FastCGI pinned our &lt;a href="http://trac.edgewall.org/"&gt;Trac&lt;/a&gt; installation at &lt;a href="http://melange.recoil.org/"&gt;Melange&lt;/a&gt; as the source of the CPU hogging.  It turned out the Google crawler was indexing the entire source tree via Trac, causing it to go ballistic.&lt;/p&gt;

&lt;p&gt;I then stumbled on the latest cool Googlism: the &lt;a href="https://www.google.com/webmasters/tools/"&gt;Google Webmaster Tool&lt;/a&gt;, which lets you register your sites and displays options, diagnostics and statistics about how the Google crawler views your website.
I turned down the frequency at which Google hits the Trac installation (as well as installing a suitable &lt;a href="http://melange.recoil.org/robots.txt"&gt;robots.txt&lt;/a&gt; file).  This solved the immediate problem, but some of the search statistics were fun to check out as well.&lt;/p&gt;

&lt;p&gt;It turns out the &lt;a href="http://anil.recoil.org/gallery/"&gt;gallery&lt;/a&gt; is pretty highly ranked for &lt;a href="http://images.google.com/"&gt;image searches&lt;/a&gt;.  My trips to Japan seem to have made it big, with popular searches including "&lt;a href="http://images.google.com/images?q=shibuya&amp;amp;hl=en&amp;amp;ie=UTF-8&amp;amp;oe=UTF-8&amp;amp;sa=N&amp;amp;tab=wi"&gt;Shibuya&lt;/a&gt;", "&lt;a href="http://images.google.com/images?q=tokyo%20at%20night&amp;amp;hl=en&amp;amp;ie=UTF-8&amp;amp;oe=UTF-8&amp;amp;sa=N&amp;amp;tab=wi"&gt;tokyo at night&lt;/a&gt;", and "&lt;a href="http://images.google.com/images?q=japanese%20roof&amp;amp;hl=en&amp;amp;ie=UTF-8&amp;amp;oe=UTF-8&amp;amp;sa=N&amp;amp;tab=wi"&gt;japanese roof&lt;/a&gt;".  My random pictures of &lt;a href="http://images.google.com/images?q=buffalo%20india&amp;amp;hl=en&amp;amp;ie=UTF-8&amp;amp;oe=UTF-8&amp;amp;sa=N&amp;amp;tab=wi"&gt;Indian buffaloes&lt;/a&gt;, &lt;a href="http://images.google.com/images?svnum=10&amp;amp;hl=en&amp;amp;lr=&amp;amp;q=smoggy+skyline&amp;amp;btnG=Search"&gt;smoggy skylines&lt;/a&gt; and &lt;a href="http://images.google.com/images?q=fried%20ice%20cream&amp;amp;hl=en&amp;amp;ie=UTF-8&amp;amp;oe=UTF-8&amp;amp;sa=N&amp;amp;tab=wi"&gt;fried ice-cream&lt;/a&gt; seem especially popular as well.  It's a weird old Internet, eh?&lt;/p&gt;

&lt;p&gt;The gallery has fallen a bit by the wayside in recent months.  I'll update it when I get back to Cambridge!&lt;/p&gt;</description>
      <pubDate>Thu, 28 Dec 2006 15:04:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:ae998dee-78ec-4f05-b751-1ff800d3f880</guid>
      <author>anil@recoil.org (Anil Madhavapeddy)</author>
      <link>http://anil.recoil.org/blog/articles/2006/12/28/google-webmaster-tools</link>
      <category>travel</category>
      <category>recoil</category>
      <category>net</category>
    </item>
    <item>
      <title>Mercurial FastCGI module</title>
      <description>&lt;p&gt;Our &lt;a href="http://www.lighttpd.net"&gt;lighttpd&lt;/a&gt; setup has been very unstable in recent months, probably brought on by the load of the large &lt;a href="http://www.selenic.com/mercurial"&gt;Mercurial&lt;/a&gt; repositories &lt;a href="http://hg.recoil.org/"&gt;hosted&lt;/a&gt; on Recoil since the Google &lt;a href="http://code.google.com/soc/"&gt;Summer of Code&lt;/a&gt; mentoring.&lt;/p&gt;

&lt;p&gt;The source of the instability was really hard to track down, but it seems to be the automatic spawning of &lt;a href="http://www.fastcgi.org/"&gt;FastCGI&lt;/a&gt; processes by the web-server, and lighttpd failing to handle a &lt;a href="http://en.wikipedia.org/wiki/SIGCHLD"&gt;SIGCHLD&lt;/a&gt; somewhere when a child process crashes.  To sort this out, I just converted all the Ruby on Rails setups (this blog and &lt;a href="http://nick.recoil.org/"&gt;Nick's&lt;/a&gt;) to use an external spawn.&lt;/p&gt;

&lt;p&gt;This only leaves our Mercurial vhost &lt;a href="http://hg.recoil.org/"&gt;hg.recoil.org&lt;/a&gt; to switch to using FastCGI, and I couldn't find a module for this anywhere and so lashed up some Python glue to do the job.&lt;/p&gt;

&lt;p&gt;You can download the small distribution for Mercurial 0.9 (&lt;a href="http://anil.recoil.org/projects/hg-fcgi-0.9.tar.gz"&gt;hg-fcgi-0.9.tar.gz&lt;/a&gt;).  It has a FastCGI library written by someone else, the Python files to glue the Mercurial and FastCGI libraries together, and a simple rc script to launch the external web process.&lt;/p&gt;&lt;p&gt;Instructions are for lighttpd... install the Python files somewhere, modify them to point to the Mercurial directory, run the rc script to start the daemon, and then add something similar to the following to your lighttpd config file:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;fastcgi.server = (
  ".fcgi" =&amp;gt; ( "localhost" =&amp;gt;
    ( "socket" =&amp;gt; "/var/cache/fcgi/sites/hg.recoil.org/dirsock" )),
  ".hg" =&amp;gt; ( "localhost" =&amp;gt;
    ( "socket" =&amp;gt; "/var/cache/fcgi/sites/hg.recoil.org/sock" )),
)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Also add "index.fcgi" to &lt;em&gt;index-file.names&lt;/em&gt; in the config file, and touch it in the vhost directory to create an empty file (this is to avoid getting a 404 error and instead pass it through to the FastCGI process).  Similarly, touch a .hg file for every repository you want to serve.  You could do this differently by passing through a URL prefix and modifying the Python appropriately, but I prefer finer control over what we're serving.&lt;/p&gt;

&lt;p&gt;Hope this is useful; I won't bother submitting it back to the Mercurial list as it looks like the &lt;a href="http://www.selenic.com/hg/"&gt;official hg repo&lt;/a&gt; has a different code layout; I'll check it out later on when I have a bit more time and integrate properly.&lt;/p&gt;

&lt;p&gt;I have no idea whether or not this will actually improve our stability, but it's at least easier to move onto a different web-server now that everything is FastCGI.  All I need now is an OpenBSD/php5-fastcgi port, which doesn't seem to exist (yet).&lt;/p&gt;</description>
      <pubDate>Wed, 27 Dec 2006 20:59:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:c9de9f88-5d89-42ea-9430-6df6c7054109</guid>
      <author>anil@recoil.org (Anil Madhavapeddy)</author>
      <link>http://anil.recoil.org/blog/articles/2006/12/27/mercurial-fastcgi-module</link>
      <category>hacking</category>
      <category>net</category>
    </item>
    <item>
      <title>Looking at my Spam statistics</title>
      <description>&lt;p&gt;The switch to &lt;a href="http://smtpd.develooper.com/"&gt;qpsmtpd&lt;/a&gt; does seem to have reduced my spam intake somewhat, so out of curiosity I looked at the statistics from 2 years of &lt;a href="http://www.procmail.org/"&gt;procmail&lt;/a&gt; logs to see what's been happening in terms of filtering effectiveness.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://anil.recoil.org/blog/files/mailstats-dec2006.png" rel="lightbox" title="Ham/Spam stats for 2004-2006"&gt;&lt;img style="float:right" src="/blog/files/mailstats-dec2006-thumb.png" alt="mlgalleryedit" /&gt; &lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A quick &lt;a href="http://www.openbsd.org/cgi-bin/cvsweb.cgi/ports/mail/p5-Log-Procmail"&gt;import and bug-fix&lt;/a&gt; of &lt;a href="http://search.cpan.org/dist/Log-Procmail/"&gt;Log::Procmail&lt;/a&gt; into OpenBSD, and some lashed up Perl and gnuplot later, the graph on the right showed up.  The red and green are ham and spam respectively, as classified by &lt;a href="http://www.spamassassin.org/"&gt;SpamAssassin&lt;/a&gt;.&lt;/p&gt;
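
&lt;p&gt;The counting step can be sketched in a few lines of Python (the real thing used Perl and Log::Procmail, so treat this as an illustration only).  It assumes the usual procmail log layout of a From line followed by Subject and Folder lines, and a hypothetical folder called "spam" for anything SpamAssassin flagged; the function name &lt;em&gt;monthly_counts&lt;/em&gt; is also made up.&lt;/p&gt;

```python
from collections import Counter


def monthly_counts(log_lines, spam_folder="spam"):
    """Count delivered messages per (year, month, kind) from a procmail log.

    `kind` is "spam" when the delivery folder equals `spam_folder`
    (an assumed folder name), otherwise "ham".
    """
    counts = Counter()
    stamp = None  # (year, month) of the most recent From line
    for line in log_lines:
        if line.startswith("From "):
            # Expected fields: From, sender, weekday, month, day, time, year
            fields = line.split()
            stamp = (fields[-1], fields[3]) if len(fields) == 7 else None
        elif line.strip().startswith("Folder:") and stamp is not None:
            folder = line.split()[1]
            kind = "spam" if folder == spam_folder else "ham"
            counts[(stamp[0], stamp[1], kind)] += 1
    return counts
```

&lt;p&gt;The resulting counter can be written out one line per month and handed to gnuplot or any other plotting tool.&lt;/p&gt;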

&lt;p&gt;The large amount of ham in 2004 was not actually real mail, but mostly postmaster bounces from forged spam; I am currently forced to destroy all domain bounces without even reading them due to the sheer volume.  This is something that &lt;a href="http://www.openspf.org/"&gt;Sender Permitted From&lt;/a&gt; promises to help solve once we determine if any of our users send &lt;code&gt;@recoil.org&lt;/code&gt; mail from sources other than our mail server.&lt;/p&gt;

&lt;p&gt;Since the turn of this year the amount of spam has jumped, but more concerningly, SpamAssassin has been missing increasing amounts, and it's been flowing through straight to my Inbox (despite &lt;a href="http://wiki.apache.org/spamassassin/RuleUpdates"&gt;sa-update&lt;/a&gt; running daily).  I'm going to do these graphs again in a few months and see just how much the switch to the new paranoid SMTP has helped.&lt;/p&gt;</description>
      <pubDate>Wed, 27 Dec 2006 00:01:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:e79fc438-9464-4f07-9476-5b96fad27392</guid>
      <author>anil@recoil.org (Anil Madhavapeddy)</author>
      <link>http://anil.recoil.org/blog/articles/2006/12/27/looking-my-spam-statistics</link>
      <category>recoil</category>
      <category>net</category>
    </item>
    <item>
      <title>I balance, I weave, I dodge, I frolic, and my bills are all paid</title>
      <description>&lt;p&gt;
While assisting various graduating PhD students (lucky for some) with their resumes, I ran across the sublime urban legend that is Hugh Graham's "&lt;a href="http://www-users.cs.york.ac.uk/~susan/joke/essay.htm"&gt;College Essay&lt;/a&gt;".  I'm now inspired for future job applications!  Here is a taster of the first paragraph:
&lt;/p&gt;

&lt;table width="90%" align="center"&gt;
&lt;tr&gt;
&lt;td style="text-align:justify; font-size:90%"&gt;
I am a dynamic figure, often seen scaling walls and crushing ice. I have  been known to remodel train stations on my lunch breaks, making them more  efficient in the area of heat retention. I translate ethnic slurs for  Cuban refugees, I write award-winning operas, I manage time efficiently.  Occasionally, I tread water for three days in a row.
&lt;br /&gt;
&lt;i&gt;&lt;a href="http://www-users.cs.york.ac.uk/~susan/joke/essay.htm"&gt;Read on...&lt;/a&gt;&lt;/i&gt;
&lt;/td&gt;
&lt;/tr&gt;

&lt;/table&gt;</description>
      <pubDate>Fri, 08 Oct 2004 13:49:45 +0100</pubDate>
      <guid isPermaLink="false">urn:uuid:a9f7a7bd-5b67-4dba-b6d4-66a5b9b21052</guid>
      <author>avsm</author>
      <link>http://anil.recoil.org/blog/articles/2004/10/08/i-balance-i-weave-i-dodge-i-frolic-and-my-bills-are-all-paid</link>
      <category>humour</category>
      <category>net</category>
    </item>
    <item>
      <title>New clustering search engine</title>
      <description>&lt;p&gt;
Just stumbled across a beta of &lt;a href="http://www.clusty.com/"&gt;Clusty&lt;/a&gt;, which is a pretty good search engine in its own right (not as minimal as &lt;a href="http://google.com"&gt;google&lt;/a&gt;, but still usable).
The novel thing about Clusty is that it automatically clusters searches into groups to help narrow down the search.  So searching for &lt;a href="http://clusty.com/search?query=anil+madhavapeddy"&gt;my name&lt;/a&gt; brings up a bar on the left with categories such as "OpenBSD", "High Energy Magic", etc.  Not bad!
&lt;/p&gt;

&lt;p&gt;
It's a pity that I'm kind of locked into google now, just by virtue of the &lt;a href="http://www.apple.com/safari/"&gt;Safari&lt;/a&gt; toolbar not having an easy option to remap the search engine to use.  It does appear people have started &lt;a href="http://captnswing.net/howto/safari/"&gt;hacking Safari&lt;/a&gt; though, so perhaps a Clusty bar isn't too far off!
&lt;/p&gt;</description>
      <pubDate>Thu, 07 Oct 2004 18:14:05 +0100</pubDate>
      <guid isPermaLink="false">urn:uuid:dfe61247-dc7b-4e6c-9c6a-82057355b5b7</guid>
      <author>avsm</author>
      <link>http://anil.recoil.org/blog/articles/2004/10/07/new-clustering-search-engine</link>
      <category>net</category>
    </item>
    <item>
      <title>Playing with spammers</title>
      <description>&lt;p&gt;The amount of spam sent to Recoil accounts has dramatically sprung
up over the last few years, sending the machine loads skyrocketing
accordingly.  Luckily, we're running
&lt;a href="http://www.openbsd.org/"&gt;OpenBSD&lt;/a&gt;, which added a fun
tool called
&lt;a href="http://www.openbsd.org/cgi-bin/man.cgi?query=spamd"&gt;&lt;i&gt;spamd(8)&lt;/i&gt;&lt;/a&gt;
a couple of releases ago.
&lt;/p&gt;

&lt;p&gt;
It's activated by tracking IP addresses of known
spammers from blacklists like &lt;a href="http://www.spamhaus.org/"&gt;Spamhaus&lt;/a&gt;,
and redirecting them to the spam daemon via &lt;a href="http://www.openbsd.org/cgi-bin/man.cgi?query=pf"&gt;pf&lt;/a&gt; rules.  Once the mail reaches &lt;tt&gt;spamd&lt;/tt&gt;, it "tarpits"
it by dropping its TCP send and receive buffers to a very small value, 
encouraging the spammers and viruses to (slowly) send their malware on.  If they
ever do reach the end of their data, it then rejects it with a temporary
failure - costing the spammers more resources if they decide to retransmit it.
&lt;/p&gt;
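
&lt;p&gt;
The two tricks described above can be sketched in Python.  This is only an illustration of the idea and not how &lt;tt&gt;spamd&lt;/tt&gt; itself is written: the function names, the 128-byte buffer size and the exact 451 reply text are all assumptions.
&lt;/p&gt;

```python
import socket

TINY_BUFFER = 128  # bytes; an illustrative value, not spamd's setting


def shrink_buffers(conn, size=TINY_BUFFER):
    """Ask the kernel for very small send and receive buffers.

    With little buffer space the peer can only push data through in
    small pieces, so each delivery attempt ties it up for longer.
    The kernel may round the requested size up to its own minimum.
    """
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)


def temporary_failure(reason="temporary failure, please try again later"):
    """Build an SMTP reply in the 4xx (temporary failure) class."""
    return "451 " + reason + "\r\n"
```

&lt;p&gt;
A listener would call &lt;em&gt;shrink_buffers&lt;/em&gt; on each accepted connection from a blacklisted address, read whatever the sender trickles in, and answer the end of the message with &lt;em&gt;temporary_failure()&lt;/em&gt; so that a well-behaved sender queues the mail and retries.
&lt;/p&gt;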

&lt;p&gt;
The load has dropped quite a bit since I activated this filtering; it seems
to help against some of the latest worms quite a lot, which just
connect to port 25, spew off a buffer-overflow attempt, and repeat this
once every few seconds.  Since &lt;tt&gt;spamd&lt;/tt&gt;, things take a bit longer though!
&lt;/p&gt;

&lt;pre&gt;
quick spamd: 221.2.232.138: connected (9/9), lists: spamhaus
quick spamd: 221.2.232.138: disconnected after 431 seconds. lists: spamhaus
&lt;/pre&gt;

&lt;p&gt;
Very satisfying.  I did play with the &lt;a href="http://www.greylisting.org/"&gt;greylisting&lt;/a&gt; mode of &lt;tt&gt;spamd&lt;/tt&gt; as well, but it wasn't quite as successful, since some valid mail sites such as &lt;a href="http://www.edas.info/"&gt;EDAS&lt;/a&gt; (bless its underwhelming soul) take five days to send conference paper rejections into a greylisted system.  Public whitelists do &lt;a href="http://greylisting.org/whitelisting.shtml"&gt;exist&lt;/a&gt;, but I think I'll wait a while and see if things mature a little more first.
&lt;/p&gt;</description>
      <pubDate>Thu, 29 Jul 2004 09:53:48 +0100</pubDate>
      <guid isPermaLink="false">urn:uuid:17e93bbf-37c3-4628-97c6-efba183457db</guid>
      <author>avsm</author>
      <link>http://anil.recoil.org/blog/articles/2004/07/29/playing-with-spammers</link>
      <category>hacking</category>
      <category>net</category>
    </item>
    <item>
      <title>Friendster</title>
      <description>&lt;p&gt;In what could be a silly move, I stumbled across the &lt;a href="http://www.friendster.com"&gt;Friendster&lt;/a&gt; web-site, and joined up.  It works on the famous &lt;a href="http://www.cs.virginia.edu/oracle/"&gt;six degrees of separation&lt;/a&gt; principle, which means that it takes very few hops to know someone else in the world.
&lt;br /&gt;
Imagine my amusement when it turns out that &lt;a href="http://diary.recoil.org/nick/"&gt;Nick&lt;/a&gt; is already a member ... I should have guessed :-)  If you join up, do invite me - I'm really curious to see how this network works out.  They seem to be "good guys" and promise not to use the information they collect for nefarious purposes.&lt;/p&gt;</description>
      <pubDate>Fri, 08 Aug 2003 00:22:00 +0100</pubDate>
      <guid isPermaLink="false">urn:uuid:1aedfbe6-2f8b-4e2c-89b0-b79c04b07157</guid>
      <author>avsm</author>
      <link>http://anil.recoil.org/blog/articles/2003/08/08/friendster</link>
      <category>net</category>
    </item>
  </channel>
</rss>
