Python: function memoization

First a little question. What do you expect the following (Python 2) code to return?

def contrived(n, existing=[]):
    existing.append(n)
    return existing

print contrived(2)
print contrived(3)

Maybe try it and see? If the answer of “[2]”, followed by “[2, 3]” wasn’t what you had in mind then you need to understand a bit more about how Python handles default variables. Once you understand that initialisation of default variables is done once, at the time that the function is defined, things should make more sense.

The next question is when might this behaviour be useful? Well, how about function memoization? The idea is essentially keeping track of the results of expensive functions so that when the function is called again with the same arguments you don’t have to calculate it again. My favourite expensive function is the recursive implementation of the Fibonacci function.

def fib1(n):
    if n < 2:
        return 1

    return fib1(n - 1) + fib1(n - 2)

This is a really stupid way of calculating the n’th Fibonacci number, but serves nicely here. If I time how long it takes to calculate fib1(33) on my admittedly ancient desktop PC here, I find it takes 2.26 seconds to return the answer 5702887. If I make a second call in the same file, you won’t be surprised to find it takes the same time again. So, how about using a default variable to hold state between calls? Something like the following maybe.

def fib2(n, cache={}):
    if n in cache:
        return cache[n]

    value = fib2(n - 1) + fib2(n - 2)
    cache[n] = value
    return value

So now if I were to run fib2(33) two times in a row, you’d maybe expect it to take a little while to evaluate the first time and then come back almost instantly the second time? Well in this case it’s actually better than that because fib2 is recursive and the recursive calls themselves benefit from memoization. On my PC I get timings of 0.00 and 0.00 rounding to a hundredth of a second.

Well that’s nice I suppose, but what if the expensive function has already been written and you don’t want to have to make any changes to it? This is one of those cases where Python decorators are useful. Essentially you write a decorator function once that encapsulates the idea of memoization and then you can use it to transform any other function into a memoized function.

def memoize(fn, cache={}):
    def memoized(n):
        if n in cache:
            return cache[n]

        value = fn(n)
        cache[n] = value
        return value

    return memoized

At that point you could simply decorate the definition of fib1 with the memoize decorator, but just to prove the point that this can be done without any changes to fib1 at all, I’ll create another function that simply wraps fib1 and decorate that instead.

@memoize
def fib3(n):
    return fib1(n)

Voila! The naive, recursive definition of the Fibonacci function is transformed into a memoized function. More usefully, the memoize decorator can be used to do the same to any other function, including ones which might not be so easily optimised as this.

Posted in programming | Leave a comment

Providing an IPv6 address for an IPv4 legacy web server

We have a corporate web server which doesn’t have IPv6 connectivity. It’s behind firewalls which haven’t yet been updated for IPv6 and the server itself is managed by others in our company who are still getting up to speed with IPv6. Despite the fact that we have IPv6 rolled out over our core network, have multiple IPv6 transit connections and have DNS working for AAAA and PTR records, it still looks bad that our own web server isn’t yet enabled. What to do?

Well trawling around the Interwebz I’ve seen a few people with similar problems. In particular this posting¬†looks pretty close to what I had in mind. There were however a few wrinkles (aren’t there always?) which made things more interesting. First the web site in question has an address (let’s say it’s www.mycorp.com) which is a CNAME for another address (let’s say www.actual.net). Secondly the web server running www.actual.net is configured so that it will only respond to requests specifying either of those two addresses in the HTTP request header.

My solution was to configure Apache on a spare dual stacked server and to set up a new virtual host with the skeleton of the configuration as follows:

<VirtualHost [2001:xxxx:yyyy::1]:80>
ProxyRequests Off
ProxyPreserveHost On
<Proxy *>
Order deny,allow
Allow from all
</Proxy>
ProxyPass / http://A.B.C.D/
</VirtualHost>

I created a new AAAA record for www.actual.net; this is the 2001:xxxx:yyyy::1 address in the configuration above. Once DNS has updated this means that when you do a query for www.actual.net it will have both the original A record and the new AAAA record. Remember that many browsers are set up to use an AAAA record preferentially over an A record when both are available.

There are three things to note. The “ProxyRequests Off” directive is very important because it’s what stops your server becoming an open proxy whilst enabling it to act as a reverse proxy. The “ProxyPreserveHost On” directive is there so that the legacy server sees requests for www.mycorp.com or www.actual.net rather than http://A.B.C.D/. Lastly A.B.C.D is the IPv4 address of www.actual.net and is there in order to force the proxy server to connect to the legacy server over IPv4 rather than the IPv6 address it would see if it looked up www.actual.net. You could also set up another A record such as ipv4.actual.net with this address and use that instead.

For completeness, I’m running Apache version 2.2.9 on Debian Lenny and remember to run “a2enmod proxy” to enable proxying (or else Apache won’t recognise the proxy related directives).

Posted in systems | Leave a comment