Wednesday, May 27, 2009

A Simple Python Module

Writing a Python Module in C is simple and quite effective. The following is a module to solve the Fibonacci series up to n. The C version of this module is four times faster than the Python equivalent.

[ All code taken from http://superjared.com/entry/anatomy-python-c-module/ ]
include <Python.h>

// Returns a list of the Fibonacci series up to n in args
static PyObject *fib(PyObject *self, PyObject *args)
{
int a = 0, b = 1, c, n;

// Parse args for a single integer
if (!PyArg_ParseTuple(args, "i", &n))
return NULL;

// Create a list to store the numbers using PyList
PyObject *list = PyList_New(0);

// Calculate the Fibonnaci series
while (b < n)
{
// Append b as an int to list
PyList_Append
(list, PyInt_FromLong(b)); // Why FromLong?
c
= a + b;
a
= b;
b
= c;
}

// returns a PyObject*
return list;
}

// Some array of PyMethodDefs
PyMethodDef methods[] = {
{"fib", fib, METH_VARARGS, "Returns a fibonnaci sequence as a list"},
{NULL, NULL, 0, NULL}
};

// Initialize the module
PyMODINIT_FUNC
initfib
()
{
(void) Py_InitModule("fib", methods);
}
This code was pretty easy and made sense. There are a few arcane functions in here that I can't make sense of. The C function type is PyObject*. This is because:
All object types are extensions of PyObject. This is a type which contains the information Python needs to treat a pointer to an object as an object.
(http://docs.python.org/c-api/structures.html)
I'm going to need to look through the C API and get my stuff down straight before tomorrow when I begin writing the c_datetime64.c and c_timedelta64.c files.

After creating the C file, you have to create a Python setup.py file which uses distutils to "compile" (I think that's the right word) the C module and make it usuable with Python. The file for the Fibonacci example looks like this:
from distutils.core import setup, Extension

# Tells distutils that our module is located in fibonacci.c
setup(name = "Fib",
version = "1.0",
ext_modules = [Extension("fib", ["fibonacci.c"])])

This uses distutils to run some setup function. I won't be researching too much into distutils until later. For now, I understand that it can create Extensions from C files. Here's that Fibonacci code in action:
>>> import fib
>>> fib.fib(2131)
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597]
>>> fib.fib(10000)
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]

Tuesday, May 26, 2009

Day 1

I've been having some awful graphics issues lately. Today, I decided to take a 10 minute break and update the drivers. That decision cost me about 6 hours while I pounded my fists in frustration trying to make these silly ATI proprietary drivers work. Finally, after a (should have been) obvious revelation, I discovered I was using Ubuntu 8.04 instead of 8.10. Good catch there, Marty. Anyway, problem solved - crisis averted. On to the research.

This is my first week working on a Summer of Code project, so I'll be logging a little more frequently here. I've decided to spend most of the day that I wasn't fiddling with graphics settings on researching gregorian versus POSIX date implementations. A gregorian temporal datatype would store the number of days since January 1st, 1 AD, which is how the TimeSeries module works. The POSIX implementation (also known as Unix Time) would store the number of seconds since the epoch: January 1st, 1970. As of this time, my proposal uses the POSIX (Unix) implementation. My lingering concern is the range of dates. Which implementation would give me a larger, faster, more accurate, and more useful range of dates? More on this tomorrow, I guess.

Unix Time does not count leap seconds.
A leap second is a positive or negative one-second adjustment to the Coordinated Universal Time (UTC) time scale that keeps it close to mean solar time.
-Wikipedia
So the UTC time scale was implemented to keep a very accurate time approximation. Days in the UTC time scale have 24 hours, which have 60 minutes, which usually have 60 seconds, but sometimes have 59 or 61 (Wikipedia). Most days in the UTC time scale have 86,400 seconds. This will give pretty accurate approximations for times.

Leap seconds are necessary because we don't measure time by the rotation of the earth. We measure the frequency of the caesium-133 atom changing states (or something). This method has produced an incredibly accurate representation of time (down to 10-9 seconds per day) and it's a lot easier than trying to find some Archimedean Point to observe the rotation of the earth. It's a lot easier to put a tiny thing in a test tube than to put a planet in a test tube.

There have been lots of different designs for how to store and compute time intervals. Luckily for my proposal, this type of thing has been done lots of times before me. Rather than reinvent the wheel, I can rest on a sound foundation. Lots of databases depend on time storage. I'll have to be checking those out tonight or tomorrow. But my main focus here isn't storage; it's computation. I need to provide quick and easy computation with something as complex as a date or a time interval.

Also, due for this week is the base creation of both datetime64 and timedelta64. Just the basic creation. Next week I'll worry with compatibility between the other Python Datetime. Hopefully, I can borrow some of the timeseries code.