Saturday, August 15, 2009

Frequency Conversion UFunc

The interesting thing about the astype() frequency conversion is the second input. astype() operates on an ndarray of datetime64 (or timedelta64) values and accepts one argument: a string naming the target dtype (either datetime64[X] or timedelta64[X]). I don't know which generic ufunc loop to use to handle these inputs. Most ufuncs operate on inputs like an ndarray of floats plus an ndarray of floats and produce an ndarray of floats. This ufunc takes an array of datetime64 and a string.

Looks like more internal NumPy work.

Wednesday, August 12, 2009

ufuncs

This morning I had a healthy dose of segfaults with my coffee while I fought with the NumPy ufunc API before realizing my install of NumPy was borked (or something...). After reinstalling a stable version of NumPy (1.3.0), I began learning about writing ufuncs.

Python lists are not efficient to operate on, since each element in the list is a PyObject. Fortunately, NumPy arrays (ndarrays) are very efficient, since each element in the ndarray is just a fixed-size piece of one contiguous block of data described by a dtype. A ufunc is an object that operates, element by element, on the data in the ndarray.

The NumPy C API for writing ufuncs is fairly straightforward. The ufunc object is created using a function called PyUFunc_FromFuncAndData(), which takes an array of generic ufunc loop functions (more on that later), an array of "data" ufunc C functions (more on that later), and an array of type signatures, plus counts that tell NumPy how many arguments go into the ufunc and how many come out.

The NumPy C API comes with generic ufunc loop functions for iterating over the data in the ndarray. These are things like PyUFunc_ff_f (which I assume means float + float with a result of float... which would explain my current problem...). The array of "data" ufunc C functions mentioned above refers to the actual functions I write in C to operate on the data the ufunc passes along. The "data" function I'm writing will convert frequencies based on the frequency input. The functions must all have the same parameters, but there's plenty of room to store my frequencies to be converted.
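To make the split between the generic loop and the "data" function a bit more concrete, here is a rough pure-Python sketch of the idea. It is not the NumPy C API: generic_loop_q_q and seconds_to_unit are hypothetical names, and the array module merely stands in for a contiguous int64 buffer. The point it illustrates is that the loop only walks the buffers, while the "data" slot carries both the per-element function and any extra settings (such as a target frequency).

```python
from array import array


def generic_loop_q_q(source, dest, data):
    """Hypothetical stand-in for a generic loop: one int64 in, one int64 out.

    `data` plays the role of the ufunc "data" slot: it bundles the
    per-element function with whatever extra settings that function needs.
    """
    convert, extra = data
    for i in range(len(source)):
        dest[i] = convert(source[i], *extra)


def seconds_to_unit(value, units_per_second):
    """Per-element "data" function: scale seconds to a finer unit."""
    return value * units_per_second


values = array("q", [1, 2, 3])              # contiguous 64-bit integers
result = array("q", [0] * len(values))      # preallocated output buffer

# The extra setting (1000 milliseconds per second) rides along in `data`.
generic_loop_q_q(values, result, (seconds_to_unit, (1000,)))
```

After the call, result holds the input values scaled to milliseconds. Swapping in a different conversion only means changing the tuple passed as data, not the loop.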

This is looking simpler than I thought. Once I get through the learning curve, that is.

Tuesday, August 4, 2009

Frequency Conversions

I've been working on frequency conversions for a while now. The general idea in a frequency conversion is to keep the same date, but change precision as necessary. A conversion from year to hour will gain precision (filling in default months, days, and hours). A conversion from seconds to months will lose precision. The important thing to remember is that the dates must be as similar as possible after the conversion.
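As a small illustration of gaining and losing precision, here is a sketch using only Python's datetime module. It assumes values are stored as integer counts since 1970-01-01, and the function names are made up for the example; this is not the NumPy implementation.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed reference point for all counts


def years_to_hours(years_since_epoch):
    """Year -> hour gains precision: month, day, and hour default to Jan 1, 00:00."""
    start = datetime(1970 + years_since_epoch, 1, 1)
    return (start - EPOCH) // timedelta(hours=1)


def seconds_to_months(seconds_since_epoch):
    """Second -> month loses precision: the day and time of day are dropped."""
    moment = EPOCH + timedelta(seconds=seconds_since_epoch)
    return (moment.year - 1970) * 12 + (moment.month - 1)
```

For example, years_to_hours(39) gives the hour count for 2009-01-01 00:00, and every second inside August 2009 maps to the same month count in seconds_to_months.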

The problem with frequency conversions (other than the monotony of the whole endeavor) is the "awkward" dates like business days and weeks. Weeks (in the NumPy DateTime implementation) only occur on Sundays and Business Days will skip any weekends. So what happens when you have a Monday for a conversion to Weeks? Do we go forward to the next Sunday? The nearest Sunday? The previous Sunday? This is an awkward operation, but the user should be able to anticipate the results.
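The three options for a Monday (previous, next, or nearest Sunday) can be sketched in a few lines. This only illustrates the choice; it says nothing about which policy NumPy ends up using, and snap_to_sunday and its policy argument are hypothetical names.

```python
from datetime import date, timedelta


def snap_to_sunday(day, policy="previous"):
    """Move `day` onto a Sunday according to the chosen policy."""
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_since_sunday = (day.weekday() + 1) % 7
    if days_since_sunday == 0:
        return day  # already a Sunday, nothing to decide
    previous = day - timedelta(days=days_since_sunday)
    following = previous + timedelta(days=7)
    if policy == "previous":
        return previous
    if policy == "next":
        return following
    if policy == "nearest":
        # Distances are 1..6 days back versus 6..1 days forward, so no ties.
        return previous if days_since_sunday <= 3 else following
    raise ValueError("unknown policy: %r" % (policy,))
```

Monday, August 3, 2009 lands on August 2 under "previous" and "nearest", but on August 9 under "next". Whatever rule is picked, documenting it is what lets the user anticipate the result.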

TimeSeries conversions work very simply (and I'm basing some of my conversion routines on theirs): the long value is converted into days, then the days-to-X frequency conversion routine is run. This leaves the program with a simple calculation to work out the remaining (if any) finer units (hours, minutes, seconds, etc), which isn't very difficult. Luckily, most of the operations are trivial (seconds to milliseconds, seconds to microseconds, seconds to nanoseconds) and haven't been much trouble. The ones to worry about are weeks and business days. Those can be a little bit nasty.
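The days-as-pivot approach can be sketched roughly like this. It is not the TimeSeries code: the function names and the "Y"/"D"/"h" unit codes are assumptions for the example, and values are assumed to be integer seconds since 1970-01-01. Weeks and business days are left out on purpose, since those are the nasty ones.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)      # assumed reference date
SECONDS_PER_DAY = 86400


def seconds_to_days(value):
    """Stage 1: split seconds into whole days plus leftover seconds."""
    # divmod floors, so values before the epoch still get a leftover >= 0.
    return divmod(value, SECONDS_PER_DAY)


def days_to_years(days):
    """Stage 2, coarser target: calendar years since the epoch."""
    return (EPOCH + timedelta(days=days)).year - 1970


def days_to_hours(days, leftover_seconds):
    """Stage 2, finer target: whole hours, using the leftover from stage 1."""
    return days * 24 + leftover_seconds // 3600


def convert_seconds(value, target):
    """Convert seconds since the epoch to `target` by going through days."""
    days, leftover = seconds_to_days(value)
    if target == "D":
        return days
    if target == "Y":
        return days_to_years(days)
    if target == "h":
        return days_to_hours(days, leftover)
    raise ValueError("unsupported target: %r" % (target,))
```

Only the day-based routine needs calendar logic; everything below a day is plain integer arithmetic on the leftover seconds.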