The interesting thing about the astype() frequency conversion is the second argument. astype operates on a narray of datetime64 (or timedelta64) types and accepts one argument of a string representing a dtype (either datetime64[X] or timedelta64[X]). I don't know what Generic UFunc to use to handle these inputs. Most ufuncs operate on inputs like narray of floats + narray of floats to result an narray of floats. This ufunc takes an array of datetime64 and a string.

Looks like more internal NumPy work.

## Saturday, August 15, 2009

## Wednesday, August 12, 2009

### ufuncs

This morning I had a healthy dose of segfaults with my coffee while I fought with the NumPy ufunc API before realizing my install of NumPy was borged (or something...). After reinstalling a stable version of NumPy (1.30), I began learning about writing ufuncs.

Python lists are not efficient to operate on, since each element in the list is a PyObject. Fortunately, NumPy lists (narrays) are very efficient, since each element in the narray is just a contiguous amount of data represented by a dtype. A ufunc is an object which operates on the data in the narray.

The NumPy C API for writing ufuncs is pretty simple and fairly straightforward. The ufunc object is created using a method called PyUFunc_FromFuncAndData() which takes an array of actual generic ufunc functions (more on that later), an array of "data" ufunc C functions (more on that later), an array of signatures to tell NumPy how many arguments go into the ufunc and how many come out.

The NumPy C API comes with generic ufunc functions for iterating over data in the narray. These are things like PyUFunc_ff_f (which I assume means float + float with a result of float... which would explain my current problem...). The array of "data" ufunc C functions aforementioned refer to actual functions written in C by myself to operate on the data passed by the ufunc. The ufunc "data" function I've written will be to convert frequencies based on the frequency input. The functions must have the same parameters, but there's plenty of room to store my frequencies to be converted.

This is looking simpler than I thought. Once I get through the learning curve, that is.

Python lists are not efficient to operate on, since each element in the list is a PyObject. Fortunately, NumPy lists (narrays) are very efficient, since each element in the narray is just a contiguous amount of data represented by a dtype. A ufunc is an object which operates on the data in the narray.

The NumPy C API for writing ufuncs is pretty simple and fairly straightforward. The ufunc object is created using a method called PyUFunc_FromFuncAndData() which takes an array of actual generic ufunc functions (more on that later), an array of "data" ufunc C functions (more on that later), an array of signatures to tell NumPy how many arguments go into the ufunc and how many come out.

The NumPy C API comes with generic ufunc functions for iterating over data in the narray. These are things like PyUFunc_ff_f (which I assume means float + float with a result of float... which would explain my current problem...). The array of "data" ufunc C functions aforementioned refer to actual functions written in C by myself to operate on the data passed by the ufunc. The ufunc "data" function I've written will be to convert frequencies based on the frequency input. The functions must have the same parameters, but there's plenty of room to store my frequencies to be converted.

This is looking simpler than I thought. Once I get through the learning curve, that is.

## Tuesday, August 4, 2009

### Frequency Conversions

I've been working on frequency conversions for a while now. The general idea in a frequency conversion is to keep the same date, but change precision as necessary. A conversion from year to hour will gain precision (keeping defaulted months, days, and hours). A conversion from seconds to months will lose precision. The important thing to remember is the dates must be as similar as possible after the conversion.

The problem with frequency conversions (other than the monotony of the whole endeavor) is the "awkward" dates like business days and weeks. Weeks (in the NumPy DateTime implementation) only occur on Sundays and Business Days will skip any weekends. So what happens when you have a Monday for a conversion to Weeks? Do we go forward to the next Sunday? The nearest Sunday? The previous Sunday? This is an awkward operation, but the user should be able to anticipate the results.

TimeSeries conversions work very simply (and I'm basing some of my conversion routines on theirs): the long value is converted into days, then the days to X frequency conversion routine is run. This leaves the program with a simple calculation to determine the results of the remaining (if any) precisions (hours, minutes, seconds, etc), which isn't very difficult. Luckily, most of the operations are trivial (seconds to milliseconds, seconds to microseconds, seconds to nanoseconds) and haven't been much trouble. The ones to worry about are weeks and business days. Those can be a little bit nasty.

The problem with frequency conversions (other than the monotony of the whole endeavor) is the "awkward" dates like business days and weeks. Weeks (in the NumPy DateTime implementation) only occur on Sundays and Business Days will skip any weekends. So what happens when you have a Monday for a conversion to Weeks? Do we go forward to the next Sunday? The nearest Sunday? The previous Sunday? This is an awkward operation, but the user should be able to anticipate the results.

TimeSeries conversions work very simply (and I'm basing some of my conversion routines on theirs): the long value is converted into days, then the days to X frequency conversion routine is run. This leaves the program with a simple calculation to determine the results of the remaining (if any) precisions (hours, minutes, seconds, etc), which isn't very difficult. Luckily, most of the operations are trivial (seconds to milliseconds, seconds to microseconds, seconds to nanoseconds) and haven't been much trouble. The ones to worry about are weeks and business days. Those can be a little bit nasty.

## Saturday, July 25, 2009

### Datestring

Well, this is annoying (but complete, at least). I've been able to convert longs to Python DateTime Objects based on a frequency, but what about outputting them? It would be nice to have a function that could print out the date in a readable format, no? Unfortunately, there's a few problems with this seemingly trivial process.

First off, there's a lot of different ways to print the date. Do we print the weekday? If so, where do we print it? Do we print months first, or years first, or days first? I realize most of this has become "standardized" (whatever that means these days...) but with so many possibilities, it's difficult to create a "perfect" format that could satisfy anyone.

The second (real) problem is that PyStrings are horrible with formatting. There's a function in the Python C - API called PyString_FromFormat() which works very similarly to printf(). It does not, unfortunately, work similarly enough. PyString_FromFormat() completely ignores any minimum length formats and precisions placed on the input data. If I want to print a YYYY-MM-DD string format with exactly four numbers for years, 2 for months, and 2 for days, I can't do that without formatting the strings myself (in C). Is this catastrophic? No. But it's certainly annoying...

And to finish off, a quick aside: standardizing years at 4 digits is a bad idea. What about years post 9999? Maybe someone in some scientific research lab is going to grow really old (cryogenic technology is advancing fairly steadily, after all...). But years below 1000 look... awkward in a YYYY-MM-DD format (which is the current one the tests are written for...).

Oh, and Python DateTime has a printing method of it's own:

See? They standardize at YYYY and error out above the year 9999... but of course none of this is available on the C end (as far as I can tell), so this is of little use to me... Besides, NumPy should be able to handle dates above 9999. It's the principle of the thing.

Also, sorry to Matt Knox about not reading your comment until now. That calculation would have saved me a lot of time (and blood pressure). I didn't have the blog set up to email me when people commented. I do now. Thank you, again.

First off, there's a lot of different ways to print the date. Do we print the weekday? If so, where do we print it? Do we print months first, or years first, or days first? I realize most of this has become "standardized" (whatever that means these days...) but with so many possibilities, it's difficult to create a "perfect" format that could satisfy anyone.

The second (real) problem is that PyStrings are horrible with formatting. There's a function in the Python C - API called PyString_FromFormat() which works very similarly to printf(). It does not, unfortunately, work similarly enough. PyString_FromFormat() completely ignores any minimum length formats and precisions placed on the input data. If I want to print a YYYY-MM-DD string format with exactly four numbers for years, 2 for months, and 2 for days, I can't do that without formatting the strings myself (in C). Is this catastrophic? No. But it's certainly annoying...

And to finish off, a quick aside: standardizing years at 4 digits is a bad idea. What about years post 9999? Maybe someone in some scientific research lab is going to grow really old (cryogenic technology is advancing fairly steadily, after all...). But years below 1000 look... awkward in a YYYY-MM-DD format (which is the current one the tests are written for...).

Oh, and Python DateTime has a printing method of it's own:

>>> print datetime.datetime.now()

2009-07-25 03:55:31.884781

>>> print datetime.datetime(999,1,1)

0999-01-01 00:00:00

>>> print datetime.datetime(10001,1,1)

Traceback (most recent call last):

File "", line 1, in

ValueError: year is out of range

See? They standardize at YYYY and error out above the year 9999... but of course none of this is available on the C end (as far as I can tell), so this is of little use to me... Besides, NumPy should be able to handle dates above 9999. It's the principle of the thing.

Also, sorry to Matt Knox about not reading your comment until now. That calculation would have saved me a lot of time (and blood pressure). I didn't have the blog set up to email me when people commented. I do now. Thank you, again.

## Tuesday, July 21, 2009

### Long to Datetime

I'm absolutely stuck on this one calculation. Everything else is working perfectly. I can't figure out how to calculate the calendar given a long "number of business days" since 1970. The funny thing is, I figured it out fine going the other way (that is, datetime to long). Something is fishy in my calculations (I've tested the test numbers again and again).

Also, since you suggested using smaller structs (since femtosecond has no need of knowing month, etc), I started using ymdstruct (year, month, day structs) and hmsstruct (hour, minute, second structs). Send a long "number of days" to long_to_ymdstruct() or "number of seconds" to long_to_hmsstruct() and the function will return the appropriate struct. So to calculate a calendar date, it's just a matter of converting given frequency to days or seconds (depending on precision of frequencies... for the Business Day case, we need to convert to days by using the ymdstruct).

I may be overly complicating things. I'm not totally confident in the efficiency of structs in C (I know you should try to have structs contain a base two number of items so the compiler can do a cheap shift instead of an expensive multiplication to find members). Unfortunately, both ymdstruct and hmsstruct seem like unnecessary baggage for the long_to_datestruct function, since this function returns neither a ymdstruct nor an hmsstruct (it returns a datestruct, which is a seperate struct entirely...). There really has to be a better way of doing this... I just haven't thought of it yet...

But regardless, I still can't figure out these business day calculations. The most annoying part of the entire endeavor is that January 1, 1970 is on a Thursday... So I know I have to add some offset or subtract some other offset... but what are they? The closest calculation I could try to find (remember, I'm only trying to turn business days into absolute days to fill my ymdstruct):

absdays = 7 * (dlong / 5) + dlong % 5

where dlong is the long long value representing a date and absdays are the absolute number of days (since Jan 1, 1970, specifically).

Also, since you suggested using smaller structs (since femtosecond has no need of knowing month, etc), I started using ymdstruct (year, month, day structs) and hmsstruct (hour, minute, second structs). Send a long "number of days" to long_to_ymdstruct() or "number of seconds" to long_to_hmsstruct() and the function will return the appropriate struct. So to calculate a calendar date, it's just a matter of converting given frequency to days or seconds (depending on precision of frequencies... for the Business Day case, we need to convert to days by using the ymdstruct).

I may be overly complicating things. I'm not totally confident in the efficiency of structs in C (I know you should try to have structs contain a base two number of items so the compiler can do a cheap shift instead of an expensive multiplication to find members). Unfortunately, both ymdstruct and hmsstruct seem like unnecessary baggage for the long_to_datestruct function, since this function returns neither a ymdstruct nor an hmsstruct (it returns a datestruct, which is a seperate struct entirely...). There really has to be a better way of doing this... I just haven't thought of it yet...

But regardless, I still can't figure out these business day calculations. The most annoying part of the entire endeavor is that January 1, 1970 is on a Thursday... So I know I have to add some offset or subtract some other offset... but what are they? The closest calculation I could try to find (remember, I'm only trying to turn business days into absolute days to fill my ymdstruct):

absdays = 7 * (dlong / 5) + dlong % 5

where dlong is the long long value representing a date and absdays are the absolute number of days (since Jan 1, 1970, specifically).

## Sunday, July 12, 2009

### Git

Sorry for the blogless time lapse. I needed a little push to pump out some code. Check it out for yourself:

Clone URL: git://github.com/martyfuhry/npy_datetime.gitThe important code is in the parsing directory (you'll need to navigate there and run the setup.py build and install). The instructions in the readme are a little antiquated, but the tests directory (in the parsing folder) should clear things up.

## Tuesday, June 30, 2009

### From Here On Out, It's Math

The code is coming along nicely.

A few things will need to be changed, though...

The frequency will need to be parsed, too, though, since Travis needs to support "custom" frequencies (read more about them here). Perhaps this will call for a second Python callback function, as parsing Strings with regular expressions is relatively easy in Python and difficult at best in C.

The mxDateTime parser returns the Python datetime object, which only supports time units up to the nanosecond (if I recall correctly...). The NumPy DateTime module must support units as high as femtoseconds. Hopefully this will be doable with just a couple of lines of code added to the parser.

>>> p.parse_date("01/01/1980", "Y")Real simple. First, we import the parsedates module, then we set the callback function to the mxDateTime Parser. The parse_date function takes a String of a date, which it passes to the mxDateTime Parsing function and a String for a frequency, which it converts into an int (to be stored internally). The Parsing function for the date returns a Python DateTime object (which, for my use, is basically a tuple filled with (year, month, day, hour, minute, second, etc.). I can extract this and pass all of these into a master function to calculate the date. The frequency is taken from the second String (and proper error messages are awarded in the event of a bizzare frequency) and stored internally (as an int for now).

(10L, 9L)

A few things will need to be changed, though...

The frequency will need to be parsed, too, though, since Travis needs to support "custom" frequencies (read more about them here). Perhaps this will call for a second Python callback function, as parsing Strings with regular expressions is relatively easy in Python and difficult at best in C.

The mxDateTime parser returns the Python datetime object, which only supports time units up to the nanosecond (if I recall correctly...). The NumPy DateTime module must support units as high as femtoseconds. Hopefully this will be doable with just a couple of lines of code added to the parser.

Subscribe to:
Posts (Atom)