The interesting thing about the astype() frequency conversion is the second argument. astype operates on an narray of datetime64 (or timedelta64) values and accepts one argument: a string representing a dtype (either datetime64[X] or timedelta64[X]). I don't know which generic ufunc loop to use to handle these inputs. Most ufuncs operate on inputs like an narray of floats plus an narray of floats to produce an narray of floats. This ufunc takes an array of datetime64 and a string.
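Whatever ufunc machinery ends up doing the work, the string argument has to be reduced to a unit first. A minimal sketch, assuming a hypothetical time_unit enum and parse_unit() helper (NumPy's own unit codes and parser are not shown here), of pulling X out of "datetime64[X]" or "timedelta64[X]":

```c
#include <string.h>

/* Hypothetical unit codes for this sketch; NumPy's real enum differs. */
typedef enum {
    UNIT_ERROR = -1,
    UNIT_Y, UNIT_M, UNIT_W, UNIT_B, UNIT_D,
    UNIT_h, UNIT_m, UNIT_s, UNIT_ms, UNIT_us, UNIT_ns,
    UNIT_ps, UNIT_fs, UNIT_as
} time_unit;

/* Extract X from "datetime64[X]" or "timedelta64[X]".
   Returns UNIT_ERROR for anything it does not recognise. */
static time_unit parse_unit(const char *dtype)
{
    /* Same order as the enum above, so the index doubles as the code. */
    static const char *names[] = {"Y", "M", "W", "B", "D", "h", "m", "s",
                                  "ms", "us", "ns", "ps", "fs", "as"};
    const char *open, *close;
    size_t len, i;

    if (strncmp(dtype, "datetime64[", 11) == 0)
        open = dtype + 11;
    else if (strncmp(dtype, "timedelta64[", 12) == 0)
        open = dtype + 12;
    else
        return UNIT_ERROR;

    close = strchr(open, ']');
    if (close == NULL || close[1] != '\0')   /* need "]" at the very end */
        return UNIT_ERROR;
    len = (size_t)(close - open);

    for (i = 0; i < sizeof names / sizeof names[0]; i++) {
        /* Case matters: "M" is months, "m" is minutes. */
        if (strlen(names[i]) == len && strncmp(open, names[i], len) == 0)
            return (time_unit)i;
    }
    return UNIT_ERROR;
}
```

Once the string is turned into a number, the conversion itself only needs numeric inputs, which is closer to what the generic ufunc loops expect.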
Looks like more internal NumPy work.
Saturday, August 15, 2009
Wednesday, August 12, 2009
ufuncs
This morning I had a healthy dose of segfaults with my coffee while I fought with the NumPy ufunc API, before realizing my install of NumPy was borged (or something...). After reinstalling a stable version of NumPy (1.3.0), I began learning about writing ufuncs.
Python lists are not efficient to operate on, since each element in the list is a separate PyObject. Fortunately, NumPy arrays (narrays) are very efficient, since each element in the narray is just a fixed-size chunk of one contiguous block of data, interpreted according to the dtype. A ufunc is an object which operates element by element on the data in the narray.
The NumPy C API for writing ufuncs is fairly straightforward. The ufunc object is created with a function called PyUFunc_FromFuncAndData(), which takes an array of generic ufunc loop functions (more on that later), an array of "data" ufunc C functions (more on that later), an array of type signatures, and the number of inputs and outputs, which tell NumPy how many arguments go into the ufunc and how many come out.
The NumPy C API comes with generic ufunc functions for iterating over data in the narray. These are things like PyUFunc_ff_f (two float inputs with a float result... which would explain my current problem). The array of "data" ufunc C functions mentioned above refers to functions I write in C myself to operate on the data passed in by the ufunc. The "data" function I've written will convert frequencies based on the frequency input. The functions must all share the same signature, but there's plenty of room to store the frequencies to be converted.
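A minimal sketch of what a hand-written inner loop could look like, assuming NumPy 1.x's loop signature (PyUFuncGenericFunction, which receives char **args, npy_intp *dimensions, npy_intp *steps, void *data). To keep it compilable without NumPy headers, ptrdiff_t stands in for npy_intp; the name convert_unit_loop and the use of data as an int64 conversion factor are hypothetical, just one way the frequencies to convert could ride along in the data pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for NumPy's npy_intp so the sketch compiles on its own. */
typedef ptrdiff_t sketch_intp;

/* Hypothetical inner loop with the same shape as NumPy 1.x's
   PyUFuncGenericFunction: one int64 input array, one int64 output array.
   The `data` pointer carries the conversion factor, e.g. 1000 for
   seconds -> milliseconds. */
static void
convert_unit_loop(char **args, sketch_intp *dimensions,
                  sketch_intp *steps, void *data)
{
    sketch_intp i;
    sketch_intp n = dimensions[0];          /* number of elements */
    char *in = args[0];
    char *out = args[1];
    int64_t factor = *(const int64_t *)data;

    for (i = 0; i < n; i++) {
        *(int64_t *)out = *(const int64_t *)in * factor;
        in += steps[0];                     /* strides are in bytes */
        out += steps[1];
    }
}
```

args holds one pointer per input and output, dimensions[0] is the element count, and steps are byte strides, which is why the loop walks char pointers instead of indexing int64 arrays directly (the arrays need not be contiguous).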
This is looking simpler than I thought. Once I get through the learning curve, that is.
Tuesday, August 4, 2009
Frequency Conversions
I've been working on frequency conversions for a while now. The general idea in a frequency conversion is to keep the same date, but change precision as necessary. A conversion from year to hour will gain precision (filling in default months, days, and hours). A conversion from seconds to months will lose precision. The important thing to remember is that the dates must be as similar as possible after the conversion.
The problem with frequency conversions (other than the monotony of the whole endeavor) is the "awkward" frequencies like business days and weeks. Weeks (in the NumPy DateTime implementation) are anchored on Sundays, and business days skip weekends. So what happens when you have a Monday for a conversion to weeks? Do we go forward to the next Sunday? The nearest Sunday? The previous Sunday? This is an awkward operation, but the user should be able to anticipate the results.
TimeSeries conversions work very simply (and I'm basing some of my conversion routines on theirs): the long value is converted into days, then the days-to-X frequency conversion routine is run. This leaves the program with a simple calculation for the remaining precisions (hours, minutes, seconds, etc.), if any, which isn't very difficult. Luckily, most of the operations are trivial (seconds to milliseconds, microseconds, or nanoseconds) and haven't been much trouble. The ones to worry about are weeks and business days. Those can be a little bit nasty.
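A minimal sketch of the days-in-the-middle approach for the weeks case. It assumes weeks start on Sunday and picks one answer to the Monday question (go back to the Sunday that starts the week); the function names are hypothetical, not the TimeSeries or NumPy routines.

```c
#include <stdint.h>

/* Floor division, so dates before 1970 (negative day counts) round toward
   minus infinity instead of toward zero like C's "/" does. */
static int64_t floor_div(int64_t a, int64_t b)
{
    int64_t q = a / b;
    if ((a % b != 0) && ((a < 0) != (b < 0)))
        q--;
    return q;
}

/* Assumption: week 0 is the week beginning Sunday, December 28, 1969
   (absolute day -4, since day 0, January 1, 1970, is a Thursday).
   Every date maps to the Sunday that starts its week, so a Monday goes
   back to the previous Sunday. */
static int64_t days_to_weeks(int64_t days)
{
    return floor_div(days + 4, 7);
}

/* Week number -> absolute day of the Sunday that starts that week. */
static int64_t weeks_to_days(int64_t weeks)
{
    return weeks * 7 - 4;
}

/* Once we have days, the finer units are plain multiplication. */
static int64_t days_to_seconds(int64_t days)
{
    return days * 86400;
}
```

Going through days means a week-to-seconds conversion is just days_to_seconds(weeks_to_days(w)); choosing "nearest Sunday" instead would only change days_to_weeks.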
Saturday, July 25, 2009
Datestring
Well, this is annoying (but complete, at least). I've been able to convert longs to Python DateTime objects based on a frequency, but what about outputting them? It would be nice to have a function that could print the date in a readable format, no? Unfortunately, there are a few problems with this seemingly trivial process.
First off, there are a lot of different ways to print the date. Do we print the weekday? If so, where do we print it? Do we print months first, or years first, or days first? I realize most of this has become "standardized" (whatever that means these days...) but with so many possibilities, it's difficult to create a "perfect" format that would satisfy everyone.
The second (real) problem is that PyStrings are horrible with formatting. There's a function in the Python C-API called PyString_FromFormat() which works very similarly to printf(). Unfortunately, it doesn't work similarly enough: PyString_FromFormat() completely ignores any minimum width and precision specifiers placed on the input data. If I want to print a YYYY-MM-DD string with exactly four digits for the year, two for the month, and two for the day, I can't do that without formatting the strings myself (in C). Is this catastrophic? No. But it's certainly annoying...
And to finish off, a quick aside: standardizing years at 4 digits is a bad idea. What about years after 9999? Maybe someone in some scientific research lab is going to grow really old (cryogenic technology is advancing fairly steadily, after all...). But years below 1000 look... awkward in a YYYY-MM-DD format (which is the current one the tests are written for...).
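On the plain C side, snprintf() does honor field widths, so one workaround is to format into a char buffer first and only then build the Python string from it (for example with PyString_FromString). A minimal sketch; format_ymd is a hypothetical helper. Note that a minimum width is not a maximum: "%04lld" pads short years but lets long ones through untouched.

```c
#include <stdio.h>

/* Format a date as YYYY-MM-DD into buf. "%04lld" pads the year to at
   least four digits but never truncates it, and "%02d" pads month and day.
   Returns snprintf's result: the length the full string needs (not
   counting the terminating NUL), or a negative value on error. */
static int format_ymd(char *buf, size_t size, long long year, int month, int day)
{
    return snprintf(buf, size, "%04lld-%02d-%02d", year, month, day);
}
```

With this, year 999 comes out as 0999-01-01 and year 10001 as 10001-01-01 instead of an error.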
Oh, and Python DateTime has a printing method of its own:
>>> print datetime.datetime.now()
2009-07-25 03:55:31.884781
>>> print datetime.datetime(999,1,1)
0999-01-01 00:00:00
>>> print datetime.datetime(10001,1,1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: year is out of range
See? They standardize at YYYY and error out above the year 9999... but of course none of this is available on the C end (as far as I can tell), so this is of little use to me... Besides, NumPy should be able to handle dates above 9999. It's the principle of the thing.
Also, sorry to Matt Knox about not reading your comment until now. That calculation would have saved me a lot of time (and blood pressure). I didn't have the blog set up to email me when people commented. I do now. Thank you, again.
Tuesday, July 21, 2009
Long to Datetime
I'm absolutely stuck on this one calculation. Everything else is working perfectly. I can't figure out how to calculate the calendar date given a long "number of business days" since 1970. The funny thing is, I figured it out fine going the other way (that is, datetime to long). Something is fishy in my calculations (I've checked the test numbers again and again).
Also, since you suggested using smaller structs (a femtosecond has no need to know the month, etc.), I started using ymdstruct (year, month, day) and hmsstruct (hour, minute, second). Send a long "number of days" to long_to_ymdstruct() or a "number of seconds" to long_to_hmsstruct() and the function will return the appropriate struct. So to calculate a calendar date, it's just a matter of converting the given frequency to days or seconds (depending on the precision of the frequency; for the Business Day case, we need to convert to days so we can use the ymdstruct).
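A minimal sketch of what long_to_ymdstruct() and long_to_hmsstruct() could look like, assuming days are counted from 1970-01-01 in the proleptic Gregorian calendar. The struct layouts are guesses, and the day-to-date arithmetic is Howard Hinnant's public-domain civil_from_days algorithm, swapped in as one known-correct method rather than the project's actual code.

```c
#include <stdint.h>

/* Hypothetical layouts; the real project's fields may differ. */
typedef struct { int64_t year; int month; int day; } ymdstruct;
typedef struct { int hour; int minute; int second; } hmsstruct;

/* Days since 1970-01-01 -> calendar date (Hinnant's civil_from_days). */
static ymdstruct long_to_ymdstruct(int64_t days)
{
    ymdstruct out;
    int64_t z = days + 719468;                        /* epoch -> 0000-03-01 */
    int64_t era = (z >= 0 ? z : z - 146096) / 146097; /* 400-year eras, floored */
    int64_t doe = z - era * 146097;                   /* day of era [0, 146096] */
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; /* [0, 399] */
    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);  /* day of year [0, 365] */
    int64_t mp = (5 * doy + 2) / 153;                 /* month, March = 0 */

    out.day = (int)(doy - (153 * mp + 2) / 5 + 1);
    out.month = (int)(mp < 10 ? mp + 3 : mp - 9);
    out.year = yoe + era * 400 + (out.month <= 2);    /* Jan/Feb belong to next year */
    return out;
}

/* Seconds -> hour/minute/second within the day. Negative inputs are
   wrapped into [0, 86400) first so times before the epoch stay sane. */
static hmsstruct long_to_hmsstruct(int64_t seconds)
{
    hmsstruct out;
    int64_t s = seconds % 86400;
    if (s < 0)
        s += 86400;
    out.hour = (int)(s / 3600);
    out.minute = (int)(s % 3600 / 60);
    out.second = (int)(s % 60);
    return out;
}
```

Counting years from March 1 puts the leap day at the end of the "year", which is what lets the month arithmetic avoid a lookup table.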
I may be overcomplicating things. I'm not totally confident about the efficiency of structs in C (I know you should try to give structs a power-of-two size so the compiler can use a cheap shift instead of an expensive multiplication when indexing an array of them). Unfortunately, both ymdstruct and hmsstruct seem like unnecessary baggage for the long_to_datestruct function, since this function returns neither a ymdstruct nor an hmsstruct (it returns a datestruct, which is a separate struct entirely...). There really has to be a better way of doing this... I just haven't thought of it yet...
But regardless, I still can't figure out these business day calculations. The most annoying part of the entire endeavor is that January 1, 1970 is a Thursday... So I know I have to add some offset or subtract some other offset... but what are they? The closest calculation I've found (remember, I'm only trying to turn business days into absolute days to fill my ymdstruct):
absdays = 7 * (dlong / 5) + dlong % 5
where dlong is the long long value representing a date and absdays are the absolute number of days (since Jan 1, 1970, specifically).
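One way to handle the Thursday offset, as a sketch under an explicit assumption: business day 0 is Thursday, January 1, 1970 (absolute day 0), and only Monday through Friday are counted. The idea is to re-anchor the count on the preceding Monday (December 29, 1969, absolute day -3), where the formula above works as-is, and then shift back. Floor division keeps pre-1970 dates correct, since C's / and % truncate toward zero. The function names are hypothetical.

```c
#include <stdint.h>

/* Floor division and the matching non-negative modulo (for b > 0). */
static int64_t floor_div(int64_t a, int64_t b)
{
    int64_t q = a / b;
    if ((a % b != 0) && ((a < 0) != (b < 0)))
        q--;
    return q;
}

static int64_t floor_mod(int64_t a, int64_t b)
{
    return a - floor_div(a, b) * b;
}

/* Business days since Thu 1970-01-01 -> absolute days since 1970-01-01.
   Thursday is business day 3 of the week starting Mon 1969-12-29, so
   count from that Monday, apply 7 * (n / 5) + n % 5, then undo the shift. */
static int64_t bdays_to_absdays(int64_t dlong)
{
    int64_t n = dlong + 3;   /* business days since Mon 1969-12-29 */
    return 7 * floor_div(n, 5) + floor_mod(n, 5) - 3;
}
```

For example, business day 2 is Monday, January 5, 1970, i.e. absolute day 4, because the weekend of the 3rd and 4th is skipped.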
Sunday, July 12, 2009
Git
Sorry for the blogless time lapse. I needed a little push to pump out some code. Check it out for yourself:
Clone URL: git://github.com/martyfuhry/npy_datetime.git
The important code is in the parsing directory (you'll need to navigate there and run setup.py build and install). The instructions in the README are a little antiquated, but the tests directory (in the parsing folder) should clear things up.
Tuesday, June 30, 2009
From Here On Out, It's Math
The code is coming along nicely.
>>> p.parse_date("01/01/1980", "Y")
(10L, 9L)
Real simple. First, we import the parsedates module, then we set the callback function to the mxDateTime parser. The parse_date function takes a string holding a date, which it passes to the mxDateTime parsing function, and a string for a frequency, which it converts into an int (to be stored internally). The parsing function for the date returns a Python datetime object (which, for my use, is basically a tuple filled with year, month, day, hour, minute, second, etc.). I can extract these fields and pass them all into a master function to calculate the date. The frequency is taken from the second string (with proper error messages in the event of a bizarre frequency) and stored internally (as an int for now).
A few things will need to be changed, though...
The frequency will need to be parsed, too, since Travis needs to support "custom" frequencies (read more about them here). Perhaps this will call for a second Python callback function, as parsing strings with regular expressions is relatively easy in Python and difficult at best in C.
The mxDateTime parser returns a Python datetime object, which only supports time units down to the microsecond. The NumPy DateTime module must support units as fine as femtoseconds. Hopefully this will be doable with just a couple of lines of code added to the parser.
Thursday, June 25, 2009
Parsed!
The callback worked! The code previously posted only had to be slightly modified.
parsedates.set_callback(timeseries_parse.DateTimeFromString)
I have the mxDateTime parser modified from the Scikits Timeseries imported here as timeseries_parse. In that module is a magical date parsing function called DateTimeFromString (and a similar DateFromString) which takes a string and returns a Python datetime object filled with the correct date fields.
parsedates.parse_date("01/01/2001")
datetime.datetime(2001, 1, 1, 0, 0)
So here we see a datetime object with (Year, Month, Day, Hour, Minute); the seconds are left out of the repr because they are zero. Turning that into a long number is very easy, and it all depends on the frequency metadata. If our frequency is in years, then our number is (Year - 1970).
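A minimal sketch of that frequency-dependent conversion for yearly, monthly, and daily frequencies. The frequency codes and the date_to_long name are hypothetical, and the daily case uses Howard Hinnant's public-domain days_from_civil algorithm as a stand-in for whatever the module ends up using.

```c
#include <stdint.h>

/* Days since 1970-01-01 for a proleptic Gregorian date
   (Hinnant's days_from_civil). */
static int64_t days_from_civil(int64_t y, int m, int d)
{
    int64_t era, yoe, doy, doe;
    y -= m <= 2;                                  /* years start in March */
    era = (y >= 0 ? y : y - 399) / 400;           /* floored 400-year era */
    yoe = y - era * 400;                          /* [0, 399] */
    doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  /* [0, 365] */
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;  /* [0, 146096] */
    return era * 146097 + doe - 719468;
}

/* Hypothetical frequency codes for this sketch. */
enum { FR_Y, FR_M, FR_D };

static int64_t date_to_long(int freq, int64_t year, int month, int day)
{
    switch (freq) {
    case FR_Y: return year - 1970;                       /* years since 1970 */
    case FR_M: return (year - 1970) * 12 + (month - 1);  /* months since 1970-01 */
    case FR_D: return days_from_civil(year, month, day); /* days since 1970-01-01 */
    default:   return INT64_MIN;                         /* unknown frequency */
    }
}
```

For 2001-01-01 this gives 31 years, 372 months, or 11323 days; finer frequencies could build on the day count the same way.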
Tuesday, June 23, 2009
Callbacks
Sometimes you need to run a Python code segment from C in your module. There are a lot of good reasons to do this. C doesn't have much built-in support for regular expressions, while Python's is pretty robust. You can take a PyObject holding a C string and send it to a Python parsing function. When the Python code is done manipulating it, you have the result back safe and sound in C.
Writing callback functions is pretty easy.
static PyObject *callback = NULL;
static PyObject *
set_callback(PyObject *dummy, PyObject *args)
{
    PyObject *result = NULL;
    PyObject *temp;
    if (PyArg_ParseTuple(args, "O:set_callback", &temp))
    {
        if (!PyCallable_Check(temp))
        {
            PyErr_SetString(PyExc_TypeError, "parameter must be callable");
            return NULL;
        }
        // Reference to new callback
        Py_XINCREF(temp);
        // Dispose of previous callback
        Py_XDECREF(callback);
        // Remember new callback
        callback = temp;
        // Boilerplate to return "None"
        Py_INCREF(Py_None);
        result = Py_None;
    }
    return result;
}
This function takes a dummy object and some arguments. It stores a callable function into a global PyObject named callback. You can later use this global PyObject with the callback function stored in it like this:
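result = PyEval_CallObject(callback, arglist);
Here arglist is a tuple holding the arguments (for example, built with Py_BuildValue("(ii)", 1, 2)), and result is a PyObject pointer holding the return value of the callback function, or NULL if the Python function raised an exception. Here's a pretty simple example: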
def add_ftn(a, b):
    return a + b
parsedates.set_callback(add_ftn)
We set the callback PyObject to store the callable Python function add_ftn. We can test the callback by calling PyEval_CallObject(callback, arglist) from C as shown above. This C function takes a tuple of arguments (add_ftn needs two) and passes them to the function stored in the global PyObject variable callback.
Thursday, June 18, 2009
Enthought and Other Developments
In the words of my Mentor, Pierre,
"Enthought is a private company based in Austin, TX, founded by Eric Jones, a long-time Pythonista. Enthought's main source of revenue is the programming of specific scientific applications."
Enthought recently had a client request a datetime type exactly like the type I've been working on. Enthought is a prominent contributor to NumPy. Travis Oliphant will be working on the new datetime dtype himself. This is the guy who literally wrote the book on NumPy. And I get the privilege of assisting him.
This is really a godsend, since my knowledge of low-level NumPy is quite sparse. This is the nature of open source, it would seem: collaboration between the ignorant and learning (me) and the experienced brilliance which created the foundations and core of these massive projects.
I've been commissioned to write two sets of code. The first set is to get and set datetime members of the narrays. The second will be incorporating the mxDateTime Parsing module for strings to datetime conversions.
More on those later.
Thursday, June 11, 2009
A Sparse Documentation
A bit of a frustrating last couple of days. I've been all over the place with my research, which means I kept getting lost and confused. First I tried to figure out how to incorporate a scalar datatype into a NumPy dtype. Then I got lost trying to learn some more advanced Python C-API on object handling. I couldn't figure out quite how to make my datetime object play nice with frequencies. So, off I went venturing into the land of the NumPy C-API.
The documentation is very well written, but not exactly geared towards my project. I read, "The best way to truly understand the C-API is to read the source code." So I ran right to the source. The last few days have been a marathon of poring over code and trying to understand how everything fits together.
Communication is key. Yeah, we say that for relationships and other insignificant things, but I'm talking about source code. The NumPy source code has a significant amount of C code, which can be overwhelming at times. I've decided the only way to understand what's going on is to document it for myself. I've been literally running through each significant file and using pen and paper to pin down exactly what's going on. I'm interested in anything related to the Array Scalar Types, specifically the LongLong type. Both the datetime and timedelta types will be very similar to the LongLong type. There are key differences, which I guess I should talk about, but I'll save that for tomorrow. More on the NumPy source code:
There are some fancy repetition techniques employed in the definitions of these scalar types. Above each "generic" method is a comment containing a comma-separated list of scalar types. The "generic" method uses a placeholder (not a C macro, but a template variable such as @name@) that is replaced by each scalar type in that list, repeating the method once per type. These templated .src files are expanded into plain C at build time. I know how it works, but I don't know why.
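NumPy's templates are expanded by a Python script in its build tools (conv_template in numpy.distutils), not by the C preprocessor. As for the "why": C has no generics, so without some kind of code generation the same function body would have to be written out by hand for every scalar type. A rough plain-C analogy of the idea, using the X-macro pattern (swapped in purely for illustration; it is not how NumPy's templates work), where SCALAR_TYPES and DEFINE_ADD are made-up names:

```c
/* Each X(name, ctype) entry stands for one scalar type, playing the role
   of the comma-separated list in NumPy's template comments. */
#define SCALAR_TYPES(X)      \
    X(int, int)              \
    X(longlong, long long)   \
    X(double, double)

/* One "generic" body; name##_add pastes the type name into the function
   name, much like a template placeholder does. */
#define DEFINE_ADD(name, ctype)               \
    static ctype name##_add(ctype a, ctype b) \
    {                                         \
        return a + b;                         \
    }

/* Stamps out int_add, longlong_add and double_add. */
SCALAR_TYPES(DEFINE_ADD)
```

Adding a type means adding one line to the list, and every "generic" body picks it up automatically.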
There's been a very important update on my project. More on that later this week.
Monday, June 8, 2009
Scalar Objects VS dtype
Just a clarification:
When I pull an element out of a narray with dtype (for example) float64, the stored variable is a scalar object whose type is that dtype's .type attribute (here, numpy.float64).
>>> somearray = numpy.array([1,2,3,4,5], dtype='float64')
>>> somefloat = somearray[0]
>>> type(somefloat)
<type 'numpy.float64'>
So numpy.float64 is a scalar type. The dtype is only there to tell the numpy array how to handle the data in the array.
dtypes
A NumPy dtype is not the type of the elements themselves (that's the scalar type above); it's a descriptor that tells the narray how to interpret its data. A dtype is a way to specify exactly what every member in the narray is.
>>> numpy.array([1,2,3,4,5], dtype='int32')
array([1, 2, 3, 4, 5], dtype=int32)
The NumPy array takes a list of data [1,2,3,4,5] and a dtype describing that data (dtype='int32'). When I create this narray, dtype='int32' makes the list of data be stored as 32-bit integers. See what happens when I change the dtype to a float:
>>> numpy.array([1,2,3,4,5], dtype='float64')
array([ 1., 2., 3., 4., 5.])
The data inside the narray is now stored as an array of 64-bit floats. My goal is to make a new one of these dtypes, so that something like this works:
>>> numpy.array(["12-3-2009"], dtype='datetime64[D]')
array([12-3-2009])
I've been working on creating some kind of separate module with datetime64 as a scalar object type. This is not the project goal. I need to create the NumPy array dtypes datetime64 and timedelta64 for use in narrays. I've been sifting through NumPy's core code all weekend and can't seem to find the file(s) where dtypes are defined. My current plan is to take the code for one of the existing dtypes and copy and paste it, so I can start with something barebones and work my way up.
Wednesday, June 3, 2009
Parsing
Yuck. I've hit a wall and it hurts.
if (! PyArg_ParseTupleAndKeywords(args, kwds, "OO", kwlist, &obj_time, &obj_freq))
This line of code should take the args sent to the function, use whatever kwds were supplied to identify those arguments, and parse them as two PyObjects. The kwlist is used to identify which kwds refer to which recipients of the PyObject variables.
static char* kwlist[] = {"time", "freq", NULL};
Let's give it this input.
>>> print d.datetime64(time='1', freq='1')
We've created a datetime64 object and (allegedly) given it values 1 for both time and freq. This should parse so that we create two PyObjects with values '1' and the different kwds ("time" and "freq"). I should be able to extract those values by simply checking the appropriate PyObjects for their respective keywords. But, alas, if life were easy, it would be boring.
if (PyObject_HasAttrString(obj_time, "time"))
This resolves to false. Why? This is the exact same implementation as the TimeSeries Date type. I mean, I almost copied it. I don't understand why the arguments are being completely discarded. I could very easily parse to something else, like longs or ints, but wouldn't that just be a workaround? Maybe not...
I'll be trying to parse:
if (! PyArg_ParseTupleAndKeywords(args, kwds, "Li", kwlist, &time, &freq))
Where do the keywords go now? I assumed they were placed into some kind of magical PyObject slot. But since I'm parsing to a long long int and an int ("Li", in the same order as kwlist: time first, then freq), I wonder what happens to the keywords?
Yes, I've been neglecting writing my unit tests, I know. But this is just so darned frustrating. Whether or not the unit tests even exist, I need to be able to create datetime64 objects with appropriate values. I need to be able to comfortably make datetime64 objects with different values before doing anything fancy.
This is important, I promise. Now, off to parse.
Monday, June 1, 2009
datetime64 Objects
I love having working code. Once I can get something to properly work at the most basic level, I slowly modify it from the ground up. The Python C API does not make this easy. In order to create even the most basic Python Object from a C module, you need to be acquainted with a host of obscure and often arcane code segments. I'll try to piece together the basic datetime64 object here.
Here's the big, important part of the code. We use the PyMODINIT_FUNC macro to tell Python that this is our module's initialization function. Python runs it when the datetime64 module is imported.
PyMODINIT_FUNC
initdatetime64(void)
{
    PyObject *date_object;
Since we're not really doing anything fancy with our datetime64 objects yet, we set datetime64Type.tp_new to PyType_GenericNew, Python's generic constructor, which simply allocates a new object of our type. Remember, datetime64Type tells Python what kind of object a datetime64 object is. With a generic tp_new we don't tell it much, but Python at least knows how to create one.
These next lines are possibly the most important method. Py_InitModule3 will create a new module object based on a name and table of functions. We give it "datetime64" to tell it the name, and the datetime64_methods array to tell Python what methods we can run on it.
Tell Python to increase the reference count for this type.
Next up for the day, timedelta64!
First, as always, include the Python C API:
#include <Python.h>
At first I was a little confused by this line, but it's just a forward declaration: it tells the compiler that "datetime64Type" is a PyTypeObject defined later in the file, so we can refer to it before that point. (Older code spells this staticforward; plain static does the same job.)
static PyTypeObject datetime64Type;
The following is the actual datetime64 object itself. It's a simple struct filled with PyObject_HEAD (a macro that adds the reference count and a pointer to the object's type), freq (the frequency that time is measured in), and time (the number of freq units since the epoch; I'm using Unix time).
typedef struct
{
PyObject_HEAD // macro used for refcount & pointer
int freq; // frequency of date_value
long long time; // 64 bit time since epoch
} datetime64;
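Since time is stored as a count of freq units since the epoch, converting between frequencies comes down to integer arithmetic. Here's a minimal plain-C sketch of that idea. The frequency codes (FREQ_SEC, FREQ_MIN, FREQ_HOUR, FREQ_DAY) and the convert_freq helper are hypothetical, since the actual values for freq aren't defined yet, and it assumes Unix-epoch counts with no leap seconds.

```c
#include <stdint.h>

/* Hypothetical frequency codes for the freq field; not defined in this post. */
enum { FREQ_SEC, FREQ_MIN, FREQ_HOUR, FREQ_DAY };

/* Seconds in one unit of each hypothetical frequency. */
static int64_t seconds_per_unit(int freq)
{
    switch (freq) {
    case FREQ_MIN:  return 60;
    case FREQ_HOUR: return 3600;
    case FREQ_DAY:  return 86400;
    default:        return 1;      /* FREQ_SEC */
    }
}

/* Floor division, so times before the epoch land in the earlier unit. */
static int64_t floor_div(int64_t a, int64_t b)
{
    int64_t q = a / b;
    if ((a % b != 0) && ((a < 0) != (b < 0)))
        q -= 1;
    return q;
}

/* Convert a count of from_freq units since the epoch into to_freq units.
   Going to a coarser unit rounds down to the start of that unit.
   Note: the multiplication can overflow for extremely large values. */
static int64_t convert_freq(int64_t time, int from_freq, int to_freq)
{
    int64_t secs = time * seconds_per_unit(from_freq);
    return floor_div(secs, seconds_per_unit(to_freq));
}
```

For example, convert_freq(1, FREQ_DAY, FREQ_HOUR) gives 24, and one second before the epoch, convert_freq(-1, FREQ_SEC, FREQ_DAY), gives day -1 rather than day 0.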
We need to deallocate datetime64 objects. Later, we tell Python to use this function for just that.
static void
datetime64_dealloc(PyObject* self)
{
PyObject_Del(self);
}
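We never call datetime64_dealloc ourselves. Python calls whatever function sits in the type's tp_dealloc slot once an object's reference count drops to zero. The toy plain-C model below shows that mechanism. ToyObject, ToyType, toy_incref and toy_decref are hypothetical stand-ins for PyObject, PyTypeObject, Py_INCREF and Py_DECREF, not CPython's real implementation.

```c
#include <stdlib.h>

struct ToyType;

/* Stand-in for PyObject_HEAD: a reference count plus a pointer to the type. */
typedef struct {
    long refcnt;
    struct ToyType *type;
} ToyObject;

/* Stand-in for PyTypeObject: only the deallocation slot is modeled. */
typedef struct ToyType {
    const char *name;
    void (*dealloc)(ToyObject *self);
} ToyType;

static int dealloc_calls = 0;   /* counts how often the dealloc slot ran */

static void toy_dealloc(ToyObject *self)
{
    dealloc_calls++;
    free(self);
}

static ToyType toy_type = { "toy.toy", toy_dealloc };

static ToyObject *toy_new(void)
{
    ToyObject *obj = malloc(sizeof *obj);
    if (obj != NULL) {
        obj->refcnt = 1;        /* a new object starts with one reference */
        obj->type = &toy_type;
    }
    return obj;
}

static void toy_incref(ToyObject *obj) { obj->refcnt++; }

/* Like Py_DECREF: at zero, dispatch through the type's dealloc slot. */
static void toy_decref(ToyObject *obj)
{
    if (--obj->refcnt == 0)
        obj->type->dealloc(obj);
}
```

This is also why the type table below stores datetime64_dealloc as a function pointer: the generic decref code only knows the slot, not our function's name.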
Here we give Python a bunch of information about the datetime64 Object Types. We tell it the size of the object, the name, what to run when it's deallocated, the documentation, and other (sometimes irrelevant and no longer used) information.
static PyTypeObject datetime64Type = {
PyObject_HEAD_INIT(NULL)
0, /*ob_size*/
"datetime64.datetime64", /*tp_name*/
sizeof(datetime64), /*tp_basicsize*/
0, /*tp_itemsize*/
datetime64_dealloc, /*tp_dealloc*/
0, /*tp_print*/
0, /*tp_getattr*/
0, /*tp_setattr*/
0, /*tp_compare*/
0, /*tp_repr*/
0, /*tp_as_number*/
0, /*tp_as_sequence*/
0, /*tp_as_mapping*/
0, /*tp_hash */
0, /*tp_call*/
0, /*tp_str*/
0, /*tp_getattro*/
0, /*tp_setattro*/
0, /*tp_as_buffer*/
Py_TPFLAGS_DEFAULT, /*tp_flags*/
"datetime64 objects", /* tp_doc */
};
The PyMethodDef array lists the functions Python can call. For each entry we give the name, the C function to call, the METH_VARARGS flag (meaning the function receives its arguments as a tuple), and a docstring. We could put (for example) "add" as an entry; this tells Python where to go when it's called. Strictly speaking, since this table is handed to Py_InitModule3 below, its entries become module-level functions (datetime64.add()), not methods on datetime64 objects; per-object methods would go in the type's tp_methods slot. The all-NULL entry at the end is a sentinel so Python knows when it has reached the end of the table.
static PyMethodDef datetime64_methods[] = {
    {NULL, NULL, 0, NULL}  /* sentinel: no functions yet */
};
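To see why that all-NULL entry matters: a method table carries no separate length, so code walking it stops at the first entry whose name is NULL. Here's a plain-C sketch of such a lookup. MethodEntry is a hypothetical stand-in for PyMethodDef, and this is not CPython's actual lookup code.

```c
#include <stddef.h>
#include <string.h>

typedef int (*method_fn)(int);

/* Hypothetical stand-in for PyMethodDef: a name, a function, and a docstring. */
typedef struct {
    const char *name;
    method_fn   func;
    const char *doc;
} MethodEntry;

static int twice(int x)  { return 2 * x; }
static int negate(int x) { return -x; }

static MethodEntry table[] = {
    {"twice",  twice,  "Doubles its argument"},
    {"negate", negate, "Negates its argument"},
    {NULL, NULL, NULL}                  /* sentinel: marks the end of the table */
};

/* Scan entries until the sentinel; return NULL if the name isn't found. */
static method_fn find_method(const MethodEntry *entries, const char *name)
{
    for (const MethodEntry *e = entries; e->name != NULL; e++) {
        if (strcmp(e->name, name) == 0)
            return e->func;
    }
    return NULL;
}

/* Count entries the same way, stopping at the sentinel. */
static size_t count_methods(const MethodEntry *entries)
{
    size_t n = 0;
    while (entries[n].name != NULL)
        n++;
    return n;
}
```

Without the sentinel, both loops would run off the end of the array.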
Here's the big, important part of the code. We use the PyMODINIT_FUNC macro to tell Python that this is our initialization function. Python runs it when the datetime64 module is first imported, which is why it has to be named init followed by the module name.
PyMODINIT_FUNC
initdatetime64(void)
{
Here we declare date_object, a PyObject pointer that will hold our new module.
    PyObject *date_object;
Since we're not doing anything fancy with our datetime64 objects yet, we set datetime64Type.tp_new to PyType_GenericNew, the stock constructor that simply allocates a new object of our type with its fields zeroed. Remember, the datetime64Type tells Python what kind of object a datetime64 object is. Without a tp_new, Python wouldn't let us create datetime64 objects from Python code at all.
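    datetime64Type.tp_new = PyType_GenericNew;
These lines finish initializing the datetime64 type with PyType_Ready (which fills in the slots we left at 0 with inherited defaults) and make sure it's a legit type. If that fails, we return early and the import fails with the error PyType_Ready set.
    if (PyType_Ready(&datetime64Type) < 0)
        return;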
This next line is possibly the most important. Py_InitModule3 creates a new module object from a name, a table of functions, and a docstring. We give it "datetime64" as the name and the datetime64_methods array as the table of functions the module provides. It returns NULL on failure, so we check for that.
    date_object = Py_InitModule3("datetime64", datetime64_methods, "DateTime64 module that creates a DateTime64 Object");
    if (date_object == NULL)
        return;
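Tell Python to increase the reference count for this type. PyModule_AddObject "steals" a reference to the object it adds, so without this the count on our statically allocated type would end up one too low.
    Py_INCREF(&datetime64Type);
Add the datetime64Type to the module's dictionary, so it's available as datetime64.datetime64.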
    PyModule_AddObject(date_object, "datetime64", (PyObject *)&datetime64Type);
}
There you have it. Let's run the build and install (install so I don't have to go looking for the .so file that setup.py creates) and see if it worked.
>>> import datetime64 as d
>>> day = d.datetime64()
>>> day
<datetime64.datetime64 object at 0x7f20d28350f0>
Looks like a Python Object to me! Since we didn't give Python any methods to run on it, and since the initialization of the object doesn't actually give the object any parameters to set, we can't do a whole lot with it... But hey! We can make them, right? I'll be defining Unit Tests in the next day or so and posting them here, so keep an eye out.
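That "<datetime64.datetime64 object at 0x...>" output comes from leaving tp_repr as 0: with no repr of its own, the type falls back on a generic one built from tp_name and the object's address. The toy plain-C model below imitates that fallback. ToyType, toy_repr and date_repr are hypothetical names, and CPython's own code differs in detail.

```c
#include <stdio.h>

struct ToyObject;

/* Hypothetical stand-in for the two PyTypeObject fields that matter here. */
typedef struct {
    const char *tp_name;
    /* Custom repr writer; NULL plays the role of a 0 in the tp_repr slot. */
    int (*tp_repr)(const struct ToyObject *self, char *buf, size_t len);
} ToyType;

typedef struct ToyObject {
    const ToyType *type;
} ToyObject;

/* Use the type's own repr if it has one, otherwise the generic form
   "<type name object at address>". Returns the snprintf result. */
static int toy_repr(const ToyObject *obj, char *buf, size_t len)
{
    if (obj->type->tp_repr != NULL)
        return obj->type->tp_repr(obj, buf, len);
    return snprintf(buf, len, "<%s object at %p>", obj->type->tp_name, (const void *)obj);
}

/* Example of a custom repr a later version of datetime64 might provide. */
static int date_repr(const ToyObject *self, char *buf, size_t len)
{
    (void)self;
    return snprintf(buf, len, "datetime64(...)");
}
```

Filling in tp_repr later would replace the address-based output with something readable.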
Next up for the day, timedelta64!
Wednesday, May 27, 2009
A Simple Python Module
Writing a Python Module in C is simple and quite effective. The following module computes the Fibonacci series up to n. The C version of this module is four times faster than the Python equivalent.
[ All code taken from http://superjared.com/entry/anatomy-python-c-module/ ]
#include <Python.h>
// Returns a list of the Fibonacci series up to n in args
static PyObject *fib(PyObject *self, PyObject *args)
{
    int a = 0, b = 1, c, n;
    // Parse args for a single integer
    if (!PyArg_ParseTuple(args, "i", &n))
        return NULL;
    // Create an empty Python list to store the numbers
    PyObject *list = PyList_New(0);
    if (list == NULL)
        return NULL;
    // Calculate the Fibonacci series
    while (b < n)
    {
        // Python 2 ints hold a C long, hence PyInt_FromLong
        PyObject *item = PyInt_FromLong(b);
        // PyList_Append adds its own reference, so release ours afterwards
        if (item == NULL || PyList_Append(list, item) < 0)
        {
            Py_XDECREF(item);
            Py_DECREF(list);
            return NULL;
        }
        Py_DECREF(item);
        c = a + b;
        a = b;
        b = c;
    }
    // returns a PyObject*
    return list;
}
// Method table: Python name, C function, calling convention, docstring
static PyMethodDef methods[] = {
    {"fib", fib, METH_VARARGS, "Returns a Fibonacci sequence as a list"},
    {NULL, NULL, 0, NULL}  // sentinel marking the end of the table
};
// Initialize the module
PyMODINIT_FUNC
initfib(void)
{
    (void) Py_InitModule("fib", methods);
}
This code was pretty easy and made sense, although there are a few arcane functions in here that I can't make sense of yet. The C function's return type is PyObject*. This is because:
All object types are extensions of PyObject. This is a type which contains the information Python needs to treat a pointer to an object as an object.
(http://docs.python.org/c-api/structures.html)
I'm going to need to look through the C API and get my stuff down straight before tomorrow, when I begin writing the c_datetime64.c and c_timedelta64.c files.
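That quote is what PyObject_HEAD is for: every object struct starts with the same header, so a pointer to any object can be treated as a pointer to that header (C guarantees a pointer to a struct can be converted to a pointer to its first member). Here's a toy plain-C model. ToyHead, ToyDate, ToyDelta and type_of are hypothetical stand-ins for PyObject, our object structs, and generic API code.

```c
#include <stdint.h>

/* Hypothetical common header, playing the role of PyObject_HEAD. */
typedef struct {
    long refcnt;
    const char *type_name;
} ToyHead;

/* Two "object types" that both begin with the header. */
typedef struct {
    ToyHead head;      /* must be the first member */
    int freq;
    int64_t time;
} ToyDate;

typedef struct {
    ToyHead head;      /* must be the first member */
    int64_t delta;
} ToyDelta;

/* Generic code only ever sees the header, like C API functions taking PyObject*. */
static const char *type_of(const ToyHead *obj)
{
    return obj->type_name;
}
```

So a function declared to return PyObject* can hand back a list, an int, or eventually a datetime64, and the caller reads the shared header to find out which.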
After creating the C file, you have to create a Python setup.py file which uses distutils to build (compile and link) the C module into a shared library that Python can import. The file for the Fibonacci example looks like this:
from distutils.core import setup, Extension
# Tells distutils that our module is located in fibonacci.c
setup(name = "Fib",
version = "1.0",
ext_modules = [Extension("fib", ["fibonacci.c"])])
This calls distutils' setup() function. I won't be researching distutils too much until later; for now, I understand that it can build Extensions from C files. After running python setup.py install, here's that Fibonacci code in action:
>>> import fib
>>> fib.fib(2131)
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597]
>>> fib.fib(10000)
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
Tuesday, May 26, 2009
Day 1
I've been having some awful graphics issues lately. Today, I decided to take a 10-minute break and update the drivers. That decision cost me about 6 hours while I pounded my fists in frustration trying to make these silly ATI proprietary drivers work. Finally, after a (should-have-been) obvious revelation, I discovered I was using Ubuntu 8.04 instead of 8.10. Good catch there, Marty. Anyway, problem solved, crisis averted. On to the research.
This is my first week working on a Summer of Code project, so I'll be logging a little more frequently here. I spent most of the day (when I wasn't fiddling with graphics settings) researching Gregorian versus POSIX date implementations. A Gregorian temporal datatype would store the number of days since January 1st, 1 AD, which is how the TimeSeries module works. The POSIX implementation (also known as Unix Time) would store the number of seconds since the epoch: January 1st, 1970. As of this time, my proposal uses the POSIX (Unix) implementation. My lingering concern is the range of dates. Which implementation would give me a larger, faster, more accurate, and more useful range of dates? More on this tomorrow, I guess.
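Some back-of-the-envelope arithmetic on the range question, as a sketch: assuming a signed 64-bit counter and an average Gregorian year of 365.2425 days, range_in_years (a hypothetical helper) computes how many years fit on each side of the epoch for a given tick size.

```c
#include <stdint.h>

#define SECONDS_PER_DAY 86400.0
#define DAYS_PER_YEAR   365.2425   /* average Gregorian year */

/* Years representable on each side of the epoch by a signed 64-bit counter,
   given how many seconds one tick represents. */
static double range_in_years(double seconds_per_tick)
{
    return (double)INT64_MAX * seconds_per_tick / (SECONDS_PER_DAY * DAYS_PER_YEAR);
}
```

With one-second ticks this comes to roughly 292 billion years either way, with microsecond ticks roughly 292,000 years, and with day ticks far more. So for a 64-bit value the tick size drives the range much more than the choice of epoch does.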
Unix Time does not count leap seconds.
A leap second is a positive or negative one-second adjustment to the Coordinated Universal Time (UTC) time scale that keeps it close to mean solar time.
-Wikipedia
So the UTC time scale was implemented to keep a very accurate time approximation. UTC days have 24 hours of 60 minutes each, and minutes usually have 60 seconds, but the last minute of a day with a leap second has 59 or 61 (Wikipedia). So most UTC days have 86,400 seconds, with the occasional 86,399 or 86,401. This will give pretty accurate approximations for times.
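Because Unix time ignores leap seconds, every Unix day is exactly 86,400 seconds, so splitting a timestamp into a day number and a time of day is plain integer division. A small sketch, with split_unix_time as a hypothetical helper; it uses floor division so that times before 1970 land on the correct day.

```c
#include <stdint.h>

/* Split seconds since 1970-01-01T00:00:00 into whole days and seconds into that day.
   Floor division: -1 becomes day -1 at 86399 seconds, not day 0 at -1. */
static void split_unix_time(int64_t t, int64_t *day, int64_t *sec_of_day)
{
    int64_t d = t / 86400;
    int64_t s = t % 86400;
    if (s < 0) {        /* C division truncates toward zero; shift negatives down a day */
        s += 86400;
        d -= 1;
    }
    *day = d;
    *sec_of_day = s;
}
```

For example, 1234567890 splits into day 14288 and 84690 seconds (23:31:30). The flip side is that a leap second such as 23:59:60 has no Unix timestamp of its own.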
Leap seconds are necessary because we don't measure time by the rotation of the earth, and the earth's rotation drifts relative to our clocks. Instead, the SI second is defined by the radiation a caesium-133 atom gives off when it switches between two energy states: 9,192,631,770 cycles make one second. This method has produced an incredibly accurate representation of time (down to 10⁻⁹ seconds per day), and it's a lot easier than trying to find some Archimedean Point to observe the rotation of the earth. It's a lot easier to put a tiny thing in a test tube than to put a planet in a test tube.
There have been lots of different designs for how to store and compute time intervals. Luckily for my proposal, this type of thing has been done lots of times before me. Rather than reinvent the wheel, I can rest on a sound foundation. Lots of databases depend on time storage. I'll have to be checking those out tonight or tomorrow. But my main focus here isn't storage; it's computation. I need to provide quick and easy computation with something as complex as a date or a time interval.
Also due this week is the base creation of both datetime64 and timedelta64. Just the basic creation. Next week I'll worry about compatibility with Python's existing datetime module. Hopefully, I can borrow some of the timeseries code.
Tuesday, April 21, 2009
I've Been Accepted!
I've been accepted into the Google Summer of Code for 2009! Out of nearly 6000 applicants, only around 1000 were chosen to complete projects for Open Source Software Communities.
I've been chosen to implement a Date / Time type in the NumPy Python module. After much consideration, I've decided to keep a blog for this project separate from my personal blog. I'll be spilling out my timelines, progress, interesting code snippets, and frustrations here from time to time.
In the coming weeks before the program, I'll be reading O'Reilly's Python in a Nutshell and reviewing (more) documentation for NumPy. I guess it's time to contact my mentor.