Thursday, June 11, 2009

A Sparse Documentation

A bit of a frustrating last couple of days. I've been all over the place with my research, which means I kept getting lost and confused. First I tried to figure out how to incorporate a scalar datatype into a NumPy dtype. Then I got lost trying to learn some more advanced Python C-API on object handling. I couldn't figure out quite how to make my datetime object play nice with frequencies. So, off I went venturing into the land of the NumPy C-API.

The documenation is very well written, but not exactly geared towards my project. I read, "The best way to truly understand the C-API is to read the source code." So I ran right to the source. The last few days have been a marathon of running over code and trying to understand how everything fits together.

Communication is key. Yeah, we say that for relationships and other insignificant things, but I'm talking about source code. The NumPy source code has a significant amount of C code, which can be overwhelming at times. I've decided the only way to understand what's going is to document it for myself. I've been literally running through each significant file and using pen and paper to pin down exactly what's going on. I'm interested in anything related to the Array Scalar Types, specifically the LongLong type. Both the datetime and timedelta types will very similar to the LongLong type. There are key differences, which I guess I should talk about. But I'll save that for tomorrow. More on the NumPy source code:

There are some fancy repetition techniques employed in the definitions of these scalar types. Commented out above each "generic" method is a comma separated list of each scalar type which. The "generic" methods have a variable (macro?) to replace the name and repeat for each scalar type in the comma separated values. I know how it works, but I don't know why.

There's been a very important update on my project. More on that later this week.

No comments:

Post a Comment