July 2008

A funny thing happened on the way to the forum. I fired up my virtual XP machine yesterday for the first time in a while and was prompted to upgrade to build 5608. OK. No problem. So I hit update, downloaded the 88 MB dmg package, and waited for it to upgrade. Nothing. Bupkis. Actually, Parallels just hung, badly. I had to force quit it and try to open the dmg package. No dice; the file was corrupt. I seemed to remember this happening the last time I went for an automatic update, so I manually downloaded 5608, opened the dmg package, and installed the update.

Then the fun began. Not only did Parallels hang when I tried to start the virtual machine, but I also got the grey screen of death! “YOU MUST REBOOT YOUR COMPUTER NOW!” Crap.

Well, looking up the problem in the “Knowledge Base,” I found that “Errors occur when you try to install or update Parallels Desktop, create or open virtual machines, load the required drivers to the guest OS.” The handy solution? Reboot. Repair disk permissions. Reinstall. Why? Because evidently “Working in Mac OS for long periods of time without restart may lead to some minor glitches to appear in the system as a whole.”

Yup, feels like XP!

Blogged with the Flock Browser

OK, another night, another trial. I must say, tonight was a lot more fun than the last couple of nights, because I felt that I learned something, which is the whole point of this exercise. The example I tried to code tonight is a simple EOF analysis of a 3D data series. This is something that I just had to code up at work today, so it was a perfect chance to try out Sage. For work I ended up altering an existing m-file and running the EOFs in Matlab, but that's OK, because now I know what to expect after running this in Sage.
The data names have been changed to protect the innocent.

# Load in required modules
sage: from scipy.io.netcdf import *
sage: from pylab import *
sage: from scipy.stats.stats import nanmean
sage: import datetime

#Load data from NetCDF file
sage: ncfile = netcdf_file('file.nc','r')
sage: varnames = ncfile.variables.keys()
sage: varnames


#Now that I have the order I can load into arrays
sage: lon = ncfile.variables[varnames[0]][:]
sage: lat = ncfile.variables[varnames[2]][:]
sage: dates = ncfile.variables[varnames[1]][:]
sage: raw = ncfile.variables[varnames[3]][:,0:50,:] #I only want 50 records in Y
sage: data = raw.copy() #make a copy
sage: data.shape
(124, 50, 151)
sage: (ncycles, ny, nx) = data.shape

#deal with dates
sage: ncfile.variables[varnames[1]].attributes

{'axis': 'TIME',
'time_origin': '15-JAN-1901 00:00:00',
'units': 'HOURS since 1901-01-15 00:00:00'}

sage: off = datetime.datetime(1901,1,15,0,0,0)
sage: months = ones(ncycles)

sage: for i in range(0,ncycles):
....tdel = datetime.timedelta(days=dates[i]/24)
....td = off + tdel
....months[i] = td.month

sage: ind = where(raw<0)
sage: data[ind] = nan

And here was the first real bottleneck, as things just slowed to a crawl as python tried to find all the instances where the data was less than zero. This is something that is instantaneous in Matlab, and took over 30 seconds to go through 124*50*151 values. There must be a faster way to do this.
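One candidate for that faster way, for the record: assign through a boolean mask instead of materializing the index arrays from where(). I haven't profiled this against the Sage session above (and the real array comes out of a NetCDF reader, which may be the true culprit), but on a plain in-memory array it's effectively instantaneous. A sketch with a synthetic stand-in:

```python
import numpy as np

# Synthetic stand-in for the real 124 x 50 x 151 satellite array
raw = np.random.randn(124, 50, 151)

data = raw.copy()
# Boolean-mask assignment: no index arrays are built; the comparison
# and the write both stay inside numpy's compiled loops.
data[raw < 0] = np.nan
```

This produces the same result as `ind = where(raw < 0); data[ind] = nan`, just without the intermediate index tuples.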

#Take out monthly averages
sage: mclim = ones((50,151))
sage: data2 = data.copy() #anomalies from the monthly climatology go here
sage: for i in range(1,13):
....index = where(months==i)[0]
....mclim = nanmean(data[index,:,:])
....data2[index,:,:] = data[index,:,:] - mclim

sage: data2.shape = (ncycles, nx*ny)
sage: ltmean = nanmean(data2) #get mean of each time series

#take out long term mean
sage: anom = data2.copy()
sage: for i in range(0,ncycles):
....anom[i,:] = data2[i,:] - ltmean

sage: EOF = nan_to_num(anom) #push land back to zero
sage: [u,s,v] = linalg.svd(EOF)
sage: s2 = zeros((ncycles, nx*ny)) #diagonal matrix of singular values
sage: for i in range(0,ncycles):#build array so that we can project eigenvalues back onto timeseries
....s2[i,i] = s[i]
sage: amp = dot(s2.transpose(),u.transpose()) #get amplitude
sage: spatial = v[0:4,:]# pull out spatial fields
sage: ratios = pow(s,2)/sum(pow(s,2))*100 #get %variance explained for each mode
sage: temp = spatial[0,:]
sage: temp.shape = (ny,nx) #push back to original dims
sage: plot(amp)
sage: savefig('amplitude.png')
sage: imshow(flipud(temp))
sage: savefig('spatial.png')
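For my own reference, the whole pipeline above collapses into a few lines of plain numpy if you use the economy-size SVD (full_matrices=False), which removes the need to build the s2 matrix by hand. This is a sketch on a small synthetic anomaly array, not the code I actually ran in Sage:

```python
import numpy as np

# Small synthetic anomaly field: 24 time steps on a 5 x 7 grid
ncycles, ny, nx = 24, 5, 7
rng = np.random.RandomState(0)
anom = rng.randn(ncycles, ny * nx)
anom -= anom.mean(axis=0)                 # remove the long-term mean per pixel

# Economy-size SVD: u is (ncycles, k), s is (k,), v is (k, ny*nx)
u, s, v = np.linalg.svd(anom, full_matrices=False)

amp = u * s                               # temporal amplitudes (columns = modes)
spatial = v[0:4, :].reshape(4, ny, nx)    # leading spatial patterns
ratios = s**2 / (s**2).sum() * 100        # % variance explained per mode
```

Since amp is just U scaled by the singular values, np.dot(amp, v) reconstructs the anomaly field exactly, which is a handy sanity check.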


I actually felt really positive about this whole example, as I learned a lot more. This was probably too large an array to test with (measure twice, cut once!), but it's what I was working with, and I wanted a real-world example. The more I worked in Sage, the more comfortable I felt. The geographic projection issue is still there, as well as some indexing speed issues, but overall I was really impressed with the Sage/SciPy/NumPy experience today. I feel that more of a transition was made for me last night/today, which was great timing, as a co-worker actually called me and asked if I knew of any free replacements for Matlab…


This was quite possibly the worst idea for title naming that I could have thought of. Anyway, I played around a bit more tonight, and I thought that I would give an update to the three people that are waiting with bated breath.

Anywho, I decided to continue trying to map the data from the netcdf file onto a projection, and here’s what I ran into.

It looks like the basemap module is installed (as basemap), but it depends on matplotlib 0.98 or newer, and 0.91 is what's installed. I tried to be tricky and move my locally installed matplotlib over to the sage/local/lib/python2.5/site-packages directory, but then that version of matplotlib needed a newer version of numpy than what was installed. At this point I tried

hostname $> sage -upgrade

to see if updated packages/modules were available. This started a huge chain reaction of downloads and source compiling to get to the latest, greatest versions. This process took exactly 59m10.482s to complete (I know because it told me!).

But once again, I get this error:

sage: from basemap import basemap

ImportError: your matplotlib is too old - basemap requires version 0.98 or higher, you have version 0.91.1

At this point though, it’s not working on either the linux or OSX platforms due to outdated dependencies, so either I need to find another way to plot mapped projections or use something else.

Again, this isn't a knock against Sage, because I really don't think this is an ideal test for the software. But honestly, a lot of why I went for this approach was to avoid having to use separate tools for data manipulation and visualization, and this would be a common task. Matlab's mapping toolbox is useless to me for plotting, so I end up using m_map, which is still not as good as GMT, but it gets the job done in house.

My main thoughts at this point are that it seems easy to get into dependency hell here, as one module upgrade can force another, and so on. At this point it’s another block of time spent on setup, and no result. Time to stop for the time being.


Part 1 of the sage experience was just installing the software. This was incredibly easy on both OSX and linux (CentOS 5.2 and Fedora 9). For the Fedora 9 install I just downloaded the latest version of Sage which was compiled for Fedora 8, and this seemed to be just fine.

So for me, I really just wanted to be able to do a few different examples which would be close to “real world applications” for me.

Some things that I would like to be able to do in sage:

1. Load in a 2-D NetCDF satellite data file and display it as a map projection. This should be really simple. I would usually just use GMT for this (a small shell script wrapping psbasemap, grdimage, and pscoast).

2. Load in a data series with dates and locations, and match this to corresponding satellite data in time and space. Normally I would use a perl script that I wrote many moons ago to do this. I would basically sort the data, then match a block of data at a time using GMT’s grdtrack function. I know that this is inefficient, and really I would like to be able to pull extra data in x,y, or t and take the mean or median value, which would be more CPU intensive, but better than matching just one point in space and time to the nearest pixel.

3. Load in a multivariate data series and do multivariate statistics (e.g. LME, GLM/GAM, RDA). This is where the R interface would come into play. Normally I would prepare the data elsewhere, then import the flat table into R and use the R functions. This may involve installing more packages (nlme, mgcv, etc).

4. Load in a 3-D set (x,y,t) of satellite data files and perform an EOF analysis on them (akin to SVD in Matlab). Normally I would do this in Matlab or Ferret. I’m just curious how easy it would be to do this here.
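Of these, #2 is the one I can sketch most concretely right now. A minimal nearest-pixel matchup in numpy might look like the following; the grids and observation positions are entirely made up, and a real version would pull a block of surrounding pixels and take a mean or median, as noted above:

```python
import numpy as np

# Hypothetical regular satellite grid (one time slice) and ship observations
lon = np.linspace(-180, 180, 361)          # 1-degree grid longitudes
lat = np.linspace(-60, 60, 121)            # 1-degree grid latitudes
sst = np.random.rand(lat.size, lon.size)   # fake SST field

obs_lon = np.array([-150.2, 12.7, 44.0])   # made-up observation positions
obs_lat = np.array([10.1, -33.3, 0.4])

# Nearest-pixel match: index of the closest grid coordinate along each axis
ix = np.abs(lon[None, :] - obs_lon[:, None]).argmin(axis=1)
iy = np.abs(lat[None, :] - obs_lat[:, None]).argmin(axis=1)
matched = sst[iy, ix]                      # one satellite value per observation
```

This is the same nearest-pixel logic as GMT's grdtrack, just done with broadcasting instead of a shell loop.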

There are other things that I could do, but these are a few off the top of my head, and things that I am doing now, so it would be incentive to try Sage out with. For tonight, I’ll just work on #1, which should be really fast.

The data file I’m using is just a NetCDF file (created by GMT) which I can read with pupynere in python. Here I’m going to use the scipy.io.netcdf module (which is actually based on pupynere I believe).

sage: from scipy.io.netcdf import *
sage: from pylab import *

# Read in file metadata to object
sage: ncfile = netcdf_file('RS2006001_2006031_sst.grd','r')

# get the variables in the data file
sage: ncfile.variables

{'x': <scipy.io.netcdf.netcdf_variable object at 0xb47b08c>,
'y': <scipy.io.netcdf.netcdf_variable object at 0xb47b16c>,
'z': <scipy.io.netcdf.netcdf_variable object at 0xb47b1ec>}

# Yank out data
sage: longitude = ncfile.variables['x'][:]
sage: latitude = ncfile.variables['y'][:]
sage: sst = ncfile.variables['z'][:]

# just show sst to test 2D image plotting
sage: imshow(sst)
[<matplotlib.AxesImage instance at 0xc03636c>]

Nice, but it’s upside down. Let’s flip it vertically.

sage: clf()
sage: imshow(flipud(sst))
[<matplotlib.AxesImage instance at 0xb86a2ac>]
sage: savefig('temp.png')


Easy, but I want to put this on a projection. Normally I would use the basemap tools, which are an add-on to matplotlib. I don't see them installed, and I didn't see them in the extra Sage packages online, so I downloaded them from SourceForge and installed them.

The first step is to install the geos package; just read the README in the geos folder and run the build commands it gives…

and then we get our first epic fail. Something in the geos chain won't compile, and I'm just about fried enough to call it quits for this evening.

At this point I've been playing with this for more than 2 hours, and I have yet to make a simple map on a projection. There has to be something I'm missing, but I'm going to pause until tomorrow. So not the best testing evening, but there are some positives so far. The bundling of most packages is a plus, and the ease of loading NetCDF files is nice. Data displays well using the pylab interface, even though I am still forced to save to a file at this point.

So immediate goals:

1. Get a backend working for viewing plots in widgets (akin to ipython -pylab)

2. Get the basemap tools installed so that I can make a map with a projection!


In an earlier post I alluded to some thoughts that I had on proprietary vs. free open source systems for dealing with scientific data. Somewhat out of time constraints and somewhat out of laziness, I had decided to pretty much just stick with the proprietary status quo system that I had in place, rather than commit the time to learning something new. The double shot of my activation woes with Matlab and my recent realization of how much I had been reinventing the wheel by not using Ferret has given me a new perspective, and after a recent comment inviting me to try out the Sage open source mathematical software, I figured that it was time to give it a try.

As Sage is “a free mathematics software system which combines the power of many existing open-source packages into a common Python-based interface”, many libraries come bundled together, including ipython, numpy, scipy, R, RPy, and more. While I already had many of these installed on my Macbook, rather than try to cobble things together to save space, I just took the easy route and downloaded the .dmg file to keep everything in one environment. I'm not sure if it's even possible (or wise) to reuse existing components, but I figured it would be more hassle than it was worth.

The .dmg file is fairly large (~256 MB), but smaller than the last Matlab release, which puts things in perspective. The download also only took about 6 minutes over wireless (>500 KB/sec). Installation was as simple as dragging the sage folder from the mounted .dmg file into my Applications folder. Expanded, the folder was pretty hefty (~780 MB), larger than the last Matlab release (~680 MB), but not by much.

The next step in the installation instructed me to double-click the sage icon (inside the Sage folder) and then change some preferences. When I double-clicked it, however, a terminal fired up and Sage started to work:

[user@localbox ~]$ /Applications/sage/sage ; exit;
| SAGE Version 3.0.2, Release Date: 2008-05-24 |
| Type notebook() for the GUI, and license() for information. |
The SAGE install tree may have moved.
Regenerating Python.pyo and .pyc files that hardcode the install PATH (please wait at most a few minutes)…
Please do not interrupt this.

Setting permissions of DOT_SAGE directory so only you can read and write it.

After this step I was presented with the sage prompt. Typing in

sage: notebook()

prompted me to create an admin password, and then opened the sage notebook GUI in my default web browser. I closed this down and then went back to the command prompt to start to play with the Tutorial.

I'm going to stop here for now, because it's late, but I'll post more after playing around with Sage a bit. I've been through some of the tutorial, and plotting works, yet on the surface there seems to be no way to view matplotlib plots “on the fly”; instead you have to save and view with an external viewer. Just an extra step, but I was used to having graphics on the fly with ipython -pylab. As I said though, I just wanted to talk about installation. Once I give it a fair shake I'll post what I think so far.


In my travels one of the things that I end up doing is making seasonal climatologies out of 3-dimensional data sets (longitude, latitude, time). For example, I may have a series of monthly sea surface temperature images for the North Pacific from January 1950 through December 2007, and I want the spatial averages for each season. In other words, for each spatial point I want to average all the points in time for all January-March months (season 1), April-June months (season 2), etc.

What I would usually do is something kludgy: either write a shell script to cycle through all the months I wanted to collapse, dump the data from the GMT NetCDF format into binary files, run blockmean, and regrid back into GMT's NetCDF format; or, if the data was reasonable in size, just load it directly into Matlab, index the months I wanted, subset the array, and do a nanmean(data,3).
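For comparison, the Matlab half of that kludge translates into numpy along these lines (synthetic data and quarterly season boundaries are my assumptions here, not code from my actual scripts):

```python
import numpy as np

# Synthetic monthly series: 4 years x 12 months on a 3 x 4 grid
nmonths, ny, nx = 48, 3, 4
data = np.random.rand(nmonths, ny, nx)
months = np.tile(np.arange(1, 13), nmonths // 12)   # 1..12, repeating

# Quarterly seasons: (1,2,3), (4,5,6), (7,8,9), (10,11,12)
clim = np.empty((4, ny, nx))
for q in range(4):
    in_season = np.isin(months, [3 * q + 1, 3 * q + 2, 3 * q + 3])
    clim[q] = np.nanmean(data[in_season], axis=0)   # nan-aware, like nanmean(data,3)
```

Workable, but it's still bookkeeping I have to write myself every time, which is exactly what Ferret's built-in climatological axes avoid.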

Neither one of these methods is the easiest or most efficient way to do things, this I know. So with my newfound friend Ferret I figured that there must be a really simple, built-in way to make the climatologies from this XYT data, and there was.

Basically Ferret allows for a Modulo regridding to a built-in climatological time axis. What would have taken quite a bit of time in my previous attempts took me 10 seconds in Ferret.

! load example data set
SET DATA levitus_climatology
USE climatological_axes
LET sst_clim = SST[d=1,X=100:260,Y=10:60,GT=seasonal_reg@MOD]
CANCEL DATA climatological_axes
SHADE sst_clim[L=1] ! shade Quarter 1

This now provides an L=4 data set of seasonal climatologies centered in time on [15-Feb, 16-May, 16-Aug, 15-Nov].

So, while 10 years ago half the fun may have been trying to be clever and coding a faster solution than the last one I had made, in my old age I tend to just want to get this portion done so that I can interpret the results. This is very nice indeed.


I really do like the Dock on OSX. My main problem however is that I tend to crowd a million things onto it. I do also use Quicksilver as a launcher, and that alleviates having to put everything in the Dock, but it would be nice to have a dock for different working states (e.g. internet, work, utilities, etc).

From this LifeHacker post I was turned on to Dock Spaces, which is a simple application that allows you to have up to 5 different docks which you can switch with a hot key combination. An added bonus is that in the preferences you can make these 2D or 3D docks, which addressed a mild annoyance that I have with the 3D dock (and was too lazy to change).

Basically I downloaded the app, fired it up, and set up four docks: one “default” dock (the dock I already had), an “internet” dock with, well, internet apps, a “science” dock with most of the scientific programs that I deal with on the laptop, and a “utilities” dock with the common utilities that I use. I also made the docks 2D, which I think is a large improvement over the 3D one I had been staring at in Leopard.


The default key combination to switch Docks is the control key + a number key, which unfortunately was the same key combination used to switch my spaces. I didn’t see a way to change the key combo in Dock Spaces, so I switched the key combination in the Spaces settings to use the Command key + a number key to switch between spaces. Now ^0 brings up my default dock, ^1 my internet dock, etc.


At this point it's more of a luxury application for me, but the simplicity of switching between 2D and 3D bundled with the ability to have multiple docks was worth it to me, and it's a neat idea. It's currently freeware with, of course, donations accepted.
