OK, another night, another trial. I must say, tonight was a lot more fun than the last couple of nights, because I really felt that I learned something, which is really the whole point of this exercise. So the example I was trying to code tonight is a simple EOF of a 3D data series. This is something that I just had to code up at work today, so it was a perfect chance for me to try out Sage. For work I ended up altering an existing m-file and running the EOFs in Matlab, but that’s OK, because now I know what I expect to see after running this in Sage.
The data names have been changed to protect the innocent.
# Load in required modules
sage: from scipy.io.netcdf import *
sage: from pylab import *
sage: from scipy.stats.stats import nanmean
sage: import datetime
#Load data from NetCDF file
sage: ncfile = netcdf_file('file.nc','r')
sage: varnames = ncfile.variables.keys()
sage: varnames
['LONGITUDE', 'TIME', 'LATITUDE', 'DATA']
#Now that I have the order I can load into arrays
sage: lon = ncfile.variables[varnames[0]][:]
sage: lat = ncfile.variables[varnames[2]][:]
sage: dates = ncfile.variables[varnames[1]][:]
sage: raw = ncfile.variables[varnames[3]][:,0:50,:] #I only want 50 records in Y
sage: data = raw.copy() #make a copy
sage: data.shape
(124, 50, 151)
sage: (ncycles, ny, nx) = data.shape
#deal with dates
sage: ncfile.variables[varnames[1]].attributes
{'axis': 'TIME',
'time_origin': '15-JAN-1901 00:00:00',
'units': 'HOURS since 1901-01-15 00:00:00'}
sage: off = datetime.datetime(1901,1,15,0,0,0)
sage: months = ones(ncycles)
sage: for i in range(0,ncycles):
....tdel = datetime.timedelta(days=dates[i]/24)
....td = off + tdel
....months[i] = td.month
sage: ind = where(raw<0)
sage: data[ind] = nan
And here was the first real bottleneck, as things just slowed to a crawl as python tried to find all the instances where the data was less than zero. This is something that is instantaneous in Matlab, and took over 30 seconds to go through 124*50*151 values. There must be a faster way to do this.
data2=data.copy()
#Take out monthly averages
sage: mclim = ones((50,151))
sage: for i in range(1,13):
....index = where(months==i)[0]
....mclim = nanmean(data[index,:,:])
....data2[index,:,:] = data[index,:,:] - mclim
data2.shape = (ncycles, nx*ny)
ltmean = nanmean(data2) #get mean of each time series
#take out long term mean
sage: anom = data2.copy()
sage: for i in range(0,ncycles):
....anom[i,:] = data2[i,:] - ltmean
sage: EOF = nan_to_num(anom) #push land back to zero
sage: [u,s,v] = linalg.svd(EOF)
sage: for i in range(0,ncycles):#build array so that we can project eigenvalues back onto timeseries
....s2[i,i] = s[i]
sage: amp = dot(s2.transpose(),u.transpose()) #get amplitude
sage: spatial = v[0:4,:]# pull out spatial fields
sage: ratios = pow(s,2)/sum(pow(s,2))*100 #get %variance explained for each mode
sage: temp = spatial[0,:]
sage: temp.shape = (ny,nx) #push back to original dims
sage: plot(amp)
sage: savefig('amplitude.png')
sage: imshow(flipud(temp))
sage: savefig('spatial.png')
Success!
I actually really felt positive about this whole example as I really learned a lot more. This also was probably too large of an array to test out (measure twice cut once!) but it’s what I was working with so I wanted a real world example. The more that I worked in sage the more comfortable I felt as well. The geographic projection issue is still there, as well as some indexing speed issues, but overall, I was really impressed with the Sage/SciPy/NumPy experience today. Overall I feel that more of a transition was made for me last night/today. Which was great timing as a co-worker actually called me and asked if I knew of any free replacements for Matlab…
Technorati Tags:
Computers, linux, matlab, programming, python, sage, scientific programming