R


Just an update on the Matlab situation. I recently did renew my maintenance fees, but dropped support for the optimization, mapping, and GARCH toolboxes as I never used them. The only thing I ever used the mapping toolbox for was calculating distances, and that got busted at some point. The free m_map software is a far superior toolbox, and did I mention it was free?

Part of the decision was that all of the analysis for the last chapter of my dissertation was done in R. Some data mining was done with Python, oceanographic transects/profiles in Ocean Data View, and maps in GMT, but all analysis was done in R. This was a revelation to me, mostly because I no longer had to write an m-script full of loops. I know that some of this is due to my bad habits in Matlab (writing loops instead of using the vector operations), but there are so many convenience functions in R that just do what I want, and I think this reflects a shift toward a more statistically demanding analysis style that Matlab really just isn’t built for.
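As a trivial example of the kind of thing I mean (toy data, made up just for illustration): a group summary that would have been a loop-and-accumulate affair in my old Matlab style is a one-liner in R.

# Group means without writing a single loop (fake data for illustration)
df <- data.frame(station = rep(c("A", "B", "C"), each = 10),
                 temp    = rnorm(30, mean = 15))
aggregate(temp ~ station, data = df, FUN = mean)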

This isn’t to say that one is better than the other, but with the cost and difficulty in getting Matlab up and running on new machines, R is definitely more than a viable option, and a joy to work with.


For a while I’ve been looking for a way to add error bars in R. It’s actually not that trivial in some cases; I think I even wrote my own m-script to do it for bar plots back in Matlab.

At any rate, Google is my friend, and I found a really good post detailing how to do this. In a nutshell, just grab the gplots package and look at the plotCI function. Works great. The only thing I had to tweak was the uiw argument, which is the half-width of the error bar measured from the mean value (i.e., the S.E. itself, not mean + S.E.).
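For the record, here’s a minimal sketch of the kind of call I mean (the data are made up for illustration):

library(gplots)

# Made-up means and standard errors for five stations
x <- 1:5
means <- c(2.1, 3.4, 2.8, 4.0, 3.1)
se <- c(0.2, 0.3, 0.25, 0.4, 0.3)

# uiw is the half-width of the bar: the S.E. itself, not mean + S.E.
plotCI(x, means, uiw = se, xlab = "Station", ylab = "Mean value")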

Nice.

At this point just a small update to say that since the activation fiasco, I have not used Matlab at all. Everything that I have done for the last four presentations has been done with R and Python, and I am both the happier and wiser for it.

I have been playing with JMP a bit, but honestly, it’s a bit too “high level” for me. While it’s neat for data exploration, it drove me nuts that two weeks after I bought a license, they were *offering* me a special deal on the impending upgrade. Nothing like dropping a bundle on instantly outdated software. Way to go, guys.

So for me, it’s been a pleasure to use R, Python (with IPython, of course), and Ferret on my Mac Pro. If I ever take a break from playing on the computer, I’ll post up what I installed on the new Mac in terms of scientific software.

This was quite possibly the worst idea for title naming that I could have thought of. Anyway, I played around a bit more tonight, and I thought that I would give an update to the three people that are waiting with bated breath.

Anywho, I decided to continue trying to map the data from the NetCDF file onto a projection, and here’s what I ran into.

It looks like the basemap module is installed (as basemap), but it depends on matplotlib ≥ 0.98, and 0.91 is what’s installed. I tried to be tricky and move my locally installed matplotlib over to the sage/local/lib/python2.5/site-packages directory, but then that version of matplotlib needed a newer version of numpy than what was installed. At this point I tried

hostname $> sage -upgrade

to see if updated packages/modules were available. This started a huge chain reaction of downloads and source compiling to get to the latest, greatest versions. This process took exactly 59m10.482s to complete (I know because it told me!).

But once again, I get this error:

sage: from basemap import basemap

ImportError: your matplotlib is too old - basemap requires version 0.98 or higher, you have version 0.91.1

At this point, though, it’s not working on either the Linux or OS X platforms due to outdated dependencies, so either I need to find another way to plot mapped projections or use something else.

Again, this isn’t a knock against Sage, because I really don’t think this is an ideal test for the software. But honestly, a lot of why I went for this approach was to avoid having to use separate tools for data manipulation and visualization, and this would be a common task. Matlab’s mapping toolbox is useless to me for plotting, so I end up using m_map, which is still not as good as GMT, but it gets the job done in-house.

My main thought at this point is that it seems easy to get into dependency hell here, as one module upgrade can force another, and so on. It’s another block of time spent on setup with no result, so time to stop for now.


Part 1 of the Sage experience was just installing the software. This was incredibly easy on both OS X and Linux (CentOS 5.2 and Fedora 9). For the Fedora 9 install I just downloaded the latest version of Sage, which was compiled for Fedora 8, and this seemed to work just fine.

So I really just wanted to be able to run a few different examples that would be close to “real world applications” for me.

Some things that I would like to be able to do in Sage:

1. Load in a 2-D NetCDF satellite data file and display it as a map projection. This should be really simple. I would usually just use GMT for this (a small shell script wrapping psbasemap, grdimage, and pscoast).

2. Load in a data series with dates and locations, and match this to corresponding satellite data in time and space. Normally I would use a Perl script that I wrote many moons ago to do this: I would basically sort the data, then match a block of data at a time using GMT’s grdtrack function. I know that this is inefficient; really, I would like to be able to pull extra data in x, y, or t and take the mean or median value, which would be more CPU intensive but better than matching just one point in space and time to the nearest pixel. (A rough sketch of this matching idea follows the list.)

3. Load in a multivariate data series and do multivariate statistics (e.g. LME, GLM/GAM, RDA). This is where the R interface would come into play. Normally I would prepare the data elsewhere, then import the flat table into R and use the R functions. This may involve installing more packages (nlme, mgcv, etc.).

4. Load in a 3-D set (x,y,t) of satellite data files and perform an EOF analysis on them (akin to SVD in Matlab). Normally I would do this in Matlab or Ferret. I’m just curious how easy it would be to do this here. (A sketch of the SVD recipe also follows the list.)
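Since #2 is really just a nearest-neighbor lookup at heart, here’s a rough R sketch of the idea. All object names here are hypothetical, times are assumed to be plain numeric day indices, and a real version would need edge handling; it’s just to show how little code the core of it is:

# Match each observation (lon, lat, date) to the nearest satellite pixel.
# obs is a data frame; sat_lon, sat_lat, sat_time are the grid axes and
# sat_data is an array indexed [lon, lat, time]. All hypothetical names.
match_nearest <- function(obs, sat_lon, sat_lat, sat_time, sat_data) {
  sapply(seq_len(nrow(obs)), function(i) {
    ix <- which.min(abs(sat_lon - obs$lon[i]))
    iy <- which.min(abs(sat_lat - obs$lat[i]))
    it <- which.min(abs(sat_time - obs$date[i]))
    sat_data[ix, iy, it]
  })
}

Pulling a small window around (ix, iy) and taking mean(..., na.rm = TRUE) instead of the single pixel would be the obvious next step.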
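And since #4 is just linear algebra under the hood, the SVD recipe is short regardless of language. In R it looks something like this (fake random data, hypothetical names):

# EOF analysis via SVD: unroll the (x, y, t) array into a space-by-time
# matrix, remove the temporal mean at each pixel, and decompose.
nx <- 60; ny <- 40; nt <- 24
sst <- array(rnorm(nx * ny * nt), dim = c(nx, ny, nt))  # fake data

X <- matrix(sst, nrow = nx * ny, ncol = nt)  # space x time
X <- X - rowMeans(X)                         # demean each pixel's time series
s <- svd(X)

eof1 <- matrix(s$u[, 1], nx, ny)   # leading spatial pattern
pc1  <- s$d[1] * s$v[, 1]          # its amplitude time series
varf <- s$d^2 / sum(s$d^2)         # fraction of variance per mode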

There are other things that I could do, but these are a few off the top of my head, and things that I am doing now, so there’s real incentive to try Sage out on them. For tonight, I’ll just work on #1, which should be really fast.

The data file I’m using is just a NetCDF file (created by GMT) which I can read with pupynere in Python. Here I’m going to use the scipy.io.netcdf module (which I believe is actually based on pupynere).

sage: from scipy.io.netcdf import *
sage: from pylab import *

# Read in file metadata to object
sage: ncfile = netcdf_file('RS2006001_2006031_sst.grd', 'r')

# get the variables in the data file
sage: ncfile.variables

{'x': <scipy.io.netcdf.netcdf_variable object at 0xb47b08c>,
'y': <scipy.io.netcdf.netcdf_variable object at 0xb47b16c>,
'z': <scipy.io.netcdf.netcdf_variable object at 0xb47b1ec>}

# Yank out data
sage: longitude = ncfile.variables['x'][:]
sage: latitude = ncfile.variables['y'][:]
sage: sst = ncfile.variables['z'][:]

# just plot sst to test 2D image plotting
sage: imshow(sst)
[<matplotlib.AxesImage instance at 0xc03636c>]

Nice, but it’s upside down. Let’s flip it vertically.


sage: clf()
sage: imshow(flipud(sst))
[<matplotlib.AxesImage instance at 0xb86a2ac>]
sage: savefig('temp.png')

[Figure: the flipped SST image, saved as temp.png]

Easy, but I want to put this on a projection. Normally I would use the basemap tools, which are an add-on to matplotlib. I don’t see these installed, and I didn’t see them in the extra Sage packages online, so I downloaded them from SourceForge and installed them.

The first step is to install the geos package; just read the README in the geos folder and run

./configure
make

and then we get our first epic fail. Something in the geos chain won’t compile, and I’m just about fried enough to call it quits for this evening.

At this point I’ve been playing with this for more than two hours, and I have yet to make a simple map on a projection. There has to be something I’m missing, but I’m going to pause until tomorrow. So, not the best testing evening, but there are some positives so far. The bundling of most packages is a plus, and the ease of loading NetCDF files is nice. Data displays well using the Pylab interface, even though I’m still forced to save plots to a file at this point.

So immediate goals:

1. Get a backend working for viewing plots in widgets (akin to ipython -pylab)

2. Get the basemap tools installed so that I can make a map with a projection!


I’ve been a Matlab user for 15 years, and over that time period I’ve of course become fairly dependent on it to get things done quickly. The downside? It’s expensive. It’s a pretty penny to buy the base package, toolboxes are extra, and there are recurrent “maintenance” costs each year to get upgrades.

Sure, that’s standard practice, but each year I have to stand up and justify to my boss why we need to pay these costs for our multiple Matlab users in our shop (a multi-user concurrent license is out of the question, don’t even ask). So what’s a user to do?

For years we’ve just bitten the bullet and paid the fee, but with options such as R and Numpy/SciPy out there, it may be time to loosen the chain a bit. Or maybe not.

A couple of possible alternatives to Matlab and their respective pros and cons:

R

R is a really nice statistical environment which has pretty much become the industry standard, replacing the very expensive S-Plus. It’s easy to install, has an excellent GUI on OS X, and has a ton of community-released packages, which are usually made during the preparation of scientific papers. There are some downsides, as there can be multiple (sometimes conflicting) packages for the same task (e.g. gam vs mgcv), but choice is good, right? The cons for me are that it’s a new language to learn, and even though I write an m-script for everything, I find the scripting in R a bit clunky, even writing in TextWrangler and hitting CTRL-R to have the SendToR script source the code for me. It’s just something new, and while the built-in functions are really nice, the learning curve for coding things is higher. Will it be faster in the long run than just using Matlab?

Numpy/SciPy

The Numpy/SciPy combo in Python is a viable alternative to Matlab, even having a page dedicated to showing you how easy it is to transition from Matlab. As with R, it’s free, and there are a ton of functions available, but there is a downside for me. I’ve successfully installed it on CentOS 5.1 and OS X 10.5, but it was a bit complicated. I know these are packaged in many distributions, but not in CentOS, and I had to install from either source or .egg files, which isn’t all that tough but took some time. I’m not listing the 24.3 steps I took to get it installed because, honestly, I didn’t write them down and I don’t remember what I did. Next time I promise to list it out! On OS X I did it all through MacPorts with the MacPorts version of Python 2.5. Again, it took some massaging to get it all set up since I was using a non-default install of Python.

Overall, though, the reason for this little diatribe is that while there are alternatives to Matlab, they all involve learning new ways to do things, which, even after I successfully learn them, may not be faster than just doing it in Matlab. Most of the time I just need to get things done, and the $7/day cost of Matlab may be well worth it if it saves me more than 10 minutes that day (assuming for a minute that I earn $42/hour).

I’m rambling a bit here, but these are just questions that I ask myself as I code things up at the desk. Each of these tools has its place, and in terms of maximum comfort and speed, I use each of them for its strengths. The main dilemma is that in a perfect situation I would drop the commercial Matlab for the free/open source alternatives, at minimal cost in both dollars and time.


As I alluded to in a previous post, the main reason for getting WinBUGS to run on the Mac was to be able to run WinBUGS through R using the R2WinBUGS R package.

Once I got DarWine up and running, it was really only a matter of setting some variables in the R2WinBUGS call for my model.

While it’s written in Japanese, this guide had enough information to get me started down the path (no pun intended).

One of the main things you have to do is define WINEPATH and WINE in the call; these are, of course, buried a bit in OS X:

bug.out <- bugs(...,
useWINE = TRUE,
WINEPATH = "/Applications/Darwine/Wine.bundle/Contents/bin/winepath",
WINE = "/Applications/Darwine/Wine.bundle/Contents/bin/wine", ...)
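For context, a full call looks something like this. Everything besides the WINE bits is standard R2WinBUGS boilerplate, and the data, inits, parameter names, and model file here are placeholders, not my actual model:

library(R2WinBUGS)

# Placeholder data, inits, parameters, and model file -- swap in your own
bug.out <- bugs(data = my.data, inits = my.inits,
                parameters.to.save = c("alpha", "beta"),
                model.file = "model.txt",
                n.chains = 3, n.iter = 10000,
                useWINE = TRUE,
                WINEPATH = "/Applications/Darwine/Wine.bundle/Contents/bin/winepath",
                WINE = "/Applications/Darwine/Wine.bundle/Contents/bin/wine")
print(bug.out)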

It works, but as of now it ain’t pretty. I’m also not seeing faster model runs than when I use a virtual machine in Parallels, so I may need to test this further.

