Wednesday, May 7, 2008

Ternary plotting in Python, take 2

moved to here

It's generally a pain to make ternary plots with computers. There are a bunch of options but I'm not a fan of any of them. Lately I have taken up Python as a general all-purpose procrastinator, and over the last two days I wrote a little script to generate ternary plots from comma-delimited data, using matplotlib. It's pretty simple to do with the Cartesian transformation described on Wikipedia. Below is an example plot of a granulite mineral assemblage using the numbers from a prac I did last year. Here is a sample data file:

Mineral name, FeO, MgO, Al2O3
Spr, 0.15, 0.3125, 0.5375, k,
Opx, 0.222, 0.69, 0.0864, r.
, 0.041, 0.46, 0.5, go
Spl, 0.35, 0.14, 0.51, b^
Sil, 0 , 0, 1, ys

The numbers in each row should obviously sum to 1 (negative co-ordinates work fine, I think), but at the moment the script only uses the last two in each row to calculate the plot. The first item in each row is an optional label plotted beside the point. The last item in each row specifies the style of the point. Briefly, the first letter is the colour (k is black, r red, g green etc) and the second character is the style of plot (, is a pixel, . point, o circle etc). See this link for the whole list.
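To make the row format concrete, here is a minimal sketch of how one row could be parsed. It is not the code the script below uses; the names parse_row and sums_to_one and the 0.01 tolerance are my own assumptions for illustration. It shows the one awkward case, a style such as 'k,' that itself contains a comma.

```python
def parse_row(line):
    """Split one data row into (label, [a, b, c], style).

    Hypothetical helper for illustration; not part of ternary.py.
    """
    items = [item.strip() for item in line.strip().split(',')]
    # A style such as 'k,' contains a comma, so the split produces one
    # extra, empty trailing item. Glue the comma back onto the style.
    if len(items) == 6 and items[5] == '':
        items = items[:4] + [items[4] + ',']
    if len(items) != 5:
        raise ValueError('expected label, three numbers and a style: %r' % line)
    label, style = items[0], items[4]
    values = [float(item) for item in items[1:4]]
    return label, values, style


def sums_to_one(values, tolerance=0.01):
    """Check that the three fractions add up to 1 (tolerance is assumed)."""
    return abs(sum(values) - 1.0) <= tolerance


label, values, style = parse_row('Spr, 0.15, 0.3125, 0.5375, k,')
print(label, values, style, sums_to_one(values))
```

A check like sums_to_one is worth running on your data before plotting, since the script itself does not verify the sums.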

The first row gives the axis labels. The first column is plotted with 100% at the bottom left of the triangle, the second is bottom right, third is at the top.
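The layout above follows from the co-ordinate transformation. As a rough sketch (the function name ternary_to_cartesian is mine, and I use the exact sqrt(3)/2 where the script uses the rounded value 0.866), the mapping from the three fractions to x and y looks like this:

```python
import math


def ternary_to_cartesian(a, b, c):
    """Map ternary fractions (a, b, c) to Cartesian (x, y).

    Hypothetical helper for illustration. Like the script, it only
    uses b and c; a is implied because the fractions sum to 1.
    """
    x = b + c / 2.0
    y = c * math.sqrt(3) / 2.0
    return x, y


# The three pure end-members land on the triangle's corners:
print(ternary_to_cartesian(1, 0, 0))  # first column: bottom left
print(ternary_to_cartesian(0, 1, 0))  # second column: bottom right
print(ternary_to_cartesian(0, 0, 1))  # third column: top
```

Because the first fraction never enters the calculation, a row that does not sum to 1 is still plotted, just not where you might expect it.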

It works with CSV files saved from Excel. You need Python 2.5, matplotlib and NumPy; Mac users can download them here (OS X comes with Python already, though you may need to upgrade it, I'm not sure).

To download the script, copy the text at the end of this post into a text editor then save as 'ternary.py'. Your data file should be in the same directory. To run, type

python ternary.py datafile.csv

Here is the plot for the sample data above:



By default the script saves this to a file 'ternaryplot.png' in the same directory.

I make no great claim for its practicality or buglessness, it was just an interesting diversion for me. Enjoy.

""" ternary.py - Ternary plotting

This is a Python script which uses matplotlib and NumPy to
make simple ternary plots. It takes comma-delimited (CSV)
files as input. See http://tinyurl.com/6bnnxh for details
on how to use.
"""

from pylab import *
import re, sys
data = []
prev_match = 0

filename = sys.argv[1]
file_handle = open(filename, 'r')

# Read from CSV file (can handle both \r and \n line returns)
# Need to read the entire file into memory - this is a big
# weakness at the moment
start = file_handle.tell()
sequence = file_handle.read()
p = re.compile(r'\r\n|\r|\n')
i = p.finditer(sequence)
for m in i:
    line = sequence[prev_match : m.end()]
    line = line.replace(' ','')
    line = line.replace('\t','')
    print 'line', line
    item_list = line.split(',')
    print 'item_list', item_list
    # Need to handle the possibility of a comma being in the
    # formatting string
    if len(item_list) == 6 :
        tmp_str = item_list[4] + ','
        print 'tmp_str', tmp_str
        item_list[-1:] = []
        item_list[-1] = tmp_str
        print 'item_list (2)', item_list
    # Strip the line return (\r or \n) left on the end of the row
    for index, item in enumerate(item_list) :
        item_list[index] = item.replace('\r', '').replace('\n', '')
    prev_match = m.end()
    data.append(item_list)
print 'Axes:', data[0][1:4]

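# Read the final row, might not be terminated by a line return
line = sequence[prev_match:]
line = line.replace(' ','')
line = line.replace('\t','')
# Nothing is left over if the file ended with a line return
if line:
    item_list = line.split(',')
    # A comma in the formatting string splits it in two; rejoin it
    if len(item_list) == 6:
        item_list[4:] = [item_list[4] + ',']
    data.append(item_list)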

# Close CSV file
file_handle.close()

# The data variable holds what is read from the CSV file
first_data_row = data[1]
first_data_row_floats = [0,0,0]
for index,item in enumerate(first_data_row):
    # Ignore first column, which holds the data labels
    if index==0:
        continue
    # Ignore last column, which holds the formatting options
    if index==4:
        break
    first_data_row_floats[index-1] = float(first_data_row[index])
NP_pre = array(first_data_row_floats)
# NP_pre is a NumPy array for holding the data before the
# co-ordinate transformation

for data_row in data[2:]:
    data_row_floats = [0,0,0]
    for index, item in enumerate(data_row):
        # Ignore first column, which holds the data labels
        if index== 0 :
            continue
        # Ignore last column, which holds the formatting options
        if index== 4:
            break
        data_row_floats[index-1] = float(data_row[index])
    NP_pre = vstack((NP_pre, data_row_floats))
print 'Data to plot:\n', NP_pre

temp = NP_pre[0,1:3].copy()
NP_post = temp.copy()
NP_post = array([(temp[0] + (temp[1] / 2)), (temp[1] * 0.866)])
# NP_post is the NumPy array for holding the data after the
# co-ordinate transformation is applied next:

for index in range(1, len(NP_pre[:,0])):
    temp = [NP_pre[index,1], NP_pre[index,2]]
    NP_post = vstack((NP_post, [temp[0] + (temp[1] /2), temp[1] *0.866]))

# row_labels holds the label for each row (the co-ordinates of which
# are now in NP_post)
row_labels = [ data[1][0] ]
row_formatting = [ data[1][4] ]
for each_row in data[2:]:
    row_labels.append(each_row[0])
    row_formatting.append(each_row[4])
print 'Row labels:', row_labels
print 'Row formatting:', row_formatting

fig = figure()
ax = fig.add_subplot(111)

ax.axis('equal') # Ensure the triangle is equilateral

# Plot data points
for index in range(0, len(NP_post[:,0]) ):
    ax.plot([NP_post[index,0]], [NP_post[index,1]], row_formatting[index])

# Plot triangle plot boundaries
ax.plot([1, 0.5], [0, 0.866], 'k-')
ax.plot([0, 0.5], [0, 0.866], 'k-')
ax.plot([0, 1], [0, 0], 'k-')

# Plot invisible points outside triangle to ensure bottom
# boundary is drawn (weird I know, couldn't figure out a more
# elegant way)
ax.plot([0.5, 1.01, 0.1], [-0.01, 0.5, 0.88], 'w.')

# Plot the data labels
for i, label_text in enumerate(row_labels):
    ax.text(NP_post[i, 0] + 0.02, NP_post[i, 1] + 0.01, label_text)

# Plot the axis labels
ax.text(-0.05, -0.05, data[0][1], fontweight='bold')
ax.text(1.02, -0.05, data[0][2], fontweight='bold')
ax.text(0.45, 0.93, data[0][3], fontweight='bold')

# Make abscissa and ordinate axes invisible
ax.axis('off')

# Change this to whatever PNG file you want produced
fig.savefig('ternaryplot.png')

Tuesday, April 29, 2008

Greetings

I might post things here from time to time. We'll see.