ruby, Feynman diagrams, lambdas

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

sdedeo
Posts: 36
Joined: Tue Apr 08, 2008 10:52 pm UTC
Contact:

ruby, Feynman diagrams, lambdas

I am re-writing some code of mine, that computes Feynman(-like) diagrams on a graph. (It is a linked cluster expansion, if you are curious, which is sort of like a Feynman diagram but where the background system has no coupling between points, and the points themselves are discrete -- the underlying system is a crystal, say. Questions on physics quite welcome!)

I learned ruby last year, and I am curious to see if I can actually do the things ruby is meant to help us do. I have working code that does what I describe below, but it is very slow, and I would love to learn from some ruby (or "FP") gurus how to take my relationship with execution time "to the next level."

In the end, it seems to be a question of how to "pre-compile" a lambda?

Here is the essential problem. You have a large graph, V, which you can represent as a (symmetric) matrix, V[i,j], telling you which points are connected to each other. It might be, for example, a 10^3 lattice. You also have a (much smaller) "diagram", call it D, which you can also represent as a (symmetric) matrix, D[a,b]. A diagram might be, for example, a three-vertex loop:

Code: Select all

`[ 0 1 1   1 0 1   1 1 0 ]`

To "apply" the diagram D to the graph V means to sum up a product of V[i,j] elements in a particular fashion. In the case of the loop above, you sum:

Code: Select all

`V[i,j]*V[j,k]*V[k,i]`

over all values of i, j and k. In other words, D gives the patterns of the indicies in the sum.

I thought a long time about how to do this in the general case. Here was my solution.

First, we need a good way to sum over an arbitrary number of indicies (the dimension of the diagram D); you want to be able to pass a block to the center of all the loops. I built the following (m_inner is the dimension of the graph V):

Code: Select all

`  def all_sum(n_left,m_inner,block,running=[])    # sums from 0 to m_inner-1, n_left times (pass n, at the top level,     # where n is the number of verticies in the diagram D)    # (returns an *object* -- a lambda -- that can be executed by call)    # at the center of the loop, passes the call structure through (i.e.,     # on the 3rd iteration of the first loop, 2nd iteration of the second, the block    # is passed [2,1] )    if n_left == 1 then # (if there's only one sum to do, do it)      lambda {         counter = 0        m_inner.times { |i|          counter += block.call(running+[i])        }        counter      }    else # (do the lower sum)      lambda {         counter = 0        m_inner.times { |i|          counter += all_sum(n_left-1,m_inner,block,running+[i]).call        }         counter      }    end  end`

Then, of course, you need to define the block at the center. That's just the product defined by the diagram D. Here's the block I pass:

Code: Select all

`    inner_summand = lambda { |v_index|      running_prod=1.0            (@n-1).times { |i|        (i+1).upto(@n-1) { |j|                    running_prod *= v.net[v_index[i],v_index[j]]**(@g[i,j])                  }      }            running_prod    }`

Where @n is the dimension of the diagram D (e.g., in the three-loop above, it's 3), @g is the matrix of the diagram D, and v.net is the matrix associated with the graph V.

If all that makes sense, here is the issue. This is extremely slow. Is there an obvious way to speed it up? It seems like my inner block is being executed over and over again, but there should be a way to speed it up?

sdedeo
Posts: 36
Joined: Tue Apr 08, 2008 10:52 pm UTC
Contact:

instance_eval

I poked around a little more, trying to see how to do the thing I thought I knew how to do in LISP -- just make the program write a program already! So you can pass a string to instance_eval, and it allows your code to write code.

This speeds things up until it is as fast as if you had written the code itself. A time test of the first thing I tried (in the post above), then of using instance-eval to write the inner_summand and passing that as a block to some clever lambdas, then of just having instance-eval write the whole thing, then of just cutting and pasting the particular function.

This is for a 5x5x5 cube (with sort of a torus topology), with a three-vertex loop.

standard apply: 34 seconds
with instance-eval function of inner_summand: 21 seconds
with instance-eval function of loops and inner_sumand: 13 seconds
just typing out the function for the particular graph in question: 13 seconds

The lesson is: if you want speed, try to do an instance-eval instead of being clever with lambda.

Code: Select all

`  def apply_faster(v)    # does the index summations corresponding to the graph in question, over the J_ij matrix v.        # define the summations themselves; we'll have 25 spare indicies lying around to use    dummy_index="abcdefghijklmnopqrstuwxyz"        front_sum = "counter = 0\n"    @n.times { |i|      front_sum += "#{v.n}.times { |#{dummy_index[i,1]}| \n"    }    front_sum += "counter += "    vertex_product = ""    (@n-1).times { |i|      (i+1).upto(@n-1) { |j|                vertex_product += "v.net[#{dummy_index[i,1]},#{dummy_index[j,1]}]"        if @g[i,j] > 1 then          vertex_product += "**#{@g[i,j]}*"        else          vertex_product += "*"        end              }    }    vertex_product.chop!        front_sum += "#{vertex_product}\n"        @n.times { |i|      front_sum += "}\n"    }    front_sum += "counter\n"        instance_eval %{ def all_sum_fast(v)        #{front_sum}      end    }        print "#{front_sum}\n"        all_sum_fast(v)  end`

sdedeo
Posts: 36
Joined: Tue Apr 08, 2008 10:52 pm UTC
Contact:

Re: ruby, Feynman diagrams, lambdas

And, the final code twerk -- the most significant of all. I ran the profiler, and noticed a lot of Kernel#kind_of? calls, as well as some calls to GSL. I converted all of the objects in the function to NArray objects (from GSL ones.) Still, many calls that seemed to reference the GSL libraries. Somehow the duck typing of ruby was worried I would pass GSL objects in.

I re-wrote the rest of the code to avoid GSL references, and now the entire computation runs in 3 seconds.

So, a final question would be: is it possible to "unload" a library for a while, or at least to make ruby pretend it does not exist? I would like to use GSL later, but it creates overhead even when the functionality of GSL is not needed.

Beldraen
Posts: 1
Joined: Thu Jul 30, 2009 2:10 pm UTC

Re: ruby, Feynman diagrams, lambdas

The issue you are finding is due to initially writing to the wrong domain of the solution. Traditional coding is about maintainable code--stuff that is readable; however, you're stating specifically that the domain you want is speed. Those are two different techniques.

Ruby is an interpretive, immutable, garbage collected language. All calls, methods, arrays will have a performance hit.
• The point of lambdas is flexibility of tying code dynamically at the cost of all the lookups necessary to find and execute.
• In mutliple dimension arrays, the variable and subscripts have to be interpreted and resolved. Notably, these are dynamic constructs so internally they are linked lists of linked lists. They are not a flat memory allocation with a simple offset lookup.
• The repeated use of temporary obects (especially strings) flood the garbage collector. Strings are immutable. If you really want to create one long big string from a bunch of short strings, throw the shorts ones into an array and join them at them end. Array.join is optimized to do this by creating one big string long enough for it all and copies into place.

So, what is found here is that coding for a specific domain can require specific knowledge of the development environment's design. If you want real speed, in this case you can really take advantage of Ruby's ability to build code. Your final algorythm is deterministic. You know exactly every input variable and the final matrix has a specify definition, so use meta code to def a method in a class that does exactly what you want. The code would generate something like this to be evalutated:

Code: Select all

`Class MyFastMatrix  def Do_Matrix_D_3_by_3_V_10_by_10(vec_d, vec_v)    # Loop through and grab all references into local variables so we never have to look them up again    d_0_by_0 = vec_d[0,0]    d_0_by_1 = vec_d[0,1]    .    .    .    v_9_by_9 = vec_v[9,9]    # Output specific code for each result dynamically into result local variables    r_0_by_0 = d_0_by_0 + d_0_by_1 ... blah, blah, blah    .    .    .    # Assemble result into array and return it   return [ [r_0_by_0], ...end`

All distant lookups are done once, nothing goes into garbage handling until its over, each calculation relies on the simplest lookup, there are no loops, and the routine, once created, can be used over since it's now actually in the class definition. This is trading coding space for speed.

thoughtfully
Posts: 2253
Joined: Thu Nov 01, 2007 12:25 am UTC
Location: Minneapolis, MN
Contact:

Re: ruby, Feynman diagrams, lambdas

This is exactly the kind of thing Numerical Python is intended to help you with. It offloads the math to C, and is very fast. Read the bit in the manual about ufuncs.

Sorry it isn't Ruby

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
-- Antoine de Saint-Exupery

sdedeo
Posts: 36
Joined: Tue Apr 08, 2008 10:52 pm UTC
Contact:

Re: ruby, Feynman diagrams, lambdas

For those who are interested, the paper that relies (in part) on some of these computations is now out.

For those with University subscriptions, it can be found at http://rsif.royalsocietypublishing.org/content/early/2012/02/28/rsif.2011.0840.abstract, while the arXiv (open access) version is at http://arxiv.org/abs/1109.2648.

We had a lot of fun working on this, and it was interesting to try to use ruby to accomplish a task that had, many years ago, been done in restricted cases by hand; an example is http://www.springerlink.com/content/n664krj72h632354/. It took perhaps two weeks to debug the code, and what was perhaps most amazing was to find, at the end, that the group in 1963 had, indeed, done the calculations without error.

Sagekilla
Posts: 382
Joined: Fri Aug 21, 2009 1:02 am UTC
Location: Long Island, NY

Re: ruby, Feynman diagrams, lambdas

For the problem you described above, summing V[i, j] V[j, k] V[k, i],
this is actually equivalent to perform Tr(V * V * V)

Where * is the normal matrix-matrix product. On Mathematica, this is
something like 2300 times faster to perform. It may be faster in numerical
libraries like NumPy as well.

Just a thought.
http://en.wikipedia.org/wiki/DSV_Alvin#Sinking wrote:Researchers found a cheese sandwich which exhibited no visible signs of decomposition, and was in fact eaten.

sdedeo
Posts: 36
Joined: Tue Apr 08, 2008 10:52 pm UTC
Contact:

Re: ruby, Feynman diagrams, lambdas

In that case, yes. In other cases, the summations become more complicated -- e.g., a two-loop graph might be

V[i,j] V[j,k] V[k,p] V[p, i] V[k,i]

I seem to remember that graphs like these, with multiple loops, could not be transformed into "ordinary" matrix multiplication problems (but would be curious to hear if you or others had interesting suggestions -- it would indeed allow one to piggy-back on the standard, more parallelizable algorithms.)

Sagekilla
Posts: 382
Joined: Fri Aug 21, 2009 1:02 am UTC
Location: Long Island, NY

Re: ruby, Feynman diagrams, lambdas

Do you have any useful symmetries in your matrix? Sometimes those help.
It may also help to store the transposed form of your matrix.

For the sum:

V[i, j] V[j, k] V[k, p] V[p, i] V[k, i]

Instead of doing it (perhaps) as:

Code: Select all

`for (i ... )  for (j ...)    for (k ...)      for (p ...)        sum <- V[i, j] V[j, k] V[k, p] V[p, i] V[k, i]`

You can do:

Code: Select all

`for (i ... )  for (j ...)    for (k ...)      for (p ...)        sum <- V[i, j] V[j, k] V[k, p] Vt[i, p] Vt[i, k]`

Where Vt refers to the transpose of V.

More generally, it would be useful to use the transposed arrays where you're going with
the stride of the array rather than against it. No matter how you slice it, it's a N^k problem
for a k-product summation. But if you can utilize the ordering of the matrix to your advantage
then you'll get the benefit of spatial locality.
http://en.wikipedia.org/wiki/DSV_Alvin#Sinking wrote:Researchers found a cheese sandwich which exhibited no visible signs of decomposition, and was in fact eaten.