MathBio Senior Needs to Learn Programming

Postby bananaKAY » Sat Apr 14, 2012 1:52 pm UTC

Hi~

I'm a senior in mathematical biology and this semester I started learning how to model biological systems using Matlab. I've never taken a formal course in Matlab, or any programming class really, so I need to become well-versed in computer science for my future graduate studies in Computational Biology. Or at least proficient enough to not look like an idiot. :)

So, my dear comp sci colleagues, can you tell me which would be useful courses for a future computational biologist? I dislike data analysis so any suggestions of R or S will be overlooked. :P (I have to learn that during this fall semester in my Biostats class anyway...)

Also, I might be participating in a modeling internship over this summer. So any online crash courses in modeling/programming would be useful as well!

thanksthanksthanks,
kay

Re: MathBio Senior Needs to Learn Programming

Postby Meem1029 » Sun Apr 15, 2012 4:35 am UTC

I would recommend learning whatever it is that you're going to be using. If MATLAB is the standard in your field, then you should learn that.

If you are interested in learning a different language for programming in general, I generally recommend Python as a good language for beginners. There are a great many tutorials for it on the web, but I basically picked it up by using it after knowing other languages, so I can't point you to any in particular.

Re: MathBio Senior Needs to Learn Programming

Postby bananaKAY » Sun Apr 15, 2012 7:15 am UTC

Thank you for your response. I contacted a faculty member at one graduate program and she told me that I should take a "real" programming class, such as C or C++, and know at least Java or HTML. I'm good with HTML, but Java and the C things... Yeah I'm not sure about those. My mathbio prof was telling me that I don't know enough Matlab just yet and a compsci friend of mine said I should learn Python! Sooo there are a lot of options! Which do you think is most useful? Meaning versatile not only in my field of mathematical biology, but in all fields?

Re: MathBio Senior Needs to Learn Programming

Postby bananaKAY » Sun Apr 15, 2012 8:12 am UTC

Oops, totally didn't see your suggestion of Python. So I guess let me change my question, if you had to choose between C, C++, Java, and Python in a single elimination death match, who would survive based on their versatility in all applications of data-driven research? Or are they similar somewhat? Think of me as a little child when discussing these programming languages because I'm totally bio/math/chem'd out and not really familiar with each of them and their purposes.

Re: MathBio Senior Needs to Learn Programming

Postby Meem1029 » Sun Apr 15, 2012 8:47 am UTC

For one, HTML is something completely different as it's a markup language rather than a programming language, but you seem to know that.

I can't think of any cases where you would need to use C instead of C++, as the two are very similar. C++ is basically C with classes added. If you are going to be working with large datasets, C or C++ would be the fastest to run, but programs in them also take longer to write (and the languages take longer to learn). Java is a language that is inspired by C++ and has similar syntax, but it has managed memory, which takes care of a lot of things that you would otherwise have to worry about, at the cost of speed. Python is a language that is designed to be programmer friendly, but as a result it runs slower.

Speedwise, C/C++ > Java > Python, although from googling it looks like if you write Python right and use certain libraries that are implemented natively, Python can be faster than Java.
Ease of programming is basically the opposite order.
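
To make the "natively implemented" point a bit more concrete, here is a minimal sketch that uses only the standard library. It assumes CPython, where the built-in `sum` runs its loop in C; the function names and the input size are made up for illustration. Both functions return the same answer, and the built-in version is usually the quicker one, but time it on your own machine rather than trusting a forum post.

```python
import timeit


def total_with_loop(values):
    """Add the numbers one at a time in interpreted Python code."""
    total = 0
    for v in values:
        total += v
    return total


def total_with_builtin(values):
    """Let the built-in sum() run the loop (native code on CPython)."""
    return sum(values)


if __name__ == "__main__":
    data = list(range(200_000))  # arbitrary size, chosen only for the demo
    for fn in (total_with_loop, total_with_builtin):
        seconds = timeit.timeit(lambda: fn(data), number=5)
        print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```

Libraries like the natively implemented ones mentioned above apply the same idea on a larger scale: the slow inner loop moves out of Python and into compiled code.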

For actually working with the data, C++ would be your best bet and is the most commonly used (of general purpose languages at least). Also, the class at my Uni for non-computer-science science and engineering students is taught in C++, and my friends seem to be getting the hang of it just fine.

Do you know what graduate schools you are planning on for next year and who the advisor would be? The best advice I can give is to learn the language that they are using in research (see if it's mentioned in papers at all) so that you can be ready to go when you get there.

Disclaimer: I am an undergrad math/computer science major. I have no experience from the actual research side of things, only from the computer science part. I have spent far too much time here and various other places reading about programming stuff and this seems to be what I see recommended most.

Re: MathBio Senior Needs to Learn Programming

Postby bananaKAY » Sun Apr 15, 2012 9:51 am UTC

Thank you very much for this useful information! It is actually very nice to hear the point of view of a compsci undergrad, rather than a PhD. Haha, no offense to those oldies, but yeah.

So I am aiming for the Computational Biology graduate program at MIT and have started speaking with a professor who is the head of the program. I have definitely begun my search of her published research so I will check out what type of programming she uses. She actually has an A.B., S.M., and Ph.D. in CompSci. (http://people.csail.mit.edu/bab/) And she suggested that I take a "real" programming class since I am strong in everything but compsci. However, my up-and-coming super senior year is booked with lots of classes that I am not willing to give up (Physical Chem, Analytical Chem, Stochastics, PDEs, Biostats this Fall). My plan is to start learning one of those languages this summer, especially if I get the internship I will be interviewing for.

Re: MathBio Senior Needs to Learn Programming

Postby WarDaft » Sun Apr 15, 2012 6:14 pm UTC

You definitely want to start with a high-level language. Once you are really thinking in algorithms, the details of a language become much less important, and you can pick up a new language fully in about 6 months if you really focus on it. But when you're first learning, the particulars of a language can make the whole process much worse. I heard my university prof say that they much prefer it now that the university has switched to teaching Scheme as a first language in place of C. With C, only the most advanced introductory courses got beyond defining slightly complicated data structures, where "slightly complicated" is a linked list; but with Scheme the final assignment (for the standard intro course, not the advanced one; it assumed almost nothing in the way of prior programming experience) was to use graph search to solve the n-queens problem... which is basically just a heuristic away from A*.

This is not to say you should learn Scheme in particular; any high-level language is probably a good start.
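
For a feel of what the n-queens assignment mentioned above looks like in a high-level language, here is a minimal sketch in Python rather than Scheme. It is not the course's solution, just a plain depth-first search over partial board placements; the function name and the "list of column indices, one per row" representation are assumptions made for the example.

```python
def solve_n_queens(n):
    """Return one solution as a list of column indices (one per row), or None."""

    def safe(cols, col):
        # The new queen goes in the next free row; check every queen already placed.
        row = len(cols)
        for r, c in enumerate(cols):
            same_column = c == col
            same_diagonal = abs(c - col) == row - r
            if same_column or same_diagonal:
                return False
        return True

    def search(cols):
        # A node in the search graph is a partial placement; its children
        # extend it by one queen in the next row.
        if len(cols) == n:
            return cols
        for col in range(n):
            if safe(cols, col):
                result = search(cols + [col])
                if result is not None:
                    return result
        return None  # dead end, so the caller backtracks

    return search([])


if __name__ == "__main__":
    print(solve_n_queens(8))
```

Swapping the "try columns left to right" order for one guided by a scoring function is roughly the step from blind search toward heuristic search.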

Re: MathBio Senior Needs to Learn Programming

Postby D-503 » Sun Apr 15, 2012 11:52 pm UTC

I would also recommend Python. Its basics are much easier to learn than Java's or C's, and it's pretty popular. If you need to do anything computationally expensive you can compile to C code with Cython or use MRJob for MapReduce stuff.
Also take a look at Sage. It's essentially Python with a really nice web interface and a bunch of math libraries.
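
In case "MapReduce" is a new term: the idea is to process each piece of data independently (map) and then combine the partial results (reduce). The sketch below shows only that shape in plain Python with the standard library. It is not MRJob's API and nothing in it is distributed; the function names and the toy DNA strings are invented for the example.

```python
from collections import Counter
from functools import reduce


def count_bases(sequence):
    """Map step: count each letter in one sequence."""
    return Counter(sequence)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def total_base_counts(sequences):
    """Map every sequence to its counts, then fold the counts together."""
    return reduce(merge_counts, map(count_bases, sequences), Counter())


if __name__ == "__main__":
    toy_sequences = ["ACGT", "AACC", "GGTA"]  # made-up data
    print(total_base_counts(toy_sequences))
```

Because each map step only looks at its own sequence, the work could in principle be split across many machines, which is the part a real MapReduce framework handles for you.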

Re: MathBio Senior Needs to Learn Programming

Postby troyp » Mon Apr 16, 2012 8:56 am UTC

Also note that Python is very popular for scientific programming and data analysis. It has lots of relevant libraries, starting with NumPy and SciPy. Python/NumPy is actually a common alternative to Matlab, and lots of people seem to be migrating to the Python side. The libraries are written in C to be fast and Python is a very readable and writable language.

I think it would be an excellent choice for your purposes.

Re: MathBio Senior Needs to Learn Programming

Postby Zamfir » Mon Apr 16, 2012 2:32 pm UTC

I would advise against numpy/scipy for a complete beginner. It can require some careful installation, especially if you're not on Linux. And in the end, it's an add-on to Python to get Matlab-type functionality, making it somewhat less elegant than either normal Python or Matlab.

It's perfect if you're used to normal Python but have become limited by speed, or if you're used to Matlab and are limited by its quality as a programming language. Preferably both, really. But you need to have run into the limits of either to appreciate what it is doing.

In fact, learning Python and Matlab is a good combination. Matlab is used a lot, and its integrated environment is as yet unbeaten if you are dealing with numbers and graphs. But at the same time, its programming approach has flaws that easily make you a Very Bad Programmer. It encourages you to use global variables and spaghetti coding; creating functions is just enough of a hassle that you will not chunk your code enough; and its data structure arrays can feel weird. Also, many other Matlab programmers are Very Bad Programmers, so you'll easily pick up bad habits from others.

Learning Python (or some other more programmer-oriented language) will help you learn to organize your code better, so when you return to matlab you can recognize the pitfalls.
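
As a rough illustration of "chunking" code into functions instead of leaning on global variables, here is a minimal sketch of a toy population model. The logistic growth equation and the forward-Euler update are standard; the function names, parameter names, and numbers are made up for the example and are not taken from anyone's actual model.

```python
def logistic_step(population, growth_rate, carrying_capacity, dt):
    """Advance dN/dt = r * N * (1 - N / K) by one forward-Euler step."""
    change = growth_rate * population * (1 - population / carrying_capacity)
    return population + change * dt


def simulate_logistic(initial_population, growth_rate, carrying_capacity, dt, steps):
    """Return the population at every time step, starting with the initial value."""
    trajectory = [initial_population]
    for _ in range(steps):
        trajectory.append(
            logistic_step(trajectory[-1], growth_rate, carrying_capacity, dt)
        )
    return trajectory


if __name__ == "__main__":
    # Every input is passed in explicitly, so two runs with different
    # parameters cannot interfere with each other through shared globals.
    run = simulate_logistic(10.0, growth_rate=0.5, carrying_capacity=1000.0, dt=0.1, steps=200)
    print(f"final population: {run[-1]:.1f}")
```

The same structure carries back to Matlab: one small function per idea, with inputs as arguments and results as return values.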

Re: MathBio Senior Needs to Learn Programming

Postby starslayer » Mon Apr 16, 2012 6:29 pm UTC

This is from my perspective as an astronomy graduate student, so it might not be as applicable to biology:

I would suggest learning Python to start with, for the reasons mentioned. However, you'll probably come up against a speed barrier at some point, and at that point you'll need/want to learn C/C++ or Fortran. When you get there, learn whatever your group uses, just for code commonality and ease of getting help. If you have a choice at all between C and C++, use the latter.

Most of my department uses Python if at all possible, especially for the purposes of plotting and organizing data, but all of our simulations are written in Fortran (often, unfortunately, Fortran 77) or C.

Re: MathBio Senior Needs to Learn Programming

Postby troyp » Mon Apr 16, 2012 11:14 pm UTC

Zamfir wrote:I would advise against numpy/scipy for a complete beginner. It can require some careful installation, especially if you're not on Linux.

That's certainly not my experience. I've never had the slightest issue setting up NumPy, on Windows or Linux. On GNU/Linux systems it generally installs direct from the package manager; on Windows, it's just a double click on the installer. Then you're ready to "import numpy". I don't use SciPy as frequently, but it's been the same story.

Anyway, I'd agree that it's best to concentrate on core Python at first, if only because learning the libraries will just distract from learning the language. What I was getting at was that learning Python opens up an entire ecosystem of scientific libraries that will be useful later.

Re: MathBio Senior Needs to Learn Programming

Postby Sagekilla » Thu Apr 19, 2012 12:25 am UTC

Disclaimer: I am a Physics/Computer Science major graduating this semester. I've done research for 4 years
now, with most of my scientific computing geared towards Physics. I'm heavily biased towards certain types of
languages because of this, and this should encourage you all the more to explore your options and what's being used.


Meem1029 wrote:I can't think of any cases where you would need to use C instead of C++, as the two are very similar. C++ is basically C with classes added.


C++ is nothing like C "with classes added." If you've seen any modern iteration of C++, it's evolved way beyond that.
You can just as easily write C in C++, but writing idiomatic C++ is not "with classes" in the slightest bit.

In terms of actual programming languages, I would tell you to figure out what's being used in whatever field you're
going into, and then learn that. If lots of code is written in C++, do that. Or if it's in Matlab, Mathematica, or
whatever else, go for those. I've done research in random matrix theory, condensed matter physics, computational
physics, and computational biology, and I can say that no two fields work in the same way. Figure out what
they're doing, then learn that.

If there's no preference, I would say play around with one of each of the following categories:
OOP: C# or Java
Low level, procedural: C or C++ (tbh I'd put C++ in its own category)
Dynamic: Ruby or Python

Figure out how to write idiomatic programs in whatever it is you're using. Then see which language paradigm best
suits the work you expect to be doing. That's about all I can say really, since what you use depends so
heavily on what you plan on doing.

If you intend on doing any sort of high performance computing, I'd seriously consider learning at least C or C++. I can't
think of any serious HPC that's not done in either C, C++, or Fortran. It's changing, but when you need the raw performance
nothing beats them.

Re: MathBio Senior Needs to Learn Programming

Postby Rulzern » Wed Apr 25, 2012 9:27 pm UTC

I have to echo what Sagekilla is saying: look into what people in your field are using. In my work with bioinformatics (on the computer science end of things), I've been working with a fairly large Perl codebase, as well as coding in Python, Java and Perl.

That being said, I find that the largest "missing piece" for the biologists-turned-programmers I'm working with is a fundamental understanding of things like memory use and computational complexity. A couple of examples: using quicksort on a massive list of elements that all evaluate to the same value, and loading 20GB+ data sets into memory when they are only going to be accessed sequentially. These may, however, be issues you won't run into in your work.
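
To show why the quicksort example bites, here is a minimal sketch comparing a textbook two-way quicksort with a three-way variant that sets equal items aside. Both functions, their names, and the one-element `counter` list used to tally pivot checks are written for this illustration; they are not from any particular codebase, and Python's built-in `sorted` does not use quicksort, so it is not affected.

```python
def quicksort_two_way(items, counter):
    """Textbook quicksort: first item is the pivot, equal items all go to one side."""
    if len(items) <= 1:
        return items
    pivot, rest = items[0], items[1:]
    counter[0] += len(rest)  # every remaining item is checked against the pivot
    smaller = [x for x in rest if x < pivot]
    not_smaller = [x for x in rest if x >= pivot]
    return (
        quicksort_two_way(smaller, counter)
        + [pivot]
        + quicksort_two_way(not_smaller, counter)
    )


def quicksort_three_way(items, counter):
    """Variant that groups items equal to the pivot and never recurses on them."""
    if len(items) <= 1:
        return items
    pivot = items[0]
    counter[0] += len(items)  # every item is checked against the pivot
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return (
        quicksort_three_way(smaller, counter)
        + equal
        + quicksort_three_way(larger, counter)
    )


if __name__ == "__main__":
    same_values = [7] * 200  # small on purpose: the two-way version recurses 200 deep
    for sort in (quicksort_two_way, quicksort_three_way):
        checks = [0]
        sort(same_values, checks)
        print(f"{sort.__name__}: {checks[0]} pivot checks")
```

With 200 identical items the two-way version does 19,900 pivot checks and the three-way version does 200. The first count grows with the square of the list length, which is what hurts on a massive list.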

So, in short (and tongue-in-cheek), code Python, learn C.
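
On the 20GB point: if the data is only read from start to finish, it can be processed one line at a time instead of being loaded whole. Here is a minimal sketch, assuming a tab-separated text file with a number in the chosen column; the function name and the file layout are hypothetical, and an in-memory `io.StringIO` stands in for a real file so the example runs anywhere.

```python
import io


def mean_of_column(lines, column=0, sep="\t"):
    """Average one numeric column while holding only a single line in memory."""
    total = 0.0
    count = 0
    for line in lines:  # file objects yield one line at a time
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split(sep)
        total += float(fields[column])
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    fake_file = io.StringIO("1.0\tA\n2.0\tB\n3.0\tC\n")  # stand-in for a huge file
    print(mean_of_column(fake_file))
```

With a real file you would pass the open file object itself, e.g. `with open(path) as handle: mean_of_column(handle)`, rather than calling `handle.read()` or `handle.readlines()` first.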

Re: MathBio Senior Needs to Learn Programming

Postby Nath » Thu Apr 26, 2012 3:38 am UTC

bananaKAY wrote:Thank you very much for this useful information! It is actually very nice to hear the point of view of a compsci undergrad, rather than a PhD. Haha, no offense to those oldies, but yeah.

Oldie here. I think you're focusing a bit too much on the language. Your professor probably wants you to take a basic programming course in C++ or Java not because she'll want you to program in those languages, but because she wants you to be familiar with the ideas and ways of thinking that an introductory CS course will provide; C++ and Java just happen to be popular choices for these courses. The syntax of the language is mostly irrelevant at this point; you can figure out syntax as you go.

If you're not taking a formal class anyway, then yeah, Python's a nice language to play around with. It may end up being too slow if you're going to work with large datasets. I'm actually using D as my go-to language for research prototypes now, because it's faster than Python, interfaces easily with C/C++, and solves many of my annoyances with those languages. On the downside, it is relatively young. I revert to Python if the datasets are small, or C++ if I need to squeeze out as much speed as possible. Development time usually dwarfs running time for most of the stuff I do, so writing optimized C code is usually not worth it.