Cabbages have twice as many genes as you. No? Yes… really!
Chris Overall has an impressive lineage, both scientific and cultural: not only is he Australian, but his postdoctoral supervisor was a Nobel laureate (Dr Michael Smith, who invented site-directed mutagenesis). I was lucky enough to speak with Chris last week about his current research (terminomics) and his involvement in BioInfoSummer 2014 (BIS).
Chris prefaced our conversation by declaring that he comes to bioinformatics not as a mathematician or a statistician or a computer scientist, but as a user. In fact, he was quite pleased when the Australian Mathematical Sciences Institute (AMSI) approached him to give the opening lecture at BIS14.
“I am not going to talk about the past of bioinformatics, where it has come from and where it is going. I am not qualified to do that,” Chris said. “I would like my audience to leave aspiring to be the Steve Jobs, the Apple, of bioinformatics. To design and code easy-to-navigate, beautiful programs and interfaces that users, like me, want to come back to and use again and again.”
He said it is important for the user to be able to “go under the hood” if they need or wish to, but that it shouldn’t be necessary.
Chris is a Professor and Canada Research Chair in Protease Proteomics and Systems Biology. He works at the Overall Lab (yes, he even has a lab named after him), a proteomics and protein engineering research laboratory.
Set up in the early 90s, the Overall Lab (which currently consists of Chris, two research associates, ten postdocs and two PhD students) is developing tools to identify targeted ways to combat illnesses. Chris explained to me that medical research aims to understand the molecular pathways that form the basis of disease, so that we can identify targets, or the pathways themselves, that can be used to develop drugs or treatments for these diseases. However, every five to ten years we find another layer of complexity.
He broke this down for me: “Humans have 20,135 genes; a cabbage has 41,174; a London double-decker bus has 24,000 different parts. So the question is, on the basis of the assumption of one protein per gene, how do we (humans) have so much more complexity than a cabbage if it has twice as many genes as we do?
“Part of the answer lies in the timing of the genes, that is when they turn on and off,” Chris said. “The other part is what are called post-translational modifications, or PTMs. After the gene synthesizes RNA, the RNA then makes a protein. We call the production of new proteins from these parent protein chains post-translational modification, and this is what generates millions of forms of proteins in humans, and hence the incredible complexity from which life arises.
“So, post-translational modification looks at how the proteins are mixed and matched. It turns out that there are, on average, five alternate splice forms per protein, which means that we can have different versions of the same protein depending on how it is made. If we multiply our assumption of one gene, one protein by five, we now have around 100,000 different proteins. Then there are the twenty amino acids that make up these proteins, and many of these amino acids can be affected by chemical modifications, giving us even more diversity. In fact, doing the maths takes us up to around five million different targets; much more reasonable for a complex, self-healing, reproducing organism.”
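For readers who want to check the maths, here is a minimal sketch of the arithmetic in Python. The gene counts, the five splice forms and the five million figure come from the conversation above; the "modification factor" is not something Chris quoted. It is simply what you get by dividing his total by the number of splice forms, and the function and variable names are made up for illustration.

```python
# Figures quoted in the interview.
HUMAN_GENES = 20_135
CABBAGE_GENES = 41_174
SPLICE_FORMS_PER_PROTEIN = 5      # "on average, five alternate splice forms"
QUOTED_TOTAL_TARGETS = 5_000_000  # "around five million different targets"


def protein_isoforms(genes: int, splice_forms: int) -> int:
    """One gene, one protein, multiplied by the average number of splice forms."""
    return genes * splice_forms


def implied_modification_factor(total_targets: int, isoforms: int) -> float:
    """Assumption: whatever multiplier is needed to get from the splice forms
    to the quoted total is attributed to chemical modifications."""
    return total_targets / isoforms


gene_ratio = CABBAGE_GENES / HUMAN_GENES
isoforms = protein_isoforms(HUMAN_GENES, SPLICE_FORMS_PER_PROTEIN)
factor = implied_modification_factor(QUOTED_TOTAL_TARGETS, isoforms)

print(f"Cabbage-to-human gene ratio: {gene_ratio:.2f}")
print(f"Human proteins after splicing: {isoforms:,}")
print(f"Implied forms per spliced protein: {factor:.1f}")
```

The splice-form step lands on 100,675, which is the "100,000" in the quote, and getting from there to five million needs roughly another fifty-fold multiplication.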
Chris’s research steps one more rung up the ladder: “I investigate the ends of the proteins,” he said.
Enzymes cut through proteins, removing amino acids. Sometimes, losing just four amino acids out of a couple of hundred can completely change the protein from giving a “go” signal to a “no-go” signal.
“This is what we study at the lab,” Chris informed me. “We call it terminomics,” he continued, with a jovial tone to his voice. “We have found lots of great examples where unless you know what the ends are, you don’t know whether the protein is on or off, whether it is an antagonist or protagonist.”
This is where Chris believes a big problem for bioinformatics arises. He gives me an example: “You have a particular protein, you find it in a database and you say to yourself, ‘This protein is present, therefore this interaction network must be present and therefore the pathway must be…’ But unless you are sure that the ends of the protein are there, those four amino acids, you would be completely wrong. So this is what we study; how molecule functions change depending on how they are cut.”
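The check Chris describes can be pictured with a short Python sketch. It is only an illustration, not the lab's method or anything from TopFIND: the function names are hypothetical, the sequence is made up, and real data would come from mass spectrometry rather than a string comparison.

```python
from typing import Tuple


def missing_termini(full_length: str, observed: str) -> Tuple[int, int]:
    """Count residues missing from the start (N-terminus) and the end
    (C-terminus) of an observed protein form, relative to the full-length
    sequence. Uses the first place the observed form matches."""
    if not observed:
        raise ValueError("observed sequence is empty")
    start = full_length.find(observed)
    if start == -1:
        raise ValueError("observed form is not part of the full-length sequence")
    n_missing = start
    c_missing = len(full_length) - (start + len(observed))
    return n_missing, c_missing


def has_intact_ends(full_length: str, observed: str) -> bool:
    """True only if nothing has been trimmed from either end."""
    return missing_termini(full_length, observed) == (0, 0)


# Made-up 22-residue sequence, purely for illustration.
full_protein = "MKTAYIAKQRQISFVKSHFSRQ"
trimmed_protein = full_protein[4:]  # first four amino acids cut off

print(missing_termini(full_protein, trimmed_protein))
print(has_intact_ends(full_protein, trimmed_protein))
```

A database lookup that only asks "is this protein present?" would treat both forms the same; the point of the sketch is that the trimmed form fails the ends check even though almost all of its sequence is still there.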
Fairly green to this field of study, I asked Chris whether they cut the proteins with the enzymes themselves to see which amino acids are removed. He laughed as he replied, “That’s old school. I started my PhD adding the enzyme to the protein to see where it gets cut and then characterising it. We look at whole tissues now and determine all the cut ends at once,” Chris finished with a smirk.
The computer programs available now are simple; they don’t take into account the deeper interactions. For that to change, Chris said, bioinformaticians need to work closely with biologists. “We have always had one or two coders, and so we have developed a suite of programs specifically designed to analyse our proteomics data sets and to make these readily accessible for knowledge translation. Our TopFIND knowledgebase has just been updated to v3beta with over 170,000 termini instantly accessible with protein maps, interaction paths, original references and other tools for analysis. It is by no means perfect and Steve Jobs would still most likely turn in his hard drive, but we are continually refining it to make it easier for the user.
“Don’t sit in a silo. As a bioinformatician, get yourself embedded in a good biological lab and then you’ll be able to see the experiments and understand first-hand what sort of questions need to be asked of the data.”
When Chris said this I immediately recalled hearing Professor Kate Smith-Miles say something very similar: “When you are collaborating with biologists you have to understand their language first and understand what their problem is before you can even get to the point of applying any mathematics.”
Kate, who is this year’s Chair and is speaking on Monday and Friday, went on to explain that over her career she has spent a great deal of time learning enough about biology to ensure she can have conversations with a variety of biologists. She said you need to understand the essence of a problem before you can describe it as a mathematical problem.
“Bioinformatics is evolving rapidly,” Chris said. “Currently we have an amazing capability and access to incredible data sets. What I hope to gain from BioInfoSummer is insight into this evolution, to talk to and learn from my peers. And I hope that my experience as a user can help contribute to the depth and diversity of modern bioscience and systems biology.”
Chris said that it is great that AMSI organises events such as these that build platforms and networks for interdisciplinary collaboration. “Collaboration is key in medical research. We need to have collaborative networks as complex as the proteins that make us up. And in the end,” Chris concluded, “… you hope, with a little intuition, some clever experiments and good bioinformatics, we will be able to map pathways and find targets that combat infection and disease.”