Cancer gene sequencing effort struggles through waves of false IDs

Muscle proteins, smell receptors show up in some putative lists.

With the development of DNA sequencing centers that are capable of churning out multiple genomes in a week, many scientists saw a resource that they could turn against cancer. By sequencing a person's healthy cells and comparing those results to the sequence of their cancer cells, it would be possible to map all the genetic changes that drive cancers. Within the list of genes, there might also be hints for future therapies.

As the cancer genomes have rolled in, however, reality hasn't kept pace with the promise. Each new batch of sequenced genomes has lengthened the list of implicated genes rather than settling it. And, as the authors of a paper released by Nature over the weekend note, some of those genes are overwhelmingly unlikely to have anything to do with cancer. So a huge team of researchers set out to learn why and to fix the problem.

Although some cancers are caused by viruses, most are caused by mutations that alter or disable the genes that normally control a cell's growth. Many of these genes have been identified over the years: some are common to many cancers, others specific to just a few. Until recently, there was no way to be sure we had a complete catalog of the genes involved, or to know which ones mattered in which cancers. Genome sequencing gave us the chance to build that catalog.

Which mutations are relevant?

The challenge of this approach is that cancer cells carry a lot of mutations. They are constantly adapting to the body's (and doctors') attempts to kill them, and mutations are the raw material for that adaptation. As part of their transformation, they also tend to disable the genes that stop cells from dividing when they carry DNA damage. Both factors raise the mutation rate in cancer cells. But these mutations are indiscriminate; they hit irrelevant genes as often as they hit genes important for cancer's origin and spread.

So the people doing cancer genomics faced the challenge of weeding out the irrelevant mutations and focusing on the significant ones. For the most part, they were failing.

To illustrate the problem, the authors of the new paper took their own set of normal and cancerous samples from 178 patients with lung cancer. The standard computer analysis used to flag significantly mutated genes pulled out 450 genes that were mutated more often in the cancers than expected, even after accounting for the size of each gene. That's a lot. And some of them were clearly irrelevant to cancer. Nearly a quarter of the 450 genes encoded odorant receptors, which are the basis of your sense of smell but are barely expressed anywhere outside the nerves of the nasal lining. Other nerve-specific genes were on the list, as were a few that play a structural role in muscles.
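The kind of test that produced the 450-gene list can be sketched as comparing each gene's observed mutation count with what a single, genome-wide background rate predicts for a gene of that length across the cohort. The sketch below assumes a Poisson model, and its function names and numbers are hypothetical; it illustrates the general idea, not the exact pipeline the authors ran.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summing the upper tail directly."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    i = k
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total = 0.0
    # Keep adding terms until they no longer change the sum in double precision.
    while term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return total


def naive_gene_test(observed, gene_length_bp, n_patients, background_rate):
    """Hypothetical naive test: one background rate (mutations per base per
    patient), scaled only by gene length and cohort size."""
    expected = background_rate * gene_length_bp * n_patients
    return expected, poisson_sf(observed, expected)


# Hypothetical gene: 1,000 coding bases, 8 mutations across 178 patients,
# assuming a background of 1 mutation per 100,000 bases.
expected, p = naive_gene_test(8, 1_000, 178, 1e-5)
print(f"expected {expected:.2f}, p = {p:.2g}")
```

The weak point is the single `background_rate`: every gene, patient, and mutation type is assumed to share it.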

Other types of cancer had similarly large lists filled with genes that were probably irrelevant. And a scan of the published literature revealed that many of these had already been reported as associated with cancer.

Why are so many labs being led astray? To sort things out, the authors obtained a large collection of genomes from 27 different types of cancer, and started doing comparisons among them. The first thing they noticed is that different cancer types varied in the frequency of mutations by factors of up to 1,000. Lung cancers and melanomas were at the high end, with rates up to and exceeding one mutation every 10,000 bases. That's likely because these cancers are largely caused by known mutagens—cigarette smoke and UV light, respectively.
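These rates are easier to compare in mutations per megabase, a unit genomics papers commonly use. One mutation every 10,000 bases works out to 100 per megabase, and a 1,000-fold spread below that ceiling would reach down to roughly 0.1 per megabase. A minimal sketch of the arithmetic, with hypothetical helper names and sample counts:

```python
def rate_per_mb(n_mutations, bases_sequenced):
    """Convert a raw mutation count into mutations per megabase (1,000,000 bases)."""
    return n_mutations * 1_000_000 / bases_sequenced


def fold_range(rates):
    """Ratio of the highest to the lowest rate in a collection."""
    return max(rates) / min(rates)


# Hypothetical tumors, each with 30 Mb of sequenced territory.
samples = {
    "tumor_A": rate_per_mb(3_000, 30_000_000),  # 100 per Mb: one per 10,000 bases
    "tumor_B": rate_per_mb(3, 30_000_000),      # 0.1 per Mb
}
print(samples, fold_range(samples.values()))
```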

Those mutagens are also fairly specific about how they damage DNA (for example, UV light tends to damage DNA where two Ts sit next to each other), so they tended to favor a specific spectrum of mutations. The same was true in some other types of cancer, which suggests those cancers might also share a common environmental cause.
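At its simplest, a mutation spectrum is a tally of which base changes occur. One widely used convention, assumed here rather than taken from the paper, folds the 12 possible single-base substitutions into six classes by describing each change relative to the pyrimidine (C or T), since a G>A change on one strand is a C>T change on the other. A mutagen that favors a particular kind of damage pushes one or two classes well above the rest; UV-associated melanomas, for instance, are typically dominated by C>T changes. The helper names and sample data below are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Fold a single-base substitution onto a pyrimidine reference, giving one of
    six classes: C>A, C>G, C>T, T>A, T>C, T>G."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: describe the change on the other strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(mutations):
    """Return the fraction of mutations falling in each substitution class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in mutations)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}


# Hypothetical calls from one tumor, as (reference base, mutant base) pairs.
calls = [("C", "T"), ("G", "A"), ("C", "T"), ("T", "A")]
print(spectrum(calls))
```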

In addition to the type and frequency of mutations, there were other variables. Mutation rates could vary greatly among individuals, so lung cancers from two different patients might show very different rates. And some areas of the genome were more prone to mutation than others. Active genes seem to be resistant to mutation, possibly because they reside on sections of the chromosome that are accessible to the DNA repair machinery. Areas that were the last to be copied when a cell divides, in contrast, were more likely to pick up mutations.

Overall, the authors conclude that earlier studies went wrong because they compared the mutations in each gene to the average mutation rate across the genome. Instead, all these other factors (the type of cancer, the type of mutation, the patient's mutation rate, and the region of the genome) need to be taken into account as well. Being the helpful sorts, they even wrote a program called MutSig that does so (and made it freely available for noncommercial use).
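The correction described here can be sketched as building each gene's expected mutation count from those factors instead of from one genome-wide average. This is not MutSig's actual algorithm: the multiplicative model, the `Gene` class, the function names, and every number below are assumptions chosen to show the idea. The sketch only adjusts the expected count; it does not model the other statistical complications a real tool has to handle.

```python
import math
from dataclasses import dataclass


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summing the upper tail directly."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    i = k
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total = 0.0
    while term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return total


@dataclass
class Gene:
    name: str
    bases_by_category: dict[str, int]  # coding bases at risk for each mutation category
    region_factor: float  # hypothetical: >1 for late-replicating, inactive regions


def expected_count(gene, patient_rates, category_factors):
    """Sum the expected mutations over patients and mutation categories.

    patient_rates: each patient's background rate (mutations per base)
    category_factors: relative rate of each mutation category in this cancer type
    """
    total = 0.0
    for rate in patient_rates:
        for category, bases in gene.bases_by_category.items():
            total += rate * category_factors[category] * bases * gene.region_factor
    return total


# Hypothetical gene sitting in a late-replicating, rarely expressed region.
gene = Gene("GENE_X", {"C>T": 300, "other": 700}, region_factor=3.0)
patient_rates = [1e-5] * 178
category_factors = {"C>T": 2.0, "other": 0.5}

naive = 1e-5 * 1_000 * 178  # one genome-wide rate, adjusted for length only
adjusted = expected_count(gene, patient_rates, category_factors)
observed = 8
print(f"naive p = {poisson_sf(observed, naive):.2g}, "
      f"adjusted p = {poisson_sf(observed, adjusted):.2g}")
```

With these made-up numbers, the same eight mutations look striking under the naive model and unremarkable once the gene's region and the cohort's mutation spectrum are considered. That mirrors, in miniature, the article's explanation for why inactive genes such as the odorant receptors crowded the earlier lists.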

When MutSig was turned loose on the original data, the list of interesting genes dropped from 450 to just 11. In all probability, the same thing would happen to other data sets if they were subjected to the same analysis; not everything (or not every gene) causes cancer.

This is a great success story, but the good news comes with a catch. Ten of the 11 genes identified were already known to be involved in cancer, and the 11th is involved in the immune response, which helps keep cancer in check. So it's not clear that we're getting many new answers out of a large and expensive project.

Nature, 2013. DOI: 10.1038/nature12213.