Paul-Jean Letourneau

Did You Know That Wolfram|Alpha Knows Your DNA?

March 10, 2010

We’ve been working diligently for several years to build a vast repository of genetic data into Wolfram|Alpha. At launch time, we had the entire human genome and all known human genes. Now, Wolfram|Alpha has genetic data for 11 different species, from humans and mice to fruit flies and worms. And we’re working hard to add more species all the time.

These days we’re hearing more and more about how particular genes work, what their functions are, and what happens if a gene becomes mutated and stops functioning correctly. And with the personal genomics movement in full swing, we can even get portions of our own genomes sequenced, with a report detailing for us which gene variants we have and whether any put us at known high risk for diseases like breast cancer, diabetes, or Parkinson’s disease.

Well, Wolfram|Alpha makes it really easy to get in-depth information about a gene that interests you.

Take for example the gene SATB1, which recent studies have shown is an important factor in breast cancer growth. Wolfram|Alpha gives you a number of results about this gene. The first result lists the standard and alternate names the gene goes by, which are important if you want to look it up in the literature:

Wolfram|Alpha's results for the gene SATB1

After that, Wolfram|Alpha tells us that this gene is on chromosome 3, locus p23, on the minus strand, starting at around 18 megabases along the chromosome. There is then a snippet of the gene’s actual DNA sequence, and we learn that the gene is about 90 kilobases (90,000 base pairs) long, with a picture showing which other genes are close by on the chromosome (in this case, PP1P and KCNH8):

More of Wolfram|Alpha's data for the gene SATB1

Clicking “More” a few times in the “Nearby genes” pod lets us zoom out to see more of the gene’s surrounding environment on the chromosome:

More of Wolfram|Alpha's data for the gene SATB1

In case we’re curious about the region just downstream from this gene, we can find out about it immediately with a query like this:

Wolfram|Alpha results for "1 Mbp downstream from human gene SATB1"
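“Downstream” is measured in the direction of transcription, so for a gene on the minus strand, such as SATB1, the downstream region lies at lower chromosome coordinates. Here is a minimal sketch in Python of that bookkeeping. The function name and the coordinates are hypothetical round numbers, loosely based on the “around 18 megabases” and “about 90 kilobases” figures above. They are not values taken from Wolfram|Alpha, and this is not how Wolfram|Alpha computes its result.

```python
def downstream_region(start, end, strand, length):
    """Return the (start, end) of the region `length` bp downstream of a gene.

    `start` and `end` are 1-based inclusive chromosome coordinates with
    start <= end; `strand` is "+" or "-".
    """
    if strand == "+":
        # Transcription runs toward higher coordinates.
        return end + 1, end + length
    if strand == "-":
        # Transcription runs toward lower coordinates; clamp at position 1.
        return max(1, start - length), start - 1
    raise ValueError("strand must be '+' or '-'")


# Hypothetical gene: about 90 kbp long, near 18 Mbp, on the minus strand.
gene_start, gene_end = 18_000_001, 18_090_000
region = downstream_region(gene_start, gene_end, "-", 1_000_000)
print(region)  # (17000001, 18000000)
```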

But the whole point of a gene is to carry the information needed to create a protein. First the gene’s DNA is transcribed into RNA, which is then processed into messenger RNA (mRNA). During this processing, much of the gene’s original sequence is spliced out. The sections that are kept for the mRNA are called “exons”, and the sections that are removed are called “introns”. There are also sections at the beginning and end of the mRNA, called untranslated regions (UTRs), that are carried in the mRNA but are not translated into protein.
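To make the exon and intron idea concrete, here is a small sketch in Python. The sequence and the exon coordinates are made up for illustration. Real genes are far longer, and this is not how Wolfram|Alpha computes its splicing structures.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a transcript, dropping the introns between them.

    `exons` is a list of (start, end) pairs using 0-based, end-exclusive
    indices, in transcript order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript: uppercase = exon, lowercase = intron (made up for illustration).
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAUUAA"
exons = [(0, 6), (18, 24)]

mature = splice(pre_mrna, exons)
print(mature)  # AUGGCCGAUUAA
```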

The “Gene splicing structures” pod shows the various patterns of introns and exons in the gene’s DNA sequence:

Human gene SATB1 coding sequence in Wolfram|Alpha

The mRNA is then translated into the final protein sequence. The name of the resulting protein and the protein’s molecular weight are also given:

Protein name and protein molecular weight for the gene SATB1
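The translation step reads the mRNA three bases at a time, with each triplet (codon) specifying one amino acid. A minimal Python sketch of that lookup, using the standard genetic code, might look like this. The input is a toy sequence, not SATB1, and the sketch does not attempt the molecular weight calculation.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in UCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA coding sequence into one-letter amino acid codes.

    Reading starts at the first base and stops at the first stop codon
    (or at the end of the sequence if there is none).
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCCGAUUAA"))  # MAD
```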

Although the vast majority of our DNA is identical from person to person, most genes contain a small number of single-base variations, called SNPs (single nucleotide polymorphisms), where the sequence tends to differ between people. In some cases, a disease can be linked to the presence of just a single one of these SNPs. Wolfram|Alpha shows us where the SNPs are on the gene, and what fraction of the people tested had the alternative nucleotide base in their DNA:

SNP locations and frequencies for the gene SATB1
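One common way to express such a fraction is the allele frequency: the share of all sampled chromosome copies that carry a given base at the SNP position. The Python sketch below assumes that definition, which may differ from the exact figure Wolfram|Alpha reports, and uses made-up genotypes.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return each allele's share of all chromosome copies sampled.

    `genotypes` is a list of two-letter strings, one per person,
    giving the base on each of their two chromosome copies.
    """
    counts = Counter(base for genotype in genotypes for base in genotype)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


# Made-up genotypes for five people at one SNP position.
sample = ["CC", "CT", "CC", "TT", "CT"]
freqs = allele_frequencies(sample)
print(freqs)  # {'C': 0.6, 'T': 0.4}
```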

And if we’re curious about any of these SNPs, we can easily get some further details:

Further details for a SNP in the gene SATB1

Wolfram|Alpha also tells us what the gene’s job in the cell is, by giving us its typical functions, the locations the protein is found in, and the cell processes it plays a role in. In this case, we learn that SATB1 is a DNA-binding protein that acts as a transcription factor in the cell nucleus, which means it regulates the expression of other genes:

Functions, locations, and cell processes for the gene SATB1

But how does this gene differ in other species, like the mouse? A gene in another species that descends from the same ancestral gene is called a “homolog”. Wolfram|Alpha shows us how SATB1’s protein sequence differs between species:

Protein sequence differences among homologs of the gene SATB1

In this case the start of the protein is highly conserved among species, with just a few small differences at the end. This tells us that evolution has made very few changes to the gene over millions of years. Compare this to the homologs for the gene BRCA1, which also plays an important role in breast cancer, where there are far more protein differences between species:

Protein sequence differences among homologs of the gene BRCA1
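A simple way to put a number on “how conserved” is percent identity: the share of aligned positions where two protein sequences carry the same amino acid. The Python sketch below assumes the sequences have already been aligned to the same length. The fragments are invented for illustration and are not real SATB1 or BRCA1 sequences.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions where two sequences carry the same residue.

    Both sequences must already be aligned to the same length; positions
    where either sequence has a gap ("-") are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Made-up aligned fragments that differ at one of ten positions.
species_a = "MKTAYIAKQR"
species_b = "MKTAYLAKQR"
print(percent_identity(species_a, species_b))  # 90.0
```

A highly conserved protein would score close to 100, while a faster-evolving one would score noticeably lower across the same pair of species.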

Notice that when we first asked for SATB1, Wolfram|Alpha offered a choice of species homologs:

Selecting homologs for different species in Wolfram|Alpha

So instead of the human gene, we could just as well have asked for the rat gene SATB1:

Wolfram|Alpha's results for the rat gene SATB1

We can also get a comparison between the two:

Comparing a Rat's SATB1 to a Human's SATB1 gene

Currently, Wolfram|Alpha has genetic information for 11 different species: human (Homo sapiens), mouse (Mus musculus), rat (Rattus norvegicus), fruit fly (Drosophila melanogaster), roundworm (Caenorhabditis elegans), yeast (Saccharomyces cerevisiae), thale cress (Arabidopsis thaliana), cow (Bos taurus), zebrafish (Danio rerio), chicken (Gallus gallus), and dog (Canis lupus familiaris).

Here are some example queries:

So Wolfram|Alpha can be a useful tool for anyone wanting to learn about genetics. Whether you’re a researcher doing a comparative genomics study, or just someone who’s fascinated by a gene and what it does, Wolfram|Alpha is a great place to start.

8 Comments

Do you have the ability to take a chromosome location such as 3p25 and show the genes that are known to be around that location? I was thinking that would be a helpful shortcut to have when you are reading medical literature that talks about copy number variations.

Thanks

Posted by Matt March 10, 2010 at 2:40 pm

    Hi Matt,

    I think being able to enter a chromosome locus and get genes (and potentially other features) in that region is a great suggestion. We should definitely make this work.

    Best regards,

    Paul-Jean

    Posted by Paul-Jean Letourneau March 10, 2010 at 7:05 pm

What about poor old Escherichia coli and his friends?
Could be fun to include them in the party,
thanks!

Posted by Bianca March 11, 2010 at 2:42 am

    Hi Bianca,

    Great suggestion. We’re already working on getting E. coli into the system ASAP.

    Best,

    Paul-Jean

    Posted by Paul-Jean Letourneau March 11, 2010 at 12:31 pm

Have you guys considered giving access to Wolfram|Alpha usage statistics via Wolfram|Alpha itself?

Something like “most popular query topics in Europe on February 27th”?

Posted by Elver March 11, 2010 at 5:11 am

I entered CYP450 and got back information on converting 450 Cyprus pounds. Not useful.

Posted by murphy March 12, 2010 at 2:49 pm

    We are currently working on adding gene families to Wolfram|Alpha. But you can ask for any member of the Cytochrome P450 family:

    gene CYP1B1
    gene CYP3A5
    mouse gene CYP21B

    etc.

    Posted by Paul-Jean Letourneau March 12, 2010 at 3:50 pm

Are you going to incorporate the newly published Neanderthal genome at any point? What about the Chimpanzee?

Posted by Kristian May 28, 2010 at 9:07 am