We’ve been working diligently for several years to build a vast repository of genetic data into Wolfram|Alpha. At launch time, we had the entire human genome and all known human genes. Now, Wolfram|Alpha has genetic data for 11 different species, from humans and mice to fruit flies and worms. And we’re working hard to get more species in all the time.
These days we’re hearing more and more about how particular genes work, what their functions are, and what happens if a gene becomes mutated and stops functioning correctly. And with the personal genomics movement in full swing, we can even get portions of our own genomes sequenced, with a report detailing for us which gene variants we have and whether any put us at known high risk for diseases like breast cancer, diabetes, or Parkinson’s disease.
Well, Wolfram|Alpha makes it really easy to get in-depth information about a gene that interests you.
Take for example the gene SATB1, which recent studies have shown is an important factor in breast cancer growth. Wolfram|Alpha gives you a number of results about this gene. The first information is the standard and alternate names the gene goes by, which are important if you want to look it up in the literature:
After that, Wolfram|Alpha tells us that this gene is on chromosome 3, locus p23, on the minus strand, starting at around 18 megabases along the chromosome. There is then a snippet of the gene’s actual DNA sequence, and we learn that the gene is about 90 kilobases (90,000 base pairs) long, with a picture showing which other genes are close by on the chromosome (in this case, PP1P and KCNH8):
Clicking “More” a few times in the “Nearby genes” pod lets us zoom out to see more of the gene’s surrounding environment on the chromosome:
In case we’re curious about the region just downstream from this gene, we can find out about it immediately with a query like this:
But the whole point of a gene is to carry the information needed to create a protein. First the gene’s DNA is transcribed into RNA, and the transcript is then spliced to produce the mature messenger RNA (mRNA). At this step, much of the gene’s original sequence is left behind. The sections that are kept for the mRNA are called “exons”, and the sections that are left behind are called “introns”. There are also sections at the beginning and end of the mRNA, called untranslated regions (UTRs), that are part of the mRNA but are not translated into protein.
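To make the exon and intron idea concrete, here is a minimal Python sketch of splicing. The sequence and the exon coordinates are invented for illustration (they are not taken from SATB1), and the 0-based, end-exclusive interval convention is simply an assumption of this example:

```python
def transcribe_and_splice(gene_dna, exons):
    """Join the exon intervals of a gene and return the spliced mRNA.

    gene_dna: the gene's sequence read 5' to 3' on its coding strand.
    exons: (start, end) pairs, 0-based and end-exclusive (assumed convention).
    """
    # Keep only the exons, in order along the gene; everything else is intron.
    spliced = "".join(gene_dna[start:end] for start, end in sorted(exons))
    # RNA uses uracil (U) where DNA uses thymine (T).
    return spliced.replace("T", "U")


# Toy gene: exon 1, a 12-base intron, exon 2 (all invented).
gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
exons = [(0, 6), (18, 24)]

mrna = transcribe_and_splice(gene, exons)
print(mrna)  # AUGGCCAAAUGA
```

A real gene such as SATB1 has many more exons, and the same DNA can often be spliced in more than one pattern.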
The “Gene splicing structures” pod shows the various patterns of introns and exons in the gene’s DNA sequence:
The mRNA is then translated into the final protein sequence. The name of the resulting protein and the protein’s molecular weight are also given:
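As a rough illustration of the translation step, the sketch below reads an mRNA three bases at a time using the standard genetic code. The input sequence is made up, and starting at the first AUG is a simplification assumed for this example; real start-site selection is more subtle:

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon.

    Expects an uppercase string of A, C, G and U only.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCAAAUGACC"))  # MAK
```

The two bases before the AUG and the bases after the stop codon are ignored here, which is how untranslated sequence at either end of an mRNA behaves.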
Although the vast majority of our DNA is identical from person to person, genes typically contain a small number of single-base positions, called SNPs (single nucleotide polymorphisms), where the sequence tends to be different between people. In some cases, a disease can be linked to the presence of just a single one of these SNPs. Wolfram|Alpha shows us where the SNPs are on the gene, and what fraction of the people tested had the alternative nucleotide base in their DNA:
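The fraction reported for a SNP is simple arithmetic. Here is a small Python sketch, assuming we have one observed base per tested sample at the SNP position; the observations are invented, and this is not a description of how Wolfram|Alpha computes its figures:

```python
from collections import Counter


def alternate_allele_fraction(observed_bases, reference_base):
    """Fraction of observations that differ from the reference base."""
    counts = Counter(observed_bases)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    return (total - counts[reference_base]) / total


# Ten hypothetical observations at one position: eight C, two T.
observed = list("CCCTCCTCCC")
print(alternate_allele_fraction(observed, "C"))  # 0.2
```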
And if we’re curious about any of these SNPs, we can easily get some further details:
Wolfram|Alpha also tells us what the gene’s job in the cell is, by giving us its typical functions, the locations the protein is found in, and the cell processes it plays a role in. In this case, we learn that SATB1 is a DNA-binding protein that acts as a transcription factor in the cell nucleus, which means it regulates the expression of other genes:
But how does this gene differ in other species, like the mouse? When a gene in another species descends from the same ancestral gene, it’s called a “homolog”. Wolfram|Alpha shows us how SATB1’s protein sequence differs between species:
In this case the start of the protein is highly conserved among species, with just a few small differences at the end. This tells us that evolution has made very few changes to the gene over millions of years. Compare this to the homologs for the gene BRCA1, which also plays an important role in breast cancer, where there are far more protein differences between species:
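One common way to put a number on "highly conserved" is percent identity between aligned protein sequences. The Python sketch below assumes the two sequences have already been aligned to the same length, with "-" marking gaps; the fragments are invented and are not real SATB1 or BRCA1 sequence:

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions where the two sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue  # a gap in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def differing_positions(seq_a, seq_b):
    """0-based positions where the aligned sequences disagree."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Hypothetical aligned fragments from two species.
species_1 = "MKTAYIAKQR"
species_2 = "MKTAYIAKHK"

print(percent_identity(species_1, species_2))     # 80.0
print(differing_positions(species_1, species_2))  # [8, 9]
```

In this toy pair the differences sit at the end of the fragment, which mirrors the pattern described for SATB1 above.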
Notice that when we first asked for SATB1, Wolfram|Alpha offered a choice of species homologs:
So instead of the human gene, we could just as well have asked for the rat gene SATB1:
We can also get a comparison between the two:
Currently, Wolfram|Alpha has genetic information for 11 different species: human (Homo sapiens), mouse (Mus musculus), rat (Rattus norvegicus), fruit fly (Drosophila melanogaster), roundworm (Caenorhabditis elegans), yeast (Saccharomyces cerevisiae), thale cress (Arabidopsis thaliana), cow (Bos taurus), zebrafish (Danio rerio), chicken (Gallus gallus), and dog (Canis lupus familiaris).
Here are some example queries:
- human gene HIF1A
- human gene MDM2 homologs
- fly gene cyce
- drosophila gene rho
- mouse trp53
- tnf gene rattus norvegicus
- r norvegicus gene drd1a
- zebrafish gene lcp1
- cmyb d rerio
- yeast CDC28
- cow gene NOS3
- arabidopsis HY5
- g gallus vcl
- dog gene KIT
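If you want to generate queries like these for a list of genes, a short Python helper can assemble the text and a link. The function name and its parameters are hypothetical, and the URL pattern (the query passed as the `i` parameter of the input page) is an assumption based on how the website's address bar looks, not a documented API:

```python
from urllib.parse import urlencode

# Assumed URL pattern for the Wolfram|Alpha website's input page.
BASE_URL = "https://www.wolframalpha.com/input"


def gene_query_url(gene, species=None, homologs=False):
    """Build a query such as 'human gene MDM2 homologs' and return a link to it."""
    parts = [species, "gene", gene] if species else ["gene", gene]
    if homologs:
        parts.append("homologs")
    query = " ".join(parts)
    # urlencode escapes the query text, turning spaces into '+'.
    return f"{BASE_URL}?{urlencode({'i': query})}"


print(gene_query_url("HIF1A", species="human"))
# https://www.wolframalpha.com/input?i=human+gene+HIF1A
print(gene_query_url("MDM2", species="human", homologs=True))
# https://www.wolframalpha.com/input?i=human+gene+MDM2+homologs
```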
So Wolfram|Alpha can be a useful tool for anyone wanting to learn about genetics. Whether you’re a researcher doing a comparative genomics study, or just someone who’s fascinated by a gene and what it does, Wolfram|Alpha is a great place to start.
Do you have the ability to take a chromosome location such as 3p25 and show the genes that are known to be around that location? I was thinking that would be a helpful shortcut to have when you are reading medical literature that talks about copy number variations.
Thanks
Hi Matt,
I think being able to enter a chromosome locus and get genes (and potentially other features) in that region is a great suggestion. We should definitely make this work.
Best regards,
Paul-Jean
What about poor old Escherichia coli and his friends?
Could be fun to include them in the party,
thanks!
Hi Bianca,
Great suggestion. We’re already working on getting E. coli into the system ASAP.
Best,
Paul-Jean
Have you guys considered giving access to Wolfram|Alpha usage statistics via Wolfram|Alpha itself?
Something like “most popular query topics in Europe on February 27th”?
I entered CYP450 and got back information on converting 450 Cyprus pounds. Not useful.
We are currently working on adding gene families to Wolfram|Alpha. But you can ask for any member of the Cytochrome P450 family:
gene CYP1B1
gene CYP3A5
mouse gene CYP21B
etc.
Are you going to incorporate the newly published Neanderthal genome at any point? What about the Chimpanzee?