Gene query service

This page is the reference for the MyGene.info gene query web service. We also recommend trying it live on our interactive API page.

Service endpoint

http://mygene.info/v2/query

GET request

Query parameters

q

Required, the user query to pass to the service. The detailed query syntax for the “q” parameter is explained below.

fields

Optional, a comma-separated list of fields that limits the fields returned for the matching gene hits. The supported field names can be found in any gene object (e.g. gene 1017). Dot notation is supported as well, e.g., you can pass “refseq.rna”. If “fields=all”, all available fields will be returned. Default: “symbol,name,taxid,entrezgene”.

species

Optional, can be used to limit the gene hits to the given species. You can use “common names” for nine common species (human, mouse, rat, fruitfly, nematode, zebrafish, thale-cress, frog and pig). For all other species, provide their taxonomy ids. See more details here. Multiple species can be passed using a comma as the separator. Passing “all” will query against all available species. Default: human,mouse,rat.

size

Optional, the maximum number of matching gene hits to return (with a cap of 1000 at the moment). Default: 10.

from

Optional, the number of matching gene hits to skip, starting from 0. Default: 0

Hint

The “size” and “from” parameters can be combined to page through a large result set:

q=cdk*&size=50                     first 50 hits
q=cdk*&size=50&from=50             the next 50 hits
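
The paging pattern above can be sketched in Python as a generator of query URLs. This is a minimal sketch, not part of the service: the helper name `page_urls` is hypothetical, and `total` is assumed to be known already (for example from the “total” value of a first response).

```python
from urllib.parse import urlencode

BASE_URL = "http://mygene.info/v2/query"  # service endpoint given on this page
MAX_SIZE = 1000                           # cap on "size" stated above


def page_urls(q, total, size=50):
    """Yield one query URL per page until `total` hits are covered.

    Hypothetical helper: it only builds URLs and sends no requests.
    """
    if not 1 <= size <= MAX_SIZE:
        raise ValueError("size must be between 1 and %d" % MAX_SIZE)
    for start in range(0, total, size):
        # "from" is the number of hits to skip, starting from 0
        yield BASE_URL + "?" + urlencode({"q": q, "size": size, "from": start})


urls = list(page_urls("cdk*", total=120, size=50))
```

Note that `urlencode` percent-encodes the wildcard, so “cdk*” is sent as “cdk%2A”; the service receives the same query after URL decoding.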

sort

Optional, a comma-separated list of fields to sort on. Prefix a field with “-” for descending order; otherwise it is sorted in ascending order. Default: sort by matching score in descending order.

facets

Optional, a single field or comma-separated fields to return facets, for example, “facets=taxid”, “facets=taxid,type_of_gene”. See examples of faceted queries here.

species_facet_filter

Optional, relevant when faceting on species (i.e., “facets=taxid” is passed). It is used to pass a species filter without changing the scope of faceting, so that the returned facet counts won’t change. Either a species name or a taxonomy id can be used, just like the “species” parameter above. See examples of faceted queries here.

entrezonly

Optional, when passed as “true” or “1”, the query returns only the hits with valid Entrez gene ids. Default: false.

ensemblonly

Optional, when passed as “true” or “1”, the query returns only the hits with valid Ensembl gene ids. Default: false.

callback

Optional, you can pass a “callback” parameter to make a JSONP call.

dotfield

Optional, can be used to control the format of the returned fields when the passed “fields” parameter contains dot notation, e.g. “fields=refseq.rna”. If “dotfield” is true, the returned data object contains a single “refseq.rna” field; otherwise, it contains a single “refseq” field with a sub-field of “rna”. Default: true.
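
To illustrate the two shapes described for “dotfield”, here is a small sketch of a reader that accepts either one. The helper name `get_field` and the two sample objects are illustrative assumptions, not actual service output.

```python
def get_field(hit, dotted_name):
    """Return the value of a dotted field such as "refseq.rna".

    Works for both shapes described above:
    - flat key   ("dotfield" true):  {"refseq.rna": ...}
    - nested key ("dotfield" false): {"refseq": {"rna": ...}}
    Returns None when the field is absent.
    """
    if dotted_name in hit:
        return hit[dotted_name]
    value = hit
    for part in dotted_name.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


# Illustrative objects only; real responses may carry more fields.
flat_hit = {"_id": "1017", "refseq.rna": ["NM_001798"]}
nested_hit = {"_id": "1017", "refseq": {"rna": ["NM_001798"]}}
```

With a reader like this, client code does not need to know which “dotfield” setting produced the response.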

filter

Alias for “fields” parameter.

limit

Alias for “size” parameter.

email

Optional, if you are a regular user of our services, we encourage you to provide an email address, so that we can better track usage or follow up with you.

Query syntax

Examples of query parameter “q”:

Simple queries

search for everything:

q=cdk2                              search in all fields
q=tumor suppressor                  query terms are combined with "AND" by default
q="cyclin-dependent kinase"         search for the phrase
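
When these queries are sent from code rather than typed into a browser, spaces and quotes in “q” need URL encoding. The following is a minimal sketch using only the Python standard library; `query_url` is a hypothetical helper name, and it only builds the URL without sending a request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://mygene.info/v2/query"  # service endpoint given on this page


def query_url(q, **params):
    """Build a GET URL for the query service with proper URL encoding."""
    return BASE_URL + "?" + urlencode({"q": q, **params})


# Phrase query: the double quotes and the space are percent/plus-encoded.
phrase_url = query_url('"cyclin-dependent kinase"', species="human")

# Decoding the URL again recovers the original query string.
decoded_q = parse_qs(urlsplit(phrase_url).query)["q"][0]
```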

Fielded queries

q=entrezgene:1017
q=symbol:cdk2
q=refseq:NM_001798

Available fields

Field              Description                                Examples
entrezgene         Entrez gene id                             q=entrezgene:1017
ensemblgene        Ensembl gene id                            q=ensemblgene:ENSG00000123374
symbol             official gene symbol                       q=symbol:cdk2
name               gene name                                  q=name:cyclin-dependent
alias              gene alias                                 q=alias:p33
summary            gene summary text                          q=summary:insulin
refseq             NCBI RefSeq id (both rna and proteins)     q=refseq:NM_001798
                                                              q=refseq:NP_439892
unigene            NCBI UniGene id                            q=unigene:Hs.19192
homologene         NCBI HomoloGene id                         q=homologene:74409
accession          NCBI GenBank Accession number              q=accession:AA810989
ensembltranscript  Ensembl transcript id                      q=ensembltranscript:ENST00000266970
ensemblprotein     Ensembl protein id                         q=ensemblprotein:ENSP00000243067
uniprot            UniProt id                                 q=uniprot:P24941
ipi (deprecated!)  IPI id                                     q=ipi:IPI00031681
pdb                PDB id                                     q=pdb:1AQ1
prosite            Prosite id                                 q=prosite:PS50011
pfam               PFam id                                    q=pfam:PF00069
interpro           InterPro id                                q=interpro:IPR008351
mim                OMIM id                                    q=mim:116953
pharmgkb           PharmGKB id                                q=pharmgkb:PA101
reporter           Affymetrix probeset id                     q=reporter:204252_at
reagent            GNF reagent id                             q=reagent:GNF282834
go                 Gene Ontology id                           q=go:0000307
hgnc               HUGO Gene Nomenclature Committee           q=hgnc:1771
hprd               Human Protein Reference Database           q=hprd:00310
mgi                Mouse Genome Informatics                   q=mgi:MGI\\:88339
rgd                Rat Genome Database                        q=rgd:620620
flybase            A Database of Drosophila Genes & Genomes   q=flybase:FBgn0004107&species=fruitfly
wormbase           C. elegans and related nematodes database  q=wormbase:WBGene00057218&species=31234
zfin               Zebrafish Information Network              q=zfin:ZDB-GENE-980526-104&species=zebrafish
tair               Arabidopsis Information Resource           q=tair:AT3G48750&species=thale-cress
xenbase            Xenopus laevis and Xenopus tropicalis      q=xenbase:XB-GENE-1001990&species=frog
                   biology and genomics resource
mirbase            database of published miRNA                q=mirbase:MI0017267
                   sequences and annotation
retired            Retired Entrez gene id, including          q=retired:84999
                   those with replaced gene ids.

Genome interval query

When we detect that your query (“q” parameter) contains a genome interval pattern like this one:

chrX:151,073,054-151,383,976

we will run a genome interval query for you. Besides the interval string above, you also need to specify the “species” parameter (human by default). These are all accepted queries:

q=chrX:151073054-151383976&species=9606
q=chrX:151,073,054-151,383,976&species=human

Hint

As you can see above, genomic locations can include commas.
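
On the client side, an interval string of this form can be validated before it is sent. The sketch below is an assumption throughout: the regular expression is written only to match the examples above, and the detection rule the service actually uses is not documented on this page.

```python
import re

# Hypothetical pattern modelled on the examples above (e.g. chrX:151,073,054-151,383,976).
# It is NOT the service's own detection rule.
INTERVAL_RE = re.compile(r"^chr(\w+):(\d[\d,]*)-(\d[\d,]*)$", re.IGNORECASE)


def parse_interval(q):
    """Return (chromosome, start, end) for an interval string, or None.

    Commas inside the coordinates are accepted and stripped.
    """
    match = INTERVAL_RE.match(q.strip())
    if match is None:
        return None
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


with_commas = parse_interval("chrX:151,073,054-151,383,976")
without_commas = parse_interval("chrX:151073054-151383976")
```

Both spellings parse to the same coordinates, which mirrors the statement that commas are optional.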

Wildcard queries

The wildcard characters “*” and “?” are supported in both simple queries and fielded queries:

q=CDK?                              single character wildcard
q=symbol:CDK?                       single character wildcard within "symbol" field
q=IL*R                              multiple character wildcard

Note

A wildcard character cannot be the first character of a query term. A leading wildcard will be ignored.

Boolean operators and grouping

You can use AND/OR/NOT boolean operators and grouping to form complicated queries:

q=tumor AND suppressor                        AND operator
q=CDK2 OR BTK                                 OR operator
q="tumor suppressor" NOT receptor             NOT operator
q=(interleukin OR insulin) AND receptor       the use of parentheses
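
If you assemble such queries programmatically, a few string helpers are enough. This is a minimal sketch; the helper names `quote_phrase`, `any_of` and `all_of` are hypothetical, and the result still needs URL encoding before it is placed in a request.

```python
def quote_phrase(term):
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return '"%s"' % term if " " in term else term


def any_of(*terms):
    """Join terms with OR and group them with parentheses."""
    return "(" + " OR ".join(quote_phrase(t) for t in terms) + ")"


def all_of(*clauses):
    """Join already-built clauses with AND."""
    return " AND ".join(clauses)


grouped = all_of(any_of("interleukin", "insulin"), "receptor")
with_phrase = any_of("tumor suppressor", "CDK2")
```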

Returned object

A GET request like this:

http://mygene.info/v2/query?q=symbol:cdk2

should return hits as:

{
  "hits": [
    {
      "name": "cyclin-dependent kinase 2",
      "_score": 87.76775,
      "symbol": "CDK2",
      "taxid": 9606,
      "entrezgene": 1017,
      "_id": "1017"
    },
    {
      "name": "cyclin-dependent kinase 2",
      "_score": 79.480484,
      "symbol": "Cdk2",
      "taxid": 10090,
      "entrezgene": 12566,
      "_id": "12566"
    },
    {
      "name": "cyclin dependent kinase 2",
      "_score": 62.286797,
      "symbol": "Cdk2",
      "taxid": 10116,
      "entrezgene": 362817,
      "_id": "362817"
    }
  ],
  "total": 3,
  "max_score": 87.76775,
  "took": 4
}
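
As a sketch of how such a response might be consumed in Python, the snippet below parses a copy of the sample above with the standard `json` module. The helper name `entrez_ids_by_taxid` is hypothetical, and it assumes every hit carries “taxid” and “entrezgene” (the default fields); hits without an Entrez gene id would need extra handling.

```python
import json

# Copy of the sample response shown above.
response_text = """
{
  "hits": [
    {"name": "cyclin-dependent kinase 2", "_score": 87.76775, "symbol": "CDK2",
     "taxid": 9606, "entrezgene": 1017, "_id": "1017"},
    {"name": "cyclin-dependent kinase 2", "_score": 79.480484, "symbol": "Cdk2",
     "taxid": 10090, "entrezgene": 12566, "_id": "12566"},
    {"name": "cyclin dependent kinase 2", "_score": 62.286797, "symbol": "Cdk2",
     "taxid": 10116, "entrezgene": 362817, "_id": "362817"}
  ],
  "total": 3,
  "max_score": 87.76775,
  "took": 4
}
"""


def entrez_ids_by_taxid(response):
    """Map each hit's taxonomy id to its Entrez gene id."""
    return {hit["taxid"]: hit["entrezgene"] for hit in response["hits"]}


response = json.loads(response_text)
ids = entrez_ids_by_taxid(response)
# The best-scoring hit is the one whose "_score" equals "max_score".
top_hit = max(response["hits"], key=lambda hit: hit["_score"])
```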

Faceted queries

If you need to perform a faceted query, you can pass an optional “facets” parameter. For example, if you want to get the facets on species, you can pass “facets=taxid”:

A GET request like this:

http://mygene.info/v2/query?q=cdk2&size=1&facets=taxid

should return hits as:

{
  "hits":[
    {
      "entrezgene":1017,
      "name":"cyclin-dependent kinase 2",
      "_score":400.43347,
      "symbol":"CDK2",
      "_id":"1017",
      "taxid":9606
    }
  ],
  "total":26,
  "max_score":400.43347,
  "took":7,
  "facets":{
    "taxid":{
      "_type":"terms",
      "total":26,
      "terms":[
        {
          "count":14,
          "term":9606
        },
        {
          "count":7,
          "term":10116
        },
        {
          "count":5,
          "term":10090
        }
      ],
      "other":0,
      "missing":0
    }
  }
}

Another useful field to get facets on is “type_of_gene”:

http://mygene.info/v2/query?q=cdk2&size=1&facets=type_of_gene

It should return hits as:

{
  "hits":[
    {
      "entrezgene":1017,
      "name":"cyclin-dependent kinase 2",
      "_score":400.43347,
      "symbol":"CDK2",
      "_id":"1017",
      "taxid":9606
    }
  ],
  "total":26,
  "max_score":400.43347,
  "took":97,
  "facets":{
    "type_of_gene":{
      "_type":"terms",
      "total":26,
      "terms":[
        {
          "count":20,
          "term":"protein-coding"
        },
        {
          "count":6,
          "term":"pseudo"
        }
      ],
      "other":0,
      "missing":0
    }
  }
}

If you need to, you can also pass multiple fields as a comma-separated list:

http://mygene.info/v2/query?q=cdk2&size=1&facets=taxid,type_of_gene
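
The facet blocks shown above can be reduced to plain term-to-count mappings. The sketch below assumes the response has already been decoded from JSON into a Python dict; `facet_counts` is a hypothetical helper name, and the sample data is copied from the “facets=taxid” example.

```python
def facet_counts(response):
    """Return {facet_field: {term: count}} for every facet in a response."""
    return {
        field: {entry["term"]: entry["count"] for entry in facet["terms"]}
        for field, facet in response.get("facets", {}).items()
    }


# Facet part of the "facets=taxid" sample response shown above.
sample = {
    "total": 26,
    "facets": {
        "taxid": {
            "_type": "terms",
            "total": 26,
            "terms": [
                {"count": 14, "term": 9606},
                {"count": 7, "term": 10116},
                {"count": 5, "term": 10090},
            ],
            "other": 0,
            "missing": 0,
        }
    },
}

counts = facet_counts(sample)
```

The same function handles a multi-field request such as “facets=taxid,type_of_gene”, since it iterates over whatever keys appear under “facets”.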

Particularly relevant to species facets (i.e., “facets=taxid”), you can pass a “species_facet_filter” parameter to filter the returned hits on a given species, without changing the scope of the facets (i.e. facet counts will not change). This is useful when you need to get the subset of the hits for a given species after the initial faceted query on species.

You can see that different “hits” are returned by the following queries, while the “facets” stay the same:

http://mygene.info/v2/query?q=cdk?&size=1&facets=taxid&species_facet_filter=human

vs.

http://mygene.info/v2/query?q=cdk?&size=1&facets=taxid&species_facet_filter=mouse

Batch queries via POST

Although simple GET requests to our gene query service are sufficient for most use cases, in some cases it is more efficient to make queries in a batch (e.g., retrieving gene annotation for multiple genes). For these cases, you can make batch queries via POST requests:

URL: http://mygene.info/v2/query
HTTP method:  POST

Query parameters

q

Required, multiple query terms separated by commas (“+” or whitespace is also supported), but no wildcards, e.g., ‘q=1017,1018’ or ‘q=CDK2+BTK’.

scopes

Optional, specify one or more fields (separated by commas) as the search “scopes”, e.g., “scopes=entrezgene”, “scopes=entrezgene,ensemblgene”. The available fields that can be passed to the “scopes” parameter are listed above. Default: “scopes=entrezgene,ensemblgene,retired” (either Entrez or Ensembl gene ids).

species

Optional, can be used to limit the gene hits to the given species. You can use “common names” for nine common species (human, mouse, rat, fruitfly, nematode, zebrafish, thale-cress, frog and pig). For all other species, provide their taxonomy ids. See more details here. Multiple species can be passed using a comma as the separator. Default: human,mouse,rat.

fields

Optional, a comma-separated list of fields that limits the fields returned for the matching gene hits. The supported field names can be found in any gene object (e.g. gene 1017). Dot notation is supported as well, e.g., you can pass “refseq.rna”. If “fields=all”, all available fields will be returned. Default: “symbol,name,taxid,entrezgene”.

dotfield

Optional, can be used to control the format of the returned fields when the passed “fields” parameter contains dot notation, e.g. “fields=refseq.rna”. If “dotfield” is true, the returned data object contains a single “refseq.rna” field; otherwise, it contains a single “refseq” field with a sub-field of “rna”. Default: true.

email

Optional, if you are a regular user of our services, we encourage you to provide an email address, so that we can better track usage or follow up with you.

Example code

Unlike GET requests, which you can easily test from a browser, POST requests are usually made from code. Here is a sample Python snippet:

import httplib2

h = httplib2.Http()
# Batch queries are sent as a form-encoded POST body
headers = {'content-type': 'application/x-www-form-urlencoded'}
params = 'q=1017,1018&scopes=entrezgene'
# res holds the response headers, con the response body
res, con = h.request('http://mygene.info/v2/query', 'POST', params, headers=headers)
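
If httplib2 is not available, an equivalent request can be prepared with the Python 3 standard library alone. This is a sketch under assumptions: `build_batch_request` is a hypothetical helper name, and the code only constructs the request object; sending it with `urllib.request.urlopen(req)` is left to the caller.

```python
from urllib.parse import urlencode
from urllib.request import Request

BATCH_URL = "http://mygene.info/v2/query"  # POST endpoint given on this page


def build_batch_request(terms, scopes, **extra):
    """Prepare (but do not send) a batch query POST request.

    `terms` and `scopes` are joined with commas, as described above;
    `extra` can carry other parameters such as fields or species.
    """
    body = urlencode({
        "q": ",".join(str(term) for term in terms),
        "scopes": ",".join(scopes),
        **extra,
    })
    return Request(
        BATCH_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_batch_request([1017, 1018], ["entrezgene"])
```

Here `urlencode` writes the comma between terms as “%2C”, which is the standard form encoding of the same parameter value.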

Returned object

The returned result (the value of the “con” variable above) from the example code should look like this:

[
  {
    "name": "cyclin-dependent kinase 2",
    "symbol": "CDK2",
    "taxid": 9606,
    "entrezgene": 1017,
    "query": "1017",
    "_id": "1017"
  },
  {
    "name": "cyclin-dependent kinase 3",
    "symbol": "CDK3",
    "taxid": 9606,
    "entrezgene": 1018,
    "query": "1018",
    "_id": "1018"
  }
]

Tip

The “query” field in each returned object indicates the matching query term.

If a query term has no match, it is returned with the “notfound” field set to “true”:

params = 'q=1017,dummy&scopes=entrezgene'
res, con = h.request('http://mygene.info/v2/query', 'POST', params, headers=headers)
[
  {
    "name": "cyclin-dependent kinase 2",
    "symbol": "CDK2",
    "taxid": 9606,
    "entrezgene": 1017,
    "query": "1017",
    "_id": "1017"
  },
  {
    "query": "dummy",
    "notfound": true
  }
]

If a query term has multiple matches, they will be included with the same “query” field:

params = 'q=tp53,1017&scopes=symbol,entrezgene'
res, con = h.request('http://mygene.info/v2/query', 'POST', params, headers=headers)
[
  {
    "name": "tumor protein p53",
    "symbol": "TP53",
    "taxid": 9606,
    "entrezgene": 7157,
    "query": "tp53",
    "_id": "7157"
  },
  {
    "name": "tumor protein p53",
    "symbol": "Tp53",
    "taxid": 10116,
    "entrezgene": 24842,
    "query": "tp53",
    "_id": "24842"
  },
  {
    "name": "cyclin-dependent kinase 2",
    "symbol": "CDK2",
    "taxid": 9606,
    "entrezgene": 1017,
    "query": "1017",
    "_id": "1017"
  }
]
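
Since a batch result mixes single matches, multiple matches and “notfound” entries, it is often convenient to regroup it by query term. The sketch below assumes the response body has already been decoded from JSON into a list of dicts; `group_by_query` is a hypothetical helper name, and the sample list is a trimmed combination of the examples above.

```python
from collections import defaultdict


def group_by_query(results):
    """Split a batch result into ({query: [gene ids]}, [unmatched queries])."""
    matches = defaultdict(list)
    missing = []
    for item in results:
        if item.get("notfound"):
            missing.append(item["query"])
        else:
            matches[item["query"]].append(item["_id"])
    return dict(matches), missing


# Trimmed objects taken from the examples above.
sample_results = [
    {"symbol": "TP53", "taxid": 9606, "query": "tp53", "_id": "7157"},
    {"symbol": "Tp53", "taxid": 10116, "query": "tp53", "_id": "24842"},
    {"symbol": "CDK2", "taxid": 9606, "query": "1017", "_id": "1017"},
    {"query": "dummy", "notfound": True},
]

matched, unmatched = group_by_query(sample_results)
```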