API

MetaSRA provides an API for programmatic access to the current version of the database, with two resources: samples and terms.

If you need to download the whole database, or need an older version of MetaSRA, see the download page.

Samples resource

Access samples as JSON by fetching http://metasra.biostat.wisc.edu/api/v01/samples.json?, or fetch CSV with http://metasra.biostat.wisc.edu/api/v01/samples.csv?.

For downloading SRA data with the SRA Toolkit, you can fetch a text file with one run ID per line with http://metasra.biostat.wisc.edu/api/v01/runs.ids.txt?, or a CSV file with one run per line and accompanying metadata with http://metasra.biostat.wisc.edu/api/v01/runs.csv?. See here for instructions on using these files to download sequence data from SRA and processed expression data from Recount2.

Parameters

You can use any combination of the following query-string arguments to filter samples:

study Filter samples by this SRA study ID.
Required: provide a value for study, for and, or for both.
and Given a comma-separated list of ontology term IDs (see below), return only samples that match all of the terms.
Required: provide a value for study, for and, or for both.
not Given a comma-separated list of ontology term IDs (see below), return only samples that do not match any of the terms.
sampletype Show only samples matching this computationally-predicted sample type. Valid options are cell line, tissue, primary cells, stem cells, in vitro differentiated cells, and iPS cells. The server will accept any of the following as equivalent: primary cells, primary+cells, or primary%20cells.
limit Limit the results to this many studies.
skip Skip this many studies (useful for paging in combination with limit).
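
As a minimal sketch of how these arguments combine into a request URL, the Python below assembles a query string with the standard library. The helper name build_samples_url and its keyword names are illustrative and not part of MetaSRA; the term IDs are the placeholder IDs used in the sample output on this page. It only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

BASE = "http://metasra.biostat.wisc.edu/api/v01/"


def build_samples_url(study=None, and_terms=(), not_terms=(), sampletype=None,
                      limit=None, skip=None, resource="samples.json"):
    """Build a samples query URL (hypothetical helper, not a MetaSRA function)."""
    # The documentation requires a value for study and/or and.
    if not study and not and_terms:
        raise ValueError("provide a value for study and/or and")
    params = {}
    if study:
        params["study"] = study
    if and_terms:
        params["and"] = ",".join(and_terms)
    if not_terms:
        params["not"] = ",".join(not_terms)
    if sampletype:
        params["sampletype"] = sampletype  # a space is encoded as "+"
    if limit is not None:
        params["limit"] = limit
    if skip is not None:
        params["skip"] = skip
    # Keep ":" and "," unescaped so term ID lists stay readable in the URL.
    return BASE + resource + "?" + urlencode(params, safe=":,")


print(build_samples_url(and_terms=["UBERON:0003100", "DOID:14566"],
                        sampletype="primary cells", limit=10))
```

Passing resource="samples.csv", "runs.ids.txt" or "runs.csv" targets the other endpoints listed above, assuming they accept the same arguments.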

Examples
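
A sketch of paging with limit and skip, assuming you have already read studyCount from a first JSON response (see Results). The function name page_urls and the page size of 10 are illustrative choices, not MetaSRA conventions.

```python
from urllib.parse import urlencode

SAMPLES_JSON = "http://metasra.biostat.wisc.edu/api/v01/samples.json"


def page_urls(and_terms, study_count, page_size=10):
    """Yield one URL per page of studies; limit and skip both count studies."""
    for skip in range(0, study_count, page_size):
        query = urlencode(
            {"and": ",".join(and_terms), "limit": page_size, "skip": skip},
            safe=":,",
        )
        yield SAMPLES_JSON + "?" + query


# With 24 matching studies and 10 per page, the skip values are 0, 10 and 20.
for url in page_urls(["UBERON:0003100"], study_count=24):
    print(url)
```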

Results

Returned CSV files have one row per sample, with the following fields:

study_id SRA study ID
study_title Study title
sample_id SRA sample ID
sample_name Sample name from SRA metadata; not all samples have one.
sample_type Computationally-predicted sample type
sample_type_confidence Sample type confidence
mapped_ontology_ids An ID for each of the most-specific ontology terms mapped to this sample (see note below), comma-separated
mapped_ontology_terms Term name for each of the most-specific ontology terms mapped to this sample (see note below), comma-separated
raw_SRA_metadata Raw SRA metadata for this sample, except blacklisted fields (see note below), as semicolon-separated "key: value" pairs.
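
The multi-valued columns need a second parsing step after the CSV is read. The sketch below splits mapped_ontology_ids on commas and raw_SRA_metadata into a dictionary. The helper name parse_row and the sample text are made up for illustration and use only a subset of the columns; the sketch also assumes that metadata values contain no semicolons, which real data may not guarantee.

```python
import csv
import io


def parse_row(row):
    """Split the multi-valued CSV fields of one sample row into Python values."""
    ids = [t.strip() for t in row["mapped_ontology_ids"].split(",") if t.strip()]
    metadata = {}
    for pair in row["raw_SRA_metadata"].split(";"):
        # Split on the first ":" only, so values may themselves contain colons.
        key, sep, value = pair.partition(":")
        if sep:  # skip fragments that have no "key: value" separator
            metadata[key.strip()] = value.strip()
    return {"sample_id": row["sample_id"], "term_ids": ids, "metadata": metadata}


# Made-up CSV text in the documented column style (not real MetaSRA data).
SAMPLE_CSV = (
    "study_id,sample_id,sample_type,mapped_ontology_ids,raw_SRA_metadata\n"
    'SRP012345,SRS0123456,tissue,"UBERON:0003100,DOID:14566","tissue: lung; sex: female"\n'
)

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(rows[0])
```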


Returned JSON files have this shape:

{
  studyCount: 24,        // The total number of studies matching your search (not accounting for limit and skip)
  sampleCount: 170,      // Total number of samples matching your search (not accounting for limit and skip)


  terms: [               // Common ontology terms for samples in your search, roughly sorted by frequency
      ...
      {
          sampleCount: 34,                // A rough count of matching samples (not counting descendant terms, see note below)
          dterm: {
              name: "female organism",    // Term name
              ids: ["UBERON:0003100"]     // List of ID's for this term in one or more ontologies
          }
      }
      ...
  ],



  studies: [            // Matching samples are grouped by study
      {
          study: {
              title: "My super fantastic study",
              id: "SRP012345"           // SRA study ID
          },

          sampleCount: 22,              // Number of samples from this study that match your search

          dterms: [                     // All matching terms for samples in this study (see note below)
              {
                  name: "Brodmann (1909) area 11",
                  ids: ["UBERON:0013528"],
              }
              ...
          ],


          sampleGroups: [       // Samples in each study are grouped by their raw SRA attributes, all being the same
                                // except for a blacklist of ID-like fields (see note below) which can vary.
              {
                  samples: [                    // List of samples in this group
                      {
                          id: "SRS0123456",     // SRA sample ID
                          name: "My sample",    // Not all samples have a name
                          experiments: [        // Associated SRA experiment and run ID's
                              {
                                  id: "SRX0123456",
                                  runs: ["SRR0123456", "SRR0123457", ...]
                              },
                              ...
                          ]
                      }
                      ...
                  ],

                  attr: [
                      ["tissue", "lung"],       // [key, value] for raw SRA metadata fields for these samples, excluding blacklist (see note below)
                      ...
                  ],

                  type: {
                      type: "tissue",           // Sample type, computationally predicted from sample attributes
                      conf: 0.9445349           // Sample type confidence
                  },

                  dterms: [                     // Most-specific terms for these samples (see note below)
                      {
                          name: "disease of cellular proliferation",
                          ids: ["DOID:14566"]
                      }
                      ...
                  ]
              }
              ...
          ]
      }
      ...
  ]
}
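
To show how the nesting above is traversed, here is a small sketch that flattens a parsed response into (study, sample, run) triples, for example to build a run list. It assumes the real response is valid JSON whose keys match the names shown above; iter_runs is a hypothetical helper, and the example dictionary is a hand-written stand-in that reuses the placeholder IDs from the shape listing.

```python
def iter_runs(result):
    """Yield (study_id, sample_id, run_id) for every run in a parsed response."""
    for entry in result.get("studies", []):
        study_id = entry["study"]["id"]
        for group in entry.get("sampleGroups", []):
            for sample in group.get("samples", []):
                for experiment in sample.get("experiments", []):
                    for run_id in experiment.get("runs", []):
                        yield study_id, sample["id"], run_id


# Hand-written stand-in for json.loads(response_body); not real MetaSRA output.
example = {
    "studies": [{
        "study": {"title": "My super fantastic study", "id": "SRP012345"},
        "sampleGroups": [{
            "samples": [{
                "id": "SRS0123456",
                "experiments": [
                    {"id": "SRX0123456", "runs": ["SRR0123456", "SRR0123457"]},
                ],
            }],
        }],
    }],
}

for row in iter_runs(example):
    print(row)
```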

Note on terms: Ontology terms are hierarchical: e.g. "lung disease" is a descendant of "disease." When you search on a term, MetaSRA includes matches to all of its more-specific descendants, so a search for "disease" returns samples that are labeled with "lung disease". For brevity, however, the results show only a sample's most-specific terms, i.e. those with no descendants among the terms mapped to that sample: a sample labeled with "lung disease" shows "lung disease" in the results but not "disease".
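
The "most-specific terms" rule can be illustrated with a toy hierarchy. This is a sketch of the idea only, not MetaSRA's implementation; the function name and the ancestors mapping (term name to the set of all its ancestors) are assumptions made for the example.

```python
def most_specific(terms, ancestors):
    """Drop every term that is an ancestor of another term in the same set."""
    terms = set(terms)
    implied = set()
    for term in terms:
        # Every ancestor of a term in the set is implied by that term.
        implied |= ancestors.get(term, set())
    return terms - implied


# Toy hierarchy from the note above: "lung disease" descends from "disease".
ANCESTORS = {"lung disease": {"disease"}, "disease": set()}

print(most_specific({"disease", "lung disease"}, ANCESTORS))
```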

Note on attributes and sampleGroups: MetaSRA excludes some raw SRA attributes using a blacklist. There is inconsistency in how the fields are used, but the blacklisted fields are generally ID fields that carry no information characterizing the sample. Excluding them keeps ID fields from breaking up groups when samples are grouped by like attributes (sampleGroups are present in the JSON files, but not in the CSVs). You can view the blacklist at the top of this file.
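
The effect of the blacklist on grouping can be sketched as follows. This illustrates the idea rather than MetaSRA's own code: group_samples, the sample IDs, and the blacklisted field name submitter_id are all hypothetical.

```python
from collections import defaultdict


def group_samples(samples, blacklist):
    """Group sample IDs by their attributes, ignoring blacklisted keys."""
    groups = defaultdict(list)
    for sample_id, attrs in samples.items():
        # Sorting the remaining pairs gives an order-independent, hashable key.
        key = tuple(sorted((k, v) for k, v in attrs.items() if k not in blacklist))
        groups[key].append(sample_id)
    return dict(groups)


# Made-up samples: the first two differ only in an ID-like field.
samples = {
    "SRS0000001": {"tissue": "lung", "submitter_id": "a1"},
    "SRS0000002": {"tissue": "lung", "submitter_id": "b2"},
    "SRS0000003": {"tissue": "liver", "submitter_id": "c3"},
}

for attrs, ids in group_samples(samples, blacklist={"submitter_id"}).items():
    print(attrs, ids)
```

Without the blacklist, each of the three samples would land in its own group, because the ID-like field differs for every sample.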

Terms resource

To query ontology terms used by MetaSRA, fetch this URL as JSON: http://metasra.biostat.wisc.edu/api/v01/terms?.

This resource only returns terms that are associated with at least one sample in MetaSRA.

Parameters

You can use any combination of these arguments to filter terms:

q Search string. Return terms whose names are similar to this string, sorted by relevance.
ids Comma-separated list of ontology term IDs. Return terms matching any of these IDs.
limit Return at most this many terms. The limit cannot exceed 500 and defaults to 500 if none is provided.

Examples
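
A minimal sketch of building a terms query in Python. The helper name build_terms_url is illustrative. Clamping limit to 500 on the client side is a choice made here, since this page does not say how the server responds to a larger value; the sketch only builds the URL and sends no request.

```python
from urllib.parse import quote, urlencode

TERMS_URL = "http://metasra.biostat.wisc.edu/api/v01/terms"
MAX_LIMIT = 500  # documented maximum, and the default when limit is omitted


def build_terms_url(q=None, ids=(), limit=None):
    """Build a terms query URL (hypothetical helper, not a MetaSRA function)."""
    params = {}
    if q:
        params["q"] = q
    if ids:
        params["ids"] = ",".join(ids)
    if limit is not None:
        params["limit"] = min(limit, MAX_LIMIT)  # client-side clamp (assumption)
    # quote_via=quote encodes spaces as %20; ":" and "," are left unescaped.
    return TERMS_URL + "?" + urlencode(params, safe=":,", quote_via=quote)


print(build_terms_url(q="frontal bone", limit=20))
print(build_terms_url(ids=["UBERON:0000209", "UBERON:0001474"]))
```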

Results

Term results are shaped like this:
{
  terms: [
    {
        name: "tetrapod frontal bone",                            // Term name
        ids: ["UBERON:0000209"],                                  // List of ID's for this term in one or more ontologies
        syn: "frontal, frontal bone, os frontal, os frontale",    // Comma-separated list of synonyms

        ancestors: [                          // Jumble of less-specific (ancestor) related terms (at radius one or two)
            {
                name: "dermal bone",
                ids: ["UBERON:0001474"]
            }
            ...
        ],
        descendants: [...]                    // List of more-specific (descendant) related terms (at radius one or two)
    }
    ...
  ]
}
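
As a usage sketch, the function below turns a parsed terms response into a lookup from each ontology ID to the term name plus its synonyms. It assumes the response is valid JSON with the keys shown above and that syn may be missing or empty; labels_by_id is a hypothetical helper, and the example dictionary is copied from the shape listing rather than from a live response.

```python
def labels_by_id(result):
    """Map each ontology ID to [name, synonym, ...] for the returned terms."""
    lookup = {}
    for term in result.get("terms", []):
        # syn is one comma-separated string; treat a missing value as empty.
        synonyms = [s.strip() for s in (term.get("syn") or "").split(",") if s.strip()]
        for term_id in term.get("ids", []):
            lookup[term_id] = [term["name"]] + synonyms
    return lookup


# Stand-in for json.loads(response_body), taken from the shape shown above.
example = {
    "terms": [{
        "name": "tetrapod frontal bone",
        "ids": ["UBERON:0000209"],
        "syn": "frontal, frontal bone, os frontal, os frontale",
    }],
}

print(labels_by_id(example))
```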