MetaSRA: API

API

MetaSRA provides an API for programmatic access to the current version of the database, with two resources: samples and terms.

If you need to download the whole database, or need an older version of MetaSRA, see the

Samples resource

Fetch samples as JSON from http://metasra.biostat.wisc.edu/api/v01/samples.json? or as CSV from http://metasra.biostat.wisc.edu/api/v01/samples.csv?, appending the query-string parameters described below.

For downloading SRA data with the SRA Toolkit, you can fetch a text file with one run ID per line with http://metasra.biostat.wisc.edu/api/v01/runs.ids.txt?, or a CSV file with one run per line and accompanying metadata with http://metasra.biostat.wisc.edu/api/v01/runs.csv?. See for instructions on using these files to download sequence data from SRA and processed expression data from Recount2.
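Below is a minimal Python sketch of fetching the run-ID list. It assumes the response body is UTF-8 text with one run ID per line, as described above. The helper names (`parse_run_ids`, `runs_ids_url`, `fetch_run_ids`) are hypothetical and not part of MetaSRA.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the endpoints on this page.
BASE = "http://metasra.biostat.wisc.edu/api/v01"


def parse_run_ids(text):
    """Split a runs.ids.txt body (one run ID per line) into a list, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def runs_ids_url(filters):
    """Build a runs.ids.txt URL from a dict of query-string filters.

    ':' and ',' are left unescaped so term IDs stay readable, as in the examples.
    """
    return f"{BASE}/runs.ids.txt?{urlencode(filters, safe=':,')}"


def fetch_run_ids(filters):
    """Download runs.ids.txt for the given filters and return the run IDs (needs network).

    Assumes the body is UTF-8 encoded.
    """
    with urlopen(runs_ids_url(filters)) as resp:
        return parse_run_ids(resp.read().decode("utf-8"))
```

For example, `fetch_run_ids({"study": "SRP055569"})` would return the run IDs for that study. The filters are passed as a dict because `and` is a reserved word in Python and cannot be used as a keyword argument.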

Parameters

You can use any combination of the following query-string arguments to filter samples:

`study`	Filter samples by this SRA study ID. Required: you must provide a value for `study` and/or `and`.
`and`	Given a comma-separated list of ontology term IDs (see below), return only samples that match all of the terms. Required: you must provide a value for `study` and/or `and`.
`not`	Given a comma-separated list of ontology term IDs (see below), return only samples that do not match any of the terms.
`sampletype`	Show only samples matching this computationally predicted sample type. Valid options are `cell line`, `tissue`, `primary cells`, `stem cells`, `in vitro differentiated cells`, and `iPS cells`. The server accepts any of the following as equivalent: `primary cells`, `primary+cells`, or `primary%20cells`.
`species`	Filter samples by species. Valid options are `human` and `mouse`.
`assay`	Filter samples by assay type. Valid options are `RNA-seq` and `ChIP-seq`.
`limit`	Limit the results to this many studies (not samples).
`skip`	Skip this many studies (useful for paging in combination with `limit`).
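The parameter rules above can be checked on the client before a request is sent. This Python sketch builds a samples URL and rejects values outside the documented options. The function name and its `and_terms`/`not_terms` arguments are hypothetical; they are renamed because `and` and `not` are Python keywords. The sketch also assumes the server decodes standard plus/percent encoding, which the `sampletype` note above suggests.

```python
from urllib.parse import urlencode

BASE = "http://metasra.biostat.wisc.edu/api/v01"

# Valid values as documented in the parameter table.
SAMPLE_TYPES = {"cell line", "tissue", "primary cells", "stem cells",
                "in vitro differentiated cells", "iPS cells"}
SPECIES = {"human", "mouse"}
ASSAYS = {"RNA-seq", "ChIP-seq"}


def build_samples_url(fmt="json", study=None, and_terms=(), not_terms=(),
                      sampletype=None, species=None, assay=None,
                      limit=None, skip=None):
    """Build a samples.json or samples.csv query URL (hypothetical helper)."""
    if fmt not in ("json", "csv"):
        raise ValueError("fmt must be 'json' or 'csv'")
    # The API requires `study` and/or `and`.
    if not study and not and_terms:
        raise ValueError("provide study and/or at least one 'and' term")
    for value, allowed, name in ((sampletype, SAMPLE_TYPES, "sampletype"),
                                 (species, SPECIES, "species"),
                                 (assay, ASSAYS, "assay")):
        if value is not None and value not in allowed:
            raise ValueError(f"invalid {name}: {value!r}")
    params = {
        "study": study,
        "and": ",".join(and_terms),
        "not": ",".join(not_terms),
        "sampletype": sampletype,
        "species": species,
        "assay": assay,
        "limit": limit,
        "skip": skip,
    }
    # Drop unset parameters; spaces become '+', and ':'/',' stay readable.
    query = urlencode({k: v for k, v in params.items() if v not in (None, "")},
                      safe=":,")
    return f"{BASE}/samples.{fmt}?{query}"
```

Called with `and_terms=["UBERON:0002107"]`, `not_terms=["DOID:4", "EFO:0000727"]`, and `sampletype="tissue"`, it reproduces the first example URL below.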

Examples

Fetch a JSON file with "tissue" samples matching "liver" (UBERON:0002107) but not "disease" (DOID:4) or "treatment" (EFO:0000727): http://metasra.biostat.wisc.edu/api/v01/samples.json?and=UBERON:0002107&not=DOID:4,EFO:0000727&sampletype=tissue
Fetch a JSON file with "stem cells" samples matching "brain" (UBERON:0000955), restricted to mouse ChIP-seq samples: http://metasra.biostat.wisc.edu/api/v01/samples.json?and=UBERON:0000955&sampletype=stem+cells&species=mouse&assay=ChIP-seq
Fetch a CSV file with one row per sample matching "glioblastoma multiforme" (DOID:3068) and "brain" (UBERON:0000955), limit to 25 studies: http://metasra.biostat.wisc.edu/api/v01/samples.csv?and=DOID:3068,UBERON:0000955&limit=25
Fetch a CSV file with one row per run matching "glioblastoma multiforme" (DOID:3068) and "brain" (UBERON:0000955), limit to 25 studies (useful for downloading data with the SRA Toolkit): http://metasra.biostat.wisc.edu/api/v01/runs.csv?and=DOID:3068,UBERON:0000955&limit=25
View all labeled samples for study SRP055569: http://metasra.biostat.wisc.edu/api/v01/samples.json?study=SRP055569
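Because `limit` and `skip` count studies, retrieving every match means advancing `skip` by the page size until it reaches `studyCount` from the JSON response (see Results). Below is a minimal sketch of that loop. It assumes the live response is standard JSON with `studyCount` and `studies` keys, as in the annotated shape; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://metasra.biostat.wisc.edu/api/v01"


def page_offsets(study_count, page_size):
    """Return the skip values needed to cover study_count studies in pages of page_size."""
    return list(range(0, study_count, page_size))


def fetch_samples_json(params):
    """GET samples.json with the given query parameters and parse the body (needs network)."""
    with urlopen(f"{BASE}/samples.json?{urlencode(params, safe=':,')}") as resp:
        return json.load(resp)


def fetch_all_studies(params, page_size=25):
    """Collect every study entry by paging through samples.json with limit/skip."""
    first = fetch_samples_json({**params, "limit": page_size, "skip": 0})
    studies = list(first["studies"])
    # studyCount ignores limit and skip, so it tells us how many pages exist.
    for skip in page_offsets(first["studyCount"], page_size)[1:]:
        page = fetch_samples_json({**params, "limit": page_size, "skip": skip})
        studies.extend(page["studies"])
    return studies
```

For instance, `fetch_all_studies({"and": "DOID:3068,UBERON:0000955"})` would request 25 studies at a time until all matches are retrieved.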

Results

CSV files returned by samples.csv have one row per sample, with the following fields:

study_id	SRA study ID
study_title	Study title
sample_id	SRA sample ID
sample_name	Sample name from SRA metadata (not all samples have a sample name)
sample_type	Computationally predicted sample type
sample_type_confidence	Sample type confidence
mapped_ontology_ids	IDs of the most-specific ontology terms mapped to this sample (see note below), comma-separated
mapped_ontology_terms	Names of the most-specific ontology terms mapped to this sample (see note below), comma-separated
raw_SRA_metadata	Raw SRA metadata for this sample, excluding blacklisted fields (see note below), as "key: value" pairs separated by semicolons
sample_species	Sample species
assay	Assay type
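Here is a sketch for reading a samples CSV into Python structures. It assumes the file has a header row with the field names listed above. Only `mapped_ontology_ids` is split on commas, because term names may themselves contain commas. The `raw_SRA_metadata` parser assumes no value contains a semicolon, which may not hold for every sample. Both function names are hypothetical.

```python
import csv
import io


def parse_raw_metadata(field):
    """Turn 'key: value; key: value' into a dict.

    Splits on the first ':' of each pair; assumes values contain no ';'.
    """
    pairs = {}
    for item in field.split(";"):
        key, sep, value = item.partition(":")
        if sep:
            pairs[key.strip()] = value.strip()
    return pairs


def read_samples_csv(text):
    """Parse a samples CSV body, expanding the ID list and raw metadata columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["mapped_ontology_ids"] = [
            x.strip() for x in row["mapped_ontology_ids"].split(",") if x.strip()
        ]
        row["raw_SRA_metadata"] = parse_raw_metadata(row["raw_SRA_metadata"])
        rows.append(row)
    return rows
```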

Returned JSON files have this shape:

{
  studyCount: 24,        // The total number of studies matching your search (not accounting for limit and skip)
  sampleCount: 170,      // Total number of samples matching your search (not accounting for limit and skip)


  terms: [               // Common ontology terms for samples in your search, roughly sorted by frequency
      ...
      {
          sampleCount: 34,                // A rough count of matching samples (not counting descendant terms, see note below)
          dterm: {
              name: "female organism",    // Term name
              ids: ["UBERON:0003100"]     // List of IDs for this term in one or more ontologies
          }
      }
      ...
  ],



  studies: [            // Matching samples are grouped by study
      {
          study: {
              title: "My super fantastic study",
              id: "SRP012345"           // SRA study ID
          }

          sampleCount: 22,              // Number of samples from this study that match your search

          dterms: [                     // All matching terms for samples in this study (see note below)
              {
                  name: "Brodmann (1909) area 11",
                  ids: ["UBERON:0013528"],
              }
              ...
          ],


          sampleGroups: [       // Samples in each study are grouped by raw SRA attributes: samples in a group share
                                // identical attributes, except for blacklisted ID-like fields (see note below), which can vary.
              {
                  samples: [                    // List of samples in this group
                      {
                          id: "SRS0123456",     // SRA sample ID
                          name: "My sample",    // Not all samples have a name
                          experiments: [        // Associated SRA experiment and run IDs
                              {
                                  id: "SRX0123456",
                                  runs: ["SRR0123456", "SRR0123457", ...]
                              },
                              ...
                          ]
                      }
                      ...
                  ],

                  info: {
                      species: "human",         // Sample species
                      assay: "RNA-seq"          // Sample assay type
                  },

                  attr: [
                      ["tissue", "lung"],       // [key, value] for raw SRA metadata fields for these samples, excluding blacklist (see note below)
                      ...
                  ],

                  type: {
                      type: "tissue",           // Sample type, computationally predicted from sample attributes
                      conf: 0.9445349           // Sample type confidence
                  },

                  dterms: [                     // Most-specific terms for these samples (see note below)
                      {
                          name: "disease of cellular proliferation",
                          ids: ["DOID:14566"]
                      }
                      ...
                  ]
              }
              ...
          ]
      }
      ...
  ]
}
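The nesting above places run IDs four levels deep. This hypothetical helper flattens a parsed samples.json response into a map from study ID to run IDs. It assumes the real response uses the same keys as the annotated shape, written as standard (quoted) JSON.

```python
def runs_by_study(result):
    """Map each SRA study ID to the run IDs of its matching samples.

    Walks studies -> sampleGroups -> samples -> experiments -> runs.
    """
    out = {}
    for entry in result["studies"]:
        runs = out.setdefault(entry["study"]["id"], [])
        for group in entry["sampleGroups"]:
            for sample in group["samples"]:
                for experiment in sample["experiments"]:
                    runs.extend(experiment["runs"])
    return out
```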

Note on terms: Ontology terms are hierarchical; for example, "lung disease" is a descendant of "disease". When you search on a term, MetaSRA also matches all of its more-specific descendants, so a search for "disease" returns samples labeled with "lung disease". For brevity, however, the results show only a sample's most-specific terms, i.e. those with no descendants among the sample's other terms. A sample labeled with "lung disease" therefore shows "lung disease" in the results but not "disease".
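The matching and display rules in this note can be illustrated on a toy hierarchy. This sketch is not MetaSRA's implementation. The `parents` map (each term to its direct parents) and the example terms are hypothetical inputs.

```python
def ancestors(term, parents):
    """All less-specific terms reachable from `term` through the parent map."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def matches(query, sample_terms, parents):
    """A sample matches a query term if it is labeled with it or with any descendant of it."""
    return any(query == t or query in ancestors(t, parents) for t in sample_terms)


def most_specific(sample_terms, parents):
    """Drop every term that is an ancestor of another term in the set."""
    covered = set().union(*(ancestors(t, parents) for t in sample_terms))
    return {t for t in sample_terms if t not in covered}
```

With `parents = {"lung disease": ["disease"]}`, `most_specific({"disease", "lung disease"}, parents)` keeps only `"lung disease"`, while `matches("disease", {"lung disease"}, parents)` is true.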

Note on attributes and sampleGroups: MetaSRA excludes some raw SRA attributes using a blacklist. Fields are used inconsistently across submissions, but the blacklisted fields are generally ID fields that carry no information characterizing the sample. Excluding them means that samples differing only in such ID fields still fall into the same sample group (sampleGroups are present in the JSON files, but not in the CSVs). You can view the blacklist at the top of this file.
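To show why the blacklist matters for grouping, this sketch groups samples whose attributes are identical once blacklisted keys are dropped. The `BLACKLIST` contents and the `group_samples` name are invented for illustration; they are not MetaSRA's actual list or code.

```python
from collections import defaultdict

# Hypothetical blacklist for illustration only; not MetaSRA's real list.
BLACKLIST = {"sample id", "biosample id", "submitter id"}


def group_samples(samples, blacklist=BLACKLIST):
    """Group sample IDs whose attributes match after removing blacklisted keys.

    samples: iterable of (sample_id, {key: value}) pairs.
    Returns {sorted (key, value) tuple: [sample_id, ...]}.
    """
    groups = defaultdict(list)
    for sample_id, attrs in samples:
        key = tuple(sorted((k, v) for k, v in attrs.items()
                           if k.lower() not in blacklist))
        groups[key].append(sample_id)
    return dict(groups)
```

Without the blacklist filter, two lung samples that differ only in `sample id` would land in separate groups.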

Terms resource

To query the ontology terms used by MetaSRA, fetch this URL, which returns JSON: http://metasra.biostat.wisc.edu/api/v01/terms?.

This resource only returns terms that are associated with at least one sample in MetaSRA.

Parameters

You can use any combination of these arguments to filter terms:

`q`	Search string: return terms whose names match this string, sorted by relevance.
`ids`	Comma-separated list of ontology term IDs. Return terms matching any of these IDs.
`limit`	Return at most this many terms. The limit cannot exceed 500 and defaults to 500 if not provided.
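Here is a sketch of building a terms query URL that enforces the documented 500-term cap on the client. The function name is hypothetical. The check is only a convenience, since the server applies its own limit.

```python
from urllib.parse import urlencode

BASE = "http://metasra.biostat.wisc.edu/api/v01"
MAX_LIMIT = 500  # documented maximum (and default) for the terms resource


def build_terms_url(q=None, ids=(), limit=None):
    """Build a terms query URL from a search string, ID list, and optional limit."""
    params = {}
    if q:
        params["q"] = q
    if ids:
        params["ids"] = ",".join(ids)
    if limit is not None:
        if limit > MAX_LIMIT:
            raise ValueError(f"limit cannot exceed {MAX_LIMIT}")
        params["limit"] = limit
    # Keep ':' and ',' readable in term IDs.
    return f"{BASE}/terms?{urlencode(params, safe=':,')}"
```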

Examples

See the top 10 terms matching "brain": http://metasra.biostat.wisc.edu/api/v01/terms?limit=10&q=brain
See the term for "EFO:0000322": http://metasra.biostat.wisc.edu/api/v01/terms?ids=EFO:0000322

Results

Term results are shaped like this:

{
  terms: [
    {
        name: "tetrapod frontal bone",                            // Term name
        ids: ["UBERON:0000209"],                                  // List of IDs for this term in one or more ontologies
        syn: "frontal, frontal bone, os frontal, os frontale",    // Comma-separated list of synonyms

        ancestors: [                          // Unordered mix of less-specific (ancestor) terms, one or two levels up
            {
                name: "dermal bone",
                ids: ["UBERON:0001474"]
            }
            ...
        ],
        descendants: [...]                    // List of more-specific (descendant) terms, one or two levels down
    }
    ...
  ]
}
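For lookups by ID, a terms response can be indexed as shown below. This assumes the live response is standard JSON with the `terms`, `name`, `ids`, and `syn` keys shown above, and that synonyms are separated by commas as the comment states. `index_terms` is a hypothetical helper.

```python
def index_terms(result):
    """Map every ontology ID in a terms response to (name, list of synonyms)."""
    index = {}
    for term in result["terms"]:
        # `syn` is a comma-separated string; tolerate it being missing or null.
        syn_field = term.get("syn") or ""
        synonyms = [s.strip() for s in syn_field.split(",") if s.strip()]
        for term_id in term["ids"]:
            index[term_id] = (term["name"], synonyms)
    return index
```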