Media

Media are any digital objects (such as photographs, sound recordings, or three-dimensional renderings of objects) that can be related to data items in Arctos. Thus, they are essentially anything that can be identified with a Uniform Resource Identifier (URI) and (optionally) related to a primary key in a major table. This arrangement allows us to relate photographs of anatomical features to records, sound recordings to collecting events, text files to agents, and any number of possibilities.

A special class of Media paginates multi-page documents (e.g., JPG field notebook scans), allowing book-like browsing. TAGs identify user-selected areas of image media, and further relate these areas to records, places, and people.

Media may be created autonomously, as part of a catalog record, or bulkloaded. Additionally, specialized tools have been built to support rapid imaging of herbaria and paleontological collections, including capture of ancillary data in the form of accession and locality cards.

Data about media are stored in three tables:

Fields in table Media

media_id

Internal primary key, integer.

MediaID

URI-prefixed media_id serves as a GUID. Example: https://arctos.database.museum/media/10727496

Media_URI

The Uniform Resource Identifier (URI) that finds the media object on the Internet. (A URL is a common type of URI.) See also stability. Media URIs are of very roughly three classes:

NOTE: Files containing characters other than A-Z, a-z, 0-9, and _ are not eligible for scripting. Please sanitize any file names before uploading.
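The file-name rule above can be checked before upload. The following is a minimal sketch, not an Arctos tool: the function names are hypothetical, and it assumes the restriction applies to the file name without its extension (the note does not say how the dot and extension are treated).

```python
import re
from pathlib import PurePosixPath

# Assumption: only the name without its extension must match A-Z, a-z, 0-9, _.
_SAFE_STEM = re.compile(r"[A-Za-z0-9_]+")


def is_scriptable_name(filename: str) -> bool:
    """Return True if the name (minus extension) uses only allowed characters."""
    stem = PurePosixPath(filename).stem
    return _SAFE_STEM.fullmatch(stem) is not None


def sanitize_name(filename: str) -> str:
    """Replace each run of disallowed characters in the stem with one underscore."""
    path = PurePosixPath(filename)
    clean = re.sub(r"[^A-Za-z0-9_]+", "_", path.stem).strip("_")
    return clean + path.suffix
```

For example, sanitize_name("My Field-notes (1).jpg") gives "My_Field_notes_1.jpg". Check the results for collisions before renaming, since two different original names can sanitize to the same string.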

Mime_Type

The Internet media type, as described in Wikipedia. It consists of a type and subtype, such as “text/html.” Values are controlled by a code table.

Media_Type

A description of the kind of media. These values are controlled by a code table. Media Type exists to categorize Media whose MIME type is not sufficiently descriptive. An HTML image viewer application would have MIME_TYPE of ‘text/html’ and MEDIA_TYPE of ‘image,’ for example.

Preview_URI

The Uniform Resource Identifier (URI) for a preview of the Media item. A preview might be something like a thumbnail-sized version of a larger image.

NOTE: Large thumbnails will not display in Arctos UI. Previews should be less than 10KB.

media_license_id

All Media should have a license, a legal document which guides and controls acceptable usage. Values are controlled by a code table.

NOTE: Cleanup in progress!

media_terms_id

Media may optionally have a terms document, which should serve a purpose such as informing a conscientious user how to best comply with the license. Values are controlled by a code table.

Media Relations

Relationships establish primary-key links between Media and Arctos nodes (including other Media).

media_relationship

The kind of relationship between the media and the data item. Relationships are functional: each must be a string that contains at least one space and ends with a table name (for example, “shows cataloged_item”). Values are controlled by a code table.
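The shape of a relationship string can be illustrated with a short sketch. The function name is hypothetical, and the check covers only the structure described above (at least one space, table name last). Whether a value is actually allowed is decided by the code table, which this does not consult.

```python
def parse_relationship(relationship: str) -> tuple[str, str]:
    """Split a relationship such as 'shows cataloged_item' into (action, table name)."""
    text = relationship.strip()
    # The table name is everything after the last space.
    action, sep, table = text.rpartition(" ")
    if not sep or not action.strip() or not table:
        raise ValueError(f"not a valid media relationship: {relationship!r}")
    return action.strip(), table
```

parse_relationship("shows cataloged_item") returns ("shows", "cataloged_item"), while a string with no space, such as "shows", raises ValueError.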

Created By

The agent who created the relationship between a media object and a data item. This is a foreign key to the Agent table.

Media Labels

Labels attach text to media objects.

media_label

The subject matter of a label describing a Media object. Values are controlled by a code table.

Label_Value

The content of a label. Generally the value is uncontrolled text. The exception is Media Label = “Made_Date”, whose values must be in ISO date format (e.g. “2014-05-01” for “1 May 2014”). Saving a Made_Date in any other format produces an error; correct the date to ISO format and save again.
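Made_Date values can be checked before saving or bulkloading. This is a minimal sketch with a hypothetical function name. It accepts only full YYYY-MM-DD dates; whether Arctos also accepts reduced-precision ISO dates (such as a year alone) is not stated here, so treat the strictness as an assumption.

```python
import re
from datetime import date

_ISO_DAY = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_iso_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    match = _ISO_DAY.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as 2014-02-30
    except ValueError:
        return False
    return True
```

is_iso_date("2014-05-01") is True; is_iso_date("1 May 2014") and is_iso_date("2014-02-30") are both False.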

Assigned By

Media_Labels.Assigned_By_Agent_ID NUMBER(22) not null

The agent who assigned the label. This is a foreign key to the Agent table.

Media Creation Guidelines

Relationships

Create only necessary relationships; allow the relational nature of Arctos to work for you. An image showing a specimen should have a relationship of “shows cataloged_item” but not “shows collecting_event” or “shows locality” or “documents accn,” all of which can be derived from the relationship to the cataloged item.

Format

Choose reasonable media formats; use derivatives if necessary.

For example, primary relationships should be made to reasonably sized (~500K maximum) JPG derivatives rather than to DNG originals (DNG, Digital Negative, is the only recommended format for primary images). JPG is not an appropriate choice for a text document, especially if the resultant file is 2MB!

Preview

Arctos will automatically create and attach previews where possible; this generally works for “normal” image media, such as JPG and PNG. Simply leave preview_uri NULL to let Arctos attempt preview creation.

If previews are created, filesize should be under (preferably much under!) 10K; previews larger than 48K will NOT be displayed. Scale to ~120px. Cropped or otherwise misleading previews should be avoided.

No preview is generally better than a bad preview.
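The size limits above can be checked before a preview URI is entered. This is a minimal sketch; the function names are hypothetical, and it assumes “K” means 1024 bytes, which the guideline does not specify.

```python
from pathlib import Path

KIB = 1024               # assumption: "K" in the guideline means 1024 bytes
TARGET_MAX = 10 * KIB    # previews should be under this size
DISPLAY_MAX = 48 * KIB   # previews larger than this are not displayed


def classify_preview_size(size_bytes: int) -> str:
    """Classify a preview file size against the 10K target and 48K display limit."""
    if size_bytes > DISPLAY_MAX:
        return "not displayed"
    if size_bytes >= TARGET_MAX:
        return "displayed, but larger than recommended"
    return "ok"


def classify_preview_file(path: str) -> str:
    """Classify a preview file on disk by its size in bytes."""
    return classify_preview_size(Path(path).stat().st_size)
```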

Binary Object Creation Guidelines

As a general rule, capture as much information as possible and store it in a lossless, archival format. Disk space is cheap and imaging projects are expensive, time-consuming, and potentially damaging to the subject material. Make lower-resolution derivatives as necessary.

Adobe’s open Digital Negative (DNG) format is preferred for photography. DNGs store all the information from the camera in a compact package. Never store archival material in any non-DNG RAW format; such formats tend to be proprietary and dynamic. TIFF (a proprietary format also owned by Adobe) is not an equivalent choice, but is often the “most archival” option offered by a camera or scanner. JPG is a poor choice for archival media, but is often the only option.

Derivatives, which are generally JPG images scaled to be viewed in a “normal” browser, should be sized to facilitate access on less-than-optimal networks while still preserving most of the information available from the original. We’ve had few complaints about 500K images of herbarium sheets. Users always have access to the originals, so it is not necessary for derivatives to facilitate all usage.

Derivatives can generally be batch-created using ImageMagick on TACC’s servers. We advocate sizing derivatives by filesize rather than dimensions, as all modern browsers offer zoom capability. When creating Media, it is recommended to create all relationships and labels from the derivative. This will eliminate the DNGs from most search results (they’re very large, not viewable in most browsers, and users blindly click on thumbnails) while still providing easy access to them from the derivative.
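Sizing a derivative by filesize can be sketched with ImageMagick’s jpeg:extent setting, which caps the size of the JPEG it writes. This is an illustration only, not the procedure used on TACC’s servers. The function name and the 500 KB default are assumptions (the default echoes the herbarium-sheet figure above), the “magick” executable is the ImageMagick 7 name (version 6 uses “convert”), and reading DNG input depends on how ImageMagick was installed.

```python
def derivative_command(source: str, target: str, max_kb: int = 500) -> list[str]:
    """Build an ImageMagick command that writes a JPG no larger than max_kb."""
    if max_kb <= 0:
        raise ValueError("max_kb must be positive")
    # -define jpeg:extent=... asks the JPEG writer to stay under the given size.
    return ["magick", source, "-define", f"jpeg:extent={max_kb}KB", target]


# To run the command (requires ImageMagick to be installed):
#   import subprocess
#   subprocess.run(derivative_command("sheet_0001.dng", "sheet_0001.jpg"), check=True)
```

The file names in the comment are placeholders. For a batch, call the function once per original and keep the derivative name tied to the original so the two can be related afterwards.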

DNG view

JPG view – what most users find

Multi-page documents

Multi-page documents are paginated JPGs, such as scanned field notes. Sequential JPGs are used rather than formats such as PDF to support further manipulation, such as adding TAGs.

To create these, do the following:

  1. Scan your material. If you can get the page number (starts with one, increments by one) in the file name, it will make everything else easier. Most scanners support sequential naming, e.g.

    • My_Fieldnotes_1.jpg
    • My_Fieldnotes_2.jpg
  2. Load the scans to a stable, archival, visible server (as always, we recommend TACC). After this step, you should have a list of URIs, e.g.

  3. Download the Media Bulkloader template and fill in the blanks. Required fields are:

    • MEDIA_URI – the location to which you uploaded your scans
    • MIME_TYPE – this must be image/jpeg
    • MEDIA_TYPE – “multi-page document”
    • media_label_1 & media_label_value_1 – “page”, with sequential integers starting at 1 as the value. Any label/value pair may be used; it doesn’t have to be pair 1.
    • media_label_2 & media_label_value_2 – “title” – this must be EXACTLY the same for all pages in the document.
  4. Use the additional fields as you normally would to add any additional information, such as author(s).

Note that with good organization and clever use of your favorite spreadsheet, most of the work should be simple copy/paste/increment.
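The copy/paste/increment work can also be scripted. The sketch below builds the required columns for a multi-page document and writes them as CSV. The function names and the base URI in the usage note are hypothetical. The column names follow the list above, but the exact headers in the Media Bulkloader template may differ, so compare them against the downloaded template before loading.

```python
import csv
import io

# Column names taken from the required-field list; verify against the template.
COLUMNS = [
    "MEDIA_URI", "MIME_TYPE", "MEDIA_TYPE",
    "media_label_1", "media_label_value_1",
    "media_label_2", "media_label_value_2",
]


def multipage_rows(base_uri: str, file_stem: str, title: str, page_count: int) -> list[dict]:
    """Build one row per page for scans named <file_stem>_<page>.jpg."""
    rows = []
    for page in range(1, page_count + 1):  # pages start at 1 and increment by 1
        rows.append({
            "MEDIA_URI": f"{base_uri.rstrip('/')}/{file_stem}_{page}.jpg",
            "MIME_TYPE": "image/jpeg",
            "MEDIA_TYPE": "multi-page document",
            "media_label_1": "page",
            "media_label_value_1": str(page),
            "media_label_2": "title",
            "media_label_value_2": title,  # identical for every page
        })
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling multipage_rows("https://example.org/scans", "My_Fieldnotes", "My Fieldnotes", 3) yields three rows whose MEDIA_URI values end in My_Fieldnotes_1.jpg through My_Fieldnotes_3.jpg; the example.org address stands in for wherever the scans were actually uploaded.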

Discovery

Media Relationships link Media to specimens, agents, places, other Media, and more (and in turn link those resources together). Data objects thereby exchange information through database-key linkages as necessary; no information is replicated or otherwise made redundant (the process of normalization). An important factor in discoverability is having well-curated data, and our model eliminates any possibility of data becoming “stale” (such as updates to specimens not being made also to images of specimens) while simultaneously providing an efficient and powerful curatorial toolset which requires no redundant tasks, nor any knowledge of existing or future relationships in order to maintain data.

A major benefit of normalization is an inherent flexibility in how the data may be linked, found, or viewed. Arctos may be equally viewed as a Media database (which happens to contain specimens) and a Specimen database (which also contains Media); there is no pre-defined starting point or pathway through the data, and no fixed bounds on the data for any given object. “Specimen data” may be viewed to include data concerning publications in which the specimen is cited, and “media data” may be viewed as including specimen data, including linked data such as publications. Therefore, Media may be linked to and draw information from any information in Arctos, directly or indirectly, and those objects may in turn be linked to each other. For example, a Specimen may have Media and be cited in a Publication, providing a pathway from a citation to the Media; Media are thereby discoverable across multiple “nodes.” Arctos users are presented with Media through various pathways: by searching directly for Media, by encountering Specimens, Projects, Publications, Taxonomy, Agents, Places or Events with associated Media, or by encountering objects which use those things. For example, Media showing Localities are available from Specimens which use those Localities.

The same mechanism allows relationships among Media. For example, specimens may be linked to web-friendly JPGs which are in turn linked to original RAW (DNG) files, providing simultaneously for maximum discoverability and usability.

Media are displayed with appropriate metadata, including machine-readable formats, to encourage and facilitate discovery by both people and machines (such as search engine crawlers). In addition to being displayed in various places and forms in Arctos, Media (and related data) are indexed by search engines and are directly shared with various resources via IPT and other mechanisms.

Arctos provides a stable URI for Media (and other objects), and the capacity to create long-term stable targets via DOI (which are themselves discoverability aids). Thereby, in addition to linking out to relevant data, Arctos encourages links in. Many users discover Arctos by following links from NCBI, Google, content aggregators (such as iDigBio), scientific publications, and researchers’ home pages.

Keywords

Media Keywords are select words and phrases pulled from related objects and Media Labels which exist to facilitate discovery.

TAGs

TAGs relate a specific area of a JPG image to data objects in Arctos.

Comments in TAGs may contain a very specific (and currently very limited) type of markup language in order to form links to specimens in Arctos. This can be useful for negative references: “this area on the field notes probably does NOT refer to that specimen, possibly because someone’s mixed some numbers up somewhere along the way.” To create a link from a comment TAG to a specimen, simply type doubled square brackets around the GUID-string. Example:

[[MVZ:Mamm:184092]]

forms an HTML link to http://arctos.database.museum/guid/MVZ:Mamm:184092
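The double-bracket convention can be illustrated with a small sketch. This is not Arctos’s own rendering code: the function name and the exact HTML produced are assumptions. Only the bracket syntax and the /guid/ URL pattern come from the example above.

```python
import html
import re

GUID_BASE = "http://arctos.database.museum/guid/"
# Matches [[...]] where the content contains no square brackets.
_GUID_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def link_guids(comment: str) -> str:
    """Replace each [[GUID]] in a comment with an HTML link to that GUID."""
    def to_anchor(match: re.Match) -> str:
        guid = html.escape(match.group(1).strip(), quote=True)
        return f'<a href="{GUID_BASE}{guid}">{guid}</a>'

    return _GUID_LINK.sub(to_anchor, comment)
```

Text outside the brackets is passed through unchanged, so a comment without any [[...]] markup is returned as-is.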

NOTE: TAGs also form media relationships for discoverability purposes, see https://github.com/ArctosDB/arctos/issues/1682

URLs and Stability

Arctos provides several URLs to media objects. It is important to know which to use in any particular situation.

Accessibility

All Media stored at TACC are publicly available, and may appear in various search engines or be linked from various places in Arctos. TACC will not host restricted-access Media. Media may be made private by controlling access to the content: use a private password-protected server, a password-protected Google document, a password-protected ZIP archive, etc.

How To

Instructions for doing specific tasks related to Media in Arctos (please note that “under construction” icons on pages indicate that the documentation may be incomplete or out-of-date):
