Media are any digital objects (such as photographs, sound recordings, or three-dimensional renderings of objects) that can be related to data items in Arctos. Thus, they are essentially anything that can be identified with a Uniform Resource Identifier (URI) and (optionally) related to a primary key in a major table (examples). This arrangement allows us to relate photographs of anatomical features to specimens, sound recordings to collecting events, text files to agents, and any number of possibilities. A special class of Media paginates multi-page documents (e.g., JPG field notebook scans), allowing browsing and PDF creation. TAGs identify user-selected areas of image media, and further relate these areas to specimens, places, and people. Media may be created autonomously, as part of a specimen record, or bulkloaded. Additionally, specialized tools exist for rapid imaging of herbaria and paleontological collections, including ancillary data in the form of accession and locality cards. Data about media are stored in three tables:
- Media Relations
- Media Labels
Fields in table Media
Media . Media_URI VARCHAR2(255) not null
The Uniform Resource Identifier (URI) that finds the media object on the Internet. (A URL is a common type of URI.) See also stability. Media URIs are of very roughly three classes:
- A binary object which your a web browser natively “knows” how to process, such as a JPG or MP3 file.
- A web page, such as YouTube or the Arctos Multi-Page Document handler.
- A binary file which your web browser does not know how to process, and therefore should download. DNG and ZIP files are generally in this category (although some users employ browser extensions which can process these types of files).
Try to avoid characters other than A-Z/a-z/0-9 in URLs. This is not always possible, but scripting “protocol://some_domain/my/directory/structure/file_name_is_out_here.jpg” will always work, while admins may be less able to help with, for example, “protocol://some_domain/my/directory/structure/file name is out here.jpg”
Media . Mime_Type VARCHAR2(255) not null
The Internet media type. Consists of a type and subtype, such as “text/html.” Code-table controlled, as described in Wikipedia.
Media . Media_Type VARCHAR2(255) not null
A description of the kind of media. These values are controlled by a code table. Media Type exists to categorize Media whose MIME type is not sufficiently descriptive. A HTML image viewer application would have MIME_TYPE of ‘text/html’ and MEDIA_TYPE of ‘image,’ for example.
Media . Preview_URI VARCHAR2(255) null
The Uniform Resource Identifier (URI) for a preview of the Media item. A preview might be something like a thumb-nail sized version of a larger image.
Media_Relations . Media_Relationship VARCHAR2(40) not null
The kind of relationship between the media and the data item. Relationships are functional and must be comprised of a string containing at least one space and ending with a table name. ColdFusion and Oracle both rely on this. Values are controlled by a code table.
Media_Relations . Created_By_Agent_ID NUMBER(22) not null
The agent who created the relationship between a media object and a data item. This is a foreign key to the Agent table.
Related Data Item
Media_Relations . Related_Primary_Key NUMBER(22) not null
The data item to which the Media object is related. This is a foreign key.
Media_Labels . Media_Label VARCHAR2(255) not null
The subject matter of a label describing a Media object. Values are controlled by a code table.
Media_Labels . Label_Value VARCHAR2(4000) not null
The content of a label. Generally the value is uncontrolled text, with the exception of Media Label = “Made_Date”, which requires its values to be in ISO date format (e.g. “2014-05-01” for “1 May 2014”), and will give an error upon saving if not rendered in that format. Try updating the date to the correct format to avoid error messages.
Media_Labels . Assigned_By_Agent_ID NUMBER(22) not null
The agent who assigned the label. This is a foreign key to the Agent table.
Checksum is a cryptographic hash, or “digital fingerprint,” of a file. At a later time, another checksum can be generated and compared to the original. Matching checksum values provide great certainty that the file has not changed. Arctos provides a means to generate an MD5 checksum and save this value as a Media Label. Checksum values generated by outside sources can be manually entered as Media Labels.
Media Creation Guidelines
Create only necessary relationships; allow the relational nature of Arctos to work for you. An image showing a specimen should have a relationship of “shows cataloged_item” but not “shows collecting_event” or “shows locality” or “documents accn,” all of which can be derived from the relationship to the cataloged item.
Choose reasonable media formats; use derivatives if necessary.
Primary relationships should be made to reasonably-sized (~500K maximum) JPG derivatives rather than DNG (Digital Negative, the only recommended format for primary images) originals, for example. JPG is not an appropriate choice for a text document, especially if the resultant file is 2MB!
Use reasonable previews; filesize should be under (preferably much
under!) 10K (previews larger than 48K will NOT be displayed), and scale
to ~120px. Cropped or otherwise misleading previews should be avoided.
NULL if producing a good thumbnail isn’t possible.
Binary Object Creation Guidelines
As a general rule, capture as much information as possible and store it in a lossless, archival format. Disk space is cheap and imaging projects are expensive, time-consuming, and potentially damaging to the subject material. Make lower-resolution derivatives as necessary. Adobe’s open Digital Negative (DNG) format is preferred for photography. DNGs store all the information from the camera in a compact package. Never store archival material in any non-DNG RAW format, which tend to be proprietary and dynamic. TIFF (a proprietary format also owned by Adobe) is not an equivalent choice, but is often the “most archival” option offered by a camera or scanner. JPG is a poor choice for archival media, but is often the only option. Derivatives, which are generally JPG images scaled to be viewed in a “normal” browser, should be sized to facilitate access on less-than-optimal networks while still preserving most of the information available from the original. We’ve had few complaints about 500K images of herbarium sheets. Users always have access to the originals, so it is not necessary for derivatives to facilitate all usage.
Derivatives can generally be batch-created using ImageMagick on TACC’s servers. We advocate sizing derivatives by filesize rather than dimensions, as all modern browsers offer zoom capability. When creating Media, it is recommended to create all relationships and labels from the derivative. This will eliminate the DNGs from most search results (they’re very large, not viewable in most browsers, and users blindly click on thumbnails) while still providing easy access to them from the derivative.
JPG view – what most users find
Multi-page documents are paginated JPGs, such as scanned field notes. Sequential JPGs are used rather than formats such as PDF to support further manipulation, such as adding TAGs.
To create these, do the following:
Scan your material. If you can get the page number (starts with one, increments by one) in the file name, it will make everything else easier. Most scanners support sequential naming, e.g.
Load the scans to a stable, archival, visible server (as always, we recommend TACC). After this step, you should have a list of URIs, e.g.
Download the Media Bulkloader template and fill in the blanks, or use the “pull from server” option to build a Media bulkloader template using information embedded in URIs. (This requires some knowledge of Regular Expressions.) Required fields are:
- MEDIA_URI – the location to which you uploaded your scans
- MIME_TYPE – this must be image/jpeg
- MEDIA_TYPE – “multi-page document”
- media_label_1 & media_label_value_1 (some label-plus-value – it doesn’t have to be 1) are “page” with sequential integers starting with 1 as value.
- media_label_2 & media_label_value_2 – “title” – this must be EXACTLY the same for all pages in the document.
Use the additional fields as you normally would to add any additional information, such as author(s).
Note that with good organization and clever use of your favorite spreadsheet, most of the work should be simple copy/paste/increment.
This entry describes one process to create Media on Arctos. It is not the only method possible, and may not be most suitable for any given need. MVZ has developed specialized procedures to fit their workflow, for example.
- Get your Media – the binary objects – to a suitable web-accessible location. We recommend TACC, and they’ve provided various tools* to facilitate loading data. You may wish to create a preview of your Media at this time – you’ll need to script that, do it manually, or coordinate with whomever is hosting your binary data (e.g. TACC).
- Locate related objects in Arctos. The Media Bulkloader cannot handle all relationships, and how relationships are formed is not always intuitive. See the Media Bulkloader for details.
- Populate the Media Bulkloader using the URI you’ve created in loading media, upload the CSV file, and follow the directions until you get a “spiffy” message.
Media Relationships link Media to specimens, agents, places, other Media, and more (and in turn link those resources together). Data objects thereby exchange information through database-key linkages as necessary; no information is replicated or otherwise made redundant (the process of normalization). An important factor in discoverability is having well-curated data, and our model eliminates any possibility of data becoming “stale” (such as updates to specimens not being made also to images of specimens) while simultaneously providing an efficient and powerful curatorial toolset which requires no redundant tasks, nor any knowledge of existing or future relationships in order to maintain data.
A major benefit of normalization is an inherent flexibility in how the data may be linked, found, or viewed. Arctos may be equally viewed as a Media database (which happens to contain specimens) and a Specimen database (which also contains Media); there is no pre-defined starting point or pathway through the data, and no fixed bounds on the data for any given object. “Specimen data” may be viewed to include data concerning publications in which the specimen is cited, and “media data” may be viewed as including specimen data, including linked data such as publications. Therefore, Media may be linked to and draw information from any information in Arctos, directly or indirectly, and those objects may in turn be linked to each other. For example, a Specimen may have Media and be cited in a Publication, providing a pathway from a citation to the Media; Media are therby discoverable across multiple “nodes.” Arctos users are presented with Media from various pathways; by searching directly for Media, or by encountering Specimens, Projects, Publications, Taxonomy, Agents, Places or Events with associated Media, or by encountering objects which use those things. For example, Media showing Localities is available from Specimens which use those Localities.
The same mechanism allows relationships among Media, allowing for example specimens linked to web-friendly JPGs which are in turn linked to original RAW (DNG) files, providing simultaneously for maximum discoverability and usability.
Media are displayed with appropriate metadata, including machine-readable formats, to encourage and facilitate discovery by both people and machines (such as search engine crawlers). In addition to being displayed in various places and forms in Arctos, Media (and related data) are indexed by search engines and are directly shared with various resources via IPT and other mechanisms.
Arctos provides a stable URI for Media (and other objects), and the capacity to create long-term stable targets via DOI (which are themselves discoverability aids). Thereby, in addition to linking out to relevant data, Arctos encourages links in. Many users discover Arctos by following links from NCBI, Google, content aggregators (such as iDigBio), scientific publications, and researchers’ home pages.
Media Keywords are select words and phrases pulled from related objects and Media Labels which exist to facilitate discovery.
TACC tools include direct Secure Copy Protocol (SCP) access, a WEBDAV dropbox (contact us for access), and the ability to automatically push files uploaded to Arctos to TACC. Additionally there are various wrappers and tools using SCP access, such as the ALA Imaging Project, which pushes DNG files from a local computer to TACC, creates thumbnails and high-resolution JPGs, and automatically associates images with barcoded specimens).
TAGs relate a specific area of a JPG image to data objects in Arctos.
Comments in TAGs may contain a very specific (and currently very limited) type of markup language in order to form links to specimens in Arctos. This can be useful for negative references: “this area on the field notes probably does NOT refer to that specimen, possible because someone’s mixed some numbers up somewhere along the way.” To create a link from a comment TAG to a specimen, simply type doubled square brackets around the GUID-string. Example:
forms a HTML-link to http://arctos.database.museum/guid/MVZ:Mamm:184092
URLs and Stability
Arctos provides several URLs to media objects. It it important to know which to use in any particular situation.
Detail URL: (http://arctos.database.museum/media/10002230)
This is a “fairly permanent” (a decade?) link to the Media Detail page. This link should continue to work as long as Arctos is at the same URL and running under the HTTP protocol. Suitable for any maintained online link; DOI is perhaps a better choice for paper or PDF archives.
“Stable Exit link” ( (http://arctos.database.museum/media/10002230?open)
Append “?open” (or “?open=true”) to the detail_url; logs the request and redirects to media_URI. This is a “fairly permanent” (a decade?) link to the Media URI. Logging is enabled. This link should continue to work as long as Arctos is at the same URL and running under the HTTP protocol, and the media itself is maintained. Suitable for any maintained online link. DOI is perhaps a better choice for paper or PDF archives. This link is a suitable DOI target.
DOIs should be (50-year-plus) permanent; they (with proper curatorial commitment) will survive the material moving out of Arctos, the deprecation of the HTTP protocol, and other imaginable changes. Not all material in Arctos has a DOI, but Operators may assign DOIs by request.
Links of this format trigger application-level logging and then redirect to the media URI. These links are as stable as the Media URI, but trigger local usage logging which helps us pay to maintain and create content for Arctos. (This approach is necessary because Arctos Media may be located anywhere, including servers over which we have no control.)
The link to the binary. These do NOT fire application-level logging, come with no stability guarantees, and should not be used for most purposes.
All Media stored at TACC is publicly available, and may appear in various search engines or linked from various places in Arctos. TACC will not host restricted-access Media. Media may be made private by controlling access to the content; use a private password-protected server, a password-protected Google document, archived in a password-protected ZIP file, etc.