Arctos Data Quality Checks, Reports, and Tools

Arctos includes built-in checks, reports, and tools for creating and maintaining high-quality data. Checks prevent low-quality data from being added; reports and tools detect problems in data after it has been entered. This document provides an overview of the data quality checks, reports, and tools available.

Data Quality Checks

The following checks run at the point of data entry, whether through the single record data entry form or the bulkloader. Any failure must be resolved before a record can be saved.

Dates

Dates must be in ISO 8601 format. Wherever both a begin date and an end date are provided, the end date must be after the begin date. Dates are always entered as a single value; components (year, month, day, time) are extracted at the time of request and never stored. Future dates of collection (dates that fall after the current date) are not allowed.
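The date rules above can be sketched in Python as follows. This is an illustration only, not the Arctos implementation: the function name `check_dates` is hypothetical, the sketch assumes complete `YYYY-MM-DD` values, and it assumes an end date equal to the begin date (a single-day event) is acceptable.

```python
from datetime import date


def check_dates(begin, end, today=None):
    """Return a list of problems for a begin/end pair of ISO date strings."""
    today = today or date.today()
    problems = []
    parsed = {}
    for label, value in (("begin", begin), ("end", end)):
        try:
            parsed[label] = date.fromisoformat(value)
        except ValueError:
            problems.append(f"{label} date is not in ISO format: {value!r}")
    # Assumption: equal begin and end dates are allowed.
    if len(parsed) == 2 and parsed["end"] < parsed["begin"]:
        problems.append("end date is before begin date")
    # Reject any date that falls after the current date.
    for label, value in parsed.items():
        if value > today:
            problems.append(f"{label} date is in the future")
    return problems
```

An empty list means the pair passed every check; each string in a non-empty list describes one problem to resolve before saving.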

Nonprinting Characters

No field may include non-printing characters, leading spaces, or trailing spaces.
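A check of this kind might look like the sketch below. The function name `check_text` is hypothetical, and Python's `str.isprintable()` stands in for whatever definition of "non-printing" Arctos actually applies, so treat the exact character set as an assumption.

```python
def check_text(value):
    """Return a list of problems found in a single text field."""
    problems = []
    # Leading or trailing ordinary spaces.
    if value != value.strip(" "):
        problems.append("leading or trailing space")
    # Assumption: Python's notion of "printable" approximates the Arctos rule.
    # Tabs, newlines, and non-breaking spaces all count as non-printing here.
    if any(not ch.isprintable() for ch in value):
        problems.append("non-printing character")
    return problems
```

Characters such as a non-breaking space pasted from a spreadsheet are invisible on screen, which is why an automated check is more reliable than visual review.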

Catalog Numbers

Catalog numbers must match the expected format for the collection and may not already exist in Arctos. Duplicate catalog numbers are not allowed: any duplicate of an existing number will generate an error and fail to upload.

Pro Tip: Collections using the integer catalog number format can leave the catalog number blank, and Arctos will assign the next available integer catalog number to the record as it is loaded.
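The duplicate check and the blank-number behavior can be sketched together. The function name `assign_catalog_number` is hypothetical, and "next available" is assumed here to mean one more than the highest number in use; Arctos may define it differently.

```python
def assign_catalog_number(requested, existing):
    """Return the catalog number to use, given the set of integers already in use."""
    if requested in (None, ""):
        # Assumption: "next available" means highest existing number plus one.
        return max(existing, default=0) + 1
    number = int(requested)  # raises ValueError if the value is not an integer
    if number in existing:
        raise ValueError(f"catalog number {number} already exists")
    return number
```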

Basis of Record

Basis of record is required in Arctos and must match a controlled vocabulary that includes the terms expected in the Darwin Core Archive prepared for GBIF. Collections can select a preferred value; if the field is left blank during data entry, the preferred value is used automatically.

Accession

Every record must be associated with a pre-existing accession.

Agents

Any field that accepts an agent must include a value that matches exactly one Agent in Arctos. This includes collectors, preparators, creators, identification determiners, attribute determiners, participants in transactions and publications, and creators of media.
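The "exactly one" rule rejects both unknown names and ambiguous ones. A minimal sketch, assuming agents are held as a mapping from a hypothetical agent ID to the set of names recorded for that agent (the function name `resolve_agent` and this data layout are illustrative, not the Arctos schema):

```python
def resolve_agent(value, agent_names):
    """Return the ID of the single agent whose names include `value`."""
    matches = [agent_id for agent_id, names in agent_names.items() if value in names]
    if len(matches) != 1:
        # Zero matches: unknown agent. Two or more: ambiguous name.
        raise ValueError(f"{value!r} matches {len(matches)} agents; exactly one is required")
    return matches[0]
```

With this rule, a name shared by two agents fails just as an unknown name does, so the person entering data has to supply a more specific name.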

Code Tables

Arctos publishes lists of acceptable terms for many fields (Code Tables). Any value entered in a field governed by one of these code tables must match a term in that table that is accepted by the collection in which the record is being entered.

Identification (Taxon Names)

Identifications in Arctos can be made in several formats; however, they all must include a reference to at least one term from the Taxon Name Table. This table is maintained by Arctos Operators with manage_taxonomy permissions and is not guaranteed to exclude misspellings or errors. When these are discovered, there are paths for linking poorly formatted names to the correct version and/or quarantining such names from use, while still allowing them to be present for the purposes of search and discoverability.

Higher Geography

Higher geography in Arctos is a controlled vocabulary composed of terms from GADM and IHO World Seas, supported by shapes. Higher geography must match a term in this vocabulary, so any apparent “misspellings” intentionally match the relevant authority.

Elevation and Depth

The lowest elevation or depth cannot be greater than the highest, and values are constrained to exclude elevations or depths not possible on Earth.
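As a sketch of how such a constraint could work: the limits below are illustrative round numbers chosen to bracket Earth's extremes, not the values Arctos enforces, and the names `ELEVATION_LIMITS_M`, `DEPTH_LIMITS_M`, and `check_min_max` are hypothetical.

```python
# Assumed limits in meters; the actual Arctos constraints are not given here.
ELEVATION_LIMITS_M = (-500, 9000)
DEPTH_LIMITS_M = (0, 11000)


def check_min_max(minimum, maximum, limits):
    """Return a list of problems for a minimum/maximum pair of values."""
    low, high = limits
    problems = []
    if minimum > maximum:
        problems.append("minimum is greater than maximum")
    for value in (minimum, maximum):
        if not low <= value <= high:
            problems.append(f"{value} is outside the allowed range {low} to {high}")
    return problems
```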

Georeference

Latitude and longitude must either both be NULL or both include a value.

Datum must be supplied with coordinates, but cannot be supplied without them. In addition, georeference protocol and georeference error cannot be supplied without coordinates, although coordinates can be supplied without them. All spatial data are converted to WGS84 and datum is explicitly provided. Input datum is also retained.

Coordinate values are datatyped to disallow invalid entries.
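The georeference rules in this section can be combined into one sketch. The function name `check_georeference` and its parameter names are hypothetical, `None` stands in for NULL, and the decimal-degree ranges are an assumption about what "invalid entries" means.

```python
def check_georeference(lat=None, lng=None, datum=None, protocol=None, error_m=None):
    """Return a list of problems for one set of georeference values."""
    problems = []
    # Latitude and longitude are both NULL or both present.
    if (lat is None) != (lng is None):
        problems.append("latitude and longitude must both be supplied or both be empty")
    has_coords = lat is not None and lng is not None
    if has_coords and datum is None:
        problems.append("datum is required when coordinates are supplied")
    if not has_coords:
        dependents = (
            ("datum", datum),
            ("georeference protocol", protocol),
            ("georeference error", error_m),
        )
        for label, value in dependents:
            if value is not None:
                problems.append(f"{label} cannot be supplied without coordinates")
    # Assumption: coordinates are decimal degrees.
    if lat is not None and not -90 <= lat <= 90:
        problems.append("latitude is outside -90 to 90")
    if lng is not None and not -180 <= lng <= 180:
        problems.append("longitude is outside -180 to 180")
    return problems
```

Protocol and error are only checked in one direction, which mirrors the text: they are optional with coordinates but meaningless without them.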

Data Quality Reports and Tools

Dates

Many legitimate very old dates exist; however, a date of collection or identification before the birth date of the collector or determiner will trigger a data quality notification in Arctos.
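A report like this amounts to comparing each event date against the agent's birth date. The sketch below assumes complete ISO dates and uses a hypothetical function name and record layout; it is not how Arctos stores or queries these data.

```python
from datetime import date


def flag_events_before_birth(events, birth_dates):
    """Return the IDs of records dated before the agent's known birth date.

    events: iterable of (record_id, agent_name, iso_date) tuples
    birth_dates: mapping of agent_name to ISO birth date
    """
    flagged = []
    for record_id, agent, event_date in events:
        born = birth_dates.get(agent)
        # Agents with no recorded birth date cannot be checked and are skipped.
        if born and date.fromisoformat(event_date) < date.fromisoformat(born):
            flagged.append(record_id)
    return flagged
```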

Arctos supports more than collecting, so something may legitimately be identified (as in an observation) before being collected. However, a curatorial report flags this situation for review.

Agents

Agent pages include a list of potential duplicates.

Locality

Higher geography in Arctos is a controlled vocabulary of data objects associated with spatial polygons. Components are extracted on demand, never stored. When the assigned coordinates plus error for a locality do not fall within its higher geography polygon, a data quality report is generated for all collections using the locality. This clearly highlights improperly negated coordinates as well as coordinate/geography mismatches.

Taxonomy

Taxon pages in Arctos include external validation through comparisons with select taxonomic authorities, including the World Register of Marine Species (WoRMS), Encyclopedia of Life (EOL), the Global Biodiversity Information Facility (GBIF), and Wikipedia, among others. This tool is also engaged whenever a new name is added to the taxonomic name table, to help avoid the addition of misspellings and misused names. The Taxonomy Gap tool in Arctos allows for review of taxonomic classifications with missing terms (Family, Order, etc.) or with no associated local classification. Arctos also pulls data from GlobalNames, so records are generally still discoverable even when local taxonomic sources are missing terms or entire classifications.

Curatorial

Individual Count

Individual count is a curatorial assertion; there are no constraints.
