Locality Documentation

Collecting Event Documentation

Understanding the Arctos Locality Model

The core Arctos locality model consists of 4 primary tables plus an edit archive. This guide and the following illustration describe their primary function and interaction.

Arctos Locality Stack Diagram

Specimen_Event

Specimen-Event is the link from specimens to localities. Specimen-Event is not shared; a unique instance exists for and establishes every specimen<—>locality link, so a specimen with multiple encounters or unaccepted coordinates will have multiple specimen-events.

Collecting_Event

Collecting Event adds verbatim data plus dates. Collecting events are shared; one collecting_event may be parent to any number of specimen_events. In the case of co-collected specimens (e.g., hosts and parasites), maintaining one collecting event for multiple specimen-events is doubly critical.

Locality

Locality adds formality and vertical spatial data to collecting events. Localities are shared; one locality may be parent to any number of collecting_events.

Locality_Archive

Geography

Geography adds formalized descriptive data to locality. Geography is shared; one geography may be parent to any number of localities.

Not Included

In addition to the primary tables listed above, geology_attributes adds any number of hierarchical geology determinations to localities, table GEOG_SEARCH_TERM adds discovery data (such as old or local placenames, or placenames in local character sets) to geography, and several service-populated fields in Locality add automated georeference and reverse-georeference data, which aid in discoverability and provide editing suggestions.

The Locality Stack

“A specimen’s locality” can be viewed as everything from one record in collecting_event, locality (potentially including geology), and geog_auth_rec, while specimen_event is the glue which attaches “the locality” to specimens. Note that all Arctos keys are bit-wise, and very minor differences in the data (error distance or units, punctuation in remarks, etc.) can force into existence new and (slightly) different data objects. (Arctos provides “fuzzy” merger tools.) In addition to inconsequential differences, localities (which are simultaneously descriptive and/or spatial) often differ from similar localities by specific locality, coordinate point, coordinate error distance, elevation, depth, or the choice of higher geography. That is, there is not necessarily one correct interpretation of “Fairbanks, Alaska.” Collecting events often differ by the format of verbatim date or locality. Finally, the choice of higher geography is often somewhat arbitrary. All of these factors must be considered when attempting to ensure that a specimen shares a locality stack with other specimens. Collecting_event_id and collecting_event_name serve as proxies to the locality stack and may be used in the bulkloader or data entry screens to select existing “places.”
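The effect of exact (bit-wise) matching can be illustrated with a minimal Python sketch. The field names and the normalization rule below are assumptions chosen for illustration; they are not the Arctos schema and not the logic of the Arctos merger tools.

```python
import re
from typing import NamedTuple


class LocalityKey(NamedTuple):
    # Hypothetical subset of locality fields, for illustration only.
    spec_locality: str
    max_error_distance: str
    max_error_units: str
    locality_remarks: str


def normalize(value: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    value = re.sub(r"[^\w\s]", "", value.lower())
    return re.sub(r"\s+", " ", value).strip()


def fuzzy_key(key: LocalityKey) -> tuple:
    """Build a comparison key that ignores case, punctuation, and spacing."""
    return tuple(normalize(part) for part in key)


a = LocalityKey("5 mi N Fairbanks", "10", "km", "near airport.")
b = LocalityKey("5 mi N Fairbanks", "10", "km", "near airport")

# Exact comparison treats the trailing period in remarks as a real difference,
# so these two records would be separate data objects.
exact_match = a == b

# A normalized comparison flags them as candidates for merging.
fuzzy_match = fuzzy_key(a) == fuzzy_key(b)
```

Under these assumptions the two records fail the exact comparison but match after normalization, which is the kind of near-duplicate a merger tool would need to surface for review.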

Hierarchy

The core data model is hierarchical; an example as a hierarchical “tree” is given below.

All of the above is from one geography: North America, United States, Alaska.

There are two localities given; we will focus only on Locality 1, which contains two collecting events differing only by date. Collecting Event 1-1 contains two specimen-events and, like all other specimen-events, each contains exactly one specimen. Collecting Event 1-2 also contains two specimen-events, but in this case they contain the same specimen - perhaps a tagged individual which moved and was re-encountered on the same day.

Note that the specimen<–>specimen_event relationship is always 1:1; all other relationships in this model are 1:∞. One geography may contain two (or zero or two million) localities, one locality may contain two (or zero or two million) events, and one event may contain two (or zero or two million) specimen-events.
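The example hierarchy can also be written out as a small Python structure. This is only a sketch of the relationships described here: the class names mirror the table names, but the attributes and the specimen identifiers (SPEC:A and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpecimenEvent:
    specimen_guid: str  # each specimen-event points at exactly one specimen


@dataclass
class CollectingEvent:
    name: str
    specimen_events: List[SpecimenEvent] = field(default_factory=list)


@dataclass
class Locality:
    name: str
    collecting_events: List[CollectingEvent] = field(default_factory=list)


@dataclass
class Geography:
    higher_geog: str
    localities: List[Locality] = field(default_factory=list)


# Collecting Event 1-1: two specimen-events, two different specimens.
event_1_1 = CollectingEvent(
    "Collecting Event 1-1",
    [SpecimenEvent("SPEC:A"), SpecimenEvent("SPEC:B")],
)
# Collecting Event 1-2: two specimen-events that point at the same specimen.
event_1_2 = CollectingEvent(
    "Collecting Event 1-2",
    [SpecimenEvent("SPEC:C"), SpecimenEvent("SPEC:C")],
)
locality_1 = Locality("Locality 1", [event_1_1, event_1_2])
locality_2 = Locality("Locality 2")  # a parent with zero children is allowed
geography = Geography(
    "North America, United States, Alaska", [locality_1, locality_2]
)


def specimens_in(event: CollectingEvent) -> set:
    """Return the distinct specimens reachable from one collecting event."""
    return {se.specimen_guid for se in event.specimen_events}
```

Here `specimens_in(event_1_1)` yields two specimens while `specimens_in(event_1_2)` yields one, even though both events hold two specimen-events.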

(Note also that the possibility of 1:0 relationships is in the context of specimens; “unused” data objects may exist in support of other nodes, such as Media.)

History

Origins

“The Old Model” consisted of two tables in a one:many relationship. “Locality” contained textual data, and “coordinates” contained spatial data. Zero or one coordinate determinations could be “accepted” for any locality. In this model, the “locality” data are structurally primary data and coordinates are structurally data about the locality, or metadata. Coordinates contained metadata (agent, method, date, reference, etc.) allowing precise tracking of their origin. There was no capacity to treat coordinates (e.g., those downloaded from a GPS) as primary data. There was no history of the locality component; it was possible to edit a locality after it had been used to determine coordinates, which often led to a partial and confusing “history” from the perspective of specimens. It was exceedingly difficult to find duplicates and “almost duplicates” (e.g., those records that differ by a few unimportant characters in the many free-text metadata fields). Localities are “facts” in this model and coordinates are “determinations.” There was no additional “determination node” between localities and specimens.

New Model

“The New Model” consists of a single table in which coordinates and string-data (such as specific locality) are treated as parts of the same place or data object. The entire object is a “fact” - there are no determiners involved. The determination is inserted between the specimen and the entire locality stack; the determiner is asserting that the locality as a whole applies to the specimen. Bare coordinates, bare specific locality, specific locality determined from coordinates, and coordinates determined from specific locality are all possible. The model is much more normalized, although multiple denormalizers (locality remarks, datum, various distance units) remain. A specimen may have any number of localities, each containing a determiner and date, verificationstatus, and specimen-event type. Changes (INSERT and UPDATE) are logged, and these may be used to discover the agents who georeferenced, reverse-georeferenced, changed geography, made corrections (or introduced errors), etc. A full history of locality data may be maintained from the perspective of specimens by “verifying” erroneous data as unaccepted and adding a new locality stack with a not-unaccepted VerificationStatus.
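The history-keeping pattern described above (flag the erroneous determination as unaccepted, then add a new one) might be sketched in Python as follows. The field names, the helper `replace_locality`, the status value "unverified", and the event type "collection" are assumptions for illustration; only "unaccepted" is taken from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class SpecimenEvent:
    # Hypothetical fields standing in for the determination metadata.
    collecting_event_id: int
    assigned_by: str
    assigned_date: date
    verificationstatus: str
    specimen_event_type: str


def replace_locality(
    events: List[SpecimenEvent], bad_event_id: int, new_event: SpecimenEvent
) -> List[SpecimenEvent]:
    """Keep the erroneous determination, flagged unaccepted, and append the correction."""
    for ev in events:
        if ev.collecting_event_id == bad_event_id:
            ev.verificationstatus = "unaccepted"
    events.append(new_event)
    return events


events = [
    SpecimenEvent(101, "A. Collector", date(2020, 6, 1), "unverified", "collection")
]
corrected = SpecimenEvent(
    102, "B. Curator", date(2022, 11, 8), "unverified", "collection"
)
replace_locality(events, bad_event_id=101, new_event=corrected)

# Both determinations survive, so the specimen keeps its full locality history.
history = [(ev.collecting_event_id, ev.verificationstatus) for ev in events]
```

Nothing is deleted in this sketch: the old link remains attached to the specimen with an unaccepted status, and the correction is added alongside it.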

Geography Model

“Old Geography”

The former geography model was generally an attempt to “fill in the blanks,” which resulted in many - often dozens - of ways of saying, or almost saying, the same thing. This model was primarily descriptive, with most shapes not having spatial data available. https://github.com/ArctosDB/arctos/issues/4836 is one discussion regarding this problem. Additionally, there were no clear guidelines on what does or does not constitute geography, and entries ranged from the very large (“Patagonia”) to the very small (one of the smaller Florida Keys).

New geography

Several months of intense discussion led to The Plan, which is summarized in Geography Documentation. Some common questions will be addressed below.

What’s going on with higher geography? Seems all my data have lost their continent.

Correct; no currently assertable source of geography contains continents.

How can I find a former Feature?

Consult https://github.com/ArctosDB/arctos/issues/5207, then search Feature:

Screenshot 2022-11-08 at 6 45 21 AM

How can I get a former feature in CSV or labels?

Use function concatLocalityAttributeValue (lid int,attrtype varchar) (enabled in search results as locality_feature) or file a Report Template Request for help.

How do I enter data with a feature?

Use the finder to locate locality attributes…

Screenshot 2022-11-08 at 7 11 39 AM

turn at least one row on, choose Feature, and begin typing to get values.

Screenshot 2022-11-08 at 7 14 07 AM

Where is previous geography?

https://github.com/ArctosDB/arctos/issues/5144 encompasses significant changes. (Predictable operations, like removing continent from an entire country, are not captured here.)

Now specific locality is redundant!

See Locality Documentation and https://github.com/ArctosDB/arctos/issues/5132 - some geography was moved to specific locality per The Plan, and not following the “Do not include higher geography … in the Specific Locality” directive results in redundancy. File an Issue; we can help clean up.

Associated Names is still crazy!

The new approach to Geography means that I can use Arctos alone to get “should be” placenames. This operation is very resource limited, but is progressing. https://arctos.database.museum/guid/BYU:Herp:41840 which asserts United States, California gets Associated Names of North America, United States, California, Mariposa County, Yosemite National Park, for example.

How do Fun Flexible Features (FFFs) work?

https://arctos.database.museum/place.cfm?action=detail&geog_auth_rec_id=10016796 is an FFF.

Screenshot 2022-11-07 at 2 04 13 PM finds stuff from Yosemite (or it would if Arctos had sufficient resources) no matter what’s asserted.

Screenshot 2022-11-07 at 2 05 48 PM

is the same spatial query plus a search term (also derived from the spatial query) that shouldn’t time out and finds about the same thing. Note that the search by spatial intersection is also the operation which generates Associated Names. Associated Names will eventually be almost interchangeable with spatial shape search, but as above is resource limited.
