Bulkloader
Navigation: Enter Data > Bulkloader > Bulkload Catalog Records
New specimen records may be created from a single flat (non-relational) file, a text file in which all (or most) data for a single cataloged item are in a single row. This file can be created with any convenient client-side application. The file is then loaded into a similarly structured table on the server, and a server-side application (the bulkloader) parses the columns from each row into the relational structure of the database. The process provides an independent layer of data checking before new information is incorporated into the database proper. Original data that are received in electronic format may require minimal manipulation; you can sometimes merely add the necessary columns to build a file in the bulk-loading format.
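As an illustration only, a minimal Python sketch of that "add the necessary columns" step might look like the following. The file names, the template columns, and the GUID prefix are hypothetical placeholders; the real header row must come from a template generated by the Bulkloader Builder.

```python
import csv

# Hypothetical example of padding an existing export out to the bulkloader
# format. The column names below are placeholders; always take the real
# header row from a template generated by the Bulkloader Builder.
TEMPLATE_COLUMNS = [
    "COLLECTION_OBJECT_ID", "GUID_PREFIX", "CAT_NUM", "ACCN",
    "BEGAN_DATE", "ENDED_DATE", "VERBATIM_DATE", "HIGHER_GEOG",
    "SPEC_LOCALITY", "DEC_LAT", "DEC_LONG", "TAXON_NAME",
    "COLLECTOR_AGENT_1", "ENTEREDBY",
    "PART_NAME_1", "PART_LOT_COUNT_1", "PART_DISPOSITION_1",
]

with open("field_data.csv", newline="", encoding="utf-8") as src, \
     open("bulkload_ready.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=TEMPLATE_COLUMNS)
    writer.writeheader()
    for i, row in enumerate(reader, start=1):
        # Carry over whatever columns already exist; leave the rest blank.
        out = {col: row.get(col, "") for col in TEMPLATE_COLUMNS}
        out["COLLECTION_OBJECT_ID"] = i       # temporary record identifier
        out["GUID_PREFIX"] = "XYZ:Mamm"       # placeholder collection
        writer.writerow(out)
```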
Bulkloader templates should be created from the Bulkloader Builder in Arctos. Templates built by any other means, including from this documentation, may have a non-current structure and will be rejected.
There is no standard method for moving data into table Bulkloader. You may import data from any file format, type the data into the table, write your own data entry screen, or use any other method you choose. We appreciate documentation, even for specialized datasets – contact us if you wish to contribute.
You may mix accessions, collections, or anything else in a single load.
The specimen Bulkloader alone will not handle every situation that can arise while entering data; the full suite of tools available should. Use flags to mark incomplete records for further editing, tie records to other bulkloaders with UUIDs, or talk to your friendly local Arctos development team BEFORE you make a mess.
Error messages should include more than enough information to allow you to locate and correct the problem. If that isn’t the case, contact us with the error message and a description of the action that caused the error message.
Arctos is case-sensitive. JOHN DOE is not the same value as John Doe. Leading and trailing spaces and other non-printing characters matter.
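Because of this, it can help to scan a load file for stray whitespace and non-printing characters before loading. Below is a small, hypothetical pre-load check; the file name simply reuses the placeholder from the sketch above.

```python
import csv
import unicodedata

def suspicious(value: str) -> bool:
    """Flag leading/trailing whitespace or control/format characters."""
    return value != value.strip() or any(
        unicodedata.category(ch).startswith("C") for ch in value
    )

# "bulkload_ready.csv" is the placeholder file name from the earlier sketch.
with open("bulkload_ready.csv", newline="", encoding="utf-8") as f:
    for line_no, row in enumerate(csv.DictReader(f), start=2):
        for col, value in row.items():
            if value and suspicious(value):
                print(f"line {line_no}, {col}: hidden whitespace in {value!r}")
```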
The web-based applications may not work well for very large loads. Contact us if you’re having problems.
Agent Names
Agent Names must match a unique namestring, not necessarily the preferred name. If you are loading “John Smith” and there are three John Smiths in Arctos, you might create a new name “John Smith (my project)” and use that namestring in your data. Once loaded, the records will display preferred name, and agent name “John Smith (my project)” may be removed.
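One way to catch unknown or ambiguous names before loading is to compare your file against a list of Arctos agent namestrings that you have exported yourself. The sketch below assumes such an export exists as arctos_agent_names.csv with an agent_name column; both file names and the column layout are hypothetical.

```python
import csv
from collections import Counter

# Hypothetical pre-load check: each agent name in the load file should match
# exactly one known namestring (matching is case- and whitespace-sensitive).
with open("arctos_agent_names.csv", newline="", encoding="utf-8") as f:
    name_counts = Counter(row["agent_name"] for row in csv.DictReader(f))

with open("bulkload_ready.csv", newline="", encoding="utf-8") as f:
    for line_no, row in enumerate(csv.DictReader(f), start=2):
        name = row.get("COLLECTOR_AGENT_1", "")
        if not name:
            continue
        if name_counts[name] == 0:
            print(f"line {line_no}: no agent matches {name!r}")
        elif name_counts[name] > 1:
            print(f"line {line_no}: {name!r} is ambiguous; use a unique namestring")
```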
Taxonomy
Special note, primarily for botanists: the bulkloader requires taxonomy.scientific_name, not taxonomy.display_name. That is, “Carex bigelowii subsp. lugens” rather than “Carex bigelowii Torr. subsp. lugens (Holm) T.V. Egorova”.
Any of the following are acceptable taxon name values (current 23 Aug 2011; see the code table for the most current formulas, and see the sketch after the list for a simple format check):
- Formula “A”: an exact match to any accepted taxonomy.scientific_name
  - Sorex cinereus
  - Soricidae
- Formula “A cf.”: any accepted taxonomy.scientific_name plus “ cf.”
  - Sorex cf.
- Formula “A ?”: any accepted taxonomy.scientific_name plus “ ?”
  - Sorex ?
- Formula “A x B”: any two accepted taxonomy.scientific_names separated by “ x ”
  - Sorex cinereus x Sorex yukonicus
- Formula “A or B”: any two accepted taxonomy.scientific_names separated by “ or ”
  - Sorex cinereus or Sorex yukonicus
- Formula “A and B”: any two accepted taxonomy.scientific_names separated by “ and ”
  - Sorex cinereus and Sorex yukonicus
- Formula “A {string}”: any valid taxonomy.scientific_name, followed by a space, an opening curly bracket, a verbatim identification, and a closing curly bracket
  - Sorex {Sorex new species “my name”}
  - unidentifiable {granite}
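Something like the following can flag obviously malformed Taxon_Name values before a load. The accepted set and the handling of each formula are paraphrases of the list above; the Arctos code table, not this sketch, is the authority.

```python
import re

def taxon_name_ok(value: str, accepted: set[str]) -> bool:
    """Rough check of one Taxon_Name value against the formulas above.

    `accepted` is assumed to hold current taxonomy.scientific_name values.
    """
    # Formula "A {string}": accepted name, a space, then a verbatim
    # identification wrapped in curly brackets.
    match = re.fullmatch(r"(.+?) \{.+\}", value)
    if match:
        return match.group(1) in accepted
    # Formulas "A x B", "A or B", "A and B": two accepted names.
    for sep in (" x ", " or ", " and "):
        if sep in value:
            left, _, right = value.partition(sep)
            return left in accepted and right in accepted
    # Formulas "A cf." and "A ?": an accepted name plus a qualifier.
    for suffix in (" cf.", " ?"):
        if value.endswith(suffix):
            return value[: -len(suffix)] in accepted
    # Formula "A": an exact match to an accepted name.
    return value in accepted

accepted = {"Sorex", "Sorex cinereus", "Sorex yukonicus", "unidentifiable"}
print(taxon_name_ok("Sorex cinereus x Sorex yukonicus", accepted))      # True
print(taxon_name_ok('Sorex {Sorex new species "my name"}', accepted))   # True
print(taxon_name_ok("Sorrex cinereus", accepted))                       # False
```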
Be sure anything coming from other applications (especially Microsoft products) has not changed field length, precision, or other attributes. Watch dates and non-integer numbers (such as decimal latitude) most closely.
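For instance, a small sketch like the one below can flag dates that are no longer ISO 8601 and decimal latitudes that look truncated after a round trip through a spreadsheet. The column names and the four-decimal-place expectation are assumptions for illustration, not Arctos requirements.

```python
import csv
import re

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}$")  # full-date form, e.g. 2006-12-31

with open("bulkload_ready.csv", newline="", encoding="utf-8") as f:
    for line_no, row in enumerate(csv.DictReader(f), start=2):
        # Spreadsheets commonly rewrite dates as 12/31/06 or similar.
        for col in ("BEGAN_DATE", "ENDED_DATE"):
            value = row.get(col, "")
            if value and not ISO_DATE.match(value):
                print(f"line {line_no}: {col} {value!r} is not ISO 8601")
        # Crude heuristic: a decimal latitude with fewer than four decimal
        # places may have been rounded somewhere along the way.
        lat = row.get("DEC_LAT", "")
        if lat and len(lat.partition(".")[2]) < 4:
            print(f"line {line_no}: DEC_LAT {lat!r} may have lost precision")
```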
Fields
The following table describes select individual fields in the Bulkloader. Check the Bulkloader Builder for the latest table structure. Do not attempt to use this as a template. Let us know if it’s out of date, incomplete, cryptic, or otherwise useless.
Field Name | Data Type/Vocabulary | Description/Example |
---|---|---|
Fields may be required, conditionally required, or not required. | | |
Collection_Object_Id | any unique number | Temporary record identifier; does NOT carry over to any internal primary keys. |
Cat_Num | set by collection | Existing catalog number, or leave blank to assign sequential numbers on upload. |
Began_Date | ISO8601 date | [ doc ] Earliest date the specimen could have been collected. |
Ended_Date | ISO8601 date | [ doc ] Latest date the specimen could have been collected. |
Verbatim_Date | text; any string | [ doc ] Examples: ‘winter 2002’; ‘1 Nov 2002’; ‘Nov 2002’. |
VERIFICATIONSTATUS | text; ctverificationstatus | |
SPECIMEN_EVENT_TYPE | text; ctspecimen_event_type | Type of specimen-event relationship |
Event_Assigned_By_Agent | text; agent name | Agent asserting specimen-to-event relationship; often coordinate determiner. |
Event_Assigned_Date | date | Date on which the specimen-event relationship is made |
Coll_Event_Remarks | text; any string | Remarks about Collecting Event. |
Higher_Geog | text; pre-existing | [ doc ] Higher Geography exactly as it appears in table Geog_Auth_Rec. New values must be added to the database prior to bulk-loading. |
Maximum_Elevation | integer > minimum_elevation | [ doc ] Maximum elevation from which the specimen could have come. Used in conjunction with Minimum_Elevation and Orig_Elev_Units. |
Minimum_Elevation | integer < maximum_elevation | [ doc ] Minimum elevation from which the specimen could have come. Used in conjunction with Maximum_Elevation and Orig_Elev_Units. |
Orig_Elev_Units | text; ctorig_elev_units | Used in conjunction with Maximum_Elevation and Minimum_Elevation. (Code table controlled.) |
Spec_Locality | text; any string | [ doc ] Specific locality from which a specimen originates. |
Locality_Remarks | text; any string | Remarks associated with Locality. |
— Begin coordinate fields. All coordinate data are optional unless Orig_Lat_Long_Units is specified, and leaving Orig_Lat_Long_Units NULL will cause all other coordinate data to be ignored. Geographic coordinates may be entered in decimal degrees (1), degrees-minutes-seconds (2), or degrees with decimal minutes (3) [ doc ]; a conversion sketch follows the table. — | | |
Orig_Lat_Long_Units | text; ctlat_long_units | [ doc ] Lat/Long units as given by the determining agent and before any transformations. |
Datum | text; ctdatum | [ doc ] Map datum used to determine Lat/Long. Required if coordinates are given. |
GEOREFERENCE_SOURCE | text; any string | [ doc ] A code indicating the reference from which a Lat/Long was determined. |
GEOREFERENCE_PROTOCOL | text; ctgeoreference_protocol | |
Max_Error_Distance | number | [ doc ] The maximum possible error in distance between the recorded Lat_Long and the actual Lat_Long of the specific locality. Required if Max_Error_Units provided. |
Max_Error_Units | text; ctlat_long_error_units | [ doc ] The units in which the Max_Error_Distance is recorded. Required if Max_Error_Distance provided. |
Dec_Lat (1) | number | Decimal latitude. |
Dec_Long (1) | number | Decimal longitude. |
LatDeg (2, 3) | positive number | Degrees Latitude (Integer, 90 or less). |
LatMin (2) | positive number | Minutes Latitude (Integer, less than 60). |
LatSec (2) | positive number | Seconds Latitude (Decimal fraction, less than 60). |
LatDir (2, 3) | text; N or S | Latitude Direction: “N” or “S” (North or South). |
LongDeg (2, 3) | positive number | Degrees Longitude (Integer, 180 or less). |
LongMin (2) | positive number | Minutes Longitude (Integer, less than 60). |
LongSec (2) | positive number | Seconds Longitude (Decimal fraction, less than 60). |
LongDir (2, 3) | text; W or E | Longitude Direction: “E” or “W” (East or West). |
Dec_Lat_Min (3) | positive number | Decimal Latitude Minutes (Used with LatDeg; decimal fraction, less than 60). |
Dec_Long_Min (3) | positive number | Decimal Longitude Minutes (Used with LongDeg; decimal fraction, less than 60). |
— end coordinate fields — | | |
Verbatim_Locality | text; any string | [ doc ] The locality, entered as closely as possible to the original text provided by the collector. (Not necessarily the same as specific locality.) |
Collecting_Source | text; ctcollecting_source | [ doc ] Source from which the specimen was received. Example: “wild caught” |
Habitat | text; any string | [ doc ] A description of habitat at the time of the collecting event. |
Associated_Species | text; any string | A description of other species occurring at the collecting event. Use relationships to other specimens when possible. |
Coll_Object_Remarks | text; any string | Remarks about the cataloged item. |
Id_Made_By_Agent | text; agent name | [ doc ] Determiner, or agent who identified the specimen. |
Identification_Remarks | text; any string | [ doc ] Remarks associated with this identification. |
Made_Date | ISO8601 date | [ doc ] Date that the taxonomic determination (or identification) was made. |
Nature_of_Id | text; ctnature_of_id | [ doc ] How identification was determined. (Code-table controlled.) |
Taxon_Name | text; taxon name | [ doc ] Scientific Name assigned by identifying agent. |
Other_Id_Num_x | text; any string | Other identifying numbers (e.g., original field number). |
Other_Id_Num_Type_x | text; ctcoll_other_id_type | Used in conjunction with Other_Id_Num. (Code-table controlled.) |
Other_Id_References_x | text; ctid_references | Establish relationships to other specimens. (Code-table controlled.) |
Collector_Agent_x | text; agent name | Collector or preparator name as it appears in Arctos. At least one collector_agent is required. |
Collector_Role_x | text; ctcollector_role | Collector Role. |
Part_Name_x | text; ctspecimen_part_name | [ doc ] At least one part is required. |
Part_lot_count_x | number | [ doc ] A part_lot_count is required for all non-null parts. |
Part_Condition_x | text; any string | [ doc ] A description of the latest documented condition. |
Part_disposition_x | text; ctcoll_obj_disp | [ doc ] A Part_disposition is required for all non-null parts. Example: “in collection” |
Part_Barcode_x | text; any barcode | [ doc ] Barcode on the part as it will be read by a barcode scanner. |
Part_Container_Label_x | text; any string | [ doc ] Label on the container (e.g., Nunc tube); the human-readable printing on the container. NULL results in no changes to the part container; ignored if Part_Barcode_x is NULL. |
Part_Remark_x | text; any string | Remark about the part. |
Part_preservation_x | text; ctpart_preservation | [ doc ] This is a shortcut to creating a part attribute of type preservation. Attribute date will default to current_date and determiner will default to enteredAgent. |
Accn | text; accn number | [ doc ] Accession Number assigned upon acceptance of specimens. Format is accn number without collection information, but see cross-collection considerations. |
EnteredBy | text; agent name | [ doc ] Agent entering the data into this table. Must match agent_name of type login. May be NULL if ENTERED_AGENT_ID is provided. |
ENTERED_AGENT_ID | number; key | EnteredBy’s agent_id. Increased performance over EnteredBy. |
GUID_Prefix | text; controlled | [ doc ] Unique-within-Arctos identifier of the collection under which the specimen will be cataloged. Replaces Institution_Acronym + Collection_Cde. |
collection_id | number; key | Primary key of table Collection. Alternative to GUID_prefix. |
Status | text; any string | Errors are stored here after Bulkloader processing. |
Flags | text; ctflags | Flag indicating the specimen needs further work. |
Attribute | text; ctattribute_type | [ doc ] Attribute name. (Code-table controlled.) |
Attribute_value | text; various | [ doc ] Value of the attribute. Leaving this NULL will cause the bulkloader to ignore the attribute entry regardless of other values. |
Attribute_units | text; L,W, etc. | [ doc ] Units on attribute_value, where appropriate. |
attribute_remarks | text; any string | [ doc ] Remarks about the attribute. |
attribute_date | ISO8601 date | [ doc ] Date the attribute was determined. |
attribute_det_meth | text; any string | [ doc ] How the attribute was determined. |
attribute_determiner | text; agent name | [ doc ] Agent who determined the attribute. |
locality_id | number; key | A primary key from table locality may be used in place of locality information. A value here will override anything entered into higher_geog, spec_locality, coordinates, etc. |
locality_name | string; Exact: locality.locality_name | A persistent locality identifier which may be used in place of locality information. A value here will override anything entered into higher_geog, spec_locality, coordinates, geology, etc. |
collecting_event_id | number; key | A primary key from table collecting_event may be used in place of collecting_event information. A value here will override anything entered into higher_geog, spec_locality, coordinates, dates, method, etc. |
cataloged_item_type | text; ctcataloged_item_type | Designates the type of material held, passed to biodiversity data aggregators as BasisOfRecord. A value here will override anything entered into Default Cataloged Item Type in Manage Collection. |

All date fields should be formatted as ISO 8601, e.g., 2006-12-31.
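If your source coordinates are in degrees-minutes-seconds or degrees with decimal minutes and you prefer to load decimal degrees instead, the arithmetic is the standard conversion sketched below; whether to convert or to load the original format (with Orig_Lat_Long_Units set accordingly) is your choice.

```python
def dms_to_decimal(deg: int, minutes: int, seconds: float, direction: str) -> float:
    """Degrees-minutes-seconds (format 2) to decimal degrees (format 1).

    South latitudes and west longitudes become negative decimal values.
    """
    value = deg + minutes / 60 + seconds / 3600
    return -value if direction.upper() in ("S", "W") else value

def deg_decmin_to_decimal(deg: int, dec_min: float, direction: str) -> float:
    """Degrees with decimal minutes (format 3) to decimal degrees (format 1)."""
    return dms_to_decimal(deg, 0, dec_min * 60, direction)

print(dms_to_decimal(63, 52, 30, "N"))        # approximately 63.875
print(deg_decmin_to_decimal(149, 7.2, "W"))   # approximately -149.12
```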
Primary Key Warning
Some values may be replaced by or require primary keys: locality_id, entered_by_agent_id, collecting_event_id, etc. These are internal database identifiers that exist only for convenience, and may be updated, transferred to another data object, or removed for seemingly arbitrary reasons and without warning. They’ll probably work over short time periods, but we offer no guarantees.
Processing
Once a record is marked to load by setting status to “autoload_core” (loads data from table bulkloader) or “autoload_extras” (also marks UUID-linked records in “component loaders” to autoload), a script periodically attempts to parse the record into the normalized core Arctos structure. This may result in one of two outcomes:
* the record is created and marked for cache refresh, or
* an error is returned in the status column
Records which successfully load must be refreshed in the cache before they appear in the user interfaces. Records are refreshed in the order they enter the queue. This process often takes less than one minute, but when many thousands of records are queued it can take several days. Reports/Services > View Statistics > FLAT status provides a summary of the state of the cache and may be useful in estimating processing time.
Note that there is a period of time between successful loading and the cache being refreshed where records are not visible in any user interface.
Additionally, status DELETE (case-sensitive) can be used to mark records for deletion. This process generally takes about 30 minutes.