By Juliette Bryan, Data Analyst at the ALA
One of the challenges the Atlas of Living Australia (ALA) faces is integrating biodiversity occurrence data that arrives in many different forms. Most of our data comes from museums, herbaria, other biological collections, State conservation agencies and BirdLife Australia. While these data are generally well structured, data from other sources, contributed by both amateurs and professionals, may be inconsistent in format.
Some of the data we receive has been recorded over many years or decades by a variety of individuals from a regional group or nature club, and so can be of high value for certain research questions. However, even within a single club, members have often adopted their own methods for recording data and, although some standards exist, the format is often highly inconsistent.
Data from government agencies or organisations with a background in biodiversity data often require very little restructuring before the data can be loaded into the Atlas of Living Australia.
Before any data is loaded into the ALA it must be mapped to Darwin Core terms. The Darwin Core is a body of standards primarily based on taxa and their occurrence in nature as documented by observations, specimens, and samples, and related information. Darwin Core terms include the basis of record, location, event date, sampling protocol, recorded by, identified by, species common and scientific names and associated media.
When reading and cleaning the more difficult, unstructured data, we use a variety of tools. Open source tools such as MySQL, Talend and Pentaho are all useful in mapping the data to Darwin Core terms. The more structured data usually only need their column names changed to the corresponding Darwin Core terms, and then the data is ready to load.
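For well-structured data, the column renaming described above can be sketched in a few lines of Python. This is only an illustration, not the ALA's actual tooling: the supplier column headings, the `COLUMN_MAP` dictionary and the `rename_to_darwin_core` helper are all hypothetical, and only the right-hand names are real Darwin Core terms.

```python
import csv
import io

# Hypothetical mapping from a supplier's column headings to Darwin Core terms.
# The left-hand names are invented for illustration; the right-hand names are
# standard Darwin Core terms.
COLUMN_MAP = {
    "Species": "scientificName",
    "Common name": "vernacularName",
    "Date": "eventDate",
    "Site": "locality",
    "Observer": "recordedBy",
    "Count": "individualCount",
}


def rename_to_darwin_core(rows, column_map):
    """Return new rows whose keys are Darwin Core terms.

    Columns without a mapping keep their original name so that nothing
    is silently dropped.
    """
    return [
        {column_map.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


# Made-up sample input standing in for a well-structured supplier file.
SAMPLE_CSV = """Common name,Site,Count,Date
Dusky Moorhen,Tips Billabong,4,2011-10-12
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
mapped = rename_to_darwin_core(rows, COLUMN_MAP)
print(mapped[0])
```

Keeping unmapped columns under their original names makes it easy to spot headings that still need a Darwin Core equivalent.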
Example 1: Raw data which will be mapped to Darwin Core terms
2011 (Spring) October 12th |
Waterbird Sanctuary |
Watsons block |
Tips Billabong |
Racecourse |
Corridor/River |
Purple Swamphen |
3+2chicks |
4+2chicks |
5 |
||
Dusky Moorhen |
3 |
4 |
|||
Eurasian Coot |
20 |
23 |
Below is the data mapped to Darwin Core terms. As the original data included the locality, we used the ALA Spatial Portal (http://spatial.ala.org.au/) to create the columns: coordinatePrecision, coordinateUncertaintyInMeters, decimalLatitude and decimalLongitude.
vernacularName | locality | individualCount | occurrenceRemarks | eventDate | samplingProtocol |
Purple Swamphen | Waterbird Sanctuary | 3+2chicks |
12/10/2011 |
Present: Time: 8:30am – 11.30am Weather: 6 – 18 deg | |
Dusky Moorhen | Watsons Block |
3 |
12/10/2011 |
Present: Time: 8:30am – 11.30am Weather: 6 – 18 deg | |
Eurasian Coot | Watsons Block |
20 |
12/10/2011 |
Present: Time: 8:30am – 11.30am Weather: 6 – 18 deg | |
Purple Swamphen | Tips Billabong | 4+2chicks |
12/10/2011 |
Present: Time: 8:30am – 11.30am Weather: 6 – 18 deg | |
Dusky Moorhen | Tips Billabong |
4 |
12/10/2011 |
Present: Time: 8:30am – 11.30am Weather: 6 – 18 deg | |
Eurasian Coot | Tips Billabong |
23 |
12/10/2011 |
Present: Time: 8:30am – 11.30am Weather: 6 – 18 deg |
coordinatePrecision | coordinateUncertaintyInMeters | georeferencedDate | decimalLatitude | decimalLongitude |
0.00001 | 100 | 13/12/2011 | -36.93344 | 149.87758 |
0.00001 | 100 | 13/12/2011 | -36.93281 | 149.87255 |
0.00001 | 100 | 13/12/2011 | -36.93281 | 149.87255 |
0.00001 | 100 | 13/12/2011 | -36.93313 | 149.875 |
0.00001 | 100 | 13/12/2011 | -36.93313 | 149.875 |
0.00001 | 100 | 13/12/2011 | -36.93313 | 149.875 |
georeferenceProtocol | georeferenceSources | georeferencedBy |
Location description looked up on map and digitised in ALA spatial portal | Panboola Wetlands Map provided by surveyors and ALA spatial portal | Miles Nicholls (ALA) |
Location description looked up on map and digitised in ALA spatial portal | Panboola Wetlands Map provided by surveyors and ALA spatial portal | Miles Nicholls (ALA) |
Location description looked up on map and digitised in ALA spatial portal | Panboola Wetlands Map provided by surveyors and ALA spatial portal | Miles Nicholls (ALA) |
Location description looked up on map and digitised in ALA spatial portal | Panboola Wetlands Map provided by surveyors and ALA spatial portal | Miles Nicholls (ALA) |
Location description looked up on map and digitised in ALA spatial portal | Panboola Wetlands Map provided by surveyors and ALA spatial portal | Miles Nicholls (ALA) |
Location description looked up on map and digitised in ALA spatial portal | Panboola Wetlands Map provided by surveyors and ALA spatial portal | Miles Nicholls (ALA) |
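The reshaping in Example 1, from a sheet with one column per site to one record per species and site, can be sketched as an unpivot. The following is a minimal illustration under stated assumptions: the wide rows are reconstructed from the mapped records above (the exact layout of the original sheet is assumed), the helper names are hypothetical, and the counts are kept as text exactly as recorded.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Wide sheet: one row per species, one column per site. The values are taken
# from the mapped records; the cell positions in the original spreadsheet
# are an assumption.
DATE_HEADING = "2011 (Spring) October 12th"
SITES = ["Waterbird Sanctuary", "Watsons Block", "Tips Billabong"]
WIDE_ROWS = [
    ("Purple Swamphen", ["3+2chicks", "", "4+2chicks"]),
    ("Dusky Moorhen", ["", "3", "4"]),
    ("Eurasian Coot", ["", "20", "23"]),
]


def parse_heading_date(heading):
    """Turn a heading like '2011 (Spring) October 12th' into a date."""
    match = re.fullmatch(
        r"(\d{4}) \(\w+\) (\w+) (\d{1,2})(?:st|nd|rd|th)", heading
    )
    if match is None:
        raise ValueError(f"unrecognised date heading: {heading!r}")
    year, month_name, day = match.groups()
    return date(int(year), MONTHS.index(month_name) + 1, int(day))


def unpivot(wide_rows, sites, event_date):
    """Yield one record per site and species, skipping empty cells."""
    for column, site in enumerate(sites):
        for species, counts in wide_rows:
            count = counts[column].strip()
            if count:
                yield {
                    "vernacularName": species,
                    "locality": site,
                    "individualCount": count,  # kept as recorded, e.g. '3+2chicks'
                    "eventDate": event_date.strftime("%d/%m/%Y"),
                }


records = list(unpivot(WIDE_ROWS, SITES, parse_heading_date(DATE_HEADING)))
for record in records:
    print(record)
```

Iterating site by site reproduces the row order of the mapped table; the date is formatted as day/month/year only to match that table.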
Below is the same data displayed in the ALA
Example 2: Data digitised from a series of publications
Scientific Name | Year of collection | Attribution | Identified by | Recorded by | Synonymy | station no | Material Studied |
Caulastrea echinulata (Edwards & Haime, 1849) | 1976 | Veron J. E. N., Pichon M., Maya Wijsman-Best, 1977, Scleractinia of Eastern Australia Part 2, Families Faviidae, Trachyphylliidae, Australian Institute of Marine Science (AIMS), Australian Government Publishing Service, Canberra | J. Veron, M. Pichon, M. Wijsman-Best | J. Veron | Dasyphyllia echinulata Edwards & Haime, 1849; Edwards & Haime (1857); Ortmann (1888). Caulastrea echinulata (Edwards & Haime, 1849); Matthai (1928); Nemenzo (1959); Wijsman-Best (1972). Caulastrea aiharai Yabe & Sugiyama, 1935; Yabe, Sugiyama & Eguchi (1936). | 9,36,90 | Yonge Reef, Palm Islands (4 specimens). These localities include collecting stations 9, 36, 90. |
Psammocora explanulata van der Horst, 1922 | 1975 | Veron J. E. N., Pichon M., 1976, Scleractinia of Eastern Australia Part 1, Families Thamnasteriidae, Astrocoeniidae, Pocilloporidae, Australian Institute of Marine Science (AIMS), Australian Government Publishing Service, Canberra | J. Veron, M. Pichon | J. Veron | Psammocora explanulata van der Horst, 1922; Wells (1954). | 55 | Palm Islands (2 specimens), collecting station 55. |
Additionally, we were provided with a lookup table of 259 stations and their locations. As we were given the locality of each station in the original data, we used the ALA Spatial Portal (http://spatial.ala.org.au/) to create the columns: Uncertainty in Km, Latitude and Longitude. A sample of the data is shown below. The Location Remarks column is used to identify whether a particular station is a dredging station.
Station No | Station location | Longitude | Latitude | Uncertainty in Km | Location Remarks |
1 | Great Detached Reef | 144.028 | -11.694 | 6 | |
2 | Tijou Reef | 143.95 | -13.166 | 2 | |
3 | Yonge Reef | 145.657 | -14.693 | 0.5 | |
4 | Bowl Reef | 147.545 | -18.512 | 1 |
The data was then split by station number so that each record has one row per station (for example, a record listing stations 9, 36 and 90 becomes three rows). These rows were joined to the station lookup table to pick up the locality and mapping coordinates, with the uncertainty expressed in metres. Below is the final data mapped to Darwin Core terms.
scientificName | year | recordedBy | identifiedBy | locality | locationID |
Caulastrea echinulata (Edwards & Haime, 1849) | 1976 | J. Veron | J. Veron, M. Pichon, M. Wijsman-Best | Yonge Reef | 9 |
Caulastrea echinulata (Edwards & Haime, 1849) | 1976 | J. Veron | J. Veron, M. Pichon, M. Wijsman-Best | Electra Head, Great Palm Island | 36 |
Caulastrea echinulata (Edwards & Haime, 1849) | 1976 | J. Veron | J. Veron, M. Pichon, M. Wijsman-Best | Pelorus Island, Palm Islands, W | 90 |
Psammocora explanulata van der Horst, 1922 | 1975 | J. Veron | J. Veron, M. Pichon | Orpheus Island (Palm Islands), NW point | 55 |
decimalLongitude | decimalLatitude | coordinateUncertaintyInMeters | locationRemarks |
145.623 | -14.597 | 500 | |
146.69 | -18.73 | 500 | |
146.488 | -18.551 | 500 | |
146.48 | -18.567 | 500 |
occurrenceRemarks | previousIdentifications | references |
Yonge Reef, Palm Islands (4 specimens). These localities include collecting stations 9, 36, 90. | Dasyphyllia echinulata Edwards & Haime, 1849; Edwards & Haime (1857); Ortmann (1888). Caulastrea echinulata (Edwards & Haime, 1849); Matthai (1928); Nemenzo (1959); Wijsman-Best (1972). Caulastrea aiharai Yabe & Sugiyama, 1935; Yabe, Sugiyama & Eguchi (1936). | Veron J. E. N., Pichon M., Maya Wijsman-Best, 1977, Scleractinia of Eastern Australia Part 2, Families Faviidae, Trachyphylliidae, Australian Institute of Marine Science (AIMS), Australian Government Publishing Service, Canberra |
Yonge Reef, Palm Islands (4 specimens). These localities include collecting stations 9, 36, 90. | Dasyphyllia echinulata Edwards & Haime, 1849; Edwards & Haime (1857); Ortmann (1888). Caulastrea echinulata (Edwards & Haime, 1849); Matthai (1928); Nemenzo (1959); Wijsman-Best (1972). Caulastrea aiharai Yabe & Sugiyama, 1935; Yabe, Sugiyama & Eguchi (1936). | Veron J. E. N., Pichon M., Maya Wijsman-Best, 1977, Scleractinia of Eastern Australia Part 2, Families Faviidae, Trachyphylliidae, Australian Institute of Marine Science (AIMS), Australian Government Publishing Service, Canberra |
Yonge Reef, Palm Islands (4 specimens). These localities include collecting stations 9, 36, 90. | Dasyphyllia echinulata Edwards & Haime, 1849; Edwards & Haime (1857); Ortmann (1888). Caulastrea echinulata (Edwards & Haime, 1849); Matthai (1928); Nemenzo (1959); Wijsman-Best (1972). Caulastrea aiharai Yabe & Sugiyama, 1935; Yabe, Sugiyama & Eguchi (1936). | Veron J. E. N., Pichon M., Maya Wijsman-Best, 1977, Scleractinia of Eastern Australia Part 2, Families Faviidae, Trachyphylliidae, Australian Institute of Marine Science (AIMS), Australian Government Publishing Service, Canberra |
Palm Islands (2 specimens), collecting station 55. | Psammocora explanulata van der Horst, 1922; Wells (1954). | Veron J. E. N., Pichon M., 1976, Scleractinia of Eastern Australia Part 1, Families Thamnasteriidae, Astrocoeniidae, Pocilloporidae, Australian Institute of Marine Science (AIMS), Australian Government Publishing Service, Canberra |
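The split-and-join step in Example 2 can be sketched as follows. This is a minimal illustration, not the workflow we ran in our tools: the `explode_and_join` helper is hypothetical, the records are abbreviated, and the 0.5 km uncertainty for each station is an assumption inferred from the 500 m shown in the final table.

```python
# Two digitised records, abbreviated from Example 2. "station no" holds a
# comma-separated list of collecting stations.
records = [
    {
        "scientificName": "Caulastrea echinulata (Edwards & Haime, 1849)",
        "year": "1976",
        "station no": "9,36,90",
    },
    {
        "scientificName": "Psammocora explanulata van der Horst, 1922",
        "year": "1975",
        "station no": "55",
    },
]

# Station lookup keyed by station number:
# (location, longitude, latitude, uncertainty in km).
# Locations and coordinates come from the final table; the 0.5 km values
# are assumed from the 500 m uncertainty shown there.
STATIONS = {
    "9": ("Yonge Reef", 145.623, -14.597, 0.5),
    "36": ("Electra Head, Great Palm Island", 146.69, -18.73, 0.5),
    "90": ("Pelorus Island, Palm Islands, W", 146.488, -18.551, 0.5),
    "55": ("Orpheus Island (Palm Islands), NW point", 146.48, -18.567, 0.5),
}


def explode_and_join(records, stations):
    """Create one row per station and attach the station's location details."""
    rows = []
    for record in records:
        for station_no in record["station no"].split(","):
            station_no = station_no.strip()
            location, longitude, latitude, uncertainty_km = stations[station_no]
            rows.append({
                "scientificName": record["scientificName"],
                "year": record["year"],
                "locality": location,
                "locationID": station_no,
                "decimalLongitude": longitude,
                "decimalLatitude": latitude,
                # Darwin Core expects metres, so convert from kilometres.
                "coordinateUncertaintyInMeters": round(uncertainty_km * 1000),
            })
    return rows


rows = explode_and_join(records, STATIONS)
for row in rows:
    print(row["locationID"], row["locality"], row["coordinateUncertaintyInMeters"])
```

A station number missing from the lookup raises a `KeyError` here, which is a deliberate choice in this sketch so that unmatched stations are noticed rather than loaded without coordinates.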