IntChron is an indexing service for chronological data such as
radiocarbon dates (Bronk Ramsey et al. 2019). It specifies a standard
exchange format and provides a consistent API for querying databases
that use its schema. The rintchron package provides a simple interface
for querying these databases with intchron(), explained in this vignette.
The package also includes low-level functions for interacting with
the IntChron API directly, described in vignette("intchron-api").
Use intchron() to query databases indexed by IntChron.
At a minimum, you will need to specify which databases or ‘hosts’ you
want to query.[1] Use intchron_hosts() to see a list of currently
available databases:
intchron_hosts()
#> # A tibble: 6 × 2
#> host database
#> <chr> <chr>
#> 1 egyptdb Egyptian Radiocarbon Database
#> 2 intimate INTIMATE Database
#> 3 nrcf NERC Radiocarbon Facility (Oxford)
#> 4 oxa Oxford Radiocarbon Accelerator Unit
#> 5 sadb Southern Africa Radiocarbon Database
#> 6 intcal20 IntCal20 archive
The first argument to intchron() should be the ‘host’ code of the
database you want to query, or a vector of host codes to query more
than one (e.g. intchron(hosts = c("oxa", "nrcf"))).
For example, to return the entire Southern Africa Radiocarbon Database
(‘sadb’):
intchron("sadb")
#> # A tibble: 2,565 × 36
#> record_environment record_site_context record_z_type record_z_basis
#> <chr> <chr> <lgl> <lgl>
#> 1 Savanna Biome settlement NA NA
#> 2 Savanna Biome settlement NA NA
#> 3 Savanna Biome settlement NA NA
#> 4 Savanna Biome settlement NA NA
#> 5 Savanna Biome settlement NA NA
#> 6 Savanna Biome settlement NA NA
#> 7 NA settlement NA NA
#> 8 NA settlement NA NA
#> 9 Savanna Biome settlement NA NA
#> 10 Desert Biome settlement NA NA
#> # ℹ 2,555 more rows
#> # ℹ 32 more variables: record_z_units <lgl>, record_t_source <lgl>,
#> # record_suppress_t <lgl>, record_suppress_z <lgl>,
#> # record_record_comment <chr>, record_site <chr>, record_country <chr>,
#> # record_region <chr>, record_latitude <dbl>, record_longitude <dbl>,
#> # record_elevation <lgl>, record_name <chr>, record_site_type <chr>,
#> # record_color <chr>, series_type <chr>, t <dbl>, t_sigma <dbl>, …
You can further refine your query by specifying the locations you are
interested in with the countries and sites parameters. Like hosts,
these also accept a vector of locations. For example, to download
records from Jordan in the ORAU (oxa) and NERC-RF (nrcf) databases:
jordan <- intchron(c("oxa", "nrcf"), countries = "Jordan")
jordan
#> # A tibble: 156 × 19
#> record_site record_country record_name record_longitude record_latitude
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 Araq ed-Dubb Jordan Araq ed-Dubb 32.3 35.7
#> 2 Ayn Qasiyah Jordan Ayn Qasiyah 36.8 31.8
#> 3 Ayn Qasiyah Jordan Ayn Qasiyah 36.8 31.8
#> 4 Ayn Qasiyah Jordan Ayn Qasiyah 36.8 31.8
#> 5 Ayn Qasiyah Jordan Ayn Qasiyah 36.8 31.8
#> 6 Ayn Qasiyah Jordan Ayn Qasiyah 36.8 31.8
#> 7 Azraq 31 Jordan Azraq 31 36.8 31.8
#> 8 Azraq 31 Jordan Azraq 31 36.8 31.8
#> 9 Azraq 31 Jordan Azraq 31 36.8 31.8
#> 10 Burqu' 02 Jordan Burqu' 02 37.8 32.7
#> # ℹ 146 more rows
#> # ℹ 14 more variables: series_type <chr>, labcode <chr>, longitude <dbl>,
#> # latitude <dbl>, sample <chr>, material <chr>, species <chr>, d13C <dbl>,
#> # r_date <int>, r_date_sigma <int>, qual <chr>, F14C <dbl>, F14C_sigma <dbl>,
#> # refs <chr>
Use intchron_countries() to get a list of available countries on
IntChron:
intchron_countries()
#> # A tibble: 117 × 1
#> country
#> <chr>
#> 1 Albania
#> 2 Algeria
#> 3 Andorra
#> 4 Angola
#> 5 Antarctica
#> 6 Argentina
#> 7 Armenia
#> 8 Australia
#> 9 Austria
#> 10 Bahamas
#> # ℹ 107 more rows
Or in specific databases:
intchron_countries(c("intimate", "egyptdb"))
#> # A tibble: 13 × 2
#> host country
#> <chr> <chr>
#> 1 intimate ""
#> 2 intimate "France"
#> 3 intimate "Greenland"
#> 4 intimate "Ireland"
#> 5 intimate "Italy"
#> 6 intimate "Norway"
#> 7 intimate "Romania"
#> 8 intimate "Slovenia"
#> 9 intimate "Switzerland"
#> 10 intimate "UK"
#> 11 egyptdb "Egypt"
#> 12 egyptdb "Palestinian Territory"
#> 13 egyptdb "Sudan"
With the default setting tabulate = TRUE, intchron() returns a table
of records, ready for you to use in your analysis:
library("dplyr", warn.conflicts = FALSE)
# Summarise radiocarbon dates available from sites in Jordan
jordan %>%
  distinct(labcode, .keep_all = TRUE) %>%
  group_by(record_site) %>%
  summarise(n_dates = n(), .groups = "drop_last")
#> # A tibble: 20 × 2
#> record_site n_dates
#> <chr> <int>
#> 1 Araq ed-Dubb 1
#> 2 Ayn Qasiyah 5
#> 3 Azraq 31 3
#> 4 Burqu' 02 1
#> 5 Burqu' 03 1
#> 6 Burqu' 27 3
#> 7 Burqu' 35 3
#> 8 Dahikiya, Badia Region 2
#> 9 Dhuweila 4
#> 10 Kharaneh IV 7
#> 11 Shuna Project 22
#> 12 Tell Abu Al-Kharaz 18
#> 13 Tell Abu en-Niaj 3
#> 14 Tell Hesban 1
#> 15 Tell el-Hayyat 4
#> 16 Tell el-Hibr 1
#> 17 Wadi Jilat 11
#> 18 Wadi Jilat 13 1
#> 19 Wadi Jilat 22 2
#> 20 Wadi Jilat 25 1
Note the use of distinct(labcode) above. The data from IntChron
usually requires some cleaning; for example, the ORAU and NERC-RF
databases contain many duplicate radiocarbon dates. The c14bazAAR
package (Schmid, Seidensticker, and Hinz 2019) includes many useful
functions for tidying radiocarbon data.
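As a rough illustration of the de-duplication step, the sketch below runs on a small made-up table rather than a live query. The lab codes, site names and dates are hypothetical stand-ins, and only the column names (labcode, record_site, r_date) follow the output shown earlier. It assumes that the first record for each lab code is the one worth keeping, which may not hold for real data.

```r
library("dplyr", warn.conflicts = FALSE)

# Hypothetical records (not real IntChron data): "LAB-001" appears twice,
# as it might if the same date were indexed by more than one database.
dates <- tibble(
  labcode     = c("LAB-001", "LAB-001", "LAB-002", "LAB-003"),
  record_site = c("Site A", "Site A", "Site A", "Site B"),
  r_date      = c(9000L, 9000L, 8500L, 7200L)
)

# List the lab codes that occur more than once
duplicated_codes <- dates %>%
  count(labcode, name = "n_records") %>%
  filter(n_records > 1)

# Keep the first record for each lab code, retaining all other columns
unique_dates <- dates %>%
  distinct(labcode, .keep_all = TRUE)

# Count the remaining dates per site
dates_per_site <- unique_dates %>%
  count(record_site, name = "n_dates")
```

Inspecting duplicated_codes before calling distinct() is a cheap way to check whether the repeated rows really are identical, or whether they differ in ways that call for a more careful merge.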
You may find the stratigraphr and rcarbon (Crema and Bevan 2020) packages useful for further analysis of radiocarbon dates in R.
In some situations you might want to access the full records returned
by IntChron. Setting tabulate = FALSE will return the raw JSON
responses as a named list. See vignette("intchron-api") for some tips
on how to work with these objects.
[1] There are several reasons for requiring hosts to be specified.
First and foremost, it reduces the number of requests a given query
has to make to the IntChron API. Querying all hosts isn’t usually
necessary because IntChron indexes different types of database (e.g.
radiocarbon dates from ORAU, palaeoclimate records from INTIMATE)
which are rarely combined in a single analysis. Also, since IntChron
is an indexing service, it is designed to include more databases over
time, meaning analysis code that is not explicit about which hosts it
needs is likely to break or at least become much less efficient in the
future. But if you do need to, you can query all hosts by setting
hosts = "all".