Basic usage • rintchron

IntChron is an indexing service for chronological data such as radiocarbon dates (Bronk Ramsey et al. 2019). It specifies a standard exchange format and provides a consistent API for querying databases that use its schema. The rintchron package provides a simple interface for querying these databases with intchron(), explained in this vignette.

The package also includes low-level functions for interacting with the IntChron API directly, described in vignette("intchron-api").

library("rintchron")

Querying IntChron

Use intchron() to query databases indexed by IntChron. At a minimum, you will need to specify which databases or ‘hosts’ you want to query.¹ Use intchron_hosts() to see a list of currently available databases:

intchron_hosts()
#> # A tibble: 6 × 2
#>   host     database                            
#>   <chr>    <chr>                               
#> 1 egyptdb  Egyptian Radiocarbon Database       
#> 2 intimate INTIMATE Database                   
#> 3 nrcf     NERC Radiocarbon Facility (Oxford)  
#> 4 oxa      Oxford Radiocarbon Accelerator Unit 
#> 5 sadb     Southern Africa Radiocarbon Database
#> 6 intcal20 IntCal20 archive

The first argument to intchron() should be the ‘host’ code of the database you want to query, or a vector of host codes to query more than one database (e.g. intchron(hosts = c("oxa", "nrcf"))). For example, to return the entire Southern Africa Radiocarbon Database (‘sadb’):

intchron("sadb")
#> # A tibble: 2,565 × 36
#>    record_environment record_site_context record_z_type record_z_basis
#>    <chr>              <chr>               <lgl>         <lgl>         
#>  1 Savanna Biome      settlement          NA            NA            
#>  2 Savanna Biome      settlement          NA            NA            
#>  3 Savanna Biome      settlement          NA            NA            
#>  4 Savanna Biome      settlement          NA            NA            
#>  5 Savanna Biome      settlement          NA            NA            
#>  6 Savanna Biome      settlement          NA            NA            
#>  7 NA                 settlement          NA            NA            
#>  8 NA                 settlement          NA            NA            
#>  9 Savanna Biome      settlement          NA            NA            
#> 10 Desert Biome       settlement          NA            NA            
#> # ℹ 2,555 more rows
#> # ℹ 32 more variables: record_z_units <lgl>, record_t_source <lgl>,
#> #   record_suppress_t <lgl>, record_suppress_z <lgl>,
#> #   record_record_comment <chr>, record_site <chr>, record_country <chr>,
#> #   record_region <chr>, record_latitude <dbl>, record_longitude <dbl>,
#> #   record_elevation <lgl>, record_name <chr>, record_site_type <chr>,
#> #   record_color <chr>, series_type <chr>, t <dbl>, t_sigma <dbl>, …

You can further refine your query by specifying the locations you are interested in with the countries and sites parameters. Like hosts, these can also accept a vector of locations. For example, to download records from Jordan in the ORAU (oxa) and NERC-RF (nrcf) databases:

jordan <- intchron(c("oxa", "nrcf"), countries = "Jordan")
jordan
#> # A tibble: 156 × 19
#>    record_site  record_country record_name  record_longitude record_latitude
#>    <chr>        <chr>          <chr>                   <dbl>           <dbl>
#>  1 Araq ed-Dubb Jordan         Araq ed-Dubb             32.3            35.7
#>  2 Ayn Qasiyah  Jordan         Ayn Qasiyah              36.8            31.8
#>  3 Ayn Qasiyah  Jordan         Ayn Qasiyah              36.8            31.8
#>  4 Ayn Qasiyah  Jordan         Ayn Qasiyah              36.8            31.8
#>  5 Ayn Qasiyah  Jordan         Ayn Qasiyah              36.8            31.8
#>  6 Ayn Qasiyah  Jordan         Ayn Qasiyah              36.8            31.8
#>  7 Azraq 31     Jordan         Azraq 31                 36.8            31.8
#>  8 Azraq 31     Jordan         Azraq 31                 36.8            31.8
#>  9 Azraq 31     Jordan         Azraq 31                 36.8            31.8
#> 10 Burqu' 02    Jordan         Burqu' 02                37.8            32.7
#> # ℹ 146 more rows
#> # ℹ 14 more variables: series_type <chr>, labcode <chr>, longitude <dbl>,
#> #   latitude <dbl>, sample <chr>, material <chr>, species <chr>, d13C <dbl>,
#> #   r_date <int>, r_date_sigma <int>, qual <chr>, F14C <dbl>, F14C_sigma <dbl>,
#> #   refs <chr>

Use intchron_countries() to get a list of available countries on IntChron:

intchron_countries()
#> # A tibble: 117 × 1
#>    country   
#>    <chr>     
#>  1 Albania   
#>  2 Algeria   
#>  3 Andorra   
#>  4 Angola    
#>  5 Antarctica
#>  6 Argentina 
#>  7 Armenia   
#>  8 Australia 
#>  9 Austria   
#> 10 Bahamas   
#> # ℹ 107 more rows

Or in specific databases:

intchron_countries(c("intimate", "egyptdb"))
#> # A tibble: 13 × 2
#>    host     country                
#>    <chr>    <chr>                  
#>  1 intimate ""                     
#>  2 intimate "France"               
#>  3 intimate "Greenland"            
#>  4 intimate "Ireland"              
#>  5 intimate "Italy"                
#>  6 intimate "Norway"               
#>  7 intimate "Romania"              
#>  8 intimate "Slovenia"             
#>  9 intimate "Switzerland"          
#> 10 intimate "UK"                   
#> 11 egyptdb  "Egypt"                
#> 12 egyptdb  "Palestinian Territory"
#> 13 egyptdb  "Sudan"
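Country names are listed as IntChron spells them (note "UK" above), so it can be worth comparing the names you plan to query against this list first. The following is a minimal sketch of such a check, not part of rintchron: the `available` table is a hand-copied subset of the output above standing in for a real call to intchron_countries(), and the `requested`, `not_listed` and `matched_hosts` names are hypothetical.

```r
library("dplyr", warn.conflicts = FALSE)

# A few rows copied from the intchron_countries() output above. In practice
# you would use the tibble returned by intchron_countries() itself.
available <- tibble(
  host    = c("intimate", "intimate", "egyptdb", "egyptdb"),
  country = c("UK", "France", "Egypt", "Sudan")
)

# Country names we intend to pass to intchron(countries = ...)
requested <- c("United Kingdom", "Egypt")

# Requested names that do not appear in the list exactly as spelled
not_listed <- requested[!requested %in% available$country]

# Hosts that list at least one of the requested countries
matched_hosts <- available %>%
  filter(country %in% requested) %>%
  distinct(host) %>%
  pull(host)

not_listed
matched_hosts
```

Here "United Kingdom" is flagged because the list uses "UK", which suggests adjusting the spelling before running the query.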

Working with IntChron records

With the default setting tabulate = TRUE, intchron() returns a table of records, ready for you to use in your analysis:

library("dplyr", warn.conflicts = FALSE)

# Summarise radiocarbon dates available from sites in Jordan
jordan %>% 
  distinct(labcode, .keep_all = TRUE) %>% 
  group_by(record_site) %>% 
  summarise(n_dates = n(), .groups = "drop_last")
#> # A tibble: 20 × 2
#>    record_site            n_dates
#>    <chr>                    <int>
#>  1 Araq ed-Dubb                 1
#>  2 Ayn Qasiyah                  5
#>  3 Azraq 31                     3
#>  4 Burqu' 02                    1
#>  5 Burqu' 03                    1
#>  6 Burqu' 27                    3
#>  7 Burqu' 35                    3
#>  8 Dahikiya, Badia Region       2
#>  9 Dhuweila                     4
#> 10 Kharaneh IV                  7
#> 11 Shuna Project               22
#> 12 Tell Abu Al-Kharaz          18
#> 13 Tell Abu en-Niaj             3
#> 14 Tell Hesban                  1
#> 15 Tell el-Hayyat               4
#> 16 Tell el-Hibr                 1
#> 17 Wadi Jilat                  11
#> 18 Wadi Jilat 13                1
#> 19 Wadi Jilat 22                2
#> 20 Wadi Jilat 25                1

Note the use of distinct(labcode) above. The data from IntChron usually requires some cleaning; for example, the ORAU and NERC-RF databases contain many duplicate radiocarbon dates. The c14bazAAR package (Schmid, Seidensticker, and Hinz 2019) includes many useful functions for tidying radiocarbon data.
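To make the effect of that step concrete, here is a small sketch using invented data. The `dates` table below is hypothetical (the site names, lab codes and ages are made up for illustration) and only mimics a few of the columns returned by intchron(); it is meant to show how distinct() with .keep_all = TRUE behaves, not what real IntChron duplicates look like.

```r
library("dplyr", warn.conflicts = FALSE)

# Hypothetical records standing in for the output of intchron();
# all values are invented for illustration.
dates <- tibble(
  record_site  = c("Site A", "Site A", "Site A", "Site B"),
  labcode      = c("OxA-0001", "OxA-0001", "OxA-0002", "OxA-0003"),
  r_date       = c(9500L, 9500L, 9620L, 4100L),
  r_date_sigma = c(40L, 40L, 45L, 30L)
)

# Inspect which lab codes occur more than once before dropping anything
duplicated_codes <- dates %>%
  count(labcode, name = "n_rows") %>%
  filter(n_rows > 1)

# Keep the first row for each lab code, retaining all other columns
deduplicated <- dates %>%
  distinct(labcode, .keep_all = TRUE)

duplicated_codes
deduplicated
```

Because distinct() keeps only the first row it encounters for each lab code, it is worth inspecting the duplicated rows first in case they differ in other columns.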

You may find the stratigraphr and rcarbon (Crema and Bevan 2020) packages useful for further analysis of radiocarbon dates in R.

In some situations you might want to access the full records returned by IntChron. Setting tabulate = FALSE will return the raw JSON responses as a named list. See vignette("intchron-api") for some tips on how to work with these objects.

References

Bronk Ramsey, Christopher, Maarten Blaauw, Rebecca Kearney, and Richard A Staff. 2019. “The Importance of Open Access to Chronological Information: The IntChron Initiative.” Radiocarbon 61 (5): 1–11. https://doi.org/10.1017/RDC.2019.21.

Crema, Enrico R, and Andrew Bevan. 2020. “Inference from Large Sets of Radiocarbon Dates: Software and Methods.” Radiocarbon, 1–17. https://doi.org/10.1017/RDC.2020.95.

Schmid, Clemens, Dirk Seidensticker, and Martin Hinz. 2019. “c14bazAAR: An R Package for Downloading and Preparing C14 Dates from Different Source Databases.” Journal of Open Source Software 4 (43): 1914. https://doi.org/10.21105/joss.01914.

¹ There are several reasons for requiring hosts to be specified. First and foremost, it reduces the number of requests a given query has to make to the IntChron API. Querying all hosts isn’t usually necessary because IntChron indexes different types of database (e.g. radiocarbon dates from ORAU, palaeoclimate records from INTIMATE), which are rarely combined in a single analysis. Also, since IntChron is an indexing service, it is designed to include more databases over time, meaning that analysis code that is not explicit about which hosts it needs is likely to break, or at least become much less efficient, in the future. But if you do need to, you can query all hosts by setting hosts = "all".