R package bangarang

The bangarang package is a bundle of datasets and functions that help analysts with research in the Kitimat Fjord System (KFS) in the north coast of mainland British Columbia. Nearly all of the content is catered specifically to analysis of datasets from the RV Bangarang expedition of 2013 - 2015, which was focused on the abundance, distribution, and foraging ecology of whales, seabirds, salmon, and their prey during the months of summer and early fall. The research involved line-transect sampling, active acoustic (echosounder) surveys, and oceanographic sampling (both while underway and at a grid of stations).

Methodological details can be found here. More info on the Bangarang project can be found here. The Bangarang project was carried out as a doctoral thesis at Scripps Institution of Oceanography in close collaboration with the Gitga’at First Nation, BC Whales, Fisheries & Oceans Canada, and the NOAA Southwest Fisheries Science Center.

Installation

The bangarang package can be downloaded directly from GitHub:

# Install devtools if needed
if (!require('devtools')) install.packages('devtools')

# Increase timeout for download, since there are datasets
options(timeout=9999999)

# Install package
devtools::install_github('ericmkeen/bangarang')

Load into your R session:

library(bangarang)

## Warning: replacing previous import 'data.table::first' by 'dplyr::first' when
## loading 'bangarang'

## Warning: replacing previous import 'data.table::last' by 'dplyr::last' when
## loading 'bangarang'

## Warning: replacing previous import 'data.table::between' by 'dplyr::between'
## when loading 'bangarang'

## Warning: replacing previous import 'scales::rescale' by
## 'spatstat.geom::rescale' when loading 'bangarang'

## Warning: replacing previous import 'data.table::shift' by
## 'spatstat.geom::shift' when loading 'bangarang'

## Warning: replacing previous import 'scales::viridis_pal' by
## 'viridis::viridis_pal' when loading 'bangarang'

This vignette was made with bangarang version 3.16, and will make use of a few other packages:

library(dplyr)
library(tidyr)
library(ggplot2)

Maps

Produce a map of the Bangarang study area in the KFS using the ggplot and sf packages:

gg_kfs()

As with all the bangarang functions, see this function’s documentation for changing the geographic range, color, and transparency settings.

?gg_kfs

Datasets

Land

data(kfs_land)

kfs_land %>% class

## [1] "SpatialPolygons"
## attr(,"package")
## [1] "sp"

kfs_land %>% glimpse

## Formal class 'SpatialPolygons' [package "sp"] with 4 slots
##   ..@ polygons   :List of 1
##   .. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
##   ..@ plotOrder  : int 1
##   ..@ bbox       : num [1:2, 1:2] -129.9 52.7 -127.5 54.1
##   .. ..- attr(*, "dimnames")=List of 2
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
##   ..$ comment: chr "TRUE"

Snapshot of dataset:

par(mar=c(.1,.1,.1,.1))
kfs_land %>% plot

Seafloor

data(kfs_seafloor)

kfs_seafloor %>% glimpse

## Rows: 375,178
## Columns: 3
## $ x     <dbl> -129.6300, -129.6292, -129.6283, -129.6275, -129.6267, -129.6258…
## $ y     <dbl> 53.55, 53.55, 53.55, 53.55, 53.55, 53.55, 53.55, 53.55, 53.55, 5…
## $ layer <dbl> -0.4261364, -70.2601395, -67.4961166, -85.6188736, -99.3454132, …

Plot it:

ggplot(kfs_seafloor,
       aes(x=x, y=y, color=layer)) + 
  geom_point(size=.1) + 
  xlab(NULL) + ylab(NULL) + labs(color = 'Depth (m)') + 
  theme_minimal()

Proposed LNG shipping routes

data(shiplane)

shiplane %>% glimpse

## Rows: 137
## Columns: 4
## $ PID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ POS <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,…
## $ X   <dbl> -128.6883, -128.6832, -128.6903, -128.6990, -128.7111, -128.7373, …
## $ Y   <dbl> 53.99415, 53.97649, 53.95486, 53.93647, 53.91967, 53.89999, 53.879…

Plot it:

gg_kfs() + 
  geom_path(data=shiplane,
            mapping=aes(x=X, y=Y, group=PID)) + 
  xlab(NULL) + ylab(NULL)

Geostrata

Four main “provinces” referenced in study

data(provinces)

provinces %>% glimpse

## Rows: 4,458
## Columns: 5
## $ PID      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ POS      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
## $ X        <dbl> -129.2988, -129.1944, -129.1999, -129.1978, -129.1965, -129.2…
## $ Y        <dbl> 52.97428, 52.97387, 52.98048, 52.98586, 52.99082, 52.99371, 5…
## $ province <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…

Check it out:

gg_kfs() + 
  geom_polygon(data=provinces,
            mapping=aes(x=X, 
                        y=Y,
                        group = factor(province),
                        fill = factor(province),
                        color = factor(province)),
            alpha=.4) + 
  xlab(NULL) + ylab(NULL) + labs(fill='Province', color='Province')

Eight main channels referenced in study

data(channels)

channels %>% glimpse

## Rows: 4,811
## Columns: 5
## $ PID      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ POS      <int> 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 5…
## $ X        <dbl> -129.3084, -129.1936, -129.1912, -129.1887, -129.1886, -129.1…
## $ Y        <dbl> 52.96084, 52.92536, 52.92495, 52.92681, 52.93126, 52.93364, 5…
## $ province <chr> "caa", "caa", "caa", "caa", "caa", "caa", "caa", "caa", "caa"…

Check it out:

gg_kfs() + 
  geom_polygon(data=channels,
            mapping=aes(x=X, 
                        y=Y,
                        fill = province,
                        color = province),
            alpha=.4) + 
  xlab(NULL) + ylab(NULL)

Rectangular blocks (medium size).

data(kfs_blocks_bbox)

kfs_blocks_bbox %>% glimpse

## Rows: 26
## Columns: 5
## $ id     <chr> "CAAS", "CAAC", "CAAN", "ESTS", "ESTC", "ESTN", "CMPS", "CMPC",…
## $ left   <dbl> -129.3070, -129.5499, -129.5500, -129.6373, -129.7473, -129.761…
## $ right  <dbl> -129.0806, -129.3070, -129.3127, -129.4416, -129.5395, -129.540…
## $ bottom <dbl> 52.76424, 52.77458, 52.95501, 53.04410, 53.11001, 53.17002, 52.…
## $ top    <dbl> 52.96012, 52.95498, 53.04409, 53.10999, 53.16999, 53.22782, 52.…

Check it out:

gg_kfs() + 
  geom_rect(data=kfs_blocks_bbox,
            mapping=aes(xmin=left, 
                        xmax=right,
                        ymin=bottom,
                        ymax=top,
                        group=id),
            fill=NA,
            color='black') + 
  xlab(NULL) + ylab(NULL)

Rectangular blocks (small size).

data(blocks)

blocks %>% glimpse

## Rows: 54
## Columns: 11
## $ ID       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
## $ name     <chr> "Parker Pass", "Central Loredo", "North Loredo", "West Rennis…
## $ sq.km    <dbl> 78.59, 44.82, 45.17, 38.56, 15.63, 97.74, 12.16, 51.55, 28.96…
## $ left     <dbl> -129.4811, -129.2744, -129.2737, -129.4814, -129.3482, -129.6…
## $ right    <dbl> -129.2744, -129.0512, -129.1175, -129.3482, -129.2741, -129.4…
## $ top      <dbl> 52.81542, 52.81563, 52.85918, 52.85939, 52.85939, 52.91180, 5…
## $ bottom   <dbl> 52.75271, 52.75271, 52.81563, 52.81563, 52.81542, 52.81563, 5…
## $ width    <dbl> 0.20668030, 0.22315979, 0.15621185, 0.13320923, 0.07415771, 0…
## $ height   <dbl> 0.06270989, 0.06291739, 0.04355275, 0.04376004, 0.04396754, 0…
## $ center.x <dbl> -129.3777, -129.1628, -129.1956, -129.4148, -129.3111, -129.5…
## $ center.y <dbl> 52.78407, 52.78417, 52.83740, 52.83751, 52.83740, 52.86371, 5…

Check it out:

gg_kfs() + 
  geom_rect(data=blocks,
            mapping=aes(xmin=left, 
                        xmax=right,
                        ymin=bottom,
                        ymax=top,
                        group=name),
            fill=NA,
            color='black') + 
  xlab(NULL) + ylab(NULL)

Oceanographic stations

data(stations)

stations %>% glimpse

## Rows: 106
## Columns: 11
## $ X         <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ block     <chr> "CAA", "CAA", "CAA", "CAA", "CAA", "CAA", "CAA", "CAA", "CAA…
## $ sta       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1, 2, 3, 4, 5, 6,…
## $ mode      <chr> "stout", "stout", "lite", "stout", "stout", "stout", "lite",…
## $ lat       <dbl> 52.89065, 52.86761, 52.84456, 52.85030, 52.85604, 52.88457, …
## $ long      <dbl> -129.1702, -129.2017, -129.2332, -129.2840, -129.3348, -129.…
## $ color     <chr> "black", "black", "blue", "black", "black", "black", "blue",…
## $ dist2next <dbl> 6.413930, 6.413930, 12.404973, 12.404973, 5.399255, 5.399255…
## $ tlen      <dbl> 3.206965, 3.206965, 6.202486, 6.202486, 2.699627, 2.699627, …
## $ dist.nm   <dbl> 2.063725, 2.064170, 2.155478, 2.155202, 2.063981, 2.063863, …
## $ dist.km   <dbl> 3.822019, 3.822842, 3.991944, 3.991435, 3.822493, 3.822274, …

Check it out:

gg_kfs() + 
  geom_point(data=stations,
            mapping=aes(x=long, 
                        y=lat,
                        group = block),
            alpha=.8) + 
  xlab(NULL) + ylab(NULL) + labs(group='Waterway')

Effort

All survey effort aboard the Bangarang:

data(effort)

effort %>% glimpse

## Rows: 25,526
## Columns: 27
## $ date         <dttm> 2013-07-06 07:21:38, 2013-07-06 07:22:01, 2013-07-06 07:…
## $ lat          <dbl> 53.38721, 53.38750, 53.38818, 53.38875, 53.38947, 53.3902…
## $ lon          <dbl> -129.1697, -129.1698, -129.1700, -129.1703, -129.1703, -1…
## $ knots        <dbl> 2.7, 2.5, 2.4, 2.2, 2.5, 4.3, 2.8, 3.2, 1.7, 1.3, 0.7, 0.…
## $ hdg          <dbl> NA, 355.1, 325.5, NA, NA, NA, NA, 347.7, 342.2, 335.8, 34…
## $ circuit      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ effort       <chr> "casual", "casual", "casual", "casual", "casual", "casual…
## $ echo         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ obs1         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ pos1         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ obs2         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ pos2         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ obs3         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ pos3         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ bft          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ perc_cloud   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ vis          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ precip       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ glare_L      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ glare_R      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ wind_hdg_raw <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ wind_kph_raw <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ air_temp     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ ss_temp      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ ss_sal       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ seconds      <dbl> 23, 59, 61, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 6…
## $ km           <dbl> 0.03287696, 0.07681844, 0.06633145, 0.08024583, 0.0842948…

Map overview:

gg_kfs() + 
  geom_point(data=effort,
            mapping=aes(x=lon, 
                        y=lat,
                        color = effort),
            alpha=.4,
            size=.2) + 
  xlab(NULL) + ylab(NULL)

Show each year separately:

gg_kfs() + 
  geom_point(data=effort,
            mapping=aes(x=lon, 
                        y=lat,
                        color = effort),
            alpha=.4,
            size=.2) + 
  facet_wrap(~lubridate::year(date)) + 
  xlab(NULL) + ylab(NULL) + 
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

Show each circuit in 2015 separately, systematic transect effort only:

# Filter to transect effort only
transects <- 
  effort %>% 
  filter(lubridate::year(date) == 2015,
         effort == 'transect') %>% 
  mutate(group = paste0(lubridate::year(date), ' circuit ', circuit))

# plot it
gg_kfs() + 
  geom_point(data= transects,
             mapping=aes(x=lon, 
                        y=lat),
            alpha=.4,
            size=.2) + 
  facet_wrap(~group) + 
  xlab(NULL) + ylab(NULL) + 
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

Seabirds

data(seabirds)

seabirds %>% glimpse

## Rows: 3,668
## Columns: 30
## $ datetime     <dttm> 2013-07-06 09:16:18, 2013-07-06 09:19:43, 2013-07-06 09:…
## $ lat          <dbl> 53.38375, 53.38587, 53.38589, 53.40532, 53.40536, 53.4053…
## $ lon          <dbl> -129.1378, -129.1364, -129.1364, -129.0073, -129.0074, -1…
## $ effort       <chr> "transect", "transect", "transect", "transect", "transect…
## $ bft          <dbl> 0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
## $ precip       <chr> "clear", "clear", "clear", "clear", "clear", "clear", "cl…
## $ knots        <chr> NA, NA, NA, NA, NA, "4", "4", "4", "4", NA, NA, NA, NA, N…
## $ hdg          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ wind_kph_raw <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ wind_hdg_raw <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ zone         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ line         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ side         <chr> "HELM", NA, "PORT", "STAR", NA, "HELM", NA, "PORT", "PORT…
## $ best         <dbl> 2, 1, 1, 2, 2, 6, 1, 1, 2, 2, 2, 2, 4, 1, 2, 1, 2, 1, 1, …
## $ min          <dbl> 2, 1, 1, 2, 2, 6, 1, 1, 2, 2, 2, 2, 4, 1, 2, 1, 2, 1, 1, …
## $ max          <dbl> 2, 1, 1, 2, 2, 6, 1, 1, 2, 2, 2, 2, 4, 1, 2, 1, 2, 1, 1, …
## $ feed         <chr> "N", "Y", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N…
## $ motion       <chr> "FLY", "SIT", "FLY", "FLY", "SIT", "FLY", "RAFT", "FLY", …
## $ dir          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ height       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ sp1          <chr> "PIGU", "MAMU", "WHCG", "MAMU", "MAMU", "MAMU", "WEGU", "…
## $ per1         <dbl> 100, 100, 100, 100, 100, 100, 100, 0, 0, 0, 0, 0, 0, 0, 0…
## $ plum1        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ sp2          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ per2         <dbl> 0, 0, 0, 0, 0, 0, 0, 100, 100, 100, 100, 100, 100, 100, 1…
## $ plum2        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ sp3          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ per3         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ plum3        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ date         <date> 2013-07-06, 2013-07-06, 2013-07-06, 2013-07-06, 2013-07-…

Map it:

gg_kfs() + 
  geom_point(data=seabirds,
            mapping=aes(x=lon, 
                        y=lat,
                        color = sp1,
                        size = best),
            alpha=.4) + 
  xlab(NULL) + ylab(NULL)

Column definitions:

date: Date and time, in format YYYY-MM-DD HH:MM:SS .
lat: Latitude in decimal degrees.
lon: Longitude in decimal degrees.
effort: Code for search effort: "off" = opportunistic sighting occurring while off search effort; "casual" = opportunistic sighting occurring while transiting area while off seaerch effort; "station" = opportunistic sighting while conducting sampling at oceanographic stations; "with whale" = opportunistic sighting while conducting focal follows of whales; "transect" = seabird detection during systematic line-transect effort. Analyses focused on density/abundance estimation should filter to "transect" effort only.
bft: Beaufort sea state, ranging from 0 (glassy calm) to 4 (white caps everywhere); this may be a useful covariate in a density/abundance model.
precip: Precipitation state; another potentially useful covariate.
knots: Ship speed in knots.
hdg: Ship heading (range = 0 - 360 degrees).
wind_kph_raw: Apparent wind speed, in kilometers per hour, as received by the shipboard weather station, unadjusted for ship speed or heading.
wind_hdg_raw: Apparent wind heading, in degrees, as received by the shipboard weather station, unadjusted for ship speed or heading.
zone: Category of distance from the vessel’s track line, as estimated by a handheld rangefinder: Zone "1" is 0 - 75m, Zone "2" is 75m - 150m. The zone "OUT" means the bird was estimated to be beyond 150m. The zone "221" means that the bird was originally seen in Zone 2 but it approached as near as Zone 1 (either due to bird movement or ship approaching the bird). We recorded this note to test for a couple things: (1) If more birds are being observed in Zone 1 vs. Zone 2, that may mean that we are missing some birds in Zone 2 and our assumption that our strip width is 150m on either side of the vessel may be problematic; and (2) If more birds are being observed in Zone 2 than Zone 1, that may mean that birds are avoiding the vessel and some could even be flushing beyond Zone 2, which could also be problematic for density estimation.
line: An indication of whether there was uncertainty about which zone the bird occurred within; if MIDL, the bird occurred right on the line between Zone 1 and Zone 2 by the observer’s estimation; if OUTR, the bird occurred right on the line between Zone 2 and beyond. If your analysis assumed a strip width of 150m on each side of the vessel, the sightings marked OUTR may be of special interest: how drastically would your density estimates change if those sightings were included or excluded?
side: The side of the vessel (PORT or STARBOARD) the bird was seen on. If the value is HELM, that means the researcher at the data entry position at the helm detected the sighting. Likely not relevant to basic density/abundance estimation analyses.
best: Best estimate of total flock size.
min: Minimum estimate of total flock size.
max: Maximum estimate of total flock size.
feed: Indication of whether or not the birds are feeding: "M" = Maybe; "N" = No; "S" = Some; "W" = Unknown; "Y" = Yes.
motion: Indication of bird’s motion behavior: "FLUSH" = Birds were originally sitting but were flushed (either dove or flew off) in response to the research vessel; "FLY" = Birds were in flight; "FOLO" = Birds are following the research vessel (problematic for recounting!); "RAFT" = Birds were rafting on surface debris, such as floating kelp or logs; "SIT" = Birds were sitting on the water.
dir: Indication of bird’s flight direction (if flying); cardinal directions are typically used ("N", "NE", "E", "SE", "S", "SW", "W", "NW") with the exception of birds circling the vessel ("CIR" or "CR").
height: Estimate of bird’s flight height above sea level, in meters. This is relevant because wind speed increases predictably with height above sea level.
sp1: Four-letter species code for primary (and possibly only) species in the flock.
per1: Percentage of total flock size that sp1 comprises.
plum1: Primary plumage state for sp1. Note that, in some cases, if a single species occurs in several different plumage state, the same species might be entered as sp1, sp2, and even sp3. Codes: "AB" or "ABE" = Adult breeding, either/both sex(es); "ABF" = Adult breeding, female; "ABM" = Adult breeding, male; "ANE" = Adult non-breeding, either/both sex(es); "FLE" = Fledgling; "JV" or "JVE" = Juvenile; "MX" = Mixture of unspecified plumages present; "NBE" = Non-breeding, either/both sex(es); "OTH" = Other unspecified plumage; "SA" = Sub-adult plumage; "AAE" or "AAM" = unknown.
sp2: If this is a mixed-species flock with two or more species present, this is the four-letter species code for the second species.
per2: Percentage of total flock size that sp2 comprises, if present.
plum2: Primary plumage state for sp2, if present.
sp3: If this is a mixed-species flock with three species present, this is the four-letter species code for the third species.
per3: Percentage of total flock size that sp3 comprises, if present.
plum3: Primary plumage state for sp3, if present.

Salmon

data(salmon)

salmon %>% glimpse

## Rows: 1,033
## Columns: 14
## $ date    <dttm> 2013-07-06 07:56:28, 2013-07-06 08:25:17, 2013-07-06 09:12:13…
## $ lat     <dbl> 53.38662, 53.37430, 53.37436, 53.37551, 53.38709, 53.38761, 53…
## $ lon     <dbl> -129.1690, -129.1522, -129.1522, -129.1383, -129.1353, -129.13…
## $ effort  <chr> "off", "casual", "transect", "transect", "transect", "transect…
## $ echo    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ circuit <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ bft     <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ precip  <chr> NA, "clear", "clear", "clear", "clear", "clear", "clear", "cle…
## $ knots   <dbl> NA, NA, 4.0, 4.0, 5.8, 5.8, 5.8, 5.8, 5.8, 5.8, 5.8, 4.0, NA, …
## $ hdg     <dbl> NA, NA, 357, 357, NA, NA, NA, NA, NA, NA, NA, 49, NA, NA, NA, …
## $ jumps   <dbl> 1, 5, 1, 1, 2, 1, 2, 1, 0, 1, 1, 2, 2, NA, 1, 1, 1, 1, 1, 3, N…
## $ zone    <dbl> NA, NA, NA, NA, NA, 12, NA, 12, 12, 12, 12, 12, 12, 12, NA, 12…
## $ line    <chr> "MIDL", NA, NA, NA, NA, NA, NA, NA, NA, "MIDL", "MIDL", NA, NA…
## $ side    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

Map it;

gg_kfs() + 
  geom_point(data=salmon,
            mapping=aes(x=lon, 
                        y=lat,
                        size = jumps),
            alpha=.4) + 
  xlab(NULL) + ylab(NULL)

ggplot(salmon %>% 
         filter(zone < 3) %>% 
         mutate(Zone = factor(zone)),
       aes(x=Zone)) + 
  geom_bar(stat='count')

Whales

data(whale_sightings)

whale_sightings %>% glimpse

## Rows: 549
## Columns: 15
## $ day      <dbl> 20130706, 20130706, 20130713, 20130713, 20130713, 20130713, 2…
## $ datetime <dttm> 2013-07-06 09:51:38, 2013-07-06 11:06:33, 2013-07-13 13:03:4…
## $ x        <dbl> -129.1201, -129.1180, -129.2433, -129.2839, -129.3088, -129.2…
## $ y        <dbl> 53.42348, 53.42000, 53.33910, 53.35212, 53.31438, 53.28156, 5…
## $ distance <dbl> NA, NA, NA, NA, 0.37324576, 0.43999272, NA, NA, NA, NA, NA, N…
## $ size     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2…
## $ spp      <chr> "HW", "HW", "HW", "HW", "HW", "BW", "BW", "BW", "BW", "HW", "…
## $ block    <chr> "VER", "VER", "WRI", "WRI", "WRI", "WRI", "WRI", "WRI", "WRI"…
## $ bft      <dbl> 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1…
## $ seg_id   <dbl> 2, 3, 26, 26, 28, 29, 29, 29, 29, 29, 29, 30, 30, 33, 34, 34,…
## $ z        <dbl> 273.0400, 315.2533, 508.5700, 511.5049, 534.4100, 507.9533, 5…
## $ zmin     <dbl> 0.0088041, 0.0199857, 433.0874939, 119.7148972, 375.9632874, …
## $ zmax     <dbl> 324.5167, 342.9700, 514.0425, 522.6160, 538.5000, 542.8300, 5…
## $ zsd      <dbl> 86.327932, 59.499729, 28.563368, 90.137482, 24.123574, 164.44…
## $ zrange   <dbl> 324.50789, 342.95002, 80.95499, 402.90113, 162.53671, 542.811…

gg_kfs() + 
  geom_point(data=whale_sightings,
            mapping=aes(x=x, 
                        y=y,
                        color = spp,
                        size = size),
            alpha=.4) + 
  xlab(NULL) + ylab(NULL)

Oceanography

This dataset is an interpolated grid of oceanographic values for each circuit in each year. Some variables are from the thermosalinograph we had running at the surface during transects, some are from the echosounder we used while underway, and others are from the Seabird Electronics CTD we used at the grid of oceanographic stations.

data(ocean)

ocean %>% glimpse

## Rows: 1,488,101
## Columns: 7
## $ year    <chr> "2013", "2013", "2013", "2013", "2013", "2013", "2013", "2013"…
## $ circuit <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1…
## $ metric  <chr> "MLD", "MLD", "MLD", "MLD", "MLD", "MLD", "MLD", "MLD", "MLD",…
## $ lat     <dbl> 53.54780, 53.54780, 53.54780, 53.54780, 53.54780, 53.54780, 53…
## $ lon     <dbl> -129.0307, -129.0262, -129.0218, -129.0173, -129.0128, -129.00…
## $ value   <dbl> 1.4525017, 1.1519441, 1.1362478, 1.1260875, 0.8683625, 1.11903…
## $ color   <chr> "#A80000", "#930000", "#930000", "#930000", "#930000", "#93000…

Here are the variables included:

ocean$metric %>% unique %>% sort

##  [1] "BSM"       "Chl.max"   "Chl.max.z" "Chl.sum"   "Int033"    "Int200"   
##  [7] "MLD"       "Sbot"      "Sdeep"     "Secchi"    "SSS"       "SST"      
## [13] "Stop"      "Str.all"   "Str.bot"   "Str.top"   "Tbot"      "TC.str"   
## [19] "TC.z"      "Tdeep"     "Ttop"

As an example, say we wanted a map of the sea surface salinity (SSS) for each circuit of the 2015 season.

# Filter the dataset
sss <- 
  ocean %>% 
  filter(metric == 'SSS', 
         year==2015)

# Map it
gg_kfs() + 
  geom_point(data=sss,
             mapping=aes(x=lon, 
                         y=lat,
                         color=value),
             size=.05) + 
  facet_wrap(~circuit) +
  xlab(NULL) + ylab(NULL) + 
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) + 
  scale_color_gradientn(colours = rev(rainbow(5)))

As another example, here is integrated 200kHz backscatter (which is a useful proxy for euphausiids) in the two circuits completed in 2014:

# Filter the dataset
krill <- 
  ocean %>% 
  filter(metric == 'Int200')

# Map it
gg_kfs() + 
  geom_point(data=krill,
             mapping=aes(x=lon, 
                         y=lat,
                         color=value),
             size=.05) + 
  facet_wrap(~circuit) +
  xlab(NULL) + ylab(NULL) + 
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) + 
  scale_color_gradientn(colours = rev(rainbow(5)), 
                        trans='log', breaks=c(2.5, 25, 250, 2500))

Helper functions

`in_block()`

A way to see which geostratum a pair of coordinates is located within.

in_block(x= -129.4, y=53.2)

## [1] "-SQUN"

in_block(x= -129.2, y=52.9)

## [1] "-CAAS"

`in_kfs()`

A quick test to see if coordinates properly within the water within the boundaries of the KFS (as defined by the Bangarang project). The function returns the dataset with a new column, inkfs:

test <- in_kfs(seabirds %>% rename(x=lon, y=lat),
                 toplot = TRUE)

test$inkfs %>% table

## .
## FALSE  TRUE 
##     6  3616

`in_water()`

A quick test to see if coordinates properly within the water within the water (and not on land by mistake). The function returns the dataset with a new column, valid:

test <- in_water(seabirds %>% rename(x=lon, y=lat),
                 toplot = TRUE)

test$valid %>% table

## .
## TRUE 
## 3589

`whale_map()`

Calculates the true position of a sighting within the KFS, in offshore or confined coastal waters, accounting for whether or not the observer is using the horizon or a shoreline as the basis for the reticle reading.

The X and Y you supply is the observer’s location (either a boat or a stationary field station).

whalemap(X= -129.2, 
         Y=52.9,
         bearing = 313,
         reticle = 0.2,
         eye.height = 2.1,
         vessel.hdg = 172,
         toplot=TRUE)

## $X
## [1] -129.2168
## 
## $Y
## [1] 52.90468
## 
## $radial.dist
## [1] 1.243924
## 
## $boundary
## [1] "horizon"
## 
## $boundary.dist
## [1] 5.172833
## 
## $perp.dist
## [1] 1.046189
## 
## $track.bearing
## [1] 122.75

Seabird density estimation

To showcase some other functions in the bangarang package, here is an example workflow for estimating the density/abundance of a seabird from the Bangarang study area. I use a density-surface-modeling (DSM) approach based on the R package dsm (Github). In a DSM, species density is estimated for each point in a grid of your study area. The bangarang package provides a 1-km² grid as a built-in dataset:

data(grid, package='bangarang')
grid %>% head

## # A tibble: 6 × 14
##      y1    y2     y    x1    x2     x   km2    id     z         zmin  zmax   zsd
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>        <dbl> <dbl> <dbl>
## 1  52.8  52.8  52.8 -129. -129. -129.  1.02   466 164.   0.000000500 210.  59.9 
## 2  52.8  52.8  52.8 -129. -129. -129.  1.02   467 223.  12.5         238.  44.0 
## 3  52.8  52.8  52.8 -129. -129. -129.  1.02   475  45.0  0            79.5 21.5 
## 4  52.8  52.8  52.8 -129. -129. -129.  1.02   476  68.7 45.8          84.8  6.76
## 5  52.8  52.8  52.8 -129. -129. -129.  1.02   477  77.2 69.2          95.6  6.13
## 6  52.8  52.8  52.8 -129. -129. -129.  1.02   478  97.6 81.8         126.  11.2 
## # ℹ 2 more variables: zrange <dbl>, block <chr>

Here is a map of the centroid of each grid cell:

gg_kfs() + 
  geom_point(data=grid, mapping=aes(x=x, y=y), size=.2)

The package also offers a dataset of systematic transect effort split into 5-km segments, a common length in habitat modeling:

data(segments_5km)

# Rename
segments <- segments_5km

# Check out
segments %>% head

## # A tibble: 6 × 18
##   Sample.Label Effort     x     y datetime             year      day block     z
##          <dbl>  <dbl> <dbl> <dbl> <dttm>              <dbl>    <dbl> <chr> <dbl>
## 1            1   5.00 -129.  53.4 2013-07-06 09:09:01  2013 20130706 VER    330.
## 2            2   5.01 -129.  53.4 2013-07-06 09:55:43  2013 20130706 VER    316.
## 3            3   5.00 -129.  53.4 2013-07-06 11:19:48  2013 20130706 VER    220.
## 4            4   5.00 -129.  53.5 2013-07-06 12:14:16  2013 20130706 VER    168.
## 5            5   5.01 -129.  53.5 2013-07-06 14:11:25  2013 20130706 VER    102.
## 6            6   5.00 -129.  53.5 2013-07-06 14:42:32  2013 20130706 VER    131.
## # ℹ 9 more variables: zmin <dbl>, zmax <dbl>, zsd <dbl>, zrange <dbl>,
## #   yday <dbl>, month <dbl>, circuit <dbl>, circuit_name <fct>,
## #   circuit_yday <dbl>

# View centroid of each segment
gg_kfs() + 
  geom_point(data=segments, mapping=aes(x=x, y=y), alpha=.5)

Sitting & flushed birds

We then filter the built-in sightings dataset, "seabirds", to our species of interest. In this case, marbled murrelet (4-letter code: “MAMU”). We will analyze sitting/flushed birds separately from flying birds, since the former will be relevant to subsequent foraging preference analyses.

data(seabirds)

mamu <-
  seabirds %>%
  filter(effort == 'transect') %>%
  filter(zone != 'OUT') %>%
  # filter to any sighting that had MAMU in any of the three species slots
  filter(sp1 == 'MAMU' | sp2 == 'MAMU' | sp3 == 'MAMU') %>%
  filter(motion %in% c('SIT', 'FLUSH')) %>%
  # provide a single authoritative species ID
  mutate(spp = 'MAMU')

# Check sample size
mamu %>% nrow

## [1] 235

# Review data
mamu %>% as.data.frame %>% head

##              datetime      lat       lon   effort bft precip knots   hdg
## 1 2014-08-11 12:42:14 53.42632 -129.0965 transect   3   haze   5.1 270.3
## 2 2014-08-11 13:57:59 53.40101 -129.1281 transect   1   haze   6.4 189.3
## 3 2014-08-12 11:24:55 53.28757 -129.4219 transect   0  clear   5.1 108.6
## 4 2014-08-14 08:37:25 53.09309 -129.1341 transect   0  clear     5  69.9
## 5 2014-08-20 08:16:01 53.18610 -129.5766 transect   1  clear   5.9 265.2
## 6 2014-08-20 11:32:14 53.10107 -129.5686 transect   1  clear   5.3 193.4
##   wind_kph_raw wind_hdg_raw zone line side best min max feed motion  dir height
## 1      29.9316        163.3    1 <NA> PORT    2   2   2    M    SIT <NA>     NA
## 2       5.4000        351.2    1 <NA> STAR    3   3   3    M    SIT <NA>     NA
## 3       6.6000         10.3    2 <NA> STAR    1   1   1    M  FLUSH <NA>     NA
## 4       7.2000          2.3    1 <NA> PORT    1   1   1    N    SIT <NA>     NA
## 5      10.2000          9.3    2 <NA> STAR    1   1   1    N    SIT <NA>     NA
## 6       9.7000         28.7    1 <NA> STAR    1   1   1    M    SIT <NA>     NA
##    sp1 per1 plum1  sp2 per2 plum2  sp3 per3 plum3       date  spp
## 1 MAMU  100  <NA> <NA>    0  <NA> <NA>    0  <NA> 2014-08-11 MAMU
## 2 MAMU  100    AB <NA>    0  <NA> <NA>    0  <NA> 2014-08-11 MAMU
## 3 MAMU  100    SA <NA>    0  <NA> <NA>    0  <NA> 2014-08-12 MAMU
## 4 MAMU  100    SA <NA>    0  <NA> <NA>    0  <NA> 2014-08-14 MAMU
## 5 MAMU  100    AB <NA>    0  <NA> <NA>    0  <NA> 2014-08-20 MAMU
## 6 MAMU  100  <NA> <NA>    0  <NA> <NA>    0  <NA> 2014-08-20 MAMU

# Map data
gg_kfs() + 
  geom_point(data=mamu, mapping=aes(x=lon, y=lat, size=best), alpha=.5)

We now need to determine which effort segment each sighting occurred during. To do so, use the function segment_sync(), which uses time differences to find the segment that overlaps with each sighting’s timestamp:

# Sync segment numbers to sightings
mamu <- segment_sync(segments = segments,
                     sightings = mamu)

We then use the format_observation_data() function to transform the mamu dataset to meet formatting requirements for the DSM:

observation_data <- format_observation_data(mamu)

observation_data %>% as.data.frame %>% head

##              datetime      lat       lon   effort bft precip knots   hdg
## 1 2014-08-11 12:42:14 53.42632 -129.0965 transect   3   haze   5.1 270.3
## 2 2014-08-11 13:57:59 53.40101 -129.1281 transect   1   haze   6.4 189.3
## 3 2014-08-12 11:24:55 53.28757 -129.4219 transect   0  clear   5.1 108.6
## 4 2014-08-14 08:37:25 53.09309 -129.1341 transect   0  clear     5  69.9
## 5 2014-08-20 08:16:01 53.18610 -129.5766 transect   1  clear   5.9 265.2
## 6 2014-08-20 11:32:14 53.10107 -129.5686 transect   1  clear   5.3 193.4
##   wind_kph_raw wind_hdg_raw zone line side best min max feed motion  dir height
## 1      29.9316        163.3    1 <NA> PORT    2   2   2    M    SIT <NA>     NA
## 2       5.4000        351.2    1 <NA> STAR    3   3   3    M    SIT <NA>     NA
## 3       6.6000         10.3    2 <NA> STAR    1   1   1    M  FLUSH <NA>     NA
## 4       7.2000          2.3    1 <NA> PORT    1   1   1    N    SIT <NA>     NA
## 5      10.2000          9.3    2 <NA> STAR    1   1   1    N    SIT <NA>     NA
## 6       9.7000         28.7    1 <NA> STAR    1   1   1    M    SIT <NA>     NA
##    sp1 per1 plum1  sp2 per2 plum2  sp3 per3 plum3                date  spp
## 1 MAMU  100  <NA> <NA>    0  <NA> <NA>    0  <NA> 2014-08-11 12:42:14 MAMU
## 2 MAMU  100    AB <NA>    0  <NA> <NA>    0  <NA> 2014-08-11 13:57:59 MAMU
## 3 MAMU  100    SA <NA>    0  <NA> <NA>    0  <NA> 2014-08-12 11:24:55 MAMU
## 4 MAMU  100    SA <NA>    0  <NA> <NA>    0  <NA> 2014-08-14 08:37:25 MAMU
## 5 MAMU  100    AB <NA>    0  <NA> <NA>    0  <NA> 2014-08-20 08:16:01 MAMU
## 6 MAMU  100  <NA> <NA>    0  <NA> <NA>    0  <NA> 2014-08-20 11:32:14 MAMU
##   Sample.Label object size distance
## 1          169      1    2       75
## 2          170      2    3       75
## 3          174      3    1      150
## 4          186      4    1       75
## 5          211      5    1      150
## 6          215      6    1       75

Since we used strip-width sampling in seabird surveys aboard the Bangarang, we will create a dummy detection function that stands in for this aspect of the DSM analysis:

dummy_ds <- create_dummy_detection_function(observation_data)

We can now develop our DSM using the function dsm() from package dsm. Here is a basic example that models density according to a spline interaction of longitude and latitude. (The convert.units input handles the fact that, in our seabirds dataset, distance is specified in meters instead of kilometers.).

library(dsm)

dsm_basic <-
  dsm(formula = density.est ~ s(x,y),
      ddf.obj = dummy_ds,
      convert.units=1/1000,
      segment.data = segments,
      observation.data = observation_data)

Here is a summary of that model:

dsm_basic %>% summary

## 
## Family: quasipoisson 
## Link function: log 
## 
## Formula:
## density.est ~ s(x, y)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.0613     0.2508   -8.22 1.01e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##          edf Ref.df    F p-value    
## s(x,y) 22.97  26.46 14.3  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.168   Deviance explained = 51.5%
## -REML = 726.52  Scale est. = 5.2335    n = 713

We then use that DSM to predict murrelet density on our spatial grid:

d <- predict_dsm_kfs(dsm_basic, segments, grid)

That function, predict_dsm_kfs(), returns a vector of density estimates (birds km^-2) equal in length to the number of rows in the grid data.frame.

gg_kfs() + 
  geom_point(data=grid %>% mutate(ds=d),
             mapping=aes(x=x, y=y, color=ds), pch=15, size=2) + 
  scale_color_viridis_c(trans='log', breaks=c(0.02, .2, 2, 20)) + 
  labs(color = 'Density')

Model selection

To improve the DSM model, you can introduce covariates from your segments dataset. The bangarang package provides a function, dsm_comps(), that will use AICc to rank candidate models for all possible combinations of the supplied covariates. The analyst can then decide which formula to bring into the density estimation stage. The assumed distribution family is "tweedie".

dsm_comps <- 
  dsm_compare(segments,
              observation_data,
              dummy_ds,
              candidate_splines = c('s(yday)', 's(z)', 'year'),
              base_formula = 'density.est ~ s(x,y)')

That process produces the following summary:

dsm_comps$dsm

##   n                                        formula model_id     AICc  dev.expl
## 1 3 density.est ~ s(x,y)  +  s(yday) + s(z) + year        7 1136.953 0.6915890
## 2 2        density.est ~ s(x,y)  +  s(yday) + year        5 1151.730 0.6579551
## 3 2        density.est ~ s(x,y)  +  s(yday) + s(z)        4 1152.788 0.6723841
## 4 1               density.est ~ s(x,y)  +  s(yday)        1 1165.685 0.6379926
## 5 2           density.est ~ s(x,y)  +  s(z) + year        6 1193.288 0.5885648
## 6 1                  density.est ~ s(x,y)  +  year        3 1203.471 0.5496048
## 7 1                  density.est ~ s(x,y)  +  s(z)        2 1252.977 0.4971941

Based on this summary, we can get details on the most parsimonious model:

dsm_comps$objects[[7]] %>% summary

## 
## Family: Tweedie(p=1.374) 
## Link function: log 
## 
## Formula:
## density.est ~ s(x, y) + s(yday) + s(z) + year
## 
## Parametric coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2269.8085   504.1799  -4.502 7.93e-06 ***
## year            1.1254     0.2503   4.497 8.11e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df      F p-value    
## s(x,y)  18.881 23.395 10.226 < 2e-16 ***
## s(yday)  7.979  8.676  9.616 < 2e-16 ***
## s(z)     6.126  7.264  3.144 0.00263 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.63   Deviance explained = 69.2%
## -REML = 586.03  Scale est. = 4.4529    n = 712

And we will save that model object as our model to keep:

dsm_keep <- dsm_comps$objects[[7]]

Our best-fit model involves date components. So, to produce the point estimate, we will use a wrapper function, predict_circuits(), to predict the density surface during each circuit of the study area. We completed a total of 9 circuits whose information is contained in the built-in dataset data(circuits):

data(circuits)

circuits %>% head

## # A tibble: 6 × 7
##   circuit  year begin               end                 name       
##     <dbl> <dbl> <dttm>              <dttm>              <chr>      
## 1       1  2013 2013-07-06 00:01:00 2013-07-27 23:59:00 Jul '13    
## 2       2  2013 2013-08-10 00:01:00 2013-09-04 23:59:00 Aug '13    
## 3       3  2014 2014-08-11 00:01:00 2014-08-28 23:59:00 Aug '14    
## 4       4  2014 2014-08-29 00:01:00 2014-09-14 23:59:00 Sep ' 14   
## 5       5  2015 2015-05-24 00:01:00 2015-06-14 23:59:00 May-Jun '15
## 6       6  2015 2015-06-15 00:01:00 2015-07-10 23:59:00 Jun-Jul '15
## # ℹ 2 more variables: middate <dttm>, middate_yday <dbl>

We now rerun the model:

d_circuits <- 
  predict_circuits(dsm_keep, 
                   segments,
                   grid,
                   toplot=FALSE)

We can use the function point_summary() to calculate average density across the DSM:

point_summary(d_circuits)

## # A tibble: 9 × 6
##    year circuit circuit_name circuit_yday  dmean    dsd
##   <dbl>   <dbl> <fct>               <dbl>  <dbl>  <dbl>
## 1  2013       1 Jul '13               198 0.369   2.23 
## 2  2013       2 Aug '13               235 0.0363  0.219
## 3  2014       3 Aug '14               232 0.300   1.81 
## 4  2014       4 Sep ' 14              249 0.174   1.05 
## 5  2015       5 May-Jun '15           155 1.26    7.64 
## 6  2015       6 Jun-Jul '15           179 3.95   23.9  
## 7  2015       7 Late Jul '15          202 7.16   43.3  
## 8  2015       8 Aug '15               226 0.934   5.65 
## 9  2015       9 Sep '15               252 0.498   3.01

Here is a plot of it:

gg_kfs() + 
  geom_point(data=d_circuits %>% mutate(ds=d),
             mapping=aes(x=x, y=y, color=ds), pch=15) + 
  scale_color_viridis_c(trans='log', breaks=c(0.01, .1, 1, 10, 100)) + 
  facet_wrap(~circuit_name) + 
  xlab(NULL) + ylab(NULL) + labs(color='Density')

To convert this DSM density estimate to a total abundance estimate, we take the average density of the grid and multiply it by the study area:

point_summary(d_circuits) %>% 
  mutate(N = round(dmean * sum(grid$km2)))

## # A tibble: 9 × 7
##    year circuit circuit_name circuit_yday  dmean    dsd     N
##   <dbl>   <dbl> <fct>               <dbl>  <dbl>  <dbl> <dbl>
## 1  2013       1 Jul '13               198 0.369   2.23    525
## 2  2013       2 Aug '13               235 0.0363  0.219    52
## 3  2014       3 Aug '14               232 0.300   1.81    427
## 4  2014       4 Sep ' 14              249 0.174   1.05    247
## 5  2015       5 May-Jun '15           155 1.26    7.64   1796
## 6  2015       6 Jun-Jul '15           179 3.95   23.9    5619
## 7  2015       7 Late Jul '15          202 7.16   43.3   10186
## 8  2015       8 Aug '15               226 0.934   5.65   1328
## 9  2015       9 Sep '15               252 0.498   3.01    708

Variance estimation

To quantify the uncertainty in our point estimate, the bangarang package provides a bootstrap function.

First, we store the formula used in the best-fit DSM:

(dsm_formula <- summary(dsm_keep)$formula)

## density.est ~ s(x, y) + s(yday) + s(z) + year

Next, we can use the function bootstrapper_wrapper() to repeat the circuit density prediction routine with iteratively resampled versions of the segments and sightings data. (The norm is typically 1,000 or more bootstrap iterations or more – prepare for a multi-hour run! I recommend trying this out with just 50 iterations at first to confirm that the results make sense before investing in the full-fledged run – bird pun intended!).

bootstraps <-
  bootstrapper_wrapper(dsm_formula,
                       segments,
                       observation_data,
                       grid,
                       group_draw = TRUE,
                       on_the_line = FALSE,
                       B = 1000,
                       output_filename = NULL)

load('demo_boots.rds')

(See the documentation for that function for more help.)

These results be summarized using the function bootstrapper_summary():

bootstrapper_summary(bootstraps)

## # A tibble: 9 × 7
##    year circuit circuit_name circuit_yday     SD     L95    U95
##   <dbl>   <dbl> <fct>               <dbl>  <dbl>   <dbl>  <dbl>
## 1  2013       1 Jul '13               198  1.31  0.0657   4.00 
## 2  2013       2 Aug '13               235  0.191 0.00353  0.434
## 3  2014       3 Aug '14               232  1.12  0.0557   2.64 
## 4  2014       4 Sep ' 14              249  0.668 0.0299   1.83 
## 5  2015       5 May-Jun '15           155  6.35  0.303   13.4  
## 6  2015       6 Jun-Jul '15           179 14.4   1.11    38.4  
## 7  2015       7 Late Jul '15          202 16.6   1.41    48.5  
## 8  2015       8 Aug '15               226  3.80  0.240    8.07 
## 9  2015       9 Sep '15               252  2.17  0.0909   5.45

That bootstrap result can be combined with the point estimates as follows:

result <- 
  left_join(point_summary(d_circuits) %>% select(-dsd),
          (bootstrapper_summary(bootstraps) %>% select(circuit_name, SD, L95, U95)),
          by='circuit_name') %>% 
  mutate(CV = SD / dmean)

result

## # A tibble: 9 × 9
##    year circuit circuit_name circuit_yday  dmean     SD     L95    U95    CV
##   <dbl>   <dbl> <fct>               <dbl>  <dbl>  <dbl>   <dbl>  <dbl> <dbl>
## 1  2013       1 Jul '13               198 0.369   1.31  0.0657   4.00   3.56
## 2  2013       2 Aug '13               235 0.0363  0.191 0.00353  0.434  5.27
## 3  2014       3 Aug '14               232 0.300   1.12  0.0557   2.64   3.74
## 4  2014       4 Sep ' 14              249 0.174   0.668 0.0299   1.83   3.85
## 5  2015       5 May-Jun '15           155 1.26    6.35  0.303   13.4    5.03
## 6  2015       6 Jun-Jul '15           179 3.95   14.4   1.11    38.4    3.64
## 7  2015       7 Late Jul '15          202 7.16   16.6   1.41    48.5    2.32
## 8  2015       8 Aug '15               226 0.934   3.80  0.240    8.07   4.07
## 9  2015       9 Sep '15               252 0.498   2.17  0.0909   5.45   4.35

This result can be plotted as follows:

ggplot(result %>% 
         mutate(circuit_name = factor(circuit_name, levels=circuit_name)),
       aes(x=circuit_name,
           y=dmean,
           ymin=L95,
           ymax=U95)) + 
  geom_errorbar(width=.1, color='grey40') + 
  geom_point(size=2.5) +
  xlab(NULL) + 
  ylab('Density') + 
  theme_light()

Flying birds

Flying birds pose special challenges (see Spear, Nur & Ainley 1992, Spear & Ainley 1997, and this white paper on handling the issues involved). In order to account for the bias caused by the possibliity of repeat-counting a flying bird, we need to perform corrections that rely upon ship speed and direction; bird speed, direction and flight height; and published regressions that predict bird speed for different seabird groups at certain orientations to the wind and at certain heights above the water.

So first we get the flying murrelets in our dataset:

data(seabirds)

mamu <-
  seabirds %>%
  filter(effort == 'transect') %>%
  filter(zone != 'OUT') %>%
  filter(sp1 == 'MAMU' | sp2 == 'MAMU' | sp3 == 'MAMU') %>%
  filter(motion %in% c('FLY')) %>%
  mutate(spp = 'MAMU')

# Sample size
mamu %>% nrow

## [1] 149

To account for the biases introduced by flying bird “flux”, we bring in a built-in dataset that contains flight-wind regression data from Spear & Ainley (1997):

data(flight_groups)

flight_groups %>% head

## # A tibble: 5 × 14
##   flight_group spp   tail_mean tail_sd head_mean head_sd cross_mean cross_sd
##   <chr>        <chr>     <dbl>   <dbl>     <dbl>   <dbl>      <dbl>    <dbl>
## 1 VIII.23      COMU       22.1    11.3      14.5    10         18.8      9.4
## 2 VIII.24      RHAU       19.4     2.5      14.6     1.7       16.3      1.2
## 3 VIII.24      PIGU       19.4     2.5      14.6     1.7       16.3      1.2
## 4 VIII.25      CAAU       19       3.7      13.1     0.9       13.2      1.4
## 5 VIII.26      MAMU       35.2    31.7      22.4     7.7       22.6     12  
## # ℹ 6 more variables: tail_b <dbl>, tail_a <dbl>, head_b <dbl>, head_a <dbl>,
## #   cross_b <dbl>, cross_a <dbl>

We then use that regression data in the function calculate_flight_flux(), which scales group size by the flux correction factor (see this white paper for details):

# Mean group size before flux correction
mamu$best %>% mean(na.rm=TRUE)

## [1] 2.503356

# Calculate flux correction
mamu <- calculate_flight_flux(mamu, flight_groups)

# Apply to group estimates
mamu <- 
  mamu %>% mutate(best = best * correction_factor,
                  min = min * correction_factor,
                  max = max * correction_factor)
                  
# Mean after
mamu$best %>% mean(na.rm=TRUE)

## [1] 0.2621422

Now that flux is accounted for, you can proceed using the same method for sitting birds above.

Total murrelet density will be the sum of sitting and flying birds.

`R` package `bangarang`

Eric Keen Ezell

2025-09-22

Installation

Maps

Datasets

Land

Seafloor

Proposed LNG shipping routes

Geostrata

Four main “provinces” referenced in study

Eight main channels referenced in study

Rectangular blocks (medium size).

Rectangular blocks (small size).

Oceanographic stations

Effort

Seabirds

Salmon

Whales

Oceanography

Helper functions

`in_block()`

`in_kfs()`

`in_water()`

`whale_map()`

Seabird density estimation

Sitting & flushed birds

Model selection

Variance estimation

Flying birds