Chapter 5 Error checking

The most important part of analyzing camera trap data is checking and exploring your data! Based on the projects we have worked on synthesizing multiple datasets from different sources… camera trappers are not doing a very good job of checking for errors.

Working in R makes it possible to rapidly check your data, ideally in almost real time as you collect it. In an ideal world it would be worth downloading the data for your project at least once per month and checking that ‘everything’ looks good. But what constitutes ‘everything’?

5.1 Standardised exploration script

In the Wildlife Coexistence Lab developed a standardized R script to check the data generated by camera trap projects. This script is kept on our WildCO Single Site Exploration GitHub page.

Below we run through the important elements of checking camera trap data, and where they is a coding skill fundamental to the process, we explore it in more detail (a.k.a. skill checks).

Let’s go!

First, open the .Rproj file your created in the course preparation section. Then click

File -> New file -> Rscript (alternatively you can use the R Markdown option if you are comfortable with that)

After the file has opened, immediately save it as ’01_example_error_checking_and_export.R`. We will usually make a new R sheet for each chapter - however the error checking and analysis data creation chapters should be in the same document.

Second, read in our standardized example datasets:

# Load your data 
pro <- read.csv("data/raw_data/example_data/proj.csv", header=T)
img <- read.csv("data/raw_data/example_data/img.csv", header=T)
dep <- read.csv("data/raw_data/example_data/dep.csv", header=T)
cam <- read.csv("data/raw_data/example_data/cam.csv", header=T)

Next, load in the packages we will use in this chapter - we give a brief description of them too. Cut and paste the code block below.

#Load Packages
list.of.packages <- c(
                      "leaflet",       # creates interactive maps
                      "plotly",        # creates interactive plots   
                      "kableExtra",    # Creates interactive tables 
                      "tidyr",         # A package for data manipulation
                      "dplyr",         # A package for data manipulation
                      "viridis",       # Generates colors for plots  
                      "corrplot",      # Plots pairwise correlations
                      "lubridate",     # Easy manipulation of date objects
                      "taxize",        # Package to check taxonomy 
                      "sf")            # Package for spatial data analysis 

# Check you have them in your library
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
# load them
if(length(new.packages)) install.packages(new.packages,repos = "http://cran.us.r-project.org")
lapply(list.of.packages, require, character.only = TRUE)

5.2 Formatting dates

Every aspect of camera trapping involves manipulating date objects - calculating how long cameras were active, when detections occurred, working with timezones etc. Thus, as a camera trapper working in R you need to be comfortable dealing with them.

Fortunately the process has been made far easier with the lubridate package.

The first dates we need to convert are those in the deployment (dep) datasheet - the start and end times of each period of camera activity. The way lubridate works is you specify the order of the days, months years, hours, minutes and seconds with the codes d,m,y,h,m, and s respectively.

5.2.1 Skill check: lubridate

Try importing the 25th of December in a couple of formats. Copy and run the following:

#library(lubridate)
# day-month-year
dmy("24-12-2022")

# year-month-day
ymd("2022-12-24")

The output should be identical. Note lubridate defaults to UTC - unless otherwise specified.

Now the real power of lubridate lies in the fact that you can handle multiple different date formats in one column using the parse_date_time() function. This sometimes happens - I usually blame excel - but you could be merging two data sets formatted in different ways too. Lets try it:

x <- c("24-12-2022", "2022-12-24", "12-24-2022") #Three different date formats
parse_date_time(x, c("ymd", "dmy", "mdy"))

Again, they should give all the same output!

Next, lets calculate the amount of time which has elapsed between two dates. A fundamental operation in the management of camera data. To do this we first create an interval object interval(date1, date2), then ask to return the object in days /ddays(1). Lets try it:

# Specify your start and end dates
start <- ymd("2021-10-13")
end <- ymd("2021-12-11")
# Specify the interval, and put it in days
interval(start, end)/ddays(1)

# Interval creates an "interval object" - run that along and see what it looks like
# ddays() converts the native units of date objects in R (seconds) to days - run it on its own to see.

How many days elapsed between those two dates?

We can change the units of the output by changing the denominator:

# Specify the interval, and put it in weeks
interval(start, end)/ddays(7)
## [1] 8.428571

And in decimal years:

interval(start, end)/ddays(365)
## [1] 0.1616438

Easy! If you want to learn more about the amazing functionality of the ‘lubridate’ package - check out the pdf Lubridate Cheatsheet

5.2.2 Deployment dates

Lets get back to the camera data. Which lubridate format should we use for 2018-04-11?

ymd() should do the job. Lets convert the date columns from character strings to date objects:

# start dates
dep$start_date <- ymd(dep$start_date)

# end dates
dep$end_date   <- ymd(dep$end_date)

Now lets make a new column in the deployment data called days, and calculate the interval for all the deployments:

dep$days <- interval(dep$start_date, dep$end_date)/ddays(1)

We should then check the range of dates the cameras were active for. Things to look out for are:

- 0’s A value of zero would mean a deployments which started and ended on the same day -> it typically denotes a camera which malfunctioned instantly.

- NA’s This is either an end_date which is NA e.g. if the camera was stolen, or it could be a date value which failed to parse e.g. if you had a typo in your date column such as ymd("202-212-24") it would return NA

- Negative numbers Are more common that you think… someone probably got the start and end dates the wrong way round when entering data.

Lets look at the range of values we have:

summary(dep$days)

Cameras were active between 15 and 234 days per deployment, and we have an NA. Let’s see what it relates to:

dep[is.na(dep$days)==T,] %>% 
  kbl() %>% 
  kable_styling(full_width = T) %>% 
  kableExtra::scroll_box(width = "100%")
project_id deployment_id placename longitude latitude start_date end_date bait_type bait_description feature_type feature_type_methodology camera_id camera_name quiet_period camera_functioning sensor_height height_other sensor_orientation orientation_other plot_treatment plot_treatment_description detection_distance subproject_name subproject_design event_name event_description event_type recorded_by days
3 AlgarRestorationProject ALG027_2019-04-03 ALG027 -112.4735 56.3328 2019-04-03 NA None NA HumanUse NA NA NA 1 Camera Functioning 100 NA NA NA NA NA NA Restoration NA NA NA NA NA NA

It was a camera that was on a ‘HumanUse’ feature - this camera was stolen!

5.2.3 Image dates

We next need to convert the img$timestamp column. What lubridate format is required for a a date which looks like 2018-04-13 13:51:01?

ymd_hms("2015-11-21 03:03:44")

Lets apply this to our image dataset:

img$timestamp <- ymd_hms(img$timestamp)

And do a quick check to see if all the dates parsed correctly. First check the range:

range(img$timestamp)
## [1] "2018-04-08 04:22:13 UTC" "2019-12-16 12:41:43 UTC"

We have data from early 2018 to late 2019.

And check for NA’s:

table(is.na(img$timestamp))
## 
## FALSE 
## 15290

No NA’s - great!

5.3 Basic trapping summaries

Now that our camera trap data are loaded into R, we can very quickly find out summary information about the dataset. These can feed directly into the methods section of your report/paper.

First let’s count the number of unique locations:

# Count the number of camera locations
paste(length(unique(dep$placename)), "locations"); paste(length(unique(dep$deployment_id)), "deployments");paste(nrow(img), "image labels"); paste(nrow(img[img$is_blank == TRUE,]), "blanks")
## [1] "38 locations"
## [1] "114 deployments"
## [1] "15290 image labels"
## [1] "2633 blanks"

5.4 Error checks

Lets start with the fundamental error checks required with a new data set:

  • Camera locations
  • Deployment date checks
  • Image and deployment matching
  • Taxonomy
  • Diel time

5.4.1 Camera locations

A common mistake in camera trap data sets is that the locations are not where they are supposed to be. The safest way to check your data is to plot them… preferably R! After synthesizing >100 different projects from different data contributors for one project, we found ~20%(!) of submissions had a clear and obvious location errors (e.g. a camera station in the middle of the Atlantic).

Don’t just take my word for it:

(p.s. Mason is well worth a follow on Twitter )

5.4.1.1 Skill check: Leaflet maps

Below we make use of the fantastic ‘leaflet’ package to produce interactive plots to help us check our camera locations. leaflet has a tonne of different customization options and freely available, high resolution, base layers to choose from.

Note - Leaflet is best used using tidyverse ‘pipe’ notation - %>%. It allows you to add successive operations in the order that the elements occur.

The simplest version of a leaflet map looks like this:

m <- leaflet() %>%             # call leaflet
        addTiles() %>%         # add the default basemap
        addCircleMarkers(      # Add circles for stations
          lng=dep$longitude, lat=dep$latitude) 
m                              # return the map

This is great! We can zoom in using the +/_ in the top left hand corner, and the default basemap is OpenStreetMap. The camera stations all appear to be in the right place (we don’t have any in the Atlantic).

But wouldn’t it be great to see the dep$placename when we click over a symbol? Run the following:

m <- leaflet() %>%             
        addTiles() %>%         
        addCircleMarkers(      
          lng=dep$longitude, lat=dep$latitude,
          popup=paste(dep$placename)) # include a popup with the placename!
m                              

Okay, but a map isn’t really useful until have have some satellite imagery, right? Easy:

m <- leaflet() %>%             
        addProviderTiles(providers$Esri.WorldImagery) %>% #Add Esri Wrold imagery         
        addCircleMarkers(      
          lng=dep$longitude, lat=dep$latitude,
          popup=paste(dep$placename)) # include a popup with the placename!
m                              

Zoom in - what can you tell me about where the stations are located? Can you spot any differences between the stations? Clue: look for lines on the landscape.

These lines are linear features related to oil and gas exploration, some cameras are deployed on them, others away from them.

5.4.1.2 Making corrections

As you can see, we have one deployment location that is a long way from the others. It almost looks like it belongs to another project?! Let’s take a look at all of the deployments from that location (ALG069):

dep[dep$placename=="ALG069",c("deployment_id", "placename", "longitude", "latitude")]
##         deployment_id placename longitude latitude
## 100 ALG069_2018-04-07    ALG069 -113.5075 56.49352
## 101 ALG069_2018-11-14    ALG069 -112.5075 56.49352
## 102 ALG069_2019-04-02    ALG069 -112.5075 56.49352

It looks like there is a typo in one of the coordinates: -112.5075 -> -113.5075. Let’s correct it:

dep$longitude[dep$placename=="ALG069"] <- -112.5075

5.4.1.3 The ultimate leaflet map

We will need to check our correction has worked. Let’s also color camera locations based on their dep$feature_type, include them in the legend, and have their names show up when we click on them too!

# First, set a single categorical variable of interest from station covariates for summary graphs. If you do not have an appropriate category use "project_id".
category <- "feature_type"

# We first convert this category to a factor with discrete levels
dep[,category] <- factor(dep[,category])
# then use the turbo() function to assign each level a color
col.cat <- turbo(length(levels(dep[,category])))
# then we apply it to the dataframe
dep$colours <- col.cat[dep[,category]]

m <- leaflet() %>%
  addProviderTiles(providers$Esri.WorldImagery, group="Satellite") %>%  
  addTiles(group="Base") %>%     # Include a basemap option too
  addCircleMarkers(lng=dep$longitude, lat=dep$latitude,
                   # Co lour the markers depending on the 'feature type'
                   color=dep$colours,
                   # Add a popup of the placename and feature_type together 
                   popup=paste(dep$placename, dep[,category])) %>%
  
  # Add a legend explaining what is going on
  addLegend("topleft", colors = col.cat,  labels = levels(dep[,category]),
                   title = category,
                   labFormat = labelFormat(prefix = "$"),
                   opacity = 1) %>%
  
  # add a layer control box to toggle between the layers
  addLayersControl(
                    baseGroups = c("Satellite", "Base"))

m

If you click on a point you will see it’s corresponding placename and feature_type - so you can find the problem data. You can also check your treatment categories using the key. If you zoom in, all the “offline” locations should be >100m away from a linear features, the other on top of them.

For more examples of leaflet in R, see RStudio’s leaflet tutorial.

Check the distance between camera pairs

Sometimes the coordinates of a camera stations are accidentally repeated in the deployment data, which can actually be very hard to see on a map as the points will overlay perfectly. The way we check this is to calculate the pairwise distance between all of the unique deployments in the project. This helps us in two ways:

  • we can find “cryptic” duplication events in the deployment coordinates
  • this distance is often reported in manuscript method sections

In the following code block we make first use of the simple features (sf) package - tools which make spatial operations which you would normally perform in ArcMap very easy (e.g. plotting and manipulating polygons). More on that later!

# create a list of all the non-duplicated placenames
camera_locs <- dep %>% 
  dplyr::select(placename, latitude, longitude) %>% 
  unique() %>% # remove duplicated rows (rows where the placename and coordinates match)
  st_as_sf(coords = c("longitude", "latitude"), crs = "+proj=longlat") # Convert to `sf` format

First lets check that none of the placenames are duplicated - this would suggest a placename with two sets of coordinates. If all is well, this should return an empty dataframe.

# Check that there are no duplicated stations
camera_locs[duplicated(camera_locs$placename)==T,]
## Simple feature collection with 0 features and 1 field
## Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
## Geodetic CRS:  +proj=longlat
## [1] placename geometry 
## <0 rows> (or 0-length row.names)

If it returns a list of <0 rows>, we don’t have duplicates with different coordinates. Phew!

Now let’s crunch the numbers.Paste and run the following:

# distance matrix for all cameras
camera_dist <- st_distance(camera_locs) %>% 
                  as.dist() %>% 
                  usedist::dist_setNames(as.character(camera_locs$placename)) %>% 
                  as.matrix()


#Make temporary camera_dist_mins by  converting diagonals/zeros to 999999 so we can avoid the zeros when using which.min function to find nearest cameras
camera_dist_mins <- camera_dist + diag(999999,dim(camera_dist)[1])

#Create new empty dataframe for appending results to
camera_dist_list <- data.frame(focal_cam = character(),nearest_cam = character(), dist = double())

#Cycle through each column of camera_dist_mins
for (i in (1:dim(camera_dist_mins)[1])) 
  {

    #Get index of minimum value of column i
    t <- which.min(camera_dist_mins[,i])

    #Combine relevant data into new_row
    new_row <- data.frame(colnames(camera_dist_mins)[i],names(t),camera_dist_mins[t,i])

    #Append the new_row to the accumulated results dataframe
    camera_dist_list[nrow(camera_dist_list) + 1,] = new_row

  }

Lets summarize the output:

summary(camera_dist_list$dist)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1103    1540    1978    2210    2578    5263

So the largest distance between two cameras is 5263m, the minimum is 1103m and on average it is 2210m.

Again, put that straight into the methods section of your report/paper.

5.4.1.4 Do all images have a deployment associated with them?

Another very useful check is to verify that all of the placenames have corresponding image data, and that all image data has corresponding deployment data! You would be surprised how often this is not the case!

# check all check the placenames in images are represented in deployments
# This code returns TRUE if it is and FALSE if it isn't. We can then summarize this with table()
table(unique(img$placename) %in% unique(dep$placename))
## 
## TRUE 
##   38

We have 38 TRUE’s, which means all the images have deployment data.

Let’s check that all the placenames also have image data:

# check all the placenames in deployments are represented in the images data
table(unique(dep$placename)  %in% unique(img$placename))
## 
## TRUE 
##   38

Great.

If you see any FALSE observations - you either have image data or deployments missing. Go back and check your raw data!

5.4.2 Camera activity checks

The next step is to plot out the camera activity at each unique place name to see when our cameras are functioning. To make this plot we will need to use the plotly package for the first time. We use this because the plots are interactive, just like with leaflet we can zoom in and zoom out and find problem observations. It also dynamically changes the y-axis and x-axis labels to fit the data, which is very useful! Let’s experiment with some basic ‘plotly’ graphs to get warmed up:

5.4.2.1 Skill check: plotly

If you are familiar with making plots in base R or ggplot, hopefully plotly is not too intimidating.

Let us start with a basic scatter plot using the deployment data.

library(plotly)
fig <- plot_ly(data = dep,                    # Specify your data frame
               x = ~longitude, y = ~latitude, # The x and y axis columns
               type="scatter")                # and the type of plot
fig

Like we said, this is very similar to conventional plotting tools. However, things get different when we start to specify the style of the points. In base R we might use pch= and cex= to change the style and size of the points, whereas with plotly we use the “marker” option, and include elements as a list.

library(plotly)
fig <- plot_ly(data = dep,                    
               x = ~longitude, y = ~latitude,
               color=~feature_type,              # We can specify color categories
               type="scatter",
               marker=list(size=15))             # the default size is 10           
fig

As ever- this just scratches the surface of the plotly package. See the Plotly graphing library for a wealth of options.

5.4.2.2 The ultimate plotly camera activity figure

In the following plot, black dots denote start and end dates, lines denote periods where a camera is active. Each unique placename gets its own row on the plot - you can hover over the lines to get the deployment_id.

We will use a loop to build the different elements… you don’t need to understand the code itself, just how to interpret the output. Cut and paste the following:

# Call the plot
p <- plot_ly()

# We want a separate row for each 'placename' - so lets turn it into a factor
dep$placename <- as.factor(dep$placename)

# loop through each place name
for(i in seq_along(levels(dep$placename)))
  {
      #Subset the data to just that placename
      tmp <- dep[dep$placename==levels(dep$placename)[i],]
      # Order by date
      tmp <- tmp[order(tmp$start_date),]
      # Loop through each deployment at that placename
      for(j in 1:nrow(tmp))
      {
        # Add a line to 'p'
        p <- add_trace(p, 
                       #Use the start and end date as x coordinates
                       x = c(tmp$start_date[j], tmp$end_date[j]), 
                       #Use the counter for the y coordinates
                       y = c(i,i), 
                       # State the type of chart
                       type="scatter",
                       # make a line that also has points
                       mode = "lines+markers", 
                       # Add the deployment ID as hover text
                       hovertext=tmp$deployment_id[j], 
                       # Color it all black
                       color=I("black"), 
                       # Suppress the legend
                       showlegend = FALSE)
      }
      
  }
# Add a categorical y axis
 p <- p %>%   layout(yaxis = list(

      ticktext = as.list(levels(dep$placename)), 

      tickvals = as.list(1:length(levels(dep$placename))),

      tickmode = "array"))


p

What do the gaps signify? The breaks in the line signify periods when the camera at a location was not active. You can see there was a point in 2018 when 9 out of 38 cameras had stopped working.

Can you see any issues? Yes!

Sometimes you will see a deployment a long way to the left or right of the plot, this is usually a date error (e.g. ALG036).

5.4.2.3 Corrections

We checked the original datasheets for ALG036 and found out that the deployment end date for ALG036_2019-04-04 was incorrectly entered. It was written as 2020 not 2019. Let’s correct it:

dep$end_date[dep$deployment_id=="ALG036_2019-04-04"] <- ymd("2019-11-21") 
                                    #remember to format it as a date object

5.4.3 Detection check

Once we are happy that are cameras were functioning when we expected them to be, we now need to check if all of our labelled images fall within the associated deployment periods. To do this we build on the previous plot above, but also add in the image data over the top. This plot can get very messy, so we divide it into sections of ten deployments.

As before, black lines show an active camera. Red dots show an image detections at that time.

We only show the output of the first 10 deployments, but you should do this for every single deployment you have!

Note - the code below is complex, you don’t have to understand it all unless you want to

# Make a separate plot for each 20 stations For each 20 stations
# To do this make a plot dataframe
tmp <- data.frame("deployment_id"=unique(dep$deployment_id), "plot_group"=ceiling(1:length(unique(dep$deployment_id))/20))

dep_tmp <- left_join(dep,tmp, by="deployment_id")

for(i in 1:max(dep_tmp$plot_group))
{  
  # Call the plot
  p <- plot_ly() 
  
  #Subset the data to just that placename
  tmp <- dep_tmp[dep_tmp$plot_group==i,]
  # Order by placename 
  tmp <- tmp[order(tmp$placename),]
  
 
 # Loop through each deployment at that placename
  for(j in 1:nrow(tmp))
    {
        #Subset the image data
        tmp_img <- img[img$deployment_id==tmp$deployment_id[j],]
        
        if(nrow(tmp_img)>0)
        {
         
          p <- add_trace(p, 
                       #Use the start and end date as x coordinates
                       x = c(tmp_img$timestamp), 
                       #Use the counter for the y coordinates
                       y = rep(j, nrow(tmp_img)), 
                       # State the type of chart
                       type="scatter",
                       # make a line that also has points
                       mode = "markers", 
                       # Add the deployment ID as hover text
                       hovertext=paste(tmp_img$genus,tmp_img$species), 
                       # Color it all black
                       marker = list(color = "red"), 
                       # Suppress the legend
                       showlegend = FALSE)
        }
        
       # Add a line to 'p'
        p <- add_trace(p, 
                       #Use the start and end date as x coordinates
                       x = c(tmp$start_date[j], tmp$end_date[j]), 
                       #Use the counter for the y coordinates
                       y = c(j,j), 
                       # State the type of chart
                       type="scatter",
                       # make a line that also has points
                       mode = "lines", 
                       # Add the deployment ID as hover text
                       hovertext=tmp$deployment_id[j], 
                       # Color it all black
                       color=I("black"), 
                       # Suppress the legend
                       showlegend = FALSE)
      }
  # Add custom y axis labels  
  p <- p %>%   layout(yaxis = list(

      ticktext = as.list(tmp$deployment_id), 

      tickvals = as.list(1:nrow(tmp)),

      tickmode = "array"))
  
  print(p)
      
  
} 

What would a problem look like?

If you have images (red dots) occurring outside a period of camera activity - that would indicate a miss-match between the deployment data and the image data. You would need to revisit your datasheets to see where this mismatch occurred.

If the error is in the deployment dates - correct them as above!

If the error is in the image metadata (i.e. camera was set to the wrong date), you have several options:

  • If you are working in a platform like Wildlife Insights there is a date-time frameshift correction you can perform: see The correcting timestamps section

  • You can correct the underlying exif data of the images using EXIF date changer

  • Finally, you could change the dates in R using lubridate. If you look at the deployment ALG029_2019-04-02 you will see that has what has happened here. We checked the datasheets and realised that the camera’s timestamp was set to the incorrect month when the deployment began (2019-05-02 instead of 2019-04-02).

# We set the wrong date for the camera collecting images in deployment 
#":"ALG029_2019-04-02"

# We established that the deployment was 30 days out (as there are 30 days in April)
# So we add 30 days to all of the images in that deployment.
img[img$deployment_id=="ALG029_2019-04-02",]$timestamp <-
                   img[img$deployment_id=="ALG029_2019-04-02",]$timestamp - days(30)
# Easy!

You should repeat the plot above to check it has worked!

5.4.4 Taxonomy check

Dealing with taxonomy in camera trap data sets can be a nightmare, particularly if your data labeling software does not give standardized lists of species (e.g. you are manually sorting images into folders). A species list is also something which is often produced for the appendix of a report or paper.

Let us start with building a list of our taxonomic classifications:

# First define vector of the headings you want to see (we will use this trick a lot later on)
taxonomy_headings <- c("class", "order", "family", "genus", "species", "common_name")

# Subset the image data to just those columns
tmp<- img[,colnames(img)%in% taxonomy_headings]
# Remove duplicates
tmp <- tmp[duplicated(tmp)==F,]

# Create an ordered species list
sp_list  <- tmp[order(tmp$class, tmp$order, tmp$family, tmp$genus, tmp$species),]

# Create a column to the species list with genus and species pasted together
sp_list$sp <- paste(sp_list$genus, sp_list$species, sep=".")

# View the species list using kableExtra
sp_list %>%
  kbl(row.names=F) %>%
  kable_styling(full_width = T) %>% 
  kableExtra::scroll_box(width = "100%", height = "250px")
class order family genus species common_name sp
Aves Galliformes Phasianidae Tympanuchus phasianellus NA Tympanuchus.phasianellus
Aves Gruiformes Gruidae Grus canadensis NA Grus.canadensis
Aves Passeriformes Corvidae Corvus corax NA Corvus.corax
Aves Passeriformes Corvidae Perisoreus canadensis NA Perisoreus.canadensis
Aves Strigiformes Strigidae Strix nebulosa NA Strix.nebulosa
Mammalia Artiodactyla Cervidae Alces alces NA Alces.alces
Mammalia Artiodactyla Cervidae Cervus canadensis NA Cervus.canadensis
Mammalia Artiodactyla Cervidae Odocoileus virginianus NA Odocoileus.virginianus
Mammalia Artiodactyla Cervidae Rangifer tarandus NA Rangifer.tarandus
Mammalia Carnivora Canidae Canis latrans NA Canis.latrans
Mammalia Carnivora Canidae Canis lupus NA Canis.lupus
Mammalia Carnivora Canidae Vulpes vulpes NA Vulpes.vulpes
Mammalia Carnivora Felidae Lynx canadensis NA Lynx.canadensis
Mammalia Carnivora Mustelidae Lontra canadensis NA Lontra.canadensis
Mammalia Carnivora Mustelidae Martes americana NA Martes.americana
Mammalia Carnivora Ursidae Ursus americanus NA Ursus.americanus
Mammalia Lagomorpha Leporidae Lepus americanus NA Lepus.americanus
Mammalia Lagomorpha Leporidae Oryctolagus cuniculus NA Oryctolagus.cuniculus
Mammalia Primates Hominidae Homo sapiens NA Homo.sapiens
Mammalia Rodentia Sciuridae Tamiasciurus hudsonicus NA Tamiasciurus.hudsonicus
NA NA NA blank NA .blank
NA NA NA spp. NA .spp.
NA NA NA Canachites canadensis NA Canachites.canadensis
NA NA NA Unknown unknown NA Unknown.unknown
NA NA NA Weasel spp. NA Weasel.spp.

That is a lot of species classifications - are they all correct? If you are familiar with the species you may be able to know by eye, however if you are unfamiliar with them. You can also use a package called taxize.

5.4.5 Skill Check: Taxize package

Lets start by seeing what taxize can do with a single species. Run the following:

library(taxize)
gnr_resolve("Lynx canadensis")

For each hit in an online taxonomy database, you get a row in a dataframe and a confidence score in the identification.

Lets try miss-spelling a common name:

gnr_resolve("Lynx cramadensis")

Cool! We can even recover incorrectly spelled names!

What about common names?

Well there is a way we can get those too using sci2comm()

sci2comm("Lynx canadensis")

For more information see the Taxize documentation.

In this instance, we have an external list of common names which we will use to update our img file.

# Import the dataframe
tmp <- read.csv("data/raw_data/example_data/common_names.csv")

# Join it with the existing species list
sp_list$common_name <- NULL
sp_list <- left_join(sp_list, tmp)

Then let’s write our species list into our raw data folder:

# Note we use the project_id from from project data frame to name the file - that was we wont overwrite it if we run things with a different project. 
write.csv(sp_list, paste0("data/raw_data/",pro$project_id[1],"_raw_species_list.csv"))

Then lets update the common_name column in our img dataframe to reflect the updated common names.

We will do this using a left_join(), an operation which is invaluable when programming in R.

It uses a specified “key” variable to merge two dataframes in this case we will use the ‘sp’ column.

# first remove the common_name column
img$common_name <- NULL

# add an sp column to the img dataframe - remember the genus and species columns are not pasted together yet
img$sp <- paste(img$genus, img$species, sep=".")

# Next we do the 'left_join'
img <- left_join(img, sp_list[, c("sp", "common_name")], by="sp")

Lets move on!

5.5 Diel activity check

Sometimes when setting up a camera trap, you can input the time incorrectly. This is actually very hard to detect unless you happen to be looking for it. The way we check is to plot the detections for each species by the 24 hour clock. If were get detections of nocturnal species in the day, or vice versa, it suggests there may be a problem.

Caveat A diurnal species active at night doesn’t mean there is actually a problem, camera traps have revealed that many animals are active when we thought they were not!

Cool note Researchers are increasingly using this information to determine a species “availability” for detection! More on that in the density and activity chapters.

For any species detected more than 10 times, we will plot when they were detected:

# First lets convert our timestamp to decimal hours
img$hours <- hour(img$timestamp) + minute(img$timestamp)/60 + second(img$timestamp)/(60*60)

# Count all of the captures
tmp <- img %>% group_by(common_name) %>% summarize(count=n())

yform <- list(categoryorder = "array",
              categoryarray = tmp$common_name)

fig <- plot_ly(x = img$hours, y = img$common_name,type="scatter",
               height=1000, text=img$deployment_id, hoverinfo='text',
               mode   = 'markers',
               marker = list(size = 5,
                             color = 'rgba(50, 100, 255, .2)',
                             line = list(color = 'rgba(0, 0, 0, 0)',
                                         width = 0))) %>% 
              layout(yaxis = yform)
fig
# Remove the column
img$hours <- NULL

Can you see any exclusively diurnal species?

Sandhill crane would be a good candidate. They are very rarely detected at night.

Can you see any exclusively nocturnal species?

Snowshoe hare!

More on activity data in the Activity chapter

5.6 Conclusion

Congratulations - you have thoroughly error checked your camera data! We may find more errors in the data exploration chapter, so stay vigilant.