Chapter 5 Error checking
The most important part of analyzing camera trap data is checking and exploring it! Based on our experience synthesizing multiple datasets from different sources, camera trappers are not doing a very good job of checking their data for errors.
Working in R makes it possible to rapidly check your data, ideally in almost real time as you collect it. In an ideal world it would be worth downloading the data for your project at least once per month and checking that ‘everything’ looks good. But what constitutes ‘everything’?
5.1 Standardised exploration script
In the Wildlife Coexistence Lab we developed a standardized R script to check the data generated by camera trap projects. This script is kept on our WildCO Single Site Exploration GitHub page.
Below we run through the important elements of checking camera trap data, and where there is a coding skill fundamental to the process, we explore it in more detail (a.k.a. skill checks).
Let’s go!
First, open the .Rproj file you created in the course preparation section.
Then click File -> New File -> R Script (alternatively you can use the R Markdown option if you are comfortable with that).
After the file has opened, immediately save it as '01_example_error_checking_and_export.R'. We will usually make a new R script for each chapter; however, the error checking and analysis data creation chapters should be in the same document.
Second, read in our standardized example datasets:
# Load your data
pro <- read.csv("data/raw_data/example_data/proj.csv", header=T)
img <- read.csv("data/raw_data/example_data/img.csv", header=T)
dep <- read.csv("data/raw_data/example_data/dep.csv", header=T)
cam <- read.csv("data/raw_data/example_data/cam.csv", header=T)
Next, load in the packages we will use in this chapter - we give a brief description of them too. Cut and paste the code block below.
#Load Packages
list.of.packages <- c(
"leaflet", # creates interactive maps
"plotly", # creates interactive plots
"kableExtra", # Creates formatted HTML tables
"tidyr", # A package for data manipulation
"dplyr", # A package for data manipulation
"viridis", # Generates colors for plots
"corrplot", # Plots pairwise correlations
"lubridate", # Easy manipulation of date objects
"taxize", # Package to check taxonomy
"sf", # Package for spatial data analysis
"usedist") # Helpers for distance matrices (used later to label camera distances)
# Check which of them are not yet installed
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
# Install any that are missing
if(length(new.packages)) install.packages(new.packages,repos = "http://cran.us.r-project.org")
# Load them all
lapply(list.of.packages, require, character.only = TRUE)
5.2 Formatting dates
Every aspect of camera trapping involves manipulating date objects - calculating how long cameras were active, when detections occurred, working with timezones etc. Thus, as a camera trapper working in R you need to be comfortable dealing with them.
Fortunately the process has been made far easier with the lubridate package.
The first dates we need to convert are those in the deployment (dep) datasheet: the start and end times of each period of camera activity.
The way lubridate works is that you specify the order of the years, months, days, hours, minutes and seconds with the codes y, m, d, h, m and s respectively, e.g. ymd() for 2022-12-25 or dmy_hms() for 25/12/2022 13:00:00.
5.2.1 Skill check: lubridate
Try importing the 25th of December in a couple of formats. Copy and run the following:
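A minimal sketch of this skill check, assuming lubridate is installed (we load it again so the snippet runs on its own). The three strings are hypothetical examples of the same date written in different orders:

```r
library(lubridate)

# The function name tells lubridate the order of the date components
d1 <- ymd("2022-12-25")   # year-month-day
d2 <- dmy("25/12/2022")   # day-month-year
d3 <- mdy("12/25/2022")   # month-day-year
d1; d2; d3

# Adding a time component gives a date-time, which is in UTC unless you set tz
dt <- ymd_hms("2022-12-25 13:00:00")
dt
```
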
The output should be identical.
Note that lubridate defaults to UTC unless a timezone is specified.
Now, the real power of lubridate lies in the fact that you can handle multiple different date formats in one column using the parse_date_time() function.
This sometimes happens (I usually blame Excel), but you could also be merging two datasets formatted in different ways.
Let's try it:
x <- c("24-12-2022", "2022-12-24", "12-24-2022") #Three different date formats
parse_date_time(x, c("ymd", "dmy", "mdy"))
Again, they should give all the same output!
Next, let's calculate the amount of time which has elapsed between two dates, a fundamental operation in the management of camera data.
To do this we first create an interval object with interval(date1, date2), then return it in days by dividing it by ddays(1).
Let's try it:
# Specify the interval, and put it in days
interval(start, end)/ddays(1)
# Interval creates an "interval object" - run that alone and see what it looks like
# ddays() converts the native units of date objects in R (seconds) to days - run it on its own to see.
How many days elapsed between those two dates?
We can change the units of the output by changing the denominator, e.g. to weeks with /dweeks(1):
## [1] 8.428571
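And in decimal years with /dyears(1) (recent versions of lubridate define a duration year as 365.25 days, so your output may differ slightly in the last decimal places):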
## [1] 0.1616438
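The snippet above relies on start and end already existing. A self-contained sketch with two hypothetical dates (chosen only for illustration, not the ones used to produce the output above) shows the same calculation in several units:

```r
library(lubridate)

# Hypothetical dates, for illustration only
start <- ymd("2022-01-01")
end   <- ymd("2022-03-01")

span <- interval(start, end)   # an interval object anchored to real dates
span / ddays(1)                # elapsed time in days (59)
span / dweeks(1)               # the same span in weeks
span / dyears(1)               # and in years (one duration year = 365.25 days)
```

Because an interval is anchored to its start and end dates, it knows that this particular span includes a 28-day February.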
Easy! If you want to learn more about the amazing functionality of the lubridate package, check out the Lubridate Cheatsheet (pdf).
5.2.2 Deployment dates
Let's get back to the camera data.
Which lubridate format should we use for 2018-04-11?
ymd() should do the job.
Let's convert the date columns from character strings to date objects:
Now let's make a new column in the deployment data called days, and calculate the interval for all the deployments:
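A minimal sketch of both steps on a small hypothetical deployment table (the IDs and dates are made up; in the chapter these columns come from dep.csv). It uses ymd() and the interval approach from the skill check:

```r
library(lubridate)

# Hypothetical deployments: a normal one, one that failed on day one,
# and one with no end date (e.g. a stolen camera)
dep_demo <- data.frame(
  deployment_id = c("CAM01_2018-04-11", "CAM02_2018-04-12", "CAM03_2019-04-03"),
  start_date = c("2018-04-11", "2018-04-12", "2019-04-03"),
  end_date   = c("2018-11-14", "2018-04-12", NA),
  stringsAsFactors = FALSE
)

# Convert character strings to Date objects
dep_demo$start_date <- ymd(dep_demo$start_date)
dep_demo$end_date   <- ymd(dep_demo$end_date)

# Number of days each camera was deployed
dep_demo$days <- interval(dep_demo$start_date, dep_demo$end_date) / ddays(1)
dep_demo
```
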
We should then check the range of dates the cameras were active for. Things to look out for are:
- 0’s: A value of zero means a deployment which started and ended on the same day; it typically denotes a camera which malfunctioned instantly.
- NA’s: This is either an end_date which is NA (e.g. if the camera was stolen), or a date value which failed to parse (e.g. a typo in your date column such as ymd("202-212-24") would return NA).
- Negative numbers: These are more common than you think… someone probably entered the start and end dates the wrong way round.
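These three cases can also be flagged programmatically. A minimal sketch using a hypothetical helper, flag_durations(), on made-up values; in practice you would pass it dep$days:

```r
# Hypothetical helper: label each deployment length with the problem it suggests
flag_durations <- function(days) {
  ifelse(is.na(days), "NA: missing or unparsed date",
    ifelse(days < 0, "negative: start/end swapped?",
      ifelse(days == 0, "zero: camera failed immediately?", "ok")))
}

days <- c(217, 0, NA, -35, 120)   # made-up deployment lengths
checks <- data.frame(days = days, check = flag_durations(days),
                     stringsAsFactors = FALSE)
checks[checks$check != "ok", ]    # only the rows that need a second look
```
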
Let's look at the range of values we have (for example with summary(dep$days)):
Cameras were active between 15 and 234 days per deployment, and we have an NA. Let’s see what it relates to:
dep[is.na(dep$days)==T,] %>%
kbl() %>%
kable_styling(full_width = T) %>%
kableExtra::scroll_box(width = "100%")
row | project_id | deployment_id | placename | longitude | latitude | start_date | end_date | bait_type | bait_description | feature_type | feature_type_methodology | camera_id | camera_name | quiet_period | camera_functioning | sensor_height | height_other | sensor_orientation | orientation_other | plot_treatment | plot_treatment_description | detection_distance | subproject_name | subproject_design | event_name | event_description | event_type | recorded_by | days |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | AlgarRestorationProject | ALG027_2019-04-03 | ALG027 | -112.4735 | 56.3328 | 2019-04-03 | NA | None | NA | HumanUse | NA | NA | NA | 1 | Camera Functioning | 100 | NA | NA | NA | NA | NA | NA | Restoration | NA | NA | NA | NA | NA | NA |
It was a camera that was on a ‘HumanUse’ feature - this camera was stolen!
5.2.3 Image dates
We next need to convert the img$timestamp column.
What lubridate format is required for a date which looks like 2018-04-13 13:51:01?
ymd_hms() should do the job.
Let's apply this to our image dataset:
And do a quick check to see if all the dates parsed correctly. First check the range:
## [1] "2018-04-08 04:22:13 UTC" "2019-12-16 12:41:43 UTC"
We have data from early 2018 to late 2019.
And check for NA’s:
##
## FALSE
## 15290
No NA’s - great!
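The conversion and the two checks can be sketched on a handful of hypothetical timestamp strings, one of which contains a deliberate typo (month 13), so you can see what a failure looks like. This assumes ymd_hms() matches the layout of your timestamps:

```r
library(lubridate)

# Hypothetical timestamps in the same layout as img$timestamp
ts <- c("2018-04-13 13:51:01", "2019-12-16 12:41:43",
        "2018-04-08 04:22:13", "2019-13-01 10:00:00")  # last one has a typo

parsed <- ymd_hms(ts)          # expect a warning that one failed to parse
range(parsed, na.rm = TRUE)    # earliest and latest detections
table(is.na(parsed))           # how many failed to parse
ts[is.na(parsed)]              # which strings need fixing
```
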
5.3 Basic trapping summaries
Now that our camera trap data are loaded into R, we can very quickly find out summary information about the dataset. These can feed directly into the methods section of your report/paper.
First, let’s count the number of unique locations, deployments, image labels and blanks:
# Count the number of camera locations, deployments, image labels and blanks
paste(length(unique(dep$placename)), "locations")
paste(length(unique(dep$deployment_id)), "deployments")
paste(nrow(img), "image labels")
paste(nrow(img[img$is_blank == TRUE,]), "blanks")
## [1] "38 locations"
## [1] "114 deployments"
## [1] "15290 image labels"
## [1] "2633 blanks"
5.4 Error checks
Let's start with the fundamental error checks required with a new data set:
- Camera locations
- Deployment date checks
- Image and deployment matching
- Taxonomy
- Diel time
5.4.1 Camera locations
A common mistake in camera trap data sets is that the locations are not where they are supposed to be. The safest way to check your data is to plot them… preferably in R! After synthesizing >100 different projects from different data contributors for one project, we found ~20%(!) of submissions had clear and obvious location errors (e.g. a camera station in the middle of the Atlantic).
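Alongside mapping, a quick numeric check can catch impossible coordinates and stations far outside the study area. A minimal sketch; check_coords() is a hypothetical helper, and the station table and bounding box are made up for illustration (the third station mimics one in the Atlantic, the fourth has latitude and longitude swapped):

```r
# Hypothetical helper: flag impossible coordinates and stations outside an
# expected bounding box (lon_range and lat_range are assumptions you supply)
check_coords <- function(df, lon_range, lat_range) {
  invalid <- is.na(df$longitude) | is.na(df$latitude) |
    abs(df$longitude) > 180 | abs(df$latitude) > 90
  outside <- !invalid &
    (df$longitude < lon_range[1] | df$longitude > lon_range[2] |
     df$latitude  < lat_range[1] | df$latitude  > lat_range[2])
  df$coord_flag <- ifelse(invalid, "invalid",
                          ifelse(outside, "outside study area", "ok"))
  df
}

stations <- data.frame(
  placename = c("S01", "S02", "S03", "S04"),
  longitude = c(-112.47, -112.51, -30.20, 56.33),
  latitude  = c(56.33, 56.49, 45.10, -112.47),
  stringsAsFactors = FALSE
)
res <- check_coords(stations, lon_range = c(-114, -111), lat_range = c(55, 58))
res
```
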
5.4.1.1 Skill check: Leaflet maps
Below we make use of the fantastic leaflet package to produce interactive maps to help us check our camera locations.
leaflet has a tonne of different customization options and freely available, high-resolution base layers to choose from.
Note - leaflet is best used with the tidyverse ‘pipe’ notation (%>%), which allows you to add successive operations in the order that the elements occur.
The simplest version of a leaflet map looks like this:
m <- leaflet() %>% # call leaflet
addTiles() %>% # add the default basemap
addCircleMarkers( # Add circles for stations
lng=dep$longitude, lat=dep$latitude)
m # return the map
This is great! We can zoom in using the +/- in the top left-hand corner, and the default basemap is OpenStreetMap. The camera stations all appear to be in the right place (we don’t have any in the Atlantic).
But wouldn’t it be great to see the dep$placename when we click on a symbol?
Run the following:
m <- leaflet() %>%
addTiles() %>%
addCircleMarkers(
lng=dep$longitude, lat=dep$latitude,
popup=paste(dep$placename)) # include a popup with the placename!
m
Okay, but a map isn’t really useful until we have some satellite imagery, right? Easy:
m <- leaflet() %>%
addProviderTiles(providers$Esri.WorldImagery) %>% # Add Esri World imagery
addCircleMarkers(
lng=dep$longitude, lat=dep$latitude,
popup=paste(dep$placename)) # include a popup with the placename!
m
Zoom in - what can you tell me about where the stations are located? Can you spot any differences between the stations? Clue: look for lines on the landscape.
These lines are linear features related to oil and gas exploration, some cameras are deployed on them, others away from them.
5.4.1.2 Making corrections
As you can see, we have one deployment location that is a long way from the others. It almost looks like it belongs to another project?! Let’s take a look at all of the deployments from that location (ALG069):
## deployment_id placename longitude latitude
## 100 ALG069_2018-04-07 ALG069 -113.5075 56.49352
## 101 ALG069_2018-11-14 ALG069 -112.5075 56.49352
## 102 ALG069_2019-04-02 ALG069 -112.5075 56.49352
It looks like there is a typo in one of the coordinates: -112.5075 -> -113.5075. Let’s correct it:
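A minimal sketch of the look-up and the fix, using the three ALG069 rows printed above rebuilt as a small data frame. Targeting the bad row by its deployment_id, rather than overwriting every row for the placename, keeps the correction explicit:

```r
# The three ALG069 deployments shown above
alg069 <- data.frame(
  deployment_id = c("ALG069_2018-04-07", "ALG069_2018-11-14", "ALG069_2019-04-02"),
  placename = "ALG069",
  longitude = c(-113.5075, -112.5075, -112.5075),
  latitude  = 56.49352,
  stringsAsFactors = FALSE
)

# How many distinct longitudes does this placename have? (should be 1)
length(unique(alg069$longitude))

# Correct only the mistyped deployment
bad <- alg069$deployment_id == "ALG069_2018-04-07"
alg069$longitude[bad] <- -112.5075
alg069
```

With the full dataset the same pattern applies directly to dep, e.g. dep$longitude[dep$deployment_id == "ALG069_2018-04-07"] <- -112.5075.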
5.4.1.3 The ultimate leaflet map
We will need to check our correction has worked. Let’s also color camera locations based on their dep$feature_type, include them in the legend, and have their names show up when we click on them too!
# First, set a single categorical variable of interest from station covariates for summary graphs. If you do not have an appropriate category use "project_id".
category <- "feature_type"
# We first convert this category to a factor with discrete levels
dep[,category] <- factor(dep[,category])
# then use the turbo() function to assign each level a color
col.cat <- turbo(length(levels(dep[,category])))
# then we apply it to the dataframe
dep$colours <- col.cat[dep[,category]]
m <- leaflet() %>%
addProviderTiles(providers$Esri.WorldImagery, group="Satellite") %>%
addTiles(group="Base") %>% # Include a basemap option too
addCircleMarkers(lng=dep$longitude, lat=dep$latitude,
# Colour the markers depending on the 'feature type'
color=dep$colours,
# Add a popup of the placename and feature_type together
popup=paste(dep$placename, dep[,category])) %>%
# Add a legend explaining what is going on
addLegend("topleft", colors = col.cat, labels = levels(dep[,category]),
title = category,
opacity = 1) %>%
# add a layer control box to toggle between the layers
addLayersControl(
baseGroups = c("Satellite", "Base"))
m
If you click on a point you will see its corresponding placename and feature_type, so you can find the problem data.
You can also check your treatment categories using the key.
If you zoom in, all the “offline” locations should be >100 m away from linear features, and the others on top of them.
For more examples of leaflet in R, see RStudio’s leaflet tutorial.
Check the distance between camera pairs
Sometimes the coordinates of a camera station are accidentally repeated in the deployment data, which can be very hard to see on a map as the points will overlay perfectly. The way we check this is to calculate the pairwise distance between all of the unique deployments in the project. This helps us in two ways:
- we can find “cryptic” duplication events in the deployment coordinates
- this distance is often reported in manuscript method sections
In the following code block we make first use of the simple features (sf) package: tools which make spatial operations that you would normally perform in ArcMap very easy (e.g. plotting and manipulating polygons).
More on that later!
# create a list of all the non-duplicated placenames
camera_locs <- dep %>%
dplyr::select(placename, latitude, longitude) %>%
unique() %>% # remove duplicated rows (rows where the placename and coordinates match)
st_as_sf(coords = c("longitude", "latitude"), crs = "+proj=longlat") # Convert to `sf` format
First, let's check that none of the placenames are duplicated (e.g. with camera_locs[duplicated(camera_locs$placename),]), as this would suggest a placename with two sets of coordinates.
If all is well, this should return an empty dataframe.
## Simple feature collection with 0 features and 1 field
## Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
## Geodetic CRS: +proj=longlat
## [1] placename geometry
## <0 rows> (or 0-length row.names)
If it returns <0 rows>, we don’t have duplicates with different coordinates. Phew!
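The same idea works on a plain data frame, and can be extended to the “cryptic” case where two different placenames share identical coordinates. A minimal sketch on made-up stations:

```r
# Made-up station table; unique() keeps one row per placename/coordinate combination
locs <- unique(data.frame(
  placename = c("ALG001", "ALG001", "ALG002", "ALG003", "ALG003", "ALG004"),
  longitude = c(-112.40, -112.40, -112.45, -112.50, -112.55, -112.45),
  latitude  = c(56.30, 56.30, 56.35, 56.40, 56.40, 56.35),
  stringsAsFactors = FALSE
))

# 1. Placenames with more than one set of coordinates
dup_names <- unique(locs$placename[duplicated(locs$placename)])
locs[locs$placename %in% dup_names, ]

# 2. Different placenames sharing exactly the same coordinates
xy <- locs[, c("longitude", "latitude")]
shared <- duplicated(xy) | duplicated(xy, fromLast = TRUE)
locs[shared, ]
```
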
Now let’s crunch the numbers. Paste and run the following:
# distance matrix for all cameras
camera_dist <- st_distance(camera_locs) %>%
as.dist() %>%
usedist::dist_setNames(as.character(camera_locs$placename)) %>% # label rows/columns with placenames (requires the usedist package)
as.matrix()
#Make temporary camera_dist_mins by converting diagonals/zeros to 999999 so we can avoid the zeros when using which.min function to find nearest cameras
camera_dist_mins <- camera_dist + diag(999999,dim(camera_dist)[1])
#Create new empty dataframe for appending results to
camera_dist_list <- data.frame(focal_cam = character(),nearest_cam = character(), dist = double())
#Cycle through each column of camera_dist_mins
for (i in (1:dim(camera_dist_mins)[1]))
{
#Get index of minimum value of column i
t <- which.min(camera_dist_mins[,i])
#Combine relevant data into new_row
new_row <- data.frame(colnames(camera_dist_mins)[i],names(t),camera_dist_mins[t,i])
#Append the new_row to the accumulated results dataframe
camera_dist_list[nrow(camera_dist_list) + 1,] = new_row
}
Let's summarize the output with summary(camera_dist_list$dist):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1103 1540 1978 2210 2578 5263
So the distance from each camera to its nearest neighbour ranges from 1103 m to 5263 m, with a mean of 2210 m (median 1978 m).
Again, put that straight into the methods section of your report/paper.
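If you would rather avoid the extra packages, the nearest-neighbour summary can be sketched in base R with the haversine formula. This is a swapped-in technique on a spherical Earth, not what st_distance() does internally, so expect small differences from the sf result; the stations below are made up:

```r
# Great-circle distance in metres on a spherical Earth (haversine formula)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371008.8) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Distance from every station to its nearest neighbour
nearest_neighbour <- function(locs) {
  n <- nrow(locs)
  d <- outer(seq_len(n), seq_len(n), function(i, j)
    haversine_m(locs$longitude[i], locs$latitude[i],
                locs$longitude[j], locs$latitude[j]))
  diag(d) <- Inf                  # ignore each station's distance to itself
  nn <- apply(d, 1, which.min)    # index of the closest other station
  data.frame(focal_cam = locs$placename,
             nearest_cam = locs$placename[nn],
             dist = d[cbind(seq_len(n), nn)],
             stringsAsFactors = FALSE)
}

# Made-up stations on one meridian, 0.01 and 0.02 degrees of latitude apart
stations <- data.frame(placename = c("A", "B", "C"),
                       longitude = -112.5,
                       latitude = c(56.00, 56.01, 56.03),
                       stringsAsFactors = FALSE)
nn <- nearest_neighbour(stations)
nn
summary(nn$dist)
```

Setting the diagonal to Inf does the same job as adding 999999 in the loop above, without picking an arbitrary large number.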
5.4.1.4 Do all images have a deployment associated with them?
Another very useful check is to verify that all of the placenames have corresponding image data, and that all image data has corresponding deployment data!
You would be surprised how often this is not the case!
# check that all the placenames in images are represented in deployments
# This code returns TRUE if it is and FALSE if it isn't. We can then summarize this with table()
table(unique(img$placename) %in% unique(dep$placename))
##
## TRUE
## 38
We have 38 TRUE’s, which means all the images have deployment data.
Let’s check that all the placenames also have image data:
# check all the placenames in deployments are represented in the images data
table(unique(dep$placename) %in% unique(img$placename))
##
## TRUE
## 38
Great.
If you see any FALSE observations - you either have image data or deployments missing. Go back and check your raw data!
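table() tells you whether there is a mismatch; setdiff() tells you which placenames are involved. A minimal sketch with made-up placenames (in practice you would use unique(img$placename) and unique(dep$placename)):

```r
# Made-up placenames for illustration
img_places <- c("ALG001", "ALG002", "ALG003", "ALG099")
dep_places <- c("ALG001", "ALG002", "ALG003", "ALG004")

# Placenames with images but no deployment record
missing_deployments <- setdiff(img_places, dep_places)
# Placenames with a deployment record but no images
missing_images <- setdiff(dep_places, img_places)

missing_deployments
missing_images
```

The same two lines work at the deployment_id level, which is stricter than checking placenames.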
5.4.2 Camera activity checks
The next step is to plot out the camera activity at each unique place name to see when our cameras are functioning.
To make this plot we will need to use the plotly package for the first time.
We use it because the plots are interactive: just like with leaflet, we can zoom in and out and find problem observations.
It also dynamically changes the y-axis and x-axis labels to fit the data, which is very useful!
Let’s experiment with some basic ‘plotly’ graphs to get warmed up:
5.4.2.1 Skill check: plotly
If you are familiar with making plots in base R or ggplot, hopefully plotly is not too intimidating.
Let us start with a basic scatter plot using the deployment data.
library(plotly)
fig <- plot_ly(data = dep, # Specify your data frame
x = ~longitude, y = ~latitude, # The x and y axis columns
type="scatter") # and the type of plot
fig
Like we said, this is very similar to conventional plotting tools.
However, things get different when we start to specify the style of the points.
In base R we might use pch= and cex= to change the style and size of the points, whereas with plotly we use the marker option and include elements as a list.
library(plotly)
fig <- plot_ly(data = dep,
x = ~longitude, y = ~latitude,
color=~feature_type, # We can specify color categories
type="scatter",
marker=list(size=15)) # the default size is 10
fig
As ever, this just scratches the surface of the plotly package.
See the Plotly graphing library for a wealth of options.
5.4.2.2 The ultimate plotly camera activity figure
In the following plot, black dots denote start and end dates, lines denote periods where a camera is active.
Each unique placename gets its own row on the plot; you can hover over the lines to get the deployment_id.
We will use a loop to build the different elements… you don’t need to understand the code itself, just how to interpret the output. Cut and paste the following:
# Call the plot
p <- plot_ly()
# We want a separate row for each 'placename' - so lets turn it into a factor
dep$placename <- as.factor(dep$placename)
# loop through each place name
for(i in seq_along(levels(dep$placename)))
{
#Subset the data to just that placename
tmp <- dep[dep$placename==levels(dep$placename)[i],]
# Order by date
tmp <- tmp[order(tmp$start_date),]
# Loop through each deployment at that placename
for(j in 1:nrow(tmp))
{
# Add a line to 'p'
p <- add_trace(p,
#Use the start and end date as x coordinates
x = c(tmp$start_date[j], tmp$end_date[j]),
#Use the counter for the y coordinates
y = c(i,i),
# State the type of chart
type="scatter",
# make a line that also has points
mode = "lines+markers",
# Add the deployment ID as hover text
hovertext=tmp$deployment_id[j],
# Color it all black
color=I("black"),
# Suppress the legend
showlegend = FALSE)
}
}
# Add a categorical y axis
p <- p %>% layout(yaxis = list(
ticktext = as.list(levels(dep$placename)),
tickvals = as.list(1:length(levels(dep$placename))),
tickmode = "array"))
p
What do the gaps signify? The breaks in the line signify periods when the camera at a location was not active. You can see there was a point in 2018 when 9 out of 38 cameras had stopped working.
Can you see any issues? Yes!
Sometimes you will see a deployment a long way to the left or right of the plot; this is usually a date error (e.g. ALG036).
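The plot makes gaps easy to spot by eye; counting the number of locations with an active camera on each day gives the same information as numbers (e.g. how many cameras were down at once). A minimal sketch on a made-up deployment table with Date columns, as produced by ymd():

```r
# Made-up deployments (start_date/end_date as Date objects)
deps <- data.frame(
  placename  = c("A", "A", "B", "C"),
  start_date = as.Date(c("2018-04-10", "2018-11-15", "2018-04-12", "2018-04-11")),
  end_date   = as.Date(c("2018-11-14", "2019-04-01", "2018-08-01", "2019-04-02")),
  stringsAsFactors = FALSE
)

# Every day between the first start and the last end
all_days <- seq(min(deps$start_date), max(deps$end_date, na.rm = TRUE), by = "day")

# Number of distinct locations with a camera running on each day;
# which() drops NA comparisons, so a missing end_date counts as inactive
n_active <- vapply(seq_along(all_days), function(k) {
  on <- which(deps$start_date <= all_days[k] & deps$end_date >= all_days[k])
  length(unique(deps$placename[on]))
}, integer(1))

activity <- data.frame(date = all_days, n_active = n_active)
activity[activity$date %in% as.Date(c("2018-04-12", "2018-08-15")), ]
```
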
5.4.3 Detection check
Once we are happy that our cameras were functioning when we expected them to be, we need to check that all of our labelled images fall within the associated deployment periods. To do this we build on the previous plot, but also add the image data over the top. This plot can get very messy, so we divide it into groups of twenty deployments.
As before, black lines show an active camera. Red dots show image detections at that time.
We only show the output of the first group, but you should check every single deployment you have!
Note - the code below is complex; you don’t have to understand it all unless you want to.
# Make a separate plot for each group of 20 deployments
# To do this make a plot dataframe
tmp <- data.frame("deployment_id"=unique(dep$deployment_id), "plot_group"=ceiling(1:length(unique(dep$deployment_id))/20))
dep_tmp <- left_join(dep,tmp, by="deployment_id")
for(i in 1:max(dep_tmp$plot_group))
{
# Call the plot
p <- plot_ly()
# Subset the deployments to just this plot group
tmp <- dep_tmp[dep_tmp$plot_group==i,]
# Order by placename
tmp <- tmp[order(tmp$placename),]
# Loop through each deployment in this group
for(j in 1:nrow(tmp))
{
#Subset the image data
tmp_img <- img[img$deployment_id==tmp$deployment_id[j],]
if(nrow(tmp_img)>0)
{
p <- add_trace(p,
# Use the image timestamps as x coordinates
x = c(tmp_img$timestamp),
# Use the counter for the y coordinates
y = rep(j, nrow(tmp_img)),
# State the type of chart
type="scatter",
# plot points only (no lines)
mode = "markers",
# Add the genus and species as hover text
hovertext=paste(tmp_img$genus,tmp_img$species),
# Color the detections red
marker = list(color = "red"),
# Suppress the legend
showlegend = FALSE)
}
# Add a line to 'p'
p <- add_trace(p,
#Use the start and end date as x coordinates
x = c(tmp$start_date[j], tmp$end_date[j]),
#Use the counter for the y coordinates
y = c(j,j),
# State the type of chart
type="scatter",
# make a line (no points)
mode = "lines",
# Add the deployment ID as hover text
hovertext=tmp$deployment_id[j],
# Color it all black
color=I("black"),
# Suppress the legend
showlegend = FALSE)
}
# Add custom y axis labels
p <- p %>% layout(yaxis = list(
ticktext = as.list(tmp$deployment_id),
tickvals = as.list(1:nrow(tmp)),
tickmode = "array"))
print(p)
}
What would a problem look like?
If you have images (red dots) occurring outside a period of camera activity, that indicates a mismatch between the deployment data and the image data. You would need to revisit your datasheets to see where this mismatch occurred.
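Red dots outside the black lines can also be listed directly. A minimal sketch on made-up tables; it compares each image's calendar day with its deployment's start and end dates, and also catches images whose deployment_id has no deployment record:

```r
# Made-up deployments (Date columns, as produced by ymd())
deps <- data.frame(
  deployment_id = c("D1", "D2"),
  start_date = as.Date(c("2019-04-02", "2019-04-03")),
  end_date   = as.Date(c("2019-06-01", "2019-06-10")),
  stringsAsFactors = FALSE
)
# Made-up image labels (POSIXct timestamps, as produced by ymd_hms())
imgs <- data.frame(
  deployment_id = c("D1", "D1", "D2", "D3"),
  timestamp = as.POSIXct(c("2019-05-01 10:00:00", "2019-06-15 08:30:00",
                           "2019-04-05 12:00:00", "2019-05-05 09:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Look up each image's deployment and compare on the calendar day
idx <- match(imgs$deployment_id, deps$deployment_id)
img_day <- as.Date(imgs$timestamp, tz = "UTC")
imgs$problem <- ifelse(is.na(idx), "no matching deployment",
  ifelse(img_day < deps$start_date[idx] | img_day > deps$end_date[idx],
         "outside deployment dates", "ok"))
imgs[imgs$problem != "ok", ]
```

Note that a deployment with an NA end_date (such as a stolen camera) gives an NA result here, which is itself worth a look.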
If the error is in the deployment dates - correct them as above!
If the error is in the image metadata (i.e. the camera was set to the wrong date), you have several options:
- If you are working in a platform like Wildlife Insights, there is a date-time frameshift correction you can perform: see the Correcting timestamps section.
- You can correct the underlying EXIF data of the images using EXIF date changer.
- Finally, you could change the dates in R using lubridate. If you look at the deployment ALG029_2019-04-02, you will see that this is what happened here. We checked the datasheets and realised that the camera’s timestamp was set to the incorrect month when the deployment began (2019-05-02 instead of 2019-04-02).
# The camera for deployment "ALG029_2019-04-02" was set to the wrong date:
# its clock read 2019-05-02 when the true date was 2019-04-02,
# so every timestamp is 30 days ahead (there are 30 days in April).
# We therefore subtract 30 days from all of the images in that deployment.
img[img$deployment_id=="ALG029_2019-04-02",]$timestamp <-
img[img$deployment_id=="ALG029_2019-04-02",]$timestamp - days(30)
# Easy!
You should repeat the plot above to check it has worked!
5.4.4 Taxonomy check
Dealing with taxonomy in camera trap data sets can be a nightmare, particularly if your data labeling software does not give standardized lists of species (e.g. you are manually sorting images into folders). A species list is also something which is often produced for the appendix of a report or paper.
Let us start with building a list of our taxonomic classifications:
# First define vector of the headings you want to see (we will use this trick a lot later on)
taxonomy_headings <- c("class", "order", "family", "genus", "species", "common_name")
# Subset the image data to just those columns
tmp<- img[,colnames(img)%in% taxonomy_headings]
# Remove duplicates
tmp <- tmp[duplicated(tmp)==F,]
# Create an ordered species list
sp_list <- tmp[order(tmp$class, tmp$order, tmp$family, tmp$genus, tmp$species),]
# Create a column to the species list with genus and species pasted together
sp_list$sp <- paste(sp_list$genus, sp_list$species, sep=".")
# View the species list using kableExtra
sp_list %>%
kbl(row.names=F) %>%
kable_styling(full_width = T) %>%
kableExtra::scroll_box(width = "100%", height = "250px")
class | order | family | genus | species | common_name | sp |
---|---|---|---|---|---|---|
Aves | Galliformes | Phasianidae | Tympanuchus | phasianellus | NA | Tympanuchus.phasianellus |
Aves | Gruiformes | Gruidae | Grus | canadensis | NA | Grus.canadensis |
Aves | Passeriformes | Corvidae | Corvus | corax | NA | Corvus.corax |
Aves | Passeriformes | Corvidae | Perisoreus | canadensis | NA | Perisoreus.canadensis |
Aves | Strigiformes | Strigidae | Strix | nebulosa | NA | Strix.nebulosa |
Mammalia | Artiodactyla | Cervidae | Alces | alces | NA | Alces.alces |
Mammalia | Artiodactyla | Cervidae | Cervus | canadensis | NA | Cervus.canadensis |
Mammalia | Artiodactyla | Cervidae | Odocoileus | virginianus | NA | Odocoileus.virginianus |
Mammalia | Artiodactyla | Cervidae | Rangifer | tarandus | NA | Rangifer.tarandus |
Mammalia | Carnivora | Canidae | Canis | latrans | NA | Canis.latrans |
Mammalia | Carnivora | Canidae | Canis | lupus | NA | Canis.lupus |
Mammalia | Carnivora | Canidae | Vulpes | vulpes | NA | Vulpes.vulpes |
Mammalia | Carnivora | Felidae | Lynx | canadensis | NA | Lynx.canadensis |
Mammalia | Carnivora | Mustelidae | Lontra | canadensis | NA | Lontra.canadensis |
Mammalia | Carnivora | Mustelidae | Martes | americana | NA | Martes.americana |
Mammalia | Carnivora | Ursidae | Ursus | americanus | NA | Ursus.americanus |
Mammalia | Lagomorpha | Leporidae | Lepus | americanus | NA | Lepus.americanus |
Mammalia | Lagomorpha | Leporidae | Oryctolagus | cuniculus | NA | Oryctolagus.cuniculus |
Mammalia | Primates | Hominidae | Homo | sapiens | NA | Homo.sapiens |
Mammalia | Rodentia | Sciuridae | Tamiasciurus | hudsonicus | NA | Tamiasciurus.hudsonicus |
NA | NA | NA |  | blank | NA | .blank |
NA | NA | NA |  | spp. | NA | .spp. |
NA | NA | NA | Canachites | canadensis | NA | Canachites.canadensis |
NA | NA | NA | Unknown | unknown | NA | Unknown.unknown |
NA | NA | NA | Weasel | spp. | NA | Weasel.spp. |
That is a lot of species classifications - are they all correct? If you are familiar with the species you may be able to tell by eye (for example, Canachites canadensis has no class, order or family, and Weasel is a common name rather than a genus). If you are unfamiliar with them, you can use a package called taxize.
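Before reaching for an online service, a few simple rules catch many problems. A minimal sketch on a made-up subset of rows shaped like sp_list; the rules and the placeholder list are assumptions, and identifications to genus level (e.g. spp.) may be perfectly legitimate:

```r
sp <- data.frame(
  class   = c("Mammalia", NA, NA, "Mammalia"),
  order   = c("Carnivora", NA, NA, "Carnivora"),
  family  = c("Canidae", NA, NA, "Canidae"),
  genus   = c("Canis", "Canachites", "Weasel", "canis"),
  species = c("lupus", "canadensis", "spp.", "latrans"),
  stringsAsFactors = FALSE
)

# Rule 1: higher taxonomy is missing
sp$missing_higher <- is.na(sp$class) | is.na(sp$order) | is.na(sp$family)
# Rule 2: genus should look like a capitalised Latin name
sp$bad_genus <- !grepl("^[A-Z][a-z]+$", sp$genus)
# Rule 3: species is missing or a placeholder
sp$placeholder <- is.na(sp$species) | sp$species %in% c("spp.", "sp.", "unknown")

sp[sp$missing_higher | sp$bad_genus | sp$placeholder, ]
```

Note that "Weasel" passes rule 2: simple rules cannot tell a common name from a genus, which is where a taxonomic lookup helps.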
5.4.5 Skill Check: Taxize package
Let's start by seeing what taxize can do with a single species.
Run the following:
For each hit in an online taxonomy database, you get a row in a dataframe and a confidence score in the identification.
Let's try misspelling a name:
Cool! We can even recover incorrectly spelled names!
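taxize queries online databases. For an offline first pass, base R's adist() (Levenshtein edit distance) can suggest the closest name from a reference list you trust. This is a different, much simpler technique than taxize's scoring; suggest_name(), the reference list and the typed labels below are hypothetical:

```r
# Hypothetical reference list of names you trust
reference <- c("Canis lupus", "Canis latrans", "Lynx canadensis", "Ursus americanus")
# Hypothetical labels as typed during image tagging
typed <- c("Canis lupis", "Lynx canadensis", "Ursus americanis", "Alces alces")

# Suggest the closest reference name, if it is within max_dist edits
suggest_name <- function(x, ref, max_dist = 2) {
  d <- adist(x, ref)                   # edit distances: rows = x, columns = ref
  best <- apply(d, 1, which.min)       # closest reference name for each label
  min_d <- d[cbind(seq_along(x), best)]
  data.frame(label = x,
             suggestion = ifelse(min_d <= max_dist, ref[best], NA),
             edits = min_d,
             stringsAsFactors = FALSE)
}

res <- suggest_name(typed, reference)
res
```

"Alces alces" gets no suggestion because it is simply missing from the reference list, not misspelled.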
What about common names?
Well, there is a way we can get those too, using sci2comm().
For more information see the taxize documentation.
In this instance, we have an external list of common names which we will use to update our img file.
# Import the dataframe
tmp <- read.csv("data/raw_data/example_data/common_names.csv")
# Join it with the existing species list
sp_list$common_name <- NULL
sp_list <- left_join(sp_list, tmp)
Then let’s write our species list into our raw data folder:
# Note we use the project_id from the project data frame to name the file - that way we won't overwrite it if we run things with a different project.
write.csv(sp_list, paste0("data/raw_data/",pro$project_id[1],"_raw_species_list.csv"))
Then let's update the common_name column in our img dataframe to reflect the updated common names.
We will do this using a left_join(), an operation which is invaluable when programming in R.
It uses a specified “key” variable to merge two dataframes; in this case we will use the ‘sp’ column.
# first remove the common_name column
img$common_name <- NULL
# add an sp column to the img dataframe - remember the genus and species columns are not pasted together yet
img$sp <- paste(img$genus, img$species, sep=".")
# Next we do the 'left_join'
img <- left_join(img, sp_list[, c("sp", "common_name")], by="sp")
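One thing worth checking after any join: if the key is duplicated in the lookup table, left_join() returns one row per match, silently adding image rows. A minimal base R sketch (made-up data) that asserts the key is unique and uses match() for a lookup that cannot change the number of rows:

```r
img_demo <- data.frame(sp = c("Alces.alces", "Canis.lupus", "Alces.alces", "Unknown.unknown"),
                       stringsAsFactors = FALSE)
sp_lookup <- data.frame(sp = c("Alces.alces", "Canis.lupus"),
                        common_name = c("moose", "grey wolf"),
                        stringsAsFactors = FALSE)

# A duplicated key would multiply rows in a join, so check it first
stopifnot(anyDuplicated(sp_lookup$sp) == 0)

# match() returns the first matching row (or NA), one value per image
n_before <- nrow(img_demo)
img_demo$common_name <- sp_lookup$common_name[match(img_demo$sp, sp_lookup$sp)]
stopifnot(nrow(img_demo) == n_before)
img_demo
```

With the chapter's objects, the equivalent checks are anyDuplicated(sp_list$sp) == 0 before the join and an unchanged nrow(img) after it.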
Let's move on!
5.5 Diel activity check
Sometimes when setting up a camera trap, you can input the time incorrectly. This is actually very hard to detect unless you happen to be looking for it. The way we check is to plot the detections for each species by the 24-hour clock. If we get detections of nocturnal species in the day, or vice versa, it suggests there may be a problem.
Caveat: a diurnal species active at night doesn’t necessarily mean there is a problem; camera traps have revealed that many animals are active when we thought they were not!
Cool note: researchers are increasingly using this information to determine a species’ “availability” for detection! More on that in the density and activity chapters.
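A minimal sketch of the idea in base R: convert timestamps to decimal hours, then compute the share of each species' detections that fall at night. The timestamps are made up, and the fixed night window (before 06:00 or from 21:00) is an assumption; real sunrise and sunset times shift with season and latitude:

```r
ts <- as.POSIXct(c("2019-06-01 03:15:00", "2019-06-01 14:30:00",
                   "2019-06-02 22:45:00", "2019-06-03 12:00:00"), tz = "UTC")
species <- c("Snowshoe hare", "Snowshoe hare", "Snowshoe hare", "Sandhill crane")

# Decimal hours, equivalent to hour() + minute()/60 + second()/3600
lt <- as.POSIXlt(ts)
hours <- lt$hour + lt$min / 60 + lt$sec / 3600

# Assumed night window
night <- hours < 6 | hours >= 21

# Proportion of each species' detections at night
prop_night <- tapply(night, species, mean)
prop_night
```
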
For any species detected more than 10 times, we will plot when they were detected:
# First let's convert our timestamp to decimal hours
img$hours <- hour(img$timestamp) + minute(img$timestamp)/60 + second(img$timestamp)/(60*60)
# Count all of the captures for each species
tmp <- img %>% group_by(common_name) %>% summarize(count=n())
# Keep only the species detected more than 10 times
tmp <- tmp[tmp$count > 10,]
plot_img <- img[img$common_name %in% tmp$common_name,]
yform <- list(categoryorder = "array",
categoryarray = tmp$common_name)
fig <- plot_ly(x = plot_img$hours, y = plot_img$common_name,type="scatter",
height=1000, text=plot_img$deployment_id, hoverinfo='text',
mode = 'markers',
marker = list(size = 5,
color = 'rgba(50, 100, 255, .2)',
line = list(color = 'rgba(0, 0, 0, 0)',
width = 0))) %>%
layout(yaxis = yform)
fig
Can you see any exclusively diurnal species?
Sandhill crane would be a good candidate. They are very rarely detected at night.
Can you see any exclusively nocturnal species?
Snowshoe hare!
More on activity data in the Activity chapter
5.6 Conclusion
Congratulations - you have thoroughly error checked your camera data! We may find more errors in the data exploration chapter, so stay vigilant.