I design tools that promote animal health and welfare in the zoo and agricultural industries.
I constantly find myself needing a function to subtract two sets from each other, yet can’t find a good way to do this in R.
As a reprex, let’s say a is my inventory of items and b is a customer order. I want to know what products are left in my warehouse after fulfilling the customer’s order.
For example:
a <- c(1,1,1,1,2,2,3,3)
b <- c(1,1,1,3)
My remaining inventory should be c(1,2,2,3).
These don’t work:
setdiff(a,b) returns 2, because it treats both vectors as sets and drops duplicates.
a - b subtracts element by element, recycling the shorter vector.
intersect(a,b) returns only c(1,3) because duplicates are removed.
(a[a %in% intersect(a, b)]) returns too many items, c(1, 1, 1, 1, 3, 3), because every element of a that matches anything in the intersection is kept.

How can I simply remove the elements of one vector from another, treating each item as unique, so that duplicates are not removed?
Here’s a function I wrote to do that:
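library(dplyr)
library(tidyr)

subtract_sets <- function(inventory, order){
  # Count how many of each item are in the inventory
  inv <- inventory %>%
    table() %>%
    as_tibble() %>%
    rename(item = 1)
  # Count how many of each item were ordered
  ord <- order %>%
    as.character() %>%
    table() %>%
    as_tibble() %>%
    rename(item = 1)
  # Subtract ordered counts from inventory counts, never going below zero
  final <- left_join(inv, ord, by = "item") %>%
    mutate(n.y = replace_na(n.y, 0)) %>%
    mutate(Freq = ifelse(n.y >= n.x, 0, n.x - n.y))
  # Expand the remaining counts back into a (character) vector of items
  as.vector(rep.int(final$item, times = final$Freq))
}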
Here is the result of that.
a <- c(1,1,1,1,2,2,3,3)
b <- c(1,1,1,3)
subtract_sets(a, b)
## [1] "1" "2" "2" "3"
It works even if the second set has items not in the first set.
c <- c(1,1,1,1,2,2,3,3)
d <- c(1,1,1,3,3,3,3,4)
subtract_sets(c, d)
## [1] "1" "2" "2"
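Because table() stores the values as names, subtract_sets() returns a character vector. If the original type matters, one possible base-R variant is sketched below. The name subtract_sets_typed is hypothetical and not part of the function above, and the sketch assumes both arguments are plain atomic vectors without NA values.

```r
# Hypothetical base-R variant: keeps the type of `inventory` (numeric here).
# Assumes plain atomic vectors without NA values.
subtract_sets_typed <- function(inventory, order) {
  items <- unique(inventory)   # distinct items, in first-seen order
  have  <- tabulate(match(inventory, items), nbins = length(items))
  idx   <- match(order, items) # NA for ordered items not in the inventory
  want  <- tabulate(idx[!is.na(idx)], nbins = length(items))
  # Remaining count per item, floored at zero when the order exceeds stock
  rep(items, times = pmax(have - want, 0L))
}

res1 <- subtract_sets_typed(c(1,1,1,1,2,2,3,3), c(1,1,1,3))
res2 <- subtract_sets_typed(c(1,1,1,1,2,2,3,3), c(1,1,1,3,3,3,3,4))
```

Here res1 is the numeric vector c(1, 2, 2, 3) and res2 is c(1, 2, 2), the same items as in the results above but without the conversion to character.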
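On a related note, to check whether two vectors contain the same elements, including duplicates but ignoring order, compare sorted copies. (setequal() plus a length check is not enough, because setequal() ignores how often each value occurs.)
same_elements <- function(x, y) length(x) == length(y) && identical(sort(x), sort(y))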
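One edge case worth checking by hand is a pair of vectors with the same distinct values and the same length but different counts. The values x and y below are illustrative and not from the examples in this post; the snippet is only a quick sanity check, not a replacement for the function.

```r
# Illustrative values: same distinct values, same length, different counts
x <- c(1, 1, 2)
y <- c(1, 2, 2)

# setequal() ignores duplicates, so adding a length check still passes here
length_and_set <- setequal(x, y) && length(x) == length(y)

# Comparing sorted copies catches the difference in counts
sorted_match <- identical(sort(x), sort(y))
```

With these inputs length_and_set is TRUE while sorted_match is FALSE, so a check that should respect duplicates needs to compare counts (or sorted copies), not just the distinct values and the length.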
So, to wrap up:
a <- c(1,2)
b <- c(2,1)
c <- c(1,2,2)
d <- c(1,1,1,1,2,2,3,3)
e <- c(1,1,2,3)
# 'identical' only returns true if sets have both same ORDER and ELEMENTS
identical(a,b) # FALSE
## [1] FALSE
identical(a,c) # FALSE
## [1] FALSE
# 'same_elements' checks if sets contain the same elements including duplicates, but ignores ORDER
same_elements(a,b) # TRUE
## [1] TRUE
same_elements(a,c) # FALSE
## [1] FALSE
same_elements(d, e) # FALSE
## [1] FALSE
same_elements(d,a) # FALSE
## [1] FALSE
# 'setequal' returns TRUE when both contain the same distinct values; ignores duplicates and ORDER
setequal(a,b) # TRUE
## [1] TRUE
setequal(a,c) # TRUE
## [1] TRUE
setequal(d,c) # FALSE
## [1] FALSE
# 'subtract_sets' returns the difference, treating duplicates as additional items (result is a character vector)
subtract_sets(a,b) # empty set
## character(0)
subtract_sets(c,a) # c(2)
## [1] "2"
subtract_sets(d,e) # c(1, 1, 2, 3)
## [1] "1" "1" "2" "3"
subtract_sets(e, b) # c(1, 3)
## [1] "1" "3"