Subtracting sets

Last updated: Apr 10, 2019

I constantly find myself needing a function to subtract one set from another while keeping track of duplicates, yet I can’t find a built-in way to do this in R.

As a reprex, let’s say a is my inventory of items, and b is a customer order. I want to know what products are left in my warehouse after fulfilling the customer’s order.

For example:

a <- c(1,1,1,1,2,2,3,3)
b <- c(1,1,1,3)

After filling the order, my remaining inventory should be c(1,2,2,3).

These don’t work:

setdiff(a,b) returns 2, because it drops every value that appears in b and removes duplicates
a - b subtracts element by element, recycling b to the length of a
intersect(a,b) returns only c(1,3) because duplicates are removed
a[a %in% intersect(a, b)] returns too many items, c(1, 1, 1, 1, 3, 3), because %in% keeps every element of a whose value appears in b, however many times it occurs.
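To see these failures side by side, the four attempts can be run against the example vectors. This is only a base R sketch; the comments show what each call returns for the a and b defined earlier.

```r
a <- c(1, 1, 1, 1, 2, 2, 3, 3)
b <- c(1, 1, 1, 3)

setdiff(a, b)              # 2: every 1 and 3 is dropped, duplicates collapsed
a - b                      # 0 0 0 -2 1 1 2 0: arithmetic, with b recycled
intersect(a, b)            # 1 3: duplicates collapsed
a[a %in% intersect(a, b)]  # 1 1 1 1 3 3: every 1 and 3 in a is kept
```

None of these keeps count of how many copies of each item were ordered.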

How can I remove the elements of one vector from another, treating each occurrence as a separate item rather than collapsing duplicates?

Here’s a function I wrote to do that:

library(dplyr)
library(tidyr)

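subtract_sets <- function(inventory, order){

  # Count how many times each item appears in the inventory
  inv <- inventory %>%
    table() %>%
    as_tibble() %>%
    rename(item = 1)

  # Count how many times each item appears in the order
  ord <- order %>%
    as.character() %>%
    table() %>%
    as_tibble() %>%
    rename(item = 1)

  # n.x is the inventory count, n.y the ordered count (0 if not ordered);
  # the remaining count never drops below zero
  final <- left_join(inv, ord, by = "item") %>%
    mutate(n.y = replace_na(n.y, 0)) %>%
    mutate(Freq = ifelse(n.y >= n.x, 0, n.x - n.y))

  # Repeat each item by its remaining count
  as.vector(rep.int(final$item, times = final$Freq))
}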

Here is the result. Note that the function returns a character vector, because table() stores the items as names.

a <- c(1,1,1,1,2,2,3,3)
b <- c(1,1,1,3)
subtract_sets(a, b)

## [1] "1" "2" "2" "3"

It works even if the second set has items not in the first set.

c <- c(1,1,1,1,2,2,3,3)
d <- c(1,1,1,3,3,3,3,4)
subtract_sets(c, d)

## [1] "1" "2" "2"
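For comparison, here is a minimal base R sketch of the same idea that keeps the original type instead of returning character strings. The name subtract_sets_base is made up for this illustration. It removes the first matching copy from the inventory for each ordered item, and skips ordered items that are not in stock.

```r
# Hypothetical base R alternative: remove one matching copy per ordered item
subtract_sets_base <- function(inventory, order) {
  for (item in order) {
    idx <- match(item, inventory)   # position of the first remaining copy
    if (!is.na(idx)) {
      inventory <- inventory[-idx]  # drop that single copy
    }
  }
  inventory
}

subtract_sets_base(c(1, 1, 1, 1, 2, 2, 3, 3), c(1, 1, 1, 3))              # 1 2 2 3
subtract_sets_base(c(1, 1, 1, 1, 2, 2, 3, 3), c(1, 1, 1, 3, 3, 3, 3, 4))  # 1 2 2
```

The loop is simple rather than fast, so for very large orders a count-based approach like the one above may scale better.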

On a related note, here is a way to check whether two sets contain the same elements, including duplicates, regardless of order:

# Comparing sorted copies also catches differing counts, e.g. c(1,1,2) vs c(1,2,2)
same_elements <- function(x, y) identical(sort(x), sort(y))
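An edge case worth testing for any check of this kind is a pair of vectors with the same distinct values and the same length but different counts. The sketch below compares a length-plus-setequal() test with a comparison of sorted copies; both helper names are made up for this illustration.

```r
# Hypothetical helper names, used only for this comparison
length_and_setequal <- function(x, y) setequal(x, y) && length(x) == length(y)
sorted_identical    <- function(x, y) identical(sort(x), sort(y))

x <- c(1, 1, 2)
y <- c(1, 2, 2)

length_and_setequal(x, y)                 # TRUE, although the counts differ
sorted_identical(x, y)                    # FALSE
sorted_identical(c(1, 2, 2), c(2, 1, 2))  # TRUE
```

Note that identical() is strict about type, so an integer vector and a double vector with the same values would compare as different.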

So, to wrap up:

a <- c(1,2)
b <- c(2,1)
c <- c(1,2,2)
d <- c(1,1,1,1,2,2,3,3)
e <- c(1,1,2,3)

# 'identical' returns TRUE only if the sets have the same ELEMENTS in the same ORDER
identical(a,b)  # FALSE

## [1] FALSE

identical(a,c)  # FALSE

## [1] FALSE

# 'same_elements' checks if sets contain the same elements including duplicates, but ignores ORDER
same_elements(a,b) # TRUE

## [1] TRUE

same_elements(a,c) # FALSE

## [1] FALSE

same_elements(d, e) # FALSE

## [1] FALSE

same_elements(d,a) # FALSE

## [1] FALSE

# 'setequal' returns TRUE when both sets contain the same distinct values; ignores duplicates and ORDER
setequal(a,b)  # TRUE

## [1] TRUE

setequal(a,c)  # TRUE

## [1] TRUE

setequal(d,c)  # FALSE

## [1] FALSE

# 'subtract_sets' returns the difference, treating duplicates as additional items
subtract_sets(a,b) # empty set

## character(0)

subtract_sets(c,a) # c(2)

## [1] "2"

subtract_sets(d,e)  # c(1, 1, 2, 3)

## [1] "1" "1" "2" "3"

subtract_sets(e, b) # c(1, 3)

## [1] "1" "3"

DataKritter
