
Computes similarity scores between two or more lists of numeric values using multiple comparison methods.

Usage

same_number(
  ...,
  method = c("exact", "raw", "exp", "percent", "normalized", "fuzzy"),
  epsilon = 0.05,
  epsilon_pct = 0.02,
  max_diff = NULL,
  digits = 3
)

Arguments

...

Two or more lists containing numeric values to compare. Arguments can be named (e.g., "l1" = list1, "l2" = list2) to set the list names used in the results.

method

Character vector specifying similarity methods (default: all)

epsilon

Absolute threshold for fuzzy matching (default: 0.05)

epsilon_pct

Relative threshold for fuzzy matching, expressed as a proportion (default: 0.02, i.e. 2%). Only used when method includes "fuzzy"

max_diff

Maximum difference for normalization (default: NULL for auto-calculation)

digits

Number of decimal places to which results are rounded (default: 3)
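How epsilon and epsilon_pct interact is not specified above. The base-R sketch below shows one hypothetical rule: the effective tolerance for each pair is the larger of epsilon and epsilon_pct times the larger magnitude, and only the part of the difference beyond that tolerance lowers the score. The helper fuzzy_score() and this rule are assumptions, not the package's documented formula, although they do reproduce the fuzzy mean of 0.978 printed in the Examples.

```r
# Hypothetical helper; not part of the package's API.
# Assumed rule: tolerance = max(epsilon, epsilon_pct * larger magnitude),
# and only the difference beyond that tolerance reduces the score.
fuzzy_score <- function(x, y, epsilon = 0.05, epsilon_pct = 0.02) {
  scale <- pmax(abs(x), abs(y))              # larger magnitude in each pair
  tol <- pmax(epsilon, epsilon_pct * scale)  # effective tolerance per pair
  excess <- pmax(0, abs(x - y) - tol)        # difference beyond the tolerance
  # Pairs within tolerance score 1; the rest are penalized relative to the
  # larger value and clamped at 0. Two zeros are treated as a perfect match.
  ifelse(scale == 0, 1, pmax(0, 1 - excess / scale))
}

x <- c(1, 2, 3)
y <- c(1, 2.1, 3.2)
scores <- fuzzy_score(x, y)
round(mean(scores), 3)
#> [1] 0.978
```

Other combinations of the two thresholds are possible; this one is shown only because it is simple and consistent with the example output.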

Value

An S3 object containing:

  • scores: A list of similarity scores for each method and list pair

  • summary: A list of statistical summaries for each method and list pair

  • methods: The similarity methods used

  • list_names: Names of the input lists

  • raw_values: The original input lists

Details

The available methods are:

  • exact: Binary similarity (1 if equal, 0 otherwise)

  • percent: Similarity derived from the percentage difference relative to the larger value

  • normalized: Similarity derived from the absolute difference normalized by a maximum difference value (max_diff)

  • fuzzy: Similarity based on the epsilon and epsilon_pct thresholds

  • exp: Exponential decay based on absolute difference (e^-diff)

  • raw: Returns the raw absolute difference (|num1 - num2|) instead of a similarity score
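The base-R sketch below re-implements the first five methods for paired values, written so that it reproduces the means printed in the Examples (0.333, 0.1, 0.908, 0.963 and 0.955). Two details are assumptions rather than documented behavior: percent is taken as one minus the difference divided by the larger absolute value, and an auto-calculated max_diff is taken as the range of all input values, which gives the 2.2 reported in the Examples. The helper pair_scores() is hypothetical, and fuzzy is omitted because its formula is not given here.

```r
# Hypothetical helper; not part of the package's API.
pair_scores <- function(x, y, max_diff = NULL) {
  d <- abs(x - y)                             # absolute difference per pair
  # Assumption: an auto-calculated max_diff is the range of all values.
  if (is.null(max_diff)) max_diff <- diff(range(c(x, y)))
  larger <- pmax(abs(x), abs(y))              # larger magnitude per pair
  list(
    exact      = as.numeric(x == y),          # 1 if equal, 0 otherwise
    raw        = d,                           # a distance, not a similarity
    exp        = exp(-d),                     # exponential decay, e^-diff
    # Assumption: 1 minus the difference relative to the larger value;
    # two zeros count as identical.
    percent    = ifelse(larger == 0, 1, 1 - d / larger),
    # 1 minus the difference scaled by max_diff
    # (if max_diff is 0, only identical pairs score 1).
    normalized = if (max_diff == 0) as.numeric(d == 0) else 1 - d / max_diff
  )
}

x <- c(1, 2, 3)
y <- c(1, 2.1, 3.2)
means <- sapply(pair_scores(x, y), function(s) round(mean(s), 3))
means
# exact 0.333, raw 0.1, exp 0.908, percent 0.963, normalized 0.955
```

Note that raw grows with dissimilarity while the other four shrink, which is why it is described above as a difference rather than a similarity score.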

Examples

list1 <- list(1, 2, 3)
list2 <- list(1, 2.1, 3.2)

# Using unnamed lists
result1 <- same_number(list1, list2)
#>  Using auto-calculated max_diff: 2.2
#>  Computed exact scores for "list1_list2" [mean: 0.333]
#>  Computed raw scores for "list1_list2" [mean: 0.1]
#>  Computed exp scores for "list1_list2" [mean: 0.908]
#>  Computed percent scores for "list1_list2" [mean: 0.963]
#>  Computed normalized scores for "list1_list2" [mean: 0.955]
#>  Computed fuzzy scores for "list1_list2" [mean: 0.978]

# Using named lists for more control
result2 <- same_number("n1" = list1, "n2" = list2)
#>  Using auto-calculated max_diff: 2.2
#>  Computed exact scores for "n1_n2" [mean: 0.333]
#>  Computed raw scores for "n1_n2" [mean: 0.1]
#>  Computed exp scores for "n1_n2" [mean: 0.908]
#>  Computed percent scores for "n1_n2" [mean: 0.963]
#>  Computed normalized scores for "n1_n2" [mean: 0.955]
#>  Computed fuzzy scores for "n1_n2" [mean: 0.978]
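The Examples show that list names come from the argument names when supplied and otherwise from the variable names, and that a pair is labelled by joining two names with "_". The hypothetical helpers input_names() and pair_labels() below sketch that convention in base R and extend it to three lists, assuming every unordered pair is compared; the package itself may label or order pairs differently.

```r
# Hypothetical helpers; not part of the package's API.
# List names: argument names when given, otherwise the deparsed expressions.
input_names <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]  # unevaluated arguments
  nms <- names(exprs)
  if (is.null(nms)) nms <- rep("", length(exprs))
  unnamed <- nms == ""
  nms[unnamed] <- vapply(exprs[unnamed],
                         function(e) paste(deparse(e), collapse = ""),
                         character(1))
  nms
}

# Assumption: every unordered pair is compared and labelled "<a>_<b>".
pair_labels <- function(nms) {
  stopifnot(length(nms) >= 2)
  combn(nms, 2, FUN = paste, collapse = "_")
}

pair_labels(input_names(list1, list2))
#> [1] "list1_list2"
pair_labels(input_names(n1 = list1, n2 = list2, n3 = list3))
#> [1] "n1_n2" "n1_n3" "n2_n3"
```

Because substitute() captures the argument expressions without evaluating them, the names can be computed before the lists themselves are inspected.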