Computes similarity scores between two or more lists of numeric values. Applies various mathematical comparison methods including exact matching, percentage differences, normalized differences, fuzzy threshold-based matching, and exponential decay metrics.

Usage

same_number(
  ...,
  method = c("exact", "raw", "exp", "percent", "normalized", "fuzzy"),
  epsilon = 0.05,
  epsilon_pct = 0.02,
  max_diff = NULL,
  digits = 3
)

Arguments

...

Two or more lists containing numeric values to compare.

method

Character vector specifying similarity methods (default: c("exact", "raw", "exp", "percent", "normalized", "fuzzy")).

epsilon

Absolute threshold for the "fuzzy" method (default: 0.05).

epsilon_pct

Relative epsilon, given as a proportion, for the "fuzzy" method (default: 0.02).

max_diff

Maximum difference for normalization (default: NULL for auto-calculation).

digits

Number of digits to round results (default: 3).

Value

An S3 object containing:

  • scores: A list of similarity scores for each method and list pair

  • summary: A list of statistical summaries for each method and list pair

  • methods: The similarity methods used

  • list_names: Names of the input lists

  • raw_values: The original input lists

Details

The available methods are:

  • exact: Binary similarity (1 if equal, 0 otherwise)

  • percent: Percentage difference relative to the larger value

  • normalized: Absolute difference normalized by a maximum difference value

  • fuzzy: Similarity based on an epsilon threshold

  • exp: Exponential decay based on absolute difference (e^-diff)

  • raw: Returns the raw absolute difference (|num1 - num2|) instead of a similarity score
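The descriptions above can be read as simple element-wise formulas. The following is a minimal sketch of that reading, not the package's internal code: the helper names (score_exact, score_percent, and so on) are hypothetical, the use of absolute values in the percent denominator is an assumption, and the auto-calculated max_diff is assumed to be the range of all input values. The fuzzy method is left out because its exact formula is not stated here. Under these assumptions the means agree with the values printed in the Examples section.

```r
# Hypothetical helpers reconstructing the documented methods for one
# pair of numeric vectors (a, b). Not the package's own implementation.
score_exact <- function(a, b) as.numeric(a == b)    # 1 if equal, 0 otherwise
score_raw   <- function(a, b) abs(a - b)            # absolute difference
score_exp   <- function(a, b) exp(-abs(a - b))      # exponential decay

score_percent <- function(a, b) {
  larger <- pmax(abs(a), abs(b))
  # Assumption: two zeros are treated as identical to avoid 0/0.
  ifelse(larger == 0, 1, 1 - abs(a - b) / larger)
}

score_normalized <- function(a, b, max_diff) {
  1 - abs(a - b) / max_diff
}

x <- c(1, 2, 3)
y <- c(1, 2.1, 3.2)

# Assumption: auto-calculated max_diff is the range of all values,
# which gives the 2.2 reported in the Examples output.
max_diff <- diff(range(c(x, y)))

means <- c(
  exact      = mean(score_exact(x, y)),
  raw        = mean(score_raw(x, y)),
  exp        = mean(score_exp(x, y)),
  percent    = mean(score_percent(x, y)),
  normalized = mean(score_normalized(x, y, max_diff))
)
print(round(means, 3))
```

Note that raw is the only method where larger values mean less similar; the others return 1 for identical inputs and decrease as the difference grows.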

Examples

list1 <- list(1, 2, 3)
list2 <- list(1, 2.1, 3.2)

# Using unnamed lists
result1 <- same_number(list1, list2)
#>  Using auto-calculated max_diff: 2.2
#>  Computed exact scores for "list1_list2" [mean: 0.333]
#>  Computed raw scores for "list1_list2" [mean: 0.1]
#>  Computed exp scores for "list1_list2" [mean: 0.908]
#>  Computed percent scores for "list1_list2" [mean: 0.963]
#>  Computed normalized scores for "list1_list2" [mean: 0.955]
#>  Computed fuzzy scores for "list1_list2" [mean: 0.978]

# Using named lists for more control
result2 <- same_number("n1" = list1, "n2" = list2)
#>  Using auto-calculated max_diff: 2.2
#>  Computed exact scores for "n1_n2" [mean: 0.333]
#>  Computed raw scores for "n1_n2" [mean: 0.1]
#>  Computed exp scores for "n1_n2" [mean: 0.908]
#>  Computed percent scores for "n1_n2" [mean: 0.963]
#>  Computed normalized scores for "n1_n2" [mean: 0.955]
#>  Computed fuzzy scores for "n1_n2" [mean: 0.978]