11

I have two arrays and I want to know if they contain the same items. Equals(object obj) doesn't work because an array is a reference type. I have posted my attempt below, but since I'm sure this is a common task I'd like to know if there is a better test.

    public bool ContainsEquivalentSequence<T>(T[] array1, T[] array2)
    {
        bool a1IsNullOrEmpty = ReferenceEquals(array1, null) || array1.Length == 0;
        bool a2IsNullOrEmpty = ReferenceEquals(array2, null) || array2.Length == 0;
        if (a1IsNullOrEmpty) return a2IsNullOrEmpty;
        if (a2IsNullOrEmpty || array1.Length != array2.Length) return false;
        for (int i = 0; i < array1.Length; i++)
            if (!Equals(array1[i], array2[i]))
                return false;
        return true;
    }

Update - System.Linq.Enumerable.SequenceEqual is not better

I reflected the source and it does not compare the length prior to executing the loop. This makes sense since the method is designed generally for an IEnumerable<T>, not for a T[].

    public static bool SequenceEqual<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
    {
        if (comparer == null)
        {
            comparer = EqualityComparer<TSource>.Default;
        }
        if (first == null)
        {
            throw Error.ArgumentNull("first");
        }
        if (second == null)
        {
            throw Error.ArgumentNull("second");
        }
        using (IEnumerator<TSource> enumerator = first.GetEnumerator())
        {
            using (IEnumerator<TSource> enumerator2 = second.GetEnumerator())
            {
                while (enumerator.MoveNext())
                {
                    if (!enumerator2.MoveNext() || !comparer.Equals(enumerator.Current, enumerator2.Current))
                    {
                        return false;
                    }
                }
                if (enumerator2.MoveNext())
                {
                    return false;
                }
            }
        }
        return true;
    }
smartcaveman
  • 41,281
  • 29
  • 127
  • 212
  • Take a look at LINQ Intersect() method – sll Aug 16 '11 at 23:19
  • 2
    @sllev: `SequenceEqual` would be more appropriate in this case. – LukeH Aug 16 '11 at 23:20
  • 3
    Possible duplicate: http://stackoverflow.com/questions/486749/compare-two-net-array-objects – Bas Aug 16 '11 at 23:21
  • @LukeH : Thanks for pointing to this one! – sll Aug 16 '11 at 23:21
  • @LukeH: `SequenceEqual` uses `IEnumerable`s so cannot do any usefully fast length checking optimisations. But certainly something to use for the bit after length checking. – Callum Rogers Aug 16 '11 at 23:24
  • 1
    Your method name says Contains, but you are also checking they are in the same sequence. Normally contains should be sequence agnostic. I'd call it SequenceEqual. :) – John Buchanan Aug 16 '11 at 23:36
  • @johnbuchanen, good point - I changed it, but not exactly because it would be ambiguous with linq – smartcaveman Aug 16 '11 at 23:40
  • Why has everyone deleted their answers to this question! – manojlds Aug 17 '11 at 00:33
  • @Callum: You're right that `SequenceEqual` doesn't do any optimised length check etc prior to the brute-force checks, but there's no reason in principle why it couldn't: other LINQ methods -- for example, `Count` and `Last` -- have optimised paths if the runtime type is `IList`, `ICollection` etc. – LukeH Aug 17 '11 at 09:06

1 Answers1

24

I've done some tests using Any, Contains, All and SequenceEqual then I picked the best 3.

There are different results for different inputs...

Two identical arrays of size 100: SequenceEqual was faster

[     SequenceEqual: 00:00:00.027   ]*
[     ContainsEqSeq: 00:00:00.046   ]
[          Parallel: 00:00:00.281   ]

Two identical arrays of size 1000: SequenceEqual was faster

[     SequenceEqual: 00:00:00.240   ]*
[     ContainsEqSeq: 00:00:00.361   ]
[          Parallel: 00:00:00.491   ]

Two identical arrays of size 10000: Parallel was faster

[     SequenceEqual: 00:00:02.357   ]
[     ContainsEqSeq: 00:00:03.341   ]
[          Parallel: 00:00:01.688   ]*

Two identical arrays of size 50000: Parallel kick ass

[     SequenceEqual: 00:00:11.824   ]
[     ContainsEqSeq: 00:00:17.206   ]
[          Parallel: 00:00:06.811   ]*

Two arrays with one difference at position 200: SequenceEqual was faster

[     SequenceEqual: 00:00:00.050   ]*
[     ContainsEqSeq: 00:00:00.075   ]
[          Parallel: 00:00:00.332   ]

Two arrays with one difference at position 0: ContainsEqSeq and SequenceEqual were faster

[     SequenceEqual: 00:00:00.002   ]*
[     ContainsEqSeq: 00:00:00.001   ]*
[          Parallel: 00:00:00.211   ]

Two arrays with one difference at position 999: SequenceEqual was faster

[     SequenceEqual: 00:00:00.237   ]*
[     ContainsEqSeq: 00:00:00.330   ]
[          Parallel: 00:00:00.691   ]

Two arrays with one difference at position 9999: Parallel kick ass

[     SequenceEqual: 00:00:02.386   ]
[     ContainsEqSeq: 00:00:03.417   ]
[          Parallel: 00:00:01.614   ]*

The code for SequenceEqual is

a1.SequenceEqual(a2)

The code for ContainsEqSeq is your method.

The code for Parallel is

bool a1IsNullOrEmpty = ReferenceEquals(a1, null) || a1.Length == 0;
bool a2IsNullOrEmpty = ReferenceEquals(a2, null) || a2.Length == 0;
if (a1IsNullOrEmpty) return a2IsNullOrEmpty;
if (a2IsNullOrEmpty || a1.Length != a2.Length) return false;

var areEqual = true;
Parallel.ForEach(a1,
    (i, s, x) =>
    {
        if (a1[x] != a2[x])
        {
            areEqual = false;
            s.Stop();
        }
    });

return areEqual;

I would say that the best one depends on what your input will be.

If you will work with huge arrays (like 10000+) I would say Parallel is the best choice, it only loses when there is a difference on the beginning.

For other cases SequenceEqual might be the best one, I only tested with int[], but I believe it can be fast with complex types as well.

But remember, results will vary accordingly to the input.

BrunoLM
  • 97,872
  • 84
  • 296
  • 452
  • The efficiency of Parallel varies greatly depending on system used, as it utilizes multiple cores because it uses multiple threads, so also take that into account. Upvote because you showed OP's code to be inferior bar that one test for some reason. SequenceEqual it is. – Wolfzoon Aug 27 '16 at 15:34