
Solved

I have an array with randomly placed numbers from 1 to 1000000; there is an integer duplicated in the array. What is the easiest, most performance-efficient way to find that value?

Posted on 2014-11-19
17
Medium Priority
250 Views
Last Modified: 2016-02-18
Hi,

I was recently asked a question by someone, and it drove me crazy because I could not give an efficient answer.

So the trick is: there is an array with numbers from 1 to 1 million placed randomly, but one of the integers inside the array has been duplicated, and I need to find this value in the most efficient way. What could that be?

The obvious answer is to take the first value of the array, compare it with the rest, then move on to the next one, and so on, but this is not what is wanted; there needs to be a much more efficient way to find it, but how?

Could you help me with this puzzle-like algorithm question? Thanks.
0
Comment
Question by:nicedone
17 Comments
 
LVL 85

Expert Comment

by:ozo
ID: 40453837
If the array contains 1000001 entries, with one each of all integer values from 1 to 1000000 except for one value that is duplicated, then you can take the sum of all elements and subtract 500000500000 (the sum of 1 to 1000000, i.e. 1000000 × 1000001 / 2); the difference is the duplicated value.
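A minimal C# sketch of that sum approach (assuming the input is an int[] called array holding the 1000001 entries described above; the method name is only illustrative):

// Works only when every value 1..1000000 appears exactly once,
// plus a single duplicated value (1000001 entries in total).
static long FindDuplicateBySum(int[] array)
{
    long expected = 1000000L * 1000001L / 2;   // 500000500000

    long actual = 0;
    foreach (int value in array)
        actual += value;                       // accumulate in a long to avoid overflow

    return actual - expected;                  // the difference is the duplicated value
}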
0
 
LVL 84

Expert Comment

by:Dave Baldwin
ID: 40453866
In databases, the 'standard' method is to sort the rows on the field you want to find the duplicates in; the duplicates then end up in consecutive rows.  If you can sort the array, you can look for the same value occurring in two consecutive positions.  I don't know C#, but PHP has a sort() function that I can use.  Sort functions are usually much more efficient than going through the array and comparing each element against every other element.  After sorting, I can use 'foreach' to walk the sorted array and find the items that are duplicated.
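A C# sketch of the same sort-and-scan idea (array is assumed to be the int[] in question; requires using System;):

// After sorting, the duplicate occupies two consecutive slots.
// The sort dominates the cost at O(n log n).
static int FindDuplicateBySorting(int[] array)
{
    int[] sorted = (int[])array.Clone();   // leave the caller's array untouched
    Array.Sort(sorted);

    for (int i = 1; i < sorted.Length; i++)
    {
        if (sorted[i] == sorted[i - 1])
            return sorted[i];              // same value in adjacent positions
    }

    return -1;                             // no duplicate found
}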
0
 
LVL 85

Expert Comment

by:ozo
ID: 40453892
If the array can contain some missing values as well as a duplicate value, then you might use a hash table or a bit vector to record the numbers you have seen, and detect a duplicate when you find a number you have seen before.
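A C# sketch of the hash-table variant (HashSet<int> stands in for the hash table here; requires using System.Collections.Generic;):

// Works even if some values are missing or fall outside 1..1000000.
// HashSet<int>.Add returns false when the value has already been seen.
static int FindDuplicateWithHashSet(int[] array)
{
    var seen = new HashSet<int>();

    foreach (int value in array)
    {
        if (!seen.Add(value))
            return value;   // first value encountered twice
    }

    return -1;              // no duplicate
}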
0
 
LVL 33

Assisted Solution

by:phoffric
phoffric earned 400 total points
ID: 40453899
You could have a bit array of 1 million bits. As you go through each integer value in the array, check whether the corresponding bit has been set. If it has, then you now know the value that is duplicated and the index in the array where the duplicate occurred. If the bit is not set, set it.

For a faster routine, at only 8 times the cost in memory, you could change the bit array into a 1-million-byte array, where 0 means the number has not been seen yet and 1 means it has already been seen.
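A sketch of that byte-array variant in C# (the parameter names are just for illustration; maxValue would be 1000000 here):

// One byte per possible value: roughly 8x the memory of a bit array,
// but no bit masking is needed per element.
static int FindDuplicateWithByteArray(int[] array, int maxValue)
{
    byte[] seen = new byte[maxValue + 1];   // all slots start at 0 (not seen); index 0 unused

    foreach (int value in array)
    {
        if (seen[value] == 1)
            return value;    // already marked: this is the duplicate
        seen[value] = 1;     // mark the value as seen
    }

    return -1;               // no duplicate found
}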
0
 
LVL 23

Expert Comment

by:Michael Fowler
ID: 40454166
You could move the array into a dictionary and either stop at the first duplicate you hit or even get a count of how many times each duplicate occurs, e.g.

Dictionary<int, int> dict = new Dictionary<int, int>();
foreach (int i in array)
{
    if (dict.ContainsKey(i))
        dict[i]++;
    else
        dict.Add(i, 1);
}


Dictionary dict = new Dictionary<int, int>(); 
int dup = 0   
foreach (int i in array)
{
     if (dict.ContainsKey(i))
         dup = i;
         break;
}


0
 
LVL 85

Assisted Solution

by:ozo
ozo earned 400 total points
ID: 40454197
If there are no missing values, exactly one duplicate value, and no values outside the range 1 to 1000000,
and you don't want to risk overflowing a 32-bit int while adding up all the numbers from 1 to 1000000,
then you could use xor instead of add, and xor the result with 1000000 (the xor of 1 to 1000000 happens to be 1000000 itself, because 1000000 is divisible by 4).
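A C# sketch of that xor variant (again assuming exactly one duplicate and no missing or out-of-range values):

// XOR of all entries equals (XOR of 1..1000000) XOR duplicate.
// Since 1000000 is divisible by 4, the XOR of 1..1000000 is 1000000 itself,
// so one final XOR with 1000000 isolates the duplicate. No overflow risk.
static int FindDuplicateByXor(int[] array)
{
    int result = 0;
    foreach (int value in array)
        result ^= value;

    return result ^ 1000000;
}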
0
 
LVL 23

Expert Comment

by:Michael Fowler
ID: 40454225
Second code block from my post is wrong. 2nd time lucky

Dictionary<int, int> dict = new Dictionary<int, int>();
int dup = 0;
foreach (int i in array)
{
    if (dict.ContainsKey(i))
    {
        dup = i;
        break;
    }
    else
    {
        dict.Add(i, 1);
    }
}


0
 

Expert Comment

by:Garnik Giloyan
ID: 40454282
Hello

You can find the duplicate values using the following algorithm.

#include <stdio.h>

int main(void)
{
	int my_array[] = { 50, 51, 32, 50, 7, 3, 7 };
	int size = sizeof(my_array) / sizeof(my_array[0]);

	/* Brute-force O(n^2) scan: compare each element with every later one. */
	for (int i = 0; i < size; i++)
	{
		for (int j = i + 1; j < size; j++)
		{
			if (my_array[i] == my_array[j])
				printf(" %d ", my_array[i]);
		}
	}

	return 0;
}



You can also use a 'set' associative container (e.g. std::set in C++), which ignores duplicate insertions; its insert() tells you whether the value was already present.
0
 
LVL 45

Expert Comment

by:AndyAinscow
ID: 40454374
I'd have suggested a dictionary (a lot of the work is already done for you), checking as you add to the dictionary whether that key is already in use.  BUT, Michael74 has already suggested that.
I note you ask for efficiency.
You also give 1 to 1 million as the range of numbers being stored in the array, but it is not clear how many slots there are in your array: 10, 100, 1000, 100,000,000 (extreme, but not excluded by your wording as I understand it).
If the array has relatively few values, then the 'double' looping is easy to code and understand, and probably pretty much as efficient as the dictionary.  However, with 100 million entries in the array (which still leaves a reasonable possibility of some numbers appearing only once), it performs very poorly.
0
 
LVL 18

Expert Comment

by:Richard Lee
ID: 40455212
Hi,

Here is an approach in C# that is more commonly used in an RDBMS. It's effective and quite fast, simply using grouping.

var Duplicates = Numbers
	.GroupBy(x => x)
	.Where(x => x.Count() > 1)
	.Select(x => x.Key)
	.ToArray();



Any numbers found in the resulting array are duplicates. If you want to know how many times each duplicate occurs, you can try the following query.

var Duplicates = Numbers
	.GroupBy(x => x)
	.Where(x => x.Count() > 1)
	.Select(x => new { Number = x.Key, Count = x.Count() })
	.ToArray();



A different approach from the previous answers but very fast.

Hope this helps.
0
 
LVL 75

Accepted Solution

by:
käµfm³d   👽 earned 1200 total points
ID: 40455318
I'd have to agree with ozo's and phoffric's suggestion if performance is truly critical. A bit array would have you only looping the array once to locate the duplicate. So in terms of runtime cost, you should be looking at O(n). Memory cost is, of course, greater because you'd be introducing a second array/vector to track duplicate values.

Again, if performance is paramount, then the LINQ and sorting suggestions will not be as performant. LINQ has a bit of overhead associated with it--that's on top of the overhead you introduce by creating the new anonymous objects in the GroupBy's and Select's. Sorting is a bit better on performance than LINQ would be, but you've still got the overhead of executing the sort. You can mitigate some of this by choosing the most-performant sort, but you'll still have some degree of overhead.

The dictionary approach is the bit array solution reimagined. It's easier to read, but it may not be as performant. You'd have to time it.

If performance is not critical, then the LINQ or dictionary solutions are good ways to go in terms of readability. These days, readability trumps performance save for a select few areas.

As a C# example of what ozo and phoffric suggest, you could use the BitArray class:

e.g.

using System;
using System.Collections;
using System.Diagnostics;

namespace ConsoleApplication34
{
    class Program
    {
        static void Main(string[] args)
        {
            const int MAX_VALUE = 1000000;
            BitArray vector = new BitArray(MAX_VALUE + 1, false);
            int[] array = SetupArray(MAX_VALUE);
            Stopwatch clock = Stopwatch.StartNew();

            for (int i = 0; i < array.Length; i++)
            {
                int value = array[i];

                if (vector[value])
                {
                    Console.WriteLine("Duplicate value: {0}", value);
                }
                else
                {
                    vector[value] = true;
                }
            }

            clock.Stop();

            Console.WriteLine("Elapsed MS: {0}", clock.ElapsedMilliseconds);

            System.Diagnostics.Debugger.Break();
        }

        static int[] SetupArray(int maxItems)
        {
            Random rand = new Random();
            int[] array = new int[maxItems];

            // Fill every slot with a random value in the range 1..maxItems.
            for (int i = 0; i < maxItems; i++)
            {
                array[i] = rand.Next(1, maxItems + 1);
            }

            return array;
        }
    }
}



There's probably a way you can pull off the same thing with an array of integers and bit masking, but I don't know that it would be any more performant. That would be something you'd have to time.
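For what it's worth, here is a sketch of that int-array-plus-bit-masking variant (untimed, so treat any performance comparison with BitArray as an assumption):

// One bit per possible value, packed 32 flags per int.
static int FindDuplicateWithBitMask(int[] array, int maxValue)
{
    int[] bits = new int[maxValue / 32 + 1];

    foreach (int value in array)
    {
        int slot = value >> 5;          // value / 32
        int mask = 1 << (value & 31);   // bit position within that slot

        if ((bits[slot] & mask) != 0)
            return value;               // bit already set: duplicate found

        bits[slot] |= mask;
    }

    return -1;                          // no duplicate found
}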
0
 
LVL 18

Expert Comment

by:Richard Lee
ID: 40455913
Bringing some figures to the table might clarify matters a bit. Using an example of 1 million randomly generated numbers, such as the following, yielded an elapsed time of 531.0978 ms.

class Program
{
	static void Main(string[] args)
	{
		var rand = new Random((int) DateTime.Now.Ticks);
		var sw = new Stopwatch();

		var array = new int[1000000];

		for (int i = 0; i < array.Length; i++)
			array[i] = rand.Next(1000000);

		sw.Start();
			
		var duplicates = array
			.GroupBy(x => x)
			.Where(x => x.Count() > 1)
			.Select(x => x.Key)
			.ToArray();

		sw.Stop();

		Console.WriteLine("Elapsed Time: {0}", sw.Elapsed.TotalMilliseconds);
		Console.ReadKey();
	}
}



Hope this helps clarify matters.
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 40455946
@DaTribe

Oh, it's on like Donkey Kong now...    ; )

My code is slower because of the printing to the console. Take that bit out, and you're looking at around 15 ms (on my i7-3770).
0
 
LVL 33

Expert Comment

by:phoffric
ID: 40456294
1/2 second is a very long time, IMO.
0
 
LVL 18

Expert Comment

by:Richard Lee
ID: 40456400
Okay I concede. :)
0
 
LVL 85

Expert Comment

by:ozo
ID: 40456422
I'm still wondering if there is a reason the question author said "randomly placed numbers from 1 to 1000000" instead of "random numbers from 1 to 1000000".
0
 

Author Closing Comment

by:nicedone
ID: 40463427
@kaufman, thank you for your great help and of course the code!! It was really helpful.
@ozo, thanks for your help.
@phoffric, thank you for your support.
0
