Where I work, one of the services we provide to our clients is medical articles. Over time, we have accumulated tons of articles: millions of them. All are PDF files, and all are in the same folder.
It became impossible to open the folder in Windows Explorer, let alone search, copy, or move files. I had to find a way to reorganize it: divide it into 1000 folders, each holding roughly the same number of files, so that when a request for a specific file comes in, I can tell which folder it is in.
I contacted my brother Ari (his site is outdated) for help. He has a PhD from the Electrical Engineering department at the Technion. He is the smartest guy I know. So this is his algorithm; I just implemented it in C#. It's not too complicated, but it does the job perfectly.
The algorithm takes a string and maps it to an integer in the range 0-999 (or any other given range), with an approximately uniform distribution.
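Reading the loop in GetPosition, the mapping can be summarized as a single formula. The symbols are introduced only for this summary: $h$ is the dash-free hex string produced by `hasher`, $n$ its length, $A_j$ the character code of its $j$-th character (counting from 0), $R$ is `Range` and $L$ is `LowerValue`.

```latex
T_j = \frac{\pi\,(1 + j \bmod 5)}{2}, \qquad
S = \sum_{j=0}^{n-1} T_j \cdot \operatorname{round}\!\Big( R \big( \lceil A_j^{T_j} \rceil - A_j^{T_j} \big) \Big), \qquad
\text{position} = L + \big\lfloor S \bmod R \big\rfloor
```

Each term is non-negative, so $S \bmod R$ lies in $[0, R)$ and the position always falls between $L$ and $L + R - 1$.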
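/// <summary>Gets the position in the range of the specified string.</summary>
/// <param name="input">The string to map, e.g. a file name.</param>
/// <returns>A value between LowerValue and LowerValue + Range - 1.</returns>
public int GetPosition(string input)
{
    // hasher, Range and LowerValue are members of Mapper that are not shown here;
    // hasher is assumed to return the hash as dash-separated hex, e.g. "9E-10-7D-...".
    input = hasher.ComputeHash(input).Replace("-", string.Empty);
    double sum = 0.0;
    for (int j = 0; j < input.Length; j++)
    {
        int aj = input[j];                        // character code of the j-th hex digit
        double tj = Math.PI * (1 + j % 5) / 2;    // exponent cycles through pi/2 ... 5*pi/2
        double pj = Math.Pow(aj, tj);
        // Distance from pj up to the next integer, scaled to the range and weighted by tj.
        sum += Math.Round((Math.Ceiling(pj) - pj) * Range) * tj;
    }
    return ((int)(sum % Range)) + LowerValue;
}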
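The method relies on class members that the post does not show. Below is a minimal, self-contained sketch of a Mapper class around it, under one loud assumption: the post never says which hash `hasher` computes, so MD5 (via `MD5.Create()` and `BitConverter.ToString`) is used here purely as a stand-in. Folder numbers produced by this sketch will therefore not match the original AriMap code.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Sketch of Mapper. ASSUMPTION: MD5 stands in for the unspecified hasher.
public class Mapper
{
    public int Range { get; }        // number of buckets, e.g. 1000 folders
    public int LowerValue { get; }   // first bucket number
    private readonly MD5 hasher = MD5.Create();

    public Mapper(int range, int lowerValue = 0)
    {
        if (range <= 0) throw new ArgumentOutOfRangeException(nameof(range));
        Range = range;
        LowerValue = lowerValue;
    }

    public int GetPosition(string input)
    {
        // Hex digest without dashes: 32 characters from 0-9 and A-F.
        byte[] hash = hasher.ComputeHash(Encoding.UTF8.GetBytes(input));
        string hex = BitConverter.ToString(hash).Replace("-", string.Empty);

        double sum = 0.0;
        for (int j = 0; j < hex.Length; j++)
        {
            int aj = hex[j];                          // character code of the j-th digit
            double tj = Math.PI * (1 + j % 5) / 2;    // exponent cycles through 5 values
            double pj = Math.Pow(aj, tj);
            // Distance to the next integer, scaled to the range, rounded, weighted by tj.
            sum += Math.Round((Math.Ceiling(pj) - pj) * Range) * tj;
        }
        // sum is never negative, so the remainder lies in [0, Range).
        return (int)(sum % Range) + LowerValue;
    }
}
```

Because the result depends only on the input string, `new Mapper(1000).GetPosition("some-article.pdf")` returns the same folder number every time it is called with that name. Note that an `MD5` instance is not thread-safe, so each thread should use its own Mapper.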
To use it, just create an object of type Mapper and call its GetPosition method for any string you need to map:
Mapper aMap = new Mapper(RANGE);   // RANGE = number of folders, e.g. 1000
int position = aMap.GetPosition(input);
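Since the position depends only on the file name, the folder of a requested file can be computed instead of searched for. The sketch below assumes one subfolder per position under a common root, named with three zero-padded digits (`000` to `999`); `FolderLayout`, `GetFolderPath` and `Reorganize` are hypothetical names, not part of the original code. The mapping is passed in as a delegate, so `aMap.GetPosition` can be plugged in directly.

```csharp
using System;
using System.IO;

// Hypothetical helpers: one subfolder per position under a root folder.
public static class FolderLayout
{
    // Folder that holds (or will hold) fileName, e.g. Path.Combine(root, "042").
    public static string GetFolderPath(string root, string fileName, Func<string, int> getPosition)
    {
        int position = getPosition(fileName);
        return Path.Combine(root, position.ToString("D3"));   // zero-padded to 3 digits
    }

    // Moves every PDF directly under root into its computed subfolder.
    public static void Reorganize(string root, Func<string, int> getPosition)
    {
        // GetFiles takes a full snapshot first, so moving files does not disturb the listing.
        foreach (string path in Directory.GetFiles(root, "*.pdf"))
        {
            string name = Path.GetFileName(path);
            string folder = GetFolderPath(root, name, getPosition);
            Directory.CreateDirectory(folder);                  // no-op if it already exists
            File.Move(path, Path.Combine(folder, name));
        }
    }
}
```

For example, `FolderLayout.Reorganize(@"D:\Articles", aMap.GetPosition)` would do the one-time split, and `FolderLayout.GetFolderPath(@"D:\Articles", requestedName, aMap.GetPosition)` answers later requests (the path is illustrative). With millions of files, the `GetFiles` snapshot holds every path in memory at once, which is a deliberate trade for safe iteration while moving.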
I ran the algorithm a few times with random 10-character strings, mapping them into several ranges of 'folders'. Here are the results:
1,000,000 random strings into a range of 1000 (an even split is 1,000 per folder): the fullest folder contained 1,156 files, and the emptiest contained 886 files.
1,000,000 random strings into a range of 100 (an even split is 10,000 per folder): the fullest folder contained 10,699 files, and the emptiest contained 8,800 files.
1,000,000 random strings into a range of 10 (an even split is 100,000 per folder): the fullest folder contained 106,873 files, and the emptiest contained 98,747 files.
Very nice results. Maybe not perfect, but certainly good enough for this kind of purpose!
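A test like the one above can be reproduced with a small harness. This is a sketch under assumptions: the post does not say which characters the random strings used, so letters and digits are assumed, and `SpreadTest`, `MeasureSpread` and `IdealStdDev` are hypothetical names. For comparison, under an ideal uniform mapping each folder's count is binomial with standard deviation $\sqrt{n\,p\,(1-p)}$, $p = 1/\text{range}$: about 31.6 for 1,000,000 strings into 1000 folders, about 99.5 into 100, and 300 into 10.

```csharp
using System;
using System.Linq;

public static class SpreadTest
{
    // ASSUMPTION: the post only says "random strings 10 chars long"; letters and digits are used here.
    private const string Alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Maps `count` random strings and returns the smallest and largest bucket sizes.
    // getPosition must return values in [0, range), i.e. a Mapper with LowerValue = 0.
    public static (int Min, int Max) MeasureSpread(int count, int range, Func<string, int> getPosition,
                                                   Random rng, int length = 10)
    {
        var buckets = new int[range];
        var chars = new char[length];
        for (int i = 0; i < count; i++)
        {
            for (int k = 0; k < length; k++)
                chars[k] = Alphabet[rng.Next(Alphabet.Length)];
            buckets[getPosition(new string(chars))]++;
        }
        return (buckets.Min(), buckets.Max());
    }

    // Standard deviation of one bucket's count under an ideal uniform mapping.
    public static double IdealStdDev(int count, int range)
    {
        double p = 1.0 / range;
        return Math.Sqrt(count * p * (1 - p));
    }
}
```

Usage: `SpreadTest.MeasureSpread(1_000_000, 1000, new Mapper(1000).GetPosition, new Random())` returns the emptiest and fullest folder sizes, which can then be set against `IdealStdDev(1_000_000, 1000)`.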
Special thanks to my brother Ari for his help.
AriMap.zip (2.34 kb)