Possible hashing problems when using hashCode and Arrays.equals

Time: 2021-05-29 16:45:57

As the comment in my code explains, the task is to find the number of pairs of strings from a given input file which are permutations of each other. For example, "ABCD" and "BCDA" are permutations of each other, meaning that a pair has been found.

The bulk of my program is as follows:

/**
 * Finds the number of pairs of strings that are permutations of each other.
 * 
 * A hash map is created with a hash code generated from the array formed using the getFrequency
 * method as key and a pair containing a string array and the number of times a permutation of that 
 * particular string array has been found as value.
 * 
 * If a permutation is already in the hash table previously, increment the counter.
 */
public static int findPairs(String fileName) {
    try {
        //Sets up the necessary file readers
        FileReader dataFile = new FileReader(fileName);
        BufferedReader bufferedDataFile = new BufferedReader(dataFile);

        String line = bufferedDataFile.readLine();

        //Finds the number of entries in the file
        int num = Integer.parseInt(line);

        int counter = 0;
        int accumulator = 0;

        HashMap<Integer, Pair> store = new HashMap<>();

        for (int i = 0; i < num; i++) {
            String current = bufferedDataFile.readLine();
            int[] currentArr = getFrequency(current);
            int currHashCode = Arrays.hashCode(currentArr);

            if (store.containsKey(currHashCode)) {
                Pair pairToCheck = store.get(currHashCode);
                int[] arrToCheck = pairToCheck.getArr();

                //Double checking, in case there is a collision and unequal arrays 
                //have the same hashCode
                if (Arrays.equals(currentArr, arrToCheck)) {
                    counter = pairToCheck.getCount();
                    pairToCheck.updateCount();
                } else {
                    //if the current bucket is not empty, and not a permutation of the input string,
                    //continue to conduct a linear  probe
                    while (pairToCheck != null && !Arrays.equals(currentArr, arrToCheck)) {
                        currHashCode++;
                        pairToCheck = store.get(currHashCode);
                        arrToCheck = pairToCheck.getArr();
                    }

                    //if the current bucket is empty, add the new pair into the position
                    if (pairToCheck == null) {
                        counter = 0;
                    //otherwise, a permutation has been found later in the linear probe!
                    } else {
                        counter = pairToCheck.getCount();
                        pairToCheck.updateCount();
                    }
                }
            //no such permutation in the hash table yet!    
            } else {
                counter = 0;
            }

            //Updates the accumulator using the counter. If there were already other strings
            //which are permutations of the current string, the current string will be able to
            //form a pair with each of these strings.
            accumulator += counter;

            //Updates the hash map only if the permutation has not been stored previously
            if (counter == 0) {
                Pair newPair = new Pair(currentArr, 1);
                store.put(currHashCode, newPair);
            }
        }

        //Close the file reader
        bufferedDataFile.close();

        return accumulator;
    } catch (Exception e) {
        System.out.println(e);
    }

    //In the event of an error, return -1
    return -1;
}

What are some potential problems that can result from using Java's hashCode and Arrays implementations this way? I have been given some private test cases to pass, and while I pass a number of them, there is one that I repeatedly fail. I suspect it has to do with the way I am dealing with collisions, but although I have inspected the code multiple times, I am still uncertain where the error might lie. Any help is much appreciated!

EDIT: As per request, here is my getFrequency method:

public static int[] getFrequency(String s) {
    //There are 128 legal ascii characters
    int[] charArr = new int[128];

    //Iterate through the given string, and increment the count for a character using its 
    //ascii value to locate its position in the array
    for (int i = 0; i < s.length(); i++) {

        char c = s.charAt(i);
        int ascii = (int) c;
        charArr[ascii] += 1;    
    }

    return charArr;
}

EDIT 2: And Pair:

public class Pair {

   private int[] m_arr;
   private int m_count;

   public Pair(int[] arr, int count) {
       this.m_arr = arr;
       this.m_count = count;
   }

   public int[] getArr() {
       return this.m_arr;
   }

   public int getCount() {
       return this.m_count;
   }

   public void updateCount() {
       this.m_count++;
   }

}

1 Answer

#1


Finding anagrams is a known problem. The usual solution is to sort the strings and compare sorted strings. When you sort, "ABCD" and "BCDA" both become "ABCD".
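A minimal sketch of the sorting idea, assuming plain Java strings. The class name AnagramKey and the method sortedKey are made up for illustration and do not come from the question:

```java
import java.util.Arrays;

class AnagramKey {
    // Returns the characters of s in sorted order.
    // Every permutation of s produces the same key.
    static String sortedKey(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    public static void main(String[] args) {
        // "ABCD" and "BCDA" share the key "ABCD", so they are permutations of each other.
        System.out.println(sortedKey("ABCD").equals(sortedKey("BCDA"))); // true
        // "BB" and "AC" have different keys ("BB" vs "AC").
        System.out.println(sortedKey("BB").equals(sortedKey("AC")));     // false
    }
}
```

Two strings are permutations of each other exactly when their sorted keys are equal, so the key can be compared with equals instead of comparing frequency arrays.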

Storing the sorted strings in a set will let you find matches easily. Make a class that keeps the string in its sorted and unsorted versions separately for easy retrieval of the unsorted version of the string.
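One way such a class could look, as a sketch only. The name SortedWord, its fields, and the helper hasAnagram are hypothetical. Equality and hashing are based on the sorted form, so a HashSet treats all permutations of a string as the same element while the original spelling stays available:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class SortedWord {
    final String original; // the string as it appeared in the input
    final String sorted;   // its characters in sorted order

    SortedWord(String original) {
        this.original = original;
        char[] chars = original.toCharArray();
        Arrays.sort(chars);
        this.sorted = new String(chars);
    }

    // Two SortedWords are equal when their sorted forms match,
    // i.e. when the originals are permutations of each other.
    @Override
    public boolean equals(Object o) {
        return o instanceof SortedWord && ((SortedWord) o).sorted.equals(sorted);
    }

    // Must agree with equals: hash only the sorted form.
    @Override
    public int hashCode() {
        return sorted.hashCode();
    }

    // Returns true if some permutation of candidate is already in seen.
    static boolean hasAnagram(Set<SortedWord> seen, String candidate) {
        return seen.contains(new SortedWord(candidate));
    }

    public static void main(String[] args) {
        Set<SortedWord> seen = new HashSet<>();
        seen.add(new SortedWord("ABCD"));
        System.out.println(hasAnagram(seen, "BCDA")); // true
        System.out.println(hasAnagram(seen, "ABCE")); // false
    }
}
```

Because equals and hashCode are consistent with each other, the set handles hash collisions on its own and no manual probing is needed.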

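A hash code on its own is not a safe key, because two different inputs can produce the same value (a simple character-sum hash, for instance, maps "BB" and "AC" to the same number). Use a better hash function on the sorted version of the string, and compare the sorted strings themselves whenever two hashes match.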
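As a sketch of how the pair count could be computed without hand-written collision handling, the version below keys a HashMap on the sorted string itself rather than on an int hash code. HashMap already resolves hash collisions internally by calling equals on the keys. The names AnagramPairs, countPairs and sortedKey are illustrative, and the input is taken as a list of strings instead of a file to keep the example self-contained. For comparison, the linear probe in the question calls getArr() on the result of store.get(currHashCode) before checking it for null, so probing into an empty slot throws a NullPointerException, which the catch block turns into a return value of -1.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AnagramPairs {
    // Canonical form of a string: its characters in sorted order.
    static String sortedKey(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Counts unordered pairs of strings that are permutations of each other.
    static long countPairs(List<String> words) {
        Map<String, Integer> seenCounts = new HashMap<>();
        long pairs = 0;
        for (String word : words) {
            String key = sortedKey(word);
            int seen = seenCounts.getOrDefault(key, 0);
            // The current word pairs with every earlier permutation of itself.
            pairs += seen;
            seenCounts.put(key, seen + 1);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // ABCD, BCDA and DCBA form 3 pairs; BB and AC pair with nothing.
        System.out.println(countPairs(Arrays.asList("ABCD", "BCDA", "DCBA", "BB", "AC"))); // 3
    }
}
```

The result is a long because a group of k mutually permuted strings contributes k(k-1)/2 pairs, which can exceed the int range for large inputs.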
