Finding the points near the X and Y axes in a scatter plot

I have a particular problem involving a scatter plot I am trying to create. I have a file that contains data in the following format, where each Name is unique and CompanyScore and CommunityScore are integer values:

Name    CompanyScore   CommunityScore
Patrick 8383           99000

The file goes on in the same format for quite some time. I am trying to figure out the top twenty points on a scatter plot that are nearest to the X (CompanyScore) and Y (CommunityScore) axes. There is probably some mathematical way to do this, but at the moment I am at a complete loss. Ideally, I would build the scatter plot in Java from the file, and then it shouldn't be too hard to figure out the values closest to the X and Y axes, right? I'm not sure whether there's a library for this type of thing. I know there are statistics tools like R, but I think it may be easier to just see the details in Java. Hopefully this is not a long shot. If anyone could help me, I would greatly appreciate it!

2 solutions

#1

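If you're looking for the points that are closest to the X axis or closest to the Y axis, just pick the ones with the lowest scores on the other coordinate: a point is near the X axis when its Y value is small, and near the Y axis when its X value is small.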

If you're looking for the points that are close to both axes at once, calculate the distance from each point to zero, normalizing each axis so that neither score dominates:

distance = sqrt(i * ((x - min_x) / (max_x - min_x))^2 +
                j * ((y - min_y) / (max_y - min_y))^2)

where i + j = 1.0 and 0.0 <= i, j <= 1.0; i and j are weight
constants that let you emphasize one axis over the other.

Then take the points with the smallest distances.
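
Since the question asks about Java, the formula above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a definitive implementation: the class name WeightedAxisDistance, the Point helper, and every sample row except Patrick's are hypothetical, and it assumes max > min on each axis so the normalization never divides by zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class WeightedAxisDistance {

    /** One row of the file: a unique name plus the two integer scores (hypothetical helper). */
    static final class Point {
        final String name;
        final int x; // CompanyScore
        final int y; // CommunityScore

        Point(String name, int x, int y) {
            this.name = name;
            this.x = x;
            this.y = y;
        }
    }

    /** Weighted, min-max normalized distance; i and j are the axis weights with i + j = 1. */
    static double distance(Point p, int minX, int maxX, int minY, int maxY, double i, double j) {
        double nx = (double) (p.x - minX) / (maxX - minX); // assumes maxX > minX
        double ny = (double) (p.y - minY) / (maxY - minY); // assumes maxY > minY
        return Math.sqrt(i * nx * nx + j * ny * ny);
    }

    /** Returns up to n points with the smallest weighted distance, nearest first. */
    static List<Point> closest(List<Point> points, int n, double i, double j) {
        int minX = Integer.MAX_VALUE, maxX = Integer.MIN_VALUE;
        int minY = Integer.MAX_VALUE, maxY = Integer.MIN_VALUE;
        for (Point p : points) {
            minX = Math.min(minX, p.x);
            maxX = Math.max(maxX, p.x);
            minY = Math.min(minY, p.y);
            maxY = Math.max(maxY, p.y);
        }
        // Copy into final locals so the comparator lambda can capture them.
        final int loX = minX, hiX = maxX, loY = minY, hiY = maxY;
        List<Point> sorted = new ArrayList<>(points);
        sorted.sort(Comparator.comparingDouble(
                (Point p) -> distance(p, loX, hiX, loY, hiY, i, j)));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<Point> points = Arrays.asList(
                new Point("Patrick", 8383, 99000),
                new Point("Alice", 120, 45000),  // hypothetical row
                new Point("Bob", 40, 300));      // hypothetical row
        // Equal weights on both axes; ask for the 20 nearest points.
        for (Point p : closest(points, 20, 0.5, 0.5)) {
            System.out.println(p.name + " " + p.x + " " + p.y);
        }
    }
}
```

Sorting the whole list is the simplest approach; for a very large file, a bounded priority queue would avoid sorting points that can never make the top twenty.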

#2

If I understand you correctly, you want the 20 rows with the lowest CompanyScore values, and the 20 rows with the lowest CommunityScore values. In R, you can do this with order() and head() (see ?order and ?head). Try:

head(myData[order(myData$CompanyScore), ],   n=20)
head(myData[order(myData$CommunityScore), ], n=20)

I'm assuming that all values are positive. If some values are negative and you want the ones closest to 0, sort by the absolute value instead, e.g., use abs(myData$CompanyScore) inside the order() call.
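
For anyone who would rather stay in Java, as the question suggests, the same "sort by one score and keep the first 20" idea might look something like the sketch below. The class name LowestScores, the Row helper, and the rows other than Patrick's are hypothetical; sorting by absolute value mirrors the abs() variant, so it also behaves sensibly when all values are positive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntFunction;

public class LowestScores {

    /** One row of the file (hypothetical helper type). */
    static final class Row {
        final String name;
        final int companyScore;
        final int communityScore;

        Row(String name, int companyScore, int communityScore) {
            this.name = name;
            this.companyScore = companyScore;
            this.communityScore = communityScore;
        }
    }

    /** Returns up to n rows whose chosen score is closest to zero, nearest first. */
    static List<Row> closestToZero(List<Row> rows, ToIntFunction<Row> score, int n) {
        List<Row> sorted = new ArrayList<>(rows);
        // Compare absolute values so negative scores are ranked by magnitude.
        sorted.sort(Comparator.comparingInt((Row r) -> Math.abs(score.applyAsInt(r))));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("Patrick", 8383, 99000),
                new Row("Alice", 120, 45000),  // hypothetical row
                new Row("Bob", -40, 300));     // hypothetical row

        // 20 rows with the CompanyScore nearest zero.
        List<Row> byCompany = closestToZero(rows, row -> row.companyScore, 20);
        // 20 rows with the CommunityScore nearest zero.
        List<Row> byCommunity = closestToZero(rows, row -> row.communityScore, 20);

        for (Row row : byCompany) {
            System.out.println("CompanyScore: " + row.name + " " + row.companyScore);
        }
        for (Row row : byCommunity) {
            System.out.println("CommunityScore: " + row.name + " " + row.communityScore);
        }
    }
}
```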
