Python: How to compare a list of lists with a 2D array?

Time: 2022-10-29 18:10:28

I am new to arrays and to using for loops in this context, so I was hoping someone could give me pointers on how to approach this problem.

I have a list of lists which looks like this:

[[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]]

and a 2D array which looks like this:

[[1 0 0]
 [1 0 1]
 [1 1 1]
 [1 0 0]
 [1 1 0]
 [1 0 1]
 [1 1 1]
 [0 0 1]
 [0 0 1]]

The array will eventually have close to 420k records. I would like to count how many times each combination in the list of lists appears in the array. I tried using a for loop like so:

from matplotlib import pyplot as plt
import numpy as np
from matplotlib_venn import venn3, venn3_circles
import os
import sys
from itertools import islice

input_file= "/home/ruchik/bodyMap_Data/bodyMap_Files/final.txt";
col0_idx = 6
col1_idx = 1
col2_idx = 2

print "number of sys arg", len(sys.argv)
print "sys arg list", sys.argv
input_file = sys.argv[1]
col1_idx = int(sys.argv[2])
col2_idx = int(sys.argv[3])

## keep it real ;)
col1_idx -= 1
col2_idx -= 1


print >> sys.stderr,'File is {file} and used columns are {col1} and {col2}.'.format(file=input_file, col1=col0_idx+col1_idx+1, col2=col2_idx+col0_idx+1)

## Opening and reading the file
#f = open(input_file, "r")
#g = open("final_fixed.txt", "w")
#print "Opened File Handle", f
#
#for line in f:
#    if line.strip():
#        g.write("\t".join(line.split()[7:]) + "\n")

#f.close()
#g.close()

#print "File created."

#f = open("final_fixed.txt", "r")
f = open(input_file, "r")

# header_all is a list of the content of the 1st line from position col0_idx-th to last-column-th
header_all_list = []
header_all_list = f.readline().rstrip("\n").split('\t')[col0_idx:]
header_reduced = [header_all_list[col1_idx], header_all_list[col2_idx], 'others']



# data_all is a (line-wise) list of (column-wise) list 
# with the content of each line but the 1st one from position col0_idx-th to last-column-th
data_all_lol = []
for line in f:
        data_all_lol.append(line.rstrip("\n").split('\t')[col0_idx:])

# just print the data_all list of list ... to make sure it is all fine up to there
#for i in range(len(data_all)):
#        for j in range(len(data_all[i])):
#                print >> sys.stderr, 'all data {col_i} , {col_j} : {val_ij}'.format(col_i=i+1, col_j=j+1+col0_idx, val_ij = data_all[i][j])

op_lol = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
count = [0, ] * len(op_lol)
for i in range(len(op_lol)):
    for j in range(len(data_reduced_transposed_npa)):
        if list(data_reduced_transposed_npa[j]) == op_lol[i]:
            count[i] += 1

op = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
count2 = [0, ] * len(op_lol)
for column in data_reduced_npa:
        for j in range(len(op_lol)):
                count2[j] += 1

#for k in range(len(op)):
#    print str(op[k]) + ': ' + str(count[k])
#header_venn3 = header_venn
#data_venn3 = data_venn
#print >> sys.stderr,"\nvenn3  order :"
#print >> sys.stderr,"(Abc, aBc, ABc, abC, AbC, aBC, ABC)"
#print >> sys.stderr,"venn3  header :"
#print >> sys.stderr, header_venn3
#print >> sys.stderr,"venn3 data :"
#for i in range(len(data_venn3)):
#    for j in range(len(data_venn3[i])):
#        print >> sys.stderr, 'venn3 data {col_i} , {col_j} : {val_ij}'.format(col_i=i, col_j=j, val_ij = data_venn3[i][j])




## Making the Venn diagram
plt.figure(figsize=(4,4))
v = venn3(subsets=count, set_labels = ('Introns', 'Heart', 'Others'))
v.get_patch_by_id('100').set_alpha(1.0)
v.get_patch_by_id('100').set_color('white')
v.get_label_by_id('100').set_text('Unknown')
plt.show()

But this just transposes the 2D array and prints it. What am I doing wrong?

1 answer

Be careful with list repetition like this:

count = [0, ] * len(op_lol)

or this:

count2 = [0, ] * len(op_lol)

Repetition copies references, not values: every slot of the new list points at the same object. For the integer 0 this is harmless, because ints are immutable and count[i] += 1 rebinds only slot i, so these two lines on their own won't corrupt your counts. It does matter when the repeated element is mutable, e.g. [[0]] * n, where every slot is the same inner list and changing one changes them all. In that case, build the list with a for loop or a list comprehension over range, or copy the inner object with copy.deepcopy().
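A small sketch illustrating the difference between repeating an immutable element and repeating a mutable one. The variable names are throwaway names chosen only for this demonstration:

```python
# Repeating an immutable int: += rebinds one slot, the others keep 0.
count = [0] * 3
count[1] += 1
print(count)  # [0, 1, 0]

# Repeating a mutable list: all three slots are the SAME inner list,
# so mutating it through one slot shows up in every slot.
shared = [[0]] * 3
shared[1][0] += 1
print(shared)  # [[1], [1], [1]]

# A list comprehension creates a fresh inner list per slot.
independent = [[0] for _ in range(3)]
independent[1][0] += 1
print(independent)  # [[0], [1], [0]]
```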

Also, you don't say where data_reduced_npa and data_reduced_transposed_npa come from or what they contain, so it's impossible to say for certain what is causing your output. For copying nested (mutable) structures, see copy.deepcopy():

https://docs.python.org/2/library/copy.html
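Since the question doesn't show how data_reduced_npa is built, one thing worth checking (an assumption about your input file, not something the question confirms): values read with split('\t') are strings such as '1' and '0', and a list of strings never compares equal to [1, 0, 0]. The sketch below converts the parsed fields to ints. The helper name rows_to_int_array and its col_indices parameter are hypothetical, and which three columns make up your array (including how 'others' is derived) is not specified in the question.

```python
import numpy as np


def rows_to_int_array(lines, col0_idx, col_indices):
    """Hypothetical helper: parse tab-separated lines into an int array.

    Each line is split on tabs, the fields before col0_idx are dropped
    (as in the question's code), and only the columns in col_indices
    (counted from col0_idx) are kept and converted to int.
    """
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")[col0_idx:]
        rows.append([int(fields[i]) for i in col_indices])
    return np.array(rows, dtype=int)


# Strings never equal ints, so a row of strings would never match a pattern.
print(["1", "0", "0"] == [1, 0, 0])  # False

sample = ["geneA\t1\t0\t1\n", "geneB\t0\t1\t1\n"]
print(rows_to_int_array(sample, 1, [0, 1, 2]).tolist())  # [[1, 0, 1], [0, 1, 1]]
```

With a real file you would pass the open file object after skipping the header line (for example with next(f)), since iterating over a file yields its lines.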
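For the counting itself, a Python loop over ~420k rows works but is slow; NumPy broadcasting can compare every row against every pattern in one step. A minimal sketch, assuming the data is already an integer array of shape (n_rows, 3); count_patterns is a hypothetical name, and the sample data is the 9-row array from the question:

```python
import numpy as np


def count_patterns(data, patterns):
    """Count how many rows of `data` equal each row of `patterns`.

    data:     (n_rows, k) array of 0/1 ints
    patterns: (n_patterns, k) list of lists, e.g. op_lol
    Returns one count per pattern, in the same order as `patterns`.
    """
    data = np.asarray(data)
    pats = np.asarray(patterns)
    # Broadcast to shape (n_rows, n_patterns, k) and compare element-wise;
    # a row matches a pattern only if all k entries agree.
    matches = (data[:, None, :] == pats[None, :, :]).all(axis=2)
    # Summing the boolean matches over rows gives one count per pattern.
    return matches.sum(axis=0).tolist()


op_lol = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
data = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0], [1, 1, 0],
                 [1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 0, 1]])
print(count_patterns(data, op_lol))  # [2, 0, 1, 2, 2, 0, 2]
```

The counts come back in op_lol order, which appears to match the (Abc, aBc, ABc, abC, AbC, aBC, ABC) order noted in the question's commented-out venn3 code, so the list could be passed as subsets= the way the question does. For 420k rows the temporary boolean array is about 420,000 × 7 × 3 bytes, roughly 9 MB.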