How can I make the following Python program (code) more efficient?

Is there an efficient way to solve the following problem, assuming the data is large? I have solved it, but how can I improve the code to make it more efficient? Any suggestions?

Data:

movie_sub_themes = {
'Epic': ['Ben Hur', 'Gone With the Wind', 'Lawrence of Arabia'],
'Spy': ['James Bond', 'Salt', 'Mission: Impossible'],
'Superhero': ['The Dark Knight Trilogy', 'Hancock, Superman'],
'Gangster': ['Gangs of New York', 'City of God', 'Reservoir Dogs'],
'Fairy Tale': ['Maleficent', 'Into the Woods', 'Jack the Giant Killer'],
'Romantic':['Casablanca', 'The English Patient', 'A Walk to Remember'],
'Epic Fantasy': ['Lord of the Rings', 'Chronicles of Narnia', 'Beowulf']}

movie_themes = {
'Action': ['Epic', 'Spy', 'Superhero'],
'Crime' : ['Gangster'],
'Fantasy' : ['Fairy Tale', 'Epic Fantasy'],
'Romance' : ['Romantic']}

themes_keys = list(movie_themes.keys())
theme_movies_keys = movie_sub_themes.keys()

# Iterate over movie_themes.
# For each sub-theme of a theme, check whether it is a key of movie_sub_themes;
# if so, append that sub-theme's movie list to newdict[theme].
newdict = {}
for i in range(len(themes_keys)):
    a = []
    for j in range(len(movie_themes[themes_keys[i]])):
        if movie_themes[themes_keys[i]][j] in theme_movies_keys:
            a.append(movie_sub_themes[movie_themes[themes_keys[i]][j]])
    newdict[themes_keys[i]] = a

# newdict contains nested lists (one inner list per sub-theme).
# Flatten each nested list into a single list, drop duplicates,
# and store the result in theme_movies_data.
theme_movies_data = {}
for k, v in newdict.items():
    mylist_n = [j for i in v for j in i]
    theme_movies_data[k] = list(dict.fromkeys(mylist_n))

print(theme_movies_data)

Output:

{'Action': ['Gone With the Wind', 'Ben Hur','Hancock, Superman','Mission: Impossible','James Bond','Lawrence of Arabia','Salt','The Dark Knight Trilogy'],
 'Crime': ['City of God', 'Reservoir Dogs', 'Gangs of New York'],
 'Fantasy': ['Jack the Giant Killer','Beowulf','Into the Woods','Maleficent','Lord of the Rings','Chronicles of Narnia'],
 'Romance': ['The English Patient', 'A Walk to Remember', 'Casablanca']}

Apologies for not properly commenting the code.

I am more concerned about the running time.
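Since running time is the main concern, it may help to measure it on input that is actually large rather than on the seven sub-themes above. The following is a minimal sketch using the standard-library `timeit` module. `make_data` and `original_approach` are hypothetical helper names, and the sizes (100 themes, 20 sub-themes each, 50 movies per sub-theme) are arbitrary assumptions, not figures from the question. `original_approach` is the question's algorithm in Python 3 syntax, wrapped in a function.

```python
import random
import timeit


def make_data(n_themes=100, subs_per_theme=20, movies_per_sub=50, seed=0):
    """Build hypothetical synthetic data shaped like the question's
    movie_themes / movie_sub_themes dicts, but much larger."""
    rng = random.Random(seed)
    movie_sub_themes = {}
    movie_themes = {}
    for t in range(n_themes):
        subs = []
        for s in range(subs_per_theme):
            sub = f"sub_{t}_{s}"
            subs.append(sub)
            # Titles come from a shared pool of 10,000, so duplicates occur.
            movie_sub_themes[sub] = [f"movie_{rng.randrange(10_000)}"
                                     for _ in range(movies_per_sub)]
        movie_themes[f"theme_{t}"] = subs
    return movie_themes, movie_sub_themes


def original_approach(movie_themes, movie_sub_themes):
    """The question's algorithm (Python 3 syntax) as a function."""
    themes_keys = list(movie_themes.keys())
    theme_movies_keys = movie_sub_themes.keys()
    newdict = {}
    for i in range(len(themes_keys)):
        a = []
        for j in range(len(movie_themes[themes_keys[i]])):
            if movie_themes[themes_keys[i]][j] in theme_movies_keys:
                a.append(movie_sub_themes[movie_themes[themes_keys[i]][j]])
        newdict[themes_keys[i]] = a
    theme_movies_data = {}
    for k, v in newdict.items():
        mylist_n = [j for i in v for j in i]
        theme_movies_data[k] = list(dict.fromkeys(mylist_n))
    return theme_movies_data


if __name__ == "__main__":
    themes, subs = make_data()
    # Average wall-clock seconds per call over 10 calls.
    secs = timeit.timeit(lambda: original_approach(themes, subs), number=10) / 10
    print(f"{secs * 1e3:.2f} ms per call")
```

One detail that matters for Python 2, where the question's `iteritems` call comes from: there `dict.keys()` returns a list, so `x in theme_movies_keys` is a linear scan on every lookup. Testing membership against the dict itself (`x in movie_sub_themes`) is a hash lookup on both Python 2 and 3.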

Thank you.

2 answers

#1

You could use a relational database to store two tables, one of movies and their sub-theme and one relating sub-themes to movie themes. You could then use SQL to query the database, selecting a list of all movies and their associated movie themes.

This approach would be more efficient, as SQL commands tend to be compiled for speed of processing. The relational database model is very scalable, and so will work for very large datasets with minimal overhead.

For an example of creating and using a simple database in Python, see here. If you are not familiar with SQL operations, see here for a simple tutorial on the useful operations.
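A minimal sketch of the two-table idea using Python's built-in `sqlite3` module with an in-memory database. The table names `movie_sub_theme` and `sub_theme_theme`, the index, and the helpers `build_db` / `movies_by_theme` are hypothetical choices for illustration. A single `JOIN` with `DISTINCT` replaces both the nested loops and the de-duplication step of the question's code.

```python
import sqlite3

movie_sub_themes = {
    'Epic': ['Ben Hur', 'Gone With the Wind', 'Lawrence of Arabia'],
    'Spy': ['James Bond', 'Salt', 'Mission: Impossible'],
    'Superhero': ['The Dark Knight Trilogy', 'Hancock, Superman'],
    'Gangster': ['Gangs of New York', 'City of God', 'Reservoir Dogs'],
    'Fairy Tale': ['Maleficent', 'Into the Woods', 'Jack the Giant Killer'],
    'Romantic': ['Casablanca', 'The English Patient', 'A Walk to Remember'],
    'Epic Fantasy': ['Lord of the Rings', 'Chronicles of Narnia', 'Beowulf']}

movie_themes = {
    'Action': ['Epic', 'Spy', 'Superhero'],
    'Crime': ['Gangster'],
    'Fantasy': ['Fairy Tale', 'Epic Fantasy'],
    'Romance': ['Romantic']}


def build_db(movie_themes, movie_sub_themes):
    """Load both mappings into two tables of an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE movie_sub_theme (movie TEXT, sub_theme TEXT);
        CREATE TABLE sub_theme_theme (sub_theme TEXT, theme TEXT);
        -- Index the join column so lookups do not scan the whole table.
        CREATE INDEX idx_movie_sub_theme ON movie_sub_theme(sub_theme);
    """)
    conn.executemany(
        "INSERT INTO movie_sub_theme VALUES (?, ?)",
        [(m, s) for s, movies in movie_sub_themes.items() for m in movies])
    conn.executemany(
        "INSERT INTO sub_theme_theme VALUES (?, ?)",
        [(s, t) for t, subs in movie_themes.items() for s in subs])
    return conn


def movies_by_theme(conn):
    """Return {theme: [distinct movies, sorted by title]} via one JOIN."""
    rows = conn.execute("""
        SELECT DISTINCT st.theme, ms.movie
        FROM sub_theme_theme AS st
        JOIN movie_sub_theme AS ms ON ms.sub_theme = st.sub_theme
        ORDER BY st.theme, ms.movie
    """)
    result = {}
    for theme, movie in rows:
        result.setdefault(theme, []).append(movie)
    return result


if __name__ == "__main__":
    conn = build_db(movie_themes, movie_sub_themes)
    print(movies_by_theme(conn))
    conn.close()
```

Because this uses an inner join, a theme whose sub-themes all lack movies would be missing from the result rather than mapped to an empty list as in the question's code; a `LEFT JOIN` would keep it. Whether this beats a pure-Python dict pass depends on the data size and on whether the data already lives in a database.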

#2

Here's my solution (using defaultdict):

movie_sub_themes = {
'Epic': ['Ben Hur', 'Gone With the Wind', 'Lawrence of Arabia'],
'Spy': ['James Bond', 'Salt', 'Mission: Impossible'],
'Superhero': ['The Dark Knight Trilogy', 'Hancock, Superman'],
'Gangster': ['Gangs of New York', 'City of God', 'Reservoir Dogs'],
'Fairy Tale': ['Maleficent', 'Into the Woods', 'Jack the Giant Killer'],
'Romantic':['Casablanca', 'The English Patient', 'A Walk to Remember'],
'Epic Fantasy': ['Lord of the Rings', 'Chronicles of Narnia', 'Beowulf']}

movie_themes = {
'Action': ['Epic', 'Spy', 'Superhero'],
'Crime' : ['Gangster'],
'Fantasy' : ['Fairy Tale', 'Epic Fantasy'],
'Romance' : ['Romantic']}

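from collections import defaultdict

# Map each theme to the concatenation of its sub-themes' movie lists.
# .get(sub_theme, []) skips sub-themes that have no entry in movie_sub_themes.
newdict = defaultdict(list)

for theme, sub_themes_list in movie_themes.items():
    for sub_theme in sub_themes_list:
        newdict[theme] += movie_sub_themes.get(sub_theme, [])

dict(newdict)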

>> {'Action': ['Ben Hur',
  'Gone With the Wind',
  'Lawrence of Arabia',
  'James Bond',
  'Salt',
  'Mission: Impossible',
  'The Dark Knight Trilogy',
  'Hancock, Superman'],
 'Crime': ['Gangs of New York', 'City of God', 'Reservoir Dogs'],
 'Fantasy': ['Maleficent',
  'Into the Woods',
  'Jack the Giant Killer',
  'Lord of the Rings',
  'Chronicles of Narnia',
  'Beowulf'],
 'Romance': ['Casablanca', 'The English Patient', 'A Walk to Remember']}

Timings: 4.84 µs for this version vs 14.6 µs for the original code.
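One behavioral difference worth noting: the question's code removes duplicate titles with `dict.fromkeys`, while the `defaultdict` version above simply concatenates lists, so a movie listed under two sub-themes of the same theme would appear twice. Below is a minimal sketch that keeps a single pass over `movie_themes` and also de-duplicates, using `itertools.chain.from_iterable`. The helper name `merge_themes` and the small sample data (including 'Gladiator', 'Spartacus', and the 'Missing' sub-theme) are hypothetical, chosen only to show a duplicate and a missing key.

```python
from itertools import chain

# Hypothetical sample data: 'Gladiator' appears under two sub-themes,
# and 'Missing' has no entry in movie_sub_themes.
movie_sub_themes = {
    'Epic': ['Ben Hur', 'Gladiator'],
    'Historical': ['Gladiator', 'Spartacus']}

movie_themes = {
    'Action': ['Epic', 'Historical'],
    'Drama': ['Historical', 'Missing']}


def merge_themes(movie_themes, movie_sub_themes):
    """Flatten each theme's sub-theme lists into one list of unique titles.

    dict.fromkeys keeps the first occurrence of each title, and dicts
    preserve insertion order on Python 3.7+, so first-seen order is kept.
    Unknown sub-themes contribute an empty tuple and are skipped.
    """
    return {
        theme: list(dict.fromkeys(chain.from_iterable(
            movie_sub_themes.get(sub, ()) for sub in subs)))
        for theme, subs in movie_themes.items()
    }


result = merge_themes(movie_themes, movie_sub_themes)
print(result)
# {'Action': ['Ben Hur', 'Gladiator', 'Spartacus'], 'Drama': ['Gladiator', 'Spartacus']}
```

Each title is touched a constant number of times, so the work grows linearly with the total number of (theme, movie) pairs, the same order as the `defaultdict` version plus one hash insert per title.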
