I'm doing some crawling with Python and would like to be able to identify (however imperfectly) the Flash I come across: is it a video, an ad, a game, or something else?
I assume I would have to decompile the SWF, which seems doable. But what sort of processing would I do with the decompiled ActionScript to figure out what its purpose is?
Edit: any better ideas would be most welcome too.
3 solutions
#1
I think your best bet would be to check the context where you see the SWF file.
Usually they're embedded within web pages, so if, for example, that page has 100 occurrences of the word "game", then it might be a game.
Detecting an ad might be trickier, but I think checking the domain name where the SWF is hosted might do the trick. The HTML tags around the SWF will also be of great use.
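A rough sketch of that idea in Python, in case it helps. The keyword lists, the ad host list, and the function name `guess_from_context` are all made up for illustration; you would tune them for your own crawl, and a real version would probably weight counts rather than just take the maximum.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical keyword lists and ad hosts (assumptions, not a vetted dataset).
KEYWORDS = {
    "game": ("game", "games", "play", "score"),
    "video": ("video", "watch", "episode", "trailer"),
}
AD_HOSTS = ("doubleclick.net", "ads.example.com")


def guess_from_context(page_text, swf_url):
    """Guess what a SWF is from the page text and the host serving it."""
    host = urlparse(swf_url).hostname or ""
    # A SWF served from a known ad host (or a subdomain of one) is likely an ad.
    if any(host == h or host.endswith("." + h) for h in AD_HOSTS):
        return "ad"

    # Otherwise count how often each category's keywords appear on the page.
    words = Counter(re.findall(r"[a-z]+", page_text.lower()))
    scores = {
        label: sum(words[k] for k in keywords)
        for label, keywords in KEYWORDS.items()
    }
    label, best = max(scores.items(), key=lambda item: item[1])
    return label if best > 0 else "unknown"
```

You would call it with the visible text of the embedding page and the absolute URL of the SWF, e.g. `guess_from_context(text, "http://example.com/files/x.swf")`.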
#2
It might help to look at the arguments passed to the Flash movie. If there's a reference to an FLV file, then there's a good chance the SWF is being used to play a video.
The path to the SWF might help too. If it's under, say, an /ads directory, then it's probably just a banner ad. Or if it's under /games, then it's probably a game.
Other than using heuristics like these, there's probably not much you can do. SWFs can be used for a lot of different things, and there's really nothing in the SWF itself that would tell you what "type" it is.
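Something like the following could implement both checks. It's only a sketch: `guess_from_embed` is a name I made up, it assumes you have already pulled the SWF URL and the `flashvars` string out of the embed markup, and the directory names are just the ones from the examples above.

```python
from urllib.parse import urlparse, parse_qsl


def guess_from_embed(swf_url, flashvars=""):
    """Guess a SWF's role from its URL and the arguments passed to it."""
    parsed = urlparse(swf_url)

    # Arguments can arrive in the SWF URL's query string or in flashvars.
    params = parse_qsl(parsed.query) + parse_qsl(flashvars)
    if any(".flv" in value.lower() for _, value in params):
        return "video"

    # Fall back to the directory names in the SWF's path.
    segments = [s.lower() for s in parsed.path.split("/") if s]
    if "ads" in segments or "ad" in segments:
        return "ad"
    if "games" in segments or "game" in segments:
        return "game"
    return "unknown"
```

For example, `guess_from_embed("http://example.com/player.swf", "file=clip.flv")` falls into the video branch because one of the argument values mentions an FLV file.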
#3
Tough one. I guess you should try to narrow the scope of what you're looking for. As you said, SWFs can be ads, games, or video players; they can also contain experimental art, who knows. Once you know exactly what you're after, it should be easier to figure out how to look for that kind of data.
I think it would be easier to get started with commercial websites. Those need promotion, so they might have promotional RIAs set up with a little bit of SEO in mind. Look for things like swfobject, swfaddress and tracking stuff (Omniture and who knows what else). They should have keywords in the embedding HTML.
As far as I know, Google and Yahoo are working with Adobe to make SWFs indexable. A custom Flash Player used for Flash indexing is mentioned in the Flash Internals presentation from Adobe MAX. Hope it helps.
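If you go the route of scanning the embedding HTML, a minimal sketch with the standard library's `html.parser` could look like this. It assumes the libraries show up by name in a `<script src>` attribute, which won't always hold (files get renamed or bundled), and `EmbedHints` / `collect_hints` are names invented for the example.

```python
from html.parser import HTMLParser

# Substrings to look for in script URLs (assumed naming, not guaranteed).
MARKERS = ("swfobject", "swfaddress", "omniture")


class EmbedHints(HTMLParser):
    """Collects Flash-related script markers and meta keywords from a page."""

    def __init__(self):
        super().__init__()
        self.markers = set()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script":
            src = (attrs.get("src") or "").lower()
            self.markers.update(m for m in MARKERS if m in src)
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]


def collect_hints(html):
    """Return (sorted marker names, meta keywords) found in the HTML."""
    parser = EmbedHints()
    parser.feed(html)
    parser.close()
    return sorted(parser.markers), parser.keywords
```

The meta keywords it returns can then be fed into whatever keyword matching you already use for the page text.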