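File name: 俄国牛人写的开源爬虫xNet.zip (xNet, an open-source crawler library written by a Russian developer)
File size: 86KB
File format: ZIP
Updated: 2022-08-07 01:01:50
Open-source project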
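xNet is an open-source library written by a Russian developer. What makes it impressive is that it reimplements the HTTP protocol from the ground up. Why does that matter? Anyone who writes crawlers eventually hits a maddening problem: the HTTP request headers match the browser's exactly, yet the response still does not contain the expected data. With HttpWebRequest you can only debug as far as GetResponse; the underlying byte stream is out of reach. You therefore need a lower-level component that you can step into while debugging.

The xNet source is available at: https://github.com/X-rus/xNet

Quick start

Start with an example that reads the cnblogs home page. HttpWebRequest was covered in the previous post; here is the xNet version:

    using (var request = new xNet.HttpRequest())
    {
        var html = request.Get("http://www.cnblogs.com").ToString();
    }

Note that the standard HTTP headers, such as KeepAlive, Referer and UserAgent, are best set through properties. Extension headers, such as Upgrade-Insecure-Requests, can be set with the AddHeader method:

    using (var request = new xNet.HttpRequest())
    {
        request.AddHeader("Upgrade-Insecure-Requests", "1");
        var html = request.Get("http://www.cnblogs.com").ToString();
    }

For some headers, calling AddHeader and setting the property have the same effect. For example, each of the following sets the User-Agent:

    request.AddHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0");
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0";
    request.UserAgent = xNet.Http.FirefoxUserAgent();

Not every header can be set with AddHeader, however. Content-Type, which describes the type of the data sent in a POST, is one example: setting it with AddHeader throws an error. If you are not sure which headers can be set manually and which cannot, refer to the values of the xNet.HttpHeader enum:

    public enum HttpHeader
    {
        Accept = 0, AcceptCharset = 1, AcceptLanguage = 2, AcceptDatetime = 3,
        CacheControl = 4, ContentType = 5, Date = 6, Expect = 7,
        From = 8, IfMatch = 9, IfModifiedSince = 10, IfNoneMatch = 11,
        IfRange = 12, IfUnmodifiedSince = 13, MaxForwards = 14, Pragma = 15,
        Range = 16, Referer = 17, Upgrade = 18, UserAgent = 19,
        Via = 20, Warning = 21, DNT = 22, AccessControlAllowOrigin = 23,
        AcceptRanges = 24, Age = 25, Allow = 26, ContentEncoding = 27,
        ContentLanguage = 28, ContentLength = 29, ContentLocation = 30, ContentMD5 = 31,
        ContentDisposition = 32, ContentRange = 33, ETag = 34, Expires = 35,
        LastModified = 36, Link = 37, Location = 38, P3P = 39,
        Refresh = 40, RetryAfter = 41, Server = 42, TransferEncoding = 43,
    }

xNet also supports Socks4 and Socks5 proxies, and the benefits of proxy support for a crawler speak for themselves.

Tag: .NET crawler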
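The SOCKS proxy support mentioned above is not illustrated in the post. The following is a minimal sketch of how it might be used, based on the usage shown in the xNet README (the Proxy property and Socks5ProxyClient.Parse). It assumes the xNet assembly is referenced by the project; the proxy address 127.0.0.1:1080 is a hypothetical placeholder for a SOCKS5 proxy you control and is not taken from the post.

```csharp
using System;
using xNet;

class ProxyExample
{
    static void Main()
    {
        using (var request = new HttpRequest())
        {
            // Standard headers are set through properties, as recommended above.
            request.UserAgent = Http.FirefoxUserAgent();

            // Hypothetical local SOCKS5 proxy; replace with a real host:port.
            request.Proxy = Socks5ProxyClient.Parse("127.0.0.1:1080");

            // The request is now routed through the proxy.
            string html = request.Get("http://www.cnblogs.com").ToString();
            Console.WriteLine(html.Length);
        }
    }
}
```

If the proxy is unreachable, the Get call is expected to throw, so real crawler code would normally wrap it in a try/catch and rotate to another proxy.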
[File preview]:
xNet-master
----Resources.Designer.cs(32KB)
----LICENSE.txt(1KB)
----xNet.sln(952B)
----Properties()
--------AssemblyInfo.cs(2KB)
----README.md(1KB)
----xNet.csproj(5KB)
----Resources.resx(18KB)
----xNet()
--------~Internal()
--------~Proxy()
--------~Http()
--------Html.cs(29KB)
--------NetException.cs(2KB)
--------RequestParams.cs(1KB)
--------WinInet.cs(14KB)
----.gitignore(3KB)