How can I check which files are in the download section of a website using a C# Windows app?

Time: 2022-08-26 22:00:47

We produce a device that is used in the safety industry, and I have written the utility software that changes its user settings. The utility has an option to update the internal firmware, and I would like to know the best way to access the download section of our company website and check whether a new firmware version is available.

The file names will be similar to "Application Firmware VX_XX.hex".

I can download a specific file using

WebClient myClient = new WebClient();
myClient.DownloadFile(strDownloadAddress, FileName);

But I would rather not loop through many possible file names just to see whether one exists.

I have tried some suggestions, which were mainly file system commands that don't seem to work over HTTP.

I have no experience programming web apps or services, and Global IT have denied me access to the website to add services etc.

1 solution

#1



I'm just going to throw this out there as a pretty crude solution to your immediate issue, which I understand to be a need to retrieve a list of available update files from a web page. Hopefully it helps a bit.

As I don't have access to your company's updates page, I'm going to use a page on Project Gutenberg which lists Children's Anthologies and grab all the links from that instead.

The first thing to do is to grab the source code for the page into a string I'm calling source.

WebClient client = new WebClient();
String source = client.DownloadString(@"http://www.gutenberg.org/wiki/Children%27s_Anthologies_(Bookshelf)");

Then everything after that is just the same as programming for the desktop. source is just a piece of text to be parsed. In case you are unfamiliar, links in HTML are written <a href="linkgoeshere">textgoeshere</a> and because we're looking for that specific pattern we can just rip all the links out with Regex.

Regex regex = new Regex("href=\"(http://[^\"]+)\"", RegexOptions.IgnoreCase | RegexOptions.Multiline );
var matches = regex.Matches(source);
foreach (Match match in matches)
{
    Console.WriteLine(match.Groups[1].Value);
}

That code will output every absolute http:// link on the page that is written in double quotes. With a bit of luck, the files your update page links to will be direct links ending in .hex, which makes your Regex easier to target. You can then sort them as required and use the latest link to grab the file.
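
To make the "sort them and use the latest" step concrete, here is a minimal sketch. It assumes the links end in V<major>_<minor>.hex, which is my reading of the "VX_XX" pattern in the question. The class name FirmwareVersions, the method FindLatest and the example.com URLs are made up for illustration, so adjust them to whatever your real download page contains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FirmwareVersions
{
    // Assumed naming scheme: "...V<major>_<minor>.hex",
    // e.g. "Application%20Firmware%20V1_07.hex" (hypothetical).
    private static readonly Regex VersionPattern =
        new Regex(@"V(\d+)_(\d+)\.hex$", RegexOptions.IgnoreCase);

    // Returns the link with the highest version number,
    // or null if none of the links matches the assumed pattern.
    public static string FindLatest(IEnumerable<string> links)
    {
        return links
            .Select(link => new { Link = link, Match = VersionPattern.Match(link) })
            .Where(x => x.Match.Success)
            // Compare the parts as numbers so that V1_10 ranks above V1_9.
            .OrderByDescending(x => int.Parse(x.Match.Groups[1].Value))
            .ThenByDescending(x => int.Parse(x.Match.Groups[2].Value))
            .Select(x => x.Link)
            .FirstOrDefault();
    }
}

// Usage (hypothetical URLs):
//   string latest = FirmwareVersions.FindLatest(new[] {
//       "http://example.com/fw/Application%20Firmware%20V1_9.hex",
//       "http://example.com/fw/Application%20Firmware%20V1_10.hex" });
//   // latest is the V1_10 link; pass it to WebClient.DownloadFile as before.
```

Parsing the two numbers rather than sorting the strings matters here, because a plain text sort would put V1_9 after V1_10.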

EDIT: Incidentally, some links are written as <a href='linkgoeshere'>textgoeshere</a>, with a ' rather than a ". Sometimes there is also more info crammed in between the angle brackets, but your URL should always come directly after the href=["'] part.
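
If you want a pattern that copes with both quote styles and only keeps the .hex links, something along these lines might do. It is a rough sketch rather than a proper HTML parser. The class name HexLinkExtractor and the method Extract are made up for illustration, and it assumes the URL itself contains neither kind of quote character.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HexLinkExtractor
{
    // href= followed by " or ', then a URL ending in .hex,
    // then the same quote character again (\1 refers back to the opening quote).
    private static readonly Regex HexLink = new Regex(
        @"href\s*=\s*([""'])(?<url>[^""']+?\.hex)\1",
        RegexOptions.IgnoreCase);

    // Returns every .hex link found in the page source, in document order.
    public static List<string> Extract(string html)
    {
        var urls = new List<string>();
        foreach (Match match in HexLink.Matches(html))
        {
            urls.Add(match.Groups["url"].Value);
        }
        return urls;
    }
}
```

Unlike the pattern above, this one does not insist on http://, so it also picks up relative links such as downloads/file.hex. Those would need to be combined with the page address before you can download them.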
