Script to download a web page

Date: 2021-11-07 07:52:00

I made a web server to serve my page locally, because it is located in a place with a poor connection. What I want to do is download the page content and replace the old copy, so I wrote this script to run in the background. I am not sure it will work 24/7 (the 2m is just for testing; I want it to wait 6-12 hours). What do you think about this script? Is it insecure, or is it enough for what I am doing? Thanks.

#!/bin/bash
a=1;
while [ $a -eq 1 ]
do
echo "Starting..."
sudo wget http://www.example.com/web.zip  --output-document=/var/www/content.zip
sudo unzip -o /var/www/content.zip -d /var/www/
sleep 2m
done
exit

UPDATE: This is the code I use now (it is just a prototype, and I intend to stop using sudo):

#!/bin/bash
a=1;
echo "Start"
while [ $a -eq 1 ]
do
    echo "Searching flag.txt"
    if [ -e flag.txt ]; then
        echo "Flag found, and erasing it"
        sudo rm flag.txt

        if [ -e /var/www/content.zip ]; then
            echo "Erasing old content file"
            sudo rm /var/www/content.zip
        fi
        echo "Downloading new content"
        sudo wget ftp://user:password@xx.xx.xx.xx/content/newcontent.zip --output-document=/var/www/content.zip
        sudo unzip -o /var/www/content.zip -d /var/www/
        echo "Erasing flag.txt from ftp"
        sudo ftp -nv < erase.txt
        sleep 5s
    else
        echo "Downloading flag.txt"
        sudo wget ftp://user:password@xx.xx.xx.xx/content/flag.txt
        sleep 5s
    fi
    echo "Waiting..."
    sleep 20s

done
exit 0

erase.txt

open xx.xx.xx.xx
user user password
cd content
delete flag.txt
bye

2 answers

#1

Simply unzipping the new version of your content over the old one may not be the best solution. What if you remove a file from your site? The local copy will still have it. Also, with a zip-based solution, you download EVERY file on each update, not just the files that have changed.
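A minimal local sketch of that first problem, assuming GNU coreutils and using cp as a stand-in for unzip -o (the directory and file names are made up for the demo): overlaying a new version onto the live directory keeps a file that was deleted upstream, while building a fresh directory from the new version does not.

```shell
#!/usr/bin/env bash
# Demo: copying a new version over the live directory keeps deleted files.
# cp stands in for "unzip -o"; all paths are temporary, made-up examples.
set -euo pipefail

work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT

# Old version of the site: two pages.
mkdir -p "$work/old" "$work/new"
echo "home"  > "$work/old/index.html"
echo "about" > "$work/old/about.html"

# New version: about.html was deleted upstream, index.html changed.
echo "home v2" > "$work/new/index.html"

# Approach 1: start from the old copy and overlay the new files onto it.
cp -r "$work/old" "$work/live-overlay"
cp -r "$work/new/." "$work/live-overlay/"

# Approach 2: build a fresh directory from the new version only.
cp -r "$work/new" "$work/live-fresh"

ls "$work/live-overlay"   # about.html is still listed (stale)
ls "$work/live-fresh"     # only index.html
```

The fresh-directory idea is what the script below builds on: every update lands in a new directory, so anything removed upstream simply never appears in it.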

I recommend you use rsync instead, to synchronize your site content.

If you set your local document root to something like /var/www/mysite/ (here a symbolic link to the most recent copy of the site), an alternative script might look something like this:

#!/usr/bin/env bash

logtag="$(basename "$0")[$$]"

logger -t "$logtag" "start"

# Build an array of options for rsync
# (each option is one array element, with its value after "=")
#
declare -a ropts
ropts=(-a)
ropts+=(--no-perms --no-owner --no-group)
ropts+=(--omit-dir-times)
ropts+=(--exclude='._*')
ropts+=(--exclude='.DS_Store')

# Determine previous version: /var/www/mysite points at the latest
# snapshot, so hand that snapshot to --link-dest
#
if [ -L /var/www/mysite ]; then
    linkdest="$(readlink -f /var/www/mysite)"
    ropts+=(--link-dest="$linkdest")
fi

# Name for the new snapshot directory (no colons in the path)
now="$(date '+%Y%m%d-%H%M%S')"

# Only refresh our copy if flag.txt exists
#
statuscode=$(curl --silent --output /dev/null --write-out "%{http_code}" http://www.example.com/flag.txt)
if [ "$statuscode" != 200 ]; then
    logger -t "$logtag" "no update required"
    exit 0
fi

if ! rsync "${ropts[@]}" user@remoteserver:/var/www/mysite/ /var/www/"$now"; then
    logger -t "$logtag" "rsync failed ($now)"
    exit 1
fi

# Everything is fine, so update the symbolic link and remove the flag.
# ln -s takes the TARGET first, then the LINK_NAME. On the first run,
# /var/www/mysite must not exist as a real directory, or ln creates the
# link inside it. Adjust the remote path below to wherever flag.txt lives.
#
ln -sfn /var/www/"$now" /var/www/mysite
ssh user@remoteserver rm -f /var/www/flag.txt

logger -t "$logtag" "done"
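The final step of the script can be tried out locally. This is a sketch under assumptions: publish_snapshot is a hypothetical helper, cp -r stands in for rsync, and the snapshot names are passed in rather than coming from date. It shows the ln -s TARGET LINK_NAME order and why -n matters once mysite is already a symlink.

```shell
#!/usr/bin/env bash
# Local sketch of the "copy into a new snapshot, then move the link" step.
# publish_snapshot is a hypothetical helper; cp -r stands in for rsync.
set -euo pipefail

# publish_snapshot SRC BASE NAME
#   Copies SRC to BASE/NAME, then points the BASE/mysite symlink at it.
publish_snapshot() {
    local src="$1" base="$2" name="$3"
    cp -r "$src" "$base/$name"
    # Argument order: ln -s TARGET LINK_NAME.
    # -f replaces an existing link; -n stops ln from following an existing
    # symlink-to-directory and creating the new link inside that directory.
    ln -sfn "$base/$name" "$base/mysite"
}

base="$(mktemp -d)"
trap 'rm -rf "$base"' EXIT

mkdir -p "$base/src"
echo "v1" > "$base/src/index.html"
publish_snapshot "$base/src" "$base" "20211107-075200"

echo "v2" > "$base/src/index.html"
publish_snapshot "$base/src" "$base" "20211107-080000"

readlink "$base/mysite"        # path of the newer snapshot
cat "$base/mysite/index.html"  # prints v2
```

The older snapshot directory stays on disk untouched; only the mysite link moves, which is also what lets the next rsync run use it via --link-dest.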

This script uses a few external tools that you may need to install if they're not already on your system:

  • rsync, which you've already read about,
  • curl, which could be replaced with wget, but I prefer curl,
  • logger, which is probably installed on your system along with syslog or rsyslog, or may be part of the util-linux package, depending on your Linux distro.

rsync provides a lot of useful functionality. In particular:

  • it tries to copy only what has changed, so that you don't waste bandwidth on files that are the same,
  • the --link-dest option lets you point at a previous copy of the directory, so that files that have not changed are hard-linked to it instead of copied again. That way you can keep multiple copies of your directory while storing only one copy of each unchanged file.
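To make the hard-link idea concrete, here is a small sketch in which plain ln stands in for what rsync --link-dest does with an unchanged file. It assumes GNU stat for the -c format (the script above already relies on GNU tools); the snapshot and file names are invented for the demo.

```shell
#!/usr/bin/env bash
# Sketch of the idea behind --link-dest: an unchanged file in the new
# snapshot is a hard link to the old one, so both names share one inode.
# Plain ln stands in for rsync; names are made up for the demo.
set -euo pipefail

dir="$(mktemp -d)"
trap 'rm -rf "$dir"' EXIT

mkdir "$dir/snap1" "$dir/snap2"
echo "unchanged" > "$dir/snap1/logo.txt"
echo "old"       > "$dir/snap1/index.html"

# snap2: the unchanged file is hard-linked, the changed file is new data.
ln "$dir/snap1/logo.txt" "$dir/snap2/logo.txt"
echo "new" > "$dir/snap2/index.html"

# Columns: inode, link count, name
stat -c '%i %h %n' "$dir"/snap*/logo.txt    # same inode, link count 2
stat -c '%i %h %n' "$dir"/snap*/index.html  # different inodes, count 1
```

Because both names refer to the same inode, deleting snap1 later would not remove logo.txt's data while snap2 still links to it.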

To make this work, for both the rsync part and the ssh part, you will need to set up SSH keys that let you connect without a password. That's not hard, but if you don't know how already, it's the topic of a different question, or a simple search with your favourite search engine.

You can run this from a crontab every 5 minutes:

*/5 * * * * /path/to/thisscript
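One possible refinement: over a slow connection an rsync run could take longer than 5 minutes, and cron would then start a second copy while the first is still running. The flock utility from util-linux can skip a run while the previous one still holds a lock; the lock file path below is a hypothetical choice:

```
*/5 * * * * flock -n /tmp/mysite-sync.lock /path/to/thisscript
```

With -n, flock fails immediately instead of waiting when the lock is already held, so an overlapping run is simply skipped.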

If you want to run it more frequently, note that the only traffic used by a check that does not involve an update is an HTTP GET of the flag.txt file.

#2

I would suggest setting up a cron job; this is much more reliable than a script with huge sleeps.

Brief instructions:

If you have write permissions for /var/www/, simply put the download in your personal crontab. Run crontab -e, paste this line, then save and exit the editor:

17 4,16 * * * wget http://www.example.com/web.zip --output-document=/var/www/content.zip && unzip -o /var/www/content.zip -d /var/www/

Or you can run the download from the system crontab. Create the file /etc/cron.d/download-my-site and put this content in it:

17 4,16 * * * <USERNAME> wget http://www.example.com/web.zip --output-document=/var/www/content.zip && unzip -o /var/www/content.zip -d /var/www/

Replace <USERNAME> with a login that has suitable permissions for /var/www.

Or you can put all the necessary commands into a single shell script like this:

#!/bin/sh
# Stop at the first failing command, so a failed download is never unzipped
set -e
wget http://www.example.com/web.zip --output-document=/var/www/content.zip
unzip -o /var/www/content.zip -d /var/www/

and invoke it from crontab:

17 4,16 * * * /path/to/my/downloading/script.sh

This task will run twice a day: at 4:17 and 16:17. You can set another schedule if you'd like.
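For the 6-12 hour interval mentioned in the question, a step value in the hour field is one way to write it. For example, this entry would run the same script every 6 hours, at 00:17, 06:17, 12:17 and 18:17:

```
17 */6 * * * /path/to/my/downloading/script.sh
```

Using */12 instead of */6 would run it at 00:17 and 12:17.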

More on cron jobs, crontabs etc:
