Slow-running script. How can I speed it up?

Time: 2021-07-07 16:07:22

How can I speed this up? It's taking about 5 minutes to make one file. It runs correctly, but I have a little more than 100,000 files to make.

Is my implementation of awk or sed slowing it down? I could break it down into several smaller loops and run it on multiple processors but one script is much easier.

#!/bin/zsh
#1000 configs per file

alpha=( a b c d e f g h i j k l m n o p q r s t u v w x y z )
m=1000 # number of configs per file
t=1 #file number
for (( i=1; i<=4; i++ )); do
  for (( j=i; j<=26; j++ )); do
    input="arc"${alpha[$i]}${alpha[$j]}
    n=1 #line number
    #length=`sed -n ${n}p $input| awk '{printf("%d",$1)}'`
    #(( length= $length + 1 ))
    length=644

    for (( k=1; k<=$m; k++ )); do
      echo "$hmbi" >> ~/Glycine_Tinker/configs/config$t.in
      echo "jobtype = energy" >> ~/Glycine_Tinker/configs/config$t.in
      echo "analyze_only = false" >> ~/Glycine_Tinker/configs/config$t.in
      echo "qm_path = qm_$t" >> ~/Glycine_Tinker/configs/config$t.in
      echo "mm_path = aiff_$t" >> ~/Glycine_Tinker/configs/config$t.in
      cat head.in >> ~/Glycine_Tinker/configs/config$t.in
      water=4
      echo $k
      for (( l=1; l<=$length; l++ )); do
        natom=`sed -n ${n}p $input| awk '{printf("%d",$1)}'`
        number=`sed -n ${n}p $input| awk '{printf("%d",$6)}'`
        if [[ $natom -gt 10 && $number -gt 0 ]]; then
          symbol=`sed -n ${n}p $input| awk '{printf("%s",$2)}'`
          x=`sed -n ${n}p $input| awk '{printf("%.10f",$3)}'`
          y=`sed -n ${n}p $input| awk '{printf("%.10f",$4)}'`
          z=`sed -n ${n}p $input| awk '{printf("%.10f",$5)}'`

          if [[ $water -eq 4 ]]; then
            echo "--" >> ~/Glycine_Tinker/configs/config$t.in
            echo "0 1 0.4638" >> ~/Glycine_Tinker/configs/config$t.in
            water=1
          fi

          echo "$symbol  $x  $y  $z" >> ~/Glycine_Tinker/configs/config$t.in
          (( water= $water + 1 ))
        fi
        (( n= $n + 1 ))
      done
      cat tail.in >> ~/Glycine_Tinker/configs/config$t.in
      (( t= $t + 1 ))
    done
  done
done

2 solutions

#1


One thing that is going to be killing you here is the sheer number of processes being created. Especially when they are doing the exact same thing.

Consider doing the sed -n ${n}p $input once per loop iteration.

Also consider doing the equivalent of awk as a shell array assignment, then accessing the individual elements.

With these two changes you should be able to get the 12 or so processes per iteration (six sed | awk pipelines, each in its own backquote subshell) down to a single sed invocation in one backquote.
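
A minimal sketch of that idea, not the asker's actual code: the sample file and its contents are made up, assuming only the column layout the script implies (index, symbol, x, y, z, then a sixth numeric field). It uses `read` to split the line into named variables, a variant of the array assignment that behaves the same in bash and zsh, and formats the coordinates with the shell's builtin `printf` instead of awk.

```shell
# Hypothetical sample input in the assumed column layout:
# index, symbol, x, y, z, sixth numeric field.
input=$(mktemp)
cat > "$input" <<'EOF'
11 O 0.1234 1.5 -2.25 3
12 H 0.5 0.25 -1.0 3
EOF

n=1
# One sed call per iteration instead of six sed | awk pipelines.
line=$(sed -n "${n}p" "$input")

# Split the line on whitespace into named variables (no extra process).
read -r natom symbol x y z number <<EOF
$line
EOF

if [ "$natom" -gt 10 ] && [ "$number" -gt 0 ]; then
  # The builtin printf does the %.10f formatting that awk did before.
  printf '%s  %.10f  %.10f  %.10f\n' "$symbol" "$x" "$y" "$z"
fi
rm -f "$input"
```

This still starts one sed per line, so it only removes part of the overhead; reading the file once is the larger win.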

#2


Obviously, Ed's advice is far preferable, but if you don't want to follow that, I had a couple of thoughts...

Thought 1

Rather than run echo 5 times and cat head.in onto the Glycine file, each of which causes the file to be opened, seeked (or sought maybe) to the end, and appended, you could do that in one go like this:

# Instead of 
hmbi=3
echo "$hmbi"            >> ~/Glycine_thing
echo "jobtype = energy" >> ~/Glycine_thing
echo "somethingelse"    >> ~/Glycine_thing
echo ...                >> ~/Glycine_thing          
echo ...                >> ~/Glycine_thing
cat  ...                >> ~/Glycine_thing

# Try this
{
  echo "$hmbi"
  echo "jobtype = energy"
  echo "somethingelse"
  echo
  echo
  cat head.in
} >> ~/Glycine_thing

# Or, better still, this
echo -e "$hmbi\njobtype = energy\nsomethingelse" >> ~/Glycine_thing

# Or, use a here-document, as suggested by @mklement0
cat <<EOF >> ~/Glycine_thing
$hmbi
jobtype = energy
next thing
EOF

Thought 2

Rather than invoke sed and awk 5 times to find 5 parameters, just let awk do what sed was doing, and also do all 5 things in one go:

read symbol x y z < <(awk '...{printf "%s %.10f %.10f %.10f\n", $2, $3, $4, $5}' "$input")
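
Taking the same idea one step further, a single awk process can read a whole frame and emit every atom line, so the inner per-line loop disappears. This is only a sketch under assumptions: the sample data is invented, `start` and `length` are illustrative names for the script's running line counter and its 644-line frame size, and the filter and header lines are copied from the question's script.

```shell
# Hypothetical sample frame: index, symbol, x, y, z, sixth numeric field.
input=$(mktemp)
cat > "$input" <<'EOF'
5 C 0.0 0.0 0.0 1
11 O 0.1234 1.5 -2.25 3
12 H 0.5 0.25 -1.0 3
13 H 1.0 2.0 3.0 3
14 O 4.0 5.0 6.0 3
EOF

start=1    # first line of the frame (the script's running counter n)
length=5   # lines per frame (644 in the original script)

# One awk process reads the frame once and prints every matching atom,
# inserting the two header lines before each group of three atoms.
atoms=$(awk -v start="$start" -v len="$length" '
  BEGIN { water = 4 }
  NR < start        { next }
  NR >= start + len { exit }
  $1 > 10 && $6 > 0 {
    if (water == 4) { print "--"; print "0 1 0.4638"; water = 1 }
    printf "%s  %.10f  %.10f  %.10f\n", $2, $3, $4, $5
    water++
  }' "$input")

printf '%s\n' "$atoms"
rm -f "$input"
```

In the real script the output would be appended to the config file inside the same grouped redirection as the header and tail, rather than captured in a variable.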
