How to Analyze and Resolve ANRs

Date: 2025-01-25 14:24:05

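General steps for analyzing an ANR

1. First, find the approximate time at which the process hit the ANR, for example by searching the log for the string "anr in".

2. Using the name of the process that hit the ANR, find its file in the anr folder. From that file, first decide whether the ANR comes from the app itself or from the system. If it is an app problem, fix it by following the corresponding call stack.

3. If it is not an app problem, determine the ANR type. For a key dispatch timeout, go back 5 seconds from the exact time of the ANR and check what the process was doing at that moment (the exact time can be found by searching for "anr").

4. If the trace shows nothing obviously abnormal, consider the following:

a. Whether I/O or database work pushed CPU usage so high that other app processes could not get a CPU time slice
b. Whether low memory caused the ANR (under low memory you can see processes being killed, reported as "<process> died")
c. Whether mishandled interaction with the input method prevented a call from returning
d. Whether waiting on a lock, or a deadlock, caused the ANR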
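Steps 1 and 3 can be sketched in plain Java. This is only an illustration: the class name AnrLogScanner and its methods are made up for this example, and it assumes logcat lines that start with a date field followed by an HH:mm:ss.SSS time field.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnrLogScanner {
    // Assumed line shape: date field, HH:mm:ss.SSS time field, then "ANR in" later on the line.
    private static final Pattern ANR_LINE = Pattern.compile(
            "^\\s*\\S+\\s+(\\d{2}:\\d{2}:\\d{2}\\.\\d{3})\\s.*ANR in",
            Pattern.CASE_INSENSITIVE);

    /** Returns the time of day of every log line that contains "ANR in". */
    static List<LocalTime> findAnrTimes(List<String> logLines) {
        List<LocalTime> times = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = ANR_LINE.matcher(line);
            if (m.find()) {
                times.add(LocalTime.parse(m.group(1)));
            }
        }
        return times;
    }

    /** Start of the 5-second window to inspect before a key dispatch timeout. */
    static LocalTime windowStart(LocalTime anrTime) {
        // Note: LocalTime wraps around midnight; good enough for a sketch.
        return anrTime.minusSeconds(5);
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "04-01 13:12:14.123 I/Process(  220): Sending signal. PID: 21404 SIG: 3",
                "04-01 13:12:15.872 E/ActivityManager(  220): ANR in");
        for (LocalTime t : findAnrTimes(log)) {
            System.out.println("ANR at " + t + ", inspect from " + windowStart(t));
        }
    }
}
```

The returned window start is where to begin reading the log and trace for what the process was doing.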

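CPU usage is the most intuitive and most commonly used system performance metric

CPU usage is usually the first metric to check when troubleshooting a performance problem, so it pays to know exactly what it means. In particular, distinguish the different kinds of CPU usage: user (%user), nice (%nice), system (%system), I/O wait (%iowait), hardware interrupt (%irq), and soft interrupt (%softirq). For example:

High user and nice CPU means user-space processes are using a lot of CPU, so focus on the performance of those processes.
High system CPU means kernel space is using a lot of CPU, so focus on kernel threads or system calls.
High I/O wait CPU means a lot of time is spent waiting for I/O, so check whether system storage has an I/O problem.
High soft or hard interrupt CPU means interrupt handlers are using a lot of CPU, so focus on the interrupt service routines in the kernel.
When CPU usage rises, use tools such as top and pidstat to identify where the problem comes from, then use tools such as perf to find the specific function responsible.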
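As a rough illustration of how these percentages are derived, the sketch below takes two samples of per-category CPU tick counters and reports the share of each category. It assumes the counters come from the aggregate "cpu" line of /proc/stat, whose first seven fields are user, nice, system, idle, iowait, irq, and softirq; the class and method names are invented for this example.

```java
class CpuBreakdown {
    // Assumed field order: first seven fields of the "cpu" line in /proc/stat.
    static final String[] FIELDS = {"user", "nice", "system", "idle", "iowait", "irq", "softirq"};
    private static final int IDLE = 3;

    /** Percentage of the sampling interval spent in each field. */
    static double[] percent(long[] before, long[] after) {
        long total = 0;
        long[] delta = new long[FIELDS.length];
        for (int i = 0; i < FIELDS.length; i++) {
            delta[i] = after[i] - before[i];
            total += delta[i];
        }
        double[] pct = new double[FIELDS.length];
        for (int i = 0; i < FIELDS.length; i++) {
            pct[i] = total == 0 ? 0.0 : 100.0 * delta[i] / total;
        }
        return pct;
    }

    /** Name of the busiest non-idle field, i.e. where to start looking. */
    static String dominant(double[] pct) {
        int best = -1;
        for (int i = 0; i < FIELDS.length; i++) {
            if (i == IDLE) {
                continue; // idle time is not a load source
            }
            if (best < 0 || pct[i] > pct[best]) {
                best = i;
            }
        }
        return FIELDS[best];
    }

    public static void main(String[] args) {
        long[] before = {0, 0, 0, 0, 0, 0, 0};
        long[] after = {5, 0, 8, 0, 87, 0, 0}; // made-up tick counts
        System.out.println("Start with: " + dominant(percent(before, after)));
    }
}
```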
 

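I: What is an ANR
    ANR: Application Not Responding
II: Types of ANR
    There are generally three types of ANR:
    1: KeyDispatchTimeout (5 seconds) -- the main type
    A key or touch event gets no response within the specified time
    2: BroadcastTimeout (10 seconds)
    A BroadcastReceiver cannot finish processing within the specified time
    3: ServiceTimeout (20 seconds) -- a low-probability type
    A Service cannot finish processing within the specified time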
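For quick reference while reading logs, the three timeouts listed above can be held in a small enum. This is a hypothetical helper, not a framework type; the names AnrType and isExceededBy are invented here, and only the three durations come from the text.

```java
enum AnrType {
    KEY_DISPATCH(5_000),   // key or touch event
    BROADCAST(10_000),     // BroadcastReceiver
    SERVICE(20_000);       // Service

    final long timeoutMillis;

    AnrType(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** True if the given elapsed time has reached this type's timeout. */
    boolean isExceededBy(double elapsedMillis) {
        return elapsedMillis >= timeoutMillis;
    }
}
```

For example, AnrType.KEY_DISPATCH.isExceededBy(5009.8) is true, while the same elapsed time would not yet count as a broadcast or service timeout.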
III: KeyDispatchTimeout
    A key or touch event was not dispatched within the specified time.
    The exact timeout is defined in the framework:

    // How long we wait until we timeout on key dispatching.
    static final int KEY_DISPATCHING_TIMEOUT = 5*1000;
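IV: Why does the timeout happen?
    The timeout is normally counted from the moment the key event is dispatched to the app. There are generally two causes:
    (1) The current event never gets a chance to be handled (the UI thread is still handling a previous event that has not finished in time, or the looper is blocked for some reason)
    (2) The current event is being handled but does not finish in time
V: How to avoid KeyDispatchTimeout
    1: Keep the UI thread to UI-related work as far as possible
    2: Move time-consuming work (database operations, I/O, network connections, or anything else that may block the UI thread) to a separate thread
    3: Prefer a Handler for communication between the UI thread and other threads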
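The pattern behind these three rules can be sketched without Android classes. In the example below a single-thread executor named "ui" stands in for the UI thread and its Handler, and slowLoad() is a hypothetical stand-in for database, file, or network work; none of these names come from the original code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class OffloadSketch {
    /** Hypothetical slow job standing in for database, file, or network work. */
    static String slowLoad() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "data";
    }

    /** Runs the slow job on a worker thread and publishes the result on the "ui" thread. */
    static String loadThenPublish() {
        ExecutorService ui = Executors.newSingleThreadExecutor(r -> new Thread(r, "ui"));
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> new Thread(r, "worker"));
        try {
            CompletableFuture<String> shown = CompletableFuture
                    .supplyAsync(OffloadSketch::slowLoad, worker) // slow part, off the UI thread
                    .thenApplyAsync(                              // short part, back on the UI thread
                            d -> d + " shown on " + Thread.currentThread().getName(), ui);
            // Blocking here is only so the demo can return the value.
            return shown.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            ui.shutdown();
            worker.shutdown();
        }
    }
}
```

On Android the second stage would typically be a Handler message or post to the main looper; the point of the sketch is only that the UI thread receives a finished result rather than doing the slow work itself.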


VI: The UI thread
    The UI thread has come up many times, so what actually runs on it?
    The UI thread mainly covers the following:
    1. onCreate(), onResume(), onDestroy(), onKeyDown(), onClick(), etc.
    2. onPreExecute(), onProgressUpdate(), onPostExecute(), onCancel, etc.
    3. Main thread handler: handleMessage(), post*(runnable r), etc.
    4. Other

VII: How to analyze an ANR
    First, look at a log:
    04-01 13:12:11.572 I/InputDispatcher( 220): Application is not responding: Window{/=false}. 5009.8ms since event, 5009.5ms since wait started
    04-01 13:12:11.572 I/WindowManager( 220): Input event dispatching timed out sending /

    04-01 13:12:14.123 I/Process( 220): Sending signal. PID: 21404 SIG: 3 --- the time the ANR occurred and the trace was generated
    04-01 13:12:14.123 I/dalvikvm(21404): threadid=4: reacting to signal 3
    ……
    04-01 13:12:15.872 E/ActivityManager( 220): ANR in (/.)
    04-01 13:12:15.872 E/ActivityManager( 220): Reason: keyDispatchingTimedOut
    04-01 13:12:15.872 E/ActivityManager( 220): Load: 8.68 / 8.37 / 8.53
    04-01 13:12:15.872 E/ActivityManager( 220): CPU usage from 4361ms to 699ms ago ---- CPU usage before the ANR occurred

    04-01 13:12:15.872 E/ActivityManager( 220):   5.5% 21404/: 1.3% user + 4.1% kernel / faults: 10 minor
    04-01 13:12:15.872 E/ActivityManager( 220):   4.3% 220/system_server: 2.7% user + 1.5% kernel / faults: 11 minor 2 major
    04-01 13:12:15.872 E/ActivityManager( 220):   0.9% 52/spi_qsd.0: 0% user + 0.9% kernel
    04-01 13:12:15.872 E/ActivityManager( 220):   0.5% 65/irq/170-cyttsp-: 0% user + 0.5% kernel
    04-01 13:12:15.872 E/ActivityManager( 220):   0.5% 296/: 0.5% user + 0% kernel
    04-01 13:12:15.872 E/ActivityManager( 220): 100% TOTAL: 4.8% user + 7.6% kernel + 87% iowait
    04-01 13:12:15.872 E/ActivityManager( 220): CPU usage from 3697ms to 4223ms later: -- CPU usage after the ANR
    04-01 13:12:15.872 E/ActivityManager( 220):   25% 21404/: 25% user + 0% kernel / faults: 191 minor
    04-01 13:12:15.872 E/ActivityManager( 220):     16% 21603/__eas(: 16% user + 0% kernel
    04-01 13:12:15.872 E/ActivityManager( 220):     7.2% 21406/GC: 7.2% user + 0% kernel
    04-01 13:12:15.872 E/ActivityManager( 220):     1.8% 21409/Compiler: 1.8% user + 0% kernel
    04-01 13:12:15.872 E/ActivityManager( 220):   5.5% 220/system_server: 0% user + 5.5% kernel / faults: 1 minor
    04-01 13:12:15.872 E/ActivityManager( 220):     5.5% 263/InputDispatcher: 0% user + 5.5% kernel
    04-01 13:12:15.872 E/ActivityManager( 220): 32% TOTAL: 28% user + 3.7% kernel


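    The log shows the ANR type and the CPU usage. If CPU usage is close to 100%, the device is very busy and the ANR may be caused by CPU starvation.
    If CPU usage is low, the main thread was blocked.
    If iowait is high, the ANR may be caused by the main thread doing I/O.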
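These three rules of thumb can be written down as a small classifier over the TOTAL line of the ANR report. This is a sketch under assumptions: it only handles the "NN% TOTAL: ... + NN% iowait" layout shown in the log above, the class name AnrCpuHint is invented, and the two thresholds are arbitrary values picked for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnrCpuHint {
    enum Hint { IO_WAIT, CPU_STARVATION, MAIN_THREAD_BLOCKED, UNKNOWN }

    private static final Pattern TOTAL = Pattern.compile("(\\d+(?:\\.\\d+)?)%\\s*TOTAL:");
    private static final Pattern IOWAIT = Pattern.compile("(\\d+(?:\\.\\d+)?)%\\s*iowait");

    // Assumed thresholds; tune them for the device under investigation.
    static final double HIGH_IOWAIT = 50.0;
    static final double NEAR_FULL = 90.0;

    /** Maps one TOTAL line to the first thing worth suspecting. */
    static Hint classify(String totalLine) {
        Matcher total = TOTAL.matcher(totalLine);
        if (!total.find()) {
            return Hint.UNKNOWN; // not a TOTAL line in the expected layout
        }
        double totalPct = Double.parseDouble(total.group(1));
        Matcher io = IOWAIT.matcher(totalLine);
        double ioPct = io.find() ? Double.parseDouble(io.group(1)) : 0.0;

        if (ioPct >= HIGH_IOWAIT) {
            return Hint.IO_WAIT;             // main thread may be doing I/O
        }
        if (totalPct >= NEAR_FULL) {
            return Hint.CPU_STARVATION;      // device is very busy
        }
        return Hint.MAIN_THREAD_BLOCKED;     // low CPU: look for a blocked main thread
    }
}
```

The result is only a starting hint; the trace file is still needed to confirm what the main thread was doing.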
    Besides the log, resolving an ANR also requires the trace file.
    How do you get it? With the following commands:
      $ chmod 777 /data/anr
      $ rm /data/anr/
      $ ps
      $ kill -3 PID
      adb pull data/anr/ ./


    In the trace file, the information you will see most often looks like this:
    ----- pid 21404 at 2011-04-01 13:12:14 -----
    Cmd line:

    DALVIK THREADS:
    (mutexes: tll=0 tsl=0 tscl=0 ghl=0 hwl=0 hwll=0)
    "main" prio=5 tid=1 NATIVE
      | group="main" sCount=1 dsCount=0 obj=0x2aad2248 self=0xcf70
      | sysTid=21404 nice=0 sched=0/0 cgrp=[fopen-error:2] handle=1876218976
                    (Native Method)
                    (:119)
                    (:110)

                   at (:3688)
                   at (Native Method)
                    (:507)
                   
                  $(:866)
                   at
                  (:624)
                   at (Native Method)
    This shows that the main thread is waiting for the next message to arrive in the message queue.
VIII: Thread states
    ThreadState (defined at "dalvik/vm/ ")
    THREAD_UNDEFINED = -1, /* makes enum compatible with int32_t */
    THREAD_ZOMBIE = 0, /* TERMINATED */
    THREAD_RUNNING = 1, /* RUNNABLE or running now */
    THREAD_TIMED_WAIT = 2, /* TIMED_WAITING in () */
    THREAD_MONITOR = 3, /* BLOCKED on a monitor */
    THREAD_WAIT = 4, /* WAITING in () */
    THREAD_INITIALIZING = 5, /* allocated, not yet running */
    THREAD_STARTING = 6, /* started, not yet on thread list */
    THREAD_NATIVE = 7, /* off in a JNI native method */
    THREAD_VMWAIT = 8, /* waiting on a VM resource */
    THREAD_SUSPENDED = 9, /* suspended, usually by GC or debugger */
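When a trace file holds hundreds of threads, it helps to pull the state out of each thread header line. The sketch below parses headers of the form "main" prio=5 tid=1 NATIVE, as shown in the trace above. The class name TraceThreadHeader is invented, and the assumption that states appear in the trace as the enum names without the THREAD_ prefix is taken from the NATIVE, WAIT and VMWAIT samples in this article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceThreadHeader {
    // "name" prio=N tid=N STATE; \s* after the name tolerates a missing space.
    private static final Pattern HEADER =
            Pattern.compile("^\\s*\"([^\"]+)\"\\s*prio=(\\d+)\\s+tid=(\\d+)\\s+([A-Z_]+)");

    final String name;
    final int tid;
    final String state;

    private TraceThreadHeader(String name, int tid, String state) {
        this.name = name;
        this.tid = tid;
        this.state = state;
    }

    /** Returns the parsed header, or null if the line is not a thread header. */
    static TraceThreadHeader parse(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new TraceThreadHeader(m.group(1), Integer.parseInt(m.group(3)), m.group(4));
    }

    /** True for states in which the thread is waiting on a lock, condition, or VM resource. */
    boolean isWaitingOrBlocked() {
        return state.equals("MONITOR") || state.equals("WAIT")
                || state.equals("TIMED_WAIT") || state.equals("VMWAIT");
    }
}
```

A main thread that reports one of these waiting states is usually the first stack worth reading in full.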


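IX: How to investigate and resolve an ANR
    1: First, analyze the log
    2: Check the call stack in the trace file
    3: Read the code
    4: Look carefully at the cause of the ANR (iowait? block? memory leak?)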

X: Cases

Case 1: Keywords: ContentResolver in AsyncTask onPostExecute, high iowait

Process:
Activity: /.
Subject: keyDispatchingTimedOut
CPU usage from 2550ms to -2814ms ago:
5% 187/system_server: 3.5% user + 1.4% kernel / faults: 86 minor 20 major
4.4% 1134/: 0.7% user + 3.7% kernel / faults: 38 minor 19 major
4% 372/: 0.7% user + 3.3% kernel / faults: 6 minor
1.1% 272/: 0.9% user + 0.1% kernel / faults: 33 minor
0.9% 252/: 0.9% user + 0% kernel
0% 409/: 0% user + 0% kernel / faults: 2 minor
0.1% 632/: 0.1% user + 0% kernel
100% TOTAL: 6.9% user + 8.2% kernel + 84% iowait


----- pid 1134 at 2010-12-17 17:46:51 -----
Cmd line:

DALVIK THREADS:
(mutexes: tll=0 tsl=0 tscl=0 ghl=0 hwl=0 hwll=0)
"main" prio=5 tid=1 WAIT
| group="main" sCount=1 dsCount=0 obj=0x2aaca180 self=0xcf20
| sysTid=1134 nice=0 sched=0/0 cgrp=[fopen-error:2] handle=1876218976
at (Native Method)
-waiting on <0x2aaca218> (a )
(:1424)
(:48)
(:337)
(:157)
(:808)
(:841)
(:1171)
$(:200)
(:261)
(:378)
.<init>(:222)
(:53)
(:1356)
(:1235)
(:1189)
(:1271)
(:1098)
$(:187)
.(:268)
$(:648)
(:658)
(:700)
$2500(:98)
at$LoadBodyTask.onPostExecute(:1290)
$(:1255)
(:417)
$300(:127)
at.AsyncTask$(:429)
(:99)
(:123)
(:3652)
(Native Method)
(:507)

Cause: iowait is very high, which means the system is busy with I/O, so the database operation on the main thread is blocked.

Before:

final Message message=(mProviderContext,messageId);
if(message==null){
   return;
}

Account account=(mProviderContext,);

if(account==null){
   return; // isMessagingController returns false for null, but let's make it clear.
}

if(isMessagingController(account)){
   new Thread(){
       @Override
       public void run(){
          ();
       }
   }.start();
}

After the fix:

new Thread() {
    @Override
    public void run() {
        // Both lookups now run on the worker thread instead of the UI thread.
        final Message message=(mProviderContext,messageId);

        if(message==null){
            return;
        }

        Account account=(mProviderContext,);

        if(account==null){
            return; // isMessagingController returns false for null, but let's make it clear.
        }

        if(isMessagingController(account)) {
            ();
        }
    }
}.start();
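The difference between the two versions is which thread runs the lookups. The runnable sketch below makes that visible by counting how many steps execute on the calling thread. Everything in it is hypothetical: step() stands in for the message lookup, the account lookup, and the pending-action processing, and the join() exists only so the demo can read the counter.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LookupThreadSketch {
    /** Hypothetical slow step; counts itself if it ran on the caller's thread. */
    static void step(Thread caller, AtomicInteger onCaller) {
        if (Thread.currentThread() == caller) {
            onCaller.incrementAndGet();
        }
    }

    /** Starts a worker and waits for it (waiting is for the demo only). */
    static void runAndWait(Runnable body) {
        Thread worker = new Thread(body, "worker");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Old shape: two lookups on the caller, only the last step on a worker. */
    static int stepsOnCallerBefore() {
        Thread caller = Thread.currentThread();
        AtomicInteger onCaller = new AtomicInteger();
        step(caller, onCaller);                   // message lookup
        step(caller, onCaller);                   // account lookup
        runAndWait(() -> step(caller, onCaller)); // pending actions
        return onCaller.get();
    }

    /** New shape: every step runs inside the worker. */
    static int stepsOnCallerAfter() {
        Thread caller = Thread.currentThread();
        AtomicInteger onCaller = new AtomicInteger();
        runAndWait(() -> {
            step(caller, onCaller);
            step(caller, onCaller);
            step(caller, onCaller);
        });
        return onCaller.get();
    }
}
```

In the old shape two steps still run on the caller, which is the UI thread in this case; in the new shape none do.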

About AsyncTask: /reference/android/os/

Case 2: Keywords: reading and writing network data on the UI thread

ANR in process: :PhotoViewer (last :PhotoViewer)
Annotation: keyDispatchingTimedOut
CPU usage:
Load: 6.74 / 6.89 / 6.12
CPU usage from 8254ms to 3224ms ago:
: 4% = 4% user + 0% kernel / faults: 68 minor
system_server: 2% = 1% user + 0% kernel / faults: 18 minor
re-initialized>: 0% = 0% user + 0% kernel / faults: 50 minor
events/0: 0% = 0% user + 0% kernel
TOTAL: 7% = 6% user + 1% kernel

DALVIK THREADS:
"main" prio=5 tid=3 NATIVE
| group="main" sCount=1 dsCount=0 s=Y obj=0x4001b240 self=0xbda8
| sysTid=2579 nice=0 sched=0/0 cgrp=unknown handle=-1343993184
(Native Method)
.(:478)
(:565)
(:87)
$(:303)
(:133)
(:157)
(:346)
(Native Method)
.(:459)
.getPreviewImage(:4465)
.dispPreview(:4406)
$6500(:125)
at$33$(:4558)
(:587)
(:92)
(:123)
(:4370)
(Native Method)
(:521)
$(:868)
(:626)
(Native Method)

For network connections, either set a timeout at design time or move the work to a separate thread.
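As one way to apply the timeout advice, the sketch below uses the standard java.net.HttpURLConnection and sets both a connect and a read timeout before any network traffic happens. The original stack does not show which HTTP client the app used, so this choice is an assumption, as are the class name NetTimeoutSketch and the example URL.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class NetTimeoutSketch {
    /**
     * Prepares a connection with explicit timeouts. Nothing is sent yet:
     * the network is only touched once connect() or a stream getter is called.
     */
    static HttpURLConnection openWithTimeouts(String url, int connectMillis, int readMillis) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectMillis); // limit on establishing the connection
            conn.setReadTimeout(readMillis);       // limit on each blocking read
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as openWithTimeouts("http://example.com/", 3000, 5000) returns a configured connection. A timeout only bounds the wait, so the actual request should still be made from a worker thread rather than the UI thread.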

For more on Handler, see: /reference/android/os/

 

Case 3:

Keywords: Memory leak / Thread leak

11-16 21:41:42.560 I/ActivityManager( 1190): ANR in process: (last in )
11-16 21:41:42.560 I/ActivityManager( 1190): Annotation: keyDispatchingTimedOut
11-16 21:41:42.560 I/ActivityManager( 1190): CPU usage:
11-16 21:41:42.560 I/ActivityManager( 1190): Load: 11.5 / 11.1 / 11.09
11-16 21:41:42.560 I/ActivityManager( 1190): CPU usage from 9046ms to 4018ms ago:
11-16 21:41:42.560 I/ActivityManager( 1190): : 98% = 97% user + 0% kernel / faults: 1134 minor
11-16 21:41:42.560 I/ActivityManager( 1190): system_server: 0% = 0% user + 0% kernel / faults: 1 minor
11-16 21:41:42.560 I/ActivityManager( 1190): adbd: 0% = 0% user + 0% kernel
11-16 21:41:42.560 I/ActivityManager( 1190): logcat: 0% = 0% user + 0% kernel
11-16 21:41:42.560 I/ActivityManager( 1190): TOTAL: 100% = 98% user + 1% kernel

Cmd line:

DALVIK THREADS:
"main" prio=5 tid=3 VMWAIT
| group="main" sCount=1 dsCount=0 s=N obj=0x40026240 self=0xbda8
| sysTid=1815 nice=0 sched=0/0 cgrp=unknown handle=-1344001376
.(Native Method)
(Native Method)
.(:468)
(:6324)
(:6178)
(:1541)
……
$(:1830)
(:1349)
(:1114)
(:1633)
(:99)
(:123)
(:4370)
(Native Method)
(:521)
$(:868)
(:626)
(Native Method)

"Thread-408" prio=5 tid=329 WAIT
| group="main" sCount=1 dsCount=0 s=N obj=0x46910d40 self=0xcd0548
| sysTid=10602 nice=0 sched=0/0 cgrp=unknown handle=15470792
at (Native Method)
- waiting on <0x468cd420> (a )
(:288)
$UiUpdaterExecutor$(:289)
(:1096)

Analysis:

.(Native Method): memory is insufficient, so the main thread blocks while creating a bitmap.

** MEMINFO in pid 1360 [] **
              native   dalvik    other    total
       size:   17036    23111      N/A    40147
  allocated:   16484    20675      N/A    37159
       free:     296     2436      N/A     2732

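Solution: if the device has enough memory, you can raise the VM heap size to 36 MB or more, but it is better to review the code and find out which memory is not being released.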
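To put a number on how little room is left, the free share of a heap can be computed from the size and free columns of the MEMINFO table, or from the running VM. The sketch below is illustrative only: the class name HeapHeadroom is invented, and runtimeFreePercent() reports on whichever JVM runs it through the standard java.lang.Runtime API, not on the Dalvik process from the log.

```java
class HeapHeadroom {
    /** Free share of a heap, in percent, from the size and free columns. */
    static double freePercent(long size, long free) {
        return size <= 0 ? 0.0 : 100.0 * free / size;
    }

    /** The same ratio for the running VM, measured against its maximum heap. */
    static double runtimeFreePercent() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long max = rt.maxMemory();
        return freePercent(max, max - used);
    }

    public static void main(String[] args) {
        // dalvik column of the MEMINFO table: size 23111, free 2436
        System.out.printf("dalvik heap free: %.1f%%%n", freePercent(23111, 2436));
        System.out.printf("this VM free: %.1f%%%n", runtimeFreePercent());
    }
}
```

Logging this ratio at a few points in the app's lifecycle is a cheap way to see whether free memory keeps shrinking, which would point to a leak rather than a heap that is simply too small.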