每天学习一点点

时间:2023-01-01 14:15:03

Java线程在linux实现

  • Java线程在Linux中为进程
  • top -Hp [pid] 查看Java线程
  • ps -mp [pid] -o THREAD,tid,time 查看Java线程及运行时间
  • printf “%x\n” [tid] 将进程ID转换为16进制数,用于下一步分析
  • jstack [pid]|grep [tid] -A 30 查看对应线程中的堆栈,这里显示30条记录
  • pstack [pid] 查看线程的系统调用
  • pstree -a [pid] 或者 pstree -c [pid] 查看线程的父子关系

AWK

正则表达式语法

awk --re-interval '{match($0,/{"appver":"(.*)","time":(.*),"uid":(.*),"usercode":(.*),"eventId":"(.*)"}/,a) ;print a[1]"\x001"a[2]"\x001"a[3]"\x001"a[4]"\x001"a[5]}'

awk 统计文本记录行数

awk 'BEGIN{D="";n=0} {split($2,a,",") ;  if(D!=a[1]) {
if(n>900) print D" "n;
D=a[1]
n=0
} else {
  n=n+1
}
} END{
print D" "n}' ~/wankun/n1.log >  ~/wankun/n2

grep 'Accepted socket connection' zookeeper.log.15 |awk '{print $12}' |awk -F ':' '{a[$1]+=1;} END { for(i in a) {print a[i]" "i;}}'

常用函数

awk -F ',' '/":/{gsub("uid == null logger get rmd ","",$5);split($5,a,"&");sub("uid=","",a[1]);sub("usercode=","",a[2]);sub("title=","",a[3]);sub("subtitle=","",a[ #4]);sub("=","",a[5]); print $1"\x001"a[1]"\x001"a[2]"\x001"a[3]"\x001"a[4]"\x001"a[5]}' 

Shell 信号量

[ -e ./fd1 ] || mkfifo ./fd1
exec 3<> ./fd1
rm -rf ./fd1

for i in `seq 1 6`;
do
    echo >&3
done

read -u 3
{
  do something
  echo >&3
}&

wait
exec 3<&-
exec 3>&-

JVM 内存管理

使用 ManagementFactory.getMemoryMXBean().getHeapMemoryUsage() 获取到内存使用情况,getMax()getUsed() 来获取当前内存使用情况。

参考代码:org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler

查看Tez 运行堆栈

jstack -l 100597 |grep -A 30 -E "TezChild.*daemon"

每日增量数据统计

hadoop dfs -du -s /user/hive/warehouse/fact.db/*/dt=20180130 | awk '{sum+=$1} END{print sum/1024/1024/1024}'