Finding a pod by PID and finding a pod by pod ID

Finding a pod by PID

There are two methods.

Method 1

  1. Find the PID of the process
# ps aux|grep prometheus
1000      5631  0.4  4.2 576752 327584 ?  Ssl  Jul08  47:46 /bin/prometheus --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries --config.file=/etc/prometheus/config_out/prometheus.env.yaml --storage.tsdb.path=/prometheus --storage.tsdb.retention.time=12h --web.enable-lifecycle --storage.tsdb.no-lockfile --web.route-prefix=/ --web.listen-address=127.0.0.1:9090
  2. Find the Docker container ID
# CID=$(cat /proc/5631/cgroup | awk -F '/' '{print $5}')
# echo ${CID:7:8}
59acd32c
  3. Find the pod name
# docker inspect 59acd32c | jq '.[0].Config.Labels["io.kubernetes.pod.name"]'
"prometheus-cluster-monitoring-0"
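
The three steps above can be folded into one helper. The following is only a sketch, assuming the Docker runtime and that the full 64-character container ID appears somewhere in the process's cgroup path. It matches the ID by pattern instead of relying on the fixed field position and the ${CID:7:8} offset, both of which depend on how the cgroup driver lays out the path. The function names container_id_from_cgroup and pod_name_from_pid are made up for this illustration.

```shell
# Print the first 64-character hex container ID found in cgroup data on stdin.
# Assumption: the ID appears in the path, either bare or as docker-<id>.scope.
container_id_from_cgroup() {
    grep -oE '[0-9a-f]{64}' | head -n 1
}

# Resolve a host PID to its pod name through the container's Docker labels.
# Needs docker on the node; returns 1 if no container ID is found for the PID.
pod_name_from_pid() {
    pid=$1
    cid=$(container_id_from_cgroup < "/proc/$pid/cgroup")
    [ -n "$cid" ] || return 1
    docker inspect --format '{{ index .Config.Labels "io.kubernetes.pod.name" }}' "$cid"
}

# Usage (on the node): pod_name_from_pid 5631
```

Reading the label with --format avoids the jq dependency and prints the name without surrounding quotes.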

Method 2

  1. Find the PID of the process
# ps aux|grep prometheus
1000      5631  0.4  4.2 576752 327584 ?  Ssl  Jul08  47:46 /bin/prometheus --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries --config.file=/etc/prometheus/config_out/prometheus.env.yaml --storage.tsdb.path=/prometheus --storage.tsdb.retention.time=12h --web.enable-lifecycle --storage.tsdb.no-lockfile --web.route-prefix=/ --web.listen-address=127.0.0.1:9090
  2. Find the pod ID
# cat /proc/5631/mountinfo | grep etc-hosts | awk -F / '{print $6}'
44a2f0f5-8ddc-40f9-96a5-c37e6bea55df
  3. Find the pod name
# docker ps | grep 44a2f0f5-8ddc-40f9-96a5-c37e6bea55df | awk -F _ '{print $3}' | uniq
prometheus-cluster-monitoring-0
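
Step 2 works because the container's /etc/hosts is bind-mounted from a per-pod directory whose path contains the pod ID. As a sketch, assuming that path has the form .../pods/<pod ID>/etc-hosts (which is what the sixth "/"-separated field in the command above implies), the ID can be matched by pattern rather than by field position. The function name pod_uid_from_mountinfo is hypothetical.

```shell
# Print the pod ID found in mountinfo data read from stdin.
# Assumption: a mount source of the form <root>/pods/<36-char pod ID>/etc-hosts.
pod_uid_from_mountinfo() {
    sed -n 's|.*/pods/\([0-9a-f-]\{36\}\)/etc-hosts.*|\1|p' | head -n 1
}

# Usage (on the node): pod_uid_from_mountinfo < /proc/5631/mountinfo
```

This version keeps working if the kubelet root directory sits at a different depth, since it does not count path components.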

Finding a pod by ID

The OOM log looks like this:

Nov 16 10:36:48 rancher-2 kernel: stress invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=994
Nov 16 10:36:48 rancher-2 kernel: stress cpuset=336506de688b4cb104c2ec6b2be5134cc66bac44ad2f3e0ad0a12f57623c6d50 mems_allowed=0
Nov 16 10:36:48 rancher-2 kernel: CPU: 1 PID: 3262 Comm: stress Not tainted 4.4.238-1.el7.elrepo.x86_64 #1
Nov 16 10:36:48 rancher-2 kernel: Hardware name: Fedora Project OpenStack Nova, BIOS 1.9.1-5.el7_3.1 04/01/2014
Nov 16 10:36:48 rancher-2 kernel: 0000000000000286 c7146aedec18fc5b ffff8801f41afc60 ffffffff8134f22a
Nov 16 10:36:48 rancher-2 kernel: ffff8801f41afd38 ffff88016ec30000 ffff8801f41afcc8 ffffffff81211c8b
Nov 16 10:36:48 rancher-2 kernel: ffffffff8119996c ffff88016effc2c0 0000000000000000 0000000000000206
Nov 16 10:36:48 rancher-2 kernel: Call Trace:
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff8134f22a>] dump_stack+0x6d/0x93
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff81211c8b>] dump_header+0x57/0x1bb
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff8119996c>] ? find_lock_task_mm+0x3c/0x80
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff81211dfd>] oom_kill_process.cold+0xe/0x30e
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff81206c46>] ? mem_cgroup_iter+0x146/0x320
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff81208d78>] mem_cgroup_out_of_memory+0x2c8/0x310
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff812099b3>] mem_cgroup_oom_synchronize+0x2e3/0x310
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff81204c60>] ? get_mctgt_type+0x250/0x250
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff8119a30e>] pagefault_out_of_memory+0x3e/0xb0
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff81067882>] mm_fault_error+0x62/0x150
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff81068108>] __do_page_fault+0x3d8/0x3e0
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff810681c3>] trace_do_page_fault+0x43/0x140
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff81061737>] do_async_page_fault+0x37/0xb0
Nov 16 10:36:48 rancher-2 kernel: [<ffffffff81738778>] async_page_fault+0x28/0x30
Nov 16 10:36:48 rancher-2 kernel: Task in /kubepods/burstable/podfc92664d-3ca9-4a37-8244-7bf1a04da201/336506de688b4cb104c2ec6b2be5134cc66bac44ad2f3e0ad0a12f57623c6d50 killed as a result of limit of /kubepods/burstable/podfc92664d-3ca9-4a37-8244-7bf1a04da201
Nov 16 10:36:48 rancher-2 kernel: memory: usage 102400kB, limit 102400kB, failcnt 14
Nov 16 10:36:48 rancher-2 kernel: memory+swap: usage 102400kB, limit 9007199254740988kB, failcnt 0
Nov 16 10:36:48 rancher-2 kernel: kmem: usage 1940kB, limit 9007199254740988kB, failcnt 0
Nov 16 10:36:48 rancher-2 kernel: Memory cgroup stats for /kubepods/burstable/podfc92664d-3ca9-4a37-8244-7bf1a04da201: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 16 10:36:48 rancher-2 kernel: Memory cgroup stats for /kubepods/burstable/podfc92664d-3ca9-4a37-8244-7bf1a04da201/fa55a38258b1722eb5ba6bc5ea87f9f85107697ed3161d5e98526889b49778db: cache:0KB rss:44KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:44KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 16 10:36:48 rancher-2 kernel: Memory cgroup stats for /kubepods/burstable/podfc92664d-3ca9-4a37-8244-7bf1a04da201/336506de688b4cb104c2ec6b2be5134cc66bac44ad2f3e0ad0a12f57623c6d50: cache:0KB rss:100416KB rss_huge:75776KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:100392KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 16 10:36:48 rancher-2 kernel: [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Nov 16 10:36:48 rancher-2 kernel: [ 2608] 0 2608 255 1 4 2 0 -998 pause
Nov 16 10:36:48 rancher-2 kernel: [ 3149] 0 3149 28361 25755 56 5 0 994 stress
Nov 16 10:36:48 rancher-2 kernel: Memory cgroup out of memory: Kill process 3149 (stress) score 1979 or sacrifice child
Nov 16 10:36:48 rancher-2 kernel: Killed process 3149 (stress) total-vm:113444kB, anon-rss:99904kB, file-rss:3116kB
Nov 16 10:36:48 rancher-2 containerd: time="2020-11-16T10:36:48.794443229+08:00" level=info msg="shim reaped" id=336506de688b4cb104c2ec6b2be5134cc66bac44ad2f3e0ad0a12f57623c6d50
Nov 16 10:36:48 rancher-2 dockerd: time="2020-11-16T10:36:48.804531348+08:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 16 10:37:05 rancher-2 containerd: time="2020-11-16T10:37:05.032807429+08:00" level=info msg="shim containerd-shim started" address=/containerd-shim/94327bd1be610aca06fccb2e695d8824d414327a214de3f259d5a8da5ded5188.sock debug=false pid=3801

The log contains the cgroup path /kubepods/burstable/podfc92664d-3ca9-4a37-8244-7bf1a04da201. The part after the "pod" prefix, fc92664d-3ca9-4a37-8244-7bf1a04da201, is the pod ID, which can then be used to find the pod:
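
Picking the pod ID out of a long log by eye is error-prone, so it can help to extract it mechanically. The sketch below assumes the cgroupfs-style path shown in the log above (pod<ID> as a path component after "limit of"); the function name pod_uids_from_oom_log is hypothetical.

```shell
# Print the unique pod IDs named in OOM "Task in ... killed as a result of
# limit of <cgroup path>" lines read from stdin.
# Assumption: the limiting cgroup path contains a component pod<36-char ID>.
pod_uids_from_oom_log() {
    grep 'killed as a result of limit of' |
        sed -n 's|.*limit of [^ ]*/pod\([0-9a-f-]\{36\}\).*|\1|p' |
        sort -u
}

# Usage (on the node): dmesg | pod_uids_from_oom_log
```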

docker ps |grep fc92664d-3ca9-4a37-8244-7bf1a04da201 |awk '{print $10}' |awk -F '_' '{print "namespace: "$4 "\nname: "$3}'
namespace: default
name: memory-demo-2
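
The command above picks the container name as the tenth whitespace-separated column of docker ps, which shifts whenever the COMMAND, CREATED or STATUS columns contain a different number of words. A less fragile sketch asks Docker for the names only and splits them on "_". It assumes the naming scheme k8s_<container>_<pod>_<namespace>_<pod ID>_<attempt>, which the fields $3 and $4 above already rely on; the function name pod_from_container_name is hypothetical.

```shell
# Read container names on stdin and print namespace and pod name once per pod.
# Assumption: names look like k8s_<container>_<pod>_<namespace>_<pod ID>_<attempt>.
pod_from_container_name() {
    awk -F '_' '$1 == "k8s" && !seen[$4 "/" $3]++ {
        print "namespace: " $4 "\nname: " $3
    }'
}

# Usage (on the node):
#   docker ps --format '{{.Names}}' | grep fc92664d-3ca9-4a37-8244-7bf1a04da201 | pod_from_container_name
```

Deduplicating inside awk matters because every pod has at least two containers (the pause container plus the workload), so the same pod would otherwise be printed more than once.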
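Next, check the limit reported for the cgroup, for example: kernel: memory: usage 102400kB, limit 102400kB, failcnt 14. Here memory usage has reached 102400kB, the configured limit is 102400kB, and the limit has been hit 14 times (failcnt).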
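
To pull these three numbers out of a log, a small filter is enough. This is a sketch that assumes the line format shown above, with "memory: usage" preceded by a space (as in syslog output); the function name oom_memory_summary is hypothetical.

```shell
# Print usage, limit (both in kB) and failcnt from the cgroup "memory:" line
# of an OOM report read from stdin. The "memory+swap:" and "kmem:" lines are
# skipped because they do not contain the text " memory: usage".
oom_memory_summary() {
    sed -n 's/.* memory: usage \([0-9]*\)kB, limit \([0-9]*\)kB, failcnt \([0-9]*\).*/usage_kB=\1 limit_kB=\2 failcnt=\3/p'
}

# Usage (on the node): grep 'kernel:' /path/to/syslog | oom_memory_summary
```

For the log above, both usage and limit are 102400 kB, that is 102400 / 1024 = 100 MiB.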