Liangyu's Blog: little by little, drops gather into a river. The road of a Linux ops engineer for web and mobile games

kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!

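This morning the load on one machine suddenly started climbing and kept going, past 300, and neither ps nor top would return any output.

syslog showed no errors, so I assumed some process was responsible and killed every running business process. The load kept rising anyway.

Later, going through the logs one by one, I found this error in kernel.log: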

Jun 30 10:16:37 localhost kernel: [20789571.526981] ------------[ cut here ]------------

Jun 30 10:16:37 localhost kernel: [20789571.646113] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!

Jun 30 10:16:37 localhost kernel: [20789571.766023] invalid opcode: 0000 [#4] SMP

Jun 30 10:16:37 localhost kernel: [20789571.886232] Modules linked in: dccp_diag dccp udp_diag unix_diag tcp_diag inet_diag xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables dcdbas x86_pkg_temp_thermal coretemp kvm_intel kvm joydev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel mxm_wmi aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd shpchp lpc_ich mei_me mei ipmi_si binfmt_misc lp parport wmi acpi_power_meter mac_hid hid_generic tg3 usbhid ahci ptp hid libahci megaraid_sas pps_core

Jun 30 10:16:37 localhost kernel: [20789572.405783] CPU: 0 PID: 156980 Comm: java Tainted: G      D      3.13.0-24-generic #46-Ubuntu

Jun 30 10:16:37 localhost kernel: [20789572.539454] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.5 04/11/2016

Jun 30 10:16:37 localhost kernel: [20789572.673237] task: ffff8804363c17f0 ti: ffff880157214000 task.ti: ffff880157214000

Jun 30 10:16:37 localhost kernel: [20789572.807615] RIP: 0010:[<ffffffff81179051>]  [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10

Jun 30 10:16:37 localhost kernel: [20789572.943814] RSP: 0018:ffff880157215d98  EFLAGS: 00010246

Jun 30 10:16:37 localhost kernel: [20789573.080011] RAX: 0000000000000100 RBX: 0000000706005648 RCX: ffff880157215b18

Jun 30 10:16:37 localhost kernel: [20789573.217593] RDX: ffff8804363c17f0 RSI: 0000000000000000 RDI: 80000008a46009e6

Jun 30 10:16:37 localhost kernel: [20789573.355562] RBP: ffff880157215e20 R08: 0000000000000000 R09: 00000000000000a9

Jun 30 10:16:37 localhost kernel: [20789573.493361] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880c8bf3d180

Jun 30 10:16:37 localhost kernel: [20789573.631188] R13: ffff88084f6d5e00 R14: ffff88010126d400 R15: 0000000000000080

Jun 30 10:16:37 localhost kernel: [20789573.769177] FS:  00007f2421699700(0000) GS:ffff88085f200000(0000) knlGS:0000000000000000

Jun 30 10:16:37 localhost kernel: [20789573.908225] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Jun 30 10:16:37 localhost kernel: [20789574.045182] CR2: 00007f91da300018 CR3: 0000000100a64000 CR4: 00000000003407f0

Jun 30 10:16:37 localhost kernel: [20789574.180729] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

Jun 30 10:16:37 localhost kernel: [20789574.312936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Jun 30 10:16:37 localhost kernel: [20789574.441451] Stack:

Jun 30 10:16:37 localhost kernel: [20789574.566031]  ffff880157215e20 ffff880677705780 0000000000000039 00000007ca1800d0

Jun 30 10:16:37 localhost kernel: [20789574.690149]  00000007aaa80000 00007f2464081050 0000000000000005 0000000000000000

Jun 30 10:16:37 localhost kernel: [20789574.813562]  0000000000000000 0000000000000004 ffff8800000000a9 ffffffffffffff03

Jun 30 10:16:37 localhost kernel: [20789574.936459] Call Trace:

Jun 30 10:16:37 localhost kernel: [20789575.057978]  [<ffffffff817219a4>] __do_page_fault+0x184/0x560

Jun 30 10:16:37 localhost kernel: [20789575.180307]  [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20

Jun 30 10:16:37 localhost kernel: [20789575.300245]  [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0

Jun 30 10:16:37 localhost kernel: [20789575.416978]  [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60

Jun 30 10:16:37 localhost kernel: [20789575.530483]  [<ffffffff81721d9a>] do_page_fault+0x1a/0x70

Jun 30 10:16:37 localhost kernel: [20789575.640950]  [<ffffffff8171e208>] page_fault+0x28/0x30

Jun 30 10:16:37 localhost kernel: [20789575.749945] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7

Jun 30 10:16:37 localhost kernel: [20789575.975569] RIP  [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10

Jun 30 10:16:37 localhost kernel: [20789576.086848]  RSP <ffff880157215d98>

Jun 30 10:16:37 localhost kernel: [20789576.474980] ---[ end trace cb921fcdfc336f01 ]---


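The key line is:

kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!

This points to a problem in the Ubuntu 14.04 kernel (3.13.0-24-generic), but the business had to keep running, so I rebooted the machine and everything returned to normal.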
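Reading the logs one by one works, but a search for the BUG marker is faster. The following is a minimal sketch, not something from the original incident: the function name find_kernel_bugs is made up for illustration, and the default path /var/log/kern.log is an assumption about where the kernel log lives, so pass the real file if it differs.

```shell
#!/bin/sh
# find_kernel_bugs (hypothetical helper): print every "kernel BUG at" line
# from the given log files, prefixed with the file name and line number.
# With no arguments it falls back to /var/log/kern.log, which is an
# assumption about the log location, not something stated in this post.
find_kernel_bugs() {
    [ "$#" -gt 0 ] || set -- /var/log/kern.log
    grep -Hn 'kernel BUG at' "$@"
}
```

For example, find_kernel_bugs /var/log/kern.log /var/log/syslog lists each matching line together with the file it came from, so the full trace can then be opened at that line number.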


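Now for the cause: it may be related to huge pages. Many articles tried disabling transparent huge pages (THP), after which the problem did not recur. First check the current setting: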

# cat /sys/kernel/mm/transparent_hugepage/enabled 

[always] madvise never


The value in square brackets is the active one: [always] means transparent huge pages are enabled, [never] means they are disabled.
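When checking many machines, it can be handy to extract just the bracketed word. A minimal sketch, assuming the file format shown above; the function name thp_active_mode is made up for illustration, and the optional path argument exists only so the function can be tried against a copy of the file.

```shell
#!/bin/sh
# thp_active_mode (hypothetical helper): print the active THP mode,
# i.e. the word inside the square brackets of the "enabled" file.
# The optional first argument overrides the path to read from.
thp_active_mode() {
    file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$file"
}
```

Calling thp_active_mode with no arguments reads the real sysfs file, which does not require root.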


Now disable transparent huge pages:

# echo "never" |  tee /sys/kernel/mm/transparent_hugepage/enabled    

never

# cat /sys/kernel/mm/transparent_hugepage/enabled 

always madvise [never]
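Note that a value written to sysfs this way is lost at the next reboot, so the change has to be reapplied at boot (for instance from a startup script, or with the transparent_hugepage=never kernel boot parameter). The sketch below wraps the step in a function that could be called from such a script. It goes slightly beyond the commands above: the function name disable_thp is made up, and also writing never to the neighbouring defrag file is an extra step not taken in this post.

```shell
#!/bin/sh
# disable_thp (hypothetical helper): write "never" to the THP "enabled"
# and "defrag" control files. Touching "defrag" is an addition that the
# original commands did not include.
# The optional first argument overrides the sysfs directory.
# Returns non-zero if either file could not be written.
disable_thp() {
    dir=${1:-/sys/kernel/mm/transparent_hugepage}
    status=0
    for f in enabled defrag; do
        if [ -w "$dir/$f" ]; then
            echo never > "$dir/$f" || status=1
        else
            echo "cannot write $dir/$f (missing file or not root?)" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run as root, disable_thp with no arguments acts on the real sysfs directory; reading the enabled file afterwards should show [never] in brackets, as in the output above.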

