Background
If you are not familiar with Linux load, you may want to read this first.
 
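Recently a production system (a Java application) showed a very odd problem: with CPU usage completely steady, the load fluctuated periodically (as in the figure above). Stopping the JVM made the load go back to normal. I spent quite a while investigating and ruled out scheduled tasks and IO, and also ruled out a problem in the JVM itself. In the end I suspected the kernel, so I decided to write a test program to check.
Test program
This raises a question: how do you write a test that consumes a given percentage of CPU? Consuming CPU means consuming CPU time, so the basic idea is to use up CPU time in the given proportion. For example, to consume 30% of the CPU with a time unit of 100ms, run flat out for 30ms and then sleep for 70ms. The code is as follows: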
    public class CpuUtil_Single {

        // Length of one time unit; each unit is split into a busy part and a sleep part.
        public static final int TIME_UNIT_MS = 100;
        // Written by the busy loop so the computation has a visible side effect.
        public static double tmp;

        public static void main(String[] args) throws Exception {
            // Target CPU usage in percent; defaults to 30.
            int cpuUtil = 30;
            if (args.length != 0) {
                cpuUtil = Integer.parseInt(args[0]);
            }
            if (cpuUtil < 5 || cpuUtil > 80) {
                throw new IllegalArgumentException("cpuUtil must be between 5 and 80");
            }
            int runTime = TIME_UNIT_MS * cpuUtil / 100;
            int sleepTime = TIME_UNIT_MS - runTime;
            System.out.println("runTime(ms):" + runTime + ",sleepTime(ms):" + sleepTime);
            while (true) {
                eatCpuTime(runTime);
                Thread.sleep(sleepTime);
            }
        }

        // Busy-loops until at least useTimeMs milliseconds have elapsed.
        public static void eatCpuTime(long useTimeMs) {
            long start = System.nanoTime();
            long useTimeNano = useTimeMs * 1_000_000L;
            while (true) {
                // Do some arithmetic between clock reads.
                for (int i = 0; i < 10000; i++) {
                    tmp = Math.sqrt(i) * Math.sqrt(i);
                }
                if ((System.nanoTime() - start) >= useTimeNano) {
                    break;
                }
            }
        }
    }
 
Try running it:
    javac CpuUtil_Single.java
    # the argument is the target CPU usage, here 40%
    java CpuUtil_Single 40
 
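If your machine has a single-core CPU, the code above works fine and is fairly accurate. On a multi-core machine it falls short and needs a change: it can only load one of the cores, whereas the version below loads every core. You can watch the difference between the two with the top command.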
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class CpuUtil_Multi {

        public static final int TIME_UNIT_MS = 100;
        // Written by the busy loop so the computation has a visible side effect.
        public static double tmp;
        public static final int CpuCoreNum = Runtime.getRuntime().availableProcessors();
        // One worker thread per core.
        public static ExecutorService MyExecutor = Executors.newFixedThreadPool(CpuCoreNum);

        public static void main(String[] args) throws Exception {
            // Target CPU usage in percent; defaults to 30.
            int cpuUtil = 30;
            if (args.length != 0) {
                cpuUtil = Integer.parseInt(args[0]);
            }
            if (cpuUtil < 5 || cpuUtil > 80) {
                throw new IllegalArgumentException("cpuUtil must be between 5 and 80");
            }
            int runTime = TIME_UNIT_MS * cpuUtil / 100;
            int sleepTime = TIME_UNIT_MS - runTime;
            System.out.println("runTime(ms):" + runTime + ",sleepTime(ms):" + sleepTime + ",CpuCoreNum:" + CpuCoreNum);
            for (int i = 0; i < CpuCoreNum; i++) {
                MyExecutor.execute(() -> {
                    while (true) {
                        try {
                            eatCpuTime(runTime);
                            Thread.sleep(sleepTime);
                        } catch (InterruptedException e) {
                            // Stop this worker instead of looping on after an interrupt.
                            e.printStackTrace();
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                });
            }
        }

        // Busy-loops until at least useTimeMs milliseconds have elapsed.
        public static void eatCpuTime(long useTimeMs) {
            long start = System.nanoTime();
            long useTimeNano = useTimeMs * 1_000_000L;
            while (true) {
                // Do some arithmetic between clock reads.
                for (int i = 0; i < 10000; i++) {
                    tmp = Math.sqrt(i) * Math.sqrt(i);
                }
                if ((System.nanoTime() - start) >= useTimeNano) {
                    break;
                }
            }
        }
    }
 
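Reproducing the problem
Linux load is, in essence, a look at the length of the run queue (on Linux, tasks in state R plus D) roughly once every 5s. When the system's workload fluctuates on a very regular cycle, this way of calculating runs into trouble. To reproduce the problem, the code above needs one more change.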
    package name.chengchao.mylinuxtest.cpu;

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class CpuUtil_Multi_Interval {

        public static final int TIME_UNIT_MS = 100;
        // Period of the CPU wave: a burst is started once every this many seconds.
        public static final int CPU_WAVE_INTERVAL_SECOND = 5;
        // Written by the busy loop so the computation has a visible side effect.
        public static double tmp;
        public static final int CpuCoreNum = Runtime.getRuntime().availableProcessors();
        public static ScheduledExecutorService MyScheduledExecutor = Executors.newScheduledThreadPool(1);
        public static ExecutorService MyExecutor = Executors.newFixedThreadPool(CpuCoreNum);

        public static void main(String[] args) throws Exception {
            // Target CPU usage in percent during a burst; defaults to 30.
            int cpuUtil = 30;
            if (args.length != 0) {
                cpuUtil = Integer.parseInt(args[0]);
            }
            if (cpuUtil < 5 || cpuUtil > 80) {
                throw new IllegalArgumentException("cpuUtil must be between 5 and 80");
            }
            int runTime = TIME_UNIT_MS * cpuUtil / 100;
            int sleepTime = TIME_UNIT_MS - runTime;
            System.out.println("runTime(ms):" + runTime + ",sleepTime(ms):" + sleepTime + ",CpuCoreNum:" + CpuCoreNum);
            // Number of time units in one second, so each burst lasts about 1s.
            int loopNum = 1000 / TIME_UNIT_MS;
            MyScheduledExecutor.scheduleAtFixedRate(() -> {
                for (int i = 0; i < CpuCoreNum; i++) {
                    MyExecutor.execute(() -> {
                        for (int j = 0; j < loopNum; j++) {
                            try {
                                eatCpuTime(runTime);
                                Thread.sleep(sleepTime);
                            } catch (InterruptedException e) {
                                // Abandon this burst instead of continuing after an interrupt.
                                e.printStackTrace();
                                Thread.currentThread().interrupt();
                                return;
                            }
                        }
                    });
                }
            }, 0, CPU_WAVE_INTERVAL_SECOND, TimeUnit.SECONDS);
        }

        // Busy-loops until at least useTimeMs milliseconds have elapsed.
        public static void eatCpuTime(long useTimeMs) {
            long start = System.nanoTime();
            long useTimeNano = useTimeMs * 1_000_000L;
            while (true) {
                // Do some arithmetic between clock reads.
                for (int i = 0; i < 10000; i++) {
                    tmp = Math.sqrt(i) * Math.sqrt(i);
                }
                if ((System.nanoTime() - start) >= useTimeNano) {
                    break;
                }
            }
        }
    }
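Since the effect only shows up over many hours, it helps to record the load while the program runs. The following is a minimal sketch of one way to do that, not part of the original test: the class name `LoadLogger`, the 10-second interval and the output format are all made up for illustration. It relies on the standard `OperatingSystemMXBean.getSystemLoadAverage()` call, which reports the 1-minute load average (or a negative value where the platform does not provide one).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class LoadLogger {

    // Hypothetical logging interval; any value well below one minute works.
    static final long INTERVAL_MS = 10_000;
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Formats one log line: timestamp followed by the load with two decimals.
    static String formatSample(LocalDateTime time, double load) {
        return String.format(Locale.ROOT, "%s %.2f", FORMAT.format(time), load);
    }

    public static void main(String[] args) throws InterruptedException {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        while (true) {
            // 1-minute load average; negative if the platform cannot report it.
            double load = os.getSystemLoadAverage();
            System.out.println(formatSample(LocalDateTime.now(), load));
            Thread.sleep(INTERVAL_MS);
        }
    }
}
```

Run it in a second terminal with its output redirected to a file, then plot the file afterwards.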
 
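Run this code for about a day and you should see the load take the shape shown in the figure above. The cause is that the system's load sampling cycle and the cycle of the CPU wave slowly drift apart, line up, drift apart and line up again, which eventually produces this result. You can also try changing the period of the scheduled task to 3s: the result is the same, only the period of the fluctuation changes.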
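The drift can be illustrated without waiting a day by simulating the sampling. The sketch below is a simplified model under stated assumptions, not the kernel's actual code: the 5004ms sample interval is a hypothetical value chosen to be slightly off 5s, the 1-minute average is modelled as an exponential moving average with decay `exp(-5/60)`, the machine is assumed to have 4 cores, and a task counts as runnable only during the busy part of each 100ms unit within the 1s burst.

```java
public class LoadAliasingSim {

    static final long BURST_PERIOD_MS = 5000; // CPU_WAVE_INTERVAL_SECOND in the test program
    static final long BURST_LENGTH_MS = 1000; // each burst lasts about one second
    static final long TIME_UNIT_MS = 100;     // same time unit as the test program
    // Assumed decay of a 1-minute exponential moving average sampled about every 5s.
    static final double DECAY = Math.exp(-5.0 / 60.0);

    // Number of runnable tasks the sampler would see at time tMs in this model.
    static int activeTasks(long tMs, int cores, long runTimeMs) {
        long phase = tMs % BURST_PERIOD_MS;
        boolean inBurst = phase < BURST_LENGTH_MS;
        boolean running = (phase % TIME_UNIT_MS) < runTimeMs;
        return (inBurst && running) ? cores : 0;
    }

    // Returns the simulated load average after each sample.
    static double[] simulate(long samplePeriodMs, int samples, int cores, long runTimeMs) {
        double[] load = new double[samples];
        double avg = 0.0;
        for (int k = 0; k < samples; k++) {
            long t = k * samplePeriodMs;
            avg = avg * DECAY + activeTasks(t, cores, runTimeMs) * (1.0 - DECAY);
            load[k] = avg;
        }
        return load;
    }

    public static void main(String[] args) {
        long samplePeriodMs = 5004; // hypothetical: slightly longer than the 5s burst period
        double[] load = simulate(samplePeriodMs, 2500, 4, 30);
        // Print one value per 60 samples, roughly every 5 simulated minutes.
        for (int k = 0; k < load.length; k += 60) {
            double minutes = k * samplePeriodMs / 60000.0;
            System.out.printf("t=%6.1f min  load=%.2f%n", minutes, load[k]);
        }
    }
}
```

In this model, a sample interval equal to the burst period makes every sample land on the same phase of the wave, so the simulated load settles to a constant. With a slightly different interval the sampling point drifts through the burst and back out again, and the simulated load rises and falls on a cycle far longer than 5s.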