从日志上理解JAVA调试机制

  JAVA调试我想大家都玩过,或许因为非常完美地集成到了我们日常的IDE里,比如eclipse,netbeans,可是大家有没有想过它是怎么实现的呢,接下来我会从vm中打印的调试日志来跟大家认识下这个调试体系。

  大家都是一点IDE里面的调试按钮,就轻松建立了调试环境,IDE其实是帮你做了些你可能不知道的事,请看如下图

  你将会看到如下内容

  其实这就是eclipse为你构建的命令行,从中筛选出如下命令行语句

-agentlib:jdwp=transport=dt_socket,suspend=y,address=localhost:51823 

  这就是关键了,熟悉agent的人应该都知道这是加载了一个叫做jdwp的agent,也就是说vm会去找一个叫做libjdwp.so/libjdwp.dylib的动态库,于是通过nm看看里面究竟有些什么方法?

nijiaben@nijiaben : ~/Developer/open_source/openjdk/jdk7u/build/macosx-x86_64-debug/lib
[0] % nm libjdwp.dylib
00000000000152b0 T _Agent_OnLoad
0000000000016520 T _Agent_OnUnload
00000000000531c0 D _ArrayReference_Cmds
00000000000531e0 D _ArrayType_Cmds
0000000000053230 D _ClassLoaderReference_Cmds
0000000000053240 D _ClassObjectReference_Cmds
0000000000053200 D _ClassType_Cmds
0000000000053260 D _EventRequest_Cmds
0000000000053f88 S _Field_Cmds
0000000000053280 D _Method_Cmds
0000000000053668 b _NullString
00000000000532c0 D _ObjectReference_Cmds
0000000000053320 D _ReferenceType_Cmds
00000000000533c0 D _StackFrame_Cmds
00000000000533f0 D _StringReference_Cmds
0000000000053400 D _ThreadGroupReference_Cmds
0000000000053420 D _ThreadReference_Cmds
00000000000534c0 D _VirtualMachine_Cmds

  赫然看到了Agent的几个方法,Agent_OnLoad和Agent_OnUnLoad,于是更加肯定这个动态库就是一个agent实例了。 有了它自然想看看这个Agent加载的时候干了些什么事。先不急于看里面的实现,一个好的模块或者应用,往往通过详细的日志就可以知道它的大致情况,其实jdwp也是有详细的日志可看的,在上面的-agentlib:jdwp后面再加上一个参数logflags=0xfff,即-agentlib:jdwp=transport=dt_socket,suspend=y,logflags=0xfff,然后再次执行java命令,你将会在你的当前目录下生成一个类似jdwp.log.297的文件,297其实是进程的ID,那当我们仅仅是启动,还没有做任何操作的时候jdwp都干了些什么?

[#|29.05.2013 21:20:18.161 CST|FINEST|J2SE1.5|jdwp|LOC=MISC:"debugInit.c":296;;PID=297;THR=t@60821504|:Onload: transport=dt_socket,address=8000,server=y,suspend=n,logflags=0xfff|#]
[#|29.05.2013 21:20:18.161 CST|FINEST|J2SE1.5|jdwp|LOC=JVMTI:"debugInit.c":300;;PID=297;THR=t@60821504|:GetPotentialCapabilities()|#]
[#|29.05.2013 21:20:18.161 CST|FINEST|J2SE1.5|jdwp|LOC=JVMTI:"debugInit.c":359;;PID=297;THR=t@60821504|:AddCapabilities()|#]
[#|29.05.2013 21:20:18.161 CST|FINEST|J2SE1.5|jdwp|LOC=JVMTI:"debugInit.c":128;;PID=297;THR=t@60821504|:SetEventNotificationMode()|#]
[#|29.05.2013 21:20:18.161 CST|FINEST|J2SE1.5|jdwp|LOC=JVMTI:"debugInit.c":128;;PID=297;THR=t@60821504|:SetEventNotificationMode()|#]
[#|29.05.2013 21:20:18.161 CST|FINEST|J2SE1.5|jdwp|LOC=JVMTI:"debugInit.c":390;;PID=297;THR=t@60821504|:SetEventCallbacks()|#]
[#|29.05.2013 21:20:18.161 CST|FINEST|J2SE1.5|jdwp|LOC=MISC:"debugInit.c":398;;PID=297;THR=t@60821504|:OnLoad: DONE|#]

  由于篇幅的问题,就只把部分日志列出来解释下,从上面我们可以获得的执行的时间,执行的环境类型,执行的哪个文件的哪一行,进程ID,线程地址,然后就是真正的日志信息等。 首先看看LOC,是location_type的简称,那么它包括多少种呢,我想从下面的宏定义上可以看出来

#define    JDWP_LOG_JVM         0x00000001
#define    JDWP_LOG_JNI         0x00000002
#define    JDWP_LOG_JVMTI       0x00000004
#define    JDWP_LOG_MISC        0x00000008
#define    JDWP_LOG_STEP        0x00000010
#define    JDWP_LOG_LOC         0x00000020
#define    JDWP_LOG_CB          0x00000040
#define    JDWP_LOG_ERROR       0x00000080
#define    JDWP_LOG_ALL         0xffffffff

  agent的实现离不开jvmti,jdwp也同样如此,那么它究竟想利用jvmti里的哪些能力呢,从下面的代码基本可以看出来了

   /* Fill in ones that we must have */
    (void)memset(&needed_capabilities,0,sizeof(needed_capabilities));
    needed_capabilities.can_access_local_variables              = 1;
    needed_capabilities.can_generate_single_step_events         = 1;
    needed_capabilities.can_generate_exception_events           = 1;
    needed_capabilities.can_generate_frame_pop_events           = 1;
    needed_capabilities.can_generate_breakpoint_events          = 1;
    needed_capabilities.can_suspend                             = 1;
    needed_capabilities.can_generate_method_entry_events        = 1;
    needed_capabilities.can_generate_method_exit_events         = 1;
    needed_capabilities.can_generate_garbage_collection_events  = 1;
    needed_capabilities.can_maintain_original_method_order      = 1;
    needed_capabilities.can_generate_monitor_events             = 1;
    needed_capabilities.can_tag_objects                         = 1;

    /* And what potential ones that would be nice to have */
    needed_capabilities.can_force_early_return
                = potential_capabilities.can_force_early_return;
    needed_capabilities.can_generate_field_modification_events
                = potential_capabilities.can_generate_field_modification_events;
    needed_capabilities.can_generate_field_access_events
                = potential_capabilities.can_generate_field_access_events;
    needed_capabilities.can_get_bytecodes
                = potential_capabilities.can_get_bytecodes;
    needed_capabilities.can_get_synthetic_attribute
                = potential_capabilities.can_get_synthetic_attribute;
    needed_capabilities.can_get_owned_monitor_info
                = potential_capabilities.can_get_owned_monitor_info;
    needed_capabilities.can_get_current_contended_monitor
                = potential_capabilities.can_get_current_contended_monitor;
    needed_capabilities.can_get_monitor_info
                = potential_capabilities.can_get_monitor_info;
    needed_capabilities.can_pop_frame
                = potential_capabilities.can_pop_frame;
    needed_capabilities.can_redefine_classes
                = potential_capabilities.can_redefine_classes;
    needed_capabilities.can_redefine_any_class
                = potential_capabilities.can_redefine_any_class;
    needed_capabilities.can_get_owned_monitor_stack_depth_info
        = potential_capabilities.can_get_owned_monitor_stack_depth_info;
    needed_capabilities.can_get_constant_pool
                = potential_capabilities.can_get_constant_pool;
    {
        needed_capabilities.can_get_source_debug_extension      = 1;
        needed_capabilities.can_get_source_file_name            = 1;
        needed_capabilities.can_get_line_numbers                = 1;
        needed_capabilities.can_signal_thread
                = potential_capabilities.can_signal_thread;
    }


  这就对为什么我们可以通过调试体系可以看到本地变量的值,可以一步一步push/pop frame等能力不足为奇了。 因为agent是在vm初始化过程中加载的,vm的环境其实很多还没准备好,所以jvmti提供了一种能力—事件通知,在某个特定事件发生的时候,agent可以通过注册一些回调事件来做一些特殊的工作,比如在vm初始化完成之后,在类文件加载之后,在类被加载之后等等都可以做一些相关的事情,jdwp主要注册了几个回调事件,重点是vm初始化的回调函数,此时才是真正的完成jdwp的初始化动作,这也是为什么我们的调试体系不能对所有的java代码进行跟踪的原因了,比如说AppClassLoader的创建过程,这个其实都是在vm初始化过程中创建的,但是这个时候jdwp还没有初始化呢,自然也无法完成我们的debug过程了。

    gdata->callbacks.VMInit             = &cbEarlyVMInit;
    gdata->callbacks.VMDeath            = &cbEarlyVMDeath;
    gdata->callbacks.Exception  = &cbEarlyException;

  在cbEarlyVMInit中干的事就不细说了,可以通过jdwp的日志大概知道干了些啥,但是有点还是要说下,假设要大家去设计实现这个功能,大家应该都会想到它的通信场景,类似命令模式,客户端不断发命令到服务端,服务端收到命令之后然后将请求结果反馈回来,比如说我们把鼠标放到某个变量上面,我们会看到弹出个小提示板,会把这个对象的值树状列出来,其实jdwp也就是这么实现的,那我们就可能想知道jdwp到底提供了哪些命令

l1Array[JDWP_COMMAND_SET(VirtualMachine)] = (void *)VirtualMachine_Cmds;
    l1Array[JDWP_COMMAND_SET(ReferenceType)] = (void *)ReferenceType_Cmds;
    l1Array[JDWP_COMMAND_SET(ClassType)] = (void *)ClassType_Cmds;
    l1Array[JDWP_COMMAND_SET(ArrayType)] = (void *)ArrayType_Cmds;

    l1Array[JDWP_COMMAND_SET(Field)] = (void *)Field_Cmds;
    l1Array[JDWP_COMMAND_SET(Method)] = (void *)Method_Cmds;
    l1Array[JDWP_COMMAND_SET(ObjectReference)] = (void *)ObjectReference_Cmds;
    l1Array[JDWP_COMMAND_SET(StringReference)] = (void *)StringReference_Cmds;
    l1Array[JDWP_COMMAND_SET(ThreadReference)] = (void *)ThreadReference_Cmds;
    l1Array[JDWP_COMMAND_SET(ThreadGroupReference)] = (void *)ThreadGroupReference_Cmds;
    l1Array[JDWP_COMMAND_SET(ClassLoaderReference)] = (void *)ClassLoaderReference_Cmds;
    l1Array[JDWP_COMMAND_SET(ArrayReference)] = (void *)ArrayReference_Cmds;
    l1Array[JDWP_COMMAND_SET(EventRequest)] = (void *)EventRequest_Cmds;
    l1Array[JDWP_COMMAND_SET(StackFrame)] = (void *)StackFrame_Cmds;
    l1Array[JDWP_COMMAND_SET(ClassObjectReference)] = (void *)ClassObjectReference_Cmds;

  从上面我们基本看到列出的都是分类,比如我们想看字段的,执行方法的,甚至vm的,那每个分类势必包含一些关键的命令,比如

void *VirtualMachine_Cmds[] = { (void *)21
    ,(void *)version
    ,(void *)classesForSignature
    ,(void *)allClasses
    ,(void *)getAllThreads
    ,(void *)topLevelThreadGroups
    ,(void *)dispose
    ,(void *)idSizes
    ,(void *)suspend
    ,(void *)resume
    ,(void *)doExit
    ,(void *)createString
    ,(void *)capabilities
    ,(void *)classPaths
    ,(void *)disposeObjects
    ,(void *)holdEvents
    ,(void *)releaseEvents
    ,(void *)capabilitiesNew
    ,(void *)redefineClasses
    ,(void *)setDefaultStratum
    ,(void *)allClassesWithGeneric
    ,(void *)instanceCounts
};

  这就列出了我们想看vm相关信息或者操作的一些命令,其实这些都是一些function

[#|29.05.2013 21:21:22.439 CST|FINEST|J2SE1.5|jdwp|LOC=MISC:"debugLoop.c":153;;PID=297;THR=t@290471936|:Command set 1, command 10|#]

  假如我们操作之后有上面这么一条日志,我们看到取的指令集合是1,想执行这个集合中的第10条命令,指令集合为1,说明取的是上面的llArray[1],也就是上面的VirtualMachine指令集合,10表示我们找的是 VirtualMachine中的doExit这条指令,也就是说客户端希望你服务端执行这条指令,

static jboolean
doExit(PacketInputStream *in, PacketOutputStream *out)
{
    jint exitCode;

    exitCode = inStream_readInt(in);
    if (gdata->vmDead) {
        /* quietly ignore */
        return JNI_FALSE;
    }

    /* We send the reply from here because we are about to exit. */
    if (inStream_error(in)) {
        outStream_setError(out, inStream_error(in));
    }
    outStream_sendReply(out);

    forceExit(exitCode);

    /* Shouldn't get here */
    JDI_ASSERT(JNI_FALSE);

    /* Shut up the compiler */
    return JNI_FALSE;

}
/* All process exit() calls come from here */
void
forceExit(int exit_code)
{
    /* make sure the transport is closed down before we exit() */
    transport_close();
    exit(exit_code);
}

  大家也猜到了,这其实就是退出vm,其实这也是我今天想写这篇文章的一个原因,我们这边线下环境一个系统,运行一段时间之后就突然间进程消失了,没有任何的日志,排除了人为的kill,System.exit,没有crash日志,更没有coredump,实在是诡异,跟了两天,突然在远程调试的时候发现了如下选项

  所以我不得不猜想是它导致的,经过自己测试,当我点击

  这个的时候,进程果然退出了,没有任何迹象,所以晚上花了点时间写了这篇文章记录下这诡异的问题

  这个暂时就介绍到这里,虽然还比较肤浅,有机会再从原理角度介绍jdwp的具体实现。

Comments