This started when I was using the Linux build of a third-party library. Part of its file layout:
/bin/Lnx64/
|-- kmap_min
|-- libmaCliApi.so
|-- libprotobuf.so.8
|-- maClient (DIR)
    |-- libmaKernel.so
I build with the LLVM toolchain (clang + lld), but still use GNU libstdc++.
I then wrote a demo named kmap_min and linked it with each of the following two commands:
clang++ -fuse-ld=lld -Wl,-rpath=\$ORIGIN -o ../../../bin/Lnx64/kmap_min kmap_min.o -L../../../bin/Lnx64 -lpthread -lrt -ldl -lxsdk -lObjects -lhare_socket -lmaCliApi -lencrypt -lmaTradeApi -ltinyxml -v
clang++ -Wl,-rpath=\$ORIGIN -o ../../../bin/Lnx64/kmap_min kmap_min.o -L../../../bin/Lnx64 -lpthread -lrt -ldl -lxsdk -lObjects -lhare_socket -lmaCliApi -lencrypt -lmaTradeApi -ltinyxml -v
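The only difference is the linker: the first command uses LLVM's lld, the second uses GNU ld.
Strangely, the lld-linked build fails during the library's initialization.
Comparison 1: the expanded link invocations printed by -v for the two commands
They look identical.
Comparison 2: once initialization has finished, check which .so files each process has loaded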
ps uax | grep kmap_min
lsof -p 12345
The failing one has loaded far fewer shared libraries.
Time to debug
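To make this comparison less manual, the lsof output of each process can be reduced to a plain list of libraries and then diffed. This is only a minimal sketch: the helper name list_sos, the second PID, and the file names good.txt / bad.txt are made up for illustration, and it assumes the path is the last column of the lsof output and contains no spaces.

```shell
# list_sos (hypothetical helper): read `lsof -p PID` output on stdin and
# print the sorted, de-duplicated shared-object paths it mentions.
list_sos() {
    awk '{ print $NF }' | grep -E '\.so(\.[0-9]+)*$' | sort -u
}

# Usage sketch (PIDs are placeholders):
#   lsof -p 12345 | list_sos > good.txt
#   lsof -p 12346 | list_sos > bad.txt
#   comm -23 good.txt bad.txt    # libraries only the working process loaded
```

comm needs sorted input, which is why the helper sorts before printing.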
dnf debuginfo-install glibc
dnf install glibc-devel
lldb ./kmap_min
(lldb) b dlopen
(lldb) b dlmopen
(lldb) run
......
(lldb) c
Process 130429 resuming
Process 130429 stopped
* thread #1, name = 'kmap_min', stop reason = breakpoint 1.1
frame #0: 0x00007ffff77a5240 libdl.so.2`__dlopen(file="/folder/bin/Lnx64/maClient/libmaKernel.so", mode=2) at dlopen.c:75:1
72
73 void *
74 __dlopen (const char *file, int mode DL_CALLER_DECL)
-> 75 {
76 # ifdef SHARED
..... after stepping into _dlerror_run()
(lldb) bt
* thread #1, name = 'kmap_min', stop reason = step in
* frame #0: 0x00007ffff77a5977 libdl.so.2`_dlerror_run(operate=(libdl.so.2`dlopen_doit at dlopen.c:58:1), args=0x00007fffffffc920) at dlerror.c:176:28
frame #1: 0x00007ffff77a528a libdl.so.2`__dlopen(file=<unavailable>, mode=<unavailable>) at dlopen.c:87:10
(lldb) print *result
(dl_action_result) $8 = (errcode = 2, returned = 1, malloced = true, objname = "libprotobuf.so.8", errstring = "cannot open shared object file")
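Here is the odd part: libprotobuf.so.8, which libmaKernel.so depends on, sits in a directory on this executable's "RUNPATH", yet loading it still fails.
Searching online turned up a simpler approach: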
LD_DEBUG=libs ./kmap_min
This prints the whole library search and loading process. How thoughtful!
Attempt 1
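The full LD_DEBUG=libs output is long, so it helps to keep only the lines about lookups. A small sketch, assuming the glibc message wording "find library=", "search path=" and "trying file=" (other glibc versions may word these differently); the helper name ld_search_trace is made up.

```shell
# ld_search_trace (hypothetical helper): keep only the lines of LD_DEBUG=libs
# output that show which library is being looked up and where the loader looks.
ld_search_trace() {
    grep -E 'find library=|search path=|trying file='
}

# Usage sketch (LD_DEBUG output goes to stderr, hence 2>&1):
#   LD_DEBUG=libs ./kmap_min 2>&1 | ld_search_trace
```

In the "search path=" lines the loader also notes where each path came from, which is the part that matters for this problem.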
export LD_LIBRARY_PATH="."
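With this, the ELF produced by lld works too. But why?
Comparing the headers of the two files
First take the variable back out of the environment: export -n LD_LIBRARY_PATH
The two files are named kmap_min_ld (linked by GNU ld) and kmap_min_lld (linked by lld).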
[vrqq@rhel Lnx64]$ objdump -x kmap_min_ld|egrep "RPATH|RUNPATH"
RPATH $ORIGIN:$ORIGIN/maClient
[vrqq@rhel Lnx64]$ objdump -x kmap_min_lld|egrep "RPATH|RUNPATH"
RUNPATH $ORIGIN:$ORIGIN/maClient
Aha! The former has RPATH, the latter has RUNPATH.
Now a look at the .so in question:
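The same check can be wrapped up so it is quick to run on any binary. A minimal sketch: the helper name path_tag is made up, and it simply looks for the words RPATH / RUNPATH in the dump, so an unrelated line containing those words would fool it.

```shell
# path_tag (hypothetical helper): read `objdump -x FILE` or `readelf -d FILE`
# output on stdin and report which search-path tag the file carries.
# If both tags are present it reports RUNPATH, since glibc ignores RPATH
# when RUNPATH exists.
path_tag() {
    awk '
        /RUNPATH/ { runpath = 1 }
        /RPATH/   { rpath = 1 }
        END {
            if (runpath)    print "RUNPATH"
            else if (rpath) print "RPATH"
            else            print "none"
        }'
}

# Usage sketch:
#   objdump -x kmap_min_ld  | path_tag
#   objdump -x kmap_min_lld | path_tag
```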
[vrqq@rhel maClient]$ ldd ./maClient/libmaKernel.so
linux-vdso.so.1 (0x00007ffee1383000)
librt.so.1 => /lib64/librt.so.1 (0x00007f683b495000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f683b291000)
libprotobuf.so.8 => not found
......
Conclusion? Not sure yet.
https://stackoverflow.com/a/56919091/12529885
The default for most moderately recent distributions is new-dtags and
RUNPATH. If you enable RPATH via --disable-new-dtags, then the problem
would go away, because RPATH applies globally (and is different from
RUNPATH in exactly that). – Employed Russian Jul 8 '19 at 21:46
https://wiki.debian.org/RpathIssue
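This official wiki page describes RUNPATH indirectly.
After a lot of searching I still could not find a precise definition.
This question comes up a lot online, but I now roughly understand it: RUNPATH applies only to the file that carries it, that is, to that file's own direct dependencies, and is not passed down to the libraries it loads. So it is more than the "search order" difference some people describe. Here the lld-linked executable carries RUNPATH, so its $ORIGIN is not consulted when libmaKernel.so looks for libprotobuf.so.8; the GNU ld build carries RPATH, which is consulted.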
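If that reading is right, one way to get the GNU ld behaviour out of lld is to ask for the old-style tag explicitly, as the Stack Overflow comment above suggests. Both GNU ld and lld accept --disable-new-dtags and --enable-new-dtags. A sketch only: the helper name rpath_flags is made up, and I have not checked this against every library the vendor ships.

```shell
# rpath_flags (hypothetical helper): print the compiler-driver flag that makes
# the linker emit either the old DT_RPATH tag or the new DT_RUNPATH tag
# for $ORIGIN.
rpath_flags() {
    case "$1" in
        rpath)   printf '%s\n' '-Wl,--disable-new-dtags,-rpath,$ORIGIN' ;;
        runpath) printf '%s\n' '-Wl,--enable-new-dtags,-rpath,$ORIGIN' ;;
        *)       echo "usage: rpath_flags rpath|runpath" >&2; return 1 ;;
    esac
}

# Usage sketch (remaining -l options as in the link commands earlier):
#   clang++ -fuse-ld=lld "$(rpath_flags rpath)" \
#       -o ../../../bin/Lnx64/kmap_min kmap_min.o -L../../../bin/Lnx64 ...
```

The single quotes keep $ORIGIN literal so the shell does not expand it, which is what the backslash in \$ORIGIN does in the original commands.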
Appendix. What about Windows?
First, a shameless plug: https://blog.vrqq.org/archives/779/
https://stackoverflow.com/questions/2100973/dll-redirection-using-manifests
https://social.msdn.microsoft.com/Forums/vstudio/en-US/b3eaa07f-7f92-4693-8aa1-b8fee0b92d2f/cannot-load-2-dlls-with-same-name-but-different-versions?forum=vcgeneral
(Thinking about it now, with .bundle on macOS as well, every platform really has its own trick for this.)