RPATH and RUNPATH, dlopen() failure.

@vrqq  August 9, 2021

This started with the Linux build of a third-party library. Part of its file layout looks like this:

/bin/Lnx64/
   |-- kmap_min
   |-- libmaCliApi.so
   |-- libprotobuf.so.8
   |-- maClient (DIR)
       |- libmaKernel.so

I build with the LLVM toolchain (clang + lld) but still use GNU libstdc++.
I wrote a demo named kmap_min and linked it with each of the two commands below:

clang++ -fuse-ld=lld -Wl,-rpath=\$ORIGIN -o ../../../bin/Lnx64/kmap_min kmap_min.o -L../../../bin/Lnx64 -lpthread -lrt -ldl -lxsdk -lObjects -lhare_socket -lmaCliApi -lencrypt -lmaTradeApi -ltinyxml -v

clang++ -Wl,-rpath=\$ORIGIN -o ../../../bin/Lnx64/kmap_min kmap_min.o -L../../../bin/Lnx64 -lpthread -lrt -ldl -lxsdk -lObjects -lhare_socket -lmaCliApi -lencrypt -lmaTradeApi -ltinyxml -v

The only difference is the linker: the first command uses LLVM lld, the second uses GNU ld.

Strangely, the lld-linked build failed during initialization.

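Comparison 1: the expanded link lines that the two commands print with -v

They appear to be identical.

Comparison 2: once initialization has finished, check which .so files each process has loaded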

ps uax | grep kmap_min
lsof -p 12345

The failing build had loaded far fewer shared libraries.
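To see exactly which libraries are missing, one option is to compare the two processes' memory maps instead of eyeballing lsof output. The sketch below swaps lsof for /proc/<pid>/maps. The helper name loaded_libs and the second PID are made up for illustration.

```shell
# loaded_libs (hypothetical helper): read /proc/<pid>/maps-style text on stdin
# and print the unique shared-object paths it mentions, sorted.
loaded_libs() {
    awk '$NF ~ /\.so(\.[0-9.]+)?$/ { print $NF }' | sort -u
}

# Usage, kept as comments so sourcing this block has no side effects.
# 12345 and 12346 stand in for the PIDs of the working and failing runs:
#   loaded_libs < /proc/12345/maps > good.txt
#   loaded_libs < /proc/12346/maps > bad.txt
#   diff good.txt bad.txt
```

Lines that diff reports only in good.txt are the libraries the failing process never loaded.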

Start debugging

dnf debuginfo-install glibc
dnf install glibc-devel

lldb ./kmap_min
(lldb) b dlopen
(lldb) b dlmopen
(lldb) run

......
(lldb) c
Process 130429 resuming
Process 130429 stopped
* thread #1, name = 'kmap_min', stop reason = breakpoint 1.1
    frame #0: 0x00007ffff77a5240 libdl.so.2`__dlopen(file="/folder/bin/Lnx64/maClient/libmaKernel.so", mode=2) at dlopen.c:75:1
   72
   73   void *
   74   __dlopen (const char *file, int mode DL_CALLER_DECL)
-> 75   {
   76   # ifdef SHARED

..... after stepping into _dlerror_run()
(lldb) bt
* thread #1, name = 'kmap_min', stop reason = step in
  * frame #0: 0x00007ffff77a5977 libdl.so.2`_dlerror_run(operate=(libdl.so.2`dlopen_doit at dlopen.c:58:1), args=0x00007fffffffc920) at dlerror.c:176:28
    frame #1: 0x00007ffff77a528a libdl.so.2`__dlopen(file=<unavailable>, mode=<unavailable>) at dlopen.c:87:10
(lldb) print *result
(dl_action_result) $8 = (errcode = 2, returned = 1, malloced = true, objname = "libprotobuf.so.8", errstring = "cannot open shared object file")

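Here is the odd part: libprotobuf.so.8, which libmaKernel.so depends on, sits in a directory listed in this executable's "RUNPATH", yet the load still fails.

I then found a simpler approach online: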

LD_DEBUG=libs ./kmap_min

This prints the entire library search and load process. How thoughtful!
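The trace is long, so it can help to keep only the lines that show where the loader looked. This is a small filter sketch. The function name ld_debug_search_lines is made up, and the patterns assume the usual glibc wording of the LD_DEBUG=libs output.

```shell
# ld_debug_search_lines (hypothetical helper): read an LD_DEBUG=libs trace on
# stdin and keep only the lines describing library lookups.
ld_debug_search_lines() {
    grep -E 'find library=|search path=|search cache=|trying file='
}

# Usage, kept as a comment. The trace goes to stderr, hence the 2>&1:
#   LD_DEBUG=libs ./kmap_min 2>&1 | ld_debug_search_lines
```

The "search path=" lines also name the source of each directory list, which is how the trace tells you whether a RUNPATH was consulted and from which file it came.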

Attempt 1

export LD_LIBRARY_PATH="."

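With that set, the ELF produced by lld works too. But why?

Compare the headers of the two files

First drop the environment variable again: export -n LD_LIBRARY_PATH
The two binaries are named kmap_min_ld and kmap_min_lld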

[vrqq@rhel Lnx64]$ objdump -x kmap_min_ld|egrep "RPATH|RUNPATH"
  RPATH                $ORIGIN:$ORIGIN/maClient
[vrqq@rhel Lnx64]$ objdump -x kmap_min_lld|egrep "RPATH|RUNPATH"
  RUNPATH              $ORIGIN:$ORIGIN/maClient

Aha! The first has RPATH and the second has RUNPATH.
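The same check can be scripted, for instance to guard a build. The sketch below reads the objdump -x output shown above (readelf -d output should work as well). The function name dtag_kind is made up for illustration.

```shell
# dtag_kind (hypothetical helper): read `objdump -x` or `readelf -d` output on
# stdin and print RUNPATH, RPATH, or none. RUNPATH wins if both are present,
# because the loader ignores DT_RPATH when DT_RUNPATH exists.
dtag_kind() {
    awk '
        /RUNPATH/ { kind = "RUNPATH" }
        /RPATH/   { if (kind == "") kind = "RPATH" }
        END       { if (kind == "") kind = "none"; print kind }
    '
}

# Usage, kept as comments:
#   objdump -x kmap_min_ld  | dtag_kind
#   objdump -x kmap_min_lld | dtag_kind
```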

Now look at the .so in question

[vrqq@rhel maClient]$ ldd ./maClient/libmaKernel.so
        linux-vdso.so.1 (0x00007ffee1383000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f683b495000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f683b291000)
        libprotobuf.so.8 => not found
        ......
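When a library has many dependencies, the unresolved ones can be pulled out of the ldd output directly. This is a minimal sketch with a made-up helper name, missing_deps.

```shell
# missing_deps (hypothetical helper): read `ldd` output on stdin and print the
# names of dependencies that ldd reported as "not found".
missing_deps() {
    awk '/=> not found/ { print $1 }'
}

# Usage, kept as a comment:
#   ldd ./maClient/libmaKernel.so | missing_deps
```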

Conclusion? I don't know yet.
https://stackoverflow.com/a/56919091/12529885

The default for most moderately recent distributions is new-dtags and
RUNPATH. If you enable RPATH via --disable-new-dtags, then the problem
would go away, because RPATH applies globally (and is different from
RUNPATH in exactly that). – Employed Russian Jul 8 '19 at 21:46

https://wiki.debian.org/RpathIssue
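This Debian wiki page describes RUNPATH only indirectly.

After a lot of searching I still could not find a precise definition.

The question comes up a lot online, and I now roughly understand the difference. RUNPATH is used only to resolve the direct dependencies of the file that carries it and is not propagated further down the dependency chain, whereas RPATH applies to the whole tree. Here libprotobuf.so.8 is a dependency of libmaKernel.so, not of kmap_min, so the executable's RUNPATH is never consulted for it. It is not merely the "search order" issue that some posts describe.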
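Following the Stack Overflow comment quoted above, one way to make the lld build behave like the GNU ld build is to ask the linker for the old-style tag. The block below only prints a candidate link line. It is a sketch: the variable name RPATH_FLAGS is made up and the library list is shortened, so adapt it to the full command shown earlier.

```shell
# Flags that request a legacy DT_RPATH entry instead of DT_RUNPATH.
# Single quotes keep $ORIGIN literal so the shell does not expand it.
RPATH_FLAGS='-Wl,--disable-new-dtags,-rpath=$ORIGIN'

# Print the link command instead of running it. The object file and the
# single -l option are placeholders for the real list.
echo clang++ -fuse-ld=lld "$RPATH_FLAGS" -o kmap_min kmap_min.o -lmaCliApi
```

After relinking, objdump -x on the output should show an RPATH line rather than RUNPATH. The alternative that keeps RUNPATH is to give libmaKernel.so its own RUNPATH entry pointing at the directory that holds libprotobuf.so.8, which requires relinking or patching the vendor library.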

Appendix. What about Windows?

First, a shameless plug: https://blog.vrqq.org/archives/779/
https://stackoverflow.com/questions/2100973/dll-redirection-using-manifests
https://social.msdn.microsoft.com/Forums/vstudio/en-US/b3eaa07f-7f92-4693-8aa1-b8fee0b92d2f/cannot-load-2-dlls-with-same-name-but-different-versions?forum=vcgeneral

(Come to think of it, the .bundle approach on macOS shows that every platform has its own trick for this.)

