Replies: 15 comments 22 replies
cc @chenxu14
There are some critical bugs in libhdfs that have not been resolved yet.
@JkSelf Could you provide some detailed information about this case?
I would go with a third option: add Kerberos authentication and viewfs support to libhdfs3, so that we keep accessing HDFS through native code instead of the JVM and get better performance.
Posting the relevant libhdfs3 issue here: this. TL;DR: it was mentioned that Kerberos does not work for application <-> Hadoop KMS communication.
Is it possible to use the libhdfs3 maintained by ClickHouse? It still seems to be maintained, and we use it internally.
@JkSelf Can you list the features missing in libhdfs3, and say whether they have been added to clickhouse/libhdfs3? If they have, clickhouse/libhdfs3 looks like the better choice.
@zhanglistar @wypb In Gluten we did try CK/libhdfs3 before, and we also added delegation token support (oap-project/libhdfs3@9f234ed). Based on what I learned, viewfs support is also a gap. Thanks.
@majetideepak @assignUser @zhanglistar @FelixYBW We ran performance tests on the Q6 query using a 2 TB TPC-H dataset, comparing results with HDFS Short Circuit enabled and disabled. According to the data we collected, with HDFS Short Circuit enabled, libhdfs3 is approximately 1.08x faster than libhdfs. With HDFS Short Circuit disabled, the two perform about the same. Note: the observed 8% performance degradation occurs only under extreme conditions. Real-world production environments commonly use remote HDFS, so performance there matches the Short Circuit disabled case and overall performance is not affected. We also ran performance tests on 103 queries of the 2 TB TPC-DS dataset in the same environment and found that libhdfs and libhdfs3 perform comparably, with no performance degradation. Below are the machine models used for the tests:
Based on the experimental results and the vote, let's remove
Sorry, I saw this too late.
In our environment, libhdfs3 is better than libhdfs.
For us, libhdfs3 is practically the only option. We use the ClickHouse version, with some changes for Kerberos (with user impersonation). For encrypted data transfer (hadoop.rpc.protection=privacy) we restored a reverted patch.
Another issue we hit recently is that Java configurations are not fully honored by the C++ versions. This applies not only to libhdfs3 but to AWS, Azure, and GCS as well. We are considering using the Java APIs for the other remote storage systems too, as with libhdfs in Gluten.


Currently, Velox uses the C++ library libhdfs3 to support HDFS. However, when a customer's Hadoop environment has Kerberos authentication and viewfs enabled, Velox fails to connect to HDFS. We therefore plan to follow Arrow's approach: at runtime, Velox would dynamically load the Hadoop and JVM libraries installed on the system and call the JVM-based HDFS implementation. This would allow us to leverage more features from the Hadoop community. Given this requirement, should we use the JVM-based libhdfs to completely replace the C++ libhdfs3?
27 votes
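To make the "dynamically load at runtime" idea in the proposal concrete, here is a minimal sketch of the lookup-and-load step. It is written in Python with `ctypes` only for brevity; Velox itself is C++ and would use the platform's dynamic loader directly. The helper names `candidate_paths` and `load_first_available` are hypothetical, and the `$HADOOP_HOME/lib/native` location is an assumption about a typical Hadoop install, not something confirmed in this thread.

```python
import ctypes
import os


def candidate_paths(lib_name, env=None):
    """Return places to look for lib_name, most specific first.

    Assumption: native Hadoop libraries live under $HADOOP_HOME/lib/native.
    """
    env = os.environ if env is None else env
    paths = []
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        paths.append(os.path.join(hadoop_home, "lib", "native", lib_name))
    # Fall back to the bare name so the system loader searches its own paths.
    paths.append(lib_name)
    return paths


def load_first_available(paths):
    """Try each path in order; return a library handle, or None if all fail."""
    for path in paths:
        try:
            return ctypes.CDLL(path)
        except OSError:
            continue  # not present or not loadable here; try the next one
    return None


if __name__ == "__main__":
    handle = load_first_available(candidate_paths("libhdfs.so"))
    print("libhdfs loaded" if handle is not None else "libhdfs not found")
```

Because the library is resolved at runtime rather than link time, a missing Hadoop install shows up as a `None` handle that the caller can report, instead of a startup failure.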