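My Flink job consumes data from Kafka and writes it to HDFS. I use the BucketingSink sink to write the operator output to HDFS files, and I query that data through an external table created in Hive. The problem is that Hive cannot read the data in files that are still in the in-progress state, but I would like to be able to query it in Hive in real time...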

When using Flink to land Kafka data on HDFS, are there any good recommendations regarding small files and write efficiency?

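Background: we currently use Spark Streaming to consume data from Kafka and land it in an HDFS directory, which has caused two problems: 1. For topics with a large data volume, once compressed storage is enabled, the Spark Streaming job falls behind. 2. The landed data contains a large number of small files, which increases the pressure on the NameNode. For problem 1, for now...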

9 results in total


Last updated: 2024-05-01 06:48:30
