Hadoop Data Usage and Workload Analytics
Manage your Data Lake with intelligence and governance
Hadoop is fast becoming the standard platform for storing the diverse, large volumes of data available to enterprises today, and a foundation for new scales and styles of analytics. Whether on premises or in the cloud, Hadoop clusters and data lakes are now a critical part of enterprise data management infrastructure.
But as Hadoop grows into a production system with increasing data volumes, applications and users, optimizing performance and controlling costs becomes more challenging. Attunity helps by providing data usage and workload analytics for Hadoop, enabling intelligent data management.
Attunity Visibility for Hadoop provides comprehensive data usage and workload analytics for all major Hadoop distributions, including Cloudera, Hortonworks and MapR. It delivers insight based on in-depth analysis of the Hadoop environment, spanning both the storage and processing layers. With Attunity Visibility for Hadoop, enterprises can:
- Improve Workload Performance. Identify bottlenecks and monitor workloads for Apache Tez, Hive, MapReduce and Impala
- Plan and Optimize Capacity. Track and analyze size and growth of files and directories in HDFS to improve resource utilization and forecasting
- Report on Usage and Chargeback. Measure user group consumption of data and resources to assess ROI and justify investments
- Support Governance and Compliance. Capture and analyze information about user activity patterns to identify discrepancies and satisfy auditing requirements
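The chargeback capability above amounts to aggregating per-query resource consumption by business group and applying unit rates. The sketch below illustrates that idea only; the record fields, user-to-group mapping, and rates are hypothetical examples, not Attunity's actual schema or pricing.

```python
from collections import defaultdict

# Hypothetical per-query audit records (user, bytes read, CPU seconds);
# in a real deployment these would come from Hive/Impala/YARN query logs.
QUERY_LOG = [
    {"user": "alice", "bytes_read": 5_000_000_000, "cpu_seconds": 120},
    {"user": "bob",   "bytes_read": 1_000_000_000, "cpu_seconds": 30},
    {"user": "carol", "bytes_read": 8_000_000_000, "cpu_seconds": 300},
]

# Hypothetical mapping of users to business groups for chargeback.
USER_GROUP = {"alice": "marketing", "bob": "marketing", "carol": "finance"}

def chargeback(records, user_group, rate_per_gb=0.02, rate_per_cpu_s=0.001):
    """Sum consumption per group and apply illustrative unit rates."""
    totals = defaultdict(lambda: {"gb_read": 0.0, "cpu_seconds": 0, "cost": 0.0})
    for rec in records:
        group = user_group.get(rec["user"], "unassigned")
        gb = rec["bytes_read"] / 1e9
        totals[group]["gb_read"] += gb
        totals[group]["cpu_seconds"] += rec["cpu_seconds"]
        totals[group]["cost"] += gb * rate_per_gb + rec["cpu_seconds"] * rate_per_cpu_s
    return dict(totals)

bill = chargeback(QUERY_LOG, USER_GROUP)
```

With the sample records, the marketing group is billed for 6 GB read and 150 CPU seconds, and finance for 8 GB and 300 CPU seconds; the per-group totals are what a chargeback report would surface to justify investment.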
Enable intelligent data management by empowering your Hadoop team with a BI system designed specifically for them. With data usage and workload analytics, IT gains the insight to improve the analytic services it provides to the business and to optimize cost and performance. Attunity Visibility for Hadoop provides the following key capabilities:
- Track HDFS file and file group sizes to assess data growth trends
- Monitor activities by user, time and data set to ensure compliance with sensitive data requirements
- Identify bottlenecks by application, user, file, etc.
- Set thresholds to rapidly spot, diagnose and remediate issues
- Optimize performance and cost for Cloudera Impala, Tez and other Hadoop ecosystem components
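The first and fourth capabilities above (tracking HDFS sizes and setting thresholds) can be pictured with a minimal sketch: parse directory-size listings and flag anything over a limit. The sample lines mimic `hdfs dfs -du` output (logical size, disk usage, path), and the paths and 5 GiB threshold are hypothetical, not part of the product.

```python
# Sketch of HDFS directory-size tracking with a threshold alert.
# The sample text mimics `hdfs dfs -du` output; in practice you would
# capture this from the HDFS CLI or the NameNode's reporting APIs.
SAMPLE_DU_OUTPUT = """\
1073741824   3221225472   /data/raw/clickstream
536870912    1610612736   /data/raw/transactions
10737418240  32212254720  /data/staging/logs
"""

GROWTH_THRESHOLD_BYTES = 5 * 1024**3  # hypothetical 5 GiB alert threshold

def parse_du(text):
    """Parse (size, disk_usage, path) lines into a dict of path -> size."""
    sizes = {}
    for line in text.strip().splitlines():
        size, _disk_usage, path = line.split()
        sizes[path] = int(size)
    return sizes

def over_threshold(sizes, threshold=GROWTH_THRESHOLD_BYTES):
    """Return directories whose logical size exceeds the threshold."""
    return sorted(p for p, s in sizes.items() if s > threshold)

sizes = parse_du(SAMPLE_DU_OUTPUT)
alerts = over_threshold(sizes)  # ['/data/staging/logs']
```

Snapshotting these sizes on a schedule and diffing successive runs is what turns a one-off listing into the growth-trend and forecasting view the datasheet describes.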