Cloud computing is one of this century’s most significant technological innovations and a driver of critical business success criteria. Amazon Web Services (AWS) is a comprehensive, proven cloud platform with the largest market share. AWS monitoring uses various tools and services to collect and analyze data and to provide dashboards and reports for data insights. These insights in turn help identify vulnerabilities and issues, predict performance, and ensure good system health and high performance. Monitoring resources and services also helps keep usage costs low by ensuring optimum resource usage, such as using the right service, reducing underutilization, and taking advantage of reserved pricing.
Tech Vedika, with its vast experience in application development and support on the AWS cloud, provides a complete solution for AWS monitoring. This whitepaper explains our AWS Monitoring Services, the features we provide, our best practices, and how we reduce overhead for our customers so that they can focus on their business while we monitor their AWS workloads. Intended readers include IT and cloud leaders in the enterprise.
AWS Monitoring Services Overview
Tech Vedika provides a broad portfolio of AWS Monitoring Services, supported by an AWS monitoring framework and process built and refined over years of experience. This complete set of monitoring offerings enables us to continuously deliver incremental value for our customers. To address the varying needs of customers at different levels of maturity in their AWS cloud journey, Tech Vedika’s AWS Monitoring Services focus on the following areas:
- Site Performance Monitoring
- Database Monitoring
- Storage Monitoring
- Virtual Machine Monitoring
- Virtual Network Monitoring
Organizations using cloud services need a plan for defining their monitoring strategy, identifying the monitoring tools and metrics, and implementing them. The following are the prerequisites for building a successful monitoring plan.
- Define Monitoring Goals: List the goals of monitoring in terms of system health and performance.
- Capture the pain points: Define the pain points and the areas that need improvement to further enhance the performance of AWS workloads.
- Awareness of Tools: Identify in-house tools, or employ a third-party tool with assistance from vendors who have a clear understanding of the monitoring tools.
- Consensus: Stakeholders should understand and agree on the benefits AWS monitoring brings in driving business value, customer satisfaction, and overall application performance.
Before we start evaluating monitoring tools, it is important to define the key metrics for measuring system health and performance. The following are the core areas and metrics:
- Performance: These metrics include CPU Utilization, Memory Utilization, EngineCPUUtilization (for larger node types with 4 vCPUs or more), Disk Utilization, Disk Capacity, Current Connections, Network Bandwidth, Status Checks, and Latency.
- Security: Security is a key area of monitoring. Organizations need insights to secure workloads. Patterns that may indicate a threat include:
- Multiple instances started or stopped programmatically.
- Temporary security credentials with long lifetimes.
- Activities that erase CloudTrail logs.
- A new user account that deletes multiple users.
- Cost: While AWS provides automatic scalability and elasticity, it also requires continuous cost monitoring so that resource utilization stays under control and within budget. The following guidelines help in this direction:
- Resources to Requirements Mapping: Reduce cost by stopping or resizing low-utilization instances, databases, and other resources.
- Process to locate resource waste: Reduce cost by snapshotting and deleting low-utilization EBS volumes and removing idle load balancers. Use low-cost storage tiers for infrequently accessed S3 objects.
- Reliability: Logs and metrics should be used to monitor workloads and to send notifications of significant events and threshold violations.
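Two of the security patterns listed above (activities that erase CloudTrail logs, and an account that deletes multiple users) can be checked against CloudTrail-style event records. The following is a minimal sketch, not our production tooling. `StopLogging`, `DeleteTrail`, and `DeleteUser` are AWS API action names, while the helper `flag_suspicious_events`, the sample records, and the `max_deletes` threshold are assumptions made for illustration.

```python
from collections import Counter

# CloudTrail API actions that stop or remove audit logging.
LOG_TAMPERING_EVENTS = {"StopLogging", "DeleteTrail"}


def flag_suspicious_events(events, max_deletes=3):
    """Return readable findings for a list of CloudTrail-style records.

    `max_deletes` is a hypothetical threshold, not an AWS default.
    """
    findings = []
    deletes_per_user = Counter()
    for event in events:
        name = event.get("eventName", "")
        user = event.get("userIdentity", {}).get("userName", "unknown")
        if name in LOG_TAMPERING_EVENTS:
            findings.append(f"{user} called {name}")
        if name == "DeleteUser":
            deletes_per_user[user] += 1
    # Report users whose delete count reaches the threshold.
    for user, count in deletes_per_user.items():
        if count >= max_deletes:
            findings.append(f"{user} deleted {count} users")
    return findings


# Illustrative records only; real CloudTrail events carry many more fields.
sample_events = [
    {"eventName": "StopLogging", "userIdentity": {"userName": "new-admin"}},
    {"eventName": "DeleteUser", "userIdentity": {"userName": "new-admin"}},
    {"eventName": "DeleteUser", "userIdentity": {"userName": "new-admin"}},
    {"eventName": "DeleteUser", "userIdentity": {"userName": "new-admin"}},
    {"eventName": "DescribeInstances", "userIdentity": {"userName": "ops"}},
]
print(flag_suspicious_events(sample_events))
```

In practice a check like this would feed an alerting channel rather than print to the console, and the threshold would be tuned to the account's normal activity.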
Types of Monitoring
Monitoring an application deployed on the cloud requires monitoring at various levels. The following section provides an overview of the different monitoring types and metrics that need to be considered as part of a monitoring strategy.
- Site Performance Monitoring
- Network traffic, load-time, resource usage, and page availability.
- Impact of traffic and web elements on browsing performance.
- Inputs for SEO optimizations.
- Comparison with established KPIs.
- Database Monitoring
- Network throughput, client connections, I/O for read/write, burst credit balances for DB instances.
- High CPU/RAM/Disk space consumption, network traffic, DB connections, IOPS metrics.
- Storage Monitoring
- Enable remote storage monitoring
- Insights about storage volume layouts
- Detect inefficient capacities and processes.
- Detect security threats so that they can be fixed proactively.
- Virtual Machine Monitoring
- Capture the user activity and performance for organizations using cloud infra-services.
- Virtual Network Monitoring
- Monitor and protect firewalls, switches, routers, and software-based load-balancers.
- Assess network performance and point out the security issues.
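As an illustration of the "comparison with established KPIs" step under site performance monitoring, the sketch below computes page availability and a 95th-percentile load time and compares them with target values. The KPI names and numbers are assumptions chosen for the example, and the percentile uses the simple nearest-rank method.

```python
import math


def availability_pct(checks):
    """Share of successful page checks, as a percentage."""
    if not checks:
        return 0.0
    return 100.0 * sum(1 for ok in checks if ok) / len(checks)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def compare_with_kpis(checks, load_times_ms, kpis):
    """Return a dict of KPI name -> True if the KPI is met."""
    return {
        "availability": availability_pct(checks) >= kpis["min_availability_pct"],
        "p95_load_time": percentile(load_times_ms, 95) <= kpis["max_p95_load_ms"],
    }


# Hypothetical KPI targets and measurements.
kpis = {"min_availability_pct": 99.0, "max_p95_load_ms": 2000}
checks = [True] * 99 + [False]
load_times_ms = [800, 950, 1200, 1500, 2600]
print(compare_with_kpis(checks, load_times_ms, kpis))
```

Here the availability target is met, while the slowest page load pushes the 95th percentile above the assumed limit.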
AWS Monitoring Best Practices
The following are the proven best practices we utilize for our monitoring services:
- Specify Monitoring Goals: Organizations need to prioritize the mission-critical workloads and define the goals and alerts to meet these requirements first.
- Monitor Almost Everything: Issues that seem insignificant and are ignored can grow into serious problems such as business disruption.
- Automation is the key: Typical AWS implementations are large and dynamic in terms of the data and components involved, which makes monitoring every part both difficult and vital. Automation helps achieve this coverage at a lower cost.
- Start Simple: It is difficult to dig in everywhere at once. Evaluate the monitoring tools that best fit your needs.
- Adding Ownership using tags: To establish accountability in AWS monitoring, tag each instance with the user who created it.
- Capture Logs: Logs help in keeping track of compliance and troubleshooting performance issues. Following are the important logs:
- Database Logs: To detect slow queries.
- Application Logs: Understand the reasons behind application failure.
- AWS CloudTrail Logs: To track API calls made in the AWS account.
- OS Logs: To identify host-failure reasons.
- Web Server Logs: To capture firewall logs and VPC logs for patterns of access and attacks.
- Budget Control: Control over the resources in use helps in monitoring cost and avoiding unnecessary spend.
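The ownership tagging practice above can be audited with a small check. This is a sketch under stated assumptions: each instance dictionary mirrors the `InstanceId` and `Tags` fields of an EC2 instance description, while the tag key `Owner`, the helper name, and the sample data are hypothetical.

```python
def untagged_instances(instances, required_tag="Owner"):
    """Return IDs of instances that are missing the required tag.

    Each instance dict has an "InstanceId" and an optional "Tags" list
    of {"Key": ..., "Value": ...} pairs.
    """
    missing = []
    for instance in instances:
        keys = {tag["Key"] for tag in instance.get("Tags", [])}
        if required_tag not in keys:
            missing.append(instance["InstanceId"])
    return missing


# Illustrative inventory; IDs are made up.
instances = [
    {"InstanceId": "i-0001", "Tags": [{"Key": "Owner", "Value": "alice"}]},
    {"InstanceId": "i-0002", "Tags": [{"Key": "Env", "Value": "prod"}]},
    {"InstanceId": "i-0003"},
]
print(untagged_instances(instances))
```

The resulting list can be sent to the responsible team so that every instance has a named owner.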
Monitoring Tools Assessment
A well-monitored AWS workload improves functional behavior, scalability, agility, and security. Monitoring a large system is itself a significant undertaking, however, so organizations should take the time to assess their requirements thoroughly. We use the following checklist for assessment:
- How is the existing framework configured? That is, is your present infrastructure on-premise, cloud, or hybrid? How do you plan to monitor each of these? Do you want a specific tool for each, or a complete AWS monitoring system?
- Compliance Standards: What steps should be taken to comply with the governance and industry norms? Will just SaaS serve the purpose?
- Inventory Monitoring: Will the existing AWS monitoring system be enough to monitor your inventory or do you need a new tool for this?
- Replacing legacy with AWS: What are the cost and complications involved in replacing any legacy solution with the new AWS monitoring system?
- Key Metrics: Does your organization have a clear understanding of the metrics involved in AWS monitoring?
Finding answers to the above questions will help in understanding the necessity of the AWS monitoring system and ranking the metrics.
Top AWS Monitoring Tools
TechVedika provides AWS Monitoring Services using AWS and proven 3rd party tools.
- Amazon CloudWatch: CloudWatch is a part of AWS, can be set up for on-premise, cloud, and hybrid models, and helps with real-time monitoring of the entire AWS application and infrastructure. It gathers operational and monitoring data such as logs, metrics, and events. Automated dashboards visualize this data in consolidated form to provide a unified view of all the resources, services, and applications that run on AWS. Alarms based on metric values or thresholds detect abnormal application behavior, which helps maintain a high-performing, healthy environment.
- One Platform: Amazon CloudWatch provides data for all its resources, services, and applications on a single platform. This gives easy access and helps in correlating all the metrics, logs, and events.
- Metrics: Amazon CloudWatch integrates with more than 70 AWS services and provides detailed 1-minute metrics, with custom metrics at up to 1-second granularity, for in-depth analysis.
- Improve Performance and Utilization: Alarms help identify abnormal behavior, so corrective actions can be automated, including actions that guard against billing overages.
- Operational Insights: Amazon CloudWatch provides automatic dashboards, real-time data insights, and metric storage and retention in a unified way that helps in optimizing resource utilization and system performance.
- Actionable Insights: Operational issues can be troubleshot with ease when users can visualize and explore logs. CloudWatch Logs Insights scales with your log volume and complexity and provides answers in seconds. These logs can be correlated with metrics for clear operational visibility.
AWS Cloud Watch Dashboard
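To make the alarm mechanism concrete, the sketch below builds the parameters for a CloudWatch alarm on EC2 CPU utilization. The parameter names follow the CloudWatch `PutMetricAlarm` API; the helper `cpu_alarm_params`, the alarm naming scheme, the 80% threshold, and the instance ID are assumptions for illustration.

```python
def cpu_alarm_params(instance_id, threshold_pct=80.0, sns_topic_arn=None):
    """Build keyword arguments for CloudWatch's PutMetricAlarm API."""
    params = {
        "AlarmName": f"high-cpu-{instance_id}",   # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per evaluation window
        "EvaluationPeriods": 2,   # consecutive breaching windows required
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanThreshold",
    }
    if sns_topic_arn:
        # Notify an SNS topic when the alarm fires.
        params["AlarmActions"] = [sns_topic_arn]
    return params


params = cpu_alarm_params("i-0123456789abcdef0")
print(params["AlarmName"])

# With boto3 installed and AWS credentials configured, the alarm could be
# created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a plain dictionary makes the alarm definition easy to review and test before it is applied to an account.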
- New Relic APM: New Relic is a SaaS offering that supports AWS monitoring. It uses the standard Apdex (Application Performance Index) score to monitor application performance throughout the AWS stack. It sends alerts about application downtime before end users notice. New Relic also provides a list of the most time-consuming web transactions and details of each thread to help find the root cause. It provides broad visibility across the entire AWS technology stack. Salient features include the following:
- Metrics and Map: Each application transaction can be broken down to analyze its performance. Also, the service map makes it easy to understand if the distributed applications and services are performing well. Histograms and other statistical tools aid better visualization.
- Monitoring: A detailed overview of database and application performance using threshold, error rate, and response-time, helps collect diagnostic information and minimize the impact on end-users.
- Performance Analysis: Analyzing thread activity, connection pools, class load/unload, history, and application performance helps reduce the issues. Also, the most time-consuming transactions can be identified.
AWS New Relic Dashboard
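For readers unfamiliar with the Apdex score mentioned above, the sketch below computes it from a list of response times using the standard formula: samples at or below the threshold T count as satisfied, samples between T and 4T count as tolerating and are weighted by one half, and slower samples count as frustrated. The sample values and the 0.5-second threshold are illustrative assumptions.

```python
def apdex(response_times, threshold):
    """Apdex score: (satisfied + tolerating / 2) / total samples.

    Satisfied: time <= T. Tolerating: T < time <= 4T. Frustrated: time > 4T.
    """
    if not response_times:
        raise ValueError("at least one sample is required")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


samples = [0.2, 0.4, 0.6, 1.5, 3.0]   # response times in seconds
print(apdex(samples, threshold=0.5))
```

With these samples, two requests are satisfied, two are tolerating, and one is frustrated, so the score is (2 + 1) / 5.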
- AWS X-Ray: This is a user-centric, visual framework for collecting data about the requests made to your production and distributed applications, and it is well suited to debugging and analysis. Data collected from individual services in the form of segments is aggregated, based on the common request, into a single unit called a trace. The trace follows each request as it passes through each layer of the application, which helps locate the pain point. X-Ray also processes these traces to generate service graphs that help improve the overall experience of end users.
X-Ray simplifies the process by:
- Creating a service map – AWS X-Ray provides you with a map of the connection between the services used in your application. This helps in creating a dependency tree, detecting latency and errors across the entire AWS workload.
- Identifying errors and bugs – AWS X-Ray generates a response code for each request. This code can be analyzed to highlight bugs and errors without reproducing them.
- Analysis and visualization apps – A set of query APIs can be used to create user-specific analysis and visualization apps using the data from AWS X-Ray.
AWS X-Ray Dashboard
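The way segments roll up into a trace can be sketched in a few lines. The field names `trace_id`, `name`, `start_time`, `end_time`, and `error` follow the X-Ray segment document format; the helper `build_traces`, the shortened trace ID, and the service names are assumptions for illustration, and this is not how X-Ray itself is implemented.

```python
from collections import defaultdict


def build_traces(segments):
    """Group segments by trace ID and summarize each trace."""
    grouped = defaultdict(list)
    for segment in segments:
        grouped[segment["trace_id"]].append(segment)

    traces = {}
    for trace_id, parts in grouped.items():
        ordered = sorted(parts, key=lambda p: p["start_time"])
        start = min(p["start_time"] for p in parts)
        end = max(p["end_time"] for p in parts)
        traces[trace_id] = {
            "services": [p["name"] for p in ordered],   # request path in time order
            "duration": round(end - start, 3),           # end-to-end seconds
            "has_error": any(p.get("error", False) for p in parts),
        }
    return traces


# Illustrative segments for one request passing through three services.
segments = [
    {"trace_id": "1-abc", "name": "orders-service", "start_time": 0.1, "end_time": 0.8},
    {"trace_id": "1-abc", "name": "api-gateway", "start_time": 0.0, "end_time": 0.9},
    {"trace_id": "1-abc", "name": "orders-db", "start_time": 0.2, "end_time": 0.7, "error": True},
]
print(build_traces(segments))
```

The ordered list of services is the raw material for a service map, and the error flag points to the trace that needs investigation.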
- Amazon EC2 Dashboard: The EC2 Dashboard helps monitor and maintain EC2 instances and infrastructure. It is useful for the following:
- View instance states and service health.
- Manage alarms and status reports.
- View scheduled events.
- Assess volume and instance metrics.
- Amazon GuardDuty: Amazon GuardDuty is a threat detection service that continuously monitors your AWS accounts and workloads for malicious activity and delivers detailed security findings for visibility and remediation.
Tech Vedika provides the following services using Amazon GuardDuty:
- Stop unauthorized activity
- Guard against the use of compromised credentials, unusual data access in Amazon Simple Storage Service (S3), API calls from known malicious IP addresses, and more.
- Enable continuous monitoring and analysis
- Gain insight into security events with findings that provide context, metadata, and details on impacted resources.
- Simplify forensics
- Quickly determine the root cause of suspicious activities
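GuardDuty findings can be triaged so that the most severe ones are investigated first. The sketch below assumes each finding carries the `Type`, `Severity`, and `Resource.ResourceType` fields found in GuardDuty findings. The severity values in the sample, the 7.0 cut-off, and the helper name are assumptions and should be checked against the GuardDuty documentation before use.

```python
def triage_findings(findings, min_severity=7.0):
    """Return (type, resource type) pairs at or above `min_severity`,
    most severe first. The default cut-off is an assumption."""
    urgent = [f for f in findings if f["Severity"] >= min_severity]
    urgent.sort(key=lambda f: f["Severity"], reverse=True)
    return [(f["Type"], f["Resource"]["ResourceType"]) for f in urgent]


# Illustrative findings; severity values are made up for the example.
findings = [
    {"Type": "Recon:EC2/PortProbeUnprotectedPort", "Severity": 2.0,
     "Resource": {"ResourceType": "Instance"}},
    {"Type": "UnauthorizedAccess:IAMUser/MaliciousIPCaller", "Severity": 5.0,
     "Resource": {"ResourceType": "AccessKey"}},
    {"Type": "CryptoCurrency:EC2/BitcoinTool.B!DNS", "Severity": 8.0,
     "Resource": {"ResourceType": "Instance"}},
]
print(triage_findings(findings))
```

Lowering `min_severity` widens the review queue, which is useful during a periodic sweep rather than live incident response.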
Tech Vedika is well-equipped to provide consulting and implementation services for AWS Monitoring with experts and tools for improving system health and performance to bring the best RoI from your cloud investments and experience. We are backed by our existing satisfied customers, who trust our quality of work and delivery. Leveraging extensive industry experience and a track record of proven delivery excellence, Tech Vedika brings the expertise and confidence enterprises need to effectively execute their AWS Monitoring strategy.
This whitepaper describes in detail the components of the AWS Managed Services we offer and the benefits an organization can realize by embracing Cloud Managed Services. Managing services in the cloud requires professionals with a good understanding of cloud services, cloud-native development, and tools for monitoring, alerting, and handling incidents proactively. We also put forward the unique value proposition Tech Vedika brings to help organizations enhance the ROI from their cloud investments and improve overall application performance and security.
Cloud Managed services are either the partial or complete management and control of a client’s cloud platform. The services include migration, maintenance, and optimization of infra and applications hosted on the cloud. By using a managed cloud service provider, a business can ensure that its cloud resources are used efficiently to keep cost and performance at optimal levels.
The cloud management lifecycle includes the configuration and management of the cloud environment’s core processes, services, operations, and support components. Clients can adopt Cloud Managed Services throughout the cloud management lifecycle. These services can be used to help with an initial adoption or can be provided continually.
AWS Managed Services – Overview
Tech Vedika’s offering for AWS Managed Services comprises the following core services:
- An assessment of the current AWS support services
- AWS Services support for the full DevOps cycle including development, testing, staging, and production deployment
- Design and implementation of cloud services and resources to set up new systems
- Operations management with 24/7 active monitoring & IT support
- Cloud account security and compliance
- Cloud resource optimization which includes cost and performance optimization
- Infrastructure as code and automation
By offloading these types of tasks to Tech Vedika, organizations can free up their internal IT teams to focus on more complex initiatives and efforts that drive new business outcomes.
AWS Managed Service Offerings
The following are the service offerings that are included in Tech Vedika’s AWS Managed Services model:
Application Deployment & Configuration
- Tailored for providing deployment support to Project Teams
- Design Deployment Architecture
- Setup Environments
- Develop Scripts & Automate Deployment
- Manage User Access to Infra Services
24/7 Infra and Application Performance Monitoring
- Tag Infra and App Services for collecting data
- Configure AWS and 3rd Party tools for Infra and App Monitoring
- Setup thresholds and alerts
- Proactively identify Performance Issues for Corrective Action
- Ensure Optimal Application performance
Incident Analysis & Resolution
- Help Desk (L1) Support
- Prepare User Guides & Technical Support docs
- Application Usage, Dashboards & Reports
- Involve AWS SREs (L2) for root cause analysis and incident resolution
- Route incidents to Application Development Team (L3) as needed
- SLA and incident management reporting
AWS Backup and Disaster Recovery
- Design Backup & DR Strategy
- Setup Backup & DR Environments
- Configure AWS tools and build scripts for Backup and archival
- Support for Restore Operations
Account Monitoring & Cost Optimization
- Manage AWS account and user access
- Monthly reports on resource utilization and billing
- Usage and Cost Monitoring for Reporting, Billing Analysis, and Cost Optimization
- Analysis of unused, over-provisioned resources
- Recommendations for Cost Optimization
- Implement proven strategies for Cost Reduction
- Implement CI/CD Pipelines using AWS & 3rd Party Tools
- Source Code control Audit & Management
- Provisioning, configuring and managing AWS infra resources using AWS tools & templates
- Deployment of microservices using AWS container services
- Log collection and analysis for app/infra health monitoring using AWS and 3rd Party Tools
Managed Services Scope – AWS Infra Support
- 24/7 Monitoring of the Cloud infrastructure
- Basic Monitoring of Infrastructure using tools such as CloudWatch and CloudTrail (Optional RAM, Disk File Metrics)
- Manage cloud services and components
- Elastic Cloud Compute (EC2) instances
- Network and Security (public subnets, private subnets, security rules, VLAN, etc.)
- Identity and Access Management
- Load Balancer
- Take backups and snapshots of application servers and database
- Create new AWS resources from pre-configured AMIs, create read replicas, and automate backups for new resources added alongside existing AWS resources
- Provide dashboards for monitoring and SLA reports using the ITSM tool
TechVedika Managed Services Model
Tech Vedika managed services are based on SLA-driven technical support.
Alerts triggered by the monitoring tools are converted to incidents using the ITSM tool. The cloud monitoring team acknowledges each incident based on its priority and assigns it to the CloudOps team for resolution. Customers are given access to the ticket management system for raising support requests. Each service request (SR) raised is then routed to the CloudOps team for resolution.
The diagram below depicts the flow of incidents and service requests.
The service level agreement is prepared with inputs from the customer’s business and IT teams, and the expected time to resolve issues is determined based on system impact and customer expectations.
|Priority|Priority Definition|Response Time|Resolution Time in hours|
|---|---|---|---|
|Priority 1 (P1)| | | |
|Priority 2 (P2)| | | |
|Priority 3 (P3)| | | |
|Priority 4 (P4)| | | |
In addition to the priorities defined above, customers can raise service requests for ad-hoc activities through the ticket system. These ad-hoc requests are treated as low-priority service requests, and their completion time is decided based on the type and scope of the activity.
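The flow described in this section, from alert to incident to the CloudOps queue, can be sketched as follows. Everything here is illustrative: the severity-to-priority mapping, the `Ticket` fields, and the `SR` label for service requests are assumptions, since the actual values come from the SLA agreed with each customer and from the ITSM tool in use.

```python
from dataclasses import dataclass

# Hypothetical mapping from alert severity to ticket priority; the real
# mapping would come from the agreed SLA.
SEVERITY_TO_PRIORITY = {"critical": "P1", "high": "P2", "medium": "P3", "low": "P4"}

# Service requests ("SR") sort after all incident priorities.
PRIORITY_ORDER = {"P1": 0, "P2": 1, "P3": 2, "P4": 3, "SR": 4}


@dataclass
class Ticket:
    kind: str                      # "incident" or "service_request"
    priority: str
    summary: str
    assigned_to: str = "CloudOps"  # both ticket kinds are routed to CloudOps


def ticket_from_alert(alert):
    """Convert a monitoring alert into an incident ticket."""
    priority = SEVERITY_TO_PRIORITY.get(alert["severity"], "P4")
    return Ticket("incident", priority, alert["message"])


def ticket_from_request(summary):
    """Log an ad-hoc customer request as a low-priority service request."""
    return Ticket("service_request", "SR", summary)


def work_queue(tickets):
    """Order tickets so P1 incidents come first and service requests last."""
    return sorted(tickets, key=lambda t: PRIORITY_ORDER[t.priority])


tickets = [
    ticket_from_request("Add a read replica"),
    ticket_from_alert({"severity": "high", "message": "CPU above threshold"}),
    ticket_from_alert({"severity": "critical", "message": "Site down"}),
]
for ticket in work_queue(tickets):
    print(ticket.priority, ticket.kind, ticket.summary)
```

Response and resolution targets are deliberately left out of the sketch because they are defined per customer in the SLA.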
Managed Services Governance & Reporting Model
Tech Vedika will set up a governance model comprising a steering committee and other stakeholders. Tech Vedika recommends the following communication and issue resolution process, in the form of weekly and monthly updates, to ensure that the managed service is effective.
|Meeting|Schedule|Objectives|Members involved (both Tech Vedika and Customer)|
|---|---|---|---|
|Project Meetings|Weekly|Review status against plan within project management; understand any issues that are impacting progress|Project Manager, Support team lead|
|Steering Committee Meetings|Monthly|Review progress and provide guidance and oversight to the project; monitor and communicate progress and status to company executives on a regular basis; commit the required resources to the program|Members of the Steering Committee from Tech Vedika and Customer (Sponsors, Business leaders, Tech Vedika Delivery Executive); Tech Vedika and Customer Project Managers|
Why TechVedika for Cloud Managed Services?
Tech Vedika has proven experience in cloud-native development and ongoing support of cloud-hosted applications. Tech Vedika brings technical architects and consultants with strong experience in handling public cloud platforms.
At Tech Vedika, we are committed to helping businesses leverage custom cloud solutions to control costs and automate critical processes. As a cloud managed services provider, we set up, manage, and protect your cloud environment so you can focus on growing your business. With an experienced and certified AWS team, along with our processes, tools, and framework, Tech Vedika can help manage customers’ workloads in a more optimized and secure way.
Organizations using AWS cloud services often face challenges in ensuring that their workloads are reliable, secure, efficient, and cost-effective. As cloud adoption grows, and with it the extent and scale of AWS infrastructure and service usage, there is always room for improvement in performance, security, cost, and architecture.
Cloud Consulting Service Offerings
TechVedika offers the following AWS consulting services to help organizations in increasing their ROI from cloud investments:
- Performance Assessment & Optimization
- Security & Compliance Assessment
- Cost Optimization Recommendations
- Cloud Architecture Services
1. Performance Assessment & Optimization
The objective of our performance optimization service is to help organizations in selecting the right resource types and sizes optimized for workload requirements, monitoring performance, and maintaining efficiency in accordance with business growth and challenges.
The activities performed as part of our performance assessment engagement are:
- Understand the customer application portfolio on AWS cloud
- Identify AWS services being used
- Understand the performance monitoring policies in place (rules and thresholds)
- Analysis of the existing log data using AWS CloudWatch and/or 3rd party tools to identify performance bottleneck areas
- Prioritization of improvement areas with inputs from customer Business and IT Teams
- Performance Baselining
- Recommendations for Performance Optimization
- Implementation Roadmap and guidance for Performance Tuning, defining/updating monitoring policies
- Optimize usage of AWS resources for cost optimization
- Ensure workloads are optimized for maximum performance and SLA adherence
- Enhance operational efficiency and productivity
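Performance baselining, one of the activities listed above, can be illustrated with a simple statistical baseline: historical samples define a mean and standard deviation, and recent values well above that range are flagged. The two-sigma limit, the helper names, and the sample data are assumptions; real baselines usually also account for daily and weekly patterns.

```python
from statistics import mean, pstdev


def baseline(samples):
    """Return (mean, population standard deviation) of historical samples."""
    return mean(samples), pstdev(samples)


def deviations(samples, recent, sigmas=2.0):
    """Return recent values more than `sigmas` deviations above the baseline."""
    avg, spread = baseline(samples)
    limit = avg + sigmas * spread
    return [value for value in recent if value > limit]


history = [40, 42, 38, 41, 39, 40]   # e.g. CPU utilization in percent
print(deviations(history, recent=[41, 55, 43]))
```

Values flagged this way are candidates for review, not automatic incidents, because a short history produces a narrow baseline.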
2. Security & Compliance Assessment
Our security & compliance assessment engagement focuses on protecting information and systems, including the confidentiality and integrity of data, managing user permissions, and establishing controls to detect security events.
The activities performed as part of our security & compliance assessment engagement are:
- Understand the customer application portfolio and security practices
- AWS infrastructure Monitoring for identifying Security Risks
- Perform Security Vulnerability and Compliance Assessment based on CIS Benchmarks with a focus on
- Infra Security
- Configuration Management Practices
- Identity and Access Control Policies
- Monitoring and Logging Tools and Processes
- Patching and Hardening practices
- Compliance assessment (SOC, HIPAA, PCI, GDPR)
- Security Enhancement Recommendations such as
- Defense in Depth by strengthening Endpoint and Network Security using proven 3rd party tools
- Setting up AWS Config Rules and Alerts for detecting common misconfigurations
- Enable Security Configuration for Distributed Denial of Service (DDoS) Mitigation
- Enforce Firewall Management using AWS Web Application Firewall (WAF)
- Use AWS GuardDuty/Amazon Macie for Alert Investigation and Remediations
- Incident Management workflow for the analysis and resolution of High Priority Security Incidents
- Ensure adherence to Data Privacy, Security, and Compliance to meet Customer and Regulatory authority requirements
- Faster detection and resolution of Security Incidents
- 15 to 20% reduction in security incidents
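One common misconfiguration that such an assessment looks for is a security group that exposes an administrative port to the whole internet. The sketch below scans security group descriptions for rules that allow a given port from `0.0.0.0/0`. The `GroupId`, `IpPermissions`, `IpProtocol`, `FromPort`, `ToPort`, and `IpRanges` fields follow the EC2 security group description format; the helper name and the sample groups are hypothetical, and the check is simplified to IPv4 rules.

```python
def open_to_world(security_groups, port=22):
    """Return IDs of groups that allow `port` from any IPv4 address."""
    flagged = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            if rule.get("IpProtocol") == "-1":
                covers_port = True   # "-1" means all protocols and ports
            else:
                covers_port = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
            cidrs = {r["CidrIp"] for r in rule.get("IpRanges", [])}
            if covers_port and "0.0.0.0/0" in cidrs:
                flagged.append(group["GroupId"])
                break   # one matching rule is enough to flag the group
    return flagged


def _rule(protocol, cidr, port=None):
    """Build a minimal rule dict for the illustrative data below."""
    rule = {"IpProtocol": protocol, "IpRanges": [{"CidrIp": cidr}]}
    if port is not None:
        rule.update({"FromPort": port, "ToPort": port})
    return rule


# Illustrative groups; IDs are made up.
groups = [
    {"GroupId": "sg-ssh-open", "IpPermissions": [_rule("tcp", "0.0.0.0/0", 22)]},
    {"GroupId": "sg-web", "IpPermissions": [_rule("tcp", "0.0.0.0/0", 443)]},
    {"GroupId": "sg-office", "IpPermissions": [_rule("tcp", "203.0.113.0/24", 22)]},
    {"GroupId": "sg-all", "IpPermissions": [_rule("-1", "0.0.0.0/0")]},
]
print(open_to_world(groups))
```

A managed AWS Config rule can enforce the same policy continuously; the script form is useful for a one-off assessment report.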
3. Cost Optimization Recommendations
As part of our Cost Optimization Service, Tech Vedika helps organizations analyze the costs of their existing AWS infrastructure and reduce overall expenditure. While trimming unnecessary costs, we take business needs into account and do not compromise security or performance.
This service includes the following activities:
- Cost Assessment – Analysis of current usage and billing for AWS services
- Recommendations for Cost Optimization – for instance:
- Right-sizing instances
- Identification of underutilized instances
- Downgrade underutilized instances
- Opting for reserved, spot instances
- Deleting old snapshots
- Deleting unused/unattached Volumes
- Storage optimization
- Setting up mandatory cost Tagging to categorize resources by the owner, business unit, and environment towards cost accountability and optimization
- Choice of newer and better AWS services that can replace existing services
- Provide checklists and proven Strategies for Cost Reduction
- Recommend AWS tools and 3rd party tools for cost optimization
- Cost Monitoring techniques for ongoing Monitoring of Resource Usage and Cost for further optimization
- 30-40% reduction in annual subscription costs
- Proactive and ongoing Cost optimization for reduced spend and better budget control
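Two of the recommendations above, deleting unattached volumes and deleting old snapshots, start with finding the candidates. The sketch below works on records shaped like EC2 volume and snapshot descriptions (`VolumeId`, `State`, `SnapshotId`, `StartTime`). The 90-day retention limit, the helper names, and the sample data are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def unattached_volumes(volumes):
    """Return IDs of volumes in the "available" state (not attached)."""
    return [v["VolumeId"] for v in volumes if v["State"] == "available"]


def old_snapshots(snapshots, now, max_age_days=90):
    """Return IDs of snapshots older than `max_age_days`.

    The 90-day default is a hypothetical retention limit.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


# Illustrative inventory with a fixed reference date.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
volumes = [
    {"VolumeId": "vol-1", "State": "in-use"},
    {"VolumeId": "vol-2", "State": "available"},
]
snapshots = [
    {"SnapshotId": "snap-1", "StartTime": now - timedelta(days=200)},
    {"SnapshotId": "snap-2", "StartTime": now - timedelta(days=10)},
]
print(unattached_volumes(volumes), old_snapshots(snapshots, now))
```

The output is a review list; deletion should follow only after confirming with the resource owner that the data is no longer needed.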
4. Cloud Architecture Services
Tech Vedika’s AWS Architecture reviews help organizations in studying the state of current workloads and architecture vis-a-vis AWS architectural best practices. The outcome of these services will enable realizing a high-performing AWS infrastructure with well-architected cloud-native applications.
This service includes the following activities:
- AWS Application Portfolio Assessment for AWS Services Usage
- Architecture Reviews based on AWS Well-Architected Framework
- A detailed target architecture blueprint and recommendations
- Implementation Roadmap and guidance for realizing the target architecture
- Solution Architecture for New Development
- Cloud to Cloud and Cloud to On-premise Integration Architecture
- Microservices Design and Deployment including CQRS and Event sourcing
- API Management
Why Tech Vedika for AWS Cloud Consulting Services?
Tech Vedika has proven experience in providing AWS cloud consulting services to organizations at various levels of maturity in cloud adoption. We bring experienced and certified technical architects and consultants with strong skills in handling public cloud platforms. We have helped multiple organizations that already had AWS cloud programs in place but had not realized their full value. Our architects help adopt and improve cloud-native approaches with container and microservices-driven application practices.