TL Consulting Group

tools

How Exploratory Data Analysis (EDA) Can Improve Your Data Understanding Capability

How Exploratory Data Analysis (EDA) Can Improve Your Data Understanding Capability Can EDA help to make my phone upgrade decision more precise? You may have heard the term Exploratory Data Analysis (or EDA for short) and wondered what EDA is all about. Recently, one of the Sales team members at TL Consulting Group were thinking of buying a new phone but they were overwhelmed by the many options and they needed to make a decision suited best to their work needs, i.e. Wait for the new iPhone or make an upgrade on the current Android phone. There can be no disagreement on the fact that doing so left them perplexed and with a number of questions that needed to be addressed before making a choice. What was the specification of the new phone and how was that phone better than their current mobile phone? To help enable curiosity and decision-making, they visited YouTube to view the new iPhone trailer and also learned more about the new iPhone via user ratings and reviews from YouTube and other websites. Then they came and asked us how we would approach it from a Data Analytics perspective in theory. And our response was, whatever investigating measures they had already taken before making the decision, this is nothing more but what ML Engineers/data analysts in their lingo call ‘Exploratory Data Analysis’. What is Exploratory Data Analysis? In an automated data pipeline, exploratory data analysis (EDA) entails using data visualisation and statistical tools to acquire insights and knowledge from the data as it travels through the pipeline. At each level of the pipeline, the goal is to find patterns, trends, anomalies, and potential concerns in the data. Exploratory Data Analysis Lifecycle To interpret the diagram and the iPhone scenario in mind, you can think of all brand-new iPhones as a “population” and to make its review, the reviewers will take some iPhones from the market which you can say is a “sample”. The reviewers will then experiment with that phone and will apply different mathematical calculations to define the “probability” if that phone is worth buying or not. It will also help to define all the good and bad properties of the new iPhone which is called “Inference “. Finally, all these outcomes will help potential customers to make their decision with confidence. Benefits of Exploratory Data Analysis The main idea of exploratory data analysis is “Garbage in, Perform Exploratory Data Analysis, possibly Garbage out.” By conducting EDA, it is possible to turn an almost usable dataset into a completely usable dataset. It includes: Key Steps of EDA The key steps involved in conducting EDA on an automated data pipeline are: Types of Exploratory Data Analysis EDA builds a robust understanding of the data, and issues associated with either the info or process. It’s a scientific approach to getting the story of the data. There are four main types of exploratory data analysis which are listed below: 1. Univariate Non-Graphical Let’s say you decide to purchase a new iPhone solely based on its battery size, disregarding all other considerations. You can use univariate non-graphical analysis which is the most basic type of data analysis because we only utilize one variable to gather the data. Knowing the underlying sample distribution and data and drawing conclusions about the population are the usual objectives of univariate non-graphical EDA. Additionally included in the analysis is outlier detection. The traits of population dispersal include: Spread: Spread serves as a gauge for how far away from the Centre we should search for the information values. Two relevant measurements of spread are the variance and the quality deviation. Because the variance is the root of the variance, it is defined as the mean of the squares of the individual deviations. Central tendency: Typical or middle values are related to the central tendency or position of the distribution. Statistics with names like mean, median, and sometimes mode are valuable indicators of central tendency; the mean is the most prevalent. The median may be preferred in cases of skewed distribution or when there is worry about outliers. Skewness and kurtosis: The distribution’s skewness and kurtosis are two more useful univariate characteristics. When compared to a normal distribution, kurtosis and skewness are two different measures of peakedness. 2. Multivariate Non-Graphical Think about a situation where you want to purchase a new iPhone solely based on the battery capacity and phone size. In either cross-tabulation or statistics, multivariate non-graphical EDA techniques are frequently used to illustrate the relationship between two or more variables. An expansion of tabulation known as cross-tabulation is very helpful for categorical data. By creating a two-way table with column headings that correspond to the amount of one variable and row headings that correspond to the amount of the opposing two variables, a cross-tabulation is preferred for two variables. All subjects that share an analogous pair of levels are then included in the counts. For each categorical variable and one quantitative variable, we create statistics for quantitative variables separately for every level of the specific variable then compare the statistics across the amount of categorical variable. It is possible that comparing medians is a robust version of one-way ANOVA, whereas comparing means is a quick version of ANOVA. 3. Univariate Graphical Different Univariate Graphics Imagine that you only want to know the latest iPhone’s speed based on its CPU benchmark results before you decide to purchase it. Since graphical approaches demand some level of subjective interpretation in addition to being quantitative and objective, they are utilized more frequently than non-graphical methods because they can provide a comprehensive picture of the facts. Some common sorts of univariate graphics are: Boxplots: Boxplots are excellent for displaying data on central tendency, showing reliable measures of location and spread, as well as information on symmetry and outliers, but they can be deceptive when it comes to multimodality. The type of side-by-side boxplots is among the simplest applications for boxplots. Histogram: A histogram, which can be a barplot

How Exploratory Data Analysis (EDA) Can Improve Your Data Understanding Capability Read More »

Data & AI, , , , , , , , ,

The Importance of Feature Engineering in ML Modelling

When building Machine Learning (ML) models, we often encounter unorganised and chaotic data. In order to transform this data into explainable features, we rely on the process of feature engineering. Feature engineering plays a crucial role in the Cross Industry Standard Process for Data Mining (CRISP-DM). It is an integral part of the Data Preparation Step, responsible for organising the data effectively before it is ready for modeling. The diagram below illustrates the significance of feature engineering (FE) in the data mining process. CRISP-DM Process Model What is Feature Engineering? Feature Engineering (FE) is the process of extracting and organising important information from raw data in such a way that it fits the machine learning (ML) model. Feature Engineering Process(FE) Source: https://www.omnisci.com/technical-glossary/feature-engineering) Why is Feature Engineering Important? Feature Engineering (FE) has many benefits to offer in the CRISP-DM process. They include: Provides more flexibility and less complexity in models Faster data processing Understanding of models becomes easier Better understanding of the problem and questions to be answered Feature Engineering Techniques for Machine Learning (ML) Below is a list of feature engineering techniques and we will summarise each: Imputation: Handling Outliers Log Transformation One-Hot Encoding Scaling 1. Imputation Missing values is one of the most typical problems when it comes to data preparation. Human errors and dataflow interruptions are some of the major contributors to this problem. Moreover, missing values can detrimentally impact the performance of the ML models. An example of an imputation of NA values with Zero Imputation is frequently employed in healthcare research, such as when dealing with patient records that may have missing values for certain medical measurements. By imputing the missing data using methods like mean imputation or regression imputation, researchers can ensure that a complete dataset is available for analysis, allowing for more accurate assessments and predictions. 2. Handling Outliers Handling Outliers within datasets is an important technique with the purpose of creating an accurate representation of the data. This step must be completed prior to the model training step. There are various methods of handling outliers that include removal, replacing values, capping, and discretization. These methods will be discussed in detail in future blogs. An example of outliers Handling outliers is essential in financial analysis, for instance, when examining stock market data. By detecting and appropriately treating outliers using techniques like Winsorization or trimming, analysts can ensure that extreme values do not unduly influence statistical measures, leading to more robust and reliable insights and decision-making. 3. Log Transformation Log Transformation is one of the most prevalent methods used by data professionals. The technique transforms a skewed distribution of data into normally distributed or slightly skewed data. Therefore, making the data approximate for normal applications is required for different kinds of data analysis. Examples of Log Transformed Data Log transformation is commonly applied in skewed data distributions, such as when dealing with income or population data. By taking the logarithm of the values, the skewed distribution can be transformed into a more symmetric shape, facilitating more accurate modeling, analysis, and interpretation of the data. 4. One Hot Encoding One-Hot Encoding is a technique of preprocessing categorical variables into ML models. The encoding transforms a category variable into a binary feature for each category. It typically assigns a value of ‘1’ to the binary feature it corresponds to and all other binary features are set to ‘0’. An example of One Hot Encoding One-hot encoding is widely used in categorical data processing, such as in natural language processing tasks like sentiment analysis. By converting categorical variables into binary vectors, each representing a unique category, one-hot encoding enables machine learning algorithms to effectively interpret and utilize categorical data, facilitating accurate classification and prediction tasks. 5. Scaling Feature scaling is one of the hardest problems in data science to get right. However, it is not a mandatory step for all machine learning models. It is only applicable to distance-based machine learning models. The training model process requires data with a known set of features that need to be scaled up or down where it is deemed appropriate. The outcome of the scaling operation transforms continuous data to be similar in terms of range. The most popular techniques for scaling are Normalization and Standardisation, which will be discussed in detail in future blogs. Examples of Scaling Scaling is often used in image processing, such as when resizing images for a computer vision task. Scaling the images to a consistent size, regardless of their original dimensions ensures that the images can be properly processed and analysed, allowing for fair comparisons and accurate feature extraction in tasks like object recognition or image classification. Feature Engineering Tools There are a set of feature engineering tools that are popular in the market in terms of the capabilities it provides. We have listed a few of our recommendations: FeatureTools AutoFeat TsFresh OneBM ExploreKit Conclusion In summary, Feature Engineering is a crucial step in the CRISP-DM process before we even think about training our machine learning models. One of the core advantages include the training time of models is reduced significantly. As a result, it allows for a drastic reduction of cost in terms of utilisation of expensive computing resources. In this article, we learned a number of feature engineering techniques and tools that are used in the industry. Here at TL Consulting, our data consultants are experts at using feature engineering techniques to build highly accurate machine learning models, enabling us to deliver high-quality outcomes to support our customer’s data analytics needs. TL Consulting provides advisory and transformation services in the data analytics & engineering domain and has helped many organisations achieve their digital transformation goals. Visit TL Consulting’s data-engineering page to learn more about our service capabilities and send us an enquiry if you’d like to learn more about how our dedicated consultants can help you.

The Importance of Feature Engineering in ML Modelling Read More »

Data & AI, , , ,

Measuring Success Metrics that Matter

Measuring DevSecOps Success: Metrics that Matter In today’s fast-paced digital world, security threats are constantly evolving, and organisations are struggling to keep up with the pace of change. According to a recent Cost of a Data Breach Report by IBM, the average total cost of a data breach reached a record high of $4.35 million, with the average time to identify and contain a data breach taking 287 days. To mitigate these risks, enterprises are turning to DevSecOps, an approach that integrates security into the software development process. However, just adopting DevSecOps is not enough. Organisations must continually evaluate the effectiveness of their DevSecOps practices to ensure that they are adequately protecting their systems and data. As more businesses embrace DevSecOps, measuring DevSecOps success has become a critical component of security strategy. DevSecOps KPIs enable you to monitor and assess the advancement and effectiveness of DevSecOps practices within your software development pipeline, offering comprehensive insights into the determinants that impact success. These critical indicators facilitate the evaluation and measurement of collaborative workflows by development, security, and operations teams. By utilising these metrics, you can monitor the progress of your business objectives, such as expedited software-delivery lifecycles, enhanced security, and improved quality. Moreover, these key metrics furnish vital data for transparency and control throughout the development pipeline, facilitating the streamlining of development and enhancement of software security and infrastructure. Additionally, you can identify software defects and track the average time required to rectify those flaws. Number of Security Incidents One critical metric to track is the number of security incidents. Tracking the number of security incidents can help organisations identify the most common types of incidents and assess the frequency of incidents. By doing so, they can prioritise their efforts to address the most common issues and improve their overall security posture. Organisations can track the number of security incidents through various tools such as security incident and event management (SIEM) systems or logging and monitoring tools. By analysing the data from these tools, one can identify patterns and trends in the types of security incidents occurring and use this information to prioritise their security efforts. For instance, if an organisation finds that phishing attacks are the most common type of security incident, they can focus on training employees to be more vigilant against phishing attempts. Time to Remediate Security Issues Another essential metric to track is the time it takes to remediate security issues. This metric can help organisations identify bottlenecks in their security processes and improve their incident response time. By reducing the time, it takes to remediate security issues, organisations can minimise the impact of security incidents and ensure that their products remain secure. This metric can be tracked by setting up a process to monitor security vulnerabilities and track the time it takes to fix them. This process can include automated vulnerability scanning and testing tools, as well as manual code reviews and penetration testing. By tracking the time it takes to remediate security issues, organisations can identify areas where their security processes may be slowing down and work to improve those processes. Code Quality Metrics Code quality is another important aspect of DevSecOps, and tracking code quality metrics can provide valuable insights into the effectiveness of DevSecOps practices. Code quality metrics such as code complexity, maintainability, and test coverage can be tracked using code analysis tools such as SonarQube or CheckMarx. These tools can provide insights into the quality of the code being produced and identify areas where improvements can be made. For example, if a business finds that their code has high complexity, they can work to simplify the code to make it more maintainable and easier to secure. Compliance Metrics Compliance is another essential aspect of security, and measuring compliance metrics can help organisations ensure that they are meeting the necessary regulatory and industry standards. Tracking compliance metrics such as the number of compliance violations and the time to remediate them can help organisations identify compliance gaps and address them. Additionally, to ensure security, monitoring, vulnerability scanning, and vulnerability fixes are regularly conducted on all workstations and servers. Compliance metrics such as the number of compliance violations can be tracked through regular compliance audits and assessments. By monitoring compliance metrics, organisations can identify areas where they may be falling short of regulatory or industry standards and work to address those gaps. User Satisfaction Finally, tracking user satisfaction is an essential metric to ensure that security is not hindering user experience and that security is not compromising the overall quality of the product. Measuring user satisfaction can help organisations ensure that their security practices are not negatively impacting their users’ experience and that they are delivering a high-quality product. User satisfaction can be measured through surveys or feedback mechanisms built into software applications. By gathering feedback from users, businesses can identify areas where security may be impacting the user experience and work to improve those areas. For example, if users are finding security measures such as multi-factor authentication too cumbersome, organisations can look for ways to streamline the process while still maintaining security. In conclusion, measuring DevSecOps success is crucial for organisations that want to ensure that their software products remain secure. By tracking relevant metrics such as the number of security incidents, time to remediate security issues, code quality, compliance, and user satisfaction, organisations can evaluate the effectiveness of their DevSecOps practices continually. Measuring DevSecOps success can help organisations identify areas that need improvement, prioritise security-related tasks, and make informed decisions about resource allocation. To read more on DevSecOps security and compliance, please visit our DevSecOps services page.

Measuring Success Metrics that Matter Read More »

Cloud-Native, DevSecOps, , ,

The Hidden Costs of Outdated Technology

As the pace of technological advancement continues to soar, numerous enterprises find themselves struggling to keep up with the latest innovations. However, clinging to outdated technology can unleash a cascade of detrimental effects on productivity, employee morale, and the company’s bottom line. While postponing the upgrade of antiquated systems might appear financially prudent, the reality is that it often exacts a higher toll on businesses than the savings it promises. In this article, we will delve into the ways in which reliance on obsolete technology can inflate expenses, compelling businesses to confront the imperative of considering long-term costs. As systems grow older, they demand increasingly laborious and specialised maintenance, coupled with exorbitant fees for updates, patches, and licenses to ensure compatibility with modern counterparts. Astoundingly, studies estimate that a staggering 75% of the average IT budget is allocated solely to maintaining existing systems. Brace yourself as we uncover the hidden costs lurking behind the façade of outdated technology. Security vulnerabilities Security vulnerabilities: Outdated technology often falls behind in terms of the latest security features and patches, leaving it vulnerable to cyber threats. Hackers and malicious actors continuously adapt their tactics, while obsolete systems may lack the necessary safeguards to protect sensitive data and prevent breaches. The consequences of data breaches, compliance violations, and reputational damage can be significant. Unsupported systems are especially prone to security breaches and cyber-attacks, potentially exposing valuable data and intellectual property. In Australia, the average cost of a data breach in 2023 has skyrocketed to $5 Million, marking a substantial 13% increase from previous years. These astonishing statistics underscore the urgent necessity for businesses to prioritise the security of their technology infrastructure. Diminished Efficiency Diminished Efficiency: Outdated technology frequently lacks the latest cutting-edge features and capabilities that are essential for streamlining business processes and maximising productivity. Therefore, these obsolete systems tend to exhibit slower performance, decreased reliability, and an increased propensity for errors and downtime. This predicament forces employees to grapple with inefficient tools, resulting in the squandering of valuable time and resources. In fact, studies have revealed that maintaining outdated systems can lead to a staggering 30% decrease in productivity. This inefficiency incurs significant costs, both in terms of operational expenses and lost opportunities. The combination of sluggishness, unreliability, and a heightened vulnerability to errors or downtime culminates in a noticeable decline in overall efficiency. It is evident that clinging to obsolete systems not only hinders progress but also presents a substantial financial burden for enterprises seeking sustained success. Compatibility issues Compatibility issues: Outdated technology often faces compatibility issues when integrating with newer systems or software. For example, an older CRM system may struggle to sync data with a modern marketing automation platform, hindering information flow across departments. These issues impede data sharing, communication, and collaboration within the organisation. Workarounds and manual processes become necessary, consuming time, and increasing the risk of errors. Incompatibility with external systems or partners can result in missed opportunities and higher operational costs. Addressing these challenges is crucial to avoid inefficiencies, missed opportunities, and unnecessary expenses. Missed Innovation and Competitive Advantage Missed Innovation and Competitive Advantage: Enterprises that rely on outdated technology face challenges in keeping pace with competitors who embrace new and innovative solutions. Adopting modern technology can empower businesses to automate processes, optimise data gathering and analysis, elevate customer experiences, and stay ahead of industry trends. By neglecting to upgrade, businesses risk missing out on opportunities for growth, efficiency, and gaining a competitive edge. Embracing newer technology not only positions businesses for growth but also offers enhanced security features. Additionally, there can be tax benefits associated with operating costs. Unlike capital expenses, Software as a Service (SaaS) or Platform as a Service (PaaS) can be classified as operating costs, allowing for a 100% write-off instead of a smaller portion. Employee Dissatisfaction and Turnover Employee Dissatisfaction and Turnover: Outdated technology can have a detrimental impact on employee morale and job satisfaction. The frustration caused by slow and inefficient tools can significantly reduce productivity and breed discontent among employees. Over time, this dissatisfaction can contribute to higher turnover rates as employees actively seek technologically advanced workplaces that enable them to perform their duties more effectively. The challenges of dealing with sluggish programs and constant issues can generate frustration and stress for both leadership and general employees. It becomes challenging to excel in one’s role when the software fails to keep pace. Consequently, employee morale suffers, leading to an unfortunate increase in turnover. In conclusion, the hidden costs of outdated technology can have detrimental effects on businesses, including decreased productivity, security risks, missed opportunities, and employee dissatisfaction. To overcome these challenges, it is crucial for enterprises to prioritise investments in modern technology solutions. By embracing innovative systems and staying ahead of technological advancements, businesses can enhance productivity, improve security, capitalise on new opportunities, and foster a positive work environment. Investing in updated technology is an investment in the long-term success and sustainability of the business, ultimately leading to greater efficiency, profitability, and competitive advantage. Get in touch with our application modernisation experts at TL Consulting to fast forward your legacy.

The Hidden Costs of Outdated Technology Read More »

Cloud-Native, ,

The State of Observability 2023

The State of Observability 2023: Unlocking the Power of Observability The State of Observability 2023 study, recently released by Splunk, provides insights into the crucial role observability plays in minimising costs related to unforeseen disruptions in digital systems. In the fast-paced and intricate digital landscapes of today, observability has emerged as a beacon of light, illuminating the path towards efficient monitoring and oversight. Gone are the days of relying solely on traditional monitoring methods; observability offers a holistic perspective of complex systems by gathering and analysing data from diverse sources across the entire technology stack. With its comprehensive approach, observability has become an indispensable tool for comprehending the inner workings of digital ecosystems.  While DevOps and cloud-native architectures have become cornerstones of digital transformation, they also introduce a host of intricate observability challenges. The hurdles faced by organisations when implementing observability and security in Kubernetes were brought into focus in this year’s State of Observability survey conducted by Splunk. Respondents acknowledged the difficulties of effectively monitoring Kubernetes itself, which serves as a significant obstacle to achieving complete observability in their environments.  Now, let us explore some of the main findings uncovered in this report.  Main discoveries from this survey Observability leaders outshine beginners: Those who have embraced observability as a core practice outperform their counterparts in various aspects. These leaders experience a staggering 7.9 times higher return on investment (ROI) with observability tools, showing 3.9 times more confidence in meeting requirements, and resolving downtime or service issues four times faster.  The expanding observability ecosystem: The study reveals that the observability landscape has witnessed a recent surge in the adoption of observability tools and capabilities. An impressive 81% of respondents reported using an increasing number of observability tools, with 32% noting a significant rise. However, managing multiple vendors and tools presents a challenge when it comes to achieving a unified view for IT professionals.  Changing expectations around cloud-native apps: While the percentage of respondents expecting a larger portion of internally developed apps to be cloud-native has declined (from 67% to 58%), there has been an increase in those anticipating the same proportion (from 32% to 40%). A small percentage (2%) expects a decrease. This shift highlights the evolving landscape of application development and the growing importance of cloud-native technologies.  The convergence of observability and security monitoring: Organisations are recognising the benefits of merging observability and security monitoring disciplines. By combining these practices, enhanced visibility and faster incident resolution can be achieved, ensuring the overall robustness of digital systems.  Harnessing the power of AI and ML: AI and ML have become integral components of observability practices, with 66% of respondents already incorporating them into their workflows. An additional 26% are in the process of implementing these advanced technologies, leveraging their capabilities to gain deeper insights and drive proactive monitoring.  Centralised teams and talent challenges: Organisations are increasingly consolidating their observability experts into centralised teams equipped with standardised tools (58%), rather than embedding them within application development teams (42%). However, recruiting observability talent remains a significant challenge, with difficulties in hiring ITOps team members (85%), SRE (86%), and DevOps engineers (86%) being highlighted.  Conclusion In conclusion, observability has become an indispensable force in today’s hypercomplex digital environments. By providing complete visibility and context across the full stack, observability empowers organisations to ensure digital health, reliability, resilience, and high performance. Building a centralised observability capability enables proactive monitoring, issue detection and diagnosis, performance optimisation, and enhanced customer experiences. This goes beyond simply adopting tools into a more strategic approach that involves rolling out standardised practices across the full stack in which both platform teams and application teams participate to build and consume. As digital ecosystems continue to evolve, harnessing the power of observability will be key to unlocking the full potential of modern technologies and achieving digital transformation goals.

The State of Observability 2023 Read More »

Cloud-Native, DevSecOps, , , ,

Key Considerations When Selecting a Data Visualisation Tool

Data visualisation is the visual representation of datasets that allows individuals, teams and organisations to better understand and interpret complex information both quickly and more accurately. Besides considering the cost of the tool itself, there are other key considerations when selecting a data visualisation tool to implement within your business. These include: Identifying who are the end-users that will be consuming the data visualisation What level of interactivity, flexibility and availability of the data visualisation tool is required from these users?   What type of visualisations are needed to fit the business/problem statement and what type of analytics will drive this?  Who will be responsible for maintaining and updating the dashboards and reports within the visualisation tool? What is the size of the datasets and how complex are the workloads to be ingested into the tool? Is there an existing data pipeline setup or does this need to be engineered? Are there any requirements to perform pre-processing or transformation on the data before it is ingested into the data visualisation tool? The primary objective of data visualisation is to help individuals, teams and companies explore, monitor and explain large amounts of data by organizing and allowing for more efficient analysis and decision-making by enabling users to quickly identify patterns, correlations, and outliers in their data.  Data visualisation is an important process for data analysis and other interested parties as it can provide insights and uncover hidden patterns in data that may not be immediately apparent through either tabular or textual representations. With data visualisation, data analysts and other interested parties such as business SMEs can explore large datasets, identify trends from these datasets, and communicate findings with stakeholders more effectively.      There are many types of data visualisations that can be used depending on the type of data being analysed along with the purpose of the analysis. Common types of visualisations include graphs, bar charts, line scatter plots, heat maps, tree maps, and network diagrams.  For data visualisation to be effective, it requires careful consideration of the data being presented, the intended audience, and the purpose of the analysis. The visualisation that is being presented should be clear, concise, and visually appealing, with labels, titles, and colours used to highlight important points and make the information more accessible to the audience. The data visualisation needs to an effective storytelling mechanism for all end-users to understand easily. Another consideration is the choice of colours used, as the wrong colours can impact the consumers of the data visualisation and can impact visually impaired people (i.e., colour blindness, Darker vs Brighter contrasts as examples)  In recent years, data visualisation has become increasingly important as data within organisations continues to grow in complexity. With the advent of big data and machine learning technologies, data visualisation is playing a critical role in helping organisations make sense of their data, and become more data-driven with increased ‘time to insight’, as organisations facilitate better and faster decision-making.    Data Visualisation Tools & Programming Languages  At TL Consulting, our skilled and experienced data consultants use a broad range and variety of data visualisation tools to help create effective visualisations of our customer’s data. The most common are listed below:   Power BI is a business intelligence tool from Microsoft that allows users to create interactive reports and dashboards using data from a variety of sources. It includes features for data modelling, visualisation, and collaboration.  Excel: Excel is a Microsoft spreadsheet application and from a data visualisation perspective includes the capability to represent numerical data in a visual format.  Tableau: Tableau is a powerful data visualisation tool that allows users to create interactive dashboards, charts, and graphs using drag-and-drop functionality. It supports a wide range of data sources and has a user-friendly interface.  QlikView: QlikView is a first-generation business intelligence tool that allows users to create interactive visualisations and dashboards using data from a variety of sources. QlikView includes features for data modelling, exploration, and collaboration.  Looker:  Looker is a cloud-based Business Intelligence (BI) tool that helps you explore, share, and visualise data that drive better business decisions. Looker is now a part of the Google Cloud Platform. It allows anyone in your business to analyse and find insights into your datasets quickly.  Qlik Sense: Qlik Sense is the next-generation platform for modern, self-service-oriented analytics. Qlik Sense supports from self-service visualisation and exploration to guided analytics apps and dashboards, conversational analytics, custom and embedded analytics, mobile analytics, reporting, and data alerting.      In conjunction with the data visualisation tools listed above, there are a variety of programming languages using their various libraries that TL Consulting use in delivering outcomes to our customers that support not just Data Visualisation but also Data Analytics.  Python is a popular programming language that can be used for data analysis and visualisation. This can be done via tools such as Jupyter, Apache Zeppelin, Google Colab and Anaconda to name a few. Python includes libraries such as Matplotlib, Seaborn, Bokeh and Plotly for creating visualisations.  R is a programming language used for statistical analysis and data visualisation. It includes a variety of packages and libraries for creating charts, graphs, and other visualisations.  Scala is a strong statically typed high-level general-purpose programming language that supports both object-oriented programming and functional programming. Scala has several data visualisation libraries such as breeze-viz, Vegas, Doodle and Plotly Scala.  Go or Golang is a statically typed, compiled high-level programming language designed at Google. Golang has several data visualisation libraries that facilitate the creation of charts such as pie charts, heatmaps, scatterplots and boxplots.  JavaScript is a popular programming language that is a core client-side language of the w3.  It has rich data visualisation libraries such Chart JS, D3, FusionCharts suite, Pixi etc.      Conclusion In conclusion, there are several data visualisation tools and techniques available in the market. For organisations to extract meaningful insights from their data in a time-efficient manner, it’s important to consider these factors before selecting and implementing a new data visualisation tool for your business. TL

Key Considerations When Selecting a Data Visualisation Tool Read More »

Data & AI, , , , , , , ,

Kubernetes container design patterns

Kubernetes container design patterns Kubernetes is a robust container orchestration tool, but deploying and managing containerised applications can be complex. Fortunately, Kubernetes container design patterns can help simplify the process by segregating concerns, enhancing scalability and resilience, and streamlining management. In this blog post, we will delve into five popular Kubernetes container design patterns, showcasing real-world examples of how they can be employed to create powerful and effective containerised applications. Additionally, we’ll provide valuable insights and tool recommendations to help you implement these patterns with ease. Sidecar Pattern: The first design pattern we’ll discuss is the sidecar pattern. The sidecar pattern involves deploying a secondary container alongside the primary application container to provide additional functionality. For example, you can deploy a logging sidecar container to collect and store logs generated by the application container. This improves the scalability and resiliency of your application and simplifies its management. Similarly, you can deploy a monitoring sidecar container to collect metrics and monitor the health of the application container. The sidecar pattern is a popular design pattern for Kubernetes, with many open-source tools available to simplify implementation. For example, Istio is a popular service mesh that provides sidecar proxies to handle traffic routing, load balancing, and other networking concerns. Ambassador Pattern: The ambassador pattern is another popular Kubernetes container design pattern. This pattern involves using a proxy container to decouple the application container from its external dependencies. For example, you can use an API gateway as an ambassador container to handle authentication, rate limiting, and other API-related concerns. This simplifies the management of your application and improves its scalability and reliability. Similarly, you can use a caching sidecar container to cache responses from external APIs and reduce latency and improve performance. This ensures that the application is properly configured and ready to run when the primary container runs. The ambassador pattern is commonly used for API management in Kubernetes. Tools like Nginx,Kong and Traefik provide API gateways that can be deployed as ambassador containers to handle authentication, rate limiting, and other API-related concerns. Adapter Pattern: The adapter pattern is another powerful Kubernetes container design pattern. This pattern involves using a container to modify an existing application to make it compatible with Kubernetes. For example, you can use an adapter container to add health checks, liveness probes, or readiness checks to an application that was not originally designed to run in a containerised environment. This can help ensure the availability and reliability of your application when running in Kubernetes. Similarly, you can use an adapter container to modify an application to work with Kubernetes secrets, environment variables, or other Kubernetes-specific features. The adapter pattern is often used to migrate legacy applications to Kubernetes. Tools like Kubernetes inlets and kompose provide an easy way to convert Docker Compose files to Kubernetes YAML and make the migration process smoother Sidecar injector Pattern: The sidecar injector pattern is another useful Kubernetes container design pattern. This pattern involves dynamically injecting a sidecar container into a primary application container at runtime. For example, you can inject a container that performs security checks and monitoring functions into an existing application container. This can help improve the security and reliability of your application without having to modify the application container’s code or configuration. Similarly, you can inject a sidecar container that provides additional functionality such as authentication, rate limiting, or caching. The Sidecar Injector pattern is a dynamic method of injecting sidecar containers into Kubernetes applications during runtime. By utilizing the Kubernetes admission controller webhook, the injection process can be automated to guarantee that the sidecar container is always present when the primary container initiates. An excellent instance of the Sidecar Injector pattern is the HashiCorp Vault Injector, which enables the injection of secrets into pods. Init container pattern: Finally, the init container pattern is a valuable Kubernetes container design pattern. This pattern involves using a separate container to perform initialization tasks before the primary application container starts. For example, you can use an init container to perform database migrations, configuration file generation, or application setup. This ensures that the application is properly configured and ready to run when the primary container. In conclusion, Kubernetes container design patterns are essential for building robust and efficient containerised applications. By using these patterns, you can simplify the deployment, management, and scaling of your applications. The patterns we discussed in this blog are just a few examples of the many design patterns available for Kubernetes, and they can help you build powerful and reliable containerised applications that meet the demands of modern cloud computing. Whether you’re a seasoned Kubernetes user or just starting out, these container design patterns are sure to help you streamline your containerised applications and take your development to the next level.

Kubernetes container design patterns Read More »

Cloud-Native, DevSecOps, , , , ,

What can we expect for Kubernetes in 2023?

What can we expect for Kubernetes in 2023? As Kubernetes approaches the eighth anniversary of its first version launch, we look into the areas of significant change. So what does the Kubernetes ecosystem look like and What can we expect for Kubernetes in 2023? In short, is huge and continues to grow. As more businesses, teams, and people use it as a platform for innovation, more new applications will be created and old ones will be scaled more quickly than ever before, fuelling its continual development. The State of Kubernetes 2022 study from VMware Tanzu and the most recent Annual Cloud Native Computing Foundation (CNCF) Survey both indicate that Kubernetes is widely adopted and continues to grow in popularity as a platform for container orchestration. These studies suggest that Kubernetes has become a de facto standard in the industry and its adoption will likely continue to increase in the coming years. Anticipated Shift towards Kubernetes on multi cloud As we move forward into 2023, it’s becoming increasingly common for businesses to utilize multiple cloud providers for their Kubernetes deployments. This trend, known as multi-cloud/hybrid deployments, often involves the use of container orchestration and federated development and deployment strategies. While there are already tools available for deploying and managing containers across a variety of cloud providers and on-premises platforms, we can expect to see even more advancements in this area. Specifically, there will likely be an increase in technology that makes it easier to create and deploy multi-cloud systems using native cloud services that work seamlessly across different providers. Multi-cloud adoption allows businesses to take advantage of the strengths of different cloud providers, such as leveraging the best database solutions from one provider and the best serverless offerings from another. This approach can also increase flexibility, reduce vendor lock-in, and provide redundancy and disaster recovery options. Additionally, it can allow for cost optimization by taking advantage of different pricing models and promotions offered by different providers. Continual Evolution of DevOps and Platform Teams: To survive in this digital age, businesses need to have a diverse set of skills and knowledge areas within their workforce. Close collaboration between different departments and disciplines is essential for leveraging new technologies like Kubernetes and other cloud platforms. However, these technologies can be difficult to learn and maintain, and teams may struggle to gain in-depth understanding of them. Businesses should focus on automation and acceleration, but also invest in training and development programs to help their teams acquire the necessary skills to effectively use these technologies. Companies of all sizes should think about where they want to develop their Kubernetes knowledge base. Many businesses choose a platform team to develop and implement this knowledge. Multiple DevOps teams can be supported by a single platform team. This separation allows DevOps teams to continue concentrating on creating and running business applications while the platform team looks after a solid and dependable underpinning platform. Improved Stateful Application Management: Containers were originally intended to be a means of operating stateless applications. However, the value of running stateful workloads in containers has been recognised by the community over the last few years, and the newer versions of Kubernetes have added the required functionalities. Now there are better ways to deploy stateful applications, but the outcome is far from ideal and inconsistent. By including a controller in the cluster, K8s operators can resolve this difficulty. Reconciliation loops are controller loops that monitor differences between the current and intended states and adjust return the current state to the desired state. Maturity in Policy-as-Code for Kubernetes The goal has been to give teams more autonomy when delivering applications to Kubernetes for several years. In many businesses today, creating pipelines that can quickly send out apps is standard procedure. Although having autonomy is a great advantage, maintaining some manual control still requires finding the proper balance. The transition to everything as a code has opened a plethora of opportunities. Following accepted engineering principles will make it simple to validate and review policies defined as-code. As a result, the importance of policy frameworks will increase. Within the CNCF, Open Policy Agent (OPA) is the most common policy framework. Practices like this will advance concurrently with the adoption of Kubernetes and autonomous teams to enable continual growth while preserving or even gaining more control. Adoption enables you to control how Kubernetes is used by a wide range of teams. Enhanced Observability and Troubleshooting capabilities: Troubleshooting applications running on a Kubernetes cluster at scale can be challenging due to the complexity of Kubernetes and the relationships between different elements. Providing teams with effective troubleshooting solutions can give an organization a competitive advantage. The Four elements (Events, Logs, Traces, Metrics) are important in understanding the performance and behaviour of a system. They provide different perspectives and details on system activity, and when combined, give a more complete picture of the issue. Solutions that integrate these four elements can aid in faster troubleshooting and problem resolution and can also help in identifying and preventing future issues. Vendors and open-source frameworks will continue to drive this trend. Focus on supply chain security: Software supply chain security has been in laser sights for a while now, as most software rooted from other software. The necessity of ensuring Kubernetes’ strength has increased along with its importance as it becomes more widely adopted, it is important to ensure its security as it is a critical component of the software supply chain. This includes securing the infrastructure on which it runs, as well as securing the containerized applications that are deployed on it. The “4C’s of cloud native security” model is a good place to start thinking about the security of the different layers of a cloud native application: Cloud, Clusters, Containers, and Code. Each layer of the Cloud Native security model builds upon the next outermost layer, and they are equally important when considering security practices and tools. This can be done through a variety of methods, such as using secure configurations, implementing network

What can we expect for Kubernetes in 2023? Read More »

Cloud-Native, DevSecOps, , , ,

Progressive Delivery with Kubernetes:

Progressive Delivery (the GitOps way) with Kubernetes: One of the biggest challenges organisations faces, especially when running microservices, is managing application deployments. Having a proper deployment strategy is necessary. For instance, in a production environment, it is always a change management process requirement to ensure that the downtime impact on the end-user is minimised and maintenance windows need to be planned to cater for any changes that will cause an outage. It is also mandated that in case of any issues when deploying the change, a rollback plan must be ready for execution to recover from any failures. These challenges amplify with the increase of the number of microservices and makes it more difficult to assess the result of the deployment and execute the rollback if required. Enter progressive delivery. Thankfully, cloud native architectures using Kubernetes running microservices addresses this problem by offering increased flexibility, allowing teams to publish more useful updates more frequently and progressively. The use of release techniques like Canary, Blue-Green, and Feature flagging as part of progressive delivery enables teams to maximise an enterprise’s software delivery. It is predicated on the notion that consumers desire to test features prior to completion to enhance the user experience. In Kubernetes, there are different ways to release an application. It is necessary to choose the right strategy to make the infrastructure reliable and more resilient during an application deployment or update. The out of the box Kubernetes Deployment Object supports the Rolling Update strategy which comes as a standard and provides a basic set of safety guarantees (aka. readiness probes) during an update. When deploying into a development/staging environment, standard Kubernetes deployment strategies such as a recreate or rolling deployment might be a good option. However, the rolling update strategy faces may limitations such as controlling the speed and flow of the rollout. in large scale high-volume production environments, a rolling update is often considered too risky of an update procedure since it provides no control over the blast radius, may rollout too aggressively, and provides no automated rollback upon failures. In production environments, more advanced deployment strategies are much needed to satisfy the business requirements. These advanced strategies are called “Progressive Deployments”. An example of these deployment strategies is the Blue/Green deployment which allows for a quick transition between the old version and the new version by deploying them side by side and then switching to the new version when testing has been successful. This testing of the platform needs to be thorough to avoid having to rollback frequently. If unsure of the platform’s stability or the potential effects of releasing a new software version, a canary deployment offers a smaller scale next version of the release running side by side the current version in production. By doing so, the new release is rolled-out to a small subset of users to test the application and provide their feedback. Once the change is accepted, it is rolled out to the rest of the users. Benefits of Progressive Delivery: Progressive delivery lowers the risk of releasing new features, as well as identifying and resolving possible issues with those additions. It also offers early feedback on any version of your application. Before a feature is fully deployed, the developer can test out various changes on the product to see how the application behaves. The idea is that the developer can alter the release strategy if the modifications are unfavourable to prevent end users from experiencing any glitches. Secondly, improved release frequency results from sequential delivery. While the primary goal of progressive delivery is to provide end users with safer, more dependable releases, you as the DevOps team will benefit from being able to deploy new versions in smaller parts and hence release more frequently. You can work on each feature separately and release it in tiny sprints. The time to market is shortened, and any DevOps team can now deploy better software more quickly. Finally, and this is something that is frequently ignored, progressive delivery leads to improved segregation of duties between the development and operations teams. This segregation of duties works better with progressive delivery since developers concentrate on creating new features while operations concentrate on rolling out the new features gradually in a strategy that suits the operational needs of the platform. Progressive delivery is best achieved with GitOps: This demand for progressive delivery in a cloud native manner can be achieved with GitOps. The objective behind GitOps is to define and declare everything in Git including operational tasks. Git is already used by developers to generate and collaborate on code. GitOps simply expands this concept to include the creation and setup of infrastructure as well. Git becomes the control plane for operations and deployments because everything is declared as code in Git. GitOps is being enabled by open-source tooling such as ArgoCD, Flux and Flagger, which automatically checks Git repositories for any new changes, and if it detects a change, it automatically deploys it to production. With progressive delivery, these automated deployments need to be done in phases and to multiple target Kubernetes clusters. These tools offer full control of the software delivery pipeline, rollback strategies, test executions, feature releases, and scaling of infrastructure resources. In conclusion, there are various methods for deploying an application to cater for applications with varying complexities, teams with different demands, and environments with different operational requirements and compliance levels. Selecting the right strategy or strategies and having full control over these strategies in code when combined by the right tools is an extremely powerful feature of cloud native platforms that greatly simplifies change management, release management, and operations of the applications. It completely disrupts the way operations teams traditionally thought of these processes as rigid and extremely sensitive with a dramatic business impact in case anything went wrong, into simplified processes and tasks that can be executed every day in the background without having an impact on the end-user.

Progressive Delivery with Kubernetes: Read More »

Cloud-Native, DevSecOps, , , ,

Building a Robust Data Governance Framework in 2023

In today’s data-driven world with accelerating advancements in Artificial Intelligence (AI) and advanced analytics, organisations play an important role in ensuring that the data they collect, store, and analyse is underpinned by a strong data governance framework. Embedding the right data governance framework is the enablement of an organisation’s data strategy which requires dedicated planning and strategic direction from various business & technical stakeholders and should be driven from the “top-down” rather than “bottom-up”. To achieve this, organisations should focus on defining their information and data lifecycle management, data relationships and classification, data privacy, data quality, and data integrity to become more competitive and resilient.  The key fundamental challenge for organisations is to embed data standardisation, data security & compliance horizontally across the enterprise, thereby eliminating silos with their own disparate ways of working. In addition, it’s important for organisations to align their data governance framework with their data lifecycle, business strategy and goals, enabling a more agile approach to accommodate the organisation’s current & future needs.   Data Governance Framework Best Practices As organisations collect more and more data points, it’s important to define the right standards, policies, controls, accountability, and ownership (roles & responsibilities). A data governance framework will ensure the organisation abides by these standards while ensuring data that is collected and stored is secure, with a focus on maintaining data integrity and data quality. Ultimately, the data that is consumed by end-users should enable informative, data-driven decisions to be made. A constant re-evaluation is recommended to ensure the organisation’s data governance program is modernised and caters to the latest advancements in data and technology. Prior to defining a data governance framework, a comprehensive data discovery should be performed across the business landscape to create a unified view. This would aid in establishing data governance across the following areas: Data cataloging of data relationships, data quality, and data lineage Data classification and sourcing Metadata definition (Technical and Enterprise metadata) Data compliance, security, and privacy Data analytics & engineering Data storage & sharing The following diagram is a high-level example of a data governance framework. This model should be aligned with the organisation’s data and information management lifecycle. The framework definition should be evaluated from a People, Processes & Technology/Tooling perspective considering data stewardship, efficiencies, data security & access controls, alongside standardised processes governing the technology and tools that facilitate the production, consumption, and processing of the organisation’s data. The following sections highlight a few key areas which the data governance framework should address: Alignment to the Organisation’s Cloud Strategy When uplifting the data governance program, another important consideration for organisation’s that are building technology solutions on Cloud is to define an integrated data governance architecture across their environments, whether it be hybrid or multi-cloud. Alignment to their cloud strategy can help in the following areas: Improve data quality with better management & tooling available around data cleansing and enrichment Build a holistic, unified view of the organisation’s data through discovery and benchmarking Gain higher visibility into data lineage and track data end-to-end from source to target Build more effective data catalogs to ensure it benefits organisational needs to search and access the right data when needed Proactively review, monitor, and measure the data to ensure data consistency and data integrity is preserved For example, Microsoft offers an Azure Governance service as a management and governance cloud solution that features advanced capabilities to help manage data throughout its entire IT lifecycle and track data flows end-to-end, ensuring the right people have access to reliable, accurate data they need, whenever they need it. Data Privacy & Compliance As organisations continue building insights and implementing advanced analytics to learn more about their customers and create more tailored experiences, protecting sensitive data attributes including Personal Information (PI) should be at the heart of the organisation’s data security & data privacy practices, as part of their data governance framework. With the rise of cyber-attacks & data breaches, organisations should consider implementing data obfuscation techniques to “mask” or “encrypt” their PI source data, especially across non-production environments where the access controls are considered weaker than production environments, and the “internal” threat can be considered just as high as the external cyber threats. Applying data obfuscation techniques would ensure the PI data attributes are de-sensitized prior to their use in development, testing and data analytics. In addition, organisations should ensure data controls & access policies are reviewed more frequently than ever. Understanding who has access to the underlying data sources and platforms will help organisations maintain a good risk posture and should be assessed against their data governance framework, across their environments whether on-premise or on Cloud. Augmented Analytics & Machine Learning Without advanced analytics, data loses a lot of its usability and power. Advanced analytics combines the power of machine learning and artificial intelligence to help teams make data-driven decisions based on in-depth insights. Advanced analytics tools greatly streamline the data analysis process and help to provide a competitive edge, uncovering patterns and insights that manual data analysis may overlook. With the introduction of open-source machine learning models such as Open AI’s ChatGPT, how do organisations ensure the data that is collected, analysed, and presented is highly accurate and high quality? Depending on the data models & training algorithms used, these insights can be deeply flawed and it’s important for organisations to embed the right data governance policies around the use of open-source data models, including the collection, use, and analysis of the data points collected. A few roles that data governance plays in the world of augmented analytics, machine learning, and AI include: Providing guidance on what data is collected and how it’s used to train and validate data models for machine learning models to generate advanced analytics Providing standardization on the data science lifecycle and algorithms applied for generating insights, along with data cleansing & enrichment exercises Defining the best practices and policies when introducing new data models, along with measures to fine-tune and train models to increase data accuracy

Building a Robust Data Governance Framework in 2023 Read More »

Cloud-Native, Data & AI, , , , , ,