TL Consulting Group

visualisation

data-lakehouse

Harnessing the Power of the Data Lakehouse

As organisations continue to collect more diverse data, it is important to consider a strategic & viable approach to unify and streamline big data analytics workloads, ensuring it is optimised to drive data-driven decisions and enable teams to continue innovating and create a competitive edge. Traditionally, data warehousing has supported the need for ingesting and storing structured data, and the data lake as a separate platform for storing semi-structured/unstructured data. The data lakehouse combines the benefits and capabilities between both and bridges the gap by breaking silos created by the traditional/modern data warehouse, enabling a flexible and modern data platform to serve big data analytics, machine learning & AI workloads in a uniform manner. What is a Data Lakehouse? A data lakehouse is a modern architecture that merges the expansive storage of a data lake with the structured data management of a data warehouse. Data lakehouse platforms offer a comprehensive & flexible solution for big data analytics including Data Engineering and real-time streaming, Data Science, and Machine Learning along with Data Analytics and AI. Key Benefits of Implementing a Data Lakehouse: There are many benefits that can be derived from implementing a data lakehouse correctly: Azure Data Lakehouse Architecture: The following are some of the key services/components that constitute a typical Data Lakehouse platform hosted on Microsoft Azure: Key Considerations when transitioning to a Data Lakehouse: The following are key considerations that need to be factored in when transitioning or migrating from traditional data warehouses/data lakes to the Data Lakehouse: Implementing a Data Lakehouse: Quick Wins for Success The following are small, actionable steps that organisations can take when considering to implement a Data Lakehouse platform: Conclusion In summary, the data lakehouse is a pathway to unlocking the full potential of your data, fostering innovation, and driving business growth. With the right components and strategic approach, your organisation can leverage Data Lakehouses to stay ahead of the curve, while maintaining a unified, cost-effective data platform deployed on your Cloud environment. TL Consulting are a solutions partner with Microsoft in the Data & AI domain. We offer specialised and cost-effective data analytics & engineering services tailored to our customer’s needs to extract maximum business value. Our certified cloud platform & data engineering team are tool-agnostic and have high proficiency working with traditional and cloud-based data platforms and open-source tools. Refer to our service capabilities to find out more.

Harnessing the Power of the Data Lakehouse Read More »

Cloud-Native, Data & AI, , , , , , , , , , , ,

How Exploratory Data Analysis (EDA) Can Improve Your Data Understanding Capability

How Exploratory Data Analysis (EDA) Can Improve Your Data Understanding Capability Can EDA help to make my phone upgrade decision more precise? You may have heard the term Exploratory Data Analysis (or EDA for short) and wondered what EDA is all about. Recently, one of the Sales team members at TL Consulting Group were thinking of buying a new phone but they were overwhelmed by the many options and they needed to make a decision suited best to their work needs, i.e. Wait for the new iPhone or make an upgrade on the current Android phone. There can be no disagreement on the fact that doing so left them perplexed and with a number of questions that needed to be addressed before making a choice. What was the specification of the new phone and how was that phone better than their current mobile phone? To help enable curiosity and decision-making, they visited YouTube to view the new iPhone trailer and also learned more about the new iPhone via user ratings and reviews from YouTube and other websites. Then they came and asked us how we would approach it from a Data Analytics perspective in theory. And our response was, whatever investigating measures they had already taken before making the decision, this is nothing more but what ML Engineers/data analysts in their lingo call ‘Exploratory Data Analysis’. What is Exploratory Data Analysis? In an automated data pipeline, exploratory data analysis (EDA) entails using data visualisation and statistical tools to acquire insights and knowledge from the data as it travels through the pipeline. At each level of the pipeline, the goal is to find patterns, trends, anomalies, and potential concerns in the data. Exploratory Data Analysis Lifecycle To interpret the diagram and the iPhone scenario in mind, you can think of all brand-new iPhones as a “population” and to make its review, the reviewers will take some iPhones from the market which you can say is a “sample”. The reviewers will then experiment with that phone and will apply different mathematical calculations to define the “probability” if that phone is worth buying or not. It will also help to define all the good and bad properties of the new iPhone which is called “Inference “. Finally, all these outcomes will help potential customers to make their decision with confidence. Benefits of Exploratory Data Analysis The main idea of exploratory data analysis is “Garbage in, Perform Exploratory Data Analysis, possibly Garbage out.” By conducting EDA, it is possible to turn an almost usable dataset into a completely usable dataset. It includes: Key Steps of EDA The key steps involved in conducting EDA on an automated data pipeline are: Types of Exploratory Data Analysis EDA builds a robust understanding of the data, and issues associated with either the info or process. It’s a scientific approach to getting the story of the data. There are four main types of exploratory data analysis which are listed below: 1. Univariate Non-Graphical Let’s say you decide to purchase a new iPhone solely based on its battery size, disregarding all other considerations. You can use univariate non-graphical analysis which is the most basic type of data analysis because we only utilize one variable to gather the data. Knowing the underlying sample distribution and data and drawing conclusions about the population are the usual objectives of univariate non-graphical EDA. Additionally included in the analysis is outlier detection. The traits of population dispersal include: Spread: Spread serves as a gauge for how far away from the Centre we should search for the information values. Two relevant measurements of spread are the variance and the quality deviation. Because the variance is the root of the variance, it is defined as the mean of the squares of the individual deviations. Central tendency: Typical or middle values are related to the central tendency or position of the distribution. Statistics with names like mean, median, and sometimes mode are valuable indicators of central tendency; the mean is the most prevalent. The median may be preferred in cases of skewed distribution or when there is worry about outliers. Skewness and kurtosis: The distribution’s skewness and kurtosis are two more useful univariate characteristics. When compared to a normal distribution, kurtosis and skewness are two different measures of peakedness. 2. Multivariate Non-Graphical Think about a situation where you want to purchase a new iPhone solely based on the battery capacity and phone size. In either cross-tabulation or statistics, multivariate non-graphical EDA techniques are frequently used to illustrate the relationship between two or more variables. An expansion of tabulation known as cross-tabulation is very helpful for categorical data. By creating a two-way table with column headings that correspond to the amount of one variable and row headings that correspond to the amount of the opposing two variables, a cross-tabulation is preferred for two variables. All subjects that share an analogous pair of levels are then included in the counts. For each categorical variable and one quantitative variable, we create statistics for quantitative variables separately for every level of the specific variable then compare the statistics across the amount of categorical variable. It is possible that comparing medians is a robust version of one-way ANOVA, whereas comparing means is a quick version of ANOVA. 3. Univariate Graphical Different Univariate Graphics Imagine that you only want to know the latest iPhone’s speed based on its CPU benchmark results before you decide to purchase it. Since graphical approaches demand some level of subjective interpretation in addition to being quantitative and objective, they are utilized more frequently than non-graphical methods because they can provide a comprehensive picture of the facts. Some common sorts of univariate graphics are: Boxplots: Boxplots are excellent for displaying data on central tendency, showing reliable measures of location and spread, as well as information on symmetry and outliers, but they can be deceptive when it comes to multimodality. The type of side-by-side boxplots is among the simplest applications for boxplots. Histogram: A histogram, which can be a barplot

How Exploratory Data Analysis (EDA) Can Improve Your Data Understanding Capability Read More »

Data & AI, , , , , , , , ,

Key Considerations When Selecting a Data Visualisation Tool

Data visualisation is the visual representation of datasets that allows individuals, teams and organisations to better understand and interpret complex information both quickly and more accurately. Besides considering the cost of the tool itself, there are other key considerations when selecting a data visualisation tool to implement within your business. These include: Identifying who are the end-users that will be consuming the data visualisation What level of interactivity, flexibility and availability of the data visualisation tool is required from these users?   What type of visualisations are needed to fit the business/problem statement and what type of analytics will drive this?  Who will be responsible for maintaining and updating the dashboards and reports within the visualisation tool? What is the size of the datasets and how complex are the workloads to be ingested into the tool? Is there an existing data pipeline setup or does this need to be engineered? Are there any requirements to perform pre-processing or transformation on the data before it is ingested into the data visualisation tool? The primary objective of data visualisation is to help individuals, teams and companies explore, monitor and explain large amounts of data by organizing and allowing for more efficient analysis and decision-making by enabling users to quickly identify patterns, correlations, and outliers in their data.  Data visualisation is an important process for data analysis and other interested parties as it can provide insights and uncover hidden patterns in data that may not be immediately apparent through either tabular or textual representations. With data visualisation, data analysts and other interested parties such as business SMEs can explore large datasets, identify trends from these datasets, and communicate findings with stakeholders more effectively.      There are many types of data visualisations that can be used depending on the type of data being analysed along with the purpose of the analysis. Common types of visualisations include graphs, bar charts, line scatter plots, heat maps, tree maps, and network diagrams.  For data visualisation to be effective, it requires careful consideration of the data being presented, the intended audience, and the purpose of the analysis. The visualisation that is being presented should be clear, concise, and visually appealing, with labels, titles, and colours used to highlight important points and make the information more accessible to the audience. The data visualisation needs to an effective storytelling mechanism for all end-users to understand easily. Another consideration is the choice of colours used, as the wrong colours can impact the consumers of the data visualisation and can impact visually impaired people (i.e., colour blindness, Darker vs Brighter contrasts as examples)  In recent years, data visualisation has become increasingly important as data within organisations continues to grow in complexity. With the advent of big data and machine learning technologies, data visualisation is playing a critical role in helping organisations make sense of their data, and become more data-driven with increased ‘time to insight’, as organisations facilitate better and faster decision-making.    Data Visualisation Tools & Programming Languages  At TL Consulting, our skilled and experienced data consultants use a broad range and variety of data visualisation tools to help create effective visualisations of our customer’s data. The most common are listed below:   Power BI is a business intelligence tool from Microsoft that allows users to create interactive reports and dashboards using data from a variety of sources. It includes features for data modelling, visualisation, and collaboration.  Excel: Excel is a Microsoft spreadsheet application and from a data visualisation perspective includes the capability to represent numerical data in a visual format.  Tableau: Tableau is a powerful data visualisation tool that allows users to create interactive dashboards, charts, and graphs using drag-and-drop functionality. It supports a wide range of data sources and has a user-friendly interface.  QlikView: QlikView is a first-generation business intelligence tool that allows users to create interactive visualisations and dashboards using data from a variety of sources. QlikView includes features for data modelling, exploration, and collaboration.  Looker:  Looker is a cloud-based Business Intelligence (BI) tool that helps you explore, share, and visualise data that drive better business decisions. Looker is now a part of the Google Cloud Platform. It allows anyone in your business to analyse and find insights into your datasets quickly.  Qlik Sense: Qlik Sense is the next-generation platform for modern, self-service-oriented analytics. Qlik Sense supports from self-service visualisation and exploration to guided analytics apps and dashboards, conversational analytics, custom and embedded analytics, mobile analytics, reporting, and data alerting.      In conjunction with the data visualisation tools listed above, there are a variety of programming languages using their various libraries that TL Consulting use in delivering outcomes to our customers that support not just Data Visualisation but also Data Analytics.  Python is a popular programming language that can be used for data analysis and visualisation. This can be done via tools such as Jupyter, Apache Zeppelin, Google Colab and Anaconda to name a few. Python includes libraries such as Matplotlib, Seaborn, Bokeh and Plotly for creating visualisations.  R is a programming language used for statistical analysis and data visualisation. It includes a variety of packages and libraries for creating charts, graphs, and other visualisations.  Scala is a strong statically typed high-level general-purpose programming language that supports both object-oriented programming and functional programming. Scala has several data visualisation libraries such as breeze-viz, Vegas, Doodle and Plotly Scala.  Go or Golang is a statically typed, compiled high-level programming language designed at Google. Golang has several data visualisation libraries that facilitate the creation of charts such as pie charts, heatmaps, scatterplots and boxplots.  JavaScript is a popular programming language that is a core client-side language of the w3.  It has rich data visualisation libraries such Chart JS, D3, FusionCharts suite, Pixi etc.      Conclusion In conclusion, there are several data visualisation tools and techniques available in the market. For organisations to extract meaningful insights from their data in a time-efficient manner, it’s important to consider these factors before selecting and implementing a new data visualisation tool for your business. TL

Key Considerations When Selecting a Data Visualisation Tool Read More »

Data & AI, , , , , , , ,