Data Warehousing | IndianTechnoEra - IndianTechnoEra
Latest update Android YouTube

Data Warehousing | IndianTechnoEra

Unit-1:Data Warehousing Definition, usage and trends, DBMS Vs Data warehouse, data marts, metadata, Multidimensional data mode, data cubes, Schemas f


Data Warehousing | IndianTechnoEra


What is Data Warehousing?

Data warehousing is a type of database that is used to store and manage large amounts of data from multiple sources. 

The data is organized into a single, unified data warehouse that can be used for reporting and analysis. Data warehousing often includes the use of data mining and data analysis tools to uncover patterns and trends in the data. 

Data warehousing is used in many industries, including finance, healthcare, and retail.



Data Warehousing usage and trends

Data warehousing usage and trends have been steadily increasing over the past few years. Companies of all sizes are looking to leverage their data to gain insights and make better decisions. 

The increase in usage is largely due to the availability of data mining and analytics tools, which allow users to easily analyze large amounts of data. 


Additionally, the cost of data storage and processing has decreased significantly in recent years, making it more affordable for organizations to implement data warehousing solutions.

Data warehousing is also becoming a more popular solution for cloud computing, as it allows organizations to store and process data without having to maintain their own hardware and infrastructure. 

This can help reduce costs and improve scalability. Additionally, cloud-based data warehousing solutions are often more secure and reliable than traditional data warehouses.


As data warehousing continues to gain traction, organizations will need to consider how to best take advantage of its potential. 

Companies should focus on developing systems that are both secure and efficient, and explore ways to leverage their data for more informed decision-making. With the right approach, data warehousing can provide tremendous value for organizations of all sizes.


DBMS Vs Data warehouse

DBMS 

A database management system (DBMS) is a software program that interacts with the user, other applications, and the database itself to capture and analyze data. 

A DBMS generally manipulates the data in a database and provides users and other software access to retrieve and analyze that data.


Data Warehouse

A data warehouse is a central repository of integrated data from one or more disparate sources. 

It includes data from operational systems and external data sources, and is used for reporting and data analysis. 

The data warehouse is the core of the business intelligence system, as it contains the data that is used for reporting and analysis. 

It provides a unified view of the data and enables users to analyze data from multiple sources.


Database (DBMS) vs Data Warehouse


Data marts

Data marts are subsets of data warehouses that are created for specific departments or areas of a company. 

They are used to store data that is specific to that department or area that can be used for analysis and reporting. 

Data marts are usually populated with data from a data warehouse, but they can contain data from other sources as well. 

Data marts are typically used for departmental reporting and analysis, and can be a more cost-effective alternative to using the full data warehouse for such tasks.


Data Warehouse vs Data Mart


What is meta data?

Meta data is data that describes other data. It is often used to provide additional context and information about a particular dataset. 

It can include details such as author, date created, file format, and keywords. Meta data is also commonly used to index and organize digital resources, making them easier to find and use.


Data warehouse metadata

Data warehouse metadata is the information that describes a data warehouse’s structure, operations and content. It includes information about the data warehouse’s data sources, data models, data mappings, data transformations, and data storage. It also contains information about the data warehouse’s usage, such as user access rights, user roles, and data security. Data warehouse metadata is typically stored in a metadata repository, which is a centralized database that stores and manages metadata across an organization.


Multidimensional data mode

In multidimensional data mode, data is organized into a series of dimensions or categories. 

Each dimension is made up of related data points that can be used to create a comprehensive picture of the data set. 

For example, a multidimensional data set might include data points such as sales figures, customer demographics, product details, and other related information. By analyzing these data points in combination, a more detailed understanding of the data can be gained.


data warehousing data cubes

Data cubes are multidimensional data structures used in data warehousing applications. 

They are typically composed of facts, or measures, which are numerical values, and dimensions, which are the context of the facts. 

Data cubes are used to enable users to view summarized data from different perspectives and provide an environment to run queries and data analysis. 

Data cubes are commonly used in OLAP (online analytical processing) applications, where users can analyze and visualize data from different angles and gain insights from the data.



Data cubes are composed of a set of dimensions, which are the context of the facts, and measures, which are numerical values. For example, a cube might have a dimension of time, such as month, and a measure of total sales. 

This cube would enable users to view the total sales in each month. Data cubes can also have more than one measure, such as total sales and average sales, which allows users to view both metrics simultaneously.


Data cubes are used to enable users to quickly explore and analyze data from different perspectives. They provide users with the ability to drill down into the data to gain insights and make decisions. 

Data cubes also enable users to aggregate and summarize data, as well as run queries to find patterns and correlations. Data cubes are an essential tool for data warehousing and business intelligence applications.



fact constellations for Multidimensional database

Here some schema list

1. Star Schema

2. Snowflake Schema

3. Fact Constellation Schema

4. Galaxy Schema

5. Degenerate Dimension Schema

6. Data Vault Modeling

7. Data Marts

8. Hybrid Schema


Star schema

This schema consists of a single fact table surrounded by multiple dimension tables. The fact table contains the primary measures or facts associated with the business process, such as sales, costs, and profits. The dimension tables contain the attributes that describe the facts, such as product, date, and location.


Snowflake schema

This schema is an extension of the star schema, where the dimension tables are further normalized into multiple related tables. This normalization helps reduce redundancy of data and improve query performance. The snowflake schema is more complex than the star schema, but it can also provide more depth and flexibility in the analysis.



Data warehouse process:

Here are some process 

1. Data Extraction: Data is extracted from multiple sources like databases, web services, spreadsheets, and other sources.

2. Data Transformation: The extracted data is then transformed into a consistent format suitable for analysis. This includes cleaning, filtering, and aggregating data.

3. Data Loading: The transformed data is then loaded into the data warehouse.

4. Data Analysis: The loaded data is analyzed using various tools like SQL, OLAP, and reporting tools.

5. Data Reporting: The analyzed results are then presented to the user in the form of reports and dashboards.


Data warehouse architecture

A data warehouse is an organized collection of data from multiple sources, stored in a single logical repository. 

Data warehouse architecture  | IndianTechnoEra


It is used to support decision-making processes by providing historical, current, and predictive insights.

A data warehouse architecture is a multi-tiered approach to data warehousing that includes an integrated set of components for gathering data from multiple sources, storing and managing the data, and delivering the data to end users for analysis and reporting. 

The architecture typically includes a data source layer, a data integration layer, a data warehouse layer, and an analytics layer. The data source layer is responsible for extracting and transforming data from multiple sources into a common format that can be loaded into the data warehouse. 

                                                                        The data integration layer is responsible for loading the data into the data warehouse and ensuring that the data is organized in a consistent format. The data warehouse layer is responsible for storing the data in a format that is optimized for analysis and reporting. The analytics layer is responsible for providing users with the tools to analyze and report on the data.


Data warehouse architecture layers:

Data warehouse architecture consists of three layers:

1. Staging Layer: This is the first layer of the data warehouse architecture. It is used to store the raw data from the source systems.

2. Data Warehouse Layer: This layer stores the cleaned and transformed data from the staging layer. It is used for analysis and reporting.

3. Presentation Layer: This layer is used to present the data in the form of reports and dashboards. It is the user interface layer of the data warehouse.


what is oltp?

OLTP (Online Transaction Processing) is a type of information system that supports transaction-oriented applications on a computer system, typically using a relational database as a data store. It is used to manage operational data and facilitate the day-to-day operations of businesses. OLTP systems are designed to process large volumes of transactions quickly and efficiently. Examples of OLTP systems include online banking systems, online retail systems, and point-of-sale systems.


What is olap?

OLAP stands for Online Analytical Processing. It is a type of software used to analyze and report large amounts of data quickly. OLAP enables users to easily retrieve, manipulate, and analyze data from multiple perspectives, allowing them to quickly answer complex business questions and uncover new trends and insights. OLAP systems are designed to identify patterns, trends, and relationships in data, allowing users to make informed decisions. 


Types of OLAP

There are four main types of OLAP:

1. Relational OLAP (ROLAP): ROLAP stores data in a relational database and uses SQL queries to analyze it.

2. Multi-dimensional OLAP (MOLAP): MOLAP stores data in a multidimensional cube structure and uses MDX queries to analyze it.

3. Hybrid OLAP (HOLAP): HOLAP combines ROLAP and MOLAP, using both relational databases and multidimensional cubes to store and analyze data.

4. Cloud OLAP (C-OLAP): C-OLAP uses a cloud-based platform to store and analyze data.


servers of OLAP

* Microsoft SQL Server Analysis Services (SSAS) 

* Oracle Database OLAP Option, IBM DB2 OLAP Server, Teradata OLAP Server 

* SAP BusinessObjects Analysis 

* SAP HANA, SAP BW/4HANA 

* Exasol 

* MicroStrategy 

* Jedox 

* Oracle Exalytics 

* Apache Kylin , Apache Spark,  Apache Hive , Apache Ignite 

* MemSQL 

* Microsoft Power BI 

* QlikView 

* Tableau



OLTP Vs OLAP

OLTP (Online Transaction Processing) 

Its is a system designed to process a high volume of small transactions and queries. It is used to store, update, and retrieve data from an operational database. OLTP systems are designed for real-time operations, and are optimized for rapid response times and accuracy.


OLAP (Online Analytical Processing) 

It is a system designed to perform complex and analytical operations. It is used to analyze data from multiple sources and provides insights for decision making and forecasting. OLAP systems are optimized for high-volume data analysis and are designed to provide answers to complex queries quickly.


ROLAP Vs MOLAP

ROLAP (Relational Online Analytical Processing) 

It is a type of OLAP (Online Analytical Processing) database technology that is based on relational database technology. It uses SQL to query and manipulate the data stored in the database, and also allows for the construction of complex queries to analyze the data.


MOLAP (Multidimensional Online Analytical Processing) 

It is a type of OLAP database technology that is based on multidimensional data storage structures. It utilizes a multidimensional database to store and analyze data, and provides access to data through the use of MDX (Multi-Dimensional Expressions) queries. Unlike ROLAP, MOLAP does not use SQL to query the data.


3-Tier data warehouse architecture

The 3-Tier data warehouse architecture is a type of distributed architecture that allows for the integration of data from multiple sources into a single data warehouse. It consists of three tiers: the front-end tier, the middle-tier and the back-end tier.

3-Tier data warehouse architecture


Front-end tier: The front-end tier is responsible for managing and controlling the user interface. It includes the user interface, query tools, and data access tools.


Middle-tier: The middle-tier is responsible for connecting and transforming data from multiple sources into the data warehouse. It includes ETL (Extract, Transform, Load) tools, data cleansing tools, data extraction tools, and data integration tools.


Back-end tier: The back-end tier is responsible for storing and managing the data warehouse. It includes the data warehouse, data mart, and OLAP (Online Analytical Processing) tools.


The 3-Tier data warehouse architecture is a powerful and flexible system that allows organizations to quickly integrate data from multiple sources into a single data warehouse. It is also cost-effective and scalable, which makes it an ideal solution for organizations of all sizes.


Distributed data warehouses

Distributed data warehouses are databases that are spread across multiple physical locations, such as multiple servers in multiple data centers, or even multiple clouds. 

This allows organizations to store and process massive amounts of data from multiple sources in a single system. 

It also enables them to scale up their data storage and processing power quickly and cost-effectively.


Virtual data warehouses

Virtual data warehouses are databases that are built on top of existing databases. 

They are a layer of software that sit between the physical data warehouse and the application that needs to access it. This layer allows organizations to access data from multiple sources in a single system without having to replicate it. 

This makes it easier for organizations to access and analyze data from multiple sources in a single system. 

Virtual data warehouses also enable organizations to quickly and cost-effectively scale their data storage and processing power.




data warehouse manager

A Data Warehouse Manager is responsible for managing the data warehouse operations of an organization. They are responsible for designing and maintaining the data warehouse systems, and for ensuring that the data warehouse is up-to-date and secure. 

The Data Warehouse Manager also oversees the development of data models and other database objects, and works with IT and data architects to ensure the data warehouse meets the organization's needs. 

The Data Warehouse Manager also works with other departments to ensure that their data needs are met.



---------------- End ----------------

Key: Data Warehousing, Definition, usage and trends, DBMS Vs Data warehouse, data marts, metadata, Multidimensional data mode, data cubes, Schemas for Multidimensional database: stars, snowflakes and fact constellations, Data warehouse process & architecture, OLTP Vs OLAP, ROLAP Vs MOLAP, types of OLAP, servers, 3-Tier data warehouse architecture, Distributed and virtual data warehouses, data warehouse manager,  Data Warehousing unit 2

Post a Comment

Feel free to ask your query...
Cookie Consent
We serve cookies on this site to analyze traffic, remember your preferences, and optimize your experience.
Oops!
It seems there is something wrong with your internet connection. Please connect to the internet and start browsing again.
AdBlock Detected!
We have detected that you are using adblocking plugin in your browser.
The revenue we earn by the advertisements is used to manage this website, we request you to whitelist our website in your adblocking plugin.
Site is Blocked
Sorry! This site is not available in your country.