DATA WAREHOUSE---RALPH KIMBALL
Þ A data warehouse is a database, which is specifically designed for analyzing the business. But not for a business transactional processing.
Þ A data warehouse is designed to support decision making process.
Þ A Data warehouse is design to provide answers to the following business questions.
Þ Ex: Sales Analysis
Þ The Following Questions can be answered when performing sales analysis
Þ To whom you sold?
Þ What you sold?
Þ When you sold?
Þ Where you sold?
DATA WAREHOUSE—D.H INMAN
A data warehouse is a
1) Time-variant
2) Nonvolatile
3) Subject-Oriented
4) Integrated
1) Time-variant: A data warehouse is a Time-variant database, which business users in analyzing the business with different time periods, compare the data and identify the trends in the business.
2) Non-volatile: A data warehouse is non-volatile database, once the data enter into the data warehouse it does not effects to the changes which takes place at operational database. Hence the data is static.
3) Subject-Oriented: A data warehouse is subject-oriented database, which supports the business needs of individual departments in the enterprise (or) Middle management users.
Ex: Sales, H.R, Loans, Accounts ECT……………
4) Integrated: A Data warehouse is an integrated database, which collects the data from multiple operational sources and integrate the data in any format and delivers the data through out the enterprise at any speed.
DIFFERENCE BETWEEN OPERATIONAL DATABASE AND DATA WAREHOUSE
Operational database
1) It is a design to support business transactional processing.
2) Data is volatile
3) Current data
4) Detailed data
5) Design for running the business.
6) Supports normalization.
7) Designed clerical access.
DATA WAREHOUSE
1) It is designed to support decision making process.
2) Data is non-volatile.
3) Historical data.
4) Summary data.
5) Designed for analyzing the business.
6) Supports De-normalization.
7) Designed for managerial access.
Decision supporting system
Since a data warehouse is designed to support decision making process it is known as decision supporting system.
Historical database
Since a data warehouse contains historical business information it is known as historical database.
Read only database
Since the database is designed for only read the data for analysis and making decision but not for transactional processing hence it is known as read only database.
Data acquisition
It is a process of extracting the relevant business information, transforming the data into a required business format and loading into target database.
A data acquisition is defined with following process
- Data extraction
- Data transformation
- Data loading
ETL SYSTEM DESIGN
In a real world data moves to data warehouse in a two phases
- ETL stage1
- ETL stage2
DATA PROFILING
It is a process of studying and analyzing the source data. During data profiling we can identify the inconsistent data, duplicate records and records with null values.
ETL STAGE1
The following activities take place in ETL stage1
1) Defined the files as source.
2) Create a staging database.
3) Define a simple pass application
4) Perform data profiling a staging database.
ETL STAGE2
1) Define the staging database as a source.
2) Design the data warehouse with appropriate structures.
3) Define Extraction, Transformation and Loading.
4) Test ETL applications with the following types of tests.
a) Unit testing
b) Data validation testing
c) Performance testing
d) Peer review
e) UAT(user acceptance test)
Studying and analyzing the data is called as data profiling.
Stage1 is required only for understanding source data.
Data Acquisition –Data Extraction
It is a process of reading the data from various business systems like operational database, flat files. XML files, COBOL files, SAP, people soft, Seibel CRM etc……….
Data transformation
It is a process of clanging the data and converting the data into the required business format. The following are the different types of data transformation activities that take place in staging area.
1. Data merging
2. Data clanging
3. Data scrubbing
4. Data aggregation
Ø Staging area is a temporary memory where data transformation activities take place.
1. Data merging
It is a process of integrating the data from multiple source systems. There are two types of data merging activities.
1. Data join
2. Data union
- Data clanging
It is a process of changing the inconsistencies and inaccuracy in staging area.
|
- Data scrubbing:
It is a process of deriving new data definitions using Existing source definitions.
|
|
- Data aggregation:
It is a process of calculating the summaries in staging area. Few other aggregation function avg (), min (), max (), etc
Data loading:
It’s a process of inserting the data into a target system. There are two types of data loading.
1. Initial Load (or) full load
2. Incremental load or delta load
1. Initial load:
It’s a process of loading all required data into a target system at initial load.
2. Incremental load
It’s a process of considering only new records by omitting exiting records for data load is known as incremental load.
Data mart and types of data marts
àA data mart is a subject oriented database which supports the business needs of individual departments in the enterprise or middle management users.
A data mart is a subset of enterprise data warehouse.
Data marts are known as high performance query structures (HPQS).
There are two types of data marts
1) Dependent data mart
2) Independent data mart.
Top-down data warehousing approach:
According to w.h.inman first we need to build an enterprise data warehouse, and then from the enterprise data warehouse build a subject oriented database known as data marts
Bottom-up data warehousing approach:
According to Ralph Kimball first we need to develop a subject oriented database known as “data marts”, then conglomerate (integrate) the data marts into data warehouse.
Types of data marts:
1) Dependent data marts:
In a top-down data ware approach a data mart development is independent of enterprise data warehousing, such data marts are known as dependent data marts.
2) Independent data marts:
In bottom-up approach a data mart development is independent of edwh. Such data marts are called ‘independent data marts’.
A database is a collection of objects like tables, views, functions, triggers etc.
A view is a logical object which contains only structures but not data.
We can create a view based in a table.
A table is a collection of rows and columns which contains structure and data.
There is a dynamic reference relation between a table and a view i.e. we can access the data from the tables using views
Data warehouse-Database Design
A DWH and data mart is designed with following of schema.
1) Star schema database design
2) snow flake schema
3) Integrated schema (galaxy schema, constellation schema)
*à A Database architect is the person responsible for designing the structures for the data marts are DWH.
*à Business analyst is the person who gathers the requirements of the business.
*à ETL developers are responsible for loading the data in to the DWH and performing testing.
STAR SCHEMA:-
- A state schema is a database design which contains a centrally located FACT Table surrounded by multiple Dimension tables.
- Since the database design looks like a star, it is known as “star schema database design”.
- A state schema is a simple subject which supports the needs of a department in the enterprise.
- A FACT table contains a composite key where each candidate key is a foreign key to the dimension table.
- A FACT table contains facts. Facts are numeric.
- Not every numeric is a fact but numerics which are of type key performance indications are known as Facts.
- Facts are business measures used to evaluate the performance of an enterprise.
- A FACT table contains the facts at the lowest granularity.
- The level at which facts are stored in a FACT table is known as FACT granularity (or) Grain of FACT (or) FACT Event Level.
- A Dimension is a descriptive data about the major aspects of business analysis.
- A Dimension is used to describe the key performance indicators known as FACTS’.
- A Dimension provides answers to the following basic business questions.
Who? b) What? c) When? d) Where?
- 13) Dimension tables are denormalized.
*à A star schema contains only are fact table and are more dimension tables.
|
Snow Flaw Schema
Ø In this a denarmalized dimension table is spitted in to one or more tables results in normalization of dimension tables.
Ø If a dimension table contains more no of attributes with a huge amount of data then query performance may degrade and results disk space issues.
Ø Hence same times a denarmalized dimension table is spitted in to normalized tables to avoid the disk space issues.
*à This schema is not recommendable because the disk space issues may be solved but the query performance degrades as the number of flakes keeps as increases.
INTEGRATED SCHEMA
FACT CONSTELLATION (GALARY SCHEMA):-
àIt is a process of joining the FACT tables.
Confirmed Dimension:-
A Dimension table which can be shared by multiple FACT tables is known as “Confirmed Dimension”.
Fact less Fact Table
A FACT table without any facts in known as “fact less Fact Table”.
àAny two schemas can be integrated using joining FACT tables.
Dimensional Modeling
àIt is a Design methodology for designing data marts.
àA Dimensional modeling consists of three phases in designing data marts.
1) Conceptual Modeling.
2) Logical Modeling.
3) Physical Modeling.
Conceptual Modeling
à A data model should understand the business requirements clearly.
à Identify the entities.
à Identify the attributes.
à Identify the relationships between entities.
Logical Modeling
à Design the structures which are identified in a conceptual modeling.
à Establish the relationships between entities.
Physical Modeling:
Ø Execute the structures in the database to state the data physically.
à ERWIM is a data modeling tool used by Database Architects
OLAP (online analytical processing)
An OLAP is a set of speciation which allows the client applications (reports) in retrieving the data from data bases.
The following are the types of OLAPs.
1) DOLAP (Desktop OLAP)
2) ROLAP (Relational OLAP)
3) MOLAP (Multidensional OLAP)
4) HOLAP (Hybrid OLAP).
DOALP:
An OLAP which can Query the data from File oriented data bases (Success, Excel, FoxPro, and DBase) are known as Desktop OLAP.
ROLAP:
An OLAP which can Query the data from relational databases known as ROLAP.
MOLAP:
An olap which can query the data from multidimensional database (cube) is called as MOLAP
EX: - Hyperion Ess Base, Cognious 8.3, Boxi R3.
HOLAP:
An OLAP which uses the combine properties of ROLAP & MOLAP is known as Hybrid OLAP.
EX: - Cogniuors 8.3, Boxi R3.
Slowly Changing Dimensions
It captures the changes which takes place over the period of time. There are three types of slowly changing dimensions. They are
TYPE 1 SCD: - A type 1 dimension keeps the most recent values in the target. It does not maintain the history.
TYPE 2 SCD: - A type 2 Dimension keeps the full history in the target. For each update it inserts a new record in the target.
TYPE 3 SCD: - A type 3 Dimension keeps current and previous information in the target (partial history).
JUNK DIMENTIONS:-
A dimension which is defined with the type Boolean, Flag are known as JUNK Dimensions which are not participated in analyzing key performance indicates.
|
| ||||
Dirty Dimensions
In a dimension table if a record exist more than once with a non key attribute difference such dimensions are known as “Dirty Dimensions”.
Ex: - Type 2 Dimension (SCD).
ODS – (Operational Data Store)
An ODS is an integrated view of operational data sources designed to support operational monitoring.
ODS
1) Purpose: It is designed to support operational monitoring.
2) Similarities: Integrated data base.
3) Differences: Data is volatile Current data
Detailed data.
DWH
1. Purpose: It is designed to support decision making process
2. Similarities: Integrated data base
3. Differences: Data is Non volatile
Historical Data
Summary Data
DWH Life Cycle - Phases
In DWH life cycle implementation the following phase an participated in building a system for analysis.
1) Business requirements Collection (Business Analylist)
2) Data base designing (Data base architect).
3) ETL Development phase (ETL Team).
4) Report development phase (Business Intelligence team).
5) Test phase (ETL Team, BI Team).
6) Deployment.
7) Maintenance (Production team).
Calculate of fact table size:
A fact table size is calculated based a lowest level granularity.
Ex: 5 years data for 6000 productions in 3000 stores.
Fact table size = 5 * 365 * 6000 * 3000 bytes.
12 comments:
Nice article, users are attracted when they see your post thanks for posting keep updating Informatica Online Training
This is a very good content I read this blog, please share more content on
msbi online training hyderabad
I am very exiting after using this site.DATA WAREHOUSE
This is very nice post i m very like it and i appreciate you for good work keep it up it is very useful for me
JAVA Training in Chennai |
JAVA Course in Chennai |
Best JAVA Training in Chennai
just love the good work you are doing.
SAS Course in Chennai |
SAS Training Institutes in Chennai |
SAS Institute in Chennai
aşk kitapları
tiktok takipçi satın al
instagram beğeni satın al
youtube abone satın al
twitter takipçi satın al
tiktok beğeni satın al
tiktok izlenme satın al
twitter takipçi satın al
tiktok takipçi satın al
youtube abone satın al
tiktok beğeni satın al
instagram beğeni satın al
trend topic satın al
trend topic satın al
youtube abone satın al
takipçi satın al
beğeni satın al
tiktok izlenme satın al
sms onay
youtube izlenme satın al
tiktok beğeni satın al
sms onay
sms onay
perde modelleri
instagram takipçi satın al
takipçi satın al
tiktok jeton hilesi
instagram takipçi satın al pubg uc satın al
sultanbet
marsbahis
betboo
betboo
betboo
smm panel
smm panel
isilanlariblog.com
İnstagram Takipçi Satın Al
hirdavatciburada.com
https://www.beyazesyateknikservisi.com.tr/
Servis
tiktok jeton hilesi
en son çıkan perde modelleri
lisans satın al
yurtdışı kargo
nft nasıl alınır
özel ambulans
en son çıkan perde modelleri
minecraft premium
uc satın al
Good text Write good content success. Thank you
kibris bahis siteleri
bonus veren siteler
poker siteleri
betmatik
slot siteleri
betpark
kralbet
mobil ödeme bahis
kocaeli
konya
kuşadası
kütahya
malatya
RO3MH
manisa
maraş
mardin
marmaris
mersin
UHYYT
beşiktaş
eyüpsultan
gebze
kaş
tekirdağ
QPB5QU
Post a Comment