Welcome to Data Warehousing Flying Bee

Monday, 4 June 2012

Data Warehouse


DATA WAREHOUSE---RALPH KIMBALL

Þ     A data warehouse is a database, which is specifically designed for analyzing the business. But not for a business transactional processing.

Þ     A data warehouse is designed to support decision making process.

Þ     A Data warehouse is design to provide answers to the following business questions.

Þ     Ex: Sales Analysis

Þ     The Following Questions can be answered when performing sales analysis
Þ     To whom you sold?
Þ     What you sold?
Þ     When you sold?
Þ     Where you sold?

DATA WAREHOUSE—D.H INMAN
A data warehouse is a
1) Time-variant
2) Nonvolatile
3) Subject-Oriented
4) Integrated

1)      Time-variant:  A data warehouse is a Time-variant database, which business users in analyzing the business with different time periods, compare the data and identify the trends in the business.
2)      Non-volatile: A data warehouse is non-volatile database, once the data enter into the data warehouse it does not effects to the changes which takes place at operational database. Hence the data is static.
3)      Subject-Oriented: A data warehouse is subject-oriented database, which supports the business needs of individual departments in the enterprise (or) Middle management users.
            Ex: Sales, H.R, Loans, Accounts ECT……………

4)      Integrated: A Data warehouse is an integrated database, which collects the data from multiple operational sources and integrate the data in any format and delivers the data through out the enterprise at any speed.






 
                                                                            Can: DWH
                                                                                               

               



DIFFERENCE BETWEEN OPERATIONAL DATABASE AND DATA WAREHOUSE
                                                                                                                                                                             
Operational database
1)      It is a design to support business transactional processing.
2)      Data is volatile
3)      Current data
4)      Detailed data
5)      Design for running the business.
6)      Supports normalization.
7)      Designed clerical access.

DATA WAREHOUSE
1)      It is designed to support decision making process.
2)      Data is non-volatile.
3)      Historical data.
4)      Summary data.
5)      Designed for analyzing the business.
6)      Supports De-normalization.
7)      Designed for managerial access.

Decision supporting system
Since a data warehouse is designed to support decision making process it is known as decision supporting system.

Historical database
Since a data warehouse contains historical business information it is known as historical database.

Read only database
Since the database is designed for only read the data for analysis and making decision but not for transactional processing hence it is known as read only database.
                                                                                                                                                           
Data acquisition
It is a process of extracting the relevant business information, transforming the data into a required business format and loading into target database.

A data acquisition is defined with following process                                                                                        

  1. Data extraction
  2. Data transformation
  3. Data loading

ETL SYSTEM DESIGN
In a real world data moves to data warehouse in a two phases

  1. ETL stage1
  2. ETL stage2



DATA PROFILING
It is a process of studying and analyzing the source data. During data profiling we can identify the inconsistent data, duplicate records and records with null values.
                                                                      
ETL STAGE1
The following activities take place in ETL stage1

1)       Defined the files as source.
2)       Create a staging database.
3)       Define a simple pass application
4)       Perform data profiling a staging database.

ETL STAGE2
1)       Define the staging database as a source.
2)       Design the data warehouse with appropriate structures.
3)       Define Extraction, Transformation and Loading.
4)       Test ETL applications with the following types of tests.

a)       Unit testing
b)       Data validation testing
c)       Performance testing
d)       Peer review
e)       UAT(user acceptance test)

Studying and analyzing the data is called as data profiling.

Stage1 is required only for understanding source data.


Data Acquisition –Data Extraction
It is a process of reading the data from various business systems like operational database, flat files. XML files, COBOL files, SAP, people soft, Seibel CRM etc……….

Data transformation
It is a process of clanging the data and converting the data into the required business format. The following are the different types of data transformation activities that take place in staging area.

1.       Data merging
2.       Data clanging
3.       Data scrubbing
4.       Data aggregation

Ø  Staging area is a temporary memory where data transformation activities take place.

1.       Data merging
It is a process of integrating the data from multiple source systems. There are two types of data merging activities.

1.       Data join
2.       Data union



  1. Data clanging
It is a process of changing the inconsistencies and inaccuracy in staging area.





Target Sys
 

  1. Data scrubbing:
It is a process of deriving new data definitions using Existing source definitions.




Staging Area
 
 
DWH
 


  1. Data aggregation:

It is a process of calculating the summaries in staging area. Few other aggregation function avg (), min (), max (), etc
                                                        

Data loading:

It’s a process of inserting the data into a target system. There are two types of data loading.

1.       Initial Load (or) full load                  
2.       Incremental load or delta load
                                                                          
1. Initial load:
It’s a process of loading all required data into a target system at initial load.

2. Incremental load
It’s a process of considering only new records by omitting exiting records for data load is known as incremental load.
                                                                                                                                                                

               
Data mart and types of data marts

àA data mart is a subject oriented database which supports the business needs of individual departments in the enterprise or middle management users.

A data mart is a subset of enterprise data warehouse.

Data marts are known as high performance query structures (HPQS).

There are two types of data marts
1)       Dependent data mart
2)       Independent data mart.

Top-down data warehousing approach:                                                                                                               

According to w.h.inman first we need to build an enterprise data warehouse, and then from the enterprise data warehouse build a subject oriented database known as data marts






Can: DM


 


Can: DM                                                                                 
                                               
                                                                                                                                                                            
                                                                                                                                      
Can: DM                                                                                                                                      


Bottom-up data warehousing approach:

According to Ralph Kimball first we need to develop a subject oriented database known as “data marts”, then conglomerate (integrate) the data marts into data warehouse.





Can: DM



 


                                                                                                                                                                                               
                               
Can: DMCan: DM                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               


Types of data marts:

1)       Dependent data marts:
In a top-down data ware approach a data mart development is independent of enterprise data warehousing, such data marts are known as dependent data marts.
                                                               
2)      Independent data marts:
In bottom-up approach a data mart development is independent of edwh. Such data marts are called ‘independent data marts’.
                                                                                                                                     

A database is a collection of objects like tables, views, functions, triggers etc.

A view is a logical object which contains only structures but not data.

We can create a view based in a table.

A table is a collection of rows and columns which contains structure and data.

There is a dynamic reference relation between a table and a view i.e. we can access the data from the tables using views


Data warehouse-Database Design

                A DWH and data mart is designed with following of schema.

                1) Star schema database design
                2) snow flake schema
                3) Integrated schema (galaxy schema, constellation schema)


*à A Database architect is the person responsible for designing the structures for the data marts are DWH.

*à Business analyst is the person who gathers the requirements of the business.

*à ETL developers are responsible for loading the data in to the DWH and performing testing.

STAR SCHEMA:-

  1. A state schema is a database design which contains a centrally located FACT Table surrounded by multiple Dimension tables.
  2. Since the database design looks like a star, it is known as “star schema database design”.
  3. A state schema is a simple subject which supports the needs of a department in the enterprise.
  4. A FACT table contains a composite key where each candidate key is a foreign key to the dimension table.
  5. A FACT table contains facts. Facts are numeric.
  6. Not every numeric is a fact but numerics which are of type key performance indications are known as Facts.
  7. Facts are business measures used to evaluate the performance of an enterprise.
  8. A FACT table contains the facts at the lowest granularity.
  9. The level at which facts are stored in a FACT table is known as FACT granularity (or) Grain of FACT (or) FACT Event Level.
  10. A Dimension is a descriptive data about the major aspects of business analysis.
  11. A Dimension is used to describe the key performance indicators known as FACTS’.
  12. A Dimension provides answers to the following basic business questions.
Who? b) What?  c) When?  d) Where?
  1. 13) Dimension tables are denormalized.


                                                                                                                  




*à A star schema contains only are fact table and are more dimension tables.


Sales Fact
 




Snow Flaw Schema

Ø  In this a denarmalized dimension table is spitted in to one or more tables results in normalization of dimension tables.
Ø  If a dimension table contains more no of attributes with a huge amount of data then query performance may degrade and results disk space issues.
Ø  Hence same times a denarmalized dimension table is spitted in to normalized tables to avoid the disk space issues.


                                                                                                            
                                                                                                                      
  
*à This schema is not recommendable because the disk space issues may be solved but the query performance degrades as the number of flakes keeps as increases.
                                           



                                                                 


INTEGRATED SCHEMA

FACT CONSTELLATION (GALARY SCHEMA):-
 
àIt is a process of joining the FACT tables.

Confirmed Dimension:-

A Dimension table which can be shared by multiple FACT tables is known as “Confirmed Dimension”.

Fact less Fact Table

A FACT table without any facts in known as “fact less Fact Table”.

àAny two schemas can be integrated using joining FACT tables.                                                     






                                                                                                                                           



                                                                                                                                         








Dimensional Modeling

àIt is a Design methodology for designing data marts.

àA Dimensional modeling consists of three phases in designing data marts.
                       
1)      Conceptual Modeling.
2)      Logical Modeling.
3)      Physical Modeling.

Conceptual Modeling

à  A data model should understand the business requirements clearly.
à  Identify the entities.
à  Identify the attributes.
à  Identify the relationships between entities.

Logical Modeling

à  Design the structures which are identified in a conceptual modeling.
à  Establish the relationships between entities.

Physical Modeling:
                                                                                                                                                                                                               
Ø  Execute the structures in the database to state the data physically.


à ERWIM is a data modeling tool used by Database Architects



OLAP (online analytical processing)

An OLAP is a set of speciation which allows the client applications (reports) in retrieving the data from data bases.

The following are the types of OLAPs.

1)      DOLAP (Desktop OLAP)
2)      ROLAP (Relational OLAP)
3)      MOLAP (Multidensional OLAP)
4)      HOLAP (Hybrid OLAP).
                                                                                                          
DOALP:                                                                                                        

An OLAP which can Query the data from File oriented data bases (Success, Excel, FoxPro, and DBase) are known as Desktop OLAP.

ROLAP:

An OLAP which can Query the data from relational databases known as ROLAP.

MOLAP:                            
An olap which can query the data from multidimensional database (cube) is called as MOLAP

               EX: - Hyperion Ess Base, Cognious 8.3, Boxi R3.

HOLAP:
An OLAP which uses the combine properties of ROLAP & MOLAP is known as Hybrid OLAP.

                EX: -   Cogniuors 8.3, Boxi R3.


                                                                                                               
Slowly Changing Dimensions

                It captures the changes which takes place over the period of time. There are three types of slowly changing dimensions. They are

*      TYPE 1 SCD: - A type 1 dimension keeps the most recent values in the target. It does not maintain the history.
*      TYPE 2 SCD: - A type 2 Dimension keeps the full history in the target. For each update it inserts a new record in the target.
*      TYPE 3 SCD: - A type 3 Dimension keeps current and previous information in the target (partial history).

JUNK DIMENTIONS:-

                A dimension which is defined with the type Boolean, Flag are known as JUNK Dimensions which are not participated in analyzing key performance indicates.


 
                                EX:-






Boolean           flag
Yes, true          1
No, false          0
 


 










Dirty Dimensions

In a dimension table if a record exist more than once with a non key attribute difference such dimensions are known as “Dirty Dimensions”.

Ex: - Type 2 Dimension (SCD).



ODS – (Operational Data Store)

                An ODS is an integrated view of operational data sources designed to support operational monitoring.


ODS                                                                                                                                            
1) Purpose:  It is designed to support operational monitoring.  
2) Similarities: Integrated data base.                                                                                                                      
3) Differences: Data is volatile         Current data
Detailed data.

DWH
1. Purpose: It is designed to support decision making process
2. Similarities: Integrated data base
3. Differences: Data is Non volatile
Historical Data
Summary Data



DWH Life Cycle - Phases

                In DWH life cycle implementation the following phase an participated in building a system for analysis.

1)      Business requirements Collection (Business Analylist)
2)      Data base designing (Data base architect).
3)      ETL Development phase (ETL Team).
4)      Report development phase (Business Intelligence team).
5)      Test phase (ETL Team, BI Team).
6)      Deployment.
7)      Maintenance (Production team).

Calculate of fact table size:

                A fact table size is calculated based a lowest level granularity.

Ex: 5 years data for 6000 productions in 3000 stores.

                Fact table size = 5 * 365 * 6000 * 3000 bytes.

12 comments:

Unknown said...

Nice article, users are attracted when they see your post thanks for posting keep updating Informatica Online Training

soumya said...

This is a very good content I read this blog, please share more content on
msbi online training hyderabad

Training for IT and Software Courses said...

I am very exiting after using this site.DATA WAREHOUSE

mathimathi said...

This is very nice post i m very like it and i appreciate you for good work keep it up it is very useful for me
JAVA Training in Chennai |
JAVA Course in Chennai |
Best JAVA Training in Chennai

mathimathi said...

just love the good work you are doing.
SAS Course in Chennai |
SAS Training Institutes in Chennai |
SAS Institute in Chennai


Unknown said...

aşk kitapları
tiktok takipçi satın al
instagram beğeni satın al
youtube abone satın al
twitter takipçi satın al
tiktok beğeni satın al
tiktok izlenme satın al
twitter takipçi satın al
tiktok takipçi satın al
youtube abone satın al
tiktok beğeni satın al
instagram beğeni satın al
trend topic satın al
trend topic satın al
youtube abone satın al
takipçi satın al
beğeni satın al
tiktok izlenme satın al
sms onay
youtube izlenme satın al
tiktok beğeni satın al
sms onay
sms onay
perde modelleri
instagram takipçi satın al
takipçi satın al
tiktok jeton hilesi
instagram takipçi satın al pubg uc satın al
sultanbet
marsbahis
betboo
betboo
betboo

Anonymous said...

smm panel
smm panel
isilanlariblog.com
İnstagram Takipçi Satın Al
hirdavatciburada.com
https://www.beyazesyateknikservisi.com.tr/
Servis
tiktok jeton hilesi

Anonymous said...

en son çıkan perde modelleri
lisans satın al
yurtdışı kargo
nft nasıl alınır
özel ambulans
en son çıkan perde modelleri
minecraft premium
uc satın al

tipobet said...

Good text Write good content success. Thank you
kibris bahis siteleri
bonus veren siteler
poker siteleri
betmatik
slot siteleri
betpark
kralbet
mobil ödeme bahis

sadık said...

kocaeli
konya
kuşadası
kütahya
malatya
RO3MH

tarık said...

manisa
maraş
mardin
marmaris
mersin
UHYYT

zeynep said...

beşiktaş
eyüpsultan
gebze
kaş
tekirdağ

QPB5QU

Post a Comment

 
Design by Free WordPress Themes | Bloggerized by Lasantha - Premium Blogger Themes | Justin Bieber, Gold Price in India
Flying Bee Fish swimming from right to left Bat flying really fast