Sunday, January 06, 2008

MSBI and T-SQL puzzle

先建dw,并etl,然后建立analysis service db(add a cube and dimensions),并定期抽取数据

then define relationship between attributes, and then measures ane measure groups

Should use view to allow app to access DW, allow easy modification of structure, risk-free of future changes will break apps

SQL Profiler can monitor
all statements or a subset of the statements sent to SQL Server

Leave data that can not be validated, otherwise user won't trust DW later

Fuzzy Grouping transform: 发现很小的两行间区别并合并
Unpivot transfomr:旋转并加一列column?

Build dimension table first

Fact table usually do not need PK (as they are often summarized). It is biggest table thus column size must be well decided.

the fact table consists of a column containing the surrogate key for each dimension related to the fact table, as well as columns for the measures that we will be tracking. For the Shipments fact table, we need to add the ProductKey, SalesTerritoryKey, PlantKey, and ShipMethodKey columns

Referentia Integrity is not needed in DW if Surrogate key works
自动找出维表/事实表
UDM统一维度模型:扩展传统olap(否则要relational query):are not restricted to a predefined set of hierarchies for querying the cube. Instead, they can use any descriptive attribute on a dimension to analyze information

事实表中某些列(不能聚集的数据则不能做measure)叫做degenerate dimension/fact dimension

Surrogate key is called Identity columns in SQL2005. Dimension tables should have this

Business key like Shipmethodcode can be as short as possible

clustered index use business key to physically sort the index

For different length of text columns: user-defined types feature to create special data types for common categories of columns, such as ShortDesc and LongDesc

configure a Lookup transform to translate business keys into surrogate

KPI mentioned in MS UDM: Each KPI in the UDM defines up to four expressions for some performance metric (Sales level, for example):


The actual value.
The goal value.
The status. A normalized value between -1 and 1 that provides the status of actual vs. goal (-1 is 'very bad,' 1 is 'very good').
The trend. A normalized value between -1 and 1 that provides the trend over time (-1 is 'getting a lot worse,' 1 is 'getting a lot better').

If user need time inf: Rather than add time to TimeDimension The best way to support this is to leave the Time dimension at the day level (probably renamed Date for clarity) and create a separate dimension for Time Of Day.

Simple recovery is used in DW instead of Full in OLTP (only recovers to last full backup)

Versioning: change column width, load process could damage DW quality

A result of reloading the dimensions is the surrogate keys can change, invalidating the facts previously loaded. For this example, we will reload all the sales for each company so the new surrogate keys will be reflected in the fact table. This works if the volume of data is small enough that the processing fits within your time constraints. We show you how to properly update existing dimensions and incrementally load facts in Chapter 8, "Managing Changing Data."


********Name convention only***

Tables. Because we will be using views to provide access to the data, we have named the table with a prefix that describes the type (such as DimShipMethod or FactShipping) and the corresponding view with the real dimension or fact name (such as ShipMethod or Shipping).

Name style. We have capitalized each word in a column or table name, rather than using spaces or other separators such as underscores (so, ShipMethod rather than ship_method or ship method).

Surrogate key columns. We have used <>Key as the name, such as ShipMethodKey.

Business key columns. We have used <>BusinessKey as the name, such as ShipMethodBusinessKey. We apply this consistently, even when there is another obvious candidate such as ShipMethodCode or CustomerNumber.

Primary and unique key constraints. We have used <>_PK for primary keys, and <>_AK for unique keys such as the business key (the A in AK stands for alternate key).

One entity in dw must have one flow of data. when a Data Flow task is executed in a control flow
The business key is required to be unique?

you can create a new Analysis Services project with no data source, and then when adding dimensions, select the "Build without using a data source" option. After you have added attributes and hierarchies to the dimensions, you can select Generate Relational Schema from the Database menu to create the database.

Calculated measures命名可以包含空格。The formula to express the average days late is to divide the total number of days late by the number of shipments in question:
[Measures].[Days Late]/[Measures].[Shipments Count

无用的physical measure可以设visible属性

角色修改要在BIDS里,否则发布的时候被覆盖:You must either add the roles to the project in BI Development Studio so that they always exist, or deploy the project using the Deployment Wizard with either the "Deploy roles and retain members" or the "Retain roles and members" option selected. If you select "Deploy roles and members" in the wizard, any roles that you manually created using SQL Server Management Studio are removed.


>>>
Puzzle 1: Medication Tablets
Imagine that you've been diagnosed with a rare disease. Your physician prescribes two medications for youcall them A and B. Each bottle contains three tablets, and both medication tablets have exactly the same size, shape, color, and smell. Each bottle is marked with the medication type, one bottle with A and the other B, but the pills themselves aren't marked. You're instructed to take one A tablet and one B tablet every day for three days. Following the instructions correctly will cure you completely, but following the instructions incorrectly will result in sudden death.

The first day, you take one A tablet and one B tablet. The next day, you discover that someone has tampered with your bottles: bottle B is empty, bottle A contains one tablet, and three tablets lie on the counter. You realize that one A tablet and two B tablets are now mixed on the table, but you can't tell which is which. You call your pharmacy and learn that the tablets are out of stock until tomorrow. How can you continue following the instructions correctly and be cured?

(提示:你要吃的是,每天吃药的一半)

Puzzle 2: Chocolate Bar
Imagine you have a chocolate bar made up of 40 squares arranged in 5 rows and 8 columns. Your task is to divide it into the 40 individual chocolate squares using the minimum number of cuts. You're allowed to make only one cut at a time (and you're not allowed to pile multiple layers or lay them next to each other) and only in straight lines (horizontal or vertical). How many cuts do you need at minimum? Prove your logic; don't just guess.

Puzzle 3: To a T
Copy the shapes in Figure A-1 to a piece of paper that has square gridlines, and then cut the shapes out. Use the pieces to form a T shape with the proportions that Figure A-2 shows. You might have to think outside the box for this one!

0 Comments:

Post a Comment

<< Home