70-448 (070-448) Exam prep, Part 3/3 - SSAS

SSAS

 

Analysis Services is the most dominant technology on the 70-448 exam. About 40% of the questions will pertain to SSAS.

 

Installation/Configuration

 

Moving the data directory for SSAS:

http://blogs.msdn.com/b/karang/archive/2010/01/19/moving-data-directory-of-analysis-services-sql-2005-2008.aspx

  1. Change the DataDir location

  2. Stop SSAS Service

  3. Move the data to the new specified location

  4. Start SSAS Service

 

 

 

Algorithms

Text Mining: Text analysis, can be done in SSIS tasks:

Term Extraction Transformation – extract important terms from text

Term Lookup Transformation – apply a dictionary to text to look for specific words

 Term and frequency are derived

 

SSAS algorithms + parameters

Association Rules: Market Basket Analysis, defines an itemset as a combination of items

MINIMUM_PROBABILITY (higher probability = higher confidence)

 MINIMUM_SUPPORT – minimum cases that must contain the itemset before the rule is generated.

Clustering: Groups cases with similar characteristics together. Often used for clickstream analysis (web sites) but can be used other ways

CLUSTER_COUNT - # of clusters

 CLUSTERING_METHOD – scalable/non-scalable EM or K-Means; default is scalable EM

MODELLING_CARDINALITY - # of sample models constructed during modeling process (Default = 10)

Decision Trees: Predicts both kinds of variables

MINIMUM_SUPPORT – High support = fewer splits, shallower tree

COMPLEXITY_PENALTY – Higher = fewer splits, shallower tree

SPLIT_METHOD – Force binary splits or complete splits

Linear regression: continuous variables only – regression tree with no splits

Logistic regression: neural network with no hidden layers

Naïve Bayes: Discrete attributes only; calculates probabilities for each state of an input attribute

MAXIMUM_STATES – more states = discretize values

Neural Network: search for nonlinear functional dependencies, perform nonlinear transformations

 HIDDEN_NODE_RATIO - # of nodes in hidden layer. 0 = no layers = logistic regression model

Sequence Clustering: groups based on model rather than similarity using Markov Chains

CLUSTER_COUNT – approx # clusters

Time series: forecasting continuous variables

PERIODICITY_HINT – Tells algorithm to expect a specific time interval

FORECAST_METHOD – Training method (MIXED is default)

PREDICTION_SMOOTHING – How to mix ARTXP and ART

 

 

Dimension data security: database or cube level

Pessimistic: deny everything except allowed set

Optimistic: allow everything except denied set

MDX can be used to build allowed/denied sets

Cell security can be used for extremely granular control

 Read

 Read-Contingent: Determines if cells derived from restricted cells are permitted.

 Read/Write (in case of writeback); measure group partition must be write-enabled

 

SSAS Logging

SSAS uses query logs to log statistical information about queries (usage-based analysis)

Configuration is done via SSMS by connecting to the SSAS instance.

http://technet.microsoft.com/en-us/library/cc917676.aspx

 

Query log: captures query activity from users; Properties:

CreateQueryLogTable (creates it if it doesn’t already exist)

QueryLogConnectionString

QueryLogSampling – frequency (Default is every 10th)

QueryLogTableName (Default is OlapQueryLog)

 

Usage-Based optimization wizard: tunes aggregations based on actual user queries; requires query log to be active

 

SSAS Deployments

BIDS – local server

For testing/prod deployment, no needs to retain security settings

Bids: connected mode – connected to SSAS DB; project mode – changes saved locally as XML files, disconnected from server. Changes must be deployed to server.

Deployment options:

Processing Option

Default: minimum processing to bring cube to ready state

Full

Do not process: changes only

Transactional Deployment

False: each command in script is independent

True: One transaction

Deployment Mode

Deploy All: Overwrite target DB

Deploy Changes Only

Server

Database

Build files:

ProjectName.asdatabase – script definition of project objects

….configsettings – environmental settings, data source connections

….deploymentoptions – from above

….deploymenttargets – target DB and server

 

Deployment Wizard

Interactive mode – generates XMLA script

Command line mode – automation of wizard

Options for Partitions and roles – keep or overwrite security on target server

 

XMLA script – scheduling (script.XMLA)

Synchronize database wizard – synch two cubes (IE staging & Prod)

XMLA command “Synchronize”

Windows account target SSAS server is running under must have admin rights to the source database

Target server executes the synch under that account

 Useful for processing a cube on one server, then synching with multiple prod servers

 

 

Measures

 

Enterprise + Dev SSAS include functions for handling and aggregating semiadditive measures

http://msdn.microsoft.com/en-us/library/ms175356.aspx

 

Average Of Children

ByAccount

Count

Distinct Count

First Child – along time dimension

FirstNonEmpty – along time dimension

LastChild – along time dimension

LastNonEmpty – along time dimension

Max, Min, None, Sum

 

Nonadditive measures cannot be aggregated (rates, %s) – need to use MDX

 

 

 

 

 

Measure Groups

Relating dimensions to measure groups:

Regular – dimension measure group

Fact – Dimension is based on fact table

Referenced – dimension related to measure group through intermediate dimension

Many to many – same as above, multiple

Data mining – data mining model measure group

 

Aggregations: summarized value of all measures in a measure group

Aggregation wizard/BIDS:

AggregationUsage setting defined:

FULL: every aggregation in cube must use this attribute or related lower level attribute

NONE: don’t use this in aggregations – for data like phone numbers

Unrestricted: Design wizard decides

Default: ALL, dimension key, and user hierarchy attributed becomes unrestricted; Many—Many, Referenced, and data mining dimensions become NONE

 

Wizard does cost/benefit analysis and favors smaller aggregations using EstimatedCount property for members and EstimatedRows for fact tables – Use the Specify Object Counts page to keep this up to date.

 

Aggregation options

Estimated storage reach: limit number of aggregations by space consumed

Performance gain reaches: target percentage gain

I click Stop

Do not design aggregations

 

Dimension Attribute Hierarchies

“ALL” level is an optional, system-generated level – one member whose value is the aggregation of all values of members in the immediately subordinate level. It is not contained in the dimension table. Its value is considered the aggregation of all members of the hierarchy.

 

Whichever member at the highest point in the hierarchy that has IsAggregatable set to TRUE establishes the ALL level.

 

Good reading:

http://msdn.microsoft.com/en-us/library/ms166717(v=sql.90).aspx

 

 

 

 

 

 

 

 

SSAS Storage Modes, Processing

Multidimensional OLAP (MOLAP) – High query response time; data cached in the cube when processed in addition to the data sources; high latency because refreshes of this data are needed

Relational OLAP (ROLAP) – Queries slower because data is always retrieved from DB

Hybrid OLAP (HOLAP) – aggregation data stored locally in cube cache, fact table data in DB

If aggregations can satisfy requirements, HOLAP is as fast as MOLAP.

 

Proactive caching: Allows more granular control of cube refresh frequency; Solves data latency issues (time it takes to refresh cube from underlying data sources)

 most useful when data source changes randomly (OLTP)

 for cubes that have data sources as scheduled ETL, proactive caching may not be necessary

 

Without proactive caching, SSAS object must be explicitly processed to being the data in the cache up to date with the relational data sources.

Processing options:

Default: minimum tasks; depending on the situation, “Default” with become Full or something else

Full: Drop object stores and rebuild

Update: applies insert/del/upd without invalidating cube. Dimension only.

Add – Dimension, partition only

Process Data – Load object,but don’t process aggregations/indexes – Dimension, cube, measure group, partition

Process Index – only process aggregations/indexes - Dimension, cube, measure group, partition

Unprocess – delete object data

Process structure – Delete partition data, apply process default – Cube only

 

Only Dimensions, partitions, and mining structures store data. The rest of objects (measure groups, partitions) are containers only.

 

Processing objects in BIDS: solution explorer process

Interdependencies will affect which objects need to be processed

Impact Analysis screen in BIDS will show processing requirements

Process order allows granular control over both order and parallelism

Writeback table option: users can make data changes (UPDATE CUBE)

 

SSAS processing: NULLs in key columns are converted to 0 or empty string during processing

 Must have unknown member enabled (UnknownMember property cannot be NONE, must be VISIBLE or HIDDEN)

 Default Label is UNKNOWN but can be changed with UnknownMemberName property

 

Script processing: TYPE element specifies the processing method (ProcessFull, Etc.)

 

 

Analysis Services Processing in SSIS: Processing settings as a task inside a package

 

 

Database dimension properties:

http://msdn.microsoft.com/en-us/library/ms174546(v=sql.90).aspx

 

Steady State: no changes to relational data. Server listens for data change notifications:

  1. SQL Server trace

  2. Client initiation

  3. Scheduled polling

 

Unsteady State: Data has changed in source. Silence interval stopwatch counts the time between changes. Latency stopwatch defines maximum time allowed between cache refresh.

 

Pre-defined proactive caching modes:

Real-Time ROLAP: internal cache only

Real-Time HOLAP: partition data in RDB, aggregations in cube refreshed on data change notifications.

Low-latency MOLAP: cache expires in 30 mins

Medium-Latency MOLAP: 4 hours

Automatic MOLAP: silence interval of 10 seconds; if no silence, 10 minutes is the maximum wait time

Scheduled MOLAP: process partition daily

MOLAP: no proactive caching

 

Granular proactive caching/advanced settings

 Latency/Performance compromise

Silence Interval: Minimum wait period for quiet time in RDB before processing occurs; default = -1 (ignore notifications)

Silence Override Interval: In case of perpetual activity, processing will occur after this threshold is reached; default = -1 (not enabled)

Latency: life of cache; default = -1 (never revert to ROLAP)

Rebuild interval – period at which cache will be rebuilt (scheduled); default = -1 (none)

Bring online immediately: when ON, ROLAP is used when cache is being rebuilt; when ONCACHECOMPLETE, server never uses ROLAP but latency is infinite

Enable ROLAP aggregations – in ROLAP mode, tries to use SQL views for aggregations

Apply Settings to Dimensions – Applies storage mode & Proactive caching settings to associated dimensions

 

 

MDX – Multidimensional Expressions

 

http://msdn.microsoft.com/en-us/library/ms144785.aspx

http://msdn.microsoft.com/en-us/library/ms144884.aspx

 

Tuples: Multidimensional coordinate that identifies a single cell in the cube – an intersection of all axes, up to 128 (0 through 127)

 

COLUMNS, ROWS, PAGES, SECTIONS, CHAPTERS are names for first five axes

Can use 0, 1, 2, 3, 4, 5…

 

 

[Product].[Product].[Laptop], [Date].[Year].[2012], [Measure].[Internet Sales]

Unique Keys can be used as shorthand

[Product].[Product].&[7]

Where the product key for laptop is 7

 

MDX Set: collection of tuples with same dimensionality or attribute hierarchies

 

MDX functions for navigating the cube space

 

CurrentMember (Default property, usually omitted): [Order Date].[Calendar].CurrentMember, [Measures].[Reseller Amount]

Refer to the reseller amount for the member of the calendar hierarchy that references the current cell

PrevMember: Previous Member within same hierarchy (if current cell = 2003, returns 2002)

Members: returns all members of the hierarchy (Except ALL)

Children: returns all members underneath current

[Date].[Calendar].[Calendar Year].[2003].Children at quarterly hierarchy returns all quarters of year 2003

Parent: returns member’s parent

[…].[Road Bike] returns Bikes

 

 

 

Time-specific MDX functions

Lag: member at given offset from current member

[Date].[Calendar].Lag(1) returns previous member (Lag(1) = PrevMember)

ParallelPeriod: member from prior period in same relative position

PeriodsToDate – sibling members on same level as given member, starting with sibling #1 and ending with given member

Select [Measures].[Internet Sales Amount] on 0,

PeriodsToDate([Date].[Calendar].[Calendar Year],

[Date].[Calendar].[Month Name].&[10]&[2003]) on 1

Returns internet sales for first 10 months of 2003

 

 

 

http://www.mdxpert.com/Functions/MDXFunction.aspx?f=30

 




Category:

Time

Description:

The OpeningPeriod function returns the first sibling from the descendants of a supplied member at a supplied level.

Syntax:

OPENINGPERIOD( [«Level»[, «Member»] ] )

 



Category:

Time

Description:

The LastPeriods function returns the supplied number of members up to and including the supplied member.

Syntax:

LASTPERIODS( «Index»[, «Member»] )

 

 

Calculated members: Metadata-only definition of existing measures

CREATE MEMBER MYCUBE.[Measures].[Sales Figure]

AS [Measures…].[….] + [Measures….].[…],

FORMAT_STRING = “Currency”, VISIBLE = 1;

 

MDX Named sets: returns a set of dimension members

CREATE SET MYCUBE.[Top 50 Customers]

AS TopCount( EXISTING… );

EXISTING: reevaluate in current scope

Static, Dynamic Named Sets

 

 

KPIs and MDX (Multidimensional Expressions)

 

KPI values: Goal, Status, Trend, Value

Value: Current value, usually mapped to a measure

Goal: fixed number or calculated measure

Example from text:

[Date].[Calendar].PrevMember, [Measures].[Reseller Amount]

[Date].[Calendar]:

Would reference the Previous Member in the [Date].[Calendar] hierarchy (If month, then previous month, if year then year)

[Measures].[Reseller Amount] references that measure.

 

Setting this month (year, etc)’s goal to 40% more than last month (year)’s would be:

1.4 * ( [Date].[Calendar].PrevMember, [Measures].[Reseller Amount] )

 

Status: Value compared to Goal

-1 = underperform, 0 = ok, 1 = good

KPIValue function retrieves value

KPIGoal retrieves goal

KPIStatus retrieves status

KPITrend retrieves trend

 

IF KPIValue(“Reseller Sales”) / KPIGoal (“Reseller Sales”) >= 1 -- good performance

 

Trend: how KPI is performing with respect to time

 

Other KPI properties

Associated Measure group

Display folder

Parent KPI

Current time member

Weight (importance)

Status indicator and trend indicator