Exam Prep 070-451: Chapter 1: The struggle

Upon starting preparation for 070-451, I made an unfortunate discovery.  It seems that no one has ever taken it - or even shown any interest in it.  Ok, maybe that's a slight exaggeration.  But googling '070-451 study guide' only brings up a very halfhearted attempt by Microsoft to provide prep materials (step 5 is "Read books online"), Brent Ozar's unfortunately very accurate rant, and the usual deluge of sites charging 59.99 for a .PDF of outdated exam questions.  There are no results on Amazon for a 070-451 study guide.  It's the lost child of the DB certification track, focusing on new features in SQL Server 2008 that may not ever be useful in a production environment.  Still, it's part of the MCITP track, so I felt compelled to do a modest write-up based on some of the concepts known to be included on the exam.

Update for 2012-07-02:  I passed last week.  Bad news for braindump hunters: They're all woefully inaccurate and outdated.  Sorry !  :-) The exam itself is 70 multiple choice questions.

So, your best bet is this guide.  Focus on new features in SQL Server 2008 (especially new data types and compression options) as well as locking, CLR functions, isolation levels, and partitioning.  I didn't get many XML questions, but YMMV.

Here's a study guide which reflects what you might see on 070-451.  This is an ugly exam with confusing concepts, poorly worded questions, and ambiguous answers, so be as prepared as possible.

Transactions

Transactions can be marked with a specific name using the syntax

BEGIN TRANSACTION TRANSACTIONNAME with MARK ‘MYTRANSACTIONMARK’.

In recovery situations, keywords STOPATMARK will restore up to completion of the named transaction.  STOPBEFOREMARK will restore up to the named transaction without including it.

By using SAVE TRANSACTION statements, you can roll a transaction back to the mark specified in the SAVE statement:

The IS_READ_COMMITTED_SNAPSHOT_ON database-level option specifies that a DB will use row versioning instead of locking when in the READ COMMITTED Isolation level.

In Nested transactions, the outer transaction controls the final outcome as to whether the inner transactions are committed or rolled back.

 

BEGIN TRAN       Insert ‘A’…  Insert ‘B’…   SAVE TRANSACTION SAVEME    Insert ‘C’…

ROLLBACK TRANSACTION SAVEME

This will only roll back the Insert ‘C’ statement.

SAVE TRANSACTION Statements cannot be used outside of a Transaction.

 

SET_IMPLICIT_TRANSACTIONS

A connection-level configuration option that controls how transactions are handled.  With Implicit Transaction set to ON, most DML statements will automatically start a transaction.  This transaction must be explicitly committed.  If a transaction is already open, a new one will not be started.  When this option is OFF, it’s referred to as AUTOCOMMIT Mode.

http://msdn.microsoft.com/en-us/library/ms187807.aspx

 

Column-level deny of permissions

Is it possible to add a user to a database role, but hide certain columns from that user for security purposes.  You can DENY column-specific information while allowing the user to view the rest of the table.

http://www.mssqltips.com/sqlservertip/2124/filtering-sql-server-columns-using-column-level-permissions/

Indexes

Always create indexes on the columns that will be accessed as part of a WHERE clause, and include the columns in the select list.

Filtered indexes can be created with a WHERE clause, so that only rows with a specific range are included.  For example, a date that is often NULL can have an index placed on it with the WHERE XDATE IS NOT NULL.  Filtered indexes cannot contain the IGNORE_DUP_KEY parameter.

You can force the optimizer to select an index with the INDEX(#) option.  INDEX(0) is always the table’s clustered index, if it exists.

Executing Stored procedures

Keywords within a stored procedure control under which context a stored procedure will execute:

EXECUTE AS OWNER: The owner of the stored proc.

EXECUTE AS SELF: The current security context

EXECUTE AS CALLER: The context of the calling user.  The user must have permissions on the SP and any referenced objects.

Normally, if a user executes a stored procedure and the SP and underlying objects have the same owner, additional permissions checks are bypassed due to ownership chaining.  However, if the SP contains DDL statements for some reason, ownership chaining does not apply and the user executing the stored procedure must have appropriate permissions on all objects.

To reference assemblies from a stored procedure, use EXTERNAL NAME in the definition of the procedure.

The ENCRYPTION option means the Stored Procedure’s definition cannot be viewed.

 

Views

WITH SCHEMABINDING prevents any changes to the underlying tables while the view exists.

SCHEMABINDING View definitions must include columns lists and cannot include *.

Schemabound views should have unique clustered indexes for performance.

They must also include the Schema.Objectname when referencing underlying objects.

Common table expressions (CTEs) allow a block of code to be re-used like a temporary view for in a brief context.

CTEs cannot have ORDER BY or OUTPUT in their definition.  The select statement that references the CTE must be the next statement executed, except in the case of multiple consecutive CTEs being defined.

CTEs can be recursive and reference themselves in the definition.   An ANCHOR member must be specified first. 

MAXRECURSION specifies how many recursion levels are permitted in this scenario.  The default is 100, but MAXRECURSION 0 can specify infinite recursion. 

Once the maximum RECURSION level is reached, the result set to that point is returned, and an error message is generated (MAXIMUM RECURSION HAS BEEN EXHAUSTED).

WITH CHECK is used on updateable views; it prevents modifications to the view that would cause the underlying data to be removed from the view’s scope.

DISTRIBUTED PARTITIONED VIEWS can be used when data is in multiple locations, but needs to be presented as a single source.

 

 

 

Full-text searches

Stoplists exclude certain words from full-text indexing, often referred to as noise words.

CONTAINS locates matches in a column or columns based on a specified search condition.

FREETEXT searches based on meaning rather than exact matching.

FREETEXTTABLE searches based on meaning rather than exact matching and includes Key and Rank columns for relevance.

INFLECTIONAL includes forms of a word.

A thesaurus file can be used to identify words with the same meaning as one specified.

After modifications, the file should be loaded with the sys.sp_Fulltext_load_thesaurus_file procedure.

CONTAINS (Forms OF THESAURUS) is the predicate used for searching.

 

 

 

Performance and optimization

Table hints can be used to force the optimizer to select a certain index. 

Sparse columns optimize storage requirements for columns which have a high incidence of NULLS.

http://msdn.microsoft.com/en-us/library/cc280604.aspx

Parameter sniffing is when SQL Server detects passed parameters during compilation and optimizes the resultant plan.

The OPTIMIZE FOR clause will override any sniffing and force the optimizer to expect a certain value.

In the event you cannot modify the original object but still need to use OPTIMIZE FOR, an object plan guide can be created using SP_CREATE_PLAN_GUIDE.  SQL Server will use the plan guide when executing the specified query or object.

The PARAMETERIZATION database-level option can be used to force any literal values in every query to be considered a parameter during compilation.  This is the FORCED option; the default behavior is SIMPLE, which does not do this.

To override this in a query occurring in a database with FORCED parameterization, you can use SP_GET_QUERY_TEMPLATE and SP_CREATE_PLAN_GUIDE to use a specified parameterization.  If you ever find yourself in this situation, you may want to take a long hard look back on what mistakes you made to get there.

Using WITH RECOMPILE in the definition of a stored procedure will force it to never cache query plans.

In partitioning, a table-aligned index must be created using the same partition scheme and key as the partitioned table.

HASH joins/matches require an intermediary structure in Tempdb and are intensive operations.

SORT can also be intensive.

NESTED LOOPS are used in joins, and are only intensive in large datasets.

For uniqueidentifier value generation, NEWID and NEWSEQUENTIALID are both valid.

using NEWSEQUENTIALID can be faster than using the NEWID function. This is because the NEWID function causes random activity and uses fewer cached data pages. Using NEWSEQUENTIALID also helps to completely fill the data and index pages.

http://msdn.microsoft.com/en-us/library/ms189786.aspx

 

SQL Server 2008 supports table-valued parameters as inputs to SPs and Functions.   You can use this to pass a large dataset in a single pass.  The table variable must be declared as READONLY in the SP definition, and can accept a user-defined data type based off TABLE.

SP_ESTIMATE_DATA_COMPRESSION_SAVINGS

sp_estimate_data_compression_savings

     [ @schema_name = ] 'schema_name' 

   , [ @object_name = ] 'object_name'

   , [@index_id = ] index_id

   , [@partition_number = ] partition_number

   , [@data_compression = ] 'data_compression'

 

http://msdn.microsoft.com/en-us/library/cc280574.aspx

Indexes can only be placed on computed persisted columns if they are deterministic.

Functions

Functions in general, especially non-deterministic functions, have certain restrictions in the way they can be used and can suffer from poor performance.

http://blog.sqlauthority.com/2007/05/29/sql-server-user-defined-functions-udf-limitations/

 

For complex string operations, a CLR function can be a better option – more functionality is available with .NET code than in SQL Server.

 

Using Sort Order in CLR Table-valued Functions

When using the ORDER clause in CLR table-valued functions, follow these guidelines:

·         You must ensure that results are always ordered in the specified order. If the results are not in the specified order, SQL Server will generate an error message when the query is executed.

·         If an ORDER clause is specified, the output of the table-valued function must be sorted according to the collation of the column (explicit or implicit). For example, if the column collation is Chinese (either specified in the DDL for the table-valued function or obtained from the database collation), the returned results must be sorted according to Chinese sorting rules.

·         The ORDER clause, if specified, is always verified by SQL Server while returning results, whether or not it is used by the query processor to perform further optimizations. Only use the ORDER clause if you know it is useful to the query processor.

·         The SQL Server query processor takes advantage of the ORDER clause automatically in following cases:

o    Insert queries where the ORDER clause is compatible with an index.

o    ORDER BY clauses that are compatible with the ORDER clause.

o    Aggregates, where GROUP BY is compatible with ORDER clause.

o    DISTINCT aggregates where the distinct columns are compatible with the ORDER clause.

The ORDER clause does not guarantee ordered results when a SELECT query is executed, unless ORDER BY is also specified in the query. See sys.function_order_columns (Transact-SQL) for information on how to query for columns included in the sort-order for table-valued functions.

 

UDF Generals:

 Must include a RETURNS clause to specify the data type being returned in the declaration.

The last statement must be a RETURN.  The RETURN statement cannot be enclosed in an IF/ELSE statement; the UDF interprets the last statement in this case as the IF/ELSE and will generate an error.

Reporting

Cube, Rollup, and Grouping Sets are aggregation methods.

Cube returns a summary row for each possible combination of columns in the GROUP By clause.

http://blogs.msdn.com/b/craigfr/archive/2007/10/11/grouping-sets-in-sql-server-2008.aspx

http://msdn.microsoft.com/en-us/library/bb510624.aspx

In Grouping Sets, the GROUPING_ID function identifies a grouping level.

Scalable Shared databases:

The Scalable Shared Database feature enables you to scale out a read-only database built exclusively for reporting.

http://msdn.microsoft.com/en-us/library/ms345584%28v=sql.100%29.aspx

 

XACT_ABORT

When SET XACT_ABORT is ON, if a Transact-SQL statement raises a run-time error, the entire transaction is terminated and rolled back.

When SET XACT_ABORT is OFF, in some cases only the Transact-SQL statement that raised the error is rolled back and the transaction continues processing. Depending upon the severity of the error, the entire transaction may be rolled back even when SET XACT_ABORT is OFF. OFF is the default setting.

 

http://msdn.microsoft.com/en-us/library/ms188792.aspx

 

Locking

With partitioned tables, another locking escalation is available: Partition-level.  To allow SQL Server to use this, use the table-level option LOCK_ESCALATION and set the value to AUTO.  AUTO defaults to PARTITION for partitioned tables, and TABLE for non-partitioned tables. 

The default is TABLE.  DISABLE is another option.

TABLOCKX is a table-level hint that prevents any other users from accessing a table until the respective query has finished.

NOWAIT causes a query to not wait for locks to release, timing out immediately if any are encountered.

NOLOCK causes a query to ignore locks entirely, but this can cause dirty reads.

READPAST cause a query to skip locked rows when reading data, but will not cause dirty reads.

HOLDLOCK is equivalent to Serializable.

Plan guide

To disable a plan guide, use EXEC sp_control_plan_guide N’DISABLE’, N’<Planguidename>’

Constraints

Column-level check constraints can be used for comparing a column to a value, such as SALESNUM > 0.

Table-level check constraints are needed when multiple columns are referenced, such as SALESNUM > TARGETSALES.

Constraints can only be created on computed columns if they are persisted.

To make an existing computed column persisted, you must drop and recreate it.

Development

The ADO.Net Entity Framework is an ORM framework developed to allow developers to interface with a conceptual mode, while not having to worry about the specific database objects.

http://en.wikipedia.org/wiki/ADO.NET_Entity_Framework

 

Storage

FILESTREAM allows large files to be stored on the file system with a pointer to the file in the database.

To store large files in-row in a table, the large_value_types_out_of_Row should be 0.  With this at 1 (ON), files will be stored in the database but outside of the row.

 

Query Mechanics

You can use the OUTPUT keyword to display information to the monitor (or another structure) about the rows being affected by a query.

INSTEAD OF TRIGGERS can be used to perform an alternative to an INSERT statement or include additional functionality in case the INSERT Fails.

Referential Integrity

Foreign Key options:

                ON DELETE NO ACTION: prevents a parent record from being deleted if it has children.

                ON DELETE CASCADE: deletes all children if a parent is deleted.

                ON DELETE SET NULL: sets all of the children in the FK’s values to NULL

                ON DELETE SET DEFAULT: values would revert to column DEFAULT.

 

Assemblies

Permission sets for assemblies:

SAFE: internal DB access only.

EXTERNAL_ACCESS: DB, file system, registry, environment variables

UNSAFE: no permission checks.  Only UNSAFE can call unmanaged code. [Thanks Dom for the catch]

Service Broker

http://www.developer.com/db/article.php/3640771/Getting-Started-with-SQL-Server-Service-Broker.htm

Mail

Database mail is the preferred method for sending email from SQL Server.  It uses SMTP.

SQL Mail is no longer recommended and needed an installed MAPI client.

 

Using Sort Order in CLR Table-valued Functions

When using the ORDER clause in CLR table-valued functions, follow these guidelines:

·         You must ensure that results are always ordered in the specified order. If the results are not in the specified order, SQL Server will generate an error message when the query is executed.

·         If an ORDER clause is specified, the output of the table-valued function must be sorted according to the collation of the column (explicit or implicit). For example, if the column collation is Chinese (either specified in the DDL for the table-valued function or obtained from the database collation), the returned results must be sorted according to Chinese sorting rules.

·         The ORDER clause, if specified, is always verified by SQL Server while returning results, whether or not it is used by the query processor to perform further optimizations. Only use the ORDER clause if you know it is useful to the query processor.

·         The SQL Server query processor takes advantage of the ORDER clause automatically in following cases:

o    Insert queries where the ORDER clause is compatible with an index.

o    ORDER BY clauses that are compatible with the ORDER clause.

o    Aggregates, where GROUP BY is compatible with ORDER clause.

o    DISTINCT aggregates where the distinct columns are compatible with the ORDER clause.

The ORDER clause does not guarantee ordered results when a SELECT query is executed, unless ORDER BY is also specified in the query. See sys.function_order_columns (Transact-SQL) for information on how to query for columns included in the sort-order for table-valued functions.

Partitioning

When using the SWITCH statement to swap partitions, the receiving partition must not contain any data.  The tables may or may not use the same partition function.

Different partitions can have different compression levels.  You can use the REBUILD PARTITION Clause to specify the levels.

The SPLIT function separates a single partition into two.

The MERGE function combines two existing partitions by eliminating the specified boundary.

Security

Application roles only permit database access via a custom application. 

Parameterized stored procedures and SP_EXECUTESQL are the safest way to parse user-entered variables.

User-defined data types based on ASSEMBLIES are restricted to the database in which they are created.

 

 

Data Types

date

The date data type has a range of January 1, 01 through December 31, 9999 with an accuracy of 1 day. The default value is January 1, 1900. The storage size is 3 bytes.

time

The time data type stores time values only, based on a 24-hour clock. The time data type has a range of 00:00:00.0000000 through 23:59:59.9999999 with an accuracy of 100 nanoseconds. The default value is 00:00:00.0000000 (midnight). The time data type supports user-defined fractional second precision, and the storage size varies from 3 to 6 bytes, based on the precision specified.

datetime2

The datetime2 data type combines the range and precision of the date and time data types into a single data type. Datetime2 has greater precision than datetime.

The default values and string literal formats are the same as those defined in the date and time data types.

datetimeoffset

The datetimeoffset data type has all the features of datetime2 with an additional time zone offset. The time zone offset is represented as [+|-] HH:MM. HH is 2 digits ranging from 00 to 14 that represent the number of hours in the time zone offset. MM is 2 digits ranging from 00 to 59 that represent the number of additional minutes in the time zone offset. Time formats are supported to 100 nanoseconds. The mandatory + or - sign indicates whether the time zone offset is added or subtracted from UTC (Universal Time Coordinate or Greenwich Mean Time) to obtain the local time.

 

With the HierarchyID Data type, you can build relationships amongst existing data points.

http://www.codeproject.com/Articles/37171/HierarchyID-Data-Type-in-SQL-Server-2008

Spatial data types:

Geometry: Planar data

Geography: elliptical (round-earth)

Spatial Indexes allow you to specify a density.  More information here:

http://msdn.microsoft.com/en-us/library/bb964712%28v=sql.105%29.aspx

Large datatypes, blobs, Varbinary and Varbinary(MAX) - when to use and what the benefits and limitations of each are.

 

XML

XML Schema collections can be used to validate XML documents.

FOR XML outputs XML data in a relational format.

Untyped XML is for loading XML that does not conform to a schema, lacks a defined schema, or has already been validated.

Type XML is for loading data that requires validation.

 

XML Data type methods

http://msdn.microsoft.com/en-us/library/ms190798.aspx

Query() returns untyped XML from a specified string or expression.

Value() returns scalar values in relational format from a string literal or expression

Must return a single value – will usually see [1] after the attribute, e.g. @ID[1] to specify a single value.

Exist() returns a bit (1 = TRUE, 0 = FALSE)

Modify() modifies the contents of an XML document.

Syntax: REPLACE VALUE OF (Expression1) WITH (Expression2)

Nodes() shreds an XML document into relational rows of XML fragments.

 

Selecting Relational Data as XML:

FOR XML Clause:

RAW: A single <row> element is generated for each row in the result set.

AUTO: Nested elements are returned using the columns specified in the select statement.

EXPLICIT: Each row is defined in detail in the FOR XML statement.

PATH: Each row defined in detail, but column names and aliases are specified as XPATH expressions.

An @ in the column aliases specify that this element will have an ID and name attribute.

Lacking an @ will cause the result to be a subelement.

Typed XML is needed for situations where specific elements, attributes, and values must be validated.  A Schema Collection is needed.

OPENXML is more memory intensive than XML methods.

 

Service Broker

A Contract specifies the message types permitted and the participants that are allowed to send those messages.

	CREATE CONTRACT contract_name
	   [ AUTHORIZATION owner_name ]
	      (  {   { message_type_name | [ DEFAULT ] }
	          SENT BY { INITIATOR | TARGET | ANY } 
	       } [ ,...n] ) 

Authorization specifies the owner, which should be a role or database user.  This is often omitted and defaults to the current user.

 

A Queue stores service broker messages. 

	CREATE QUEUE <object>
	   [ WITH
	     [ STATUS = { ON | OFF }  [ , ] ]
	     [ RETENTION = { ON | OFF } [ , ] ] 
	     [ ACTIVATION (
	         [ STATUS = { ON | OFF } , ] 
	           PROCEDURE_NAME = <procedure> ,
	           MAX_QUEUE_READERS = max_readers , 
	           EXECUTE AS { SELF | 'user_name' | OWNER } 
	            ) [ , ] ]
	     [ POISON_MESSAGE_HANDLING (
	       [ STATUS = { ON | OFF } )
	    ]

 

Status specifies available or unavailable.  Unavailable queues cannot process messages.

Activation specifies the stored procedure which manages the queue.

Status (Activation) specifies whether the broker starts the stored procedure.  Default is ON.

SENT BY specifies which participant can send messages for this contract.

 

CREATE BROKER PRIORITY assigns numeric priorities to conversations.

The Entity Framework (http://en.wikipedia.org/wiki/ADO.NET_Entity_Framework)

Is an abstraction layer for developers writing .NET code to interact with database objects without concerning themselves with the physical structure of the database.  Basic properties can be exposed to the developers without worrying about database objects such as foreign keys, triggers, constraints, etc.

Schema permissions

CREATE SCHEMA can specify an owner to that schema.

By default, the schema owner can drop objects and the schema itself (when empty), but cannot create objects.  The schema owner has GRANT permissions to objects in the schema.

Scalable Shared Databases

http://technet.microsoft.com/en-us/library/ms345392%28SQL.105%29.aspx