ToyboxCreations SELECT * FROM [My Life]

25Aug/100

Modifying Data in Partitioned Views

Yesterday I covered the basics of partitioned views.  I also mentioned there were some gotchas when it comes to modifying the data in those partitioned views.  I'd like to go into more detail about that today.  The first big gotcha on updating data in a partitioned view is making sure you can update the data in that view.  In order for your partitioned view to be update-able you must ensure:

  • The view must be composed of SELECT statements combined with the UNION ALL statement.
  • Each SELECT references only one base table.

If you've done that, then there are some gotchas when it comes to INSERT, UPDATE, and Delete.

INSERT Gotchas

In order to INSERT data into a partitioned view you must follow the following rules:

  • You have to define all columns in your INSERT statement, even if you want to pass a NULL, or a default value.
  • You can't use DEFAULT in the VALUES clause.
  • All values inserted must pass the CHECK constraints defined on all constrained columns.
  • None of your tables can have an IDENTITY property ( auto-increment BAD!)
  • No TIMESTAMP columns.
  • You can't reference the view itself or any member table in the INSERT statement.  That means if you're going to make sure that row doesn't exist before inserting it... check it in an IF THEN before the INSERT statement.

If you follow these rules, you can INSERT data into your partitioned view.

UPDATE Gotchas

When Updating you have a few more issues to deal with.

  • You can't use the DEFAULT keyword as a value in your SET clause, even if the column has a DEFAULT value.  Reference the DEFAULT value explicitly.
  • You cannot update a PRIMARY KEY value if the column is TEXT, IMAGE, or NTEXT.  Seriously who would use those as a PRIMARY KEY...seriously?
  • Just like inserts, you can't UPDATE if you have a TIMESTAMP column in any of your base tables.
  • And just like inserts, you can's reference the view or any of the base tables in your UPDATE statement...Store referenced values in an intermediate variable, table or table variable and you're good to go.

DELETE Gotchas

Finally, the only real gotcha for DELETE statements against a partitioned view is you cannot reference the partition or a base table as a part of the delete statement.  I'm not saying you can't write  a DELETE FROM partitionedViewName... you just can't do a DELETE FROM pvn FROM partitionedViewName INNER JOIN baseTableName on...  you'll get an error.

That's it... All the gotchas when manipulating data in a partitioned view.  If you have any questions, send them in.  I'm here to help!

Filed under: Microsoft SQL No Comments
24Aug/100

Partitioned Views

Last week I covered partitioning.  I explained how you could set up filegroups in your database, and then split data from a single table or index across those filegroups.  That way you can reduce blocking in your objects by physically separating the data into parts.  I even covered how you could use partitioning to speed archiving older data from production tables using SWITCH.  This week I want to dive into partitioned views.

Like partitioning tables and indexes across multiple filegroups, you can set up multiple tables and then reference them through a single view, making it appear you have one large table.  These tables can exist all on the same server, or on several servers.  When you start using tables on other servers in your view it's referred to as a distributed partitioned view.  To set up distributed views, you're going to need to learn about Federated Database Servers.

I would like to point out using partitioned views is not the preferred method, but there can be times where it will be your only choice.  That's why I'm teaching you this topic.

Creating Partitioned Views

The data in the tables referenced by your partitioned view should be split into ranges of data values based on on of the columns common to all the member tables.  Each of these tables must be defined in a CHECK constraint specified on that partitioning column.  That way, when you execute a query against the partitioned view, the query optimizer will use that constraint to determine which member tables to hit in order to find the data you've requested.

Let's set up some demo tables in our KOTOR database.  This time we're going to track sales of our items.  In our example, we have sales tables that will be split by year, then combine them into one partitioned view.

CREATE TABLE sales_2010 (
    saleID INT
  , customerID INT
  , saleDate DATETIME
      CHECK(DATEPART(yy, 2010)
  , saleYear INT
      CHECK( saleyear = 2010)
  , saleTotal DECIMAL(9,2)
    CONSTRAINT PK_Sales2010__saleID_saleYear PRIMARY KEY(saleID, saleYear)
)

This is our template, we will also create tables for sales_2009, sales_2008, and sales_2007, the only thing we have to change will be the name, and the year in the CHECK constraints.

Once we have those defined, creating the VIEW is simple.

CREATE VIEW total_sales (
  SELECT saleID, customerID, saleDate, saleTotal FROM sales_2010
  UNION ALL
  SELECT saleID, customerID, saleDate, saleTotal FROM sales_2009
  UNION ALL
  SELECT saleID, customerID, saleDate, saleTotal FROM sales_2008
  UNION ALL
  SELECT saleID, customerID, saleDate, saleTotal FROM sales_2007
)

Using Partitioned Views

Now that we have defined the total_sales view, you can use it like any other view. You can SELECT from it, and you can update it.

There are two ways you can update a partitioned view. If you have partitioned your tables using the primary key, then you can update the view directly. (UPDATE viewname, SET column = value). But if you didn't use the partitioning column as part of your primary key, then you'll have to create an INSTEAD OF INSERT and UPDATE triggers for your view. If you go this route, make sure you build in some error handling to prevent duplicate rows from being inserted.

And to be honest, that can be a pain. Make it easier on yourself, and develop your partitioned views to segment the data based at least in part on the tables' primary key!

Implementing partitioned views can make it easy to archive data. You can create a new table to cover the new segment of data, then update the view to reference the new table, and remove the reference to the old data.  For those of you not as familiar with partitioning tables, that may be a way of learning to become more familiar with partitioning... just sayin'.

Next time, I'd like to cover some of the gotchas that can exist when you go to running INSERT, UPDATE, and DELETE against your partitioned view.  For now, let this sink in, and play around with the basics.  If you have any questions, send them in!  That's what I'm here for.

Filed under: Microsoft SQL No Comments
20Aug/100

Partitioning: MERGE, SPLIT, and SWITCH

OK, you know how to set up partitioning for a table and an index, and now you think you've solved your deadlocking, and your I/O issues forever, right?

Hardly.

Once you've been in SQL long enough you learn that no answer is forever.  Inevitably you'll have to revisit your old queries and see if their performance has degraded.  Partitioning is no different.  There are several issues you may need to solve with partitioning in the long run.  The first scenario is really easy to understand.

Let's say you have the partitions set up like we did back in my Partitioned Tables and Indexes article.  Your items table has grown and grown over the past year, and you notice that your data is starting to get heavy in one particular filegroup.

SPLIT

DECLARE @totalCount DECIMAL(9,2)
SELECT @totalCount = COUNT(*) FROM items

SELECT
 CASE WHEN manufacturerID <= 10 THEN 'fg1' else
   CASE WHEN manufacturerID between 11 and 20 THEN 'fg2' else
     CASE WHEN manufacturerID between 21 and 30 THEN 'fg3' else
       CASE WHEN manufacturerID between 31 and 40 THEN 'fg4' else
         CASE WHEN manufacturerID > 50 THEN 'fg5' END
       END
     END
   END
 END as filegroup
 , CONVERT(DECIMAL(9,2), count(*))/ @totalCount
FROM items
GROUP BY
  CASE WHEN manufacturerID <= 10 THEN 'fg1' else
    CASE WHEN manufacturerID between 11 and 20 THEN 'fg2' else
      CASE WHEN manufacturerID between 21 and 30 THEN 'fg3' else
        CASE WHEN manufacturerID between 31 and 40 THEN 'fg4' else
          CASE WHEN manufacturerID > 50 THEN 'fg5' END
        END
      END
    END
  END

When you run this, you notice even with 5 partitions you have much more than half of your records are in the last group.  This tells you you need to dig into that last group and figure out where you need to add a partition to your table.  You could then run another query to look at the distribution in that last group by groups of 10

SELECT
 (manufacturerID/10) + 1 AS filegroup
 , CONVERT(DECIMAL(9,2), count(*))/ @totalCount
FROM items
GROUP BY
 (manufacturerID/10) + 1

After looking into this you notice that over time you've added more manufaturers in this range, and you can effectively split the load for this file group if you set a boundary at 75.  That would put half of fg5 into a new partition, and the load would be split evenly again.  But how could you add a partition?

Well, in order to add a partition, you're going to have to assign it's data to a filegroup.  I'm going to assume all the filegroups you have defined for your current database are already in use.  We're going to need a new filegroup.

ALTER DATABASE KOTOR
ADD FILEGROUP fg6
   (NAME = fg6_data, FILENAME = 'd:\mssql\data\kotor_fg6.ndf', SIZE = 10MB)

Before you can use that filegroup in your partition, you've got to alter the PARTITION SCHEME.  You need to identify it as the next filegroup to use when you alter your PARTITION FUNCTION.  To do that, you need to ALTER the PARTITION SCHEME and identify the next filegroup to use.

ALTER PARTITION SCHEME myPartitionScheme
NEXT USED fg6

Now that you have identified the new filegroup as the NEXT USED filegroup, you can define the new boundary to your PARTITION FUNCTION.  The SPLIT operator allows you to define a new boundary point to your PARTITION FUNCTION.

ALTER PARTITION FUNCTION myPartitionFunction(int)
SPLIT RANGE (75)

This now adds 75 as a new boundary to the PARTITION FUNCTION we originally created in this article.

MERGE

IF you wanted to remove a boundary instead.  Let's say you have a boundary in use that is completely empty, and you want to stop using that partition, to free it up for some reason, then you want to use the MERGE operator. The syntax is just like SPLIT, except the boundary point you pass will be removed from your PARTITION FUNCTION.

SWITCH

Now before we go, I'd like to really blow you away.  How do you archive data?  SELECT it from one table into an ARCHIVE table, then delete it from the original table?  What if your database is highly available and you can't get a lock on the table to do that?

Well, if you partitioned your table, you've got a very powerful little tool available to you.  You can actually move a partition to a new table then you've effectively detached that part of the table from the original table.

You read that right.  You move a partition from one table to another, and the rest of your data remains untouched.  If you deal with sales data, you've probably already partitioned your table on a DATE or DATETIME field, right?  Once you move beyond a certain period and you're ready to archive that data, you can use this trick.

Script out a create statement for your table, give the table a new name, tableNameArchive.  Then, SWITCH the partition to that new table.

ALTER TABLE tableName
SWITCH PARTITION x TO tableNameArchive

Where x is the number of the partition containing the data you want to archive (this would usually be the first or oldest partition).  Then, MERGE that partition's boundary in the first table, since you no longer need it.

ALTER PARTITION FUNCTION partFunction()
MERGE RANGE (value)

Check out the data in your tableNameArchive table... it's your archived data!  Pretty slick, right?  This is a pretty complex use of partitions.  I think I should put together a demo of it in use, so you can work through it.  What do you think?  Do you want a practice problem?  If so, let me know!

If you have questions on this, or any other SQL topic, send them in.  I'm here to help!