What is Heap Table?
SQL Server stores a table as a heap or B-tree. If you create a table without index then it is stored as a heap. Similarly, If you create a table with a
clustered index, then it is stored as a B-tree. A table can never be a heap and a clustered table at the same time. Simply, a heap table is one without a
clustered index.
A tables use a heap until a clustered index created on the table. Within the heap tables, the data is unsorted. The server does not guarantee nor maintain any sorting order of the data in the heap tables. When we insert data into
heap tables, SQL Server tries to fill pages as much as possible.
Logically, the heap is comprised of an Index Allocation Map (IAM) that points to all pages within the heap. Within the heap, there is no linking or organization between the pages. All reads and writes must consult the IAM
first to read all pages within a heap. Heap tables are best for staging tables.
In the Heap table, all the rows are identified internally using the heap row
identifier. The row identifier is an actual physical location composed of
three values, such as
- FileID
- PageNum
- SlotNum
If the base table is not clustered, then any non-clustered index can store the
heap row identifier in every level of the index, which is used to point back
to the full row of the base table.
Advantages: In Heap tables, inserting data is very fast compared to
clustered index tables.
Disadvantages:
- Forwarding pointers
- Random I/O reads
- Consumes more space when not rebuild
Query to list all tables using the heap
SELECT
stbl.name, sidx.type_desc
FROM
sys.tables stbl
INNER JOIN
sys.indexes sidx ON stbl.object_id = sidx.object_id
AND sidx.type = 0
AND stbl.type = 'U'
Storage
When we are inserting data into heap tables, SQL Server scans for page free
space (PFS) page before inserting it. If the data page contains enough space
then it stores a new row into the existing page. If it does not have
sufficient free space, SQL Server allocates a new page.
SQL Server using the low value from the PFS free space percentage during the estimation such as SQL Server would not put a new row to the page if its size
exceeds 20 percent of the page size. For example, if the data page store 4500
bytes of data, and has 3560 bytes of free space available, PFS would indicate
that the page is 51-80 percentage full (0x02). SQL Server doesn't allow a new
row to the page as it exceeds 20 percent free space ( 8,060 * 0.2 = 1,612
).
CREATE TABLE dbo.heap_demo(val varchar(8000))
GO
INSERT INTO dbo.heap_demo(val) VALUES (replicate('1',4489))
GO 10
select page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.heap_demo'),0,null,'DETAILED');
Inserted 10 rows in the table and the data stored for each row is 4500 bytes.
SQL Server allocates 10 data pages such as one page per row. As PFS indicated
that the pages are 51-80 percentage full.
Now insert 1000 bytes data row in the table, which is about 12.5 percent of the page size. As a result SQL Server nows that record will fit in one of the existing pages ( they all have at least 20% free space).
INSERT INTO dbo.heap_demo(val) VALUES (replicate('1',1000))
Now insert 2000 bytes data row in the table, which is about 25 percent of the
page size. SQL Server doesn't fit the row in any page as the row exceeds 20
percentage, as a record, it allocates a new page.
INSERT INTO dbo.heap_demo(val) VALUES (replicate('1',2000))
Here SQL Server unnecessarily allocates new data pages, leaving large amounts of free space unused. SQL Server eventually fills empty spaces with smaller
rows.
Forwarding Pointers
When you update the row in the heap table, SQL Server tries to accommodate it
on the same page. If there is no free space available, SQL Server moves the
new version of the row to another page and replaces the old row with a special
16-byte row called a forwarding pointer. The new version of the row is called
a forwarded row.
Forwarding pointers improves I/O operations such as duplicate reads and
prevent updates of non-clustered index keys.
Let's create a table and insert three rows
CREATE TABLE dbo.heap_demo_fp(id int,val varchar(8000))
GO
INSERT INTO dbo.heap_demo_fp(ID,val) VALUES (1,replicate('1',1500))
INSERT INTO dbo.heap_demo_fp(ID,val) VALUES (2,replicate('1',1500))
INSERT INTO dbo.heap_demo_fp(ID,val) VALUES (3,replicate('1',3000))
GO
select page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.heap_demo_fp'),0,null,'DETAILED');
Here three rows size (6000 bytes) not exceed the PFS indication percentage
such as 20% free space available, so all the three rows fill on the same page.
Next, update one of the table rows and increase their size ( 5000
bytes). The new version of the rows would not fit into the page anymore,
which introduces the allocation of the one new page and one forwarding
pointers.
update dbo.heap_demo_fp set val=replicate('1',5000) where ID=1;
select page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent,forwarded_record_count
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.heap_demo_fp'),0,null,'DETAILED');
Rebuild Heap Tables
In the case of large heap tables, we can see the considerable value of
these logical reads that can be causing performance issues for the data
retrieval. If the forwarding pointer is more then more the logical reads. To
resolve we need to monitor the heap tables and rebuild it.
For example, Let's insert 65,536 rows in the table
truncate table heap_demo_fp;
;with N1(C) as (select 0 UNION ALL select 0) -- 2 rows
,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows
,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows
,N4(C) as (select 0 from N3 as T1 cross join N3 as T2) -- 256 rows
,N5(C) as (select 0 from N4 as T1 cross join N4 as T2) -- 65,536 rows
,IDs(ID) as (select row_number() over (order by (select NULL)) from N5)
insert into dbo.heap_demo_fp(ID)
select ID from IDs;
select page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent,forwarded_record_count from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.heap_demo_fp'),0,null,'DETAILED');
set statistics io on
select * from dbo.heap_demo_fp
set statistics io off
(65536 rows affected)
Table 'heap_demo_fp'. Scan count 1, logical reads 106.
As you see, there are 106 pages in the table and as a result, SQL Server
performs 106 reads during IAM scan. Let's update the table and introduce
forwarding pointers.
update dbo.heap_demo_fp set val=replicate('1',500);
select page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent,forwarded_record_count
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.heap_demo_fp'),0,null,'DETAILED');
set statistics io on
select count(*) from dbo.heap_demo_fp
set statistics io off
Now the table has 4461 pages and it requires almost 70k reads to perform a
scan. As you see a large number of forwarding pointer leads to extra I/O
operations and reduces the performance of the query accessing the data.
To get rid of the forwarding pointers is by rebuilding the heap table.
ALTER TABLE dbo.heap_demo_fp REBUILD;
select page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent,forwarded_record_count from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.heap_demo_fp'),0,null,'DETAILED');
set statistics io on
select count(*) from dbo.heap_demo_fp
set statistics io off
After the table is rebuilt, the table now uses 4377 pages and no forwardings
records.
Conclusion
Heap tables are useful in staging tables where inserting data will be very
fast compare to clustered tables. But heap tables will perform slow compare
to clustered tables. It is recommended to use clustered tables. According to
our business needs, you can choose either clustered or heap tables.
Thanks for your time.
References:
0 comments:
Post a Comment