Hello,
When I loaded data from OpenEdge to MS SQL using the Pro2SQL tool, I found that the data in MS SQL takes up considerably more space (approximately 3-4 times more) than in Progress OpenEdge.
Simple example:
OpenEdge table:
ADD TABLE "test1" AREA "Account Table" DUMP-NAME "test1" ADD FIELD "int1" OF "test1" AS integer FORMAT "->,>>>,>>9" INITIAL "0" POSITION 2 MAX-WIDTH 4 ORDER 10 ADD FIELD "dec1" OF "test1" AS decimal FORMAT "->>,>>9.99" INITIAL "0" POSITION 3 MAX-WIDTH 25 DECIMALS 10 ORDER 20 ADD FIELD "char1" OF "test1" AS character FORMAT "x(8)" INITIAL "" POSITION 4 MAX-WIDTH 150 ORDER 30
Create four records with different lengths for field char1:
1 22222 "12345678" 2 33333 "1234567891234567" 3 44444 "12345678901234567890123456789012345678901234567890123456789012345678901234567890" 5 21324 "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
MS SQL Table:
CREATE TABLE test1 (
    int1 integer null,
    dec1 decimal(25,10) null,
    char1 char(150) null
)
GO

Create the same four records:

INSERT INTO [dbo].[test1] ([int1], [dec1], [char1]) VALUES (1, 22222, '12345678')
GO
INSERT INTO [dbo].[test1] ([int1], [dec1], [char1]) VALUES (2, 33333, '1234567891234567')
GO
INSERT INTO [dbo].[test1] ([int1], [dec1], [char1]) VALUES (3, 44444, '12345678901234567890123456789012345678901234567890123456789012345678901234567890')
GO
INSERT INTO [dbo].[test1] ([int1], [dec1], [char1]) VALUES (5, 21324, '123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890')
GO
In Progress OpenEdge, the size of these four records is 324.0B:
Table      Records  Size    Min  Max  Mean  Count  Factor  Factor
PUB.test1  4        324.0B  32   144  81    4      1.0     1.0
but in MS SQL it is 8 KB:
EXEC sp_spaceused N'test1'
GO

name   rows  reserved  data  index_size  unused
test1  4     16 KB     8 KB  8 KB        0 KB
How can this be explained?
Regards,
Valeriy
Well you have two issues... one is that you are using the CHAR data type (fixed length) instead of VARCHAR (variable length).
The other is that I am pretty sure that sp_spaceused is reporting the block usage and not the contents of the rows.
Loading up terabytes of data into both SQL Server and Oracle I have never seen the supposed extra space usage required. This is of course using the proper data types and without any of the shadow columns that dataserver requires.
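The page-granularity point can be sketched numerically. A rough back-of-the-envelope in Python; the per-row byte counts here are my assumptions (approximate type sizes plus a guessed row overhead), not measured values:

```python
import math

PAGE_BYTES = 8192  # SQL Server allocates table space in whole 8 KB pages

# Rough per-row estimate for (int, decimal(25,10), char(150)):
# 4 bytes int + 13 bytes decimal + 150 bytes fixed CHAR
# + ~11 bytes of row overhead (header, null bitmap, slot entry) -- approximate.
row_bytes = 4 + 13 + 150 + 11

total_bytes = 4 * row_bytes                       # the four test rows
pages = max(1, math.ceil(total_bytes / PAGE_BYTES))

# Allocation is page-granular, so sp_spaceused reports whole pages:
print(pages)       # 1 page
print(pages * 8)   # 8 KB -- matches the 'data' column in the question
```

So even ~700 bytes of row data shows up as a full 8 KB data page, which is exactly what sp_spaceused reported for the four-row test table.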
[quote user="Keith Sudbury"]
Well you have two issues... one is that you are using the CHAR data type (fixed length) instead of VARCHAR(variable length).
The other is that I am pretty sure that sp_spaceused is reporting the block usage and not the contents of the rows.
[/quote]
I found another script to determine the size of the row:
declare @table nvarchar(128)
declare @idcol nvarchar(128)
declare @sql nvarchar(max)

--initialize those two values
set @table = 'test4'
set @idcol = '*'

set @sql = 'select ' + @idcol + ', (0'

select @sql = @sql + ' + isnull(datalength(' + name + '), 1)'
from sys.columns
where object_id = object_id(@table)

set @sql = @sql + ') as rowsize from ' + @table + ' order by rowsize desc'

PRINT @sql
exec (@sql)
and its result is closer to the truth.
[quote user="Keith Sudbury"]
Loading up terabytes of data into both SQL Server and Oracle I have never seen the supposed extra space usage required. This is of course using the proper data types and without any of the shadow columns that dataserver requires.
[/quote]
Have you worked with Pro2?
I use Pro2SQL, and its documentation says:
Please note, if this box is to contain the SQL database, the size of
the SQL database will be approximately two times the size of the
Progress database.
So I'm trying to understand why the size may be twice as large, and also why my SQL database is even bigger than that )
That script is just adding up the lengths of the returned data and not what is actually stored in the DB. So still not exactly comparing apples to apples yet.
The first step is to get the data types set up correctly... that means VARCHAR or NVARCHAR depending on your codepage needs. Using CHAR is just asking for space to be wasted. There are some rare edge cases on very large tables where CHAR can make sense... but unless you have a table with a few hundred million records it isn't worth talking about.
Why the Pro2 team made this statement I have no idea. My utilities existed before Pro2 was commercially available but they are basically the same concept... use replication triggers to track changes and the dataserver to insert/update/delete the target database as needed.
> On Mar 9, 2016, at 10:54 AM, Keith Sudbury wrote:
>
> Using CHAR is just asking for space to be wasted.
Not a requirement.
It is possible for CHAR to be stored variable-length in a database and then fixed up when the value is handed to the application. There are databases that compress character data too.
As far as I know, in SQL Server CHAR is still the old-style fixed-length storage: CHAR(150) always uses 150 bytes of storage.
VARCHAR(150) uses the actual data length + 2 bytes.
NVARCHAR(150) uses twice the actual data length + 2 bytes.
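Applying those storage rules to the four char1 values from the earlier test (actual lengths 8, 16, 80 and 150 characters) shows how much the fixed-length type costs; a quick sketch:

```python
lengths = [8, 16, 80, 150]  # actual char1 lengths from the four test rows

char_bytes     = sum(150 for _ in lengths)        # CHAR(150): always 150 bytes
varchar_bytes  = sum(n + 2 for n in lengths)      # VARCHAR: length + 2
nvarchar_bytes = sum(2 * n + 2 for n in lengths)  # NVARCHAR: 2*length + 2

print(char_bytes)      # 600
print(varchar_bytes)   # 262
print(nvarchar_bytes)  # 516
```

For this particular data, CHAR(150) stores more than twice the bytes VARCHAR(150) would.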
In the first example I used the CHAR type to match Progress - now I know that I was wrong. Thank you for the clarification, Keith!
But the Pro2SQL tool itself generates fields as varchar. Example from a real db:
CREATE TABLE [DBO].[qc_mstr] (
    [prrowid] varchar(36) not null PRIMARY KEY NONCLUSTERED,
    [qc_cmtindx] integer null,
    [qc_due_date] datetime null,
    [qc_insp_loc] varchar(80) null,
    [qc_inv_stat] varchar(80) null,
    [qc_lead_time] decimal(28,10) null,
    [qc_loc] varchar(80) null,
    [qc_lot] varchar(80) null,
    [qc_nbr] varchar(30) null,
    [qc_ord_date] datetime null,
    [qc_part] varchar(30) null,
    [qc_project] varchar(80) null,
    [qc_qty_comp] decimal(28,10) null,
    [qc_qty_ord] decimal(28,10) null,
    [qc_qty_rjct] decimal(28,10) null,
    [qc_ref] varchar(80) null,
    [qc_rel_date] datetime null,
    [qc_rmks] varchar(80) null,
    [qc_serial] varchar(50) null,
    [qc_site] varchar(80) null,
    [qc_status] varchar(30) null,
    [qc_teststep] varchar(30) null,
    [qc_type] varchar(30) null,
    [qc_user1] varchar(80) null,
    [qc_user2] varchar(80) null,
    [qc__chr01] varchar(80) null,
    [qc__chr02] varchar(80) null,
    [qc__chr03] varchar(80) null,
    [qc__chr04] varchar(80) null,
    [qc__chr05] varchar(80) null,
    [qc__dec01] decimal(28,10) null,
    [qc__dec02] decimal(28,10) null,
    [qc__dte01] datetime null,
    [qc__dte02] datetime null,
    [qc__log01] bit null,
    [qc_autoissue] bit null,
    [qc_job] varchar(80) null,
    [qc_domain] varchar(8) null,
    [oid_qc_mstr] decimal(28,10) null,
    [Pro2SrcPDB] varchar(12) null,
    [pro2created] datetime null,
    [pro2modified] datetime null
)
GO
CREATE INDEX [qc_mstr##qc_due_part] ON [DBO].[qc_mstr] ([qc_domain] ASC, [qc_due_date] ASC, [qc_part] ASC, [qc_site] ASC)
GO
CREATE INDEX [qc_mstr##qc_lot] ON [DBO].[qc_mstr] ([qc_domain] ASC, [qc_lot] ASC)
GO
CREATE INDEX [qc_mstr##qc_nbr] ON [DBO].[qc_mstr] ([qc_domain] ASC, [qc_nbr] ASC, [qc_lot] ASC)
GO
CREATE INDEX [qc_mstr##qc_part] ON [DBO].[qc_mstr] ([qc_domain] ASC, [qc_part] ASC, [qc_site] ASC, [qc_due_date] ASC)
GO
CREATE INDEX [qc_mstr##qc_type_nbr] ON [DBO].[qc_mstr] ([qc_domain] ASC, [qc_type] ASC, [qc_nbr] ASC, [qc_lot] ASC)
GO
In the Progress dbanalys output, the size of this table is:

Table        Records  Size   Min  Max  Mean  Count   Factor  Factor
PUB.qc_mstr  408863   57.6M  98   195  147   408863  1.0     1.0
I tried to calculate the size of the table in SQL based on this article: https://technet.microsoft.com/en-us/library/aa933068(v=sql.80).aspx
I get the same size using the formula provided in the article: ~57.4 M
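For reference, the per-row and per-page estimate from that article can be scripted. This is a simplified sketch of the formula, and the input numbers in the example call are illustrative assumptions, not qc_mstr's real column statistics:

```python
import math

DATA_PER_PAGE = 8096  # usable bytes in an 8192-byte SQL Server data page

def estimate_data_pages(num_rows, fixed_size, num_cols,
                        num_var_cols, total_var_size):
    """Simplified version of the table-size estimate from the article."""
    null_bitmap = 2 + math.ceil(num_cols / 8.0)
    # variable-length block: 2 bytes count + 2 bytes per column offset + data
    var_block = (2 + num_var_cols * 2 + total_var_size) if num_var_cols else 0
    row_size = fixed_size + var_block + null_bitmap + 4   # +4 byte row header
    rows_per_page = DATA_PER_PAGE // (row_size + 2)       # +2 byte slot entry
    return math.ceil(num_rows / rows_per_page)

# Illustrative numbers only (assumed, not taken from the real table):
print(estimate_data_pages(num_rows=1000, fixed_size=96, num_cols=8,
                          num_var_cols=0, total_var_size=0))  # 13 pages
```

The key point of the formula is that every row pays header, null-bitmap, and slot-array overhead, and rows that do not fit evenly into a page waste the remainder.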
As I understand it now, the scripts I used earlier showed me either the real physical space or the wrong size (row length).
This table in MS SQL physically takes (per sp_spaceused) 130568 KB, which equals 130568 KB / 8 KB (page size) = 16321 pages. But according to the calculations it should take only 7301 pages. In other words, physically it uses twice as much space. And that confuses me even more [:S]
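The arithmetic behind that comparison, spelled out (both inputs are the numbers quoted above):

```python
data_kb = 130568          # 'data' column from sp_spaceused
page_kb = 8               # SQL Server page size in KB
actual_pages = data_kb // page_kb

estimated_pages = 7301    # result of the estimation formula
ratio = actual_pages / estimated_pages

print(actual_pages)       # 16321
print(round(ratio, 2))    # roughly 2.2x the estimate
```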
Keith, what method did you use to calculate the size of the data in SQL?
Regards,
Valeriy Bashkatov
P.S.
I just need to formulate a comprehensive explanation for the customer of why the physical size of the SQL database (created by Pro2) is larger than the Progress database.
Keep in mind that the 57.6M reported by proutil is the total size of the rows and not how many blocks are required to store the data. The odds are pretty decent (99.9%) that more than 57.6M of space is being used to store that table in OpenEdge.
sp_spaceused will show the entire footprint for a table... including reserved pages for future use (think Type II Area cluster size).
If you posted the entire output from sp_spaceused I would have a better idea of where the space is being used.
[quote user="Keith Sudbury"]
If you posted the entire output from sp_spaceused I would have a better idea of where the space is being used.
EXEC sp_spaceused N'qc_mstr' GO
name rows reserved data index_size unused qc_mstr 408863 144464 KB 130568 KB 13856 KB 40 KB
Keep in mind that the 57.6M reported by proutil is the total size of the rows and not how many blocks are required to store the data. The odds are pretty decent (99.9%) that more than 57.6M of space is being used to store that table in OpenEdge.
sp_spaceused will show the entire footprint for a table... including reserved pages for future use (think Type II Area cluster size).
[/quote]
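Sanity-checking that sp_spaceused output: reserved is the sum of data, index_size, and unused, so the footprint splits cleanly into row pages versus index pages (values copied from the output above):

```python
reserved_kb = 144464   # total footprint reported by sp_spaceused
data_kb     = 130568   # pages holding row data
index_kb    = 13856    # pages holding the five nonclustered indexes
unused_kb   = 40       # reserved but not yet used

# The components account for the whole reservation:
assert data_kb + index_kb + unused_kb == reserved_kb

print(round(index_kb / reserved_kb * 100, 1))  # indexes: ~9.6% of footprint
```

So for this table, most of the footprint is row data; the indexes add roughly another tenth on top.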
I think you're right.
The physical size of the Type II area containing just this single table is almost the same:

ls -lah atm_28.d1
-rw-r--r-- 1 root root 112M Mar 10 17:33 atm_28.d1
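Putting the file-size comparison in numbers (112M from ls on the extent, 57.6M of row data from dbanalys) shows the OpenEdge side also carries close to 2x overhead over raw row data:

```python
extent_mb = 112.0   # physical size of atm_28.d1 reported by ls
data_mb   = 57.6    # row data reported by dbanalys

ratio = extent_mb / data_mb
print(round(ratio, 2))  # extent is roughly 1.9x the raw row data
```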
That all looks pretty reasonable to me... especially since you have 3 more columns in SQL than you do in OE.
If you wanted to try and reduce the physical storage you "might" be able to get some savings by setting up filegroups (storage areas) for the table and the indexes instead of just letting them exist in the same filegroup with other objects.
Unless this table is going to grow quite a bit or if you have much larger tables in mind it likely isn't worth it.
Thank you, Keith!
I think I have to go visit the bookstore, section on the MS SQL tuning books [^o)]