Thursday, 12 February 2015

Get to know your database

- or how to make your database applications shine.

Once in a while I stumble across fellow developers or applications, that could shine by doing things a bit differently and adding extra skills to their tool set - a change of focus might be the only thing that is needed.

One example is the art of refactoring, another is some basic database knowledge.

I have several times meet people and companies who develop business application that depend heavily on database access - that uses their chosen RDBMS just as a bucket.

Let me give some examples of "performance smells" - you might easily expand the list, and please do in the comments.

Selecting data from the database, and then at client/application level traverse through the data, to filter out duplicates - knowing your SQL - a SELECT DISTINCT would have been helpful. The same goes for sorting, grouping, aggregating and a number of other stuff.

Fetching all data to the client, and then doing validation that should be at the database level, like primary and foreign key constraints.

Inserting and updating on a record by record basis or bulk upserts/merges - can make a big difference.

Not moving heavy data manipulation into stored procedures and functions on the database level, could also mean a lack of performance - and one of the bad excuses I have heard is "portability" - well let me tell you that "performance" is key - "portability" is not a real issue - the various data access controls - like FireDAC helps you on the application level - and if you are not utilizing the power of your RDBMS, then you could just use plain .csv or .xml files in a folder structure.

No validation of execution and query plans - incorrect or missing indexing - insufficient knowledge or awareness of strong and weak points in your RDBMS of choice - is a big issue.

A couple of examples to break the flow of text might be in place. 2 tables - A and B with approx 3000 and 4000 records, both with index on FIELD1 - 2 queries that will return the same result of 1530 records.

1. example
select distinct A.FIELD1 from A
where A.FIELD1 not in (select B.FIELD1 from B)

select distinct A.FIELD1 from A
where not exists (select 1 from B where A.FIELD1=B.FIELD1);

On a FireBird 2.1 RDBMS the first query does almost 9.000.000 non-indexed reads on table B - and takes more than 4 seconds. The second query does approx 1800 indexed reads on table B - and took 31 msec. On a different vendor you might have different mileage.

2. example
If you have been avoiding database CURSORS on MS SQL Server because they do not perform very well, you might be in for a pleasant surprise because they fly on InterBase/FireBird.

The new Change View™ feature in InterBase XE7 also looks promising - like its event alerts and multi-dimensional arrays were 20 years ago.

Look at the database metadata of your ERP, CRM or any other "shrink wrap" business application your business is depending on - are there only tables and views? Then you can hope that their next version is going to perform better - since there is room for improvement :-)

Rule of thumb: Move the logic as close to the data as possible - and leave the client as thin as possible.

All things said - it is all down to looking at the setup - and ensure that sufficient flexibility in the application is in place - to change when the infrastructure changes.

Within applications with heavy user interaction, some amount of upfront validation is needed, but that doesn't mean that it then involves heavy database interaction - going back and forth - to let the database validate. Range and type validation can easily be done at the control/UI level.

Another thing is that you should always populate your tables with massive amounts of data - more than you expect is ever going to be in there - and then test and profile on that. Response times under 5 msec. seems fine with 10 records in a test database - but if putting a million records in, increases the response time accordingly - then you have an issue. Some Database managers have test data generators.

I have also done a fair amount of additional performance tuning, when I develop and debug on data that mimics real life metrics. Slow response times and unneeded data request makes me grumpy - look the data up only once, unless you have a really good reason to do otherwise.

Monitor your SQL and network traffic, and analyse your code - you might not want your form to re-fetch data from the database, just because it regained lost focus. That of cause also goes if you did decided to put an application server/web service in between.

Getting database views created that fits the need might also be a good idea, but be specific on the need if you're not allowed to do them yourself - and is dependent on a DBA - you do not want to end up with doing a distinct select to get data to populate a look up - that takes 2 minutes to execute - be specific, even if you are being the DBA yourself - and I think every developer should have a little DBA in their head, and vice versa.

I remember a Delphi application running against an InterBase RDBMS, that was very responsive - instant reply on a fairly large archive of songs and album data - that was done mostly with stored procedures in the database. So creating SUID (Select, Update, Insert and Delete) procedures might give an performance boost. Some Database managers lets you create them easily by a click of a button.

Also do not fetch more than you need, and prepare your views. Do minimize the number of connections since they do cost, also minimize your queries - it is faster to get data from a stored procedure that selects from an temporary table in memory - than calling 200 single selects - learn and use your JOINS and UNIONS.

If your bandwidth is not an issue and you need/want to manipulate data in memory - which also makes a lot of sense since you get memory at low cost or you might have data from heterogeneous data sources (very easy using FireDAC LocalSQL feature) - you can even gain more performance - but let the RDBMS do the hard part.

It is still #DelphiWeek 2015 - and I should stop ranting and raving for today - put something lighter up tomorrow and maybe prepare something with more code for the weekend - to end my self-inflicted blog marathon :-D

Enjoy :-)

1 comment:

  1. (Yeah, this is only a million years old or so... :-) I think I'd do example 1 as:

    select distinct A.FIELD1
    from A left join B
    on A.FIELD1 = B.FIELD1
    where B.FIELD1 is NULL ;