Friday, April 16, 2010

SQL Server: Varchar v/s NVarchar

VARCHAR and NVARCHAR are two datatypes that are used in SQL Server very frequently and interchangeably. We would see what is the difference if any between these two:



VARCHAR is an abbreviation for variable-length character string. It's a string of text characters that can be as large as the page size for the database table holding the column in question. The size for a table page is 8,196 bytes, and no one row in a table can be more than 8,060 characters. This limits the maximum size of a VARCHAR to 8,000 bytes.

The NVARCHAR stands for  uNicode VARCHAR. Basically, NVARCHAR is a VARCHAR that supports two-byte characters. The most common use for this sort of thing is to store character data that is a mixture of English and non-English symbols.

The key difference between the two is how they're stored. VARCHAR is stored as regular 8-bit data. But NVARCHAR strings are stored in the database as UTF-16 — 16 bits or two bytes per character — and converted to whatever codepage is being used by the database connection on output (typically UTF-8). NVARCHAR strings as well have the same length restrictions as VARCHAR — 8,000 bytes. However, since NVARCHARs use two bytes for each character, that means a given NVARCHAR can only hold 4,000 characters (not bytes) maximum. So, NVARCHAR entities require twice the storage of whatever you'd allocate for a plain old VARCHAR.

VARCHAR and NVARCHAR in SQL Server 2005

One fairly major change to both VARCHAR and NVARCHAR in SQL Server 2005 is the creation of the VARCHAR(MAX) and NVARCHAR(MAX) data types. If you create a VARCHAR(MAX) column, it can hold up to 2^31 bytes of data, or 2,147,483,648 characters; NVARCHAR(MAX) can hold 2^30 bytes, or 1,073,741,823 characters.

These new data types are actually replacements for the Large Object or LOB data types such as TEXT and NTEXT, which have a lot of restrictions. They can't be passed as variables in a stored procedure, for instance. The (MAX) types don't have those restrictions; they just work like very large string types. Consequently, if you're in the process of re-engineering an existing data design for SQL Server 2005, it might make sense to migrate some (although not all!)TEXT / NTEXT fields to VARCHAR(MAX) / NVARCHAR(MAX) types when appropriate.

The big difference between VARCHAR and NVARCHAR is a matter of need. If you need Unicode support for a given data type, either now or soon enough, go with NVARCHAR. If you're sticking with 8-bit data for design or storage reasons, go with VARCHAR. Note that you can always migrate from VARCHAR to NVARCHAR at the cost of some room -- but you can't go the other way 'round. Also, because NVARCHAR involves fetching that much more data, it may prove to be slower depending on how many table pages must be retrieved for any given operation.

No comments: