SQL User Defined Functions (UDFs)

Author:Murphy  |  View: 29395  |  Time: 2025-03-23 11:46:22
Photo by Rubaitul Azad on Unsplash

A SQL User Defined Function (UDF) is an important but often overlooked feature. Although there are many online resources that explain the syntax of SQL UDFs, most of them fall short in guiding users on how to effectively apply this tool in real-world scenarios. Therefore, I write this article to narrow the gaps by discussing when and how to use SQL UDFs, exploring the underlying logic behind their syntax, and providing practical use cases. Besides, this article will clarify about the various types of SQL UDFs and address the confusion between UDFs and Stored Procedure, which is another important technique in SQL. Given that the syntax for UDFs can vary across different database systems, I will focus on SQL Server UDFs for demonstration purposes though I typically prefer using MySQL in my work. The reasons for this choice will be revealed in later sections of this article.


What Is a UDF? And When Should You Use One?

A User Defined Function (UDF) in SQL is an object that developers create to perform an action and return the result. Besides UDFs, SQL provides Built-in Functions (or Native Functions). Built-in Functions, pre-defined by SQL, includes string functions, numeric function, data and time functions, aggregate functions, logical functions and so on. These built-in functions can perform common tasks efficiently. However, when developers have to conduct some complex calculation or perform custom data transformations that are not supported by built-in SQL functions, they have to create their own functions and store them in the database for future use. In addition to narrowing the gap left by built-in functions, UDFs bring the benefits as follows.

· Simplifying queries: UDFs are modular Programming via abstracting complex calculations, business logic and repetitive operations. This reduces the complexity of the main query and improves readability and maintainability of the code.

· Realizing Reusability: UDFs reduce code duplication by packaging the logic, calculations and expression into a single function, which can be reused across different queries, views or stored procedures.

· Enhancing Security: UDFs are treated as separate parts of the queries so that developers can enforce the limitation of direct access to certain data or logic by packaging them into UDFs. Sensitive data or operations can be protected by this way.

· Optimizing Performance: UDFs enable Sql queries to execute faster by being compiled and stored in the database. Besides, UDFs can prevent round-trips between the database and the application, thus optimizing the performance of programming.

· Improving Flexibility: UDFs allow developers to create dynamic and adaptable SQL syntax with parameters. And they can make the code more portable across various databases.


A Universal Structure of SQL UDFs

In SQL Server, there are three types of UDFs: Scaler UDFs, Aggregate UDFs and Table UDFs. Because MySQL doesn't natively support Table UDFs, I use SQL Server instead of MySQL, my favoured tool, for this article. Some online tutorials provide examples of Table UDFs in MySQL, which can be misleading. Although MySQL doesn't support Table UDFs, similar functionality can be achieved through Stored Procedures or Views.

· Scaler UDFs: These functions work on a single row and return a single value.

· Aggregate UDFs: These functions process multiple rows and return a single aggregated value. However, creating an aggregate UDF cannot be done in a pure SQL environment; it usually requires the use of .NET languages such as C# or VB.NET, which are then registered with SQL Server as assemblies. Because creating aggregate UDFs requires knowledge of another programming languages, I'll not use examples to deep dive this UDF type in this article.

· Table UDFs: These functions return a table which can be used like a regular table in queries.

How to write a SQL UDF correctly sometimes confuses beginners, so I've summarized a universal structure and developers can use this structure to create 90% of SQL UDFs without difficulty.

CREATE FUNCTION [schema_name.]function_name
(
    @parameter_name1 datatype [= default_value],
    @parameter_name2 datatype [= default_value],
    ...
)
RETURNS return_type
[WITH ]
AS
BEGIN
    -- Function body: the logic of the function
    RETURN [value or table_expression];
END;
GO

Explanations of Key Elements in a SQL UDF structure:

· [schema_name.]function_name: Defines the **** function name which will be called later on. The function name is compulsory but the schema is optional if the default schema is used.

· @parameter_name1 datatype [= default_value]: Specifies the name(s) and datatype(s) of the parameter(s) passed to the function. This part is optional.

· RETURNS return_type: Indicates the result that the function will return. It's compulsory. For Scalar UDFs, the return_type is the datatype of the returned value while for Table UDFs, the return_type is TABLE .

· [WITH ]: Optional attributes that specifies characteristics of the function.

· BEGIN…END: Encloses the function body where the logic or query is defined. It's compulsory for Scalar UDFs and Multistatement Table UDFs. For Inline Table Functions, which directly return the query result, BEGIN...END is not required.

· RETURN [value or table_expression]: Returns a single value from the function for Scalar UDFs, the query result for Inline Table UDFs, or the final table after executing the complex SQL statement for Multistatement Table UDFs. This is compulsory.

In the following sections, I'll use examples to explain how to create and call these various types of UDFs effectively.


An Example of Scalar UDFs

A scalar UDF is employed to generate a single value from the data in a single row. For example, if we'd like to create a function that multiplies two numbers, we can directly apply the universal structure mentioned earlier by replacing the following components and get the syntax.

[schema_name.]function_name -> dbo.func_multiply Parameters -> @num_1,@num_2(both FLOAT) Return Type ->FLOAT Function Body-> Multiplies two numbers and returns the result

-- Creating the Scalar UDF
CREATE FUNCTION dbo.func_multiply
(
    @num_1 FLOAT,
    @num_2 FLOAT
)
RETURNS FLOAT
AS
BEGIN
    RETURN @num_1 * @num_2;
END;
GO

-- Calling the UDF
SELECT dbo.func_multiply(10.5, 2);
Image by the author

Examples of Table UDFs

In my other article, "The Most Useful Advanced SQL Techniques to Succeed in the Tech Industry," I used the mock sales data from a promotion at Star Department Store to introduce some advanced SQL techniques.

The Most Useful Advanced SQL Techniques to Succeed in the Tech Industry

Today I'll elaborate on how to create and call a Table UDF with the same data set.

CREATE TABLE promo_sales(
  Sale_Person_ID VARCHAR(40) PRIMARY KEY,
  Department VARCHAR(40),
  Sales_Amount INT
);

INSERT INTO promo_sales VALUES ('001', 'Cosmetics', 500);
INSERT INTO promo_sales VALUES ('002', 'Cosmetics', 700);
INSERT INTO promo_sales VALUES ('003', 'Fashion', 1000);
INSERT INTO promo_sales VALUES ('004', 'Jewellery', 800);
INSERT INTO promo_sales VALUES ('005', 'Fashion', 850);
INSERT INTO promo_sales VALUES ('006', 'Kid', 500);
INSERT INTO promo_sales VALUES ('007', 'Cosmetics', 900);
INSERT INTO promo_sales VALUES ('008', 'Fashion', 600);
INSERT INTO promo_sales VALUES ('009', 'Fashion', 1200);
INSERT INTO promo_sales VALUES ('010', 'Jewellery', 900);
INSERT INTO promo_sales VALUES ('011', 'Kid', 700);
INSERT INTO promo_sales VALUES ('012', 'Fashion', 1500);
INSERT INTO promo_sales VALUES ('013', 'Cosmetics', 850);
INSERT INTO promo_sales VALUES ('014', 'Kid', 750);
INSERT INTO promo_sales VALUES ('015', 'Jewellery', 950);
promo_sales (Image by the author)

The first task is to obtain the summary of all departments, including the number of sales persons and the total sales in each department. Similar to Scalar UDFs, we can use the universal structure by replacing the following parts and get the syntax.

[schema_name.]function_name -> dbo.GetDepartmentSummary Parameters -> None Return Type ->TABLE Function Body-> Returns a table showing the department summary

-- Creating the Table UDF
CREATE FUNCTION dbo.GetDepartmentSummary()
RETURNS TABLE
AS
RETURN
(
    SELECT 
        Department,
        SUM(Sales_Amount) AS Total_Sales,
        COUNT(DISTINCT Sale_Person_ID) AS Number_of_Sales_Persons
    FROM promo_sales
    GROUP BY Department
);
GO

-- Calling the UDF
-- Getting the department summary
SELECT * FROM dbo.GetDepartmentSummary();
-- Getting the summary for cosmetics department
SELECT * FROM dbo.GetDepartmentSummary() WHERE Department = 'Cosmetics';
-- Getting the summary for departments whose total sales over 2000K USD
SELECT * FROM dbo.GetDepartmentSummary() WHERE Total_Sales > 2000;
-- Getting the summary for department with highest total sales amount
SELECT TOP 1 * FROM dbo.GetDepartmentSummary()
ORDER BY Total_Sales DESC;
-- Adding the Avg_Sales_Per_Person to the summary
SELECT 
    Department, 
    Total_Sales, 
    Number_of_Sales_Persons,
    Total_Sales / CAST(Number_of_Sales_Persons AS FLOAT) AS Avg_Sales_Per_Person
FROM dbo.GetDepartmentSummary();

The table UDF can be called to retrieve the department summary, filter by specific criteria, or perform some other actions.

Image by the author

The second task is also to obtain the department summary, but this time, the summary involves the average sales per person and a bonus for each department. Later, the company management decided to change the bonus rule – departments with sales over 2000K USD would have their bonus rate increased by 10%. Unlike the first task which was completed with an Inline Table-Valued UDF, this task will be conducted with a Multistatement Table-Valued UDF.

The key differences between Inline Table-Valued UDFs and Multistatement Table-Valued UDFs can be observed from the two main areas:

· Complexity of logic: Inline Table-Valued UDFs contain a single query and don't require BEGIN...ENDblock, which is ideal for simple logic while Multistatement Table-Valued UDFs employ BEGIN...END block to accommodate multiple SQL statements and more complex logic.

· Performance: SQL server's query optimizer treats Inline Table-Valued UDFs as part of the calling query, which generally leads to faster performance. On the other hand, Multistatement Table-Valued UDFs must build the entire table in memory before returning it, which can impact performance.

For the second task, the following elements can be filled into the universal structure to create the Multistatement Table-Valued UDF:

[schema_name.]function_name -> dbo.MultiStmt_GetDepartmentSummary Parameters -> None Return Type ->TABLE Function Body-> Returns the final table showing department summary

CREATE FUNCTION dbo.MultiStmt_GetDepartmentSummary()
RETURNS @DeptSummary TABLE 
(
    Department VARCHAR(40),
    Total_Sales INT,
    Number_of_Sales_Persons INT,
    Avg_Sales_Per_Person DECIMAL(10, 2),
    Bonus DECIMAL(10, 2) 
)
AS
BEGIN
    -- First Statement: Initialize the table variable with department sales summary
    INSERT INTO @DeptSummary (Department, Total_Sales, Number_of_Sales_Persons, Avg_Sales_Per_Person, Bonus)
    SELECT 
        Department,
        SUM(Sales_Amount) AS Total_Sales,
        COUNT(DISTINCT Sale_Person_ID) AS Number_of_Sales_Persons,
        AVG(Sales_Amount) AS Avg_Sales_Per_Person,
        SUM(Sales_Amount) * 0.2 AS Bonus
    FROM promo_sales
    GROUP BY Department;  

    -- Second Statement: Update rows in the table variable
    UPDATE @DeptSummary
    SET Bonus = Bonus * 1.1  
    WHERE Total_Sales > 2000;

    -- Return the final table
    RETURN;
END;
GO

-- Usage:
SELECT * FROM dbo.MultiStmt_GetDepartmentSummary();
Image by the author

Differences Between Table UDFs and Stored Procedures

It's common that people who are new to SQL UDFs often confuse Table UDFs with Stored Procedures because they often appear similar and can sometimes conduct the same functionalities. A Stored Procedure is a precompiled collection of SQL statement(s) which can be executed as a single unit and generate result sets.

The key differences between Table Functions and Stored Procedures are:

· Return Type: Table UDFs always return a table but Stored Procedures return result sets.

· Flexibility: Stored Procedures are more flexible than Table UDFs because they allow a wider range of SQL statements and operations.

· Parameters: Table UDFs only allow input parameters but Stored Procedures accept both input and output parameters.

· Usage: Table UDFs are often used by SQL queries while there are more scenarios for Stored Procedures – they can be called from within other procedures, scripts, or applications.

· Performance: Stored Procedures often have better performance than Table UDFs because they are precompiled and optimized for execution.

Despite the differences, Stored Procedures can perform similar tasks to Table UDFs in SQL Server. Below is the syntax for using a Stored Procedure to complete the second task as a Multistatement Table-Valued UDF in obtaining the department summary.

-- Creating the stored procedure to achieve the same functionality
CREATE PROCEDURE dbo.GetDepartmentSummary_Proc
AS
BEGIN
    -- Create a temporary table to store the department summary
    CREATE TABLE #DeptSummary 
    (
        Department VARCHAR(40),
        Total_Sales INT,
        Number_of_Sales_Persons INT,
        Avg_Sales_Per_Person DECIMAL(10, 2),
        Bonus DECIMAL(10, 2)
    );

    -- First Statement: Insert department summary into the temporary table
    INSERT INTO #DeptSummary (Department, Total_Sales, Number_of_Sales_Persons, Avg_Sales_Per_Person, Bonus)
    SELECT 
        Department,
        SUM(Sales_Amount) AS Total_Sales,
        COUNT(DISTINCT Sale_Person_ID) AS Number_of_Sales_Persons,
        AVG(Sales_Amount) AS Avg_Sales_Per_Person,
        SUM(Sales_Amount) * 0.2 AS Bonus
    FROM promo_sales
    GROUP BY Department;

    -- Second Statement: Update rows in the temporary table 
    SET Bonus = Bonus * 1.1  
    WHERE Total_Sales > 2000;

    -- Return the final table
    SELECT * FROM #DeptSummary;

    -- Clean up: Drop the temporary table
    DROP TABLE #DeptSummary;
END;
GO

-- Usage:
EXEC dbo.GetDepartmentSummary_Proc;

It produced the exactly same result as the Table UDF MultiStmt_GetDepartmentSummary did.


Conclusion

This article provides a detailed explanation of SQL UDFs. By focusing on the SQL Server environment, this tutorial not only introduces the universal structure for creating a UDF from scratch, but also discusses the similarities and differences between Scale UDFs, Inline Table-Valued Functions, and Multistatement Table-Valued Functions with use case and syntax. Additionally, this article addresses several common misconceptions and elaborate on the differences between Table UDFs and Stored Procedures. I hope this article helps you effectively apply UDFs in your SQL development. Comments and suggestions are welcome!

Tags: Data Science Programming Sql Tips And Tricks User Defined Functions

Comment