NeuroAgent

Correct SQL Query for Active Loans

Verification and optimization of an SQL query that retrieves clients with active loans: correctness, performance, and alternative approaches.

Question

Is the SQL query correctly written to display clients with active (unpaid) loans?

Task

The goal is to list clients with active (unpaid) loans. Three tables are available:

  • t1 (credit contracts)
  • t2 (deposit contracts)
  • t3 (clients)

Table Structure (MySQL 5.7)

Table t1 (credit contracts)

sql
CREATE TABLE t1 (
    CREDIT_CONTRACT_ID INT,
    START_DATE DATE,
    END_DATE DATE,
    CUSTOMER_ID INT,
    CREDIT_AMOUNT INT
);

Table t3 (clients)

sql
CREATE TABLE t3 (
    CUSTOMER_ID INT,
    NAME VARCHAR(100),
    BIRTH_DATE DATE,
    CITY VARCHAR(100)
);

Sample Data

Data for table t1

sql
INSERT INTO t1 VALUES 
(1, '2015-10-15', '2015-12-30', 1, 1000),
(2, '2015-12-14', NULL, 2, 2000),
(3, '2015-10-20', NULL, 3, 15000),
(7, '2015-12-28', NULL, 4, 80000),
(8, '2015-10-07', NULL, 5, 3000),
(9, '2015-06-30', NULL, 6, 29000),
(10, '2015-04-09', NULL, 7, 8000),
(11, '2015-01-20', NULL, 8, 68000),
(12, '2015-10-27', NULL, 9, 7000);

Data for table t3

sql
INSERT INTO t3 VALUES 
(1, 'Серик', '1975-09-20', 'Алматы'),
(2, 'Динара', '1977-07-22', 'Алматы'),
(3, 'Марат', '1969-01-02', 'Астана'),
(4, 'Олег', '1967-09-27', 'Астана'),
(5, 'Марина', '1969-10-12', 'Алматы'),
(6, 'Дильшат', '1979-10-08', 'Караганда'),
(7, 'Ануар', '1982-04-17', 'Кызылорда'),
(8, 'Михаил', '1980-12-11', 'Атырау'),
(9, 'Екатерина', '1987-02-12', 'Актобе');

Proposed SQL Query

sql
SELECT 
    t3.NAME,
    t3.BIRTH_DATE,
    t3.CITY,
    t1.CREDIT_CONTRACT_ID,
    t1.START_DATE,
    t1.END_DATE,
    t1.CREDIT_AMOUNT
FROM t1
JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID
WHERE t1.END_DATE IS NULL;

Question

As I understand it, I'm looking for loans where END_DATE is not set (meaning the loan has not been paid off yet). The query result looks reasonable, but I want to confirm that this is a correct and optimal approach. Are there any nuances or alternative methods I should consider?

NeuroAgent

The query is correct for this task: it treats a loan as active when END_DATE IS NULL and uses an inner JOIN to attach each loan's client details from t3.


Analysis of the current SQL query

Your query is functionally correct and solves the stated task.

Advantages of the current approach:

  • Uses INNER JOIN to correctly link loan contracts with clients
  • Applies the WHERE t1.END_DATE IS NULL condition to filter active loans
  • Returns all necessary fields for analyzing active loans

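Important: END_DATE IS NULL is a common convention for open contracts and it matches the sample data, but it only works if END_DATE records the actual closing date rather than a scheduled maturity date. That is a property of this schema, so it is worth confirming. Also note that IS NULL is required: any comparison with NULL evaluates to UNKNOWN, so WHERE END_DATE = NULL would return no rows at all.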

Logic correctness check

To fully verify the correctness of the query, let’s analyze it step by step:

  1. JOIN operation: JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID attaches client details to each loan; a loan whose CUSTOMER_ID has no match in t3 is dropped by the inner join

  2. Filtering active loans: WHERE t1.END_DATE IS NULL selects only those loans that have not yet been repaid

  3. Data selection: The query returns all necessary information for analyzing active loans

Query results on the provided data:
The query returns 8 rows, one per open contract (2, 3, 7, 8, 9, 10, 11, 12), for clients with IDs 2 through 9. Client 1 is excluded because contract 1 has END_DATE = '2015-12-30'.
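The result above can be reproduced mechanically. The sketch below is a minimal check, assuming Python's standard-library sqlite3 module as a stand-in for MySQL 5.7: SQLite accepts the same CREATE TABLE and INSERT statements, and IS NULL and INNER JOIN behave the same way, but it is not a MySQL test.

```python
import sqlite3

# In-memory copy of t1 and t3 with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (CREDIT_CONTRACT_ID INT, START_DATE DATE, END_DATE DATE,
                 CUSTOMER_ID INT, CREDIT_AMOUNT INT);
CREATE TABLE t3 (CUSTOMER_ID INT, NAME VARCHAR(100), BIRTH_DATE DATE, CITY VARCHAR(100));
INSERT INTO t1 VALUES
 (1, '2015-10-15', '2015-12-30', 1, 1000), (2, '2015-12-14', NULL, 2, 2000),
 (3, '2015-10-20', NULL, 3, 15000), (7, '2015-12-28', NULL, 4, 80000),
 (8, '2015-10-07', NULL, 5, 3000), (9, '2015-06-30', NULL, 6, 29000),
 (10, '2015-04-09', NULL, 7, 8000), (11, '2015-01-20', NULL, 8, 68000),
 (12, '2015-10-27', NULL, 9, 7000);
INSERT INTO t3 VALUES
 (1, 'Серик', '1975-09-20', 'Алматы'), (2, 'Динара', '1977-07-22', 'Алматы'),
 (3, 'Марат', '1969-01-02', 'Астана'), (4, 'Олег', '1967-09-27', 'Астана'),
 (5, 'Марина', '1969-10-12', 'Алматы'), (6, 'Дильшат', '1979-10-08', 'Караганда'),
 (7, 'Ануар', '1982-04-17', 'Кызылорда'), (8, 'Михаил', '1980-12-11', 'Атырау'),
 (9, 'Екатерина', '1987-02-12', 'Актобе');
""")

# The query from the question, unchanged.
rows = conn.execute("""
SELECT t3.NAME, t3.BIRTH_DATE, t3.CITY, t1.CREDIT_CONTRACT_ID,
       t1.START_DATE, t1.END_DATE, t1.CREDIT_AMOUNT
FROM t1
JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID
WHERE t1.END_DATE IS NULL
""").fetchall()

# Contract IDs come from the 4th selected column; customer IDs need a second query.
contract_ids = sorted(row[3] for row in rows)
customer_ids = sorted(cid for (cid,) in conn.execute(
    "SELECT t1.CUSTOMER_ID FROM t1 JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID "
    "WHERE t1.END_DATE IS NULL"))
print(contract_ids)  # [2, 3, 7, 8, 9, 10, 11, 12]
print(customer_ids)  # [2, 3, 4, 5, 6, 7, 8, 9]
```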

Potential issues:

  • The query does not consider the t2 table (deposit contracts), but in this context, this is not an issue since we are only interested in loans
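  • A client with several active loans appears once per loan. That is correct when listing loans; when a list of clients is needed, use SELECT DISTINCT over the client columns or GROUP BY t3.CUSTOMER_ID. Because neither table declares a primary key, a duplicated CUSTOMER_ID in t3 would also multiply the loan rows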
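The per-loan behaviour is easy to see on a small dataset. This sketch again uses sqlite3 as a stand-in for MySQL, with a subset of the sample data plus a hypothetical second open loan (contract 13) for client 2; SELECT DISTINCT over the client columns returns each client once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (CREDIT_CONTRACT_ID INT, START_DATE DATE, END_DATE DATE,
                 CUSTOMER_ID INT, CREDIT_AMOUNT INT);
CREATE TABLE t3 (CUSTOMER_ID INT, NAME VARCHAR(100), BIRTH_DATE DATE, CITY VARCHAR(100));
-- Sample loans 1-3 plus a hypothetical second open loan (13) for client 2.
INSERT INTO t1 VALUES
 (1, '2015-10-15', '2015-12-30', 1, 1000), (2, '2015-12-14', NULL, 2, 2000),
 (3, '2015-10-20', NULL, 3, 15000), (13, '2016-01-10', NULL, 2, 5000);
INSERT INTO t3 VALUES
 (1, 'Серик', '1975-09-20', 'Алматы'), (2, 'Динара', '1977-07-22', 'Алматы'),
 (3, 'Марат', '1969-01-02', 'Астана');
""")

# One row per active loan: client 2 shows up twice.
per_loan_ids = sorted(cid for (cid,) in conn.execute("""
SELECT t3.CUSTOMER_ID FROM t1
JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID
WHERE t1.END_DATE IS NULL"""))

# One row per client: DISTINCT collapses identical client rows.
per_client_ids = sorted(row[0] for row in conn.execute("""
SELECT DISTINCT t3.CUSTOMER_ID, t3.NAME, t3.CITY FROM t1
JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID
WHERE t1.END_DATE IS NULL"""))

print(per_loan_ids)    # [2, 2, 3]
print(per_client_ids)  # [2, 3]
```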

Performance optimization

The query is already correct. The points below cover the choice of JOIN type, indexing, and column selection:

1. JOIN type selection

sql
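-- INNER JOIN (current option): returns only loans with a matching client,
-- which is what this task needs.

-- LEFT JOIN from t3: every client, with NULL loan columns if no loan is active.
-- The END_DATE filter belongs in ON; in WHERE it would drop clients whose loans are
-- all closed yet keep clients with no loans (their NULL-extended rows pass IS NULL).
SELECT t3.CUSTOMER_ID, t3.NAME, t1.CREDIT_CONTRACT_ID
FROM t3
LEFT JOIN t1 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID AND t1.END_DATE IS NULL;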
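Where the END_DATE filter sits matters once the join is a LEFT JOIN. The sketch below (sqlite3 standing in for MySQL; client 3 is a hypothetical client with no loans at all) compares the filter in the ON clause with the same filter in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (CREDIT_CONTRACT_ID INT, START_DATE DATE, END_DATE DATE,
                 CUSTOMER_ID INT, CREDIT_AMOUNT INT);
CREATE TABLE t3 (CUSTOMER_ID INT, NAME VARCHAR(100), BIRTH_DATE DATE, CITY VARCHAR(100));
-- Client 1: closed loan only; client 2: open loan; client 3 (hypothetical): no loans.
INSERT INTO t1 VALUES
 (1, '2015-10-15', '2015-12-30', 1, 1000), (2, '2015-12-14', NULL, 2, 2000);
INSERT INTO t3 VALUES
 (1, 'Серик', '1975-09-20', 'Алматы'), (2, 'Динара', '1977-07-22', 'Алматы'),
 (3, 'Client without loans', '1990-01-01', 'Астана');
""")

# Filter in ON: every client is kept; loan columns are NULL when no loan is active.
filter_in_on = conn.execute("""
SELECT t3.CUSTOMER_ID, t1.CREDIT_CONTRACT_ID FROM t3
LEFT JOIN t1 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID AND t1.END_DATE IS NULL
ORDER BY t3.CUSTOMER_ID""").fetchall()

# Filter in WHERE: client 1 disappears, but client 3's NULL-extended row survives.
filter_in_where = conn.execute("""
SELECT t3.CUSTOMER_ID, t1.CREDIT_CONTRACT_ID FROM t3
LEFT JOIN t1 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID
WHERE t1.END_DATE IS NULL
ORDER BY t3.CUSTOMER_ID""").fetchall()

print(filter_in_on)     # [(1, None), (2, 2), (3, None)]
print(filter_in_where)  # [(2, 2), (3, None)]
```

The WHERE variant is neither "all clients" nor "clients with active loans", which is why the filter should stay in ON when a LEFT JOIN is used.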

2. Indexes for improved performance

sql
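-- t3.CUSTOMER_ID is the lookup side of the join; a primary key indexes it and enforces uniqueness
ALTER TABLE t3 ADD PRIMARY KEY (CUSTOMER_ID);
-- Useful when the join is driven from t3 (for example, when listing every client)
CREATE INDEX idx_t1_customer_id ON t1(CUSTOMER_ID);
-- Lets MySQL find open loans through the index; worthwhile only if they are a small share of t1
CREATE INDEX idx_t1_end_date ON t1(END_DATE);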

3. Field selection optimization

sql
-- If only main fields are needed, you can limit the selection
SELECT 
    t3.NAME,
    t1.CREDIT_CONTRACT_ID,
    t1.CREDIT_AMOUNT,
    t1.START_DATE
FROM t1
JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID
WHERE t1.END_DATE IS NULL;

Alternative approaches

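1. Using COALESCE when END_DATE may hold a future date

sql
SELECT 
    t3.NAME,
    t1.CREDIT_CONTRACT_ID,
    t1.CREDIT_AMOUNT
FROM t1
JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID
-- A NULL END_DATE is mapped to a far-future sentinel, so open-ended loans stay active.
-- COALESCE(t1.END_DATE, CURRENT_DATE) would exclude them: CURRENT_DATE > CURRENT_DATE is false.
WHERE COALESCE(t1.END_DATE, '9999-12-31') > CURRENT_DATE;

This variant matters only if END_DATE can hold a date that has not arrived yet (for example, a scheduled end date); on the sample data it returns the same rows as END_DATE IS NULL. Wrapping the column in a function also prevents MySQL 5.7 from using an index on END_DATE, so the equivalent condition END_DATE IS NULL OR END_DATE > CURRENT_DATE is generally the better choice.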
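The difference between mapping NULL to CURRENT_DATE and mapping it to a far-future sentinel can be checked directly. This sketch uses sqlite3 as a stand-in for MySQL (dates are stored as 'YYYY-MM-DD' text, which compares correctly); loan 14 is hypothetical, with an END_DATE that has not arrived yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (CREDIT_CONTRACT_ID INT, START_DATE DATE, END_DATE DATE,
                 CUSTOMER_ID INT, CREDIT_AMOUNT INT);
-- Loan 1 is closed and loan 2 is open-ended (sample data);
-- loan 14 is hypothetical, ending in the future.
INSERT INTO t1 VALUES
 (1, '2015-10-15', '2015-12-30', 1, 1000), (2, '2015-12-14', NULL, 2, 2000),
 (14, '2016-02-01', '2999-01-01', 3, 4000);
""")

def active_ids(condition):
    """Return the contract IDs that satisfy the given WHERE condition."""
    sql = f"SELECT CREDIT_CONTRACT_ID FROM t1 WHERE {condition} ORDER BY 1"
    return [cid for (cid,) in conn.execute(sql)]

with_current_date = active_ids("COALESCE(END_DATE, CURRENT_DATE) > CURRENT_DATE")
with_sentinel = active_ids("COALESCE(END_DATE, '9999-12-31') > CURRENT_DATE")
without_function = active_ids("END_DATE IS NULL OR END_DATE > CURRENT_DATE")

print(with_current_date)  # [14]  (the open-ended loan 2 is lost)
print(with_sentinel)      # [2, 14]
print(without_function)   # [2, 14]
```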

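2. Using EXISTS to list each client once

sql
-- One row per client with at least one open loan; no duplicates and no GROUP BY needed.
SELECT 
    t3.CUSTOMER_ID,
    t3.NAME,
    t3.CITY
FROM t3
WHERE EXISTS (
    SELECT 1 FROM t1
    WHERE t1.CUSTOMER_ID = t3.CUSTOMER_ID
      AND t1.END_DATE IS NULL
);

Avoid the NOT EXISTS form with END_DATE IS NOT NULL for this task: it answers a different question (clients who have never closed a loan) and drops a client who has one closed and one open loan.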
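The gap between a NOT EXISTS filter on closed loans and an EXISTS filter on open loans appears as soon as a client has both kinds of loan. This sketch (sqlite3 standing in for MySQL) adds a hypothetical closed loan 15 for client 2; in the NOT EXISTS query the subquery alias is renamed from t2 to closed, since t2 is also the name of the deposit table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (CREDIT_CONTRACT_ID INT, START_DATE DATE, END_DATE DATE,
                 CUSTOMER_ID INT, CREDIT_AMOUNT INT);
CREATE TABLE t3 (CUSTOMER_ID INT, NAME VARCHAR(100), BIRTH_DATE DATE, CITY VARCHAR(100));
-- Sample loans 1-3 plus a hypothetical closed loan 15 for client 2,
-- so client 2 has one closed and one open loan.
INSERT INTO t1 VALUES
 (1, '2015-10-15', '2015-12-30', 1, 1000), (2, '2015-12-14', NULL, 2, 2000),
 (3, '2015-10-20', NULL, 3, 15000), (15, '2014-03-01', '2014-09-01', 2, 500);
INSERT INTO t3 VALUES
 (1, 'Серик', '1975-09-20', 'Алматы'), (2, 'Динара', '1977-07-22', 'Алматы'),
 (3, 'Марат', '1969-01-02', 'Астана');
""")

# "Clients who have never closed a loan": client 2 vanishes despite open loan 2.
not_exists_contracts = [cid for (cid,) in conn.execute("""
SELECT t1.CREDIT_CONTRACT_ID
FROM t3
JOIN t1 ON t3.CUSTOMER_ID = t1.CUSTOMER_ID
WHERE NOT EXISTS (
    SELECT 1 FROM t1 AS closed
    WHERE closed.CUSTOMER_ID = t3.CUSTOMER_ID AND closed.END_DATE IS NOT NULL)
ORDER BY 1""")]

# "Clients with at least one open loan": one row per client.
exists_clients = [cid for (cid,) in conn.execute("""
SELECT t3.CUSTOMER_ID
FROM t3
WHERE EXISTS (
    SELECT 1 FROM t1
    WHERE t1.CUSTOMER_ID = t3.CUSTOMER_ID AND t1.END_DATE IS NULL)
ORDER BY 1""")]

print(not_exists_contracts)  # [3]
print(exists_clients)        # [2, 3]
```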

3. Aggregation for clients with multiple loans

sql
SELECT 
    t3.CUSTOMER_ID,
    t3.NAME,
    COUNT(t1.CREDIT_CONTRACT_ID) AS active_credits_count,
    SUM(t1.CREDIT_AMOUNT) AS total_credit_amount
FROM t1
JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID
WHERE t1.END_DATE IS NULL
GROUP BY t3.CUSTOMER_ID, t3.NAME;
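Grouping by CUSTOMER_ID (rather than by NAME alone) keeps two clients with the same name apart. A quick check of the aggregation, assuming sqlite3 as a stand-in for MySQL and a hypothetical second open loan (13) for client 2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (CREDIT_CONTRACT_ID INT, START_DATE DATE, END_DATE DATE,
                 CUSTOMER_ID INT, CREDIT_AMOUNT INT);
CREATE TABLE t3 (CUSTOMER_ID INT, NAME VARCHAR(100), BIRTH_DATE DATE, CITY VARCHAR(100));
-- Sample loans 1-3 plus a hypothetical second open loan (13) for client 2.
INSERT INTO t1 VALUES
 (1, '2015-10-15', '2015-12-30', 1, 1000), (2, '2015-12-14', NULL, 2, 2000),
 (3, '2015-10-20', NULL, 3, 15000), (13, '2016-01-10', NULL, 2, 5000);
INSERT INTO t3 VALUES
 (1, 'Серик', '1975-09-20', 'Алматы'), (2, 'Динара', '1977-07-22', 'Алматы'),
 (3, 'Марат', '1969-01-02', 'Астана');
""")

# Count and total of open loans per client; client 1 has none and is absent.
summary = conn.execute("""
SELECT t3.CUSTOMER_ID,
       COUNT(t1.CREDIT_CONTRACT_ID) AS active_credits_count,
       SUM(t1.CREDIT_AMOUNT) AS total_credit_amount
FROM t1
JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID
WHERE t1.END_DATE IS NULL
GROUP BY t3.CUSTOMER_ID, t3.NAME
ORDER BY t3.CUSTOMER_ID""").fetchall()

print(summary)  # [(2, 2, 7000), (3, 1, 15000)]
```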

Additional recommendations

1. Data integrity check

Before using the query, it is recommended to check data integrity:

sql
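-- Loans without a customer reference
SELECT COUNT(*) FROM t1 WHERE CUSTOMER_ID IS NULL;

-- CUSTOMER_ID values stored more than once in the client table
SELECT CUSTOMER_ID, COUNT(*) FROM t3 GROUP BY CUSTOMER_ID HAVING COUNT(*) > 1;

-- Loans whose CUSTOMER_ID has no matching client (the INNER JOIN silently drops them)
SELECT t1.CREDIT_CONTRACT_ID FROM t1
LEFT JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID
WHERE t3.CUSTOMER_ID IS NULL;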
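A sketch of three integrity checks (NULL customer references, duplicate client IDs found with GROUP BY ... HAVING, and loans without a matching client found with a LEFT JOIN), run with sqlite3 as a stand-in for MySQL. All bad rows here are hypothetical: loan 16 points to a missing client 99, loan 17 has no CUSTOMER_ID, and client 2 is stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (CREDIT_CONTRACT_ID INT, START_DATE DATE, END_DATE DATE,
                 CUSTOMER_ID INT, CREDIT_AMOUNT INT);
CREATE TABLE t3 (CUSTOMER_ID INT, NAME VARCHAR(100), BIRTH_DATE DATE, CITY VARCHAR(100));
INSERT INTO t1 VALUES
 (1, '2015-10-15', '2015-12-30', 1, 1000), (2, '2015-12-14', NULL, 2, 2000),
 (16, '2016-03-01', NULL, 99, 1200), (17, '2016-03-02', NULL, NULL, 900);
INSERT INTO t3 VALUES
 (1, 'Серик', '1975-09-20', 'Алматы'), (2, 'Динара', '1977-07-22', 'Алматы'),
 (2, 'Динара', '1977-07-22', 'Алматы');
""")

null_refs = conn.execute(
    "SELECT COUNT(*) FROM t1 WHERE CUSTOMER_ID IS NULL").fetchone()[0]
duplicate_ids = conn.execute(
    "SELECT CUSTOMER_ID, COUNT(*) FROM t3 GROUP BY CUSTOMER_ID HAVING COUNT(*) > 1"
).fetchall()
orphans = [cid for (cid,) in conn.execute("""
SELECT t1.CREDIT_CONTRACT_ID FROM t1
LEFT JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID
WHERE t3.CUSTOMER_ID IS NULL
ORDER BY 1""")]

# Effect on the main query: orphan loans vanish and the duplicated client doubles loan 2.
active_rows = [cid for (cid,) in conn.execute("""
SELECT t1.CREDIT_CONTRACT_ID FROM t1
JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID
WHERE t1.END_DATE IS NULL""")]

print(null_refs)      # 1
print(duplicate_ids)  # [(2, 2)]
print(orphans)        # [16, 17]
print(active_rows)    # [2, 2]
```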

2. Adding sorting for easier reading

sql
SELECT 
    t3.NAME,
    t1.CREDIT_CONTRACT_ID,
    t1.CREDIT_AMOUNT,
    t1.START_DATE
FROM t1
JOIN t3 ON t1.CUSTOMER_ID = t3.CUSTOMER_ID
WHERE t1.END_DATE IS NULL
ORDER BY t3.NAME, t1.START_DATE DESC;

3. Using aliases for better readability

sql
SELECT 
    c.NAME AS customer_name,
    cr.CREDIT_CONTRACT_ID,
    cr.CREDIT_AMOUNT,
    cr.START_DATE
FROM t1 cr
JOIN t3 c ON cr.CUSTOMER_ID = c.CUSTOMER_ID
WHERE cr.END_DATE IS NULL
ORDER BY c.NAME;

4. Edge-case handling

sql
-- Exclude contracts starting in the future and loans with a zero or negative amount (NULL amounts are kept)
SELECT 
    c.NAME,
    cr.CREDIT_CONTRACT_ID,
    cr.CREDIT_AMOUNT
FROM t1 cr
JOIN t3 c ON cr.CUSTOMER_ID = c.CUSTOMER_ID
WHERE cr.END_DATE IS NULL
  AND cr.START_DATE <= CURRENT_DATE
  AND (cr.CREDIT_AMOUNT > 0 OR cr.CREDIT_AMOUNT IS NULL);

Recommended final query (unlike the previous example, it also excludes loans whose CREDIT_AMOUNT is NULL):

sql
SELECT 
    c.NAME AS customer_name,
    c.CITY,
    cr.CREDIT_CONTRACT_ID,
    cr.START_DATE,
    cr.CREDIT_AMOUNT
FROM t1 cr
JOIN t3 c ON cr.CUSTOMER_ID = c.CUSTOMER_ID
WHERE cr.END_DATE IS NULL
  AND cr.START_DATE <= CURRENT_DATE
  AND cr.CREDIT_AMOUNT > 0
ORDER BY c.NAME, cr.START_DATE DESC;
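The extra conditions only matter for unusual rows; on the sample data every START_DATE is in 2015 and every amount is positive. This sketch (sqlite3 standing in for MySQL) adds two hypothetical open loans for client 3, loan 18 with no recorded amount and loan 19 starting in the future, and compares the plain filter, the edge-case version that keeps NULL amounts, and the final query's conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (CREDIT_CONTRACT_ID INT, START_DATE DATE, END_DATE DATE,
                 CUSTOMER_ID INT, CREDIT_AMOUNT INT);
CREATE TABLE t3 (CUSTOMER_ID INT, NAME VARCHAR(100), BIRTH_DATE DATE, CITY VARCHAR(100));
INSERT INTO t1 VALUES
 (1, '2015-10-15', '2015-12-30', 1, 1000), (2, '2015-12-14', NULL, 2, 2000),
 (3, '2015-10-20', NULL, 3, 15000), (18, '2016-05-05', NULL, 3, NULL),
 (19, '2999-01-01', NULL, 3, 6000);
INSERT INTO t3 VALUES
 (1, 'Серик', '1975-09-20', 'Алматы'), (2, 'Динара', '1977-07-22', 'Алматы'),
 (3, 'Марат', '1969-01-02', 'Астана');
""")

BASE = """
SELECT cr.CREDIT_CONTRACT_ID
FROM t1 cr
JOIN t3 c ON cr.CUSTOMER_ID = c.CUSTOMER_ID
WHERE cr.END_DATE IS NULL
"""

def contract_ids(extra_conditions=""):
    """Run BASE plus optional extra AND conditions; return sorted contract IDs."""
    return sorted(cid for (cid,) in conn.execute(BASE + extra_conditions))

basic = contract_ids()
edge_case_version = contract_ids(
    "AND cr.START_DATE <= CURRENT_DATE "
    "AND (cr.CREDIT_AMOUNT > 0 OR cr.CREDIT_AMOUNT IS NULL)")
final_version = contract_ids(
    "AND cr.START_DATE <= CURRENT_DATE AND cr.CREDIT_AMOUNT > 0")

print(basic)              # [2, 3, 18, 19]
print(edge_case_version)  # [2, 3, 18]
print(final_version)      # [2, 3]
```

Whether a loan with a NULL amount should count as active is a business rule, so the choice between the last two versions is worth making explicitly.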

Conclusion

  1. The main query correctly identifies clients with active loans via END_DATE IS NULL, provided that a NULL END_DATE really means "not yet repaid" in this schema

  2. For large tables, make t3.CUSTOMER_ID a primary key and index t1.CUSTOMER_ID; an index on END_DATE pays off only when open loans are a small share of t1

  3. For better code readability, meaningful table aliases should be used instead of t1, t2, t3

  4. Data integrity checks and edge case handling should be added in production environments

  5. For analytical tasks, aggregate functions can be used to get summary information for clients with multiple loans

Your initial approach is an excellent starting point, and the suggested optimizations will help make the query more reliable and performant.
