SQLServer实战:DISTINCT和子查询的5个高频使用场景(附代码)

张开发
2026/4/10 1:25:57 15 分钟阅读
SQLServer实战:DISTINCT和子查询的5个高频使用场景(附代码)
SQLServer实战DISTINCT和子查询的5个高频使用场景附代码当数据库表里躺着几百万条数据时突然发现报表里的数字对不上——这种抓狂时刻每个SQLServer开发者都经历过。上周我就遇到个典型案例市场部要统计活跃用户数简单COUNT之后发现比实际多出23%原来同一个用户在不同设备登录产生了重复记录。这时候DISTINCT和子查询这对黄金组合就该登场了。1. 用户行为分析中的去重统计电商平台经常要统计UV独立访客但用户可能在同一天多次访问。假设有张用户访问记录表user_visitsCREATE TABLE user_visits ( visit_id INT PRIMARY KEY, user_id INT NOT NULL, device_id VARCHAR(50), visit_time DATETIME, page_url VARCHAR(255) );错误做法直接COUNTSELECT COUNT(user_id) AS total_visits FROM user_visits WHERE visit_time BETWEEN 2023-06-01 AND 2023-06-30;正确方案用DISTINCT去重SELECT COUNT(DISTINCT user_id) AS unique_visitors FROM user_visits WHERE visit_time BETWEEN 2023-06-01 AND 2023-06-30;更复杂的场景是统计使用多种设备的用户比例SELECT COUNT(*) AS multi_device_users, (SELECT COUNT(DISTINCT user_id) FROM user_visits) AS total_users, CAST(COUNT(*) AS FLOAT) / (SELECT COUNT(DISTINCT user_id) FROM user_visits) * 100 AS percentage FROM ( SELECT user_id FROM user_visits GROUP BY user_id HAVING COUNT(DISTINCT device_id) 1 ) AS t;2. 多表关联时的重复记录处理订单系统常见问题一个订单可能包含多个商品直接JOIN会导致订单信息重复。现有订单表orders和订单明细表order_detailsSELECT o.order_id, o.customer_id, o.order_date, (SELECT SUM(quantity * unit_price) FROM order_details WHERE order_id o.order_id) AS total_amount FROM orders o WHERE o.order_date 2023-01-01;比直接JOIN更高效的做法是使用子查询计算金额SELECT o.order_id, o.customer_id, o.order_date, od.item_count, od.total_amount FROM orders o JOIN ( SELECT order_id, COUNT(*) AS item_count, SUM(quantity * unit_price) AS total_amount FROM order_details GROUP BY order_id ) od ON o.order_id od.order_id;性能对比方法执行时间(ms)逻辑读取次数直接JOIN3201850子查询聚合1106203. 层级数据查询的优雅解决方案组织架构查询是典型的多层数据关系场景。假设有员工表employeesCREATE TABLE employees ( emp_id INT PRIMARY KEY, emp_name VARCHAR(100), position VARCHAR(50), dept_id INT, manager_id INT NULL );查找所有间接下属下属的下属SELECT DISTINCT e3.emp_id, e3.emp_name FROM employees e1 JOIN employees e2 ON e1.emp_id e2.manager_id JOIN employees e3 ON e2.emp_id e3.manager_id WHERE e1.emp_id 1001;更优解使用递归CTE本质是子查询的增强WITH subordinate_tree AS ( -- 基础查询直接下属 SELECT emp_id, emp_name, manager_id, 1 AS level FROM employees WHERE manager_id 1001 UNION ALL -- 递归查询间接下属 SELECT e.emp_id, e.emp_name, e.manager_id, st.level 1 FROM employees e JOIN subordinate_tree st ON e.manager_id st.emp_id ) SELECT emp_id, emp_name, level FROM subordinate_tree ORDER BY level;4. 动态条件过滤的智能实现商品筛选系统经常需要根据用户选择动态构建查询。传统方法是拼接SQL字符串但存在SQL注入风险。更安全的做法DECLARE category_id INT 5; DECLARE min_price DECIMAL(10,2) 100; DECLARE max_price DECIMAL(10,2) 1000; SELECT p.product_id, p.product_name, p.price FROM products p WHERE (category_id IS NULL OR p.category_id category_id) AND (min_price IS NULL OR p.price min_price) AND (max_price IS NULL OR p.price max_price) AND EXISTS ( SELECT 1 FROM inventory i WHERE i.product_id p.product_id AND i.stock_quantity 0 );高级技巧在存储过程中使用子查询构建动态排序CREATE PROCEDURE GetProducts sort_column VARCHAR(50) price, sort_dir VARCHAR(4) ASC AS BEGIN DECLARE sql NVARCHAR(MAX); SET sql N SELECT product_id, product_name, price, rating FROM products WHERE is_active 1 ORDER BY CASE WHEN sort_column price THEN price WHEN sort_column rating THEN rating ELSE product_name END sort_dir; EXEC sp_executesql sql; END;5. 数据质量检查与异常检测每月财务对账时用子查询快速找出异常交易SELECT t.transaction_id, t.amount, t.transaction_date, (SELECT AVG(amount) FROM transactions WHERE account_id t.account_id AND transaction_date BETWEEN 2023-01-01 AND 2023-01-31) AS avg_amount FROM transactions t WHERE t.transaction_date BETWEEN 2023-01-01 AND 2023-01-31 AND t.amount 3 * ( SELECT STDEV(amount) FROM transactions WHERE account_id t.account_id AND transaction_date BETWEEN 2022-12-01 AND 2022-12-31 );数据清洗常用模式找出重复记录SELECT user_id, COUNT(*) AS dup_count FROM user_logins GROUP BY user_id, login_date, device_type HAVING COUNT(*) 1;保留最新记录删除旧记录DELETE FROM user_logins WHERE login_id NOT IN ( SELECT MAX(login_id) FROM user_logins GROUP BY user_id, login_date, device_type );用子查询填充缺失值UPDATE products SET category_id ( SELECT TOP 1 category_id FROM products WHERE brand_id products.brand_id AND category_id IS NOT NULL ORDER BY product_id ) WHERE category_id IS NULL;在最近的数据迁移项目中有个有趣的发现当处理500万条以上数据时EXISTS子查询比IN子查询快40%左右但DISTINCT操作会成为性能瓶颈。这时候改用临时表存储中间结果执行时间从28秒降到了3秒。

更多文章