SQL: deduplicating on one column while keeping the other columns

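Today a customer came to me with a small request: they wanted a SQL statement that deduplicates their data on the value of the _field column while keeping the data in the other columns. My first thought was select distinct, but select distinct on a single column cannot carry the other columns along, and once the other columns are added to the select list it deduplicates on the whole combination rather than on _field alone. So select distinct does not work here. The customer had no requirement about which duplicate survives; keeping any one row per value is fine. That led me to this idea: whenever a value of _field repeats, keep the row with the largest primary key. The customer's table has no primary key, though, so we first have to generate one with a window function.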
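
To make the select distinct limitation concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name t, the column name other, and the sample rows are made up for illustration; only _field comes from the customer's request.

```python
import sqlite3

# In-memory database with a hypothetical table t (name and data are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (_field TEXT, other TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", "x"), ("a", "y"), ("b", "z")],
)

# DISTINCT on _field alone removes the duplicates but loses the other column.
only_field = conn.execute(
    "SELECT DISTINCT _field FROM t ORDER BY _field"
).fetchall()

# DISTINCT over both columns keeps the other column, but now it compares
# whole rows, so both rows with _field = 'a' survive.
both_columns = conn.execute(
    "SELECT DISTINCT _field, other FROM t ORDER BY _field, other"
).fetchall()

print(only_field)
print(both_columns)
```

Neither result is what the customer asked for, which is why the rest of the post goes through a key column instead.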

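Because I usually write SQL with Hive on Spark, the statements below are written as a chain of intermediate tables. _field is the column to deduplicate on, and other_fields stands for all the remaining columns of the original table, which is called table here.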

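1. Create a primary key (skip this step if the table already has one; a window with no partition clause has to sort and number every row, so it can be very slow on large tables)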

TEMP table1 = select row_number() over (order by _field) as id, _field, other_fields from table

2. Pick the largest key for each value of _field

TEMP table2 = select max(id) as max_id from table1 group by _field

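3. Look up the rows that belong to the selected keys (the join goes against table1, because the generated id column exists only there, not in the original table)

TEMP table3 = select _field, other_fields from table2 left join table1 on table2.max_id = table1.id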

OUTPUT table3
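
The three steps above can be tried end to end with the following sketch. It uses Python's sqlite3 module rather than Hive on Spark, and it assumes the bundled SQLite is version 3.25 or newer, since row_number() needs window-function support. The source table is named src here because table is a reserved word in SQLite, and the single column other stands in for other_fields. The data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source table without a primary key (name and rows are assumptions).
conn.execute("CREATE TABLE src (_field TEXT, other TEXT)")
data = [("a", "x"), ("a", "y"), ("b", "z"), ("c", "w"), ("c", "v")]
conn.executemany("INSERT INTO src VALUES (?, ?)", data)

# Step 1: generate a surrogate key with a window function.
conn.execute("""
    CREATE TEMP TABLE table1 AS
    SELECT row_number() OVER (ORDER BY _field) AS id, _field, other
    FROM src
""")

# Step 2: pick the largest key for each value of _field.
conn.execute("""
    CREATE TEMP TABLE table2 AS
    SELECT max(id) AS max_id FROM table1 GROUP BY _field
""")

# Step 3: join back to table1 (the table that actually has id).
rows = conn.execute("""
    SELECT table1._field, table1.other
    FROM table2
    LEFT JOIN table1 ON table2.max_id = table1.id
    ORDER BY table1._field
""").fetchall()

print(rows)
```

Which of the duplicate rows is kept depends on how the engine orders rows that share the same _field, so the result contains exactly one row per value but not a predictable one. That matches the customer's requirement that any one row may be kept.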

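The intermediate-table style may look a little cluttered. For a database such as MySQL that supports nested queries, the same idea is easier to read.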

Here id is the primary key, _field is the column to deduplicate on, and other_fields stands for all the remaining columns of the original table.

select * from table where id in (select max(id) from table group by _field);
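
As a quick check of the nested query, the sketch below runs the same statement through Python's sqlite3 module instead of MySQL. The table is named src because table is a reserved word in SQLite, and the column other and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table that already has a primary key (name and rows are assumptions).
conn.execute(
    "CREATE TABLE src (id INTEGER PRIMARY KEY, _field TEXT, other TEXT)"
)
conn.executemany(
    "INSERT INTO src (id, _field, other) VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "y"), (3, "b", "z")],
)

# Keep, for each value of _field, the row with the largest id.
kept = conn.execute("""
    SELECT * FROM src
    WHERE id IN (SELECT max(id) FROM src GROUP BY _field)
    ORDER BY id
""").fetchall()

print(kept)  # [(2, 'a', 'y'), (3, 'b', 'z')]
```

Because the key already exists, the kept row is deterministic here: for _field = 'a' the row with id 2 wins over the row with id 1.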
