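Custom Interceptor (Flume): When Flume collects logs, a custom Interceptor can mask sensitive fields before the events are passed on. For example, the following Java code masks mobile phone numbers, keeping the first three and last four digits. Flume loads an interceptor through a nested Interceptor.Builder class, which is omitted here for brevity:

    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;

    public class MaskPhoneInterceptor implements Interceptor {
        // 11-digit mobile numbers: leading 1, second digit 3-9.
        private static final Pattern PHONE_PATTERN = Pattern.compile("1[3-9]\\d{9}");

        @Override
        public void initialize() { }

        @Override
        public Event intercept(Event event) {
            String body = new String(event.getBody(), StandardCharsets.UTF_8);
            Matcher matcher = PHONE_PATTERN.matcher(body);
            // The lambda overload of replaceAll requires Java 9 or later.
            String maskedBody = matcher.replaceAll(
                    match -> match.group().substring(0, 3) + "****" + match.group().substring(7));
            event.setBody(maskedBody.getBytes(StandardCharsets.UTF_8));
            return event;
        }

        @Override
        public List<Event> intercept(List<Event> events) {
            for (Event event : events) intercept(event);
            return events;
        }

        @Override
        public void close() { }
    }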
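The pattern `1[3-9]\d{9}` also matches inside longer digit runs, such as order numbers or ID card numbers with a 19xx birth year. The sketch below shows one possible guard, assuming digit-boundary lookarounds suit the log format. `PhoneMasker` and `mask` are hypothetical names, and the logic is kept free of Flume types so it can be tried on its own:

```java
import java.util.regex.Pattern;

public class PhoneMasker {
    // (?<!\d) and (?!\d) stop a match from starting or ending inside a longer digit run.
    // Group 1 is the first three digits, group 2 the last four.
    private static final Pattern PHONE =
            Pattern.compile("(?<!\\d)(1[3-9]\\d)\\d{4}(\\d{4})(?!\\d)");

    /** Replaces the middle four digits of each standalone 11-digit mobile number. */
    public static String mask(String text) {
        return PHONE.matcher(text).replaceAll("$1****$2");
    }
}
```

With this pattern, `mask("user=13812345678 login ok")` masks the number, while an 18-digit order number that happens to contain the same digits is left untouched. The group-reference form of `replaceAll` also runs on Java 8.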
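UDF (Hive): When logs are processed in Hive, a custom UDF (user-defined function) can replace sensitive fields. For example, a MaskIDCardUDF class can mask 18-character ID card numbers, keeping the first six and last four characters:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class MaskIDCardUDF extends UDF {
        public Text evaluate(Text idCardText) {
            // Null or empty input yields an empty string.
            if (idCardText == null || idCardText.toString().isEmpty()) return new Text("");
            String idCard = idCardText.toString();
            if (idCard.length() == 18) {
                // Hide the 8-digit birth date (positions 7-14).
                return new Text(idCard.substring(0, 6) + "********" + idCard.substring(14));
            } else {
                // Anything that is not 18 characters long passes through unchanged.
                return idCardText;
            }
        }
    }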
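The length check alone would also mask any unrelated 18-character string. The sketch below shows a stricter variant that validates the shape first, assuming IDs consist of 17 digits plus a digit or `X`. `IdCardMasker` is a hypothetical helper with no Hive dependency, and returning `null` for `null` input is an assumption that mirrors SQL NULL semantics rather than the empty string used above:

```java
import java.util.regex.Pattern;

public class IdCardMasker {
    // Assumed shape: 17 digits followed by a digit or X (the check character).
    private static final Pattern ID_CARD = Pattern.compile("\\d{17}[\\dXx]");

    /** Masks the birth-date digits, or returns the input unchanged if it is not ID-shaped. */
    public static String mask(String idCard) {
        if (idCard == null) return null;
        String trimmed = idCard.trim();
        if (!ID_CARD.matcher(trimmed).matches()) return idCard;
        return trimmed.substring(0, 6) + "********" + trimmed.substring(14);
    }
}
```

A UDF's `evaluate` method could delegate to such a helper, which keeps the masking rule testable without a Hive session.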
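Real-time processing (Spark Streaming): When a log stream is processed in Spark Streaming, a map function can mask sensitive fields. For example, for logs made up of comma-separated key=value fields:

    val cleanedLogs = logStream.map(log => {
      // Split the line into fields, mask the sensitive ones, then re-join.
      val fields = log.split(",")
      fields.map {
        case field if field.startsWith("password=") => "password=***"
        case field if field.startsWith("credit_card=") => "credit_card=****"
        case other => other
      }.mkString(",")
    })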
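As the list of sensitive keys grows, one `case` per key becomes hard to maintain. The sketch below drives the same per-field masking from a key set instead, written in Java as a plain function that a streaming job could call from its map step. `FieldMasker` and `SENSITIVE_KEYS` are hypothetical names, a single `***` mask is used for every key, and values are assumed never to contain commas:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class FieldMasker {
    // Hypothetical key list; extend as needed. Set.of requires Java 9 or later.
    private static final Set<String> SENSITIVE_KEYS = Set.of("password", "credit_card");

    /** Masks the value of every sensitive key in a comma-separated key=value line. */
    public static String mask(String line) {
        // A limit of -1 keeps trailing empty fields so the line shape is preserved.
        return Arrays.stream(line.split(",", -1))
                .map(FieldMasker::maskField)
                .collect(Collectors.joining(","));
    }

    private static String maskField(String field) {
        int eq = field.indexOf('=');
        if (eq < 0) return field; // not a key=value pair
        String key = field.substring(0, eq).trim();
        return SENSITIVE_KEYS.contains(key) ? field.substring(0, eq + 1) + "***" : field;
    }
}
```

Matching on the exact key rather than on a prefix means a field such as `password_hint=...` is not masked by accident.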