This Question is from QuickTechie Cloudera CDP Certification Preparation Kit.
Question-10: Which of the following data or resources can be configured for redacting sensitive data?
- Log & Query Redaction
- YARN MapReduce Job Properties
- Spark Event Logs
- Spark Executor logs
Answer: 1,2,3,4
Exp: Removing Sensitive information from Metrics
- Applications that you run with Hive, Impala, MapReduce, Spark, or Oozie may include sensitive information in their diagnostic data.
- So, it is necessary to configure redaction.
- This is recommended even if you are not sending metrics or diagnostic data to Telemetry Publisher.
- Job configurations and logs can contain sensitive information, and that information needs to be redacted.
- The following data and resources can be configured to redact sensitive data before it is sent to Telemetry Publisher.
- Log & query redaction: You have to create regular expressions that filter out the sensitive data. These are applied to the queries and logs that are collected by Telemetry Publisher.
- YARN MapReduce Job properties: As you know, Telemetry Publisher pulls job configuration data from HDFS. Hence, you have to redact sensitive information before the job configuration is stored in HDFS.
- Spark Event logs & Spark executor logs: Again, these can be filtered using regular expressions, for Spark2 jobs only. The same filter applies to both event logs and executor logs.
- By default this is enabled. However, you can override it by using safety valves in Cloudera Manager or in the Spark application itself.
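To make the idea of regular-expression based redaction concrete, here is a minimal Python sketch. It is only an illustration of the technique and is not how Cloudera Manager or Telemetry Publisher implement it. The rule list, the key pattern, and the function names `redact_text` and `redact_properties` are all hypothetical and chosen for this example.

```python
import re

# Hypothetical redaction rules: (compiled pattern, replacement) pairs.
# These are illustrative only, not Cloudera defaults.
REDACTION_RULES = [
    # Mask 16-digit card-like numbers written in groups of four.
    (re.compile(r"\b\d{4}[- ]\d{4}[- ]\d{4}[- ]\d{4}\b"), "XXXX-XXXX-XXXX-XXXX"),
    # Mask the value that follows "password=" (case-insensitive).
    (re.compile(r"(?i)(password\s*=\s*)\S+"), r"\1[REDACTED]"),
]

# Hypothetical pattern for property names whose values are always hidden.
SENSITIVE_KEY = re.compile(r"(?i)secret|password|token")


def redact_text(text):
    """Apply every redaction rule, in order, to a log line or query string."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text


def redact_properties(props):
    """Redact a dict of job properties before it is stored anywhere.

    A value is replaced entirely when the key looks sensitive;
    otherwise the text rules are applied to the value.
    """
    return {
        key: "[REDACTED]" if SENSITIVE_KEY.search(key) else redact_text(value)
        for key, value in props.items()
    }


if __name__ == "__main__":
    print(redact_text("SELECT * FROM t WHERE card = '1234-5678-9012-3456'"))
    print(redact_properties({
        "fs.s3a.secret.key": "abc",
        "mapreduce.job.name": "etl password=x",
    }))
```

The same two steps cover the cases listed above: text rules for queries and log lines, and key-based masking for job properties before they are written to storage.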