- Get All Questions & Answer for CDP Generalist Exam (CDP-0011) and trainings.
- Get All Questions & Answer for CDP Administrator - Private Cloud Base Exam CDP-2001 and trainings.
- Get All Questions & Answer for CDP Data Developer Exam CDP-3001 and trainings.
This Question is from QuickTechie Cloudera CDP Certification Preparation Kit.
We have set up a Falcon process that reads data from an HDFS location and, through a Pig script, writes the output to another HDFS location. The feeds and processes are running in the cluster, but I cannot see any output being generated.
My XML for the process is as below:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="demo1Process" xmlns="uri:falcon:process:0.1">
    <tags>processName=demo1Process</tags>
    <clusters>
        <cluster name="Atlas-Demo1">
            <validity start="2016-01-28T20:51Z" end="2017-02-02T20:51Z"/>
        </cluster>
    </clusters>
    <parallel>2</parallel>
    <order>FIFO</order>
    <frequency>minutes(5)</frequency>
    <timezone>GMT+05:50</timezone>
    <inputs>
        <input name="inputfeed" feed="demo1Feed" start="yesterday(0,0)" end="today(-1,0)"/>
    </inputs>
    <outputs>
        <output name="outoutfeed" feed="demo1OutputFeed" instance="yesterday(0,0)"/>
    </outputs>
    <workflow name="select_airlines_data" version="pig-0.12.0" engine="pig" path="/falcon/demo1/code/demo1.pig"/>
    <retry policy="exp-backoff" delay="minutes(3)" attempts="2"/>
    <ACL owner="falcon" group="falcon" permission="0755"/>
</process>
My XML for the input feed is as below:
<feed xmlns='uri:falcon:feed:0.1' name='demo1InputFeed' description='demo1 input feed'>
    <tags>feed_name=demo1InputFeed</tags>
    <groups>input</groups>
    <frequency>minutes(1)</frequency>
    <timezone>GMT+05:50</timezone>
    <late-arrival cut-off='minutes(3)'/>
    <clusters>
        <cluster name='demo1cluster' type='source'>
            <validity start='2016-01-28T07:49Z' end='2017-02-01T07:49Z'/>
            <retention limit='days(2)' action='delete'/>
            <locations>
                <location type='data'> </location>
                <location type='stats'> </location>
                <location type='meta'> </location>
            </locations>
        </cluster>
    </clusters>
    <locations>
        <location type='data' path='/falcon/demo1/data/${YEAR}-${MONTH}'> </location>
        <location type='stats' path='/falcon/demo1/status'> </location>
        <location type='meta' path='/falcon/demo1/meta'> </location>
    </locations>
    <ACL owner='falcon' group='falcon' permission='0755'/>
    <schema location='none' provider='none'/>
    <properties>
        <property name='jobPriority' value='HIGH'> </property>
    </properties>
</feed>
My input folder in HDFS is:
/falcon/demo1/data/2016-01
There are a couple of issues in your entity XMLs.
1> The granularity of the date pattern in the location path should be at least as fine as the frequency of the feed.
2> yesterday(hours,minutes): as the name suggests, the EL expression yesterday picks up feed instances relative to the start of yesterday. The hours and minutes are added to 00:00 of yesterday. For example, yesterday(24,30) actually corresponds to 00:30 am today; for a nominal time of 2010-01-02T01:30Z, this resolves to the 2010-01-02T00:30Z feed instance.
The input location path in your feed XML is /falcon/demo1/data/${YEAR}-${MONTH}, but the frequency is in minutes. Also, if you want to process a month's worth of data, use the lastMonth or currentMonth EL expressions instead.
Please refer to the EL expression documentation for more details, and to the entity specification documentation for details on entity definitions. Thanks!
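For illustration, a feed whose data-path granularity matches a minute-level frequency might look like the following sketch (the minute-level path pattern is an assumption here; adjust it to how your data actually lands in HDFS):

```xml
<!-- Hypothetical sketch: the path granularity (down to ${MINUTE}) now matches
     the minutes(1) frequency, so each feed instance maps to its own directory. -->
<feed xmlns='uri:falcon:feed:0.1' name='demo1InputFeed' description='demo1 input feed'>
    <frequency>minutes(1)</frequency>
    <locations>
        <location type='data'
                  path='/falcon/demo1/data/${YEAR}-${MONTH}-${DAY}-${HOUR}-${MINUTE}'/>
    </locations>
    <!-- ... remaining elements (clusters, retention, ACL, etc.) as in the original feed ... -->
</feed>
```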
When a process is executed, you will see one job that is the launcher; it contains the parameters for the Pig script and any error returned by the Pig command.
You will see a second job that is the actual Pig execution.
You should find the problem in one or the other.
If these jobs don't exist, you can also go to the Oozie UI and see why the actions are not being spawned.
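If you prefer the command line to the Oozie UI, one possible way to dig in (assuming the standard Oozie CLI is available and OOZIE_URL points at your Oozie server; the coordinator ID below is a placeholder):

```shell
# List running coordinators; Falcon creates coordinators named after the process entity.
oozie jobs -jobtype coordinator -filter status=RUNNING

# Inspect one coordinator's actions: actions stuck in WAITING list the exact
# HDFS paths (missing dependencies) they are still waiting for.
oozie job -info 0000123-160128000000000-oozie-oozi-C
```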
Thanks for the quick reply.
I just tested the Pig script, replacing $input and $output with actual HDFS paths, and the Pig job runs fine.
Also, my feed has the input path /falcon/demo1/data/${YEAR}-${MONTH}, whereas my actual HDFS path is /falcon/demo1/data/2016-01. Could this be a mismatch?
The path looks good; ${YEAR} is replaced with the current year, and so on. However, what do you see when you look into the ResourceManager, as described above?
I would cross-check the following:
- process validity start/end dates
- input start/end dates
- feed validity start/end dates
- input path pattern
- timezone
If you want data to be picked up for a particular process instance, the feed must be valid (read this as: the feed is expected to be populated) during that time, and the data must be in a directory that matches the expected pattern. Look at your Oozie coordinator actions for details on which HDFS paths are being waited on.
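As a concrete (hypothetical) sketch of the cross-checks above: for a daily feed, the feed frequency, the data-path granularity, and the process input window should all line up, for example:

```xml
<!-- In the feed: a daily frequency whose path has matching day granularity. -->
<frequency>days(1)</frequency>
<location type='data' path='/falcon/demo1/data/${YEAR}-${MONTH}-${DAY}'/>

<!-- In the process: an input asking for yesterday's instance of that feed.
     yesterday(0,0) resolves to 00:00 of the previous day, so Oozie will wait
     for the /falcon/demo1/data/ directory dated yesterday to appear. -->
<input name="inputfeed" feed="demo1InputFeed" start="yesterday(0,0)" end="yesterday(0,0)"/>
```

Both the process validity window and the feed validity window must also cover the instance time being materialized, or the coordinator will never schedule that action.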