Suo Lu

Thinking in Data


Running Quartz Clustering Job on AWS

We have an application running on GAE and plan to move it to AWS Beanstalk. GAE has a Cron feature for scheduled tasks, which Beanstalk doesn't.
Quartz is the "Cron" of the Java world, and it supports clustering, which we require. It looks like a good replacement, so we decided to give it a try.

Architecture

Quartz clustering relies on a shared database:

(Picture from the official documentation)

From the architecture you can see that you first need a highly available database. Since AWS has RDS, this is not an issue for us.

Configuration

Put a quartz.properties file on your application's classpath, and make sure to distribute the same file to every other instance.
Note that every node must use the same org.quartz.scheduler.instanceName. For more details about the file's contents, please refer to the official documentation.
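
To make the settings above more concrete, here is a minimal sketch of the keys a clustered JDBC job store usually needs, built as a java.util.Properties object and printed in quartz.properties form. The class name ClusterConfig, the data source name myDS, the thread count, the check-in interval, and every connection value are placeholders I made up for illustration; only the instance name is taken from the log output later in this post. Treat it as a starting point and check each key against the official doc.

```java
import java.util.Properties;
import java.util.TreeSet;

public class ClusterConfig {

    // Builds the settings for a clustered JDBC job store.
    // "myDS" and all argument values are illustrative placeholders.
    public static Properties clusterProperties(String jdbcDriver, String jdbcUrl,
                                               String user, String password) {
        Properties p = new Properties();
        // Must be identical on every node of the cluster.
        p.setProperty("org.quartz.scheduler.instanceName", "MyClusteredScheduler");
        // AUTO lets Quartz generate a unique instance id for each node.
        p.setProperty("org.quartz.scheduler.instanceId", "AUTO");

        p.setProperty("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool");
        p.setProperty("org.quartz.threadPool.threadCount", "5");

        // Store jobs and triggers in the shared database and enable clustering.
        p.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        p.setProperty("org.quartz.jobStore.driverDelegateClass",
                "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        p.setProperty("org.quartz.jobStore.tablePrefix", "QRTZ_");
        p.setProperty("org.quartz.jobStore.isClustered", "true");
        p.setProperty("org.quartz.jobStore.clusterCheckinInterval", "20000");
        p.setProperty("org.quartz.jobStore.dataSource", "myDS");

        // Connection details for the shared database (for us, an RDS instance).
        p.setProperty("org.quartz.dataSource.myDS.driver", jdbcDriver);
        p.setProperty("org.quartz.dataSource.myDS.URL", jdbcUrl);
        p.setProperty("org.quartz.dataSource.myDS.user", user);
        p.setProperty("org.quartz.dataSource.myDS.password", password);
        p.setProperty("org.quartz.dataSource.myDS.maxConnections", "5");
        return p;
    }

    public static void main(String[] args) {
        Properties p = clusterProperties("com.mysql.jdbc.Driver",
                "jdbc:mysql://your-rds-endpoint:3306/quartz", "quartz", "change-me");
        // Print the keys in sorted order so the output can be pasted into quartz.properties.
        for (String key : new TreeSet<>(p.stringPropertyNames())) {
            System.out.println(key + "=" + p.getProperty(key));
        }
    }
}
```

If you prefer not to ship a file at all, the same Properties object can be handed to new StdSchedulerFactory(props) instead of relying on quartz.properties being on the classpath.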

Quartz provides table-creation scripts for most kinds of databases; you can find them under downloaded_distribution_file/docs/dbTables.

Java Code

Let's create a Logging Job:

import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.SchedulerException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingJob implements Job {

	private static final Logger _log = LoggerFactory.getLogger(LoggingJob.class);

	@Override
	public void execute(JobExecutionContext paramJobExecutionContext) throws JobExecutionException {
		_log.info(paramJobExecutionContext.toString());
		try {
			_log.info(paramJobExecutionContext.getScheduler().getSchedulerName() + " execute!");
		} catch (SchedulerException e) {
			// Report the failure through the logger rather than stderr.
			_log.error("Could not read the scheduler name", e);
		}
	}
}

The scheduler runner looks like:

import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.SimpleScheduleBuilder;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class QuartzTest {

	public static void main(String[] args) {

		Scheduler scheduler = null;
		try {
			scheduler = StdSchedulerFactory.getDefaultScheduler();
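			// requestRecovery() asks for the job to be re-executed if the node running it fails.
			JobDetail job = JobBuilder.newJob(LoggingJob.class).requestRecovery().withIdentity("job1", "group1").build();
			// Fire immediately, then every 2 seconds, forever.
			Trigger trigger = TriggerBuilder.newTrigger().withIdentity("trigger1", "group1").startNow()
					.withSchedule(SimpleScheduleBuilder.simpleSchedule().withIntervalInSeconds(2).repeatForever()).build();

			// Another node may already have stored this job in the shared database.
			if (scheduler.getJobDetail(job.getKey()) == null) {
				scheduler.scheduleJob(job, trigger);
			}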
			scheduler.start();

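			// Keep this node alive for 100 seconds so that other nodes can be started alongside it.
			try {
				Thread.sleep(100000);
			} catch (InterruptedException e) {
				// Restore the interrupt flag and continue with the cleanup below.
				Thread.currentThread().interrupt();
			}

			// clear() deletes all jobs and triggers from the shared store, so the job stops on every node.
			scheduler.clear();
			scheduler.shutdown();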
		} catch (SchedulerException se) {
			se.printStackTrace();
			if (scheduler != null) {
				try {
					scheduler.clear();
					scheduler.shutdown();
				} catch (SchedulerException e) {
					e.printStackTrace();
				}
			}
		}
	}
}

Run QuartzTest, and LoggingJob prints the following to the console:

[INFO] 20 Jan 11:16:30.926 AM MyClusteredScheduler_Worker-1 [test.LoggingJob]
JobExecutionContext: trigger: 'group1.trigger1 job: group1.job1 fireTime: 'Tue Jan 20 11:16:30 CST 2015 scheduledFireTime: Tue Jan 20 11:16:05 CST 2015 previousFireTime: 'Tue Jan 20 11:16:03 CST 2015 nextFireTime: Tue Jan 20 11:16:07 CST 2015 isRecovering: false refireCount: 0

[INFO] 20 Jan 11:16:30.926 AM MyClusteredScheduler_Worker-1 [test.LoggingJob]
MyClusteredScheduler execute!

Then start more QuartzTest instances; you will find that only one LoggingJob runs at any given time. If you terminate the first instance, the LoggingJob on another node takes over.
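
Why does only one node run the job? The nodes never talk to each other; they coordinate through the shared database, and for each scheduled fire time only one of them gets to claim the trigger. The toy model below illustrates just that idea. It is not Quartz's implementation: an in-memory map stands in for the database, and the class FireClaimSketch, the method tryClaim, and the node names are all made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FireClaimSketch {

    // Stands in for the shared database: scheduled fire time -> node that claimed it.
    private final Map<Long, String> claims = new ConcurrentHashMap<>();

    // A node tries to claim one scheduled fire time; only the first caller succeeds.
    public boolean tryClaim(long scheduledFireTime, String nodeId) {
        return claims.putIfAbsent(scheduledFireTime, nodeId) == null;
    }

    // Returns the node that claimed the given fire time, or null if nobody has.
    public String ownerOf(long scheduledFireTime) {
        return claims.get(scheduledFireTime);
    }

    public static void main(String[] args) {
        FireClaimSketch store = new FireClaimSketch();
        List<String> aliveNodes = new ArrayList<>(Arrays.asList("node-A", "node-B", "node-C"));

        // Three fire times, 2 seconds apart, like the trigger in QuartzTest.
        for (long fireTime = 0; fireTime < 6; fireTime += 2) {
            for (String node : aliveNodes) {
                if (store.tryClaim(fireTime, node)) {
                    System.out.println(node + " runs LoggingJob for fire time " + fireTime);
                }
            }
            if (fireTime == 0) {
                // Simulate terminating the first node; another node picks up the next fire.
                aliveNodes.remove("node-A");
            }
        }
    }
}
```

Every node attempts every fire time, but the claim succeeds exactly once, which matches what the console shows when several QuartzTest instances run side by side.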

In Conclusion

The Pros and Cons of Quartz Clustering:

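Pros                                                     | Cons
---------------------------------------------------------|-----------------------------------------------
Simple architecture                                      | Relies on database high availability
Suits small to mid-size clusters                         | The DB could be a bottleneck and a latent risk
Less invasive; can run inside or outside the application | Clock sync is required between nodes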

Next I'm going to try ZooKeeper; there is already a lock implementation under its recipes directory.

20 Jan 2015