Job Scheduling in Java

Hi,

We have a web-based application running on Apache Tomcat v6.0.10, and reports are available as part of it. We now plan to introduce a new feature called report scheduling, with which application users can schedule reports of their choice and have them delivered to their mailbox. We want to give users as much flexibility as possible in scheduling the reports.

I also heard & read about job scheduling in Java at:

Open Source Job Schedulers in Java
http://java-source.net/open-source/job-schedulers

What is Quartz
http://onjava.com/lpt/a/6207

JobServer 1.4 Open Source Java Job Scheduler
http://www.javalobby.org/java/forums/t68751.html

Considering my use case explained above, my questions are:
1) Is it possible to use Sun's own java.util.TimerTask for my complex report scheduling?
2) What are the limitations of java.util.TimerTask compared to other job scheduler frameworks? I want solid reasons before choosing a third-party job scheduler framework.
3) There are at most 100-200 users in my application. Suppose users have scheduled reports in such a way that at one time there are 100 report requests in the queue. How does the job scheduler framework or java.util.TimerTask handle such scenarios? Do we have control over this?
4) Users are allowed to change their report schedules at any time. Does the job scheduler framework support this?
5) Each report has inputs that have to be passed to its report schedule. Is there a way to pass parameters to the job scheduler framework?
6) Which is the better way: integrating the job scheduler framework with the web application, or running it standalone?

NOTE
Because of a memory leak in our application, we restart the Tomcat service daily at a low-usage time. I mention this because reports scheduled by users must persist across server/Tomcat restarts. Please take this into consideration.

Expert opinions pointing in the right direction are appreciated.

CEHJ

This is really a no-brainer: you shouldn't attempt to implement this yourself with java timers, otherwise you'll end up reimplementing something like Quartz, but probably not as well.

I don't know JobServer, but Quartz is the standard implementation and supports your requirements.
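One concrete limitation worth seeing for yourself: java.util.Timer runs all of its tasks on a single background thread, so a slow report delays everything queued behind it. The sketch below is only an illustration of that behaviour (the class and method names are made up for this example, and it assumes Java 8 or later); it records which thread ran each of two tasks.

```java
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of any framework.
class TimerSingleThreadDemo {

    /** Runs two tasks on one Timer and returns the names of the threads that executed them. */
    static List<String> threadsUsedByTimer() {
        final List<String> names = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(2);
        Timer timer = new Timer("report-timer", true); // daemon timer thread
        for (int i = 0; i < 2; i++) {
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    names.add(Thread.currentThread().getName());
                    done.countDown();
                }
            }, 0);
        }
        try {
            done.await(5, TimeUnit.SECONDS); // wait for both tasks to run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.cancel();
        }
        return names;
    }

    public static void main(String[] args) {
        // Both entries carry the same thread name: the tasks ran one after another.
        System.out.println(threadsUsedByTimer());
    }
}
```

Both tasks report the same thread name, which is why 100 queued five-minute reports on a plain Timer would simply run back to back.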

nfaria

I have some scheduled jobs implemented with the OS utilities.

I record the report configs and delivery schedules in the DB.

Then I use cron on Linux to start Java programs that read from the DB and issue all the required reports. The report system runs as a stand-alone executable. On Windows you can also schedule jobs to launch your executables.

For asynchronous tasks that don't influence the outputs and data of your application, this is enough. I think reporting can be treated this way.

Scheduling jobs inside the application is more for jobs that interact with ongoing processes and need to know the current state of the application.

Zoniac

ASKER

Hi CEHJ,

I'll also go through the documentation to understand the functionality that meets my requirements.

Before I proceed with the Quartz framework, there is one specific point that keeps coming to mind:
4) At any time, Users are allowed to change their report schedules. Does the job scheduler framework support this?

Do you have any sample code for manipulating or rescheduling an already scheduled job using the Quartz framework? Any pointers to relevant documentation are also appreciated.

CEHJ

Check out http://www.quartz-scheduler.org/docs/api/1.8.0/org/quartz/Scheduler.html#rescheduleJob(java.lang.String,%20java.lang.String,%20org.quartz.Trigger)

Zoniac

ASKER

Hi nfaria,

It looks like, with an in-house job scheduling implementation at the OS level, the DB schema defined for report configs and delivery schedules becomes very important, for example in determining when the next run is due.

Could you share the scheduling portion of your schema?

nfaria

I only need to set up daily, weekly or monthly reports, so it is enough to have something like

scheduling_interval TINYINT (1, 7, 30 defines daily, weekly or monthly; special value 0 means no sending)
scheduling_last_sent DATE (I don't need the time)
scheduling_user_id INT (owner of the config and target of the e-mail)
scheduling_report_id INT (report to run)

And my Java program issues a SELECT similar to this (in MySQL syntax)

SELECT st.scheduling_report_id AS report_id, u.user_id, u.user_name, u.user_email
FROM scheduling_table st
INNER JOIN users u ON st.scheduling_user_id = u.user_id
WHERE DATEDIFF(NOW(), st.scheduling_last_sent) >= st.scheduling_interval AND st.scheduling_interval > 0;

And for each record I process the report identified by report_id, dispatch it to user_email and update the field scheduling_last_sent.
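If you prefer to do the same test in Java rather than in SQL, a minimal sketch could look like the following. It mirrors the DATEDIFF condition above; the class and method names are invented for illustration, and it assumes Java 8 or later for java.time.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper mirroring the WHERE clause of the query above.
class ReportDueCheck {

    /**
     * Returns true when at least intervalDays have passed since lastSent.
     * An interval of 0 (or less) means sending is disabled.
     */
    static boolean isDue(LocalDate lastSent, int intervalDays, LocalDate today) {
        if (intervalDays <= 0) {
            return false;
        }
        // Same direction as DATEDIFF(NOW(), scheduling_last_sent): today minus lastSent.
        return ChronoUnit.DAYS.between(lastSent, today) >= intervalDays;
    }

    public static void main(String[] args) {
        LocalDate lastSent = LocalDate.of(2024, 1, 1);
        System.out.println(isDue(lastSent, 7, LocalDate.of(2024, 1, 8))); // true
        System.out.println(isDue(lastSent, 7, LocalDate.of(2024, 1, 7))); // false
    }
}
```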

Zoniac

ASKER

Hi nfaria,

In your case:

1) Daily: At what time does it trigger? Based on your schema, I think it is not user-configurable. Am I correct?
2) Weekly: At what day and time does it trigger? Again is it user-configurable?
3) Monthly: At what date and time does it trigger?

Overall, I see there is no recurrence pattern in your scheduling mechanism. Is my understanding correct?

Is your cron entry scheduled to query database every 1 minute?

nfaria

For my requirements it is enough to launch the executable every night, and I don't have a specific day for sending, like every Monday or every 1st of the month.

So each user gets their report counted from the day they enabled it, plus the time interval.

If you need to set an explicit day of month, day of week and/or hour for each user and report, you just have to add those fields and adjust the query. For example

scheduling_day_of_month TINYINT (1 to 31, 0 = not used)
scheduling_day_of_week TINYINT (1 to 7, 0 = not used)

and run the executable every day with the adjusted query

SELECT st.scheduling_report_id AS report_id, u.user_id, u.user_name, u.user_email
FROM scheduling_table st
INNER JOIN users u ON st.scheduling_user_id = u.user_id
WHERE
  DATEDIFF(NOW(), st.scheduling_last_sent) >= st.scheduling_interval AND
  (
    DAYOFMONTH(NOW()) = st.scheduling_day_of_month OR
    DAYOFWEEK(NOW()) = st.scheduling_day_of_week
  ) AND
  st.scheduling_interval > 0;

If you need hourly reports, or need to send them at a given hour of the day, you just have to extend this, keeping the same logic and running your executable every hour or so. You could run it every minute, but you have to make sure that each launch starts only after the previous one has ended.
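One detail to watch if you ever move the day matching from SQL into Java: MySQL's DAYOFWEEK() counts Sunday as 1 through Saturday as 7, while java.time counts Monday as 1 through Sunday as 7. The sketch below shows one way to keep the stored 1 to 7 values compatible; the class and method names are hypothetical, and it assumes the 0 = not used convention from the fields above.

```java
import java.time.LocalDate;

// Hypothetical helper; mirrors the day-of-month / day-of-week test in the query above.
class ScheduleDayMatch {

    /** Converts java.time's Monday=1..Sunday=7 into MySQL DAYOFWEEK's Sunday=1..Saturday=7. */
    static int mysqlDayOfWeek(LocalDate date) {
        return date.getDayOfWeek().getValue() % 7 + 1;
    }

    /**
     * True when today matches the configured day of month or day of week.
     * A stored value of 0 never matches, because real days start at 1.
     */
    static boolean matchesDay(int dayOfMonth, int dayOfWeek, LocalDate today) {
        return today.getDayOfMonth() == dayOfMonth
                || mysqlDayOfWeek(today) == dayOfWeek;
    }

    public static void main(String[] args) {
        LocalDate monday = LocalDate.of(2024, 1, 1); // a Monday
        System.out.println(mysqlDayOfWeek(monday));    // 2
        System.out.println(matchesDay(0, 2, monday));  // true: weekly on Monday
        System.out.println(matchesDay(15, 0, monday)); // false: monthly on the 15th
    }
}
```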

ChristoferDutz

If your application is based on Spring, I can certainly recommend having a look at the Spring scheduling features. They rely on Quartz, but wrap all the "dirty stuff" of scheduling. http://static.springsource.org/spring/docs/2.0.x/reference/scheduling.html

Zoniac

ASKER

3) There are at most 100-200 users in my application. Suppose users have scheduled reports in such a way that at one particular time there are 100 report requests in the queue. How does the job scheduler framework handle such scenarios? Do we have control over this?

ChristoferDutz

Well, a scheduling framework usually has a predefined number of worker threads. By default, I think Spring configures cron with 5 worker threads in a thread pool. Report generation seems to be quite calculation-intensive, so I think this behaviour is rather desirable in your case. Otherwise the scheduler would take down your system every night for a few minutes.

Do I understand correctly? Your system doesn't need to run user-defined jobs at a user-defined time, but rather to process user-defined reports at a predefined, system-wide time? If this is the case, I'd certainly recommend adding report jobs to a list and having a report worker (or several) work off the list at a given time, triggered by Quartz (either directly or using the Spring wrapper).
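To make the idea of a fixed number of workers concrete, here is a small sketch using only java.util.concurrent rather than Quartz or Spring. It is an assumption-laden illustration: the class name is invented, the sleep stands in for real report generation, and it assumes Java 8 or later. However many jobs are queued, no more than workerCount run at once and the rest wait their turn.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class showing a bounded pool of report workers.
class BoundedReportPool {

    /** Runs jobCount dummy jobs on workerCount threads; returns {completed, peakConcurrency}. */
    static int[] runAll(int jobCount, int workerCount) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < jobCount; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // track the highest concurrency seen
                try {
                    Thread.sleep(5); // stand-in for generating one report
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new jobs, finish the queued ones
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { completed.get(), peak.get() };
    }

    public static void main(String[] args) {
        int[] result = runAll(100, 5);
        System.out.println("completed=" + result[0] + " peak=" + result[1]);
    }
}
```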

Zoniac

ASKER

Hi ChristoferDutz,

To answer your question: so far my idea was to run user-defined reports at user-defined times. Based on your points, a new question comes to mind. Can you clarify it?

If my Quartz scheduler worker threads run in parallel with my web application, will that slow down the web application? Note that each report job may take approximately 5 minutes to complete.

nfaria

5 minutes x 100 reports in queue?

If they run in parallel they will strain your app; and if they share the same DB, then even if they are executed separately they can strain your DB and, as a consequence, your app.

Is each user request an independent report with nothing in common with the others?
Couldn't you generate a report once and send it to all the users that 'subscribed' to it?

If not, I hope you have a mirror DB used exclusively for reading and report generation.

ChristoferDutz

Hi,

On a normal Quad-Core-Duo you will be able to execute 4 threads in parallel (Intel may say 8, but the "duo" cores are not full cores). So if you have 100 jobs each taking 5 minutes, you will need roughly 2 hours to work off the load. If you increase the number of parallel executions you will certainly get less performance, as the operating system has to do a lot of thread switching, thread scheduling, lock maintenance, etc.
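The estimate is simple enough to write down. The helper below is only a back-of-the-envelope sketch (the names are made up), and it assumes every worker runs at full speed with no contention for CPU or the database, which is optimistic.

```java
// Hypothetical helper for the rough estimate above.
class QueueDrainEstimate {

    /** Minutes needed to finish jobCount jobs of jobMinutes each on the given number of workers. */
    static int minutesToDrain(int jobCount, int jobMinutes, int workers) {
        int batches = (jobCount + workers - 1) / workers; // ceiling division
        return batches * jobMinutes;
    }

    public static void main(String[] args) {
        // 100 jobs, 5 minutes each, 4 parallel workers: 25 batches of 5 minutes.
        System.out.println(minutesToDrain(100, 5, 4)); // 125
    }
}
```

125 minutes is where the "roughly 2 hours" comes from.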

Of course your reports will strain the web server, and if, for example, a lot of people want fresh reports at noon, your website may even go offline or be very sluggish for a while. A solution would be to give the report threads a really low priority. This would at least give the main threads enough power to keep the web server available.
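In plain Java, one way to get low-priority report threads is a thread pool whose ThreadFactory creates every worker at Thread.MIN_PRIORITY. The following is a sketch with invented names, not Quartz or Tomcat configuration, and it assumes Java 8 or later. Keep in mind that thread priority is only a hint to the scheduler, so how much it helps depends on the JVM and operating system.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Hypothetical helper: a pool whose worker threads all run at minimum priority.
class LowPriorityPool {

    static ExecutorService newLowPriorityPool(int workers) {
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, "report-worker");
            t.setPriority(Thread.MIN_PRIORITY); // priority 1, below the default of 5
            t.setDaemon(true);                  // do not keep the JVM alive on shutdown
            return t;
        };
        return Executors.newFixedThreadPool(workers, factory);
    }

    /** Submits one task and returns the priority it observed on its worker thread. */
    static int priorityInsidePool() {
        ExecutorService pool = newLowPriorityPool(2);
        try {
            Callable<Integer> task = () -> Thread.currentThread().getPriority();
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(priorityInsidePool()); // 1
    }
}
```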

I'd recommend a dedicated Report-Server, that does the generating of reports and that generates them at night if the report-generation is a DB intense operation.

Zoniac

ASKER

Hi nfaria & ChristoferDutz,

Let me give some more detail on my production server setup:

SERVER SETUP DETAILS
RAM: 7.5 GB
Both my web server (Apache Tomcat) and database (PostgreSQL) run on the same server. I also plan to run the Quartz framework on the same machine. How Quartz is integrated is up to me: either embedded in Apache Tomcat, or running standalone with jobs added via RMI calls.

YES. Each user request is an independent report with nothing in common with each other.

How do I give report threads a really low priority from within Apache Tomcat if Quartz is integrated with Apache Tomcat?

SOLUTION

ChristoferDutz

This solution is only available to members.

Zoniac

ASKER

Hi ChristoferDutz,

By setting thread priority, do you mean setting this at the quartz.properties level?

# "THREAD_PRIO" can be any int between Thread.MIN_PRIORITY (1) and
# Thread.MAX_PRIORITY (10). The default is Thread.NORM_PRIORITY (5).
org.quartz.threadPool.threadPriority = 1

Or do you mean handling this at job level, at the time of adding and scheduling:
scheduler.addJob(job, true);
scheduler.scheduleJob(cronTrigger);

Also, can you explain the significance of "getting the priority first, saving it, setting it to minimum in a try block and resetting it to the old value in the finally block"? I assume there is a reason behind this; please help me understand your viewpoint.

ChristoferDutz

Oh well, if Quartz allows you to set this, I would prefer the "org.quartz.threadPool.threadPriority = 1" option ... your "OR" option is not going to help you.

My approach allows you to set the currently executing thread to a predefined priority and makes sure it is reset to its default after the job is finished (actually shortly before it is finished). If you don't save and reset the priority, you would run into problems when you have jobs of mixed priority (some jobs have normal priority and some have lower priority): as soon as a thread executes a low-priority job, its priority is reduced, and after it has finished its job it is returned to the thread store (still with low priority). When a normal-priority job then gets a thread from the store, it might get one with normal priority, but it might also get one with low priority. I doubt this is what you want.

I think this may also be the big difference between setting "org.quartz.threadPool.threadPriority" and my approach: my approach allows jobs to execute at different priorities, whereas the "org.quartz.threadPool.threadPriority" approach makes them all run at the same priority.
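As a minimal sketch of the save, set and restore pattern described here (the class and method names are invented for illustration, and this is plain Java rather than anything Quartz-specific):

```java
// Hypothetical helper illustrating the save / set / restore pattern.
class PriorityScope {

    /** Runs the job at the given priority, then restores the thread's previous priority. */
    static void runWithPriority(int priority, Runnable job) {
        Thread current = Thread.currentThread();
        int saved = current.getPriority(); // remember the pooled thread's priority
        try {
            current.setPriority(priority);
            job.run();
        } finally {
            // Restore even if the job throws, so the thread goes back to the
            // pool with the priority it had before.
            current.setPriority(saved);
        }
    }

    public static void main(String[] args) {
        runWithPriority(Thread.MIN_PRIORITY,
                () -> System.out.println("inside: " + Thread.currentThread().getPriority()));
        System.out.println("after: " + Thread.currentThread().getPriority());
    }
}
```

A job would wrap its own body in a call like this, so only the jobs that ask for low priority get it.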

Zoniac

ASKER

Hi ChristoferDutz,

I understand your approach. In my case, all generated reports are of equal priority, so there is no point in changing job thread priority programmatically, and I would not need to distinguish the priority of job threads at runtime since, as I said, all queued report threads have equal priority.

So, as you said and as I also think, I had better set org.quartz.threadPool.threadPriority = 1 and make all threads run with the same priority.

ChristoferDutz

I'd suggest giving it a try :-)

Additionally, I'd recommend limiting the number of threads in the pool. As I mentioned, 1000 threads working on 4-8 cores does not make real sense and will certainly be the cause of some performance issues under heavy load.
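Put together, the thread pool section of quartz.properties might look something like the fragment below. The threadPriority line is the one quoted earlier in this thread; the thread count of 4 is only an assumption based on the quad-core discussion, so tune it for your own hardware and check the property names against the documentation for your Quartz version.

```properties
# Thread pool used by the Quartz scheduler to run jobs
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
# Assumed value: roughly one worker per core
org.quartz.threadPool.threadCount = 4
# Thread.MIN_PRIORITY, so report jobs yield to the web application's threads
org.quartz.threadPool.threadPriority = 1
```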

If it turns out that your system still has some performance/scalability issues, feel free to come back and let us help you with them. I think there are quite some performance specialists here.

Zoniac

ASKER

Hi ChristoferDutz,

Thanks for your valuable suggestion. Sure, I'll get back.

I need another suggestion regarding the Quartz framework:

Based on my use case explained in my original post, I need to have many job instances doing the same thing (generating a report) but with different parameters (report input) and different time intervals (based on each user's schedule).

1. Is it right to create a separate JobDetail and a separate CronTrigger for each user?
2. If a user wants to reschedule a report (that is, with different report input and a different schedule), which approach would you suggest:
   1. Deleting both the JobDetail and the CronTrigger and creating a new JobDetail and CronTrigger whenever the user wants to change the report input and reschedule.
   2. Locating the CronTrigger from the JobDetail and rescheduling the job using
      Scheduler.rescheduleJob(String triggerName, String groupName, Trigger newTrigger)

ChristoferDutz

No, I wouldn't suggest that.

I would recommend a generic CronTrigger and JobDetail that work off a list (preferably a concurrent list) of ReportJob objects. These ReportJob objects are simple containers that hold everything a user's report needs.
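A rough sketch of what I mean is below. The field names, the drain() method and the string output are placeholders invented for this example; a real ReportJob would carry whatever your reports need, and the single generic scheduled job would call drain() (or something like it) when its trigger fires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of a concurrent work list of report jobs.
class ReportQueue {

    /** Simple container for everything one user's report needs (fields are illustrative). */
    static final class ReportJob {
        final int userId;
        final int reportId;
        final Map<String, String> parameters;

        ReportJob(int userId, int reportId, Map<String, String> parameters) {
            this.userId = userId;
            this.reportId = reportId;
            this.parameters = Collections.unmodifiableMap(new HashMap<>(parameters));
        }
    }

    // Thread-safe, so web requests can add jobs while a worker drains them.
    private final Queue<ReportJob> pending = new ConcurrentLinkedQueue<>();

    void add(ReportJob job) {
        pending.add(job);
    }

    /** Works off everything queued so far and returns a description of each processed job. */
    List<String> drain() {
        List<String> processed = new ArrayList<>();
        ReportJob job;
        while ((job = pending.poll()) != null) {
            // Real code would generate and e-mail the report here.
            processed.add("user " + job.userId + " report " + job.reportId);
        }
        return processed;
    }

    public static void main(String[] args) {
        ReportQueue queue = new ReportQueue();
        queue.add(new ReportJob(1, 10, Collections.singletonMap("region", "EU")));
        queue.add(new ReportJob(2, 11, Collections.<String, String>emptyMap()));
        System.out.println(queue.drain());
    }
}
```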

Zoniac

ASKER

Hi ChristoferDutz,

In my case, the CronTrigger could perhaps be chosen from a predefined list, but the report input (which has to be passed as a parameter to the JobDetail) cannot, because it has too many parameters.

My question here is a little different: if a user (more than one user is allowed) wants to change the report input and its schedule, I need to rebuild that user's job with the new report input parameters and the new schedule. Which of these would you suggest?

1. Delete the JobDetail and its associated CronTrigger, and create a new JobDetail and CronTrigger (the latter from the predefined list).
2. Keep the JobDetail as it is, but overwrite its JobDataMap (report parameters) and CronTrigger (schedule).
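To show what I mean by option 1, here is the delete-and-recreate pattern sketched with the JDK's ScheduledExecutorService as a stand-in. This is not Quartz code: the class and method names are invented, a fixed period replaces the cron expression, and everything is held in memory, so unlike what I need it would not survive our daily Tomcat restart.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for "delete the old job and trigger, create new ones".
class ReschedulableReports {

    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
    private final Map<Integer, ScheduledFuture<?>> byUser = new ConcurrentHashMap<>();

    /**
     * Schedules the report for the user, replacing any existing schedule.
     * Returns true if an earlier schedule was cancelled and replaced.
     */
    boolean schedule(int userId, Runnable report, long periodSeconds) {
        ScheduledFuture<?> next = scheduler.scheduleAtFixedRate(
                report, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        ScheduledFuture<?> previous = byUser.put(userId, next);
        if (previous != null) {
            previous.cancel(false); // let a running report finish, stop future runs
            return true;
        }
        return false;
    }

    int scheduledCount() {
        return byUser.size();
    }

    void shutdown() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        ReschedulableReports reports = new ReschedulableReports();
        Runnable report = () -> System.out.println("generating report");
        System.out.println(reports.schedule(1, report, 3600)); // false: first schedule
        System.out.println(reports.schedule(1, report, 7200)); // true: replaced
        reports.shutdown();
    }
}
```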

ASKER CERTIFIED SOLUTION

ChristoferDutz

This solution is only available to members.

Zoniac

ASKER

Hi ChristoferDutz,

First, thanks for sharing the scheduling portion code of your application.

As we discussed, in your case too, to reschedule a job already assigned to the Quartz scheduler, the existing job is first located and deleted along with its associated trigger, and then a new job with a new trigger is created and scheduled.

Am I right? As you correctly pointed out, your case closely matches mine.

This is just out of curiosity after reading your code: in the scheduleJob() method's catch block, you call updateJob(aJob). Can you tell me what this does when an exception occurs while scheduling a job?

ChristoferDutz

Oh, this is nothing to worry about. My job definitions are saved using JPA. A user can specify a job, and within this job is a boolean field that allows the user to activate it, so only the active jobs are automatically scheduled. We had cases where some job definitions caused errors (for example, if the job contains a year setting of 2009). This is why jobs causing errors are deactivated automatically, and in order to persist that change my updateJob method simply persists the JPA object.

Zoniac

ASKER

Hi ChristoferDutz,

Thank you for your update on updateJob(aJob).

Zoniac

ASKER

I have arrived at a solution for my use case with the Quartz scheduler framework.