Webcron
Webcron is the term for a time-based job scheduler hosted on a web server. The name combines the phrase "web server" with the name of the Unix daemon cron. A webcron solution enables users to schedule jobs to run within the web server environment on a web host that does not offer a shell account or other means of scheduling jobs.
Overview
Many web hosts offer shell accounts or a built-in job scheduler such as cron that makes it easy for users to schedule jobs. Such hosts run jobs as command-line applications that may optionally communicate with the web server. A webcron solution, however, runs entirely within the web server environment of a web host. This allows it to operate on hosts that offer neither a job scheduler such as cron nor a shell account. A webcron solution runs equally well on hosts that do offer such capabilities, but it is designed as a substitute for them.
A webcron solution is made up of two pieces. The first is a script, accessible via a URL, that executes the tasks. The second is a scheduling provider that contacts the URL of the script at regular intervals.
Before setting up a schedule with a scheduling provider, a user must set up a script that runs on the web server. Most web hosts restrict the length of time a single instance of a script may execute. Many web hosts also limit CPU and RAM usage. Users of webcron solutions on shared hosting providers must take care not to exceed their web host's limits repeatedly, or they risk having their account suspended. A long-running script must take into account that the web server may terminate it at any point. Users may implement a state machine, which allows the script to spread its work across multiple invocations and stay within the limits imposed by the web host.
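The state-machine idea can be illustrated with a minimal Python sketch. It is not taken from any particular webcron product: the function name `run_slice`, the JSON state file, and the `budget_seconds` parameter are all assumptions made for illustration. Each invocation resumes from the saved position, works until a self-imposed time budget is spent, and records its progress so that the next invocation can continue.

```python
import json
import time
from pathlib import Path


def run_slice(state_path, items, process, budget_seconds=5.0):
    """Process as many items as fit in the time budget, then save progress.

    Returns True once every item has been processed.
    All names here are hypothetical; this is a sketch, not a library API.
    """
    path = Path(state_path)
    # Load the saved position, or start from the beginning.
    state = json.loads(path.read_text()) if path.exists() else {"next": 0}
    deadline = time.monotonic() + budget_seconds

    while state["next"] < len(items):
        if time.monotonic() >= deadline:
            break  # stop voluntarily before the host's limit is reached
        process(items[state["next"]])
        state["next"] += 1
        # Persist after every item so that an abrupt termination
        # repeats at most one item on the next invocation.
        path.write_text(json.dumps(state))

    done = state["next"] >= len(items)
    if done and path.exists():
        path.unlink()  # clear the state so the next scheduled run starts fresh
    return done
```

Because an item may be repeated if the script is killed between processing it and saving the state, the `process` callable in this sketch should be safe to run twice on the same item.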
Scheduling Providers
Third-Party
There are many third-party webcron scheduling providers on the web. These services accept a URL and a schedule on which to retrieve, or ping, that URL. Most providers build restrictions into their systems to avoid overloading their servers and to encourage users to sign up for premium accounts.
Users who set up premium accounts on third-party webcron scheduling providers typically gain additional benefits such as SMS and email notifications, uptime reports and logging, increased timeout limits, schedules that do not expire, use of the HTTP POST method, HTTP cookie support, or fewer restrictions on scheduling frequency.
Some webcron service providers accept cron expressions in their web interface to specify when a job should run.
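As a rough illustration of how a provider might decide whether a cron expression is due at a given minute, the following Python sketch matches a five-field expression against a timestamp. It is a simplification and not the behaviour of any specific provider or of cron itself: the helper names are hypothetical, only numbers, `*`, lists, ranges and steps are supported, and all five fields must match (many cron implementations instead treat day-of-month and day-of-week as alternatives when both are restricted).

```python
from datetime import datetime


def _field_matches(field, value, low, high):
    """Return True if one cron field (e.g. '*/15', '1-5', '0,30') matches value."""
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = int(rng)
            # 'n/step' runs from n to the field maximum; plain 'n' is just n.
            end = high if "/" in part else start
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expr, when):
    """Simplified check: does the 5-field expression match this datetime?"""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    return (
        _field_matches(minute, when.minute, 0, 59)
        and _field_matches(hour, when.hour, 0, 23)
        and _field_matches(dom, when.day, 1, 31)
        and _field_matches(month, when.month, 1, 12)
        and _field_matches(dow, cron_dow, 0, 6)
    )
```

For example, under these assumptions `cron_matches("0 9 * * 1-5", datetime(2024, 1, 1, 9, 0))` is true, because 1 January 2024 was a Monday and the expression selects 09:00 on weekdays.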
Visitor Based
A webcron solution can be contained entirely on a web host by letting visitors trigger a webcron scheduler script on the server. For instance, this can be accomplished by placing an 'img' HTML element in the header or footer of the website, or by using an Ajax call in a script or an iframe. In the image case, when a visitor views the website, the browser requests the image, which triggers the webcron scheduler. The webcron scheduler runs any tasks that are due and then outputs an image so that the visitor's web browser does not display a broken image on the page. Alternatively, it may start the tasks asynchronously so that the HTTP response is not delayed.
If a website using visitor-based webcron scheduling receives too few visitors, scheduled tasks will not run on time.
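The image-triggered flow can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not an actual product's implementation: the class `VisitorScheduler` and its methods are hypothetical, the handler is assumed to be called by whatever code serves the image URL, and background threads are assumed to be allowed to outlive the request, which is not the case on every host.

```python
import base64
import threading
import time

# A 1x1 transparent GIF, returned so the browser never shows a broken image.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


class VisitorScheduler:
    """Hypothetical visitor-triggered scheduler (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = []  # each entry: [interval_seconds, callable, next_due]

    def add_task(self, interval_seconds, func):
        # next_due starts at 0.0, so the task is due on the first visit.
        self._tasks.append([interval_seconds, func, 0.0])

    def handle_request(self, now=None):
        """Call from the image URL's handler; returns (image_bytes, workers)."""
        now = time.time() if now is None else now
        due = []
        with self._lock:
            for task in self._tasks:
                if now >= task[2]:
                    # Reschedule before running so that concurrent visitors
                    # do not start the same task twice.
                    task[2] = now + task[0]
                    due.append(task[1])
        # Start due tasks in the background so the response is not delayed.
        workers = [threading.Thread(target=f, daemon=True) for f in due]
        for worker in workers:
            worker.start()
        return PIXEL_GIF, workers
```

The returned bytes would be sent with an `image/gif` content type. A visit that arrives before a task's next due time starts no work and simply receives the image.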
Since visitor-based webcron scheduling makes self-contained webcron solutions possible, it increases the portability of a website or web-based software product. Some web-based open-source software packages that have tasks that need to run regularly use a visitor-based webcron solution to execute those tasks.
Remote Access
A remote-access-capable webcron solution is typically bundled as a pair of client and server components. The client runs on a separate computer, such as the user's personal computer. A job schedule is set up on the computer where the client component resides. Then, when the job runs, the client component communicates with the server component.
Remote access usually offers capabilities that are not available with other scheduling providers. The data exchanged between the client and server components is typically encrypted even over plain HTTP. This allows a plugin or module for the client component to securely request information from the server component that is normally restricted. Compressing the data sent and received helps reduce the overall bandwidth used.
A typical implementation of a remote access plugin or module is to incrementally back up files and databases from the web server to the client. Some incremental backup implementations may even offer basic host-based intrusion detection system functionality.
Local Access
A webcron solution can be used on hosts that already have cron available. This is useful when required functionality is only available via the web server. The cron daemon is the scheduling provider and periodically contacts the script using another tool such as Wget.
In the case of a remote access capable webcron solution, cron can run the client component to execute the script.
Security Concerns
Since webcron solutions are reachable via a URL, there are several security concerns that users should address. A webcron solution introduces issues of trust, opportunities for denial-of-service attacks, network or packet sniffing, replay attacks, and possible exposure of information. A webcron solution can therefore be an attractive entry point for attackers.
When using a third-party scheduling provider, users trust the third party not to misuse the URL in any way. Users also have to assume that the connection between the third-party server and the web server is secure from attackers.
When using a visitor-based scheduling provider, users may inadvertently provide an avenue for denial-of-service attacks. Also, if a script is written improperly, it may unintentionally expose information about the server.
When using a remote access scheduling provider, users usually have fine-grained control over how communication takes place with the web server. If HTTP is used, the URL is sent in the clear over the wire, although the data in the request is typically encrypted. Because the URL and the encrypted request can still be observed and resent, denial-of-service attacks and replay attacks remain possible.