Oracle University Course Data XML feed cloud processing solution using AWS and Rackspace Linux Instances
A project to design, test and implement a secure cloud-based solution for Vertex IT Solutions, one of the top recruitment companies in the UK, using AWS and Rackspace Linux servers. As an Oracle Education provider, the company needed to dynamically display up to 10,000 Oracle University training courses, with a total average Oracle Company sales value of £16.8m per day, on its candidate and client website.
• The Oracle University master course data file, provided by Oracle in XML format every week, did not contain all the training course data. The missing course data therefore had to be downloaded, merged and correctly formatted into the master file to achieve 100% XML validation.
• The Oracle University master course data file contained control characters, such as vertical and horizontal tabs, which caused the data to fail validation and to display incorrectly.
• The website XML import application was found to be restricted to using and maintaining a single import data file.
• The XML data import to the Vertex IT Solutions website had to run each day, without any reduction in website performance for site visitors.
• The Oracle University course data solution had to be delivered in full to the Vertex IT Solutions web designers, Weblake Interactive, within 30 days.
• The whole project required delivery within budget.
A Linux-based automated solution was proposed to the client to download, merge and format the Oracle XML data on an AWS Linux instance, then transfer and import it into the Vertex IT Solutions Rackspace website's WordPress MySQL database environment.
The XML batch Linux application was designed, tested and, for performance reasons, compiled to download up to 10,000 Oracle courses on a weekly basis and merge them into one data file, with control character cleansing for 100% XML validation. The Oracle University Course Master data file created by this XML batch program contains around 1.3 million lines of XML element and course data, which is mapped on a daily basis to the relevant website categories for dynamic post creation, updates and automatic post deletions.
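The control character cleansing described above can be illustrated with a minimal Python sketch. It is not the project's compiled application; the function name `cleanse_xml_text` and the choice to replace each offending character with a space are assumptions made for illustration. Note that XML 1.0 itself permits the horizontal tab, so normalising it here is a display choice rather than a validity requirement, whereas the vertical tab is not a legal XML 1.0 character.

```python
import re

# C0 control characters that XML 1.0 does not allow (everything below 0x20
# except tab, line feed and carriage return), plus U+FFFE and U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def cleanse_xml_text(text: str) -> str:
    """Replace illegal XML 1.0 characters and horizontal tabs with a space.

    Hypothetical helper: the real application's cleansing rules are not
    documented here, so this only shows the general technique.
    """
    text = _ILLEGAL_XML_CHARS.sub(" ", text)  # e.g. vertical tab (0x0B)
    return text.replace("\t", " ")            # horizontal tab, legal but unwanted
```

Line feeds and carriage returns are left untouched, so the line structure of the master file is preserved.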
The XML batch application was very network, disk I/O and CPU intensive, taking 10 hours to complete on the cloud instance originally used, before optimisation. We performed application profiling and optimisation activities to scale the AWS instance vertically and to auto-scale it horizontally during the batch runs. An AWS Network Load Balancer was also implemented, together with faster disk storage, reducing the batch runtime from 10+ hours to around 50 minutes.
The whole cloud application project was designed, tested, implemented, optimised, documented and completed within the 30-day requirement by our Digital Innovation Global Engineers, with help from the excellent web designers at Weblake Interactive.
Details on how the Oracle University XML Cloud application works:
The compiled Linux application:
1. On startup, performs a cleanup of any existing batch runs
2. Downloads the data from the Oracle University Course data website
3. Performs data error checking to ensure integrity
4. Loads the 100,000 lines of empty downloaded Oracle course data XML into server memory
5. Processes each
6. Adds XML header strings to the XML file
7. Removes all unwanted HTML and control characters from the XML file
8. Transfers the completed XML file securely using SCP to the Vertex IT Solutions Rackspace website x86-64 Linux server, specially designed and provided by Weblake Interactive.
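The merge, header and validation steps in the list above can be sketched in Python as follows. This is only an illustration under stated assumptions: the element name `course`, the `id` attribute and the function `merge_courses` are hypothetical, since the real Oracle feed schema and the compiled application's internals are not described here.

```python
import xml.etree.ElementTree as ET
from typing import Mapping


def merge_courses(master_xml: str, detail_xml_by_id: Mapping[str, str]) -> bytes:
    """Merge separately downloaded course details into the master feed.

    Assumes (hypothetically) that the master file holds <course id="...">
    elements and that each detail document wraps the missing child elements.
    """
    root = ET.fromstring(master_xml)
    for course in root.iter("course"):
        detail = detail_xml_by_id.get(course.get("id", ""))
        if detail is None:
            continue  # no extra data was downloaded for this course
        for child in ET.fromstring(detail):
            course.append(child)  # copy each missing field into the master
    # Serialise with an XML declaration (the "header" step).
    merged = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    ET.fromstring(merged)  # raises ParseError if the result is not well formed
    return merged
```

A small usage example, with made-up course data:

```python
master = '<courses><course id="D1"><title>Java</title></course></courses>'
details = {"D1": "<detail><price>100</price></detail>"}
print(merge_courses(master, details).decode("utf-8"))
```

The final parse only checks well-formedness; validating against a schema would need an additional library.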
The Rackspace Linux server then imports the XML data into the website MySQL database fields as newly formatted web posts, updates any existing Oracle University web posts that have changed, and removes any web posts whose course dates have expired.
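The create, update and delete decisions made during the import can be sketched as a pure function. This is a simplified illustration, not the actual WordPress import logic: the `Course` record, its fields and the `plan_sync` function are hypothetical, and the sketch assumes a post is expired when its course end date is before today.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Mapping, Tuple


@dataclass(frozen=True)
class Course:
    course_id: str
    title: str
    end_date: date


def plan_sync(
    existing: Mapping[str, Course],
    incoming: Iterable[Course],
    today: date,
) -> Tuple[List[str], List[str], List[str]]:
    """Return (to_create, to_update, to_delete) lists of course ids."""
    incoming_by_id = {c.course_id: c for c in incoming}
    # Latest known data per course: feed values override stored posts.
    latest = {**existing, **incoming_by_id}

    def live(cid: str) -> bool:
        return latest[cid].end_date >= today

    to_create = sorted(c for c in incoming_by_id if c not in existing and live(c))
    to_update = sorted(
        c for c in incoming_by_id
        if c in existing and live(c) and incoming_by_id[c] != existing[c]
    )
    to_delete = sorted(c for c in existing if not live(c))
    return to_create, to_update, to_delete
```

Keeping the planning step separate from the database writes makes it easy to check the daily import's decisions before any posts are touched.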