NOTE: This job is no longer available!

Service Reliability Engineer

Vancouver, BC
The Service Reliability Engineering Group manages more than 1,300 servers that host DemonWare's online services, which support some of the most popular computer games in the world today.
 
Responsibilities of the Service Reliability Engineer (SRE):
·        Managing infrastructure
·        Tracking and scheduling the following:
-OS/OS vendor updates
-Network equipment software/firmware updates
-DemonWare software updates (based on packaging and documentation from DevOps team)
-External tools/service updates (e.g. Hadoop)
-Adding/removing hardware to/from service clusters
·        Monitoring and metrics
-Ensure monitoring and metrics systems are operational
-Ensure configuration for monitoring and metrics systems is up to date
·        Problems and outages
-Second-line support to the front-line operations team
 
Essential:
·        Experience administering Debian-based and Red Hat-based Linux servers
·        Can work under high pressure
·        Quick learner
·        Very comfortable working with Linux-based systems
·        Good understanding of database technologies
·        Able to automate every task they perform
·        Strong scripting skills: Unix shell programming and ideally at least one of Perl, Python or Ruby
·        Should know how to script tasks involving SQL, XML and network operations in a Linux environment
·        Strong analytical and troubleshooting skills
·        Excellent written and verbal communication skills
·        Ability to spend up to one week per month on call
 
Desired:
·        Experience with Windows Server
·        In-depth knowledge of IP-based networking
·        Administration of services on Linux servers (Apache, Postfix, MySQL, etc.)
·        SQL databases - MySQL preferred
·        Web services development (e.g. XML-RPC, REST)