NOTE: This job is no longer available!
Service Reliability Engineer
at Demonware
Vancouver, BC
The Service Reliability Engineering Group manages more than 1,300 servers that host DemonWare's online services, supporting some of the most popular computer games in the world today.
Responsibilities of the Service Reliability Engineer (SRE):
· Managing infrastructure
· Tracking and scheduling the following:
-OS/OS vendor updates
-Network equipment software/firmware updates
-DemonWare software updates (based on packaging and documentation from DevOps team)
-External tools/service updates (e.g. Hadoop)
-Adding/removing hardware to/from service clusters
· Monitoring and metrics
-Ensure monitoring and metrics systems are operational
-Ensure configuration for monitoring and metrics systems is up to date
· Problems and outages
-Second-line support to the front-line operations team
Essential:
· Experience administering Debian- and Red Hat-based Linux servers
· Can work under high pressure
· Quick learner
· Very comfortable working with Linux-based systems
· Good understanding of database technologies
· Able to automate every task he or she does
· Strong scripting skills: Unix shell programming and ideally at least one of Perl, Python or Ruby
· Should know how to script tasks involving SQL, XML and network operations in a Linux environment
· Strong analytical and troubleshooting skills
· Excellent written and verbal communication skills
· Ability to spend up to one week per month on call
Desired:
· Experience with Windows Server
· In-depth knowledge of IP-based networking
· Administration of services on Linux servers (Apache, Postfix, MySQL etc.)
· SQL databases - MySQL preferred
· Web services development (e.g. XML-RPC, REST)
