ScyllaDB is a NoSQL database that reimplements Cassandra in C++ and advertises high performance. ScyllaDB Inc. is the company that develops and supports it.
As a Scylla Cloud Operations & SRE Engineer, you'll play a crucial role in maintaining the operational excellence of our cutting-edge NoSQL database platform, Scylla Cloud.
Responsibilities:
- Collaborate with the Cloud Operations & SRE team to ensure the smooth day-to-day operation of Scylla Cloud.
- Monitor system health, troubleshoot issues, and proactively address any operational challenges.
- Assist with and perform upgrades for Scylla Cloud, including Scylla database versions, OS upgrades, and security patches.
- Collaborate with DevOps/Cloud Engineering to ensure seamless upgrade processes.
- Participate in scaling Scylla Monitoring and Scylla Manager servers up and down based on demand.
- Employ proactive monitoring strategies to identify and address potential performance bottlenecks and resource constraints.
- Act as a liaison with the Support Organization to address cloud platform-related issues.
- Respond to tasks and tickets escalated by Support Staff, and collaborate to ensure timely resolutions.
- Develop and maintain a comprehensive runbook that can be leveraged by Support Staff to troubleshoot and resolve common issues, improving efficiency in issue resolution.
- Create scripts and automation solutions to streamline operational tasks and enhance efficiency.
- Contribute to the development of automation strategies for cloud infrastructure management.
- Collaborate with the Cloud Engineering team to define and create feature requests that enhance the functionality and performance of Scylla Cloud.
- Conduct regular cluster health and performance audits, identifying areas for optimization.
- Implement strategies to enhance the efficiency and reliability of Scylla Cloud clusters.
- Work closely with the Customer Success team to ensure that provisioned resources align with customer needs and purchased packages. Provide insights into potential scaling opportunities and usage optimization.
- Demonstrate a deep understanding of public cloud environments (AWS, GCP, Azure), Kubernetes, Linux system operations, and NoSQL database deployment/management. Apply this knowledge to resolve complex technical challenges.
- Use scripting languages such as Python and Bash, along with tools such as Terraform and Ansible, to create automation that enhances operational efficiency.
- Collaborate closely with Support and Engineering teams to address issues, drive improvements, and implement customer-focused solutions.
Requirements:
- 3+ years of experience in public cloud platforms (AWS, GCP, Azure).
- 3+ years of Linux system operations and metrics analysis.
- Strong scripting skills in Python and Bash.
- Experience with reporting and visualization tools such as Splunk, Grafana, Prometheus, and Kibana.
- Excellent written and verbal English communication skills.
- Exceptional organizational skills and ability to manage multiple projects concurrently.
- Ability to work both independently and collaboratively within cross-functional teams.
- Strong problem-solving skills, especially under pressure.
- Eagerness to continuously learn and adapt to emerging technologies.
- Familiarity with container technologies like Docker and Kubernetes.
- Proficiency with automation tools such as Ansible and Terraform.
- 3+ years of Kubernetes experience is an advantage.
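If I remember correctly, the position allowed fully remote work from anywhere in the world.
So if your English is strong, a career path as an SRE for middleware like this opens up... (stares into the distance). In that sense, it seems fair to say that SRE skill sets are usable globally.
Well, I don't really have enough passion to dig that deeply into this area myself.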