ScyllaDB

job description

ScyllaDB is a NoSQL database that reimplements Cassandra in C++ and is billed as high-performance; ScyllaDB Inc. is the company that develops and supports it.

2024-02-11: Scylla Cloud Operations & SRE Engineer (closed)

As a Scylla Cloud Operations & SRE Engineer, you’ll play a crucial role in maintaining the operational excellence of our cutting-edge NoSQL database platform, Scylla Cloud.

Responsibilities:

  • Collaborate with the Cloud Operations & SRE team to ensure the smooth day-to-day operation of Scylla Cloud.
  • Monitor system health, troubleshoot issues, and proactively address any operational challenges.
  • Assist and perform upgrades for Scylla Cloud, including Scylla database versions, OS upgrades, and security patches.
  • Collaborate with DevOps/Cloud Engineering to ensure seamless upgrade processes.
  • Participate in scaling up and down Scylla Monitor & Scylla Managers servers based on demand.
  • Employ proactive monitoring strategies to identify and address potential performance bottlenecks and resource constraints.
  • Act as a liaison with the Support Organization to address cloud platform-related issues.
  • Respond to tasks and tickets escalated by Support Staff, and collaborate to ensure timely resolutions.
  • Develop and maintain a comprehensive runbook that can be leveraged by Support Staff to troubleshoot and resolve common issues, improving efficiency in issue resolution.
  • Create scripts and automation solutions to streamline operational tasks and enhance efficiency.
  • Contribute to the development of automation strategies for cloud infrastructure management.
  • Collaborate with the Cloud Engineering team to define and create feature requests that enhance the functionality and performance of Scylla Cloud.
  • Conduct regular cluster health and performance audits, identifying areas for optimization.
  • Implement strategies to enhance the efficiency and reliability of Scylla Cloud clusters.
  • Work closely with the Customer Success team to ensure that provisioned resources align with customer needs and purchased packages. Provide insights into potential scaling opportunities and usage optimization.
  • Demonstrate a deep understanding of public cloud environments (AWS, GCP, Azure), Kubernetes, Linux system operations, and NoSQL database deployment/management. Apply this knowledge to resolve complex technical challenges.
  • Utilize scripting languages like Python, Terraform, Ansible and Bash to create automation tools that enhance operational efficiency.
  • Collaborate closely with Support and Engineering teams to address issues, drive improvements, and implement customer-focused solutions.

Requirements:

  • 3+ years of experience in public cloud platforms (AWS, GCP, Azure).
  • 3+ years of Linux system operations and metrics analysis.
  • Strong scripting skills in Python and Bash.
  • Experience with reporting and visualization tools such as Splunk, Grafana, Prometheus, and Kibana.
  • Excellent written and verbal English communication skills.
  • Exceptional organizational skills and ability to manage multiple projects concurrently.
  • Ability to work both independently and collaboratively within cross-functional teams.
  • Strong problem-solving skills, especially under pressure.
  • Eagerness to continuously learn and adapt to emerging technologies.
  • Familiarity with container technologies like Docker and Kubernetes.
  • Proficiency with automation tools such as Ansible and Terraform.
  • 3+ years of Kubernetes experience - advantage.

If I remember correctly, the position was open to remote work from anywhere in the world.

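So if you're fluent in English, a career path as an SRE for this kind of middleware opens up, huh... (stares into the distance). In that sense, it seems fair to say that the SRE skill set can be put to use globally.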

Well, that said, I don't really have enough passion to dig that deeply into this area.