Senior Infrastructure-Focused Site Reliability Engineer (Westwood)

Employment Type: Full-Time


: Information Technology



Role: Infrastructure-Focused Site Reliability Engineer

Location: Union, NJ

Duration: 12 months (contract)


We are looking for a Senior Infrastructure-Focused Site Reliability Engineer (SRE) who can build, instrument, troubleshoot, automate, and triage highly scalable legacy and modern systems. The candidate will join a team whose mission is to blend a variety of skill sets and work collaboratively. The team must deliver quality work and also take an active role in determining which architectures and technologies perform, scale, and deliver services reliably.


  • Troubleshoot issues across the entire stack: hardware, software, applications, and network
  • Design, build, test, and automate monitoring discovery, instrumentation, alerting, and escalation
  • Clearly document all work, and communicate and demonstrate it to the team
  • Support Incident Services teams with escalated incidents, root cause analysis, and resolution
  • Work off-hours on-call schedules

Interpersonal and Communication Skills

  • Build effective working relationships with diverse, cross-functional units across the organization and with external vendors
  • Excellent interpersonal skills in areas such as teamwork, facilitation, and negotiation
  • Able to work under pressure, independently or as part of a team, and manage priorities
  • Meets or exceeds expectations for turnaround on assigned tasks and follow-ups, and drives issues through resolution to closure
  • Effective verbal, non-verbal, and written communication skills, with the ability to communicate at all levels of the organization

Required Qualifications (Hands-On)

  • 4 years of experience with network engineering, virtualization technologies, and infrastructure requirements and standards
  • Experience with real-time monitoring (CA Spectrum, CA UIM, and other monitoring applications)
  • Advanced knowledge of system monitoring
  • Ability to validate and troubleshoot within monitoring applications
  • Ability to assess monitoring requirements and provide solutions
  • Ability to identify key processes and services to monitor
  • Understanding of system and application logs
  • Full understanding of system and network KPIs
  • Experience troubleshooting routers, switches, servers and firewalls
  • Capable of making independent and accurate decisions under pressure
  • Capable of responding to major and critical events and of actively participating in determining solutions and instrumentation

Good-to-Have (Functional)

  • Experience with fault-tolerant infrastructure and monitoring instrumentation using technologies such as Kubernetes, Kafka, Cassandra, AWS, and GCP
  • Experience instrumenting and researching issues with CA Monitoring Suite, Nagios, InfluxDB, Grafana, Prometheus, Stackdriver, Sumo Logic, New Relic, Quantum Metric, Tealeaf, and similar tools
  • Familiarity with tools such as Puppet, Ansible, Salt, Chef, or CFEngine would be a plus
  • Additional familiarity with log analysis tools such as Sumo Logic, ELK, and Splunk would also be helpful.
  • Practical knowledge of shell scripting and at least one other scripting language (e.g., Python or Ruby)
  • Sound ethics and discretion with confidential information, along with good customer service skills
  • Knowledge of project management methodologies and techniques
  • Strong analytical, judgment, and problem-solving skills
  • Self-motivated and team-oriented
  • Ability to multitask and prioritize
