
Draccus

A tool for stashing messages queued up in Amazon's SQS.

Use it as a disaster recovery mechanism to drain a queue, or use it to guarantee delivery of transaction logs to persistent storage.

SQS messages are only deleted once they have been permanently stored, so certain classes of errors may cause a message to be stored twice. This is by design, to avoid any possibility of dropping messages. If you care about uniqueness, make sure there is an identifier in the message body that any future batch processing jobs running over the messages can use to de-duplicate.
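
For example, a message body might carry its own unique identifier that later jobs can de-duplicate on (the field names here are purely illustrative; draccus does not require any particular schema):

{
  "id": "txn-8f3a2c41",
  "type": "transaction",
  "payload": { "amount": 12.50, "currency": "USD" }
}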

Install

$ npm install draccus

To run the test suite from a checkout of the repository:

$ npm test

Usage

draccus --options_file path/to/config --queue_name some_queue_name

CLI Flags

  • --options_file : Path to a JSON file where options can be loaded from.
  • --access_key_id : AWS access key ID.
  • --secret_access_key : AWS secret access key.
  • --metadata_service_host : Hostname to use when looking up credentials via IAM (see below).
  • --queue_name : The name of the queue to read messages from.
  • --flush_frequency : How often to flush messages to the store. Default: 60s.
  • --s3_bucket : Name of an S3 bucket where messages should be written.
  • --out_dir : A local directory to write messages to (e.g. for dev).
  • --stdout : Simply logs messages to the console (warning: messages are still deleted from the queue).
  • --daemon : Whether the process should stay running once the queue is empty and wait for further messages.
  • --filename_pattern : How to generate filenames; uses moment.js format strings, and additionally replaces PID with the process ID (see the example after this list). Default: X, the unix timestamp.
  • --log_file : The path of a file to write logs to.
  • --log_raw_message : Specifies that the raw SQS message should be stored as JSON, instead of the message body.
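
For example, a hypothetical development invocation that drains a queue to a local directory (the queue name and paths are placeholders):

draccus --queue_name my_awesome_queue --out_dir ./messages --flush_frequency 10 --filename_pattern "YYYY/MM/DD/HH-X-PID"

With that pattern, a flush might write a file named along the lines of 2013/09/17/14-1379426400-12345 (year/month/day/hour, unix timestamp, process ID).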

--options_file

The above flags can be specified via a JSON file, along with additional AWS configuration options as described in the SDK Documentation.

{
  "accessKeyId": "YOURACCESSKEY",
  "secretAccessKey": "yoursecretkey",
  "sslEnabled": true,
  "region": "us-west-2",
  "queueName": "my_awesome_queue",
  "s3Bucket": "drained_queue",
  "filenamePattern": "YYYY/MM/DD/HH-X-PID"
}

The JSON flags are camel case equivalents of the CLI flags.
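
For instance, the flag-equivalent entries in the example above map to the command line like this (sslEnabled and region are AWS SDK options and have no dedicated flags):

draccus --access_key_id YOURACCESSKEY --secret_access_key yoursecretkey --queue_name my_awesome_queue --s3_bucket drained_queue --filename_pattern "YYYY/MM/DD/HH-X-PID"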

AWS Credentials

When running outside of EC2 you will need to pass in a valid access key and secret key. However, if you are running within EC2 and omit the access key, draccus will fall back to IAM-based credentials, using the instance's role. See here for more details.
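
For example, on an EC2 instance whose role grants access to the queue and bucket, an options file might omit the credentials entirely (the values here are placeholders):

{
  "region": "us-west-2",
  "queueName": "my_awesome_queue",
  "s3Bucket": "drained_queue"
}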

Throughput

Throughput should mostly be limited by network latency and SQS response time. According to the SQS documentation, the theoretical limit per instance should be 500 messages per second.

In a test on an m1.small it took 40.7s to handle 10,000 messages: receiving each message, writing it to S3, and deleting it from the queue. The test flushed to S3 every 10s, resulting in three ~100k files.
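
That works out to roughly 10,000 / 40.7 ≈ 245 messages per second, about half the theoretical per-instance limit quoted above.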

Test setup:

draccus-fill-queue --options_file aws.config --batches 1000
time draccus --options_file aws.config --flush_frequency 10

(If you use multiple workers, remember to specify a different filename pattern for each, or include PID in the pattern, as in the sketch below.)
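
For example (a sketch, assuming aws.config sets the queue and bucket), two background workers sharing a PID-based pattern will write to distinct files:

draccus --options_file aws.config --filename_pattern "YYYY/MM/DD/X-PID" &
draccus --options_file aws.config --filename_pattern "YYYY/MM/DD/X-PID" &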

Contributing

Questions, comments, bug reports, and pull requests are all welcome. Submit them at the project on GitHub. If you haven't contributed to a Medium project before, please head over to the Open Source Project and fill out an OCLA (it should be pretty painless).

Bug reports that include steps-to-reproduce (including code) are the best. Even better, make them in the form of pull requests.

Author

Dan Pupius (personal website), supported by A Medium Corporation.

License

Copyright 2013 The Obvious Corporation.

Licensed under the Apache License, Version 2.0. See the top-level file LICENSE.txt and http://www.apache.org/licenses/LICENSE-2.0.