Web App plus Analytics Engine: 2 databases needed?

Justin_Gordon · April 8, 2014, 6:28pm

Situation: Web App Plus Analytics

Let’s suppose a system requires 2 databases because there are 2 components written by different teams:

1. “Rails”, a web app, which displays output data from part 2.
2. “Analytics”, a processing engine written in, say, Python, which has huge daily import data volumes and lots of number crunching, and will probably run on EC2 instances.

However, there are some tables/data shared between the systems. These are relatively modest in size.

What’s the best way to manage a situation where some of the data is shared?

My guesses are:

2 databases are definitely required, as opposed to one giant database, even though a number of tables are in common.
Communication between the two systems should be via messaging, such as RabbitMQ.
There should be no sharing of databases across systems. See “API or shared database?” (Ruby on Rails, thoughtbot forum).
Synchronization of the data of the common tables will be via messaging.

I’m just wondering if hand-built messaging for synchronization of the shared data is overkill. It seems like there should be something simpler, as this problem feels rather generic.
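
To give a sense of what hand-built synchronization would involve, here is a minimal sketch in Python. It is only an illustration under loud assumptions: a `deque` stands in for a RabbitMQ queue, two in-memory SQLite databases stand in for the Rails and Analytics databases, and the `instruments` table with its `version` column is hypothetical. The version check is what keeps duplicate or out-of-order messages from overwriting newer data.

```python
import json
import sqlite3
from collections import deque

# Stand-in for a RabbitMQ queue (assumption: a real system would use a broker client).
queue = deque()


def make_db():
    """Create an in-memory database holding the hypothetical shared table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE instruments ("
        "id INTEGER PRIMARY KEY, symbol TEXT NOT NULL, version INTEGER NOT NULL)"
    )
    return db


def publish_change(db, row_id, symbol, version):
    """Write to the owning database, then enqueue a JSON change event."""
    db.execute(
        "INSERT OR REPLACE INTO instruments (id, symbol, version) VALUES (?, ?, ?)",
        (row_id, symbol, version),
    )
    db.commit()
    queue.append(json.dumps({"id": row_id, "symbol": symbol, "version": version}))


def apply_changes(db):
    """Drain the queue into the replica, ignoring stale or duplicate events."""
    while queue:
        event = json.loads(queue.popleft())
        row = db.execute(
            "SELECT version FROM instruments WHERE id = ?", (event["id"],)
        ).fetchone()
        if row is None:
            db.execute(
                "INSERT INTO instruments (id, symbol, version) VALUES (?, ?, ?)",
                (event["id"], event["symbol"], event["version"]),
            )
        elif event["version"] > row[0]:
            db.execute(
                "UPDATE instruments SET symbol = ?, version = ? WHERE id = ?",
                (event["symbol"], event["version"], event["id"]),
            )
    db.commit()


rails_db, analytics_db = make_db(), make_db()
publish_change(rails_db, 1, "AAPL", 1)
publish_change(rails_db, 1, "AAPL.O", 2)
apply_changes(analytics_db)
print(analytics_db.execute("SELECT id, symbol, version FROM instruments").fetchall())
```

Even this toy version needs an ordering rule, and a real one would also need retries, dead-letter handling, and an initial backfill, which is the cost being weighed here.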

Questions:

Use Heroku to manage the non-Rails database?
Assuming 2 separate databases
Any way to manage schema migrations used by both systems?
Any automated way to handle synchronization of some tables?

zamith · April 9, 2014, 9:46am

Why do you need two databases? Why not use something non-relational that is built to handle that amount of data?

Justin_Gordon · April 9, 2014, 9:53am

Any recommendations for alternative DBs for handling large amounts of financial markets data?

zamith · April 9, 2014, 10:01am

Do you need strict consistency, or is eventual consistency good enough? It also depends on whether you’re mostly doing reads or writes.

Good general solutions are HBase, Cassandra or Riak. You have a pretty comprehensive list of solutions here.

Justin_Gordon · April 16, 2014, 10:52pm

It seems like a relatively common solution is to use synchronization or replication of a Postgres database, with the options for Postgres listed in this wiki article.

I’m leaning toward this solution:

Rails has one Postgres DB, call it Rails DB.
OLAP team can replicate/synchronize either certain tables or the whole Rails DB.
OLAP team has its own DB for whatever they need.
Results of OLAP are sent back to the Rails app via RabbitMQ.
Rails team only needs to communicate any changes to the tables used by the OLAP team.
Rails team can publish messages for the OLAP team when certain data changes.
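
As a rough illustration of the "replicate certain tables" option, here is a sketch of a one-way snapshot copy in Python. It is not Postgres replication: in-memory or file SQLite connections stand in for both databases, the `accounts` table name is hypothetical, and the target is assumed to have the same schema as the source. With Postgres, one of the built-in replication options would normally replace this.

```python
import sqlite3

# Hypothetical allow-list of shared tables; names never come from user input,
# which is why interpolating them into SQL below is acceptable in this sketch.
SHARED_TABLES = ("accounts",)


def copy_shared_tables(source, target, tables=SHARED_TABLES):
    """Replace the target's copy of each shared table with the source's rows.

    Returns a dict mapping table name to the number of rows copied.
    """
    copied = {}
    for table in tables:
        rows = source.execute(f"SELECT * FROM {table}").fetchall()
        # One transaction per table: commit on success, roll back on error.
        with target:
            target.execute(f"DELETE FROM {table}")
            if rows:
                placeholders = ", ".join("?" for _ in rows[0])
                target.executemany(
                    f"INSERT INTO {table} VALUES ({placeholders})", rows
                )
        copied[table] = len(rows)
    return copied
```

Data flows in one direction only, so the Rails DB stays the single place where these tables are written, and re-running the copy is harmless.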

derekprior · April 17, 2014, 3:42am

If the data is truly bidirectional then prepare for a world of pain. If each system needs only read-only access to data unique to the other, consider scheduled imports, API access, or database links with appropriate read-only permissions (the specifics of which depend on your DB). In the case of your Rails app, it can actually be configured to connect to several different databases.
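
To illustrate the read-only idea from the consuming side, here is a small Python sketch. It uses SQLite's `mode=ro` URI option purely as a stand-in; the `users` table and file name are made up, and on Postgres the equivalent would be a role that has only been granted `SELECT` on the shared tables.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(path):
    """Open an existing SQLite file so that any write attempt fails."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


# The owning application creates and writes the database as usual.
path = os.path.join(tempfile.mkdtemp(), "rails.db")  # hypothetical file name
owner = sqlite3.connect(path)
owner.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
owner.execute("INSERT INTO users (email) VALUES ('a@example.com')")
owner.commit()

# The other system can read, but the database itself rejects writes.
reader = open_read_only(path)
print(reader.execute("SELECT email FROM users").fetchall())
try:
    reader.execute("DELETE FROM users")
except sqlite3.OperationalError as exc:
    print("write rejected:", exc)
```

Enforcing this at the database level means the other team cannot modify shared data by accident, whatever their code does.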

I’d try very hard to avoid a situation where the data can be updated in multiple databases and needs to be synced between them. There are domains where this is necessary, but that complexity is best avoided if you can help it.

Justin_Gordon · April 20, 2014, 4:03pm

I’m going to keep the Rails DB pristine. I won’t know if the other team directly accesses the DB, synchronizes it, etc. The other team will be required to send Rails messages via RabbitMQ.

I’ll publish messages for the other team, as they require it.

There definitely are some common tables of information, and making one app is a far worse proposition, especially given the other team wants to use Python.

ACPK · May 13, 2016, 1:14am

The Thoughtbot analytics video recommends having a separate analytics database. See: 4: Common Techniques, A/B Testing, Funnels, Cohort Analysis | Online Video Tutorial by thoughtbot
