Rails

Everybody knows that this line :

Model.where(column: 'value').update_all(column: 'Another Value')

is better than the following lines :

Model.where(column: 'value').each do |item|
    item.column = 'Another Value'
    item.save
end

Not because it’s shorter, but because it will get executed as one UPDATE statement, Not as multiple UPDATE statements.

https://media.giphy.com/media/N2rLxtwaU9rBC/giphy.gif

But it seems that I did this mistake while working on a worker that should bulk insert thousands of records.

My task was to clone N amount of records and update one value for the new clones (records), So I started to work and here was my code :

ActiveRecord::Base.transaction  do
	Model.where(column: 'value').each do |item|
		new_item = item.dup # Duplicating the Object
	    new_item.column = 'Another Value'
	    new_item.save
	end
end

So even though I am using DB Transactions thinking that It would be the most optimized solution since I’m committing my records at once but I forgot the fact that I’m executing multiple INSERT statements.

So the solution is to execute one SQL INSERT statement but unfortunately Rails doesn’t support that or ActiveRecords doesn’t. So I had to use a 3rd party gem to do so, and the gem is bulk_insert.

bulk_insert allows you to insert multiple records in one single INSERT statement along with some other features where you can check in this README file. And to test it, I created a small Rails App with a local PostgreSQL DB to benchmark the alternative solutions. The benchmark was testing the time to bulk insert 100k records.

Solution Time to insert 100K records
Multiple INSERT statements with DB Transaction ~ 88 seconds
Using bulk_insert gem ~ 8 seconds

https://thumbs.gfycat.com/TangibleCavernousBeardeddragon-size_restricted.gif

I admit that the ORMs spoiled me and I need to pay more attention to what’s happening behind the scenes.