Bad ORM is infinitely worse than bad SQL

This is a bit of a rant, and I do apologize. It seems that lately I'm confronted more and more with ORM implementations that may look right on the surface, but that are a killer for your application. In the end, it's always a case of bad ORM usage, lack of knowledge from a developer, ... but I believe that just strengthens my point: it's very easy to write bad ORM, whether it's Doctrine, Propel, ...

Edit: this is getting a proper discussion at the Reddit-thread as well, be sure to check it out.

Hey, this looks like it might work

In an ORM world, it's very easy to write something like this (I say easy, I do not say right).

<?php
    // First, get all the companies in your database
    $companies = $this->getAllCompanies();
    $totalValue = 0;
    foreach ($companies as $company) {
        // For each company there is, retrieve the total value of all their orders
        $orders = $company->getOrders();
        foreach ($orders as $order) {
            $totalValue += (float) $order->getValue();
        }
 
        echo "This company made us ". $totalValue ." euro already.";
    }
 ?>

If you've done PHP for more than 3 months, the above will give you shivers down your spine. It looks like perfectly valid code, it even makes sense to read it. But in reality, it probably does something like this.

<?php
    // First, get all the companies in your database
    SELECT * FROM "company";
    $totalValue = 0;
    foreach (...) {
        // For each company there is, retrieve the total value of all their orders
        SELECT * FROM "order" WHERE companyid = $companyid;
        foreach (...) {
            $totalValue += $row->value;
        }
 
        echo "This company made us ". $totalValue ." euro already.";
    }
 ?>

The above snippet of code will get progressively worse as your application grows. Why? Because as you have more entries in the "company" table, your first foreach-loop will grow since you iterate over each entry in the table. As a result of that, even more queries in the "order" table are being executed. So every time your customer base grows (more entries in that "company" table), the code above will get slower and you'll hurt your database with more and more (simple) queries as you iterate each table for each row.

Bad ORM is too easy to write

My problem with ORM is that it's too easy to write bad code. It's very easy to use all the default mappings and just do "SELECT *" in the background. Every ORM-system gives you the ability to write custom SQL queries, but that sort of defies the point of ORM in general, hence hardly anyone does it. As more and more developers use only ORM to create applications, they lose their touch with the database interaction, the queries behind it, the reasoning of why to use a certain kind of query, the performance impact of an INNER or OUTER joins, ...

The example above, causing many small queries, could also be replaced with one efficient query to give you the same result.

SELECT SUM(o.VALUE) AS TotalValue, c.name AS CompanyName
FROM company AS c
LEFT JOIN "order" AS o ON o.companyid = c.companyid
GROUP BY o.companyid;

And it's not even a difficult query.

The lack of visibility

If you take a look back at the 2 code examples above, I think it's sort of obvious that the 2nd example -- where the SQL queries are shown -- shows you very quickly that this is bad code. It shows you the SQL queries that will be performed, and if you think logically for 1.5s you'll be aware that it is a bad thing to write. While the ORM example looks perfectly valid and the performance bottleneck may not immediately be clear.

And that just may be my biggest frustration with ORM: as a developer, you lose focus on the underlying SQL queries. Some may claim that it's not important, that it is exactly the reason why ORM is gaining so much attention. But if you're serious about tuning your application, you need knowledge of the SQL that is being performed. You need to know how to write efficient queries with complex JOIN's, GROUP BY's and aggregate functions such as SUM(), AVG(), ...

Oh you, just master ORM!

Yes. You don't write stupid code if you master your tools, and that includes ORM frameworks as well. I'm sure there are people out there that write perfectly performant ORM-related code, it seems I have just yet to find them.

If you disagree with me, prove me wrong.

Hi! This is a blog where I write down solutions to some of the problems I've faced when working as a sysadmin/dev. I hope you find the information shown here useful to you. Please use the comments on this blog to give feedback about the content!. I'm @mattiasgeniar on Twitter.

Tagged with: , ,
Posted in php
35 comments on “Bad ORM is infinitely worse than bad SQL
  1. Can’t agree more with fact that it’s easy to write performance killers when using ORM’s. I would even increase the rant and include frameworks and libraries in general.

    But be so kind not to blame the ORM’s or frameworks, but the ignorent programmers. I elaborately use the word programmer instead of developer. It seems that the new generation of programmers don’t even wan’t to take the time to learn how to craft a piece of software, they just want to write code that “looks” good and be ooooh sooooo productive.

    They all prefer the nice abstraction and “magic” that surround libraries and frameworks. But actually, they are clueless of what’s really happening underneath. So to be honest, saying “as a developer, you lose focus on the underlying SQL queries. ” is just so wrong in many ways. But that’s maybe another debate when you discuss the lack of finding qualified developers :)

    I do understand the point you are trying to make. The above example is perhaps a bit too obvious, but there are indeed some more complex examples where you might lose focus.

    But now I’m ranting in this comment. So let me just end with a small tip that helps me when it comes to understanding ORM’s and even knowing what’s happening underneath. Every ORM comes with great logging tools. So just take the time to tail your logs. This way, you can see what queries get generated when you start testing your pieces of code. And yes, check the queries that pass by. This way, you won’t lose focus on writing descent software and you might learn a thing or two ;)

  2. Frank Marien says:

    Agree, I’ve seen this happen more than a few times..

    I cannot help but wonder, though: Around the turn of the century, I had moved to a proper Object Store,
    at the time, I thought that would have been the end of the relational database.. I’m amazed to
    find that instead of Object Stores, we’re now 13+ years later and we’ve chosen to do this wasteful “twisting layer”
    called ORM instead, and we’re still having this ass-backwards (from an OO pov) relational concept in the backend, only NASA and Boeing are using Object Stores, and everyone else (that is not still relational) has gone Huge Persistent HashMap In The Cloud, and/or something fast with no integrity, which is all very far away from a proper object store..

    What On Earth Happened?

  3. Nicolas says:

    @Frank Marien
    Oop is good to design behavioral interactions inside the limited scope of a process and suited to some problem domain like user interfaces… It is not the best paradigm for all possible domain…
    Data doesn’t benefit much of oop concepts. data has no behavior. data doesn’t benefit much of encapsulation. On the contrary it use indexes and queries that exhibit everything.

    There is no ultimate answer and oop programing aka single dispatch on type is not the solution to every possible problem in the field of computer science.

    I see object databases as this view that everything must be an object trend we had and still have in many circles. Like if no other abstration is possible or even more suitable in some cases.

  4. Frisian says:

    I don’t think, that you can blame the ORM or even the programmers here. Probably no one told them about the cost of database access and how to do it properly (“tell what data to fetch” instead of “tell, how to fetch the data”) during their education. These things come with training on the job, which is arguably the reason, why many organisations make a difference between “developer” and “senior developer”.
    Besides, lazy-loading is a much better example for lack of visibility in my opinion.

    • Matti says:

      Probably no one told them about the cost of database access and how to do it properly (“tell what data to fetch” instead of “tell, how to fetch the data”) during their education.

      I believe this just strengthens my point: when I got my education, we learned the INs and OUTs of SQL. The right way to do things. ORM makes for lazy developers that just don’t seem to care anymore.

      I know I’m generalizing, there are still a lot of good developers out there, but mark my words: in 10 years time, the new generation of developers might nog even know what SQL is anymore.

      • Dean De Block says:

        “in 10 years time, the new generation of developers might nog even know what SQL is anymore.”

        Maybe SQL will be gone by then :)

      • Chris says:

        SQL will be gone, as well as IE6, COBOL & VB.

        I wouldn’t get my hopes up.

      • Peter says:

        Even if SQL is gone, they still have to understand the theoretical base of relational structures. An ORM is never going to replace this. ORMs map badly to say a relation calculus.

        Actually I see that their is a push to lift the query language into the general language as EDSL. Linq is a good example of this.

  5. a says:

    So, if the ORM is bad, you’re proposing to move control-break code into your template, and, the fact that your join will now contain a lot more data to be transferred. A join can be expensive. While your example will definitely be quicker with the JOIN, an ORM that allows lazy loading will allow your DBA to focus on database design while allowing you to focus on front-end development.

    Why not write the code that you would use to display identical output using both the ORM and the JOIN. While you are gaining SQL performance with your JOIN, you will lose template performance with the fact you’ve got to write your control break structure.

    While either can have problems, ORMs are there to do abstractions and allow easier manipulation of your data on the development side. Just because you’ve run across bad ORMs doesn’t mean they are all bad.

    • Matti says:

      Just because you’ve run across bad ORMs doesn’t mean they are all bad.

      Probably not. In fact I’m sure there are good ORM implementations as well. But I believe my point remains valid: ORM makes it too easy to write bad code. And badly written ORM code is a much bigger pain the *!@# than badly written SQL.

  6. Bernard says:

    Don’t blame the tools, blame the one holding the tools.

    An unexperienced developer using an ORM, a framework, or SQL-queries without knowing why you should try to query all your data at once, will make the same mistake with either of them.

    He need to be educated, not tar-and-feathered.

  7. CurtainDog says:

    Is it really that big a deal if you loose sight of the underlying SQL queries? SQL is in itself a layer of abstraction, the database still has to figure out how to retrieve the requested data in an efficient manner. No reason why an ORM would be any worse in this regard.

    I agree that it is too easy to write bad code under with ORM, however I believe the same applies to SQL.

    • Chris says:

      It is a *HUGE* deal if you “loose” sight… especially if you are using a tool improperly (SELECT * FROM Company) vs properly (SELECT Company, Sum(Sales) from Company Group By Company). The cost of sending “*” rows, vs using the Database as it was meant to be (Sending two rows: Company, Sales) can’t be understated.

      ORM stands to abstract and hide the details, such that without due diligence, you might not know that your actually using “SELECT *” instead of the more proper form, which is the thrust of the rant (which I agree with).

      It is the constant fight to abstract the details to get increasingly complicated stuff done… while understanding enough of whats going on under the hood to not screw things up. Its one of the reasons I can’t get comfortable with ORM (in my limited attempts to use it… I’m more DB guy than Programmer Guy).

      • CurtainDog says:

        But the same argument applies to SQL. You can’t look at a query out of context and determine its efficiency (and the same applies to ORMs). If you really wanted to know what a query was doing you’d look at the execution plan. What due diligence is required of ORM that isn’t already required of SQL?

    • thomas kyte says:

      “no reason why an orm would be any worse”

      Lots of reasons.

      Let’s say you have a book. I’ll ask you for the index first. Then I’ll start asking you for the A topics (get them for me randomly – one by one) then the B’s and so on… Till finally you’ve given me every page – one at a time – multiple times (each page stores more than one thing on them – some pages hava A, M and Z topics for example). I am an ORM.

      Let’s say you have a book. I ask you for the book and read it from start to finish. I am a nice bulk SQL statement designed to do set based processing – not slow by slow, row at a time processing.

      If you look at his example:

      foreach ($companies as $company) {
      // For each company there is, retrieve the total value of all their orders
      $orders = $company->getOrders();
      foreach ($orders as $order) {
      $totalValue += (float) $order->getValue();
      }

      the database gets a single silly select to read out the companies one row at a time, followed by a sillier smaller sql statment to read out the orders for each company one by one – the database ends up getting thousands, maybe millions of teeny tiny little statements – all that want to use an index – when the database should have gotten: “select * from company, orders where join condition” so the database can actually have a beefy query that asks for everything needed to optimize.

      If I micromanage you and give you teeny tiny tasks – as a programmer – like “write a function that does this”, “write a procedure that does that” – and each task I gave you took you about 5 to 10 minutes, versus giving you a specification and asking you to implement a certain feature or module – which would be more efficient? which would be a better use of my time, your time (you are the database server, I am the client). Assuming you are a competent programmer – giving you a specification (the entire problem statement) would be the only efficient way to do it. Giving you teeny tiny tasks (thousands of them) makes it impossible for you to “optimize” your programming.

      You have to give the database the entire problem statement. The ORM tries in many cases to “out smart” the database by giving it millions of tiny sql statements – none of which is particular hard (nothing to optimize at the database level) but due to the sheer volume KILLS the database.

      • CurtainDog says:

        In most cases an ORM would try to avoid the n+1 selects problem. An ORM would likely employ caching and lazy loading strategies to try and build the best query possible under the circumstances (the query might only be executed on the first getValue() call to use your above example, at which stage the ORM has all the information it needs to build a decent query).

        Now the point could be raised that databases are designed to execute queries efficiently already, and anything else is just extra overhead. I think this is a fair point, but we’d need to see code and performance comparisons to draw any further conclusions.

  8. rob says:

    A lot of these problems are mitigated if you force developers to be explicit with each of their calls. Lazy loading is a horrible beast.

  9. Weng Fu says:

    I don’t think there is any problem at all. How else could you write a query?

  10. Anonymous says:

    For anyone who does not immediately catch on to what the problem is with the ORM example, I’ll break it down:

    1. For every “company”, another query is then run to retrieve the orders for that company.

    2. The number of queries go up by one for every company you add. Thousands of queries to the database is obviously horrible for performance.

  11. Anonymous says:

    “The above will get progressively worse as your application grows, as you’ll have more entries in the “company” table and even more in the “order” table. Every time your customer base grows, the code above will get slower and you’ll hurt your database with more and more (simple) queries as you iterate each table for each row.”

    This explanation could be worded a bit clearer.

  12. Noah Yetter says:

    Today’s generation of programmers do not understand, do not wish to understand, and indeed hold disdain for the relational model. ORM is that generational character taken to its logical conclusion.

    The truth is that most structured data that belongs in an RDBMS in the first place should be modeled there, first, and the application written to match it, not the other way around. Don’t believe it? Build a few real applications the ORM way, treating the database as a black box. See how well that works out for you.

  13. Blaise Kal says:

    ORM is often less performant than raw SQL. But not every website is a twitter.com or google.com. For lots of low-traffic websites, it doesn’t matter if a query takes twice as long. It’s even less of a problem with a good caching framework. We even use it for a few high-traffic sites (but we do optimize critical queries). And of course ORM’s have big advantages. It has a low learning curve and it’s faster to write. ORM’s also prevents SQL injection, so I don’t agree with the title of this post ;-)

  14. David Rudder says:

    I disagree with the point of the article. Yes, if your customers table is huge this can be a performance nightmare. But, it’s an extreme example, and it’s tied to performance of SQL databases. And, in reality, since every order will have a company, you could replace this with $orders = $this->getAllOrders(); and then sort them in code.

    How often are you going to run a report which shows total order value for all customers? Often enough that it’s worth sacrificing readability for a few seconds of processing time? Yes, you can claim that this doesn’t scale, but then you could also make the claim that the report becomes unusable after the first few hundred customers.

    Generally, I find that, except for extreme circumstances, more understandable code is better. If that means making a fast operation slightly less fast, then so be it. I’d rather have a function that doesn’t require an in-depth knowledge of SQL, doesn’t tie us to a SQL database, and doesn’t require a knowledge of the underlying database’s quirks.

    • Nicolas says:

      This is not an extreme example… in my previous company another worked on modernization of an old mainframe app. they used an ORM and naive queries like here… Turn out that loading the 100000 items that can be sold by the hypermarket on each page, one by one, was a bit slow, even with caching… it took several minutes… they corrected it… but they had so many problem like that that the customer decided to no longer tuse the product… the company is at loss each year now.

      Manyl applications are not just for 3 concurrent users working on a few hundred records at most. Banks manage milions transactions, typical hypermarket sell hundred thousand articles a day, there are many people in a city or even employees in a big company. Where I work now, a flight shopper and booking company, we made 948 millions billed reservation in 2011… with more than 17000 requests by second at peak hours… this not because your project has no performance requirement that only google or Facebook need performance.

      Typically your client will also want some report of the daily, monthly activity per customer, city, per age range of the client, whatever… I was asked to do that for my previous company… Turn out the solution was just on thousand line of SQL and a BI report tool. Performance was fantastic and took like 10% of the time it would have required to write with an ORM in java… Simply because SQL was very suited to the job… Again there is no ultimate unique solution that you can apply to everything.

      • David Rudder says:

        It sounds like you’re blaming ORM for poor application design. What customer would want to have to leaf through 100,000 items on a single page? Most ORMs support paging, and you definitely don’t want to have to use postgres’ LIMIT or Oracle’s ROWNUM or MySQL’s LIMIT…you get my point.

        I’m not saying there are never reasons why you wouldn’t need to join tables. But you should do it the clean, portable way first. If it’s too slow, look at your requirements. Maybe you’re trying to display 100k items to the customer on one page and could benefit from paging. If the requirements are sound, and you really, really need to join, then most ORMs support this. But, code it clean first, and then optimize if necessary. Going immediately to non-portable, complex SQL statements is almost always the wrong answer.

        This doesn’t apply to reporting servers. Their whole point is to crunch data as fast as possible. For those, complex SQL makes sense. I’m talking about customer-facing websites.

  15. Justin says:

    Author has a good point, but the commenters should take care to note the title specifies “Bad ORM”. There is such a thing as good ORM (very rare in the professional world) such that you can specify that you wish to retrieve all companies coupled with their related orders. ORM of this sort usually requires a good model of foreign keys, something that also seems to be lost to modern anti-sql weenies. It also usually requires that the programmer writing code specifically writes helpers for the ORM to define how such relations should be handled.

    An astute person will realize that I’ve used polymorphs of the word ‘relate’ twice: so long as ORM is built upon a lower stack that is relational in design, users of the ORM absolutely need to understand and be able to implement a true relational model. There is no free lunch, nor a silver bullet.

    Besides the previously discussed, it’s worth mentioning that all ORM frameworks incur cost. There is learning cost, in so much that you have to learn the intricacies of the framework. This in itself incurs the cost of learning proper relational database design / SQL, just so that you DON’T have to write SQL (wait, you have to learn SQL not to use SQL? zounds!)! There is performance cost related to exactly what is being mentioned in this article, and there is also performance cost related to the inclusion of a library and methods that run the ORM. There is maintenance cost related to the framework upgrades and perhaps upgrades of libraries the framework is dependent on. This is the nature of software; to have the system do more for you, it has to do more (duh)! The more you do in coding, the less the system has to do (this is called optimization, which seems to be turning into a lost art). There is no free lunch, there is no silver bullet.

    But, there are middle grounds. You don’t have to use an ORM framework to benefit from ORM ideas (you can also code object oriented code in a procedural language without using object oriented syntax too! You can use whatever mental models in whatever system you want, for the most part, even if it’s a horrible idea.). For example, one could seperate the SQL into a group/module/class of functions/methods that construct said methods and call them from your controllers. This would get your SQL out of your way while maintaining control; yes, you would have to write the SQL queries and/or their constructors, but it gives you seperation of concerns, puts your database layer into it’s own place, makes maintenance easier, allows you to tie error control into your database queries, let’s you reuse db queries, etc.

    I find that ORM gives big wins for things that don’t require exceptions from the most basic models, and that maintenance of performance and code readability push it to overall loss quickly after exceptions to the ORM’s way of doing things are needed. This happens in every project I’ve ever worked on above 50 hours.

    Personally, I would have used the ORM like this though:

        // First, get all the orders in your database
        $orders = $this->getAllOrders();
        // Then, get all the companies in your database
        $companies = $this->getAllCompanies();
        // Initialize array (I hate implicit variable creation, it is sloppy)
        $totals = Array(); 
        foreach ($orders as $order) {
          // For each company there is, retrieve the total value of all their orders
          $companyID = $order->getCompanyId;
          // Add the order's value to the array index for the company
          $totals[$companyID] += (float) $order->getValue();
        }
        //We only loop through this to get the name of the company, otherwise, you could skip ever pulling the company
        foreach ($companies as $company) {
          $companyID = $company->getId;
          $companyName = $company->getName;
          echo $companyName," made you $",$totals[$companyID], " this year!";
        }

    I’m not about to do testing, but having skipped an inner join (slow, but mercifully faster than our dear friend ORM) it may operate faster than the inner join. It still uses the ORM (if you must work in an ORM system with a team that would prefer you not use SQL), but not in such a horrid way. It’s also easier to grok, I think.

    • Justin says:

      1st sentence last paragraph; s/operate faster/operate slower;
      inner joins are slow ways to pull data, but the overhead the ORM produces in two calls usually is slower.
      Coffeeeee….

  16. wesen says:

    I like ORMs, and I like SQL. I usually prototype stuff I want to build with the ORM, and on calls that need it, write some custom SQL. I love the ORM for creating and writing data, which in most apps I build is way less than the reading. For reading, I usually return the queried data in a format like JSON, which makes it easy to pinpoint the queries that need to be fast, and rewrite them in SQL. Basically, I have SQL for reading, ORM for writing. I really hate writing and creating data in SQL. Having the ORM also allows me to easily add conversion / validation, which I would have to model in a very similar way if I was writing pure SQL. So in the end, the code duplication and the complexity of the ORM is not one.

    Depending on the ORM, you can also easily create virtual attributes that are GROUPs or query models using JOINs.

    for example: $companies = $this->getAllCompanies(array(‘attributes’ => array(‘id’, “value” => ‘SUM(value)’), ‘group’ => ‘id’);

    not far from the actual SQL, but pretty comfortable, and adapts easily to schema migrations.

    ORM plus:
    - model abstractions are coupled to the database, ease of migration
    - validations, no sql injections, easy conversion / import from other formats
    - easy to write/create data
    - actually easy to optimize performance wise
    - easy to change database vendor
    - easy to experiment

    ORM bad:
    - complexity that sometimes gets in the way
    - doesn’t alleviate the need to know SQL well
    - performance
    - sometimes a pain to debug (not really often though)

    SQL plus:
    - reading queries are a bliss
    - performance

    SQL bad:
    - injections
    - SQL as a programming language is hell (stored procedures!?)
    - evolving a schema can be quite a pain
    - often bound to a vendor

    etc…

  17. Carsten says:

    Your argumentation is invalid.

    In the same sense you could argue that we should all switch back to bikes because driving car is infinitely more dangerous.

    If you choose a tool such as a car or an ORM you do it for a special purpose and benefit while accepting the risk of its usage. You should also be aware of the boundaries when using a tool: you can do a lot of things much better and faster with a car but in some areas you might be better off using the bike.

    • David Rudder says:

      Another example I would use is when people used to say the same thing about Java. You can make horrible design decisions with Java that result in awful performance. And thus, you should use C. Because someone who can’t take the time to learn to write good Java is going to be *awesome* at writing C, huh? And, someone who can’t take the time to learn to write performant ORM is going to rock at SQL :)

  18. Johannes says:

    If you use strategies like Eager Loading (Doctrine2) for related tables, it wouldn’t be such a big problem.
    Furthermore:
    I don’t think it’s a problem when you don’t touch the underlying sql.
    Someone who is able to design a relational model is also aware of problems like this

    • > Someone who is able to design a relational model is also aware of problems like this

      But that person will have experience in SQL. If all you’ve ever been thought includes ORM and abstraction, these problems will not be made clear and _will_ cause performance bottlenecks.

  19. Rodger says:

    Even in SQL, with no ORM, you can do the same loop de loops.
    See the tuning presentation I made.
    http://rodgersnotes.wordpress.com/2010/09/14/oracle-performance-tuning/

1 Pings/Trackbacks for "Bad ORM is infinitely worse than bad SQL"
  1. [...] a response to Mattis Geniar’s post, titled Bad ORM is infinitely worse than bad SQL, where he talks about his bad encounters with ORMs, and how easy it is to mess up performance, and [...]

Leave a Reply

Your email address will not be published. Required fields are marked *

*

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>