Data Taxonomies in Asset Management & Banking

David Doherty
Dec 23, 2019

Large companies have been spending time building data governance groups and expanding that effort by hiring “Chief Data Officers” with the remit to ‘fix the data issues’. This is a genuinely difficult problem, and its difficulty shows up in a seemingly simple task: building a high-level data taxonomy. That taxonomy should help scope the Chief Data Officer’s obligations and duties because, after all, everything is data: personal information, financial information, and so on.

However, it’s not that simple. There are no universal taxonomies to use. FINOS is an open source foundation targeting the financial industry, and its ‘objects’ wiki page is empty (as of this writing): https://finosfoundation.atlassian.net/wiki/spaces/FO/pages/807763985/Proposed+Objects.

When you go to a consultancy such as EY or Accenture, they will try to sell you classification consulting services with a pitch like: “We have a taxonomy of hundreds of classifications that we built from industry-wide analysis.” Yet when you ask to see it, they claim it is intellectual property and that they will have to come to your organization to ‘apply’ it correctly.

These are all symptoms of how difficult the problem is. Still, agreeing on a single taxonomy across the industry would be really helpful, even if it isn’t ‘correct’ in everyone’s eyes. Unfortunately, transactions, positions, and trades sound the same to many people’s ears, which creates some confusion. Here is a first-pass taxonomy:

  • Reference Data
    * Issued Instrument Reference Data (e.g. bonds)
    * Issuer Reference Data
    * Contracts Reference Data (e.g. futures contract conventions)
    * Venue Reference Data (e.g. trading venues, clearing venues, etc.)
    * Broker Reference Data
    * Currency Reference Data
    * Holiday Calendars
  • Org Hierarchy
    * Credit Hierarchy
    * Legal Hierarchy
    * People Hierarchy
  • Accounts
    * Internal Trading Accounts
    * Client Trading Accounts
    * Trading Vehicles
  • Market Data
    * Public market data (e.g. SDR, TRACE, etc.)
    * Licensed market data (e.g. Bloomberg, Reuters, etc.)
    * Private market data (e.g. internally generated market data)
    * Derived market data
  • Trades
    * Hypothetical trades (e.g. trades you are thinking about doing)
    * Market Orders (e.g. RFQs, resting orders on a CLOB, etc.)
    * Transactions (e.g. executed orders)
    * TCA (i.e. transaction cost analysis)
  • Positions
    * Traded Positions
    * Accounting Position (i.e. tax basis)
  • Profit/Loss
    * Unrealized P&L
    * Realized P&L
    * Taxable P&L
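
If you want to load a first pass like this into tooling (a data catalog, a quality dashboard), one option is to encode it as a nested tree and flatten it into category paths. The sketch below is a minimal illustration, assuming a plain Python dict in which leaves map to empty dicts; the names `TAXONOMY` and `leaf_paths` are hypothetical. It joins levels with " > " rather than "/" because “Profit/Loss” already contains a slash.

```python
from typing import Iterator

# Hypothetical encoding of the first-pass taxonomy: each key is a category,
# each value is a dict of sub-categories (an empty dict marks a leaf).
TAXONOMY: dict = {
    "Reference Data": {
        "Issued Instrument Reference Data": {},
        "Issuer Reference Data": {},
        "Contracts Reference Data": {},
        "Venue Reference Data": {},
        "Broker Reference Data": {},
        "Currency Reference Data": {},
        "Holiday Calendars": {},
    },
    "Org Hierarchy": {
        "Credit Hierarchy": {},
        "Legal Hierarchy": {},
        "People Hierarchy": {},
    },
    "Accounts": {
        "Internal Trading Accounts": {},
        "Client Trading Accounts": {},
        "Trading Vehicles": {},
    },
    "Market Data": {
        "Public market data": {},
        "Licensed market data": {},
        "Private market data": {},
        "Derived market data": {},
    },
    "Trades": {
        "Hypothetical trades": {},
        "Market Orders": {},
        "Transactions": {},
        "TCA": {},
    },
    "Positions": {
        "Traded Positions": {},
        "Accounting Position": {},
    },
    "Profit/Loss": {
        "Unrealized P&L": {},
        "Realized P&L": {},
        "Taxable P&L": {},
    },
}


def leaf_paths(tree: dict, prefix: tuple = ()) -> Iterator[str]:
    """Yield every leaf category as a 'Parent > Child' path string."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:  # non-empty dict: recurse into sub-categories
            yield from leaf_paths(children, path)
        else:         # empty dict: this is a leaf category
            yield " > ".join(path)


paths = list(leaf_paths(TAXONOMY))
print(len(paths))  # 26
print(paths[0])    # Reference Data > Issued Instrument Reference Data
```

A flat list of paths is what a catalog or dashboard would typically key on; the hard part is deciding which path a given dataset belongs to.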

I knocked that out without much effort. If I get time, I will maintain a longer, more evolved list at the bottom of this article. Even with this short list, a number of questions already arise:

  • The taxable accounting position is calculated from each transaction’s tax lot information, so you can compare the purchase price with the current or sale price. Is that really a position, then, or something derived from trades?
  • “Positions” is a bit vague: am I referring to open positions or to all historical positions?
  • Do I also need a sales hierarchy, or should that be inferred from the org hierarchy?
  • If my broker also appears in the legal hierarchy, am I adding it in two places? If my broker provides liquidity, is it also a venue?
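
The first two questions become clearer in code. Below is a minimal sketch of deriving an accounting position from transactions via tax lots, assuming first-in, first-out (FIFO) lot relief (specific identification and average cost are common alternatives), a long-only book, and no fees. The `Transaction`, `Lot`, `apply_fifo`, and `unrealized_pnl` names are hypothetical. Notice that the “position” is nothing more than the open lots left after replaying the transaction history, and realized P&L falls out of the same pass.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Transaction:
    qty: int      # positive = buy, negative = sell (hypothetical convention)
    price: float


@dataclass
class Lot:
    qty: int
    cost: float   # purchase price per unit, i.e. the lot's tax basis


def apply_fifo(transactions: list[Transaction]) -> tuple[list[Lot], float]:
    """Replay transactions, relieving lots first-in, first-out.

    Returns the open lots (the accounting position) and realized P&L.
    Long-only: a sale larger than the open quantity raises ValueError.
    """
    lots: deque = deque()
    realized = 0.0
    for tx in transactions:
        if tx.qty > 0:
            lots.append(Lot(tx.qty, tx.price))  # each buy opens a new tax lot
            continue
        to_sell = -tx.qty
        while to_sell > 0:
            if not lots:
                raise ValueError("sale exceeds open position (shorts not modeled)")
            lot = lots[0]                        # oldest lot is relieved first
            matched = min(lot.qty, to_sell)
            realized += matched * (tx.price - lot.cost)
            lot.qty -= matched
            to_sell -= matched
            if lot.qty == 0:
                lots.popleft()
    return list(lots), realized


def unrealized_pnl(lots: list[Lot], current_price: float) -> float:
    """Compare each open lot's purchase price with the current price."""
    return sum(lot.qty * (current_price - lot.cost) for lot in lots)


history = [Transaction(100, 10.0), Transaction(50, 12.0), Transaction(-120, 15.0)]
open_lots, realized = apply_fifo(history)
print(open_lots)                        # [Lot(qty=30, cost=12.0)]
print(realized)                         # 560.0
print(unrealized_pnl(open_lots, 14.0))  # 60.0
```

Whether “Positions” means open positions or all historical positions then comes down to what you persist: `open_lots` alone, or the transaction history it was replayed from.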

Hopefully that conveys the nuance. We can drive the point home by considering the Chief Data Officer’s first 100 days. What do you do? Probably create a taxonomy and then build a data quality dashboard, right? That way you’ll know where to focus your effort.
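
As an illustration of that first dashboard, here is a minimal sketch that scores each taxonomy category by field completeness. It assumes every record already carries a `category` tag and that someone has decided which fields are required per category; `REQUIRED_FIELDS`, the field names, and the sample records are all hypothetical. Completeness is only one possible metric; timeliness, accuracy, and consistency are others.

```python
from collections import defaultdict

# Hypothetical: which fields must be populated for each taxonomy category.
REQUIRED_FIELDS = {
    "Broker Reference Data": ["legal_entity_id", "name", "country"],
    "Currency Reference Data": ["iso_code", "name"],
}


def completeness_by_category(records: list[dict]) -> dict[str, float]:
    """Share of required fields that are populated, per category."""
    filled: dict[str, int] = defaultdict(int)
    expected: dict[str, int] = defaultdict(int)
    for rec in records:
        cat = rec["category"]
        for field in REQUIRED_FIELDS.get(cat, []):
            expected[cat] += 1
            if rec.get(field) not in (None, ""):  # treat None/blank as missing
                filled[cat] += 1
    return {cat: filled[cat] / expected[cat] for cat in expected}


records = [
    {"category": "Broker Reference Data", "legal_entity_id": "LEI-A",
     "name": "Broker A", "country": "US"},
    {"category": "Broker Reference Data", "legal_entity_id": None,
     "name": "Broker B", "country": ""},
    {"category": "Currency Reference Data", "iso_code": "USD", "name": "US Dollar"},
]

scores = completeness_by_category(records)
# Worst-scoring categories first: that is where to focus effort.
for cat, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{cat}: {score:.0%}")
# Broker Reference Data: 67%
# Currency Reference Data: 100%
```

The score is only as meaningful as the `category` tags themselves, which is where the trouble starts.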

Suppose the CDO turns to brokers because they have been flagged as a problem area. They find a myriad of challenges behind this one symptom. Brokers may not be modeled separately from other data elements at all. In one organization, a broker may be an attribute on the legal hierarchy (e.g. a legal entity flagged isBroker=True). In another, each broker may be a ‘venue’, with people separating brokers from centralized trading venues (Tradeweb, MarketAxess, etc.) via tribal knowledge. A third organization may have brokers separated out cleanly. Because every organization comes at the problem from a different angle, it’s not easy for the industry to agree on terminology.
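
The three broker models can be sketched side by side. This is a hypothetical illustration, not any firm’s actual schema: the class and function names are made up, `is_broker` mirrors the `isBroker=True` flag from the text, and `CENTRALIZED_VENUES` stands in for the tribal knowledge. The point is that the same question, “who are our brokers?”, needs a different query in each organization, and in the second one the answer silently depends on a hand-maintained list.

```python
from dataclasses import dataclass


# Organization 1: a broker is a flag on a legal entity in the legal hierarchy.
@dataclass
class LegalEntity:
    name: str
    is_broker: bool = False


def brokers_org1(entities: list[LegalEntity]) -> set[str]:
    return {e.name for e in entities if e.is_broker}


# Organization 2: every broker is modeled as a venue; separating brokers from
# centralized venues relies on a hand-maintained set ("tribal knowledge").
CENTRALIZED_VENUES = {"Tradeweb", "MarketAxess"}


def brokers_org2(venues: list[str]) -> set[str]:
    return {v for v in venues if v not in CENTRALIZED_VENUES}


# Organization 3: brokers are a first-class entity of their own.
@dataclass
class Broker:
    name: str


def brokers_org3(brokers: list[Broker]) -> set[str]:
    return {b.name for b in brokers}


print(brokers_org1([LegalEntity("Broker X", is_broker=True), LegalEntity("Our Fund LP")]))
print(brokers_org2(["Tradeweb", "MarketAxess", "Broker X"]))
print(brokers_org3([Broker("Broker X")]))
# Each prints {'Broker X'}.

# The fragility in organization 2: a newly onboarded venue that nobody added
# to CENTRALIZED_VENUES is silently reported as a broker.
print(brokers_org2(["Tradeweb", "New Venue"]))  # {'New Venue'}
```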

Another example is derived data. A swap rate can be as simple as a bond yield plus a swap spread, or it could be derived from a custom proprietary quant library. A fairly simple organization may view market data as simply public or licensed; that may fit all of its needs and be a good taxonomy choice. A sophisticated derivatives shop would probably need the concept of derived market data. For them, derived market data may be a valid classification because the data is so far removed from the source that it can be considered proprietary and could perhaps be [re]sold. From one perspective a swap rate is licensed market data; from another, it’s derived market data.
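
To illustrate how one number can land in two categories, here is a minimal sketch that derives a swap rate as a bond yield plus a swap spread (the simple case from the text) and then classifies it under two taxonomies, one with a “derived” bucket and one without. The provenance labels, the `has_derived_category` switch, the precedence rule used by the simpler taxonomy, and the input values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MarketDataPoint:
    name: str
    value: float                       # in percent
    source: str                        # "public", "licensed", "private" or "derived"
    inputs: list["MarketDataPoint"] = field(default_factory=list)


def derive_swap_rate(bond_yield: MarketDataPoint,
                     swap_spread: MarketDataPoint) -> MarketDataPoint:
    # The swap spread is conventionally the swap rate minus the government
    # bond yield of the same tenor, so the swap rate is their sum.
    return MarketDataPoint("swap_rate", bond_yield.value + swap_spread.value,
                           "derived", [bond_yield, swap_spread])


def classify(point: MarketDataPoint, has_derived_category: bool) -> str:
    if point.source != "derived":
        return point.source
    if has_derived_category:
        return "derived"
    # No "derived" bucket: inherit the most restrictive input category
    # (hypothetical precedence: private > licensed > public).
    input_classes = {classify(i, has_derived_category) for i in point.inputs}
    for label in ("private", "licensed", "public"):
        if label in input_classes:
            return label
    raise ValueError(f"cannot classify {point.name}")


bond_yield = MarketDataPoint("govt_10y_yield", 1.50, "public")
swap_spread = MarketDataPoint("swap_spread_10y", 0.25, "licensed")
swap_rate = derive_swap_rate(bond_yield, swap_spread)

print(swap_rate.value)                                  # 1.75
print(classify(swap_rate, has_derived_category=False))  # licensed
print(classify(swap_rate, has_derived_category=True))   # derived
```

The value is identical in both cases; only the taxonomy choice changes where it is filed.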

Before we close, let’s consider TCA: how much did it cost to execute a transaction? There are broker fees, which you probably want to map to your broker hierarchy (if it’s clean); there are potentially venue fees; there are ongoing costs such as the cost of financing margin; there is the bid/offer spread; and there is potentially the impact of your trade moving the market (if you’re lucky enough to be that large). This ‘TCA data’ could simply be calculated on demand if you have all the right market data, transaction data, broker data, venue data, etc. So if you can build that picture on demand, should TCA have its own place in your taxonomy? Maybe, maybe not.
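
Here is a minimal sketch of computing TCA on demand from data that already lives in other categories (transactions, market data, broker and venue fees). It assumes a simple decomposition that is common but by no means the only one: the spread cost is half the quoted bid/offer spread at order arrival, and whatever slippage versus the arrival mid remains is attributed to market impact. The `Execution` fields, the `tca` function, and the numbers are hypothetical, and financing cost is passed in as a precomputed amount.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    side: int            # +1 buy, -1 sell
    qty: float
    exec_price: float    # average fill price (transaction data)
    arrival_bid: float   # quotes when the order was sent (market data)
    arrival_ask: float
    broker_fee: float    # currency amount (broker data)
    venue_fee: float     # currency amount (venue data)


def tca(ex: Execution, financing_cost: float = 0.0) -> dict[str, float]:
    """Break total execution cost into components, in currency units."""
    mid = (ex.arrival_bid + ex.arrival_ask) / 2
    half_spread = (ex.arrival_ask - ex.arrival_bid) / 2
    # Slippage vs arrival mid; positive means the fill was worse than mid.
    slippage = ex.side * (ex.exec_price - mid) * ex.qty
    spread_cost = half_spread * ex.qty
    breakdown = {
        "broker_fee": ex.broker_fee,
        "venue_fee": ex.venue_fee,
        "financing": financing_cost,
        "spread": spread_cost,
        # Residual slippage; can be negative if filled inside the spread.
        "impact": slippage - spread_cost,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


ex = Execution(side=+1, qty=1000, exec_price=100.25, arrival_bid=99.875,
               arrival_ask=100.125, broker_fee=50.0, venue_fee=10.0)
print(tca(ex, financing_cost=5.0))
# {'broker_fee': 50.0, 'venue_fee': 10.0, 'financing': 5.0, 'spread': 125.0,
#  'impact': 125.0, 'total': 315.0}
```

Every input here already sits in another branch of the taxonomy, which is the argument for treating TCA as a computed view rather than a stored category.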

So what’s the point of this article? Simply that you shouldn’t take anything for granted, not even a ‘simple’ classification. The way you define your data types will tend to be biased toward how you store the data in your systems rather than how you use it. This is what makes a Chief Data Officer’s job difficult, and why you should assume you are probably making choices based on your org’s view rather than an industry view.


David Doherty

I write about Fintech, its past & future, leveraging 20+ years of experience in leadership roles at large Fintechs