The Beatles once said that love is all you need, but they never worked with Ed-Fi’s API to drive meaningful outcomes for learners everywhere.
Synonymous with the Ed-Fi data standard, the API is central to adoption and interoperability, yet we tend to gravitate to Ed-Fi’s ODS database when it’s time to consider analytical questions. This was a sensible solution for a long time: most agencies are comfortable with databases, already have the tools, and know how to query them to answer their most pressing student questions.
There are three big downsides to an ODS-sourced analytical solution:
- It requires database access in addition to API access
- The ODS is highly normalized, making it difficult to query (illustrated in the sketch after this list)
- The ODS structure is not a defined data standard like the API
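To make the second downside concrete, here is a minimal sketch of what answering even a simple question (“which students are currently enrolled at each school?”) looks like against the normalized ODS. The table and column names follow the public Ed-Fi ODS schema but may vary by version, and the connection details are placeholders:

```python
# A minimal sketch, assuming an Ed-Fi ODS on SQL Server reached via pyodbc.
# Table/column names follow the public Ed-Fi ODS schema; versions may differ.
import pyodbc

# Four joins just to list current enrollments with a school name and grade:
ENROLLMENT_SQL = """
SELECT s.StudentUniqueId,
       eo.NameOfInstitution,
       d.CodeValue AS EntryGradeLevel
FROM   edfi.StudentSchoolAssociation ssa
JOIN   edfi.Student s  ON s.StudentUSI = ssa.StudentUSI
JOIN   edfi.School sch ON sch.SchoolId = ssa.SchoolId
JOIN   edfi.EducationOrganization eo
       ON eo.EducationOrganizationId = sch.SchoolId
JOIN   edfi.Descriptor d ON d.DescriptorId = ssa.EntryGradeLevelDescriptorId
WHERE  ssa.ExitWithdrawDate IS NULL
"""

conn = pyodbc.connect("DSN=EdFiOds")  # placeholder connection string
rows = conn.cursor().execute(ENROLLMENT_SQL).fetchall()
```

The API, by contrast, serves the same information as a single paged studentSchoolAssociations resource, as sketched later in this post.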
An ODS-driven workflow looks something like this:
* It’s tempting to build analytics directly against the ODS to save a step, but that is not recommended due to the potential performance impact on the operational store.
The Ed-Fi API Strategy
What if we changed the workflow to fix the downsides mentioned above? What if we used the Ed-Fi API for all of the data-out needs in addition to the data-collection needs? This new strategy could look something like this:
In this second workflow, the ODS is simply an internal appliance of the API, and no one needs external access to it. The API alone enables interoperability with source systems and all of the downstream use cases.
The API-centric workflow is also helpful for any district using a managed service provider for its Ed-Fi implementation. An API-driven workflow improves scalability and data security: direct database access is hard to predict and optimize, which makes performance and scalability difficult to guarantee.
How does the Ed-Fi API for analytics work?
Analytics should always be built on top of a purpose-built data store. Each data store can include the appropriate data, organized to answer the questions being asked. This is why Ed-Fi supports the Analytics Middle Tier (AMT) to describe what the data might look like for the core use cases. If we look at the diagrams above, the change in workflow is actually very minor. Instructional leaders, administrators, and researchers can still have their data organized for the tools they are familiar with. The primary difference is how you configure the data store and manage the needed transformations from the API instead of the ODS. Modern data architectures make this much easier, with more robust tools that are well equipped to handle this new way of thinking.
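As a concrete starting point, here is a minimal sketch of the API side of that workflow: authenticating with OAuth2 client credentials and paging through a resource. The host, credentials, and page size are illustrative assumptions; the route shapes follow recent Ed-Fi ODS/API conventions and may vary by version.

```python
# A minimal sketch of pulling one resource from an Ed-Fi API, assuming an
# ODS/API v5.x-style deployment; URLs and page limits vary by version/host.
import requests

BASE = "https://example.edu/edfi"   # hypothetical host
KEY, SECRET = "myKey", "mySecret"   # hypothetical credentials

def get_token():
    # Standard OAuth2 client-credentials exchange used by the Ed-Fi API
    resp = requests.post(f"{BASE}/oauth/token",
                         data={"grant_type": "client_credentials"},
                         auth=(KEY, SECRET))
    resp.raise_for_status()
    return resp.json()["access_token"]

def fetch_all(resource, page_size=500):
    # Page through the resource with offset/limit until a short page arrives
    headers = {"Authorization": f"Bearer {get_token()}"}
    offset, records = 0, []
    while True:
        resp = requests.get(f"{BASE}/data/v3/ed-fi/{resource}",
                            headers=headers,
                            params={"offset": offset, "limit": page_size})
        resp.raise_for_status()
        page = resp.json()
        records.extend(page)
        if len(page) < page_size:
            return records
        offset += page_size

students = fetch_all("students")
```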
There is much more detail in Stephen Fuqua’s data lakes article on how to approach this. Modern data lake architecture provides tools to orchestrate loading data lakes from APIs, as well as refining the data to make the information readily available through common business intelligence tools. Modern data architectures also make new manipulations possible, such as maintaining multiple copies of the data tailored to each use case. For example, the API could drive an AMT data set ready for instructional leaders as well as an anonymized data set ready for researchers.
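For instance, a landing step can write each API extract into lake storage partitioned by resource and date, and a second, de-identified copy can be derived for researchers. The paths, field names, and the anonymization rule here are illustrative assumptions:

```python
# A minimal sketch of landing API output in a data lake as partitioned
# JSON Lines files; paths and the anonymization rule are assumptions.
import json, datetime, pathlib

def land(resource, records, lake_root="lake/raw"):
    # Partition by resource and load date so downstream refinement can
    # reprocess a single day without rescanning everything.
    day = datetime.date.today().isoformat()
    path = pathlib.Path(lake_root) / resource / f"date={day}"
    path.mkdir(parents=True, exist_ok=True)
    with open(path / "part-000.jsonl", "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

def anonymize(records, drop=("firstName", "lastSurname", "birthDate")):
    # One lake, many copies: a de-identified variant for researchers.
    return [{k: v for k, v in r.items() if k not in drop} for r in records]
```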
How do skillsets compare?
One of the main concerns about an API-focused approach to analytics is the skillset we’ve already invested in. Most analysts can continue to operate on top of an API-sourced framework as well as, or better than, they did with older approaches. The difference between the strategies largely resides with the data engineering needs and IT requirements.
Once you land the data in your data lake or data warehouse, your BI tools will function similarly. This means that the majority of the analytics tools and resources you have in place can operate off an API-driven solution just as easily as a database-sourced solution.
The main difference is how your data lake/warehouse gets set up and who maintains it. The decision to configure a data lake and source it from the API will require a different set of tools. Your vendor or IT staff may be familiar with the tools, but they do differ a little. Depending on your cloud or on-premises environment, you will need some data orchestration to read the API and save the results into your data lake/warehouse environment.
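As one illustration, an orchestrator such as Apache Airflow (one of many options) can schedule the extract-and-land step. The pipeline module here is hypothetical, standing in for the fetch and landing sketches above:

```python
# A minimal orchestration sketch using Apache Airflow 2.4+; any scheduler
# with retry and alerting support would serve the same role.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical module containing the fetch_all/land helpers sketched earlier
from edfi_pipeline import fetch_all, land

def extract_and_land():
    records = fetch_all("studentSchoolAssociations")
    land("studentSchoolAssociations", records)

with DAG(dag_id="edfi_api_to_lake",
         start_date=datetime(2024, 1, 1),
         schedule="@daily",
         catchup=False) as dag:
    PythonOperator(task_id="extract_and_land",
                   python_callable=extract_and_land)
```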
The API-sourced approach is best when leveraging cloud frameworks, although on-premises solutions do exist. Purpose-built solutions are also growing in the community. Many of the managed service providers can also assist with analytical platforms alongside their API environment hosting capabilities.
What does the API mean for Ed-Fi’s Analytics Middle Tier?
The Analytics Middle Tier (AMT) is a set of simplified data collections, implemented as database views, that describe the use cases driving the Ed-Fi data standard. AMT is sourced from Ed-Fi’s Operational Data Store (ODS), and the Alliance will continue to maintain it as the database-sourced method for populating analytical data. This method of sourcing data is still popular within the community.
Ed-Fi is also working on a project to describe those same AMT use cases using Ed-Fi’s API. This project reaffirms that the API and the data standard can drive all of the use cases already outlined in AMT.
If you are thinking about implementing AMT loaded via the API, you will quickly realize that the approach depends on your environment. You may be using next-generation data warehouse tools from Google, Microsoft, Amazon, Snowflake, or others. Each provider has specific tools to make storing data from an API easy. Ed-Fi’s reference architecture for API-to-AMT uses data lake style storage so that it can work on premises or on any cloud provider. What matters is how you map the API’s output to the data standard’s use cases.
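Here is a sketch of that mapping, flattening a nested API resource into a row shaped like AMT’s StudentSchoolDim. The column names are simplified assumptions, not the reference architecture’s actual code, and fetch_all is the hypothetical helper from the earlier sketch:

```python
# A minimal sketch of mapping API output onto an AMT-style collection.
def to_student_school_dim(ssa):
    # Ed-Fi API resources carry references as nested objects; the mapping
    # step flattens them into the analytics-friendly shape AMT describes.
    return {
        "StudentKey": ssa["studentReference"]["studentUniqueId"],
        "SchoolKey": ssa["schoolReference"]["schoolId"],
        "EnrollmentDateKey": ssa["entryDate"],
        # Simplifying assumption: no exitWithdrawDate means still enrolled
        "IsEnrolled": "exitWithdrawDate" not in ssa,
    }

rows = [to_student_school_dim(r)
        for r in fetch_all("studentSchoolAssociations")]
```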
Final Thoughts
The cloud providers have unlocked data storage in a new way. We can now more directly access APIs to collect data and run complex analytics on it in extremely performant ways. These capabilities have unleashed a new wave of education-focused analytics providers who are revolutionizing what is possible with Ed-Fi data. Whether a district takes this in-house or leverages an existing analytics provider, the pipeline of education data is available. We’re living in an exciting time for us data nerds out there.