Session specifics
Tips and tricks specific to Koha reports but not SQL in general.
There will be some SQL on slides but hopefully nothing too arcane.
I'm no expert! Please chime in with your own ideas in chat.
I want to talk about SQL reports, but specifically the nuances of writing them in Koha. So I will mostly not cover how to write SQL in general, nor the Guided Reports that let you construct a query without writing SQL but are somewhat limited.
There will be SQL on some of my slides, but hopefully it's mostly straightforward.
Finally, I'm not a SQL expert by any means, so feel free to add your own tips and tricks in the chat. You can also add questions but I will answer them at the end.
What's in a table?
See the Reports Library
And Schema schema.koha-community.org/
"Search engine" links to any given table: https://schema.koha-community.org/23_05/tables/%s .html
Let's start with how to find the data we're looking for in Koha's many database tables. The most obvious places to look at are the ones shown right in the reports module; the Reports library and the database schema.
The Reports library is a great starting place, because no matter how specific our needs are, there's a good chance someone has written something similar. Most of my reports start by coping something from the library or from our existing reports.
Secondly, the schema website is a great resource for understanding the structure of Koha's database. It shows what fields are on what tables, what type of dataa field is—like integer, text, timestamp—and shows relationships between tables.
One trick that I use frequently is a Chrome "search engine" that quickly jumps to a particular table in the schema. Browser search engines replace the "%s" token in your search engine URL with whatever you type, so I can type my "sq" keyword and then "items" in my address bar to go to the items table in the schema.
Experimental queries
Peek at what's in a particular table:
SELECT * FROM deletedbiblio
Enumerate the values in a field:
SELECT DISTINCT itype FROM items ORDER BY itype
SELECT DISTINCT itype, COUNT(*) FROM items
GROUP BY itype ORDER BY itype
The schema shows an abstract outline of the database without the actual values from our Koha instance. I commonly use some simple queries to peek at our database contents. The first one is a SELECT * FROM
query on a table I'm interested in. This shows what fields are in a table and what kind of data they contain.
A useful variant is to SELECT DISTINCT
to enumerate all the values a field takes on in our database. We can further refine this by adding a COUNT(*)
to see how many times each value appears. This shows the distribution of values in a field.
To run these quick, simple queries I have one reusable report I call "Eric's Neverending Query" that I edit and run over and over again. This way our reports library isn't cluttered with one-off queries.
deleted*
tables
Deleted biblios, metadata, items, and borrowers (patrons) are not removed from the database permanently but transferred to a table with a "deleted" prefix.
Similarly, past issues (checkouts) and holds (reserves) are moved to tables prefixed with "old".
To be accurate, many reports must factor in these "deleted" or "old" tables.
One peculiarity of Koha is how deleted data is handled. One might assume, when a type of record is deleted, it's removed from the database, but that would be irreverisible and make some reports misleading. Instead, Koha moves deleted database rows from one table to another table with a "deleted" prefix. So when we delete a biblio, it moves from "biblio" to the "deletedbiblio" table. Similarly, once a checkout is returned or a hold fulfilled, rows move to tables prefixed with "old" to indicate they're no longer active.
Why is this important? Without factoring in these deleted or old tables, some reports are inaccurate. Any historical report which isn't looking at the current state of Koha but at how it changed over time would be slightly off.
Use a UNION
in a subquery
All biblios:
SELECT * FROM (
SELECT * FROM biblio
UNION
SELECT * FROM deletedbiblio) b
ORDER BY b.biblionumber
So how do we factor in these deleted tables? We use the SQL UNION
operator. UNION
combines the results of two SELECT
statements with the same shape—the two statements must return the same number of columns and the columns should be of the same type. Of course, the biblio and deletedbiblio tables have the exact same schema, so they work. We put the UNION
inside a subquery by wrapping it in parentheses and then we treat the subquery as the table of all our biblios, deleted or not, which we can SELECT
fields from, JOIN
to other tables, and filter with WHERE
criteria.
Annotate combined tables
This combined items table has a "status" column.
SELECT i.status, i.itemnumber, i.itype FROM
(SELECT "live" as status, i.* FROM items i
UNION
SELECT "deleted" as status, di.* FROM deleteditems di) i
status
itemnumber
itype
live
1
BOOK
deleted
2
BOOK
Sometimes, we might want to join a live and deleted table but keep track of which table our records came from. It's possible to sort of "annotate" the combined table by adding a column that indicates the source of the record. Here's an example where we combine the items and deleteditems tables and add a "status" column that tells us whether the record is from the live or deleted table. Because we added the status column to both the items and deleteditems tables, they still have the same shape, and the UNION
works.
Annual bibs added
SELECT YEAR(datecreated) as Year, COUNT(*) as "Records Added"
FROM (
SELECT biblionumber, datecreated FROM biblio
UNION
SELECT biblionumber, datecreated FROM deletedbiblio
) b
WHERE YEAR(datecreated) BETWEEN 2016 AND YEAR(CURDATE())
GROUP BY YEAR(datecreated)
Here's a concrete example of why factoring in the deleted tables matters. This report shows biblios created in each month of each year since 2016. Without the UNION to the deletedbiblio table, we would by undercounting, because any title which was added since 2016 but subsequently deleted is no longer in the biblios table. This is precisely the sort of reports that must factor in the deleted or old tables to be accurate.
statistics
versus issues
table
Rather than combine issues
& oldissues
for circulation figures, use statistics
.
The issues
tables have no item information & JOIN
ing them to the items
tables does not account for changes in item type or location since the checkout occurred.
Intuitively, when we build a circulation report, we might turn to the "issues" tables that hold old and current checkouts. But the issues don't record item details, so to look at properties like item type, location, branch, etc. we have to join to the items table. But the items table is for an item's current status, not its historical status at the time the checkout occurred .
Consider this example: a faculty member puts an item on reserve which we implement by changing the item type to "2DAYRESERVE" and that changes the circulation rules for the item. Several students check out the item. After the term ends, the title is returned to the main shelves and it's item type reset to "BOOK". A report which uses the "issues" table would show this as a "BOOK" checkout, not a "2DAYRESERVE" checkout.
That's where the "statistics" table comes in. It records the item type, shelving location, branch, collection code, and patron category at the time of the checkout.
Checkouts by branch
SQL by Nicole C. Engard, ByWater Solutions
SELECT branch,
MONTH(datetime) AS month,
YEAR(datetime) AS year,
COUNT(datetime) AS checkouts
FROM statistics
WHERE type LIKE 'issue'
GROUP BY branch, year, month
ORDER BY year, month, branch
Here's an example circulation report using the statistics table which was written by Nicole C. Engard. It shows checkouts per month per branch. We closed one of our branches, but this report still shows that branches historical checkouts, because it uses the statistics table instead of issues.
Translate authorized value codes
Fields that use an authorized value as a controlled vocabularly store the Authorized Value code and not it's human readable label, but we can connect to the authorised_values
table.
Authorized values are a sort of controlled vocabulary in Koha. They're used for item types, locations, various item statuses, and more. For instance, the items "itype" report from earlier does not return descriptive labels but terse codes like "LIBUSEBK" (for "Library Use Book") and "NEWPER" (for "New Periodical"). Often, we would prefer to show human-readable labels in reports and we can do that by joining to the "authorised_values" table.
notforloan
example
SELECT i.notforloan as "NFL Code", av.lib as "Label",
COUNT(*) as "Items"
FROM items i
JOIN authorised_values av ON i.notforloan = av.authorised_value
AND av.category = "NOT_LOAN"
GROUP BY i.notforloan
NFL Code
Label
Items
-3
Repair
10
-2
In Processing
5
0
Available
65,735
Here we link items' "not for loan" values to their descriptions. There are at least two things to be aware of when connecting to the authorised_values table. First, there are two description fields, "lib" and "lib_opac". These are rendered in Koha administration as "Description" and "Description (OPAC)"—they are staff versus public descriptions. Also, either of these descriptions can be empty, so be careful to always GROUP BY
the code and not the description. We could provide a default description in the query using an IFNULL
or CASE
statement.
Secondly, multiple authorized value categories use the same values. A "-1" might mean "on order" for notforloan and mean "long overdue" for itemlost. So when we JOIN
to the authorised_values table, we must JOIN
on both the value and the category. JOIN
ing on multiple conditions isn't something I often think of when writing SQL, so it's worth mentioning.
Public JSON data
Make a report's data easily accessible to external programs by setting it to "Public" and copying its JSON URL. The URLs look like:
https://koha.library.org/cgi-bin/koha/svc/report?id=42
Public means public! Don't include confidential information.
Another nifty feature of Koha reports is the ability to make their data easily available to other applications. When you set a report to "Public", other programs can use the report's JSON URL to obtain its data.
"Public" truly means public, so reports that show confidential information about patrons should not be made public. But much of your Koha data is already public, like bibliographic records, and aggregated statistics about records or patrons are also safe. One could envision building a web dashboard using several public Koha reports to show currently checked out titles, number of registered patrons, recent acquisitions, etc.
Add 948 to OPAC
Report with runtime parameter:
JavaScript in OPACUserJS:
Here's a contrived example of how we might use a public report to add a field that's in our MARC records but not displayed to the page. First, we have a simple report which returns the MARC field's value from the database using the ExtractValue
function. I haven't mentioned runtime parameters yet, but they're super useful—they let us change the criteria of a report each time it's executed. Here, the report looks up metadata for a different bib based on its biblionumber using the parameter.
Next, we use Koha's OPACUserJS setting to add some JavaScript that will run on our public catalog records. This code looks up the biblionumber for the current record from the URL, passes that ID to the report, and if we get any data back from the report we label it and append it to the record display. Note that the report's JSON is always an array of arrays; there's one child array for each row the report returns. This code adds the OCLC holdings information contained in the 948 local use tag to the record without needing to modify Koha's XSLT templates.
Reports keep getting better!
ID columns connect to bulk operations
"Update and run" button
Autocompletion
Graphs
I'm done listing my Koha SQL tips, but I want to take some time to appreciate all the excellent work that's going into making reports better. There have been so many quality of life improvements to reports over the past few years.
If your SELECT
statement includes a table's identifier column, like biblionumber, there are now buttons in the report's output that connect to bulk operations. So we can go directly from a report on items to item bulk modifications, for instance.
In addition to the "save" and "run" buttons, there's now an "update and run" button. This may seem like a small deal, but it saves so much time when refining a report, or using my "neverending query" which I edit and rerun frequently.
Field autocompletion was added recently which can really help you type out reports quickly or remember some obscure field names.
Finally, reports that return numeric values can now generate graphs like a pie chart or time series.
#reports
channel in BWS-Partners Slack
For instance, Koha reports won't allow the words UPDATE
, DELETE
, DROP
, INSERT
, SHOW
, or CREATE
anywhere , even in comments or strings. But folks suggested handy tricks like:
SELECT * FROM biblio
WHERE title LIKE CONCAT('%Rocky Horror Picture Sh','ow%')
I mentioned that I am no SQL expert, but the Koha commmunity has some true SQL wizards. The reports channel of the ByWater Partners Slack organization is great for questions or sharing fancy reports. One comical example recently was different ways to work around how Koha reports censor certain SQL keywords for security purposes. If you want to use a word like "show" or "create" in a WHERE
condition, you can use CONCAT
so that the letters don't appear in sequence.
If you're not a ByWater customer, the main Koha Community Slack also has a reports channel for such discussions.