Koha SQL Reports

GiftED Community Session

Eric Phetteplace ephetteplace@cca.edu, Systems Librarian, California College of the Arts

These slides are available at phette23.github.io/koha-gifted-sql/.

Session specifics

Tips and tricks specific to Koha reports but not SQL in general.

There will be some SQL on slides but hopefully nothing too arcane.

I'm no expert! Please chime in with your own ideas in chat.

What's in a table?

See the Reports Library

And Schema schema.koha-community.org/

"Search engine" links to any given table: https://schema.koha-community.org/23_05/tables/%s.html

Let's start with how to find the data we're looking for in Koha's many database tables. The most obvious places to look at are the ones shown right in the reports module; the Reports library and the database schema. The Reports library is a great starting place, because no matter how specific our needs are, there's a good chance someone has written something similar. Most of my reports start by coping something from the library or from our existing reports. Secondly, the schema website is a great resource for understanding the structure of Koha's database. It shows what fields are on what tables, what type of dataa field is—like integer, text, timestamp—and shows relationships between tables. One trick that I use frequently is a Chrome "search engine" that quickly jumps to a particular table in the schema. Browser search engines replace the "%s" token in your search engine URL with whatever you type, so I can type my "sq" keyword and then "items" in my address bar to go to the items table in the schema.

Experimental queries

Peek at what's in a particular table:

SELECT * FROM deletedbiblio

Enumerate the values in a field:

SELECT DISTINCT itype FROM items ORDER BY itype

SELECT DISTINCT itype, COUNT(*) FROM items
GROUP BY itype ORDER BY itype

The schema shows an abstract outline of the database without the actual values from our Koha instance. I commonly use some simple queries to peek at our database contents. The first one is a SELECT * FROM query on a table I'm interested in. This shows what fields are in a table and what kind of data they contain. A useful variant is to SELECT DISTINCT to enumerate all the values a field takes on in our database. We can further refine this by adding a COUNT(*) to see how many times each value appears. This shows the distribution of values in a field. To run these quick, simple queries I have one reusable report I call "Eric's Neverending Query" that I edit and run over and over again. This way our reports library isn't cluttered with one-off queries.

`deleted*` tables

Deleted biblios, metadata, items, and borrowers (patrons) are not removed from the database permanently but transferred to a table with a "deleted" prefix.

Similarly, past issues (checkouts) and holds (reserves) are moved to tables prefixed with "old".

To be accurate, many reports must factor in these "deleted" or "old" tables.

Use a `UNION` in a subquery

All biblios:

SELECT * FROM (
	SELECT * FROM biblio
	UNION
	SELECT * FROM deletedbiblio) b
ORDER BY b.biblionumber

So how do we factor in these deleted tables? We use the SQL UNION operator. UNION combines the results of two SELECT statements with the same shape—the two statements must return the same number of columns and the columns should be of the same type. Of course, the biblio and deletedbiblio tables have the exact same schema, so they work. We put the UNION inside a subquery by wrapping it in parentheses and then we treat the subquery as the table of all our biblios, deleted or not, which we can SELECT fields from, JOIN to other tables, and filter with WHERE criteria.

Annotate combined tables

This combined items table has a "status" column.

SELECT i.status, i.itemnumber, i.itype FROM
	(SELECT "live" as status, i.* FROM items i
	UNION
	SELECT "deleted" as status, di.* FROM deleteditems di) i

status	itemnumber	itype
live	1	BOOK
deleted	2	BOOK

Sometimes, we might want to join a live and deleted table but keep track of which table our records came from. It's possible to sort of "annotate" the combined table by adding a column that indicates the source of the record. Here's an example where we combine the items and deleteditems tables and add a "status" column that tells us whether the record is from the live or deleted table. Because we added the status column to both the items and deleteditems tables, they still have the same shape, and the UNION works.

Annual bibs added

SELECT YEAR(datecreated) as Year, COUNT(*) as "Records Added"
FROM (
  SELECT biblionumber, datecreated FROM biblio
  UNION
  SELECT biblionumber, datecreated FROM deletedbiblio
) b
WHERE YEAR(datecreated) BETWEEN 2016 AND YEAR(CURDATE())
GROUP BY YEAR(datecreated)

`statistics` versus `issues` table

Rather than combine issues & oldissues for circulation figures, use statistics.

The issues tables have no item information & JOINing them to the items tables does not account for changes in item type or location since the checkout occurred.

Intuitively, when we build a circulation report, we might turn to the "issues" tables that hold old and current checkouts. But the issues don't record item details, so to look at properties like item type, location, branch, etc. we have to join to the items table. But the items table is for an item's current status, not its historical status at the time the checkout occurred. Consider this example: a faculty member puts an item on reserve which we implement by changing the item type to "2DAYRESERVE" and that changes the circulation rules for the item. Several students check out the item. After the term ends, the title is returned to the main shelves and it's item type reset to "BOOK". A report which uses the "issues" table would show this as a "BOOK" checkout, not a "2DAYRESERVE" checkout. That's where the "statistics" table comes in. It records the item type, shelving location, branch, collection code, and patron category at the time of the checkout.

Checkouts by branch

SQL by Nicole C. Engard, ByWater Solutions

SELECT branch,
MONTH(datetime) AS month,
YEAR(datetime) AS year,
COUNT(datetime) AS checkouts
FROM statistics
WHERE type LIKE 'issue'
GROUP BY branch, year, month
ORDER BY year, month, branch

Translate authorized value codes

Fields that use an authorized value as a controlled vocabularly store the Authorized Value code and not it's human readable label, but we can connect to the authorised_values table.

`notforloan` example

SELECT i.notforloan as "NFL Code", av.lib as "Label",
COUNT(*) as "Items"
FROM items i
JOIN authorised_values av ON i.notforloan = av.authorised_value
  AND av.category = "NOT_LOAN"
GROUP BY i.notforloan

NFL Code	Label	Items
-3	Repair	10
-2	In Processing	5
0	Available	65,735

Here we link items' "not for loan" values to their descriptions. There are at least two things to be aware of when connecting to the authorised_values table. First, there are two description fields, "lib" and "lib_opac". These are rendered in Koha administration as "Description" and "Description (OPAC)"—they are staff versus public descriptions. Also, either of these descriptions can be empty, so be careful to always GROUP BY the code and not the description. We could provide a default description in the query using an IFNULL or CASE statement. Secondly, multiple authorized value categories use the same values. A "-1" might mean "on order" for notforloan and mean "long overdue" for itemlost. So when we JOIN to the authorised_values table, we must JOIN on both the value and the category. JOINing on multiple conditions isn't something I often think of when writing SQL, so it's worth mentioning.

Public JSON data

Make a report's data easily accessible to external programs by setting it to "Public" and copying its JSON URL. The URLs look like:

https://koha.library.org/cgi-bin/koha/svc/report?id=42

Public means public! Don't include confidential information.

Add 948 to OPAC

Report with runtime parameter:

JavaScript in OPACUserJS:

Here's a contrived example of how we might use a public report to add a field that's in our MARC records but not displayed to the page. First, we have a simple report which returns the MARC field's value from the database using the ExtractValue function. I haven't mentioned runtime parameters yet, but they're super useful—they let us change the criteria of a report each time it's executed. Here, the report looks up metadata for a different bib based on its biblionumber using the parameter. Next, we use Koha's OPACUserJS setting to add some JavaScript that will run on our public catalog records. This code looks up the biblionumber for the current record from the URL, passes that ID to the report, and if we get any data back from the report we label it and append it to the record display. Note that the report's JSON is always an array of arrays; there's one child array for each row the report returns. This code adds the OCLC holdings information contained in the 948 local use tag to the record without needing to modify Koha's XSLT templates.

Reports keep getting better!

ID columns connect to bulk operations

"Update and run" button

Autocompletion

Graphs

a pie chart of items per library branch showing 81.3% of items are in SF, 15.8% are in Oak, and 2.9% in MATLIB.

I'm done listing my Koha SQL tips, but I want to take some time to appreciate all the excellent work that's going into making reports better. There have been so many quality of life improvements to reports over the past few years. If your SELECT statement includes a table's identifier column, like biblionumber, there are now buttons in the report's output that connect to bulk operations. So we can go directly from a report on items to item bulk modifications, for instance. In addition to the "save" and "run" buttons, there's now an "update and run" button. This may seem like a small deal, but it saves so much time when refining a report, or using my "neverending query" which I edit and rerun frequently. Field autocompletion was added recently which can really help you type out reports quickly or remember some obscure field names. Finally, reports that return numeric values can now generate graphs like a pie chart or time series.

`#reports` channel in BWS-Partners Slack

For instance, Koha reports won't allow the words UPDATE, DELETE, DROP, INSERT, SHOW, or CREATE anywhere, even in comments or strings. But folks suggested handy tricks like:

SELECT * FROM biblio
WHERE title LIKE CONCAT('%Rocky Horror Picture Sh','ow%')

I mentioned that I am no SQL expert, but the Koha commmunity has some true SQL wizards. The reports channel of the ByWater Partners Slack organization is great for questions or sharing fancy reports. One comical example recently was different ways to work around how Koha reports censor certain SQL keywords for security purposes. If you want to use a word like "show" or "create" in a WHERE condition, you can use CONCAT so that the letters don't appear in sequence. If you're not a ByWater customer, the main Koha Community Slack also has a reports channel for such discussions.

Questions?

Thanks for coming! These slides are available at phette23.github.io/koha-gifted-sql/.

Eric Phetteplace

Systems Librarian

California College of the Arts

libraries.cca.edu

@phette23@digipres.club

phette.net

@phette23 on GitHub