Data schema, Phase 1

a brown corkboard littered with random photos, notes, and clippings held in place with colorful plastic pins, with red yarn strung between the pins connecting different articles. In the middle, a clipping says "seek for the truth."
Image by Freepik

When I was at ASECS 2024, I got a question after my presentation about how I was organizing my data. I thought he meant how I was setting everything up overall (for example, what schema I was going to use), and I only half answered him because I don’t yet entirely know what I’m doing for the overall project. I’ve since realized that I made the question far more complicated than it was. I also realized that I hadn’t talked about this here, and I probably should have. I feel very strongly that transparency about the what, the how, and the why is important, a belief reinforced by an excellent presentation from the scholars behind the Women’s Print History Project on the need for DH documentation that discusses the “why” as well as the “how.”

As I’ve mentioned before, this is a project with multiple phases. Phase One involved using OCLC WorldCat and the ESTC to locate individual copies and, potentially, previously missed or spurious publication information. I started with the excellent list of Lennox publications in Susan Carlile’s Charlotte Lennox: An Independent Mind, which gave me a huge head start. That’s where the maps on this site came from: an early effort to identify locations and see where copies clustered. The limits of Google Maps, though, particularly on the number of points per map, made the maps less useful than I’d hoped. I may yet go back, dump all of it into something like Tableau, and build a master map, but probably not until I’ve finished cleaning up the data set.

There was then an interim stage I consider Phase 1.5, in which I took the data out of Google Maps and put it into an Excel spreadsheet. The data as it stood was in the following categories:

  • ID #: this helped differentiate records so I could move them into a relational database eventually
  • Longitude: location info courtesy Google Maps
  • Latitude: location info courtesy Google Maps
  • Library Name: the name of the library with the holdings (preferably not just the special collections dept)
  • Description: intended to describe something about the book, such as edition # or suspected piracy, not the library
  • Affiliation: what institution is the library affiliated with, if any
  • Designation: what sort of library/institution it is; this is the muddiest category, because it sometimes describes the library (if unaffiliated) and sometimes the institution (if affiliated)
  • Pub Year: publication year
  • Location: where was the book published
  • Bookseller: who was the bookseller/publisher
  • Title: Title of the book/periodical
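For anyone doing something similar, these Phase 1.5 categories map fairly directly onto a typed record. What follows is a hypothetical Python sketch, not a description of how this project’s spreadsheet is actually processed; it assumes the worksheet has been exported to CSV with the column headers spelled exactly as in the list above, and the class and function names are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class HoldingRecord:
    """One row of the Phase 1.5 worksheet (field names are hypothetical)."""
    record_id: int              # "ID #": keeps records distinct for a later relational import
    longitude: float            # location info from Google Maps
    latitude: float             # location info from Google Maps
    library_name: str
    description: str            # about the book (edition, suspected piracy), not the library
    affiliation: Optional[str]  # None when the library is unaffiliated
    designation: str
    pub_year: Optional[int]     # None when the year is uncertain or non-numeric
    location: str               # place of publication
    bookseller: str
    title: str


def parse_row(row: dict) -> HoldingRecord:
    """Convert one CSV row, keyed by the assumed column headers, into a typed record."""
    year = row["Pub Year"].strip()
    return HoldingRecord(
        record_id=int(row["ID #"]),
        longitude=float(row["Longitude"]),
        latitude=float(row["Latitude"]),
        library_name=row["Library Name"].strip(),
        description=row["Description"].strip(),
        affiliation=row["Affiliation"].strip() or None,  # empty cell becomes None
        designation=row["Designation"].strip(),
        pub_year=int(year) if year.isdigit() else None,  # e.g. "[1758?]" becomes None
        location=row["Location"].strip(),
        bookseller=row["Bookseller"].strip(),
        title=row["Title"].strip(),
    )


def load_records(csv_text: str) -> list:
    """Parse the full CSV export into a list of HoldingRecord objects."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(csv_text))]
```

Bracketed or uncertain years come through as None rather than crashing the import, which leaves them easy to find and fix by hand later.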

Now, as to why I chose to keep track of all this stuff, particularly about the institutions, part of it was because I thought it would be interesting to track this and see how the spread went. Part of it was because I thought it would help me track down funding to visit collections. And the last part of it, I think, is because I didn’t want to go back and add it in later in case I regretted not having it to begin with. It’s easier to cut info than to go back and add it in across all the records.

I’ve held to that philosophy as I’ve gone: I’d rather record all the information and then some, even if part of it ends up unnecessary, than record less than I’ll turn out to want. There is naturally a point of diminishing returns here, but it’s balanced by the realization that, for most of these copies, I will get only one bite at the apple. If I miss something or realize later what I needed, I may be able to get it by asking the librarians, but I likely won’t get a second visit to see it for myself. This is one reason my local copies are my test cases: they let me settle what information I’m recording, and why, before I start building my photo album of all the special collections rooms I get to visit.

I hope this proves helpful to someone — I’m happy to provide more information or answer questions as needed. I’ll move toward describing Phase Two in an upcoming post.

State of the Project, September 2023

Greetings, gentle readers! September finds me pushing forward still, albeit a bit more slowly due to general life issues and a lot of time dedicated to sorting through the works housed at the Beinecke Rare Book and Manuscript Library at Yale. Needless to say, we’re going to have to spend some time there in the future.

I currently have 30 libraries verified, having removed two or three that ostensibly held only a couple of items at most; those items turned out either not to be there at all, to belong to an affiliated library in the same system, or to be online or microfilm versions of the work. I am looking at applying for the Lewis Walpole Library Fellowship this year, along with perhaps a local fellowship that might cover some gas money for libraries near to hand.

Number of libraries confirmed: 34
Number of libraries entered into the database: 2
Number of extant copies confirmed: 148

State of the Project, August 2023

Photo by Luis Zheji on Pexels.com

Greetings, gentle readers! The end of Hot Data Summer is upon us, and I have nearly finished all my class prep for the next semester’s teaching. Around and among and before that, I’ve been busily embarking on Phase 2.2 of the project, which as stated in July’s update, involves breaking out the data by library/institution (it depends on the nature of the organization and its libraries — there’s a system, I promise) and verifying holdings via catalog searches and/or contacting the library directly in some cases.

Thus far I’ve completed a mere twenty libraries, but even that has yielded some interesting insights. I have eliminated some prospective holdings (either they don’t exist or they were online access only), but I’ve uncovered at least as many that, for whatever reason, simply weren’t in the ESTC or WorldCat when I searched them. I knew there would be missed volumes, so that isn’t very surprising, but their number and type are still intriguing. As an example, an early data point (we’ll see if it holds) is that out of those 20, six libraries have multiple pre-1850 editions of the Memoirs of the Duke of Sully. What does that mean? I’m not sure, but it’s something to ponder and look into further if it holds up.

As far as the database design goes, I’ve put it aside for the moment. I could, in theory, enter holdings as I confirm them (almost certainly a good idea, now that I think about it), but I would like to get a bit more confirmation done first and then enter data in phases, rather than going constantly back and forth.

Number of libraries confirmed: 20
Number of libraries entered into the database: 2
Number of extant copies confirmed: 77

State of the Project, July 2023

Marbling from the end papers of a fantastic book at the Library Company of Philadelphia

Greetings, gentle readers. I am thrilled to share the news that I’ve finished Phase 2.1! I’ve finally completed* aggregating all the map location data into one huge Excel workbook. Now I can pull it all together into a single worksheet and create some pivot tables to help me cross-reference locations. I have also added all the rest of the maps to the Maps page here on the site, for anyone who’s interested in seeing them. Just as a note, the maps are cleared of duplicates but do not yet represent verified holdings.

The next step is Phase 2.2, wherein I put all the data in a single sheet, create a massive pivot table, and break out the results by library/institution so I can see which institutions have which books and start planning both the in-person gathering of bibliographic data and the applications to fund the travel required to visit those collections. I’m not sure how long that portion of the project will take, but I think it should take considerably less time than the previous phase, if only because it involves far less data entry.
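The Phase 2.2 breakout is, at heart, a group-and-count: rows are libraries, columns are titles, and each cell is a number of copies. As a hypothetical illustration (not a replacement for the Excel pivot table), here is the same cross-tab in standard-library Python, assuming each row is a dict with "Library Name" and "Title" keys, as it would be from csv.DictReader on the exported sheet.

```python
from collections import Counter, defaultdict


def holdings_by_library(rows):
    """Return {library: Counter({title: copy_count})}.

    This mirrors a pivot table with Library Name as rows, Title as columns,
    and a count of records in each cell.
    """
    table = defaultdict(Counter)
    for row in rows:
        table[row["Library Name"]][row["Title"]] += 1
    return table


def libraries_by_total(table):
    """List libraries from most to fewest copies (ties broken alphabetically)."""
    return sorted(table, key=lambda lib: (-sum(table[lib].values()), lib))
```

Ranking by total copies is only one rough way to prioritize collections; travel distance and available funding matter at least as much.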

I also need, while this process continues, to start finalizing some decisions about the Heurist database I’m using. I’ve started working on importing some data and creating the structure and relations based on the data I have. I’m still very much in the mess-around stage of making the database — there are no permanent decisions in place yet. I’m happy with how things are shaping up, though.

Finally, as a few data points:

  • The last title I put into the workbook was The History of Eliza.
  • Since June’s update, I’ve entered 472 entries across four titles, including the most famous of Lennox’s works, The Female Quixote (which had 309 entries).
  • 2612 records total at the end of Phase 2.1.

*I fully expect to find material that I’ve accidentally left out or overlooked. No process is perfect, after all.

State of the Project, June 2023

A green landscape, looking out from a shady wooded area into a sunlit yard, with a large statement tree off to the right side and a wooden fence in the distance.

The view from the back of my house

Greetings, gentle readers! The summer has almost returned, and I’m managing to make this post mid-month as opposed to nearly-done-month. Overall I’m quite pleased with my industry.

Insofar as the project goes, finishing the semester has done wonders for my ability to keep working on my data. I finally finished the Marquis de Sully and was likewise able to finish Old City Manners, Philander, Poems Upon Several Occasions*, and Shakespear Illustrated, the last just this evening. I’m very happy with the rate of progress I’m making.

In addition to working through the data and cleaning it up, I’m currently trying to solve two different problems. The first concerns periodical reprints of Lennox’s work. I want to catalog not only the stand-alone volumes of her works but also the various reprints, both partial and complete, of her work in periodicals of the time. The problem is, how do I track them? Using the Lennox bibliography in Susan Carlile’s book, Charlotte Lennox: An Independent Mind, I have a list of excerpts in various publications.

The question, though, is this: 1) is that actually all of them? and 2) (and this is a big one) how do I treat these periodicals within the same project as more traditional codex books? Lennox even had her own periodical, The Lady’s Museum. The same periodical may have (and in some cases does have) multiple samples from her various works across time. How do I record that data so that nothing is lost and yet I’m also not doubling my own effort? It’s not that this is a particularly complicated problem; it’s just that the solution I pick will necessarily shape the project going forward, so I’d rather do my best to choose something that won’t cause problems later if I can.

The next issue, mostly unrelated to that procedural quandary, is how to set up my database so that different records can share the same title without it becoming a gigantic mess. This is probably the easier one to answer, as I’m sure I’m not the first person building a relational data structure who needs multiple entries with the same name but different data attached to each. I’m doing some reading, I’ve got feelers out with some data-oriented DBA people I know, and I’ll likely have an answer later this month. Once I do, I can keep working on the structure of my Heurist database and importing the material I currently have in spreadsheets. In the meantime, I’ll keep working on the organization and getting the location data sorted, with the goal of finishing it and organizing the next phase of the project by the fall, aka library fellowship application season.
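For what it’s worth, one common relational pattern for same-title records is to give every record its own surrogate key, never require titles to be unique, and link editions (or excerpts) to a work by ID rather than by name. The sqlite3 sketch below only illustrates that general pattern; Heurist has its own record-type model, and the table, column, and function names here are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database where titles may repeat freely."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
    conn.executescript("""
        CREATE TABLE work (
            work_id INTEGER PRIMARY KEY,  -- surrogate key: identity does not depend on the title
            title   TEXT NOT NULL         -- deliberately NOT UNIQUE
        );
        CREATE TABLE edition (
            edition_id INTEGER PRIMARY KEY,
            work_id    INTEGER NOT NULL REFERENCES work(work_id),
            pub_year   INTEGER,
            location   TEXT
        );
    """)
    return conn


def add_work(conn, title):
    """Insert a work and return its generated ID."""
    return conn.execute("INSERT INTO work (title) VALUES (?)", (title,)).lastrowid


def add_edition(conn, work_id, pub_year, location):
    """Insert an edition linked to a work by ID and return the edition's ID."""
    cur = conn.execute(
        "INSERT INTO edition (work_id, pub_year, location) VALUES (?, ?, ?)",
        (work_id, pub_year, location),
    )
    return cur.lastrowid


def editions_of(conn, work_id):
    """Return (pub_year, location) pairs for one work, oldest first."""
    return conn.execute(
        "SELECT pub_year, location FROM edition WHERE work_id = ? ORDER BY pub_year",
        (work_id,),
    ).fetchall()
```

Because lookups go through work_id, two records titled identically never collide; the title becomes just another attribute to display and search.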

Current Data Category: Shakespear Illustrated
# of entries in this category to date: 126
# of entries in the worksheet so far: 2140 and counting

State of the Project, May 2023

red tulips next to a stone against a field of brown mulch

Tulips from my garden before the deer ate them.

Here it is, the middle of May already. Where is the April post, you may ask? Well, the April post sadly went the way of the rest of my month of April, swallowed whole by the end of the semester and grading. I got nothing done on the project to speak of in April, though I did find my way into some interesting discoveries.

At the beginning of April/end of March, I attended the ASECS (American Society for Eighteenth Century Studies) annual conference, this year held in St. Louis. While I was there I went to a fantastic panel (okay, many fantastic panels), and this one in particular discussed a very interesting potential path forward for the Lennox project. The panel included a paper by Norbert Schürer (CSULB), who was discussing a digital humanities project being created using the Heurist platform, a customizable relational database system designed for humanities research. The platform is free to use and hosted by the University of Sydney. It is based in MySQL, which means that it’s easy to export to somewhere else for hosting or other purposes, and it’s going to be simple to transfer to new homes and interfaces down the line. It can also generate a website interface and has mapping and network visualization capabilities.

To date, no one else has used the platform for a descriptive bibliography, so many of the relationships and information types I need for my project do not yet exist. Before I start putting in extensive book data, however, I want to take the information I do have and create a locational database that pulls together the map data sets I’ve created for more effective research planning. To that end, I’ve created a test database and been futzing around with it in my spare time, which has not been terribly plentiful over the past month but should open up considerably over the summer.

I was torn for a time on how to proceed, as it might be less time-consuming simply to switch over to inputting data into the database directly. I think I’ve decided, though, to keep putting entries into the spreadsheets for now while I figure out the structures I need in Heurist and build something useful. To that end, I’ve started inputting data again and am nearly done with the Marquis de Sully, which is a relief. I’ll keep you posted on how it all goes.

Current Data Category: Memoirs of the Duke de Sully translation
# of entries in this category to date: 953
# of entries in the worksheet so far: 1505 and counting

State of the Project, March 2023

My dog Noodle, basking in a sunbeam with his blanket.

Welcome back! It’s a bit past the middle of March, Spring has officially sprung for what it’s worth, and Noodle is back to enjoying his sunbeams and blanket in the mornings in our household library.

March saw me attending the annual meeting of the American Society for Eighteenth Century Studies (ASECS), held this year in St. Louis, Missouri. The conference went really well, all things considered, and I was glad to have the chance to present a bit of my own work and hear about the awesome things others are doing. I attended a fantastic play reading by the Theater and Performance Caucus and went to a lot of great panels; the Bibliographical Society of America panel in particular left me with a LOT to think about.

Specifically, my spreadsheet work has reached the point where I’m thinking about where it goes from here and what format comes next. At that panel, in a talk by Norbert Schürer, I learned about Heurist, an open-access relational database system designed for humanities research, hosted and overseen at the University of Sydney. I’ve started poking at it since ASECS, and I’m both overwhelmed by and excited about the possibilities. I’ve decided to keep bringing info into my spreadsheets for now, since I can manipulate it there and export it to the database, which will be faster than putting it all in by hand later (I’ll likely have to do a lot of editing of records and making links, but that’s still less work than inputting everything by hand again).

Progress proceeds apace on record data entry. I’m still working on the Marquis de Sully records, but I’m now onto Map 2 (!) and halfway done with it. Onward!

Current Data Category: Memoirs of the Duke de Sully translation
# of entries in this category to date: 720
# of entries in the worksheet so far: 1272 and counting

State of the Project, August 2022

First of all, I am so very pleased to announce that I’ve received the 2022-2023 Helen F. Faust Women’s Writers Research Award from the Penn State Special Collections library. I’ll be traveling to the Eberly Family Special Collections Library next week to dig into their Charlotte Lennox holdings, starting the archival research in earnest. All my deepest thanks to the Penn State Libraries for helping fund this travel and research, and I can’t wait to see what we find!

The amusing part of all this, of course, is that, proceeding in an orderly and planned fashion, I had assumed next summer would be the beginning of my archival research and that I had tons of time to plan out the scope and types of data and do some trial runs on local holdings. And thus, to paraphrase the poets, does fate make fools of us all. So I’m doing all my data planning this week: figuring out how I’m going to record it, what I want to take note of, where I want to store it, and how I’m going to eventually put it all together, at least to the extent that I can without having done it (which means it’ll absolutely change between now and later).

I’m recording the trials and tribulations of the project, by the way, not because I particularly feel they’re worthy of recording for the sake of the project, but because I want people who take on similar projects to be able to look back at this and know they aren’t alone. I believe very strongly in breaking out of the strictures of “professionalism” and the gatekeeping they enact. To be “professional” too often means to speak only of your successes, downplay your failures or challenges, and deny weakness or missteps. The parts we edit out, though, in order to achieve that seamless appearance, are where the opportunities for growth and the useful case studies for others happen to be. I’m under no illusion that this blog will be a source of fascinating reading material for a huge audience, but my hope is that for those who need it or like it, it will serve to light their own path a little, if only dimly.

Back to business. I’ve finished importing The Life of Harriot Stuart and have been working on Henrietta for a while now (see previous posts about the better-selling books taking far longer). I’ve also added the Henrietta maps to the website. As a note, this importing process might take longer than one would wish, but it’s already helped me locate some discrepancies and repeated data points, which I am also correcting in the maps as I find them. I’ve also discovered, for anyone playing along at home, that my KMZ-to-CSV converter does not know what to do with a Google Maps layer that has symbols in it, like ” or , and so forth. It simply does not extract that layer, which means I have to go in and enter it by hand (which fortunately I can do, having the maps to hand). Things to know for the future, I suppose.
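The post doesn’t name the converter, so this is not a fix for it, but a standard-library alternative is easy to sketch. It assumes a Google My Maps KMZ export: a zip archive containing a KML file in the KML 2.2 namespace, with one Folder per layer holding point Placemarks. Because Python’s csv module quotes any field containing commas or quotation marks, such characters in a layer or placemark name are written out intact instead of causing the layer to be dropped. The function names are hypothetical.

```python
import csv
import zipfile
import xml.etree.ElementTree as ET

KML_URI = "http://www.opengis.net/kml/2.2"
NS = {"kml": KML_URI}
FIELDS = ["layer", "name", "longitude", "latitude"]


def kml_to_rows(kml_bytes):
    """Extract one dict per point Placemark, tagged with its layer (Folder) name."""
    root = ET.fromstring(kml_bytes)
    rows = []
    for folder in root.iter(f"{{{KML_URI}}}Folder"):
        layer = folder.findtext("kml:name", default="", namespaces=NS)
        for pm in folder.findall("kml:Placemark", NS):
            coords = pm.findtext("kml:Point/kml:coordinates", default="", namespaces=NS).strip()
            if not coords:
                continue  # skip non-point features such as lines or polygons
            lon, lat = coords.split(",")[:2]  # KML order is longitude,latitude[,altitude]
            rows.append({
                "layer": layer,
                "name": pm.findtext("kml:name", default="", namespaces=NS),
                "longitude": float(lon),
                "latitude": float(lat),
            })
    return rows


def kmz_to_csv(kmz_file, out):
    """Read a KMZ (path or binary file object), write CSV rows to `out`, return the row count."""
    with zipfile.ZipFile(kmz_file) as zf:
        kml_names = [n for n in zf.namelist() if n.lower().endswith(".kml")]
        if not kml_names:
            raise ValueError("no .kml file found in KMZ archive")
        rows = kml_to_rows(zf.read(kml_names[0]))
    writer = csv.DictWriter(out, fieldnames=FIELDS)  # quotes fields containing , or "
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)
```

Placemarks sitting outside any Folder are skipped in this sketch, so it would need a small extension if a map keeps points at the top level.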

Next up: Finish Henrietta, move on to the next item on the list, travel, do archival stuff, take lots of notes and images, start creating the archive!

ASECS 2022 follow-up

As previously mentioned, I had the opportunity to talk about the project in a session at the American Society for Eighteenth Century Studies (ASECS) annual conference this past week. The panel was hosted by the Digital Humanities caucus, one of two back-to-back sessions titled “Centering Marginalized Voices in Digital Humanities Projects.” There were so many amazing projects discussed, and I was honored to be a part of it. The image to the right is the poster I presented, which I’m sharing here as well. You can also download this poster at the link below if you’re interested in getting a better look at it.

Among the many great projects discussed in that session, I’d like to give a shout-out in particular to The Lady’s Museum Project, a digital version of Lennox’s periodical The Lady’s Museum. The editors behind this collaborative project to create the first-ever critical edition of TLM are Karenza Sutton-Bennet and Kelly Plante, both of whom are brilliant, passionate scholars. I urge you to go take a look at their fantastic work.

Poster presented at ASECS 2022, giving a project overview