Hi all,
While I am certainly not new to databases, I am new to OpenEdge. I have a 10.2B database I'm querying via the SQL Client access driver (JDBC) and I'm having some troubles getting the data the way I'd like it. Hopefully one of you can give me some direction.
I'm actually querying a table that contains customer names and various contact methods that have been employed to send them notifications. If a given customer has been contacted via email and via text message, their name will appear twice, as two different rows. The problem is, I don't want duplicates. In this example, I only want one mention of this customer in my result set. I don't care of it is the email response record or the text message response record -- I just need one or the other, but not both.
A "SELECT DISTINCT" query on only one field does the trick with a traditional SQL database. In OpenEdge, it appears that a SELECT DISTINCT only applies if ALL of the fields are exactly the same in one or more records (meaning duplicate rows).
Consider this example:
+----------+----------------+----------+-------------------------+ | CUST_NBR | CUST_NAME | CATEGORY | TIMESTAMP | +----------+----------------+----------+-------------------------+ | 3456 | John Doe | Static | 2016-04-25 08:17:30.134 | +----------+----------------+----------+-------------------------+ | 2789 | George Orwell | Dynamic | 2016-04-25 08:17:32.945 | +----------+----------------+----------+-------------------------+ | 3456 | John Doe | Static | 2016-04-25 08:17:37.256 | +----------+----------------+----------+-------------------------+
Suppose I want to grab only one record per customer (so only one John Doe). The problem is, because the timestamp (which I need) is different from the next row, I still get both John Doe records in my result set. How would I get just one John Doe record (I don't care which)?
Hopefully all that makese sense. Could any of you give me any insight into this problem
I don't understand the remark about needing a timestamp and not caring about what John Doe record you get ? I mean, sound like you don't care about timestamp either, as they are different ?
SQLExplorer>select distinct name from pub.customer;
name
----------------
John Doe
Someone
<end>
On its face, it may not make a ton of sense, but there really is a reason.
This particular table is used to keep track of customer notifications. Based on the contact preferences of the user (email, sms/text, automated phone call, etc), every notice sent to them could result in 1 or more entries in this table. My query doesn't care if one customer was only emailed and another customer was sent an email, sms/text and call. All my query cares about is who was notified -- at all.
Right now I'm eliminating duplicate entries at the script/application level after the data has been pulled from the database (I only want to process one notification per customer). I'd like to have the database do that for me, if possible. It would greatly simplify things for me. Basically I only want to eliminate duplicates on one field/column, not on the entire record. Maybe OpenEdge won't do that.
Once I've processed the table, I'm recording the notification's timestamp, which helps me identify later which notifications I have already analyzed and processed. If the table were my creation, I could adjust the format of the table to suit my needs, but in this case, it's not (it's actually part of a product being provided by a vendor of ours).
Does that make a little more sense? My database experience is largely in MySQL and PostgreSQL (I'm a server/network guy, not a DBA), and I can easily accomplish what I want with that -- I'm just not very experienced with OpenEdge, so I don't know how to formulate the question in OpenEdge lingo.
This may make better sense if I put it this way:
+----------+-----------+----------+-------------------------+ | CUST_NBR | CUST_NAME | TYPE | TIMESTAMP | +----------+-----------+----------+-------------------------+ | 4424 | Mary | email | 2016-04-25 08:16:23.238 | +----------+-----------+----------+-------------------------+ | 3456 | John | email | 2016-04-25 08:17:30.134 | +----------+-----------+----------+-------------------------+ | 2789 | George | email | 2016-04-25 08:17:32.945 | +----------+-----------+----------+-------------------------+ | 4424 | Mary | text/sms | 2016-04-25 08:24:54.113 | +----------+-----------+----------+-------------------------+ | 3456 | John | text/sms | 2016-04-25 08:17:37.256 | +----------+-----------+----------+-------------------------+ | 4424 | Mary | call | 2016-04-25 08:27:33.468 | +----------+-----------+----------+-------------------------+
A result set like this from the above table would suit my purposes:
+----------+-----------+-------------------------+ | CUST_NBR | CUST_NAME | TIMESTAMP | +----------+-----------+-------------------------+ | 2789 | George | 2016-04-25 08:17:32.945 | +----------+-----------+-------------------------+ | 3456 | John | 2016-04-25 08:17:37.256 | +----------+-----------+-------------------------+ | 4424 | Mary | 2016-04-25 08:27:33.468 | +----------+-----------+-------------------------+
Thank you!
Seems like, at the very least you would want the latest notification ...
Are you telling me there's no way to do this in OpenEdge?
Steve,
That does group things together, as one would expect with a group statement, but using my example above, I want no rows with duplicate CUST_NUM values. In the above example, there are three rows with a CUST_NUM value of 4424 (Mary) and two rows with a CUST_NUM value of 3456 (John).
I'd like to consolidate those results so that I'm only getting one row for 4424 (Mary) and only one row for 3456 (John). In a PostgreSQL or MySQL world, a 'SELECT DISTINCT' would do the trick. The problem is that (as I understand it), because OpenEdge thinks in terms of rows and not data sets, it just doesn't process this way.
Does that make sense?
Did you try Steve's code? ABL tends to be row oriented, although there are more set-oriented features now, but ABL is *entirely* separate from SQL and SQL is SQL. It may be missing some feature of SQL that you are used to in another implementation, but it acts like SQL.
OK - looked more closely at your example:
Select cust_nbr, category, min(timestamp)
From my_table
Group by cust_nbr, name
That should give, I believe, your preferred result in your example.
This assumes that each cust# has exactly one name value. Their are not multiple names for a given cust#.
note - select distinct and grouping are logically equivalent if grouping by all the output columns with no aggregation.
....steve pittman
> On Apr 26, 2016, at 5:02 PM, Thomas Mercer-Hursh wrote:
>
> It may be missing some feature of SQL that you are used to in another implementation, but it acts like SQL.
The beauty of stadards. Among many other things, when you use SELECT DISTINCT, the SQL standard leaves to the implementation what happens to the other column values. Some do it this way and others that way.