How do I check for the existence of a particular set of column names amongst all tables?
It's straight-forward how to find one column name:
SELECT utc.table_name
FROM user_tab_columns utc
WHERE utc.column_name = 'FUND_ID'
ORDER BY utc.table_name;
TABLE_NAME
------------------------------
ANNUITY_ALLOCATION
ANNUITY_FEE
ASSIGNMENT
BENCHMARK_PROCESSING
BENCHMARK_WEIGHT
CATEGORY_LINK
COLLECTION_LINK
COMPOSITE_LINK
DAILY_ASSET
DAILY_PROFILE
DAILY_PROFILE_EXTRA
...
If I need to find two such column names, I suppose I could do a self-join:
SELECT t1.table_name, t1.column_name, t2.column_name
FROM user_tab_columns t1
INNER JOIN
user_tab_columns t2
ON t1.table_name = t2.table_name
WHERE t1.column_name = 'FUND_ID'
AND t2.column_name = 'SECURITY_ID'
ORDER BY t1.table_name;
TABLE_NAME COLUMN_NAME COLUMN_NAME
------------------------------ -------------------- --------------------
EXCEPTION_STORE FUND_ID SECURITY_ID
SECURITY FUND_ID SECURITY_ID
V_DATA_EXCEPTION FUND_ID SECURITY_ID
V_HOLDING_RATE FUND_ID SECURITY_ID
V_HOLDING_REPORT FUND_ID SECURITY_ID
V_TRANS_REPORT FUND_ID SECURITY_ID
But how I do extend this to 3, 4 or 5 columns? Simply adding self-joins doesn't seem to scale particularly well.
What I really need is to obtain the set of all column names as a single column per table name, then somehow cross-reference those against a set of desired column names.
Fortunately, Oracle supports the concept of a multiset. This operator allows me to quickly perform set operations between two sets.
To do this, I first need a schema-level collection type:
CREATE OR REPLACE TYPE varchar2_nt IS TABLE OF VARCHAR2(4000);
/
4,000 bytes of course being the maximum allowable string size in SQL.
Now, how do I get the set of all column names for each table? Like this:
SELECT ut.table_name
,(SELECT CAST(
MULTISET(SELECT utc.column_name
FROM user_tab_columns utc
WHERE utc.table_name = ut.table_name) AS varchar2_nt
)
FROM DUAL)
column_list
FROM user_tables ut
ORDER BY ut.table_name;
TABLE_NAME COLUMN_LIST
------------------------------ ----------------------------------------
ACCOUNT VARCHAR2_NT('IS_NONAGS', 'ACCOUNT_ID', '
EXTERNAL_ID', 'NAME', 'DESCRIPTION', 'AC
COUNT_CATEGORY', 'RECONCILIATION_RULE',
'PERIOD_BASED', 'FORCE_SHARE_CLASS')
ACCOUNT_GROUP VARCHAR2_NT('ACCOUNT_GROUP_ID', 'GROUP_N
AME')
ACCOUNT_GROUP_LINK VARCHAR2_NT('ACCOUNT_GROUP_ID', 'ACCOUNT
_ID', 'SORT_ORDER')
The input to MULTISET in this situation is a query. If the query returns no rows, you get an empty collection. The subquery-in-a-column is called a scalar subquery. It must return precisely one value (or NULL). The collection of column names is that one value!
Now that I have a collection of column names per table, how can I scan it for one, two or more column names all at once?
I can perform the intersection of two such collections. The constructor for my schema-level type can be used to provide a hard-coded list, for example:
varchar2_nt('FUND_ID', 'SECURITY_ID', 'END_DATE')
Now I can do a MULTISET INTERSECT on this set and each column name set:
SELECT ut.table_name
,(SELECT CAST(
MULTISET(SELECT utc.column_name
FROM user_tab_columns utc
WHERE utc.table_name = ut.table_name) AS varchar2_nt
)
FROM DUAL)
MULTISET INTERSECT varchar2_nt(
'FUND_ID'
,'SECURITY_ID'
,'END_DATE'
)
column_list
FROM user_tables ut
ORDER BY ut.table_name;
TABLE_NAME COLUMN_LIST
------------------------------ ----------------------------------------
ACCOUNT VARCHAR2_NT()
ACCOUNT_GROUP VARCHAR2_NT()
ACCOUNT_GROUP_LINK VARCHAR2_NT()
ACCOUNT_MAP VARCHAR2_NT()
ACCOUNT_SUMMARY VARCHAR2_NT()
ACCOUNT_VECTORS VARCHAR2_NT()
ACCOUNT_VIEW VARCHAR2_NT()
ALERT VARCHAR2_NT()
ALERT_USER VARCHAR2_NT()
ANNUITY_ALLOCATION VARCHAR2_NT('FUND_ID')
ANNUITY_FEE VARCHAR2_NT('FUND_ID')
This unfortunately gives me an empty collection for those tables without any of these columns, and partial results for 1, 2, or 3 columns. I need some way to discern the fully populated collections from the rest.
For this, I can use the CARDINALITY function. I want only those tables that have all three columns:
SELECT *
FROM (SELECT ut.table_name
,(SELECT CARDINALITY(
CAST(
MULTISET(SELECT utc.column_name
FROM user_tab_columns utc
WHERE utc.table_name = ut.table_name) AS varchar2_nt
)
MULTISET INTERSECT varchar2_nt(
'PORTFOLIO_ID'
,'SECURITY_ID'
,'END_DATE'
)
)
FROM DUAL)
num_columns
FROM user_tables ut) t
WHERE t.num_columns = 3
ORDER BY t.table_name;
TABLE_NAME NUM_COLUMNS
------------------------------ -----------
SECURITY_ENHANCEMENT 3
If I were doing this in a PL/SQL block, I'd avoid the hard-coded constant by defining the collection once:
DECLARE
arr_collist varchar2_nt
:= varchar2_nt(
'PORTFOLIO_ID'
,'SECURITY_ID'
,'END_DATE'
);
BEGIN
FOR rec
IN (SELECT *
FROM (SELECT ut.table_name
,(SELECT CARDINALITY(
CAST(
MULTISET(
SELECT utc.column_name
FROM user_tab_columns utc
WHERE utc.table_name = ut.table_name
) AS varchar2_nt
)
MULTISET INTERSECT arr_collist
)
FROM DUAL)
num_columns
FROM user_tables ut) t
WHERE t.num_columns = CARDINALITY(arr_collist)
ORDER BY t.table_name) LOOP
DBMS_OUTPUT.put_line(rec.table_name);
END LOOP;
END;
/
SECURITY_ENHANCEMENT
Now suppose I want to see which tables share the most number of common column names. I can take the intersection of the self-join of all column lists and compare each table's column list to one another; then sort by the number of common column names.
SELECT table_name1
,table_name2
,(column_list1 MULTISET INTERSECT column_list2) column_list3
,CARDINALITY(column_list1 MULTISET INTERSECT column_list2) card
FROM (SELECT ut1.table_name table_name1
,(SELECT CAST(
MULTISET(SELECT utc.column_name
FROM user_tab_columns utc
WHERE utc.table_name = ut1.table_name) AS varchar2_nt
)
FROM DUAL)
column_list1
,t2.table_name2
,t2.column_list2
FROM user_tables ut1
INNER JOIN
(SELECT ut2.table_name table_name2
,(SELECT CAST(
MULTISET(
SELECT utc.column_name
FROM user_tab_columns utc
WHERE utc.table_name = ut2.table_name
) AS varchar2_nt
)
FROM DUAL)
column_list2
FROM user_tables ut2) t2
ON ut1.table_name != t2.table_name2)
WHERE CARDINALITY((column_list1 MULTISET INTERSECT column_list2)) > 0
ORDER BY card DESC;
TABLE_NAME1 TABLE_NAME2
------------------------------ ------------------------------
COLUMN_LIST3
--------------------------------------------------------------------------------
CARD
----------
MONTHLY_RETURN MONTHLY_RETURN_MATRIX
VARCHAR2_NT('FUND_ID', 'PROFILE_DATE', 'ROR_1M_C_NOLOAD', 'ROR_1M_G_NOLOAD', 'RO
R_2M_C_NOLOAD', 'ROR_3M_C_NOLOAD', 'ROR_4M_C_NOLOAD', 'ROR_5M_C_NOLOAD', 'ROR_6M
_C_NOLOAD', 'ROR_7M_C_NOLOAD', 'ROR_8M_C_NOLOAD', 'ROR_9M_C_NOLOAD', 'ROR_10M_C_
NOLOAD', 'ROR_11M_C_NOLOAD', 'ROR_YTD_C_NOLOAD', 'ROR_YTD_G_NOLOAD', 'ROR_1Y_C_N
OLOAD', 'ROR_2Y_A_NOLOAD', 'ROR_2Y_C_NOLOAD', 'ROR_3Y_A_NOLOAD', 'ROR_3Y_C_NOLOA
D', 'ROR_4Y_A_NOLOAD', 'ROR_4Y_C_NOLOAD', 'ROR_5Y_A_NOLOAD', 'ROR_5Y_C_NOLOAD',
'ROR_6Y_A_NOLOAD', 'ROR_7Y_A_NOLOAD', 'ROR_8Y_A_NOLOAD', 'ROR_9Y_A_NOLOAD', 'ROR
_10Y_A_NOLOAD', 'ROR_10Y_C_NOLOAD', 'ROR_15Y_C_NOLOAD', 'ROR_15Y_A_NOLOAD', 'ROR
_20Y_C_NOLOAD', 'ROR_20Y_A_NOLOAD', 'ROR_25Y_C_NOLOAD', 'ROR_25Y_A_NOLOAD', 'ROR
_INCP_C_NOLOAD', 'ROR_INCP_A_NOLOAD', 'ROR_ADV_INCP_C_NOLOAD', 'ROR_ADV_INCP_A_N
OLOAD', 'ROR_USER_INCP_C_NOLOAD', 'ROR_USER_INCP_A_NOLOAD', 'ROR_PERF_INCP_C_NOL
OAD', 'ROR_PERF_INCP_A_NOLOAD', 'ROR_1M_C_LOAD', 'ROR_2M_C_LOAD', 'ROR_3M_C_LOAD
', 'ROR_4M_C_LOAD', 'ROR_5M_C_LOAD', 'ROR_6M_C_LOAD', 'ROR_7M_C_LOAD', 'ROR_8M_C
_LOAD', 'ROR_9M_C_LOAD', 'ROR_10M_C_LOAD', 'ROR_11M_C_LOAD', 'ROR_YTD_C_LOAD', '
ROR_1Y_C_LOAD', 'ROR_2Y_A_LOAD', 'ROR_2Y_C_LOAD', 'ROR_3Y_A_LOAD', 'ROR_3Y_C_LOA
D', 'ROR_4Y_A_LOAD', 'ROR_4Y_C_LOAD', 'ROR_5Y_A_LOAD', 'ROR_5Y_C_LOAD', 'ROR_6Y_
A_LOAD', 'ROR_7Y_A_LOAD', 'ROR_8Y_A_LOAD', 'ROR_9Y_A_LOAD', 'ROR_10Y_A_LOAD', 'R
OR_10Y_C_LOAD', 'ROR_15Y_C_LOAD', 'ROR_15Y_A_LOAD', 'ROR_20Y_C_LOAD', 'ROR_20Y_A
_LOAD', 'ROR_25Y_C_LOAD', 'ROR_25Y_A_LOAD', 'ROR_INCP_C_LOAD', 'ROR_INCP_A_LOAD'
, 'ROR_ADV_INCP_C_LOAD', 'ROR_ADV_INCP_A_LOAD', 'ROR_USER_INCP_C_LOAD', 'ROR_USE
R_INCP_A_LOAD', 'ROR_PERF_INCP_C_LOAD', 'ROR_PERF_INCP_A_LOAD', 'RECORD_GUID', '
CREATORDB_ID', 'REVISION_NBR', 'GENERATION_ID', 'REFERENCE_COUNT', 'SYNC_SESSION
_ID')
92
I do get the same pair of tables listed twice, first as table_name1-table_name2 and then as table_name2-table_name1.
I can also use the MULTISET and related operators and functions directly in assignment statements in PL/SQL. For example:
DECLARE
x varchar2_nt := varchar2_nt();
BEGIN
x :=
varchar2_nt(
'a'
,'b'
,'b'
,'c'
)
MULTISET INTERSECT DISTINCT varchar2_nt(
'b'
,'c'
,'c'
,'d'
);
IF (x.COUNT > 0) THEN
DBMS_OUTPUT.put_line('There are ' || x.COUNT || ' common elements:');
FOR i IN x.FIRST .. x.LAST LOOP
DBMS_OUTPUT.put_line(x(i));
END LOOP;
ELSE
DBMS_OUTPUT.put_line('There are no common elements');
END IF;
END;
/
There are 2 common elements:
b
c
If you play around with this, you'll see the intersection of a non-empty set and an empty set is an empty set.
In addition to INTERSECT [DISTINCT], there's UNION [DISTINCT] and EXCEPT [DISTINCT].